Overzealous Vary header prevents clients from caching service root representation

Bug #521927 reported by Martin Pool
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Unassigned

Bug Description

using launchpadlib 1.5.4 from lucid, constructing a Launchpad instance apparently makes the same http call twice, adding about 3-5s to the startup time of any launchpadlib-using application:

loaded existing credentials
> /home/mbp/hydrazine/trunk/hydrazine/__init__.py(39)create_session()
-> return Launchpad(credentials, service_root,
(Pdb) n
> /home/mbp/hydrazine/trunk/hydrazine/__init__.py(40)create_session()
-> lplib_cachedir)
(Pdb)
send: 'GET /beta/ HTTP/1.1\r\nHost: api.edge.launchpad.net\r\nAccept-Encoding: identity\r\nte: deflate, gzip\r\nAuthorization: OAuth realm="OAuth", oauth_nonce="61057780", oauth_timestamp="1266187167", oauth_consumer_key="Hydrazine", oauth_signature_method="PLAINTEXT", oauth_version="1.0", oauth_token="zzzzzz", oauth_signature="zzzzzzzzz"\r\naccept: application/vnd.sun.wadl+xml, application/vd.sun.wadl+xml\r\nuser-agent: Python-httplib2/$Rev$\r\n\r\n'
reply: 'HTTP/1.1 200 Ok\r\n'
header: Date: Sun, 14 Feb 2010 22:39:33 GMT
header: Server: zope.server.http (HTTP)
header: X-Powered-By: Zope (www.zope.org), Python (www.python.org)
header: Content-Type: application/vnd.sun.wadl+xml
header: Content-Length: 1099105
header: Etag: "28a2b78bfdbb9586f98a63ac248882bbcdcfde37"
header: Vary: Cookie,Authorization,Accept
header: Via: 1.1 wildcard.edge.launchpad.net
send: 'GET /beta/ HTTP/1.1\r\nHost: api.edge.launchpad.net\r\nAccept-Encoding: identity\r\nte: deflate, gzip\r\nAuthorization: OAuth realm="OAuth", oauth_nonce="52211659", oauth_timestamp="1266187191", oauth_consumer_key="Hydrazine", oauth_signature_method="PLAINTEXT", oauth_version="1.0", oauth_token="zzzzzzzz", oauth_signature="zzzzzzz"\r\naccept: application/json\r\nuser-agent: Python-httplib2/$Rev$\r\n\r\n'
reply: 'HTTP/1.1 200 Ok\r\n'
header: Date: Sun, 14 Feb 2010 22:39:52 GMT
header: Server: zope.server.http (HTTP)
header: X-Powered-By: Zope (www.zope.org), Python (www.python.org)
header: Content-Type: application/json
header: Content-Length: 913
header: Etag: "cebfdee8cd795f9c01cf36e3b216f96d0068c275"
header: Vary: Cookie,Authorization,Accept
header: Via: 1.1 wildcard.edge.launchpad.net
--Return--
> /home/mbp/hydrazine/trunk/hydrazine/__init__.py(40)create_session()-><launchp...x8b784cc>
-> lplib_cachedir)
(Pdb) launchpadlib
*** NameError: name 'launchpadlib' is not defined
(Pdb) import launchpadlib
(Pdb) launchpadlib.__version__
'1.5.4'

Related branches

Revision history for this message
Martin Pool (mbp) wrote :

I guess it's actually getting different content because of the Accept header? Still, it would be nice to do only one round trip.

Revision history for this message
James Westby (james-w) wrote :

Yes, it's getting the WADL that describes the service, and then requesting a JSON
representation of the root resource. I'm not sure how that could be turned in to a single
request.

Thanks,

James

Changed in launchpadlib:
status: New → Invalid
James Westby (james-w)
Changed in launchpadlib:
status: Invalid → New
Revision history for this message
Leonard Richardson (leonardr) wrote :

If you have a cache set up, it should be making conditional requests, and the conditional requests should be returning 304 ("Not Modified") because the service root only changes when we do a new release of launchpadlib. If we added a reasonable Cache-Control header when serving the WADL and JSON, launchpadlib would only try to retrieve these documents, say, once a week. This seems more practical than cramming the WADL and JSON representations into the same response and getting rid of one of the two requests.

(Do you have a cache set up? Was that HTTP dump on a fresh installation?)

Revision history for this message
Leonard Richardson (leonardr) wrote :

Sorry, the service root changes when we do a new release of *launchpad*.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 521927] Re: constructing a sessions unnecessarily makes two http calls
Download full text (3.6 KiB)

On 17 February 2010 03:53, Leonard Richardson
<email address hidden> wrote:
> If you have a cache set up, it should be making conditional requests,
> and the conditional requests should be returning 304 ("Not Modified")
> because the service root only changes when we do a new release of
> launchpadlib. If we added a reasonable Cache-Control header when serving
> the WADL and JSON, launchpadlib would only try to retrieve these
> documents, say, once a week. This seems more practical than cramming the
> WADL and JSON representations into the same response and getting rid of
> one of the two requests.

I suspect a fair amount of time is in starting the connection, so just
sending not modified may not totally fix things, but it would
certainly help, and you're right it would be cleaner than trying to
squash them together. In fact the very crude view given by 'bugclient
--debug' does show some lag while getting the body - and the wadl is
over 1MB(!!) so avoiding that startup would help.

The requests obviously are not sending if-not-modified-since etc and
the server is sending an etag but no date.

send: 'GET /beta/ HTTP/1.1\r\nHost:
api.edge.launchpad.net\r\nAccept-Encoding: identity\r\nte: deflate,
gzip\r\nAuthorization: OAuth realm="OAuth", oauth_nonce="66438850",
oauth_timestamp="1266365634", oauth_consumer_key="Hydrazine",
oauth_signature_method="PLAINTEXT", oauth_version="1.0",
oauth_token="uuuuuuuuuuuuuuu",
oauth_signature="uuuuuuuuuuuuuuuuuuuu"\r\naccept:
application/vnd.sun.wadl+xml,
application/vd.sun.wadl+xml\r\nuser-agent:
Python-httplib2/$Rev$\r\n\r\n'
reply: 'HTTP/1.1 200 Ok\r\n'
header: Date: Wed, 17 Feb 2010 00:23:01 GMT
header: Server: zope.server.http (HTTP)
header: X-Powered-By: Zope (www.zope.org), Python (www.python.org)
header: Content-Type: application/vnd.sun.wadl+xml
header: Content-Length: 1099105
header: Etag: "b81864b7425d7c9d586096a3362cf1f47e8ea99d"
header: Vary: Cookie,Authorization,Accept
header: Via: 1.1 wildcard.edge.launchpad.net
> (Do you have a cache set up? Was that HTTP dump on a fresh
> installation?)

I think I do have a cache; at least my client code is following the
launchpadlib instructions as they were a couple of months ago. You
can see it in lp:hydrazine.

~/.cache/launchpadlib contains

total 5928
-rw-r--r-- 1 mbp 1324 2010-02-17 11:14
api.edge.launchpad.net,beta,-application,json,5600078570afb551b07f3697ec032007
-rw-r--r-- 1 mbp 709 2010-02-17 11:14
api.edge.launchpad.net,beta,pillars-application,json,6a86736a4b082cccc2d62ff7046425b2
-rw-r--r-- 1 mbp 1099573 2010-02-17 11:13
api.edge.launchpad.net,beta,-application,vnd.sun.wadl+xml,8f29db2e0dac01f431816a14846bf440
-rw-r--r-- 1 mbp 1541 2010-02-17 10:58
api.edge.launchpad.net,beta,bzr-svn,+bug,522836-application,json,756aa788a75cb38ec3e0973cfc78ed8a
-rw-r--r-- 1 mbp 4626 2010-02-17 10:58
api.edge.launchpad.net,beta,bzr-application,json,586e8acb7964c1acbc76fee01d8345af
-rw-r--r-- 1 mbp 3642 2010-02-17 10:58
api.edge.launchpad.net,beta,bugs,522836-application,json,a27f0ded3a6d64fe9547b7754f99dff0
-rw-r--r-- 1 mbp 1570 2010-02-17 10:58
api.edge.launchpad.net,beta,bugs,522836,bug_tasks-application,json,725c3fbc2be928972c5...

Read more...

summary: - constructing a sessions unnecessarily makes two http calls
+ launchpadlib doesn't cache wadl; reads 1MB at startup
Revision history for this message
Martin Pool (mbp) wrote : Re: launchpadlib doesn't cache wadl; reads 1MB at startup

confirming; tweak the importance as you see fit

Changed in launchpadlib:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Martin Pool (mbp) wrote :

see also bug 522965

Revision history for this message
Leonard Richardson (leonardr) wrote :

I'm working on this today. lazr.restfulclient has the correct behavior but launchpadlib doesn't.

Revision history for this message
Leonard Richardson (leonardr) wrote :
Download full text (4.8 KiB)

lazr.restfulclient has the correct behavior because the lazr.restful installation used for tests doesn't serve a 'Vary' header. Launchpad does serve a 'Vary' header: "Cookie,Authorization,Accept", and this triggers what I think is a bug in httplib2.

Here's the httplib2 code:

1057 vary_headers = vary.lower().replace(' ', '').split(',')
1058 for header in vary_headers:
1059 key = '-varied-%s' % header
1060 value = info[key]
1061 if headers.get(header, '') != value:
1062 cached_value = None
1063 break

Let me run you through it. When we requested the service root the first time, Launchpad served a Vary header of "Cookie,Authorization,Accept". We cached that header along with the rest of the original response.

Now we're about to request the service root again, and we would like to make it a conditional request. Here's what that Vary header is saying: "If you're about to request this URL, and you're thinking of making it conditional, but the value you're planning to send for 'Cookie' (or Authorization, or Accept) is different from what you cached the first time, don't even bother!"

Why? Well, the simplest example is Accept. Let's say you cache the WADL representation of the service root, and then you're about to make a request for the JSON representation of the service root. Can you make that request conditional? No, because if the server returns 304 Not Modified, you don't have anything on the client side you can use. The cached WADL representation is not a valid substitute for a JSON representation. "Vary: Accept" means "Different media types are not interchangeable."

(In real life I've hacked the cache used by httplib2 to store different representations of the same resource in different files, so this doesn't matter for our purposes. Before the hack, the WADL and JSON representations of a resource were constantly trading places in a single file because of this Vary header. This was the right thing to do in terms of HTTP, but it was useless from a caching perspective.)

OK, now the thing I think is a bug in httplib2. Launchpad serves "Vary: Cookie", but (on the web service) it doesn't serve Cookie in its responses, and lazr.restfulclient never sends Cookie in its requests. So the "Vary: Cookie" thing shouldn't matter, right? Well, take another look at the code.

1059 key = '-varied-%s' % header ...

Read more...

Revision history for this message
Leonard Richardson (leonardr) wrote :

Here's the change to httplib2 that I believe introduced the bug:

http://code.google.com/p/httplib2/issues/detail?id=45

It was put into httplib2 in the 0.6 release, which came out on December 28, a couple months before this bug was filed.

Revision history for this message
Leonard Richardson (leonardr) wrote :

http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6:

"the cache MUST NOT use such a cache entry to construct a response to the new request unless all of the selecting request-headers present in the new request match the corresponding stored request-headers in the original request."

Since the Cookie header is not in the new request, it doesn't matter what Vary says. (And it wasn't in the original request either, so even if I'm reading that part wrong, the new value "matches" the old value.)

Revision history for this message
Leonard Richardson (leonardr) wrote :

I've filed a bug against httplib2 and included a patch.

http://code.google.com/p/httplib2/issues/detail?id=94

The easiest way to fix this is probably to tell Launchpad to remove "Cookie" from the Accept header when serving web service requests. That's easier than fixing httplib2 in the short run, and removing 'Cookie' should be enough to get this to work.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 521927] Re: launchpadlib doesn't cache wadl; reads 1MB at startup

On 2 April 2010 06:35, Leonard Richardson
<email address hidden> wrote:
> I've filed a bug against httplib2 and included a patch.
>
> http://code.google.com/p/httplib2/issues/detail?id=94
>
> The easiest way to fix this is probably to tell Launchpad to remove
> "Cookie" from the Accept header when serving web service requests.
> That's easier than fixing httplib2 in the short run, and removing
> 'Cookie' should be enough to get this to work.

To me that seems more correct anyhow. The wadl doesn't actually vary
based on cookies or authorization, does it? Does it even vary by
Accept?

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Leonard Richardson (leonardr) wrote : Re: launchpadlib doesn't cache wadl; reads 1MB at startup

Yes, removing "Cookie" is more correct. The service root doesn't vary by cookies (not used at all) or authorization, but it does vary by "Accept". The service root has both a JSON and a WADL representation.

Changed in launchpadlib:
status: Confirmed → Fix Committed
affects: launchpadlib → launchpad-foundations
summary: - launchpadlib doesn't cache wadl; reads 1MB at startup
+ Overzealous Vary header prevents clients from caching service root
+ representation
tags: added: api
Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 521927] Re: launchpadlib doesn't cache wadl; reads 1MB at startup

nice one!

Curtis Hovey (sinzui)
Changed in launchpad-foundations:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.