Merge lp:~leonardr/launchpad/test-representation-cache into lp:launchpad
| Status: | Merged |
|---|---|
| Merged at revision: | 11011 |
| Proposed branch: | lp:~leonardr/launchpad/test-representation-cache |
| Merge into: | lp:launchpad |
| Diff against target: | 323 lines (+220/-10), 8 files modified |

Files modified:
lib/canonical/database/sqlbase.py (+7/-0)
lib/canonical/launchpad/pagetests/webservice/cache.txt (+55/-0)
lib/canonical/launchpad/zcml/webservice.zcml (+5/-0)
lib/lp/services/memcache/client.py (+3/-1)
lib/lp/services/memcache/doc/restful-cache.txt (+102/-0)
lib/lp/services/memcache/restful.py (+39/-0)
lib/lp/services/memcache/tests/test_doc.py (+8/-8)
versions.cfg (+1/-1)
| To merge this branch: | bzr merge lp:~leonardr/launchpad/test-representation-cache |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Stuart Bishop | | 2010-06-01 | Approve on 2010-06-02 |
| Brad Crittenden (community) | code | 2010-06-01 | Approve on 2010-06-01 |
Review via email: <email address hidden>
Description of the Change
This branch integrates version 0.9.27 of lazr.restful to create a memcached-based cache for JSON representations of entries. In tests with a populated cache, entries were served roughly four times as fast.
lazr.restful takes care of putting representations in the cache and retrieving them at appropriate times. This code takes care of invalidating the cache when a Storm object changes, and turning a Storm object into a unique cache key.
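For context, turning a Storm object into a key looks roughly like this sketch (not the branch's exact code, which lives in lib/lp/services/memcache/restful.py; get_obj_info is Storm's metadata accessor):

    # Illustrative: build a unique cache key from a Storm object's table
    # name and primary-key values, e.g. "Person(29,)".
    from storm.info import get_obj_info

    def key_for_object(obj):
        primary_key = tuple(
            var.get() for var in get_obj_info(obj).primary_vars)
        return '%s%r' % (obj.__storm_table__, primary_key)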
I plan to make one more change to this branch: make it easy for the LOSAs to turn the cache off if it turns out to cause problems in production. I'm not really sure how to do this, so I'm presenting the branch as-is while I work on that problem.
Stuart Bishop (stub) wrote:
On Tue, Jun 1, 2010 at 11:12 PM, Leonard Richardson
<email address hidden> wrote:
> === modified file 'lib/canonical/database/sqlbase.py'
> + def __storm_flushed__(self):
> + """Invalidate the web service cache."""
> + cache = getUtility(IRepresentationCache)
> + cache.delete(self)
This looks like a decent way of hooking in. Unfortunately, SQLBase is
legacy - used by objects providing the SQLObject compatibility layer.
Some of our newer classes use the Storm base class directly. If there
is no way to register an on-flush hook with Storm, you will need to
create a Launchpad-specific subclass of storm.Storm that provides this
hook, and update classes using the Storm base class directly to use
LPStorm.
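Storm does provide per-object flush hooks, so the suggested base class could be as small as this sketch (assuming the same IRepresentationCache utility used in the diff above):

    # Hypothetical LPStorm base class: the same invalidation hook as
    # SQLBase, for model classes that subclass Storm directly.
    from storm.locals import Storm
    from zope.component import getUtility

    from lazr.restful.interfaces import IRepresentationCache

    class LPStorm(Storm):
        """Storm base class that invalidates cached representations."""

        def __storm_flushed__(self):
            # Called by Storm after changes to this object are flushed.
            getUtility(IRepresentationCache).delete(self)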
The cache can become stale by:
1) Raw SQL, at psql command line or executed by appservers or scripts.
2) Stored procedures
3) Store.remove()
4) Using Storm for bulk updates, inserts or deletions.
5) Scripts or appservers running with non-identical [memcache] config section.
6) Scripts or appservers unable to contact memcached servers
7) Network glitches
We should probably ignore the last three. Are the first four a problem
though for the webservice? If it doesn't really matter, fine.
Otherwise we might need some ON DELETE and ON UPDATE database triggers
to keep things in sync. Hopefully this is unnecessary, as it could
well kill our performance gain.
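Case (4) is easy to hit: a bulk update through a result set never instantiates the objects, so no per-object flush hook can fire. An illustrative snippet (store, old_owner and new_owner assumed in scope):

    # Bulk update straight to SQL: cached representations of every
    # affected Person silently go stale.
    from lp.registry.model.person import Person

    store.find(Person, Person.teamowner == old_owner).set(
        teamowner=new_owner)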
=== added file 'lib/canonical/launchpad/pagetests/webservice/cache.txt'
+The cache starts out empty.
+
+ >>> print cache.get_
+ None
+
+Retrieving a representation of an object populates the cache.
+
+ >>> ignore = webservice.
+
+ >>> cache.get_
+ '{...}'
Is webservice.get making a real connection to the webservice here? I
am curious if the cache is populated when the object is retrieved,
rather than when the transaction that retrieved the object commits.
=== added file 'lib/lp/services/memcache/doc/restful-cache.txt'
> +An object's cache key is derived from its Storm metadata: its database
> +table name and its primary key.
> + >>> cache_key = cache.key_for(
> + ... person, 'media/type', 'web-service-version')
> + >>> print person.id, cache_key
> + 29 Person(
Do we need the LPCONFIG here too? Or is it ok for edge & production to mix?
Hmm... perhaps IMemcacheClient should grow a real API and
automatically prepend the LPCONFIG - not sure if we have a use case for
edge and production sharing data, but it is convenient to have them
share the same physical Memcached servers.
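Such an API could be a thin wrapper that prepends the instance name to every key. A sketch (the class name is illustrative; memcache_client_factory is the factory in lib/lp/services/memcache/client.py):

    # Hypothetical wrapper: namespace all memcached keys by LPCONFIG so
    # Launchpad instances sharing hardware cannot read each other's data.
    import os

    from lp.services.memcache.client import memcache_client_factory

    class NamespacedMemcacheClient:

        def __init__(self):
            self._client = memcache_client_factory()
            self._prefix = os.environ.get('LPCONFIG', 'development') + ':'

        def get(self, key):
            return self._client.get(self._prefix + key)

        def set(self, key, value, time=0):
            return self._client.set(self._prefix + key, value, time)

        def delete(self, key):
            return self._client.delete(self._prefix + key)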
> +When a Launchpad object is modified, its JSON representations for
> +recognized web service versions are automatically removed from the
> +cache.
> +
> + >>> person.addressline1 = "New address"
> + >>> from canonical.database.sqlbase import syncUpdate
> + >>> syncUpdate(person)
> +
> + >>> print cache.get(person, json_type, "beta", default="missing")
> + missing
> +
> + >>> print cache.get(person, json_type, "1.0", default="missing")
> + missing
Should we document the cases where this doesn't happen, or is this
irrelevant to the webservice?
...
Leonard Richardson (leonardr) wrote:
> 1) Raw SQL, at psql command line or executed by appservers or scripts.
> 2) Stored procedures
> 3) Store.remove()
> 4) Using Storm for bulk updates, inserts or deletions.
> 5) Scripts or appservers running with non-identical [memcache] config section.
> 6) Scripts or appservers unable to contact memcached servers
> 7) Network glitches
>
> We should probably ignore the last three. Are the first four a problem
> though for the webservice? If it doesn't really matter, fine.
> Otherwise we might need some ON DELETE and ON UPDATE database triggers
> to keep things in sync. Hopefully this is unnecessary, as it could
> well kill our performance gain.
Certainly the first four are a problem. So far we have yet to come to
terms with performance improvements that involve serving stale data from
the web service (or allowing users to use the possibly stale data they
already have).
Can we put the invalidation code in the ON DELETE/ON UPDATE trigger,
rather than in the Storm trigger? Would it hurt performance to just move
the invalidation down a layer?
> + >>> ignore = webservice.
> +
> + >>> cache.get_
> + '{...}'
>
> Is webservice.get making a real connection to the webservice here? I
> am curious if the cache is populated when the object is retrieved,
> rather than when the transaction that retrieved the object commits.
An HTTP request is being constructed and through Python machinations it
is invoking the web service code. But nothing's going over the network.
The cache is populated by the web service code during the GET handling.
Cache population shouldn't have anything to do with the database.
I hope this answers your question.
> > + >>> cache_key = cache.key_for(
> > + ... person, 'media/type', 'web-service-version')
> > + >>> print person.id, cache_key
> > + 29 Person(
>
> Do we need the LPCONFIG here too? Or is it ok for edge & production to mix?
>
> Hmm... perhaps IMemcacheClient should grow a real API and
> automatically prepend the LPCONFIG - not sure if we have a use case for
> edge and production sharing data, but it is convenient to have them
> share the same physical Memcached servers.
Aaaah, it's absolutely not OK for edge and production to mix data. The
JSON representations include http: links to one server or the other. I
can fix this in the branch by tacking the instance name on to the cache
key.
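Concretely, the fix could be as small as folding the instance name into key generation. A sketch (a method on the representation cache class; key_for_object and the exact key layout are illustrative, config.instance_name is Launchpad's name for the running instance):

    # Sketch: include the instance name in every cache key so edge and
    # production representations can never collide.
    from canonical.config import config

    def key_for(self, obj, media_type, version):
        return '%s,%s,%s,%s' % (
            self.key_for_object(obj), media_type, version,
            config.instance_name)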
> Should we document the cases where this doesn't happen, or is this
> irrelevant to the webservice?
>
> store.execute(
> Person.
I'd say that's relevant, and another argument for putting the cache
invalidation in a database hook rather than an ORM hook.
>
> > === added file 'lib/lp/services/memcache/restful.py'
>
> > +class MemcachedStormRepresentationCache(BaseRepresentationCache):
> > + """Caches lazr.restful representations of Storm objects in memcached."""
> > +
> > + def __init__(self):
> > + self.client = memcache_client_factory()
>
> You should be using getUtility(IMemcacheClient) here instead.
Stuart Bishop (stub) wrote:
On Wed, Jun 2, 2010 at 7:28 PM, Leonard Richardson
<email address hidden> wrote:
>> 1) Raw SQL, at psql command line or executed by appservers or scripts.
>> 2) Stored procedures
>> 3) Store.remove()
>> 4) Using Storm for bulk updates, inserts or deletions.
>> 5) Scripts or appservers running with non-identical [memcache] config section.
>> 6) Scripts or appservers unable to contact memcached servers
>> 7) Network glitches
>>
>> We should probably ignore the last three. Are the first four a problem
>> though for the webservice? If it doesn't really matter, fine.
>> Otherwise we might need some ON DELETE and ON UPDATE database triggers
>> to keep things in sync. Hopefully this is unnecessary, as it could
>> well kill our performance gain.
>
> Certainly the first four are a problem. So far we have yet to come to
> terms with performance improvements that involve serving stale data from
> the web service (or allowing users to use the possibly stale data they
> already have).
>
> Can we put the invalidation code in the ON DELETE/ON UPDATE trigger,
> rather than in the Storm trigger? Would it hurt performance to just move
> the invalidation down a layer?
We can put the invalidation code in a trigger.
It will certainly hurt performance. How much, I'm not sure. It also
depends on how much effort we put into minimizing it. The stored
procedure will need to be invoked for every row updated or deleted.
The simplest approach would be a Python stored procedure for the
trigger that invalidates the cache in-band. This would add huge
overhead, both from Python itself and from waiting for network round
trips.
A better approach would be for the trigger to notify another process
of the keys that need invalidating, and have that process do it
asynchronously. This would be less overhead than our existing Slony-I
replication triggers. There would be some lag.
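For illustration, a minimal version of that asynchronous invalidator might look like this; the channel name, queue table, and candidate_suffixes helper are all assumptions, not existing Launchpad code:

    # Hypothetical daemon: an ON UPDATE/DELETE trigger inserts the
    # affected cache-key prefix into a queue table and issues NOTIFY;
    # this loop drains the queue and deletes the memcached entries.
    import select

    import memcache
    import psycopg2
    import psycopg2.extensions

    def candidate_suffixes():
        # One suffix per media type / web service version combination.
        return ['application/json,beta', 'application/json,1.0']

    conn = psycopg2.connect('dbname=launchpad_prod')
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    client = memcache.Client(['127.0.0.1:11211'])

    cur = conn.cursor()
    cur.execute('LISTEN representation_cache')

    while True:
        # Sleep until the trigger signals new work (5 second timeout).
        select.select([conn], [], [], 5)
        conn.poll()
        del conn.notifies[:]
        cur.execute(
            'DELETE FROM representation_cache_queue RETURNING key_prefix')
        for (key_prefix,) in cur.fetchall():
            for suffix in candidate_suffixes():
                client.delete('%s,%s' % (key_prefix, suffix))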
An alternative, which I'd need to investigate, would be to piggyback
on the existing Slony-I replication information. Whenever a row is
updated or deleted, we generate events containing this information so
the changes can be applied on the subscriber database nodes. There
would be some lag here too, but no measurable database impact.
>> + >>> ignore = webservice.
>> +
>> + >>> cache.get_
>> + '{...}'
>>
>> Is webservice.get making a real connection to the webservice here? I
>> am curious if the cache is populated when the object is retrieved,
>> rather than when the transaction that retrieved the object commits.
>
> An HTTP request is being constructed and through Python machinations it
> is invoking the web service code. But nothing's going over the network.
>
> The cache is populated by the web service code during the GET handling.
> Cache population shouldn't have anything to do with the database.
>
> I hope this answers your question.
I'm wondering if it is possible that the cache gets populated, but the
webservice request fails, causing an invalid value to get stored. If
we don't populate the cache on updates then it should be fine.
>> > === added file 'lib/lp/services/memcache/restful.py'
>>
>> > +class MemcachedStormRepresentationCache(BaseRepresentationCache):
Leonard Richardson (leonardr) wrote:
> A better approach would be for the trigger to notify another process
> of the keys that need invalidating, and have that process do it
> asynchronously. This would be less overhead than our existing Slony-I
> replication triggers. There would be some lag.
>
> An alternative, which I'd need to investigate, would be to piggy back
> on the existing Slony-I replication information. Whenever a row is
> updated or deleted, we generate events containing this information so
> the changes can be applied on the subscriber database nodes. There
> would be some lag here too, but no measurable database impact.
I'm +1 on either of these ideas. The only downside is that the invalidation algorithm will become more complex over time, as I make lazr.restful cache more things and as we create more versions of the Launchpad web service.
Here's what I mean. Right now when User(29,) is invalidated we need to remove the following three cache keys:
User(29,
User(29,
User(29,
That's one key for every web service version. We need access to a list of the currently active versions, either managed separately or obtained from getUtility(IWebServiceConfiguration).
You also brought up the fact that all our Launchpad instances share memcached hardware. So we need to change Launchpad's key generation to include the instance name. And we need to change the invalidation code to have a list of all instance names, and to invalidate these nine keys:
User(29,
User(29,
User(29,
User(29,
User(29,
User(29,
User(29,
User(29,
User(29,
In the future we may change lazr.restful to cache the WADL and/or HTML representations as well as the JSON representation. In that case there could be up to 27 cache keys for User(29,).
This isn't unmanageable, but it would be much simpler if we could tell memcached to delete "User(29,).*".
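Memcached has no wildcard delete, but a standard workaround is a per-object generation counter: every derived key embeds the current generation, and invalidating just bumps the counter so all old keys become unreachable and age out of the LRU. A sketch (not part of this branch; client is a python-memcached Client):

    # Hypothetical generation-counter scheme: one incr replaces N deletes.
    def versioned_key(client, obj_key, suffix):
        # e.g. obj_key='User(29,)', suffix='application/json,1.0,edge'
        generation = client.get('gen:' + obj_key) or '0'
        return '%s,gen%s,%s' % (obj_key, generation, suffix)

    def invalidate(client, obj_key):
        if client.incr('gen:' + obj_key) is None:
            # Counter not cached yet; seed it.
            client.set('gen:' + obj_key, '1')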
> I'm wondering if it is possible that the cache gets populated, but the
> webservice request fails, causing an invalid value to get stored. If
> we don't populate the cache on updates then it should be fine.
I'm pretty sure the cache is populated right before the representation is returned to the client. If there's an error it will happen during the representation creation. If for some reason an error happens afterwards it doesn't change the fact that the representation is accurate. The cache is certainly not populated on updates. So I think we're OK.

Brad Crittenden (bac) wrote:
Hi Leonard,
Thanks for this branch -- it looks very promising. A 4x increase would be great.
It looks like you haven't pushed your changes up to the download cache yet, so lazr.restful 0.9.27 isn't available. Be sure to do that.
You define but don't use 'json_type' in webservice/cache.txt.
The __all__ in memcache/client.py should be on multiple lines.
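For example, in the usual Launchpad layout, one name per line (memcache_client_factory being the factory the module already exports):

    __all__ = [
        'memcache_client_factory',
        ]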
It looks like there are some valid complaints from 'make lint'. Please run it and clean up the ones that are real.