Merge lp:~leonardr/lazr.restful/representation-cache into lp:lazr.restful
| Status: | Merged |
|---|---|
| Merged at revision: | 130 |
| Proposed branch: | lp:~leonardr/lazr.restful/representation-cache |
| Merge into: | lp:lazr.restful |
| Diff against target: |
861 lines (+581/-40) 10 files modified
src/lazr/restful/NEWS.txt (+10/-0) src/lazr/restful/_operation.py (+11/-12) src/lazr/restful/_resource.py (+88/-16) src/lazr/restful/declarations.py (+11/-4) src/lazr/restful/docs/webservice-declarations.txt (+6/-1) src/lazr/restful/example/base/subscribers.py (+1/-0) src/lazr/restful/example/base/tests/representation-cache.txt (+277/-0) src/lazr/restful/interfaces/_rest.py (+52/-0) src/lazr/restful/simple.py (+124/-6) src/lazr/restful/version.txt (+1/-1) |
| To merge this branch: | bzr merge lp:~leonardr/lazr.restful/representation-cache |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Eleanor Berger (community) | code | 2010-05-24 | Approve on 2010-05-24 |
|
Review via email:
|
|||
Description of the Change
This branch makes it possible to store preconstructed string representations of entries in a cache. If an entry is present in the cache, the preconstructed representation is used (and possibly redacted) rather than generating a new representation.
On its own, this isn't a huge performance improvement. The huge performance improvement comes from collections. If you request a page of collection, and 50% of the entries on that page are present in the cache, 50% of the representations will come from the cache and incorporated into a collection representation. The other half of the representations will be generated placed in the cache, so that if you request that page again, _all_ the entry representations will come from the cache.
In my Launchpad performance tests based on memcached and this lazr.restful branch (https:/
To make this performance win worth the complexity, I had to make the new 'redacted_fields' attribute very very fast. Typically we check whether the user has permission on an attribute by trying to access the attribute and catching an Unauthorized exception. But if the attribute in question is a calculated attribute, accessing it might trigger a database request or something equally slow. We need to use the Zope permission checker directly.
The problem is that the web service doesn't know which field name to pass into the Zope permission checker. If a field's real name is "fooBar" but it's published as "foo_bar" on the web service, the Zope permission checker expects "fooBar" but all the web service has access to is 'foo_bar'.
To get around this problem, I changed the export() declaration to set 'original_name' every time it sets 'as'. In the example above, 'as' will be 'foo_bar', and everything in the web service will call the field 'foo_bar', except for 'redacted_fields', which will look in 'original_name' to find that it needs to pass 'fooBar' into the Zope permission checker.
| Leonard Richardson (leonardr) wrote : | # |
| Leonard Richardson (leonardr) wrote : | # |
Another thing I forgot to mention: although the cache interface has a hook for removing objects from the cache, lazr.restful itself will never call that hook. It's the responsibility of the application to call that hook when the cache needs invalidation.
| Gary Poster (gary) wrote : | # |
Summary: After we verify that this general approach gives real-world benefit, I suspect that we should always populate the cache, not only when there are no redacted fields for the current user. We could do this by stripping the security proxy for getting the initial data and populating the cache, and then doing the same logic that you do now for redacting existing JSON caches to get the actual desired end result.
IRC conversation:
[10:43am] gary_poster: leonardr: in your branch, when there is no cache, did you contemplate always generating it, even if there are redacted fields? Example: if no cache, generate dict of the entire non-redacted version; else if cache and redacted fields, parse out cache to dict; else return cache. (Now we have a non-redacted dict, if we are still here.)
[10:43am] gary_poster: Now, redact dict, turn into JSON, and return. There are variations of that, some of which might be better, but I imagine you get the drift.
[11:01am] leonardr: gary, i'm not sure what the benefit would be
[11:01am] leonardr: also, if there are redacted fields we _cannot_ calculate an unredacted cache due to the security policy
[11:05am] gary_poster: leonardr: the goal would be to create a source for further cache hits. This could be particularly important for objects that frequently have one or more fields redacted. In that case, the cache would rarely or, in the worst case, never be filled (and therefore never or rarely used). Since DB access is the main expense, you discovered, I strongly suspect that loading JSON and redacting will be significantly cheaper than simply creating the JSON.
[11:05am] gary_poster: Also, I'm skeptical of "cannot"; isn't it just a matter of doing the usual work with an unproxied object?
[11:07am] leonardr: yes, we would have to strip the proxy
[11:10am] leonardr: ok, i see what you're saying. we would cache it all the time, whether we were sending a redacted version or not
[11:10am] gary_poster: right
[11:11am] leonardr: i could certainly do that in a future branch. do you know of launchpad objects that typically have redacted fields?
[11:13am] gary_poster: bac would probably know, but he's out. My first guess: anything private, or (perhaps more interesting, perhaps not) anything referring to something provate.
[11:13am] gary_poster: private
[11:14am] leonardr: if an object's url contains private information, a link to that url would be redacted
[11:14am] gary_poster: so, that's an example?
[11:14am] leonardr: but i don't know of any specific launchpad object that does that. it's something to look for
[11:15am] gary_poster: bugs that are marked as security issues
[11:15am] gary_poster: private projects
[11:15am] gary_poster: private teams
[11:15am] gary_poster: private bugs
[11:15am] leonardr: so anything that links to those objects might end up redacted
[11:16am] gary_poster: (and there's more coming, if I understand correctly)
[11:16am] gary_poster: right
[11:17am] leonardr: ok, let's get the basic cache working, make sure it improves performance in real situations, and then i'll work on that
[11:17am] gary_poster: cool, makes sense

Although we'll be using a memcached-based cache when we integrate this code into Launchpad, there's no memcached-code here. For testing purposes I use a cache that's backed by a Python dict.