lazr.restful

Merge lp:~leonardr/lazr.restful/representation-cache into lp:lazr.restful

representation-cache
Merge into trunk

Proposed by Leonard Richardson on 2010-05-24

Status:	Merged
Merged at revision:	130
Proposed branch:	lp:~leonardr/lazr.restful/representation-cache
Merge into:	lp:lazr.restful
Diff against target:	861 lines (+581/-40), 10 files modified
  src/lazr/restful/NEWS.txt (+10/-0)
  src/lazr/restful/_operation.py (+11/-12)
  src/lazr/restful/_resource.py (+88/-16)
  src/lazr/restful/declarations.py (+11/-4)
  src/lazr/restful/docs/webservice-declarations.txt (+6/-1)
  src/lazr/restful/example/base/subscribers.py (+1/-0)
  src/lazr/restful/example/base/tests/representation-cache.txt (+277/-0)
  src/lazr/restful/interfaces/_rest.py (+52/-0)
  src/lazr/restful/simple.py (+124/-6)
  src/lazr/restful/version.txt (+1/-1)
To merge this branch:	bzr merge lp:~leonardr/lazr.restful/representation-cache

Reviewer	Review Type	Date Requested	Status
Eleanor Berger (community)	code	2010-05-24	Approve on 2010-05-24
Review via email: mp+25895@code.launchpad.net

Description of the change

This branch makes it possible to store preconstructed string representations of entries in a cache. If an entry is present in the cache, the preconstructed representation is used (and possibly redacted) rather than generating a new representation.
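
As a rough illustration of the lookup described above, here is a minimal sketch in Python 3 using the standard json module (the branch itself uses simplejson). The function name, the dict standing in for the cache, and the `build` callable are assumptions made for the example; they are not the lazr.restful API. The redaction marker is the value shown in this branch's doctest.

```python
import json

# Redaction marker as it appears in the branch's doctest output.
REDACTED = "tag:launchpad.net:2008:redacted"


def entry_representation(cache, key, build, redacted_fields):
    """Return a JSON string for one entry, consulting `cache` first.

    cache: a plain dict, a hypothetical stand-in for the real cache.
    build: zero-argument callable that generates the JSON string from
        scratch (in the real code this output is already redacted).
    redacted_fields: published names the current user may not see.
    """
    representation = cache.get(key)
    if representation is None:
        representation = build()
        # Only complete (unredacted) representations are stored.
        if not redacted_fields:
            cache[key] = representation
    elif redacted_fields:
        # Reuse the cached string, but hide what this user may not see.
        data = json.loads(representation)
        for name in redacted_fields:
            data[name] = REDACTED
        representation = json.dumps(data)
    return representation


# Demo: the second request is served from the cache and then redacted.
cache = {}
build_calls = []


def build():
    build_calls.append(1)
    return json.dumps({"name": "Everyday Greens", "confirmed": True})


first = entry_representation(cache, "greens", build, [])
second = entry_representation(cache, "greens", build, ["confirmed"])
```

In this sketch the expensive `build` step runs once; the second caller pays only for a parse, a redaction, and a reserialization.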

On its own, this isn't a huge performance improvement. The huge performance improvement comes from collections. If you request a page of a collection, and 50% of the entries on that page are present in the cache, 50% of the representations will come from the cache and be incorporated into the collection representation. The other half of the representations will be generated and placed in the cache, so that if you request that page again, _all_ the entry representations will come from the cache.
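
The collection case can be sketched as follows, again in Python 3 with the standard json module. The helper names and the dict-based cache are hypothetical; the string splicing mirrors what the diff does in batch(), except that this sketch appends the closing brace itself rather than leaving it to the caller.

```python
import json


def page_entry_strings(cache, entry_ids, build):
    """Collect one JSON string per entry, generating only the cache misses."""
    strings = []
    for entry_id in entry_ids:
        representation = cache.get(entry_id)
        if representation is None:
            representation = build(entry_id)
            cache[entry_id] = representation
        strings.append(representation)
    return strings


def collection_representation(metadata, entry_strings):
    """Splice pre-serialized entry strings into a collection document.

    Assumes `metadata` is a non-empty dict, as it is in the diff
    (it always carries 'total_size' and 'start').
    """
    head = json.dumps(metadata)
    # Drop the closing brace so the entries can be appended as raw JSON text.
    return head[:-1] + ', "entries": [' + ", ".join(entry_strings) + "]}"


# Demo: entry 1 is already cached, entry 2 has to be generated.
cache = {1: json.dumps({"id": 1, "source": "cache"})}


def build(entry_id):
    return json.dumps({"id": entry_id, "source": "generated"})


strings = page_entry_strings(cache, [1, 2], build)
page = collection_representation({"total_size": 2, "start": 0}, strings)
sources = [entry["source"] for entry in json.loads(page)["entries"]]
```

The point of splicing strings is that cached entries are never parsed or re-encoded when they can be used as-is.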

In my Launchpad performance tests based on memcached and this lazr.restful branch (https://dev.launchpad.net/Foundations/Webservice/Performance#Store%20representations%20in%20memcached), I found that the operation of retrieving a fully cached collection was about five times faster than if there was no cache.

To make this performance win worth the complexity, I had to make the new 'redacted_fields' attribute very very fast. Typically we check whether the user has permission on an attribute by trying to access the attribute and catching an Unauthorized exception. But if the attribute in question is a calculated attribute, accessing it might trigger a database request or something equally slow. We need to use the Zope permission checker directly.

The problem is that the web service doesn't know which field name to pass into the Zope permission checker. If a field's real name is "fooBar" but it's published as "foo_bar" on the web service, the Zope permission checker expects "fooBar" but all the web service has access to is 'foo_bar'.

To get around this problem, I changed the export() declaration to set 'original_name' every time it sets 'as'. In the example above, 'as' will be 'foo_bar', and everything in the web service will call the field 'foo_bar', except for 'redacted_fields', which will look in 'original_name' to find that it needs to pass 'fooBar' into the Zope permission checker.
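
A small sketch of the idea, in Python 3. The `Unauthorized` exception and `NameChecker` class here are stand-ins written for the example, not the Zope classes, and the tag dicts are a simplified version of what export() stores. Unlike the branch, which gets the published name by unmarshalling the field, this sketch just reads it from 'as'.

```python
class Unauthorized(Exception):
    """Stand-in for the exception a permission checker raises."""


class NameChecker:
    """Hypothetical checker that knows which attribute names are forbidden."""

    def __init__(self, forbidden):
        self.forbidden = set(forbidden)

    def check(self, obj, name):
        if name in self.forbidden:
            raise Unauthorized(name)


def redacted_fields(obj, checker, exported_tags):
    """Return the published names of fields the user may not read.

    exported_tags: list of dicts, each with 'as' (published name) and
    'original_name' (real attribute name).
    """
    failures = []
    for tags in exported_tags:
        try:
            # Check by real attribute name; the attribute is never read.
            checker.check(obj, tags["original_name"])
        except Unauthorized:
            failures.append(tags["as"])
    return failures


class Expensive:
    """Object whose calculated attribute records any access."""

    accessed = False

    @property
    def fooBar(self):
        Expensive.accessed = True
        return "slow value"


exported = [
    {"as": "foo_bar", "original_name": "fooBar"},
    {"as": "name", "original_name": "name"},
]
result = redacted_fields(Expensive(), NameChecker(["fooBar"]), exported)
```

The calculated attribute is never touched, which is what keeps the check cheap.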

Leonard Richardson (leonardr) wrote on 2010-05-24:

Although we'll be using a memcached-based cache when we integrate this code into Launchpad, there's no memcached code here. For testing purposes I use a cache that's backed by a Python dict.

Leonard Richardson (leonardr) wrote on 2010-05-24:

Another thing I forgot to mention: although the cache interface has a hook for removing objects from the cache, lazr.restful itself will never call that hook. It's the responsibility of the application to call that hook when the cache needs invalidation.
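
To make the division of responsibility concrete, here is a minimal Python 3 sketch. `DictCache` is a simplified stand-in for the dict-backed test cache (it keys on a tuple rather than on a URL string), and `on_object_changed` is a hypothetical application-side handler of the kind an ORM hook might call.

```python
class DictCache:
    """Simplified dict-backed cache with get/set/delete operations."""

    def __init__(self):
        self.store = {}

    def set(self, obj_id, media_type, version, representation):
        self.store[(obj_id, media_type, version)] = representation

    def get(self, obj_id, media_type, version, default=None):
        return self.store.get((obj_id, media_type, version), default)

    def delete(self, obj_id):
        # Remove every representation of the object, whatever the
        # media type or web service version.
        for key in [k for k in self.store if k[0] == obj_id]:
            del self.store[key]


def on_object_changed(cache, obj_id):
    """Hypothetical handler the application calls when its data changes."""
    cache.delete(obj_id)


cache = DictCache()
cache.set("recipe-1", "application/json", "devel", '{"v": "devel"}')
cache.set("recipe-1", "application/json", "1.0", '{"v": "1.0"}')
cache.set("recipe-2", "application/json", "devel", '{"v": "devel"}')

# The web service never does this on its own; the application must.
on_object_changed(cache, "recipe-1")
remaining = sorted(cache.store)
```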

Eleanor Berger (intellectronica) on 2010-05-24:

review: Approve (code)

Gary Poster (gary) wrote on 2010-05-24:

Summary: After we verify that this general approach gives real-world benefit, I suspect that we should always populate the cache, not only when there are no redacted fields for the current user. We could do this by stripping the security proxy for getting the initial data and populating the cache, and then doing the same logic that you do now for redacting existing JSON caches to get the actual desired end result.

IRC conversation:

[10:43am] gary_poster: leonardr: in your branch, when there is no cache, did you contemplate always generating it, even if there are redacted fields? Example: if no cache, generate dict of the entire non-redacted version; else if cache and redacted fields, parse out cache to dict; else return cache. (Now we have a non-redacted dict, if we are still here.)
[10:43am] gary_poster: Now, redact dict, turn into JSON, and return. There are variations of that, some of which might be better, but I imagine you get the drift.
[11:01am] leonardr: gary, i'm not sure what the benefit would be
[11:01am] leonardr: also, if there are redacted fields we _cannot_ calculate an unredacted cache due to the security policy
[11:05am] gary_poster: leonardr: the goal would be to create a source for further cache hits. This could be particularly important for objects that frequently have one or more fields redacted. In that case, the cache would rarely or, in the worst case, never be filled (and therefore never or rarely used). Since DB access is the main expense, you discovered, I strongly suspect that loading JSON and redacting will be significantly cheaper than simply creating the JSON.
[11:05am] gary_poster: Also, I'm skeptical of "cannot"; isn't it just a matter of doing the usual work with an unproxied object?
[11:07am] leonardr: yes, we would have to strip the proxy
[11:10am] leonardr: ok, i see what you're saying. we would cache it all the time, whether we were sending a redacted version or not
[11:10am] gary_poster: right
[11:11am] leonardr: i could certainly do that in a future branch. do you know of launchpad objects that typically have redacted fields?
[11:13am] gary_poster: bac would probably know, but he's out. My first guess: anything private, or (perhaps more interesting, perhaps not) anything referring to something provate.
[11:13am] gary_poster: private
[11:14am] leonardr: if an object's url contains private information, a link to that url would be redacted
[11:14am] gary_poster: so, that's an example?
[11:14am] leonardr: but i don't know of any specific launchpad object that does that. it's something to look for
[11:15am] gary_poster: bugs that are marked as security issues
[11:15am] gary_poster: private projects
[11:15am] gary_poster: private teams
[11:15am] gary_poster: private bugs
[11:15am] leonardr: so anything that links to those objects might end up redacted
[11:16am] gary_poster: (and there's more coming, if I understand correctly)
[11:16am] gary_poster: right
[11:17am] leonardr: ok, let's get the basic cache working, make sure it improves performance in real situations, and then i'll work on that
[11:17am] gary_poster: cool, makes sense
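
The variant discussed in the conversation above could look roughly like this. It is a Python 3 sketch of the proposal, not code from this branch: the function name, the dict-based cache, and `build_unredacted` (which stands for building the data from an unproxied object) are all assumptions made for illustration.

```python
import json

# Redaction marker as it appears in the branch's doctest output.
REDACTED = "tag:launchpad.net:2008:redacted"


def represent(cache, key, build_unredacted, redacted_fields):
    """Always populate the cache with the full data, then redact per user."""
    cached = cache.get(key)
    if cached is None:
        # Build the complete representation, regardless of who is asking.
        cached = json.dumps(build_unredacted())
        cache[key] = cached
    if not redacted_fields:
        return cached
    # Redact a copy for this user; the cached string stays complete.
    data = json.loads(cached)
    for name in redacted_fields:
        data[name] = REDACTED
    return json.dumps(data)


cache = {}
builds = []


def build_unredacted():
    builds.append(1)
    return {"name": "Everyday Greens", "confirmed": True}


# A restricted user asks first; the cache is still filled.
restricted = represent(cache, "greens", build_unredacted, ["confirmed"])
# A later, unrestricted request is a pure cache hit.
full = represent(cache, "greens", build_unredacted, [])
```

The trade-off is that the first step must bypass the security proxy, so the redaction step becomes the only thing standing between a restricted user and the full data.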

Preview Diff

 === modified file 'src/lazr/restful/NEWS.txt'
 --- src/lazr/restful/NEWS.txt	2010-05-17 17:53:57 +0000
 +++ src/lazr/restful/NEWS.txt	2010-05-24 14:15:38 +0000
@@ -2,6 +2,16 @@
  NEWS for lazr.restful
  =====================
++0.9.27 (Development)
++====================
++
++Added the ability to define a representation cache used to store the
++JSON representations of entry resources, rather than building them
++from scratch every time. Although the cache has hooks for
++invalidation, lazr.restful will never invalidate any part of the cache
++on its own. You need to hook lazr.restful's invalidation code into
++your ORM or other data store.
++
  0.9.26 (2010-05-18)
  ===================
 === modified file 'src/lazr/restful/_operation.py'
 --- src/lazr/restful/_operation.py	2010-01-05 19:24:12 +0000
 +++ src/lazr/restful/_operation.py	2010-05-24 14:15:38 +0000
@@ -84,22 +84,21 @@
              # If the result is a web service collection, serve only one
              # batch of the collection.
              collection = getMultiAdapter((result, self.request), ICollection)
--            result = CollectionResource(collection, self.request).batch()
++            result = CollectionResource(collection, self.request).batch() + '}'
          elif self.should_batch(result):
--            result = self.batch(result, self.request)
--
--        # Serialize the result to JSON. Any embedded entries will be
--        # automatically serialized.
--        try:
--            json_representation = simplejson.dumps(
--                result, cls=ResourceJSONEncoder)
--        except TypeError, e:
--            raise TypeError("Could not serialize object %s to JSON." %
--                            result)
++            result = self.batch(result, self.request) + '}'
++        else:
++            # Serialize the result to JSON. Any embedded entries will be
++            # automatically serialized.
++            try:
++                result = simplejson.dumps(result, cls=ResourceJSONEncoder)
++            except TypeError, e:
++                raise TypeError("Could not serialize object %s to JSON." %
++                                result)
          self.request.response.setStatus(200)
          self.request.response.setHeader('Content-Type', self.JSON_TYPE)
--        return json_representation
++        return result
      def should_batch(self, result):
          """Whether the given response data should be batched."""
 === modified file 'src/lazr/restful/_resource.py'
 --- src/lazr/restful/_resource.py	2010-05-17 17:52:57 +0000
 +++ src/lazr/restful/_resource.py	2010-05-24 14:15:38 +0000
@@ -70,7 +70,7 @@
  from zope.schema.interfaces import (
      ConstraintNotSatisfied, IBytes, IField, IObject, RequiredMissing)
  from zope.security.interfaces import Unauthorized
--from zope.security.proxy import removeSecurityProxy
++from zope.security.proxy import getChecker, removeSecurityProxy
  from zope.security.management import checkPermission
  from zope.traversing.browser import absoluteURL, AbsoluteURL
  from zope.traversing.browser.interfaces import IAbsoluteURL
@@ -84,7 +84,7 @@
  from lazr.restful.interfaces import (
      ICollection, ICollectionField, ICollectionResource, IEntry, IEntryField,
      IEntryFieldResource, IEntryResource, IFieldHTMLRenderer, IFieldMarshaller,
--    IHTTPResource, IJSONPublishable, IReferenceChoice,
++    IHTTPResource, IJSONPublishable, IReferenceChoice, IRepresentationCache,
      IResourceDELETEOperation, IResourceGETOperation, IResourcePOSTOperation,
      IScopedCollection, IServiceRootResource, ITopLevelEntryLink,
      IUnmarshallingDoesntNeedValue, IWebServiceClientRequest,
@@ -97,7 +97,8 @@
  WADL_SCHEMA_FILE = os.path.join(os.path.dirname(__file__),
                                  'wadl20061109.xsd')
--# Levels of detail to use when unmarshalling the data.
++# Constants and levels of detail to use when unmarshalling the data.
++MISSING = object()
  NORMAL_DETAIL = object()
  CLOSEUP_DETAIL = object()
@@ -599,7 +600,7 @@
      def batch(self, entries, request):
          """Prepare a batch from a (possibly huge) list of entries.
--        :return: A hash:
++        :return: A JSON string representing a hash:
          'entries' contains a list of EntryResource objects for the
            entries that actually made it into this batch
          'total_size' contains the total size of the list.
@@ -608,6 +609,11 @@
          'prev_url', if present, contains a URL to get the previous batch
           in the list.
          'start' contains the starting index of this batch
++
++        Note that the JSON string will be missing its final curly
++        brace. This is in case the caller wants to add some additional
++        keys to the JSON hash. It's the caller's responsibility to add
++        a '}' to the end of the string returned from this method.
          """
          if not hasattr(entries, '__len__'):
              entries = IFiniteSequence(entries)
@@ -617,8 +623,7 @@
          resources = [EntryResource(entry, request)
                       for entry in navigator.batch
                       if checkPermission(view_permission, entry)]
--        batch = { 'entries' : resources,
--                  'total_size' : navigator.batch.listlength,
++        batch = { 'total_size' : navigator.batch.listlength,
                    'start' : navigator.batch.start }
          if navigator.batch.start < 0:
              batch['start'] = None
@@ -628,7 +633,17 @@
          prev_url = navigator.prevBatchURL()
          if prev_url != "":
              batch['prev_collection_link'] = prev_url
--        return batch
++        json_string = simplejson.dumps(batch, cls=ResourceJSONEncoder)
++
++        # String together a bunch of entry representations, possibly
++        # obtained from a representation cache.
++        entry_strings = [
++            resource._representation(HTTPResource.JSON_TYPE)
++            for resource in resources]
++        json_string = (json_string[:-1] + ', "entries": ['
++                       + (", ".join(entry_strings) + ']'))
++        # The caller is responsible for tacking on the final curly brace.
++        return json_string
  class CustomOperationResourceMixin:
@@ -708,8 +723,6 @@
              return "DELETE not supported."
          return operation()
--        return operation()
--
  class FieldUnmarshallerMixin:
@@ -733,12 +746,11 @@
          :return: a 2-tuple (representation_name, representation_value).
          """
--        missing = object()
--        cached_value = missing
++        cached_value = MISSING
          if detail is NORMAL_DETAIL:
              cached_value = self._unmarshalled_field_cache.get(
--                field_name, missing)
--        if cached_value is not missing:
++                field_name, MISSING)
++        if cached_value is not MISSING:
              return cached_value
          field = field.bind(self.context)
@@ -1442,6 +1454,29 @@
                      self.request), self.request),
              adapter.singular_type)
++    @property
++    def redacted_fields(self):
++        """Names the fields the current user doesn't have permission to see."""
++        failures = []
++        checker = getChecker(self.context)
++        for name, field in getFieldsInOrder(self.entry.schema):
++            try:
++                # Can we view the field's value? We check the
++                # permission directly using the Zope permission
++                # checker, because doing it indirectly by fetching the
++                # value may have very slow side effects such as
++                # database hits.
++                tagged_values = field.getTaggedValue('lazr.restful.exported')
++                original_name = tagged_values['original_name']
++                checker.check(self.context, original_name)
++            except Unauthorized:
++                # This is an expensive operation that will make this
++                # request more expensive still, but it happens
++                # relatively rarely.
++                repr_name, repr_value = self._unmarshallField(name, field)
++                failures.append(repr_name)
++        return failures
++
      def isModifiableField(self, field, is_external_client):
          """Returns true if this field's value can be changed.
@@ -1463,10 +1498,45 @@
      def _representation(self, media_type):
          """Return a representation of this entry, of the given media type."""
++
          if media_type in [self.WADL_TYPE, self.DEPRECATED_WADL_TYPE]:
              return self.toWADL().encode("utf-8")
          elif media_type == self.JSON_TYPE:
--            return simplejson.dumps(self, cls=ResourceJSONEncoder)
++            cache = None
++            try:
++                cache = getUtility(IRepresentationCache)
++                representation = cache.get(
++                    self.context, self.JSON_TYPE, self.request.version)
++            except ComponentLookupError:
++                # There's no representation cache.
++                representation = None
++
++            redacted_fields = self.redacted_fields
++            if representation is None:
++                # Either there is no cache, or the representation
++                # wasn't in the cache.
++                representation = simplejson.dumps(self, cls=ResourceJSONEncoder)
++                # If there's a cache, and this representation doesn't
++                # contain any redactions, store it in the cache.
++                if cache is not None and len(redacted_fields) == 0:
++                    cache.set(self.context, self.JSON_TYPE,
++                              self.request.version, representation)
++            else:
++                # We have a representation, but we might not be able
++                # to use it as-is.
++                if len(redacted_fields) != 0:
++                    # We can't use the representation as is. We need
++                    # to deserialize it, redact certain fields, and
++                    # reserialize it. Hopefully this is faster than
++                    # generating the representation from scratch!
++                    json = simplejson.loads(representation)
++                    for field in redacted_fields:
++                        json[field] = self.REDACTED_VALUE
++                    # There's no need to use the ResourceJSONEncoder,
++                    # because we loaded the cached representation
++                    # using the standard decoder.
++                    representation = simplejson.dumps(json)
++            return representation
          elif media_type == self.XHTML_TYPE:
              return self.toXHTML().encode("utf-8")
          else:
@@ -1516,7 +1586,7 @@
              result = self.batch(entries)
          self.request.response.setHeader('Content-type', self.JSON_TYPE)
--        return simplejson.dumps(result, cls=ResourceJSONEncoder)
++        return result
      def batch(self, entries=None):
          """Return a JSON representation of a batch of entries.
@@ -1526,7 +1596,9 @@
          if entries is None:
              entries = self.collection.find()
          result = super(CollectionResource, self).batch(entries, self.request)
--        result['resource_type_link'] = self.type_url
++        result += (
++            ', "resource_type_link" : ' + simplejson.dumps(self.type_url)
++            + '}')
          return result
      @property
 === modified file 'src/lazr/restful/declarations.py'
 --- src/lazr/restful/declarations.py	2010-02-25 17:07:16 +0000
 +++ src/lazr/restful/declarations.py	2010-05-24 14:15:38 +0000
@@ -151,10 +151,17 @@
              if tag_stack['type'] != FIELD_TYPE:
                  continue
              for version, tags in tag_stack.stack:
--                # Set 'as' for every version in which the field is published
--                # but no 'as' is specified.
--                if tags.get('as') is None and tags.get('exported') != False:
--                    tags['as'] = name
++                # Set 'as' for every version in which the field is
++                # published but no 'as' is specified. Also set
++                # 'original_name' for every version in which the field
++                # is published--this will help with performance
++                # optimizations around permission checks.
++                if tags.get('exported') != False:
++                    tags['original_name'] = name
++                    if tags.get('as') is None:
++                        tags['as'] = name
++
++
          annotate_exported_methods(interface)
          return interface
 === modified file 'src/lazr/restful/docs/webservice-declarations.txt'
 --- src/lazr/restful/docs/webservice-declarations.txt	2010-04-14 14:56:46 +0000
 +++ src/lazr/restful/docs/webservice-declarations.txt	2010-05-24 14:15:38 +0000
@@ -49,7 +49,7 @@
      ...
      ...     inventory_number = TextLine(title=u'The inventory part number.')
--These declarations adds tagged value to the original interface elements.
++These declarations add tagged values to the original interface elements.
  The tags are in the lazr.restful namespace and are dictionaries of
  elements.
@@ -74,12 +74,15 @@
      type: 'entry'
      >>> print_export_tag(IBook['title'])
      as: 'title'
++    original_name: 'title'
      type: 'field'
      >>> print_export_tag(IBook['author'])
      as: 'author'
++    original_name: 'author'
      type: 'field'
      >>> print_export_tag(IBook['base_price'])
      as: 'price'
++    original_name: 'base_price'
      type: 'field'
      >>> print_export_tag(IBook['inventory_number'])
      tag 'lazr.restful.exported' is not present
@@ -751,9 +754,11 @@
      ...     print_export_tag(IUser[name])
      == name ==
      as: 'name'
++    original_name: 'name'
      type: 'field'
      == nickname ==
      as: 'nickname'
++    original_name: 'nickname'
      type: 'field'
      == rename ==
      as: 'rename'
 === modified file 'src/lazr/restful/example/base/subscribers.py'
 --- src/lazr/restful/example/base/subscribers.py	2009-09-01 14:37:41 +0000
 +++ src/lazr/restful/example/base/subscribers.py	2010-05-24 14:15:38 +0000
@@ -5,6 +5,7 @@
  __metaclass__ = type
  __all__ = ['update_cookbook_revision_number']
++from zope.interface import Interface
  import grokcore.component
  from lazr.lifecycle.interfaces import IObjectModifiedEvent
  from lazr.restful.example.base.interfaces import ICookbook
 === added file 'src/lazr/restful/example/base/tests/representation-cache.txt'
 --- src/lazr/restful/example/base/tests/representation-cache.txt	1970-01-01 00:00:00 +0000
 +++ src/lazr/restful/example/base/tests/representation-cache.txt	2010-05-24 14:15:38 +0000
@@ -0,0 +1,277 @@
++**********************************
++The in-memory representation cache
++**********************************
++
++Rather than having lazr.restful calculate a representation of an entry
++every time it's requested, you can register an object as the
++representation cache. String representations of entries are generated
++once and stored in the representation cache.
++
++lazr.restful works fine when there is no representation cache
++installed; in fact, this is the only test that uses one.
++
++    >>> from zope.component import getUtility
++    >>> from lazr.restful.interfaces import IRepresentationCache
++    >>> getUtility(IRepresentationCache)
++    Traceback (most recent call last):
++    ...
++    ComponentLookupError: ...
++
++DictionaryBasedRepresentationCache
++==================================
++
++A representation cache can be any object that implements
++IRepresentationCache, but for test purposes we'll be using a simple
++DictionaryBasedRepresentationCache. This object transforms the
++IRepresentationCache operations into operations on a Python dict-like
++object.
++
++    >>> from lazr.restful.simple import DictionaryBasedRepresentationCache
++    >>> dictionary = {}
++    >>> cache = DictionaryBasedRepresentationCache(dictionary)
++
++It's not a good idea to use a normal Python dict in production,
++because there's no limit on how large the dict can become. In a real
++situation you want something with an LRU implementation. That said,
++let's see how the DictionaryBasedRepresentationCache works.
++
++All IRepresentationCache implementations will cache a representation
++under a key derived from the object whose representation it is, the
++media type of the representation, and a web service version name.
++
++    >>> from lazr.restful.example.base.root import C4 as greens_object
++    >>> json = "application/json"
++    >>> print cache.get(greens_object, json, "devel")
++    None
++    >>> print cache.get(greens_object, json, "devel", "missing")
++    missing
++
++    >>> cache.set(greens_object, json, "devel", "This is the 'devel' value.")
++    >>> print cache.get(greens_object, json, "devel")
++    This is the 'devel' value.
++    >>> sorted(dictionary.keys())
++    ['http://cookbooks.dev/devel/cookbooks/Everyday%20Greens,application/json']
++
++This allows different representations of the same object to be stored
++for different versions.
++
++    >>> cache.set(greens_object, json, "1.0", "This is the '1.0' value.")
++    >>> print cache.get(greens_object, json, "1.0")
++    This is the '1.0' value.
++    >>> sorted(dictionary.keys())
++    ['http://cookbooks.dev/1.0/cookbooks/Everyday%20Greens,application/json',
++     'http://cookbooks.dev/devel/cookbooks/Everyday%20Greens,application/json']
++
++Deleting an object from the cache will remove all its representations.
++
++    >>> cache.delete(greens_object)
++    >>> sorted(dictionary.keys())
++    []
++    >>> print cache.get(greens_object, json, "devel")
++    None
++    >>> print cache.get(greens_object, json, "1.0")
++    None
++
++A representation cache
++======================
++
++Now let's register our DictionaryBasedRepresentationCache as the
++representation cache for this web service, and see how it works within
++lazr.restful.
++
++    >>> from zope.component import getSiteManager
++    >>> sm = getSiteManager()
++    >>> sm.registerUtility(cache, IRepresentationCache)
++
++    >>> from lazr.restful.testing.webservice import WebServiceCaller
++    >>> webservice = WebServiceCaller(domain='cookbooks.dev')
++
++When we retrieve a JSON representation of an entry, that
++representation is added to the cache.
++
++    >>> ignored = webservice.get("/recipes/1")
++    >>> [the_only_key] = dictionary.keys()
++    >>> print the_only_key
++    http://cookbooks.dev/devel/recipes/1,application/json
++
++Note that the cache key incorporates the web service version name
++("devel") and the media type of the representation
++("application/json").
++
++Associated with the key is a string: the JSON representation of the object.
++
++    >>> import simplejson
++    >>> print simplejson.loads(dictionary[the_only_key])['self_link']
++    http://cookbooks.dev/devel/recipes/1
++
++If we get a representation of the same resource from a different web
++service version, that representation is stored separately.
++
++    >>> ignored = webservice.get("/recipes/1", api_version="1.0")
++    >>> for key in sorted(dictionary.keys()):
++    ...     print key
++    http://cookbooks.dev/1.0/recipes/1,application/json
++    http://cookbooks.dev/devel/recipes/1,application/json
++
++    >>> key1 = "http://cookbooks.dev/1.0/recipes/1,application/json"
++    >>> key2 = "http://cookbooks.dev/devel/recipes/1,application/json"
++    >>> dictionary[key1] == dictionary[key2]
++    False
++
++Cache invalidation
++==================
++
++lazr.restful does not automatically invalidate the representation
++cache, because it only knows about a subset of the changes that might
++invalidate the cache--the changes that happen through the web service
++itself.
++
++If you want to invalidate the cache whenever the web service changes
++an object, you can write a listener for ObjectModifiedEvent objects
++(see doc/webservice.txt for an example). But most of the time, you'll
++want to invalidate the cache when something deeper happens--something
++like a change to the objects in your ORM.
++
++Let's signal a change to recipe #1. Let's say someone changed that
++recipe, using a web application that has no connection to the web
++service except for a shared database. We can detect the database
++change, but what do we do when that change happens?
++
++Here's the recipe object.
++
++    >>> from lazr.restful.example.base.root import RECIPES
++    >>> recipe = [recipe for recipe in RECIPES if recipe.id == 1][0]
++
++To remove its representation from the cache, we pass it into the
++cache's delete() method.
++
++    >>> print cache.get(recipe, json, 'devel')
++    {...}
++    >>> cache.delete(recipe)
++
++All the relevant representations are deleted.
++
++    >>> print cache.get(recipe, json, 'devel')
++    None
++    >>> dictionary.keys()
++    []
++
++Data visibility
++===============
++
++Only full representations are added to the cache. If the
++representation you request includes a redacted field (because you
++don't have permission to see that field's true value), the
++representation is not added to the cache.
++
++    >>> from urllib import quote
++    >>> greens_url = quote("/cookbooks/Everyday Greens")
++    >>> greens = webservice.get(greens_url).jsonBody()
++    >>> print greens['confirmed']
++    tag:launchpad.net:2008:redacted
++
++    >>> dictionary.keys()
++    []
++
++This means that if your entry resources typically contain data that's
++only visible to a select few users, you won't get much benefit out of
++a representation cache.
++
++What if a full representation is in the cache, and the user requests a
++representation that must be redacted? Let's put some semi-fake data in
++the cache and find out.
++
++    >>> import simplejson
++    >>> greens['name'] = "This comes from the cache; it is not generated."
++    >>> greens['confirmed'] = True
++    >>> cache.set(greens_object, json, 'devel', simplejson.dumps(greens))
++
++When we GET the corresponding resource, we get a representation that
++definitely comes from the cache, not the original data source.
++
++    >>> cached_greens = webservice.get(greens_url).jsonBody()
++    >>> print cached_greens['name']
++    This comes from the cache; it is not generated.
++
++But the redacted value is still redacted.
++
++    >>> print cached_greens['confirmed']
++    tag:launchpad.net:2008:redacted
++
++Cleanup: clear the cache.
++
++    >>> dictionary.clear()
++
++Collections
++===========
++
++Collections are full of entries, and representations of collections
++are built from the cache if possible. We'll demonstrate this with the
++collection of recipes.
++
++First, we'll hack the cached representation of a single recipe.
++
++    >>> recipe = webservice.get("/recipes/1").jsonBody()
++    >>> recipe['instructions'] = "This representation is from the cache."
++    >>> [recipe_key] = dictionary.keys()
++    >>> dictionary[recipe_key] = simplejson.dumps(recipe)
++
++Now, we get the collection of recipes.
++
++    >>> recipes = webservice.get("/recipes").jsonBody()['entries']
++
++The fake instructions we put into an entry's cached representation are
++also present in the collection.
++
++    >>> for instructions in (
++    ...     sorted(recipe['instructions'] for recipe in recipes)):
++    ...     print instructions
++    A perfectly roasted chicken is...
++    Draw, singe, stuff, and truss...
++    ...
++    This representation is from the cache.
++
++To build the collection, lazr.restful had to generate representations
++of all the recipe entries that weren't already cached. As it generated
++each representation, it populated the cache.
++
++    >>> for key in sorted(dictionary.keys()):
++    ...     print key
++    http://cookbooks.dev/devel/recipes/1,application/json
++    http://cookbooks.dev/devel/recipes/2,application/json
++    http://cookbooks.dev/devel/recipes/3,application/json
++    http://cookbooks.dev/devel/recipes/4,application/json
++
++If we request the collection again, all the entry representations will
++come from the cache.
++
++    >>> for key in dictionary.keys():
++    ...     value = simplejson.loads(dictionary[key])
++    ...     value['instructions'] = "This representation is from the cache."
++    ...     dictionary[key] = simplejson.dumps(value)
++
++    >>> recipes = webservice.get("/recipes").jsonBody()['entries']
++    >>> for instructions in (
++    ...     sorted(recipe['instructions'] for recipe in recipes)):
++    ...     print instructions
++    This representation is from the cache.
++    This representation is from the cache.
++    This representation is from the cache.
++    This representation is from the cache.
++
++Cleanup: de-register the cache.
++
++    >>> sm.registerUtility(None, IRepresentationCache)
++
++Of course, the hacks we made to the cached representations have no
++effect on the objects themselves. Once the hacked cache is gone, the
++representations look just as they did before.
++
++    >>> recipes = webservice.get("/recipes").jsonBody()['entries']
++    >>> for instructions in (
++    ...     sorted(recipe['instructions'] for recipe in recipes)):
++    ...     print instructions
++    A perfectly roasted chicken is...
++    Draw, singe, stuff, and truss...
++    Preheat oven to...
++    You can always judge...
 === modified file 'src/lazr/restful/interfaces/_rest.py'
 --- src/lazr/restful/interfaces/_rest.py	2010-05-17 17:52:57 +0000
 +++ src/lazr/restful/interfaces/_rest.py	2010-05-24 14:15:38 +0000
@@ -34,6 +34,7 @@
      'IHTTPResource',
      'IJSONPublishable',
      'IJSONRequestCache',
++    'IRepresentationCache',
      'IResourceOperation',
      'IResourceGETOperation',
      'IResourceDELETEOperation',
@@ -606,4 +607,55 @@
          """Traverse to a sub-object."""
++class IRepresentationCache(Interface):
++    """A cache for resource representations.
++
++    Register an object as the utility for this interface and
++    lazr.restful will use that object to cache resource
++    representations. If no object is registered as the utility,
++    representations will not be cached.
++
++    This is designed to be used with memcached, but you can plug in
++    other key-value stores. Note that this cache is intended to store
++    string representations, not deserialized JSON objects or anything
++    else.
++    """
++
++    def get(object, media_type, version, default=None):
++        """Retrieve a representation from the cache.
++
++        :param object: An IEntry--the object whose representation you want.
++        :param media_type: The media type of the representation to get.
++        :param version: The version of the web service for which to
++            fetch a representation.
++        :param default: The object to return if no representation is
++            cached for this object.
++
++        :return: A string representation, or `default`.
++        """
++        pass
++
++    def set(object, media_type, version, representation):
++        """Add a representation to the cache.
++
++        :param object: An IEntry--the object whose representation this is.
++        :param media_type: The media type of the representation.
++        :param version: The version of the web service in which this
++            representation should be stored.
++        :param representation: The string representation to store.
++        """
++        pass
++
++    def delete(object):
++        """Remove *all* of an object's representations from the cache.
++
++        This means representations for every (supported) media type
++        and every version of the web service. Currently the only
++        supported media type is 'application/json'.
++
++        :param object: An IEntry--the object being represented.
++        """
++        pass
++
++
  InvalidBatchSizeError.__lazr_webservice_error__ = 400
 === modified file 'src/lazr/restful/simple.py'
 --- src/lazr/restful/simple.py	2010-01-28 15:33:31 +0000
 +++ src/lazr/restful/simple.py	2010-05-24 14:15:38 +0000
@@ -2,7 +2,9 @@
  __metaclass__ = type
  __all__ = [
++    'BaseRepresentationCache',
      'BaseWebServiceConfiguration',
++    'DictionaryBasedRepresentationCache',
      'IMultiplePathPartLocation',
      'MultiplePathPartAbsoluteURL',
      'Publication',
@@ -24,19 +26,24 @@
  from zope.publisher.publish import mapply
  from zope.proxy import sameProxiedObjects
  from zope.security.management import endInteraction, newInteraction
--from zope.traversing.browser import AbsoluteURL as ZopeAbsoluteURL
++from zope.traversing.browser import (
++    absoluteURL, AbsoluteURL as ZopeAbsoluteURL)
  from zope.traversing.browser.interfaces import IAbsoluteURL
  from zope.traversing.browser.absoluteurl import _insufficientContext, _safe
  import grokcore.component
--from lazr.restful import EntryAdapterUtility, ServiceRootResource
++from lazr.restful import (
++    EntryAdapterUtility, HTTPResource, ServiceRootResource)
  from lazr.restful.interfaces import (
--    IServiceRootResource, ITopLevelEntryLink, ITraverseWithGet,
--    IWebServiceConfiguration, IWebServiceLayer)
++    IRepresentationCache, IServiceRootResource, ITopLevelEntryLink,
++    ITraverseWithGet, IWebServiceConfiguration, IWebServiceLayer)
  from lazr.restful.publisher import (
--    WebServicePublicationMixin, WebServiceRequestTraversal)
--from lazr.restful.utils import implement_from_dict
++    browser_request_to_web_service_request, WebServicePublicationMixin,
++    WebServiceRequestTraversal)
++from lazr.restful.utils import (
++    get_current_browser_request, implement_from_dict,
++    tag_request_with_version_name)
  class PublicationMixin(object):
@@ -351,6 +358,117 @@
      __call__ = __str__
++class BaseRepresentationCache(object):
++    """A useful base class for representation caches.
++
++    When an object is invalidated, all of its representations must be
++    removed from the cache. This means representations of every media
++    type for every version of the web service. Subclass this class and
++    you won't have to worry about removing everything. You can focus
++    on implementing key_for() and delete_by_key(), which takes the
++    return value of key_for() instead of a raw object.
++
++    You can also implement set_by_key() and get_by_key(), which also
++    take the return value of key_for(), instead of set() and get().
++    """
++    implements(IRepresentationCache)
++
++    def get(self, obj, media_type, version, default=None):
++        """See `IRepresentationCache`."""
++        key = self.key_for(obj, media_type, version)
++        return self.get_by_key(key, default)
++
++    def set(self, obj, media_type, version, representation):
++        """See `IRepresentationCache`."""
++        key = self.key_for(obj, media_type, version)
++        return self.set_by_key(key, representation)
++
++    def delete(self, object):
++        """See `IRepresentationCache`."""
++        config = getUtility(IWebServiceConfiguration)
++        for version in config.active_versions:
++            key = self.key_for(object, HTTPResource.JSON_TYPE, version)
++            self.delete_by_key(key)
++
++    def key_for(self, object, media_type, version):
++        """Generate a unique key for an object/media type/version.
++
++        :param object: An IEntry--the object whose representation you want.
++        :param media_type: The media type of the representation to get.
++        :param version: The version of the web service for which to
++            fetch a representation.
++        """
++        raise NotImplementedError()
++
++    def get_by_key(self, key, default=None):
++        """Retrieve a representation from the cache, given a key.
++
++        :param key: The cache key.
++        :param default: The object to return if there's no
++            representation cached under this key.
++        """
++        raise NotImplementedError()
++
++    def set_by_key(self, key, representation):
++        """Add a representation to the cache, given a key.
++
++        :param key: The cache key.
++        :param representation: The string representation to store.
++        """
++        raise NotImplementedError()
++
++    def delete_by_key(self, key):
++        """Delete a representation from the cache, given a key.
++
++        :param key: The cache key.
++        """
++        raise NotImplementedError()
++
++
++class DictionaryBasedRepresentationCache(BaseRepresentationCache):
++    """A representation cache that uses an in-memory dict.
++
++    This cache transforms IRepresentationCache operations into
++    operations on a dictionary.
++
++    Don't use a Python dict object in a production installation! It
++    can easily grow to take up all available memory. If you implement
++    a dict-like object that maintains a maximum size with an LRU
++    algorithm or something similar, you can use that. But this class
++    was written for testing.
++    """
++    def __init__(self, use_dict):
++        """Constructor.
++
++        :param use_dict: A dictionary to keep representations in. As
++        noted in the class docstring, in a production installation
++        it's a very bad idea to use a standard Python dict object.
++        """
++        self.dict = use_dict
++
++    def key_for(self, obj, media_type, version):
++        """See `BaseRepresentationCache`."""
++        # Create a fake web service request for the appropriate version.
++        config = getUtility(IWebServiceConfiguration)
++        web_service_request = config.createRequest("", {})
++        web_service_request.setVirtualHostRoot(
++            names=[config.path_override, version])
++        tag_request_with_version_name(web_service_request, version)
++
++        # Use that request to create a versioned URL for the object.
++        value = absoluteURL(obj, web_service_request) + ',' + media_type
++        return value
++
++    def get_by_key(self, key, default=None):
++        """See `BaseRepresentationCache`."""
++        return self.dict.get(key, default)
++
++    def set_by_key(self, key, representation):
++        """See `BaseRepresentationCache`."""
++        self.dict[key] = representation
++
++    def delete_by_key(self, key):
++        """See `BaseRepresentationCache`."""
++        # delete() tries every active version, so the key may be absent.
++        self.dict.pop(key, None)
++
++
  BaseWebServiceConfiguration = implement_from_dict(
      "BaseWebServiceConfiguration", IWebServiceConfiguration, {}, object)
 === modified file 'src/lazr/restful/version.txt'
 --- src/lazr/restful/version.txt	2010-05-10 11:48:42 +0000
 +++ src/lazr/restful/version.txt	2010-05-24 14:15:38 +0000
@@ -1,1 +1,1 @@
--0.9.26
++0.9.27
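
The DictionaryBasedRepresentationCache docstring suggests that a dict-like object with a maximum size and an LRU policy could stand in for a plain dict. What follows is only a minimal sketch of such an object, not part of this branch: the name BoundedLRUDict and its max_size parameter are hypothetical, it targets Python 3 (OrderedDict.move_to_end) rather than the Python 2 used in the diff, and it does no locking, so it assumes single-threaded access. It supports the dict operations the cache class relies on (get, item assignment, item deletion) plus pop, keys and clear.

```python
from collections import OrderedDict


class BoundedLRUDict:
    """Hypothetical dict-like store that evicts the least recently used key."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        # Oldest (least recently used) keys sit at the front.
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # A successful read counts as a use.
        self._data.move_to_end(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the front until we are back under the limit.
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def __delitem__(self, key):
        del self._data[key]

    def pop(self, key, default=None):
        return self._data.pop(key, default)

    def keys(self):
        return list(self._data.keys())

    def clear(self):
        self._data.clear()


# Usage: a two-slot store keeps the two most recently used keys.
store = BoundedLRUDict(max_size=2)
store["recipes/1"] = '{"id": 1}'
store["recipes/2"] = '{"id": 2}'
store.get("recipes/1")            # touch recipes/1
store["recipes/3"] = '{"id": 3}'  # evicts recipes/2
```

An instance like this could presumably be passed as use_dict in place of a plain dict, though for a real deployment the interface docstring points at memcached, which handles eviction itself.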
