Merge lp:~james-w/lazr.restfulclient/improve-collection-fetch-optimisation into lp:lazr.restfulclient

Proposed by James Westby
Status: Work in progress
Proposed branch: lp:~james-w/lazr.restfulclient/improve-collection-fetch-optimisation
Merge into: lp:lazr.restfulclient
Diff against target: 39 lines (+12/-10)
1 file modified
src/lazr/restfulclient/resource.py (+12/-10)
To merge this branch: bzr merge lp:~james-w/lazr.restfulclient/improve-collection-fetch-optimisation
Reviewer Review Type Date Requested Status
Graham Binns (community) code Needs Information
Review via email: mp+16841@code.launchpad.net
Revision history for this message
James Westby (james-w) wrote :

Hi,

This change helps with fetching large collections. There is
an optimisation that sets the desired page size when it looks
like a full page will not be needed, but the way it was coded
meant that the first fetch always requested a full page.

With this change the optimisation is applied more widely, and,
critically for me, you can actually control the size of the fetches
by looping over the slices. Some API calls are liable to time out, so
fetching them in smaller chunks makes them usable, which is
currently not possible. With this change you can iterate over
slices to control the page size.

Thanks,

James

Revision history for this message
James Westby (james-w) wrote :

Actually, this doesn't allow you to limit the page size
all the time: once you get beyond the first page it
makes a request with no size specified.

To do this properly we would need something like a
max-size request variable. Would that be feasible for
LP?

Thanks,

James

Revision history for this message
Graham Binns (gmb) wrote :

Hi James,

Thanks for this branch. A couple of things:

 1. I don't see any tests for this change. Are there tests that already cover this functionality? If so, how will this change affect them? If not, can you add some?
 2. Formatting multi-line conditionals is always a bit weird. We usually wrap after the first 'and' or 'or'. I've written a patch that fixes this for this branch: http://pastebin.ubuntu.com/558064/

> Actually this doesn't allow you to limit the page size
> all the time. Once you get beyond the first page it
> makes a request with no size specified.
>
> In order to do this properly we would either need a
> max-size request variable, would that be feasible for
> LP?
>
 3. I don't really understand what you mean here (I'm still jetlagged, which doesn't help). Can you explain it further (possibly using words of no more than one syllable)?

review: Needs Information (code)

Unmerged revisions

87. By James Westby

Properly account for the case where there isn't a first page.

86. By James Westby

Make the optimisation in Collection._get_slice apply to more cases.

Collection._get_slice had an optimisation to fetch smaller pages when it
thought that a full page was not needed. However, it was only applied on
the second and subsequent fetches. This change makes it apply to the
first fetch as well, where it is equally applicable.
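As a rough model of the loop being changed (simplified, with a stub in place of the real HTTP fetch; `fetch_entries` and `make_server` are illustrative names, not the library's):

```python
def fetch_entries(fetch_page, needed, first_page_size=None):
    """Collect `needed` entries from a paged source.

    `fetch_page(size)` returns the next page, honouring a requested
    `size` (None means the server's default page size). The size check
    runs before *every* fetch, including the first, so a caller who
    already knows the page size never over-fetches.
    """
    entries = []
    more_needed = needed
    while more_needed > 0:
        size = None
        if first_page_size is not None and more_needed < first_page_size:
            # Request only what we still need instead of a full page.
            size = more_needed
        page = fetch_page(size)
        if not page:
            break  # The server has no more entries.
        if first_page_size is None:
            first_page_size = len(page)
        entries.extend(page[:more_needed])
        more_needed = needed - len(entries)
    return entries

def make_server(total, default_page_size):
    """A stand-in for the remote collection: serves `total` integers."""
    state = {'pos': 0}
    def fetch(size):
        n = default_page_size if size is None else size
        page = list(range(state['pos'], min(state['pos'] + n, total)))
        state['pos'] += len(page)
        return page
    return fetch
```

With `first_page_size` known up front, even the first request is capped at the remaining count; before this change the first fetch was always full-sized.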

Preview Diff

=== modified file 'src/lazr/restfulclient/resource.py'
--- src/lazr/restfulclient/resource.py	2009-12-17 15:03:38 +0000
+++ src/lazr/restfulclient/resource.py	2010-01-05 11:41:18 +0000
@@ -753,6 +753,17 @@
 
         # Iterate over pages until we have the correct number of entries.
         while more_needed > 0 and page_url is not None:
+            if (first_page_size is not None
+                and more_needed > 0 and more_needed < first_page_size):
+                # An optimization: it's likely that we need less than
+                # a full page of entries, because the number we need
+                # is less than the size of the first page we got.
+                # Instead of requesting a full-sized page, we'll
+                # request only the number of entries we think we'll
+                # need. If we're wrong, there's no problem; we'll just
+                # keep looping.
+                page_url = self._with_url_query_variable_set(
+                    page_url, 'ws.size', more_needed)
             representation = simplejson.loads(
                 unicode(self._root._browser.get(page_url)))
             current_page_entries = representation['entries']
@@ -766,16 +777,7 @@
                     break
             if first_page_size is None:
                 first_page_size = len(current_page_entries)
-            if more_needed > 0 and more_needed < first_page_size:
-                # An optimization: it's likely that we need less than
-                # a full page of entries, because the number we need
-                # is less than the size of the first page we got.
-                # Instead of requesting a full-sized page, we'll
-                # request only the number of entries we think we'll
-                # need. If we're wrong, there's no problem; we'll just
-                # keep looping.
-                page_url = self._with_url_query_variable_set(
-                    page_url, 'ws.size', more_needed)
+
 
         if slice.step is not None:
             entry_dicts = entry_dicts[::slice.step]
