Merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch into lp:lazr.batchnavigator

Proposed by Abel Deuring
Status: Merged
Approved by: Graham Binns
Approved revision: 48
Merged at revision: 48
Proposed branch: lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch
Merge into: lp:lazr.batchnavigator
Diff against target: 324 lines (+234/-9)
3 files modified
src/lazr/batchnavigator/tests/test_z3batching.py (+135/-1)
src/lazr/batchnavigator/z3batching/batch.py (+83/-1)
src/lazr/batchnavigator/z3batching/interfaces.py (+16/-7)
To merge this branch: bzr merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch
Reviewer Review Type Date Requested Status
Graham Binns (community) code Approve
Review via email: mp+72745@code.launchpad.net

Description of the change

The problem:

  - When _Batch.sliced_list() retrieves a slice of a
    result set (line 235 of batch.py) to get the data
    of a backwards batch:

    sliced = self.range_factory.getSlice(size, self.range_memo,
        self.range_forwards)

  - when the result of this call has less elements than
    needed

the method called

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)

in order to retrieve more elements of a result set. This
works fine, if self.range_memo, i.e., the endpoint memo
value used to retrieve the first, too small, chunk of data,
can be used as an endpoint value to retrieve a chunk of data
that follows the already existing chunk.

This works for the ListRangeFactory, which uses the regular
Python slicing protocol: The expression s[a, b] is that part
of the sequence s which starts at index a and ends before
index b.

In this case, self.range_memo is basically used like 'b' above
in

    sliced = self.range_factory.getSlice(size, self.range_memo,
        self.range_forwards)

and it is used as 'a' in

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)

But Launchpad's class StormRangeFactory works slightly
different: Its method getEndpointmemos() returns the
values of the columns used for sorting: The two values
returned by this method should be used in a query like
"return N rows from the result set where the sort columns
have values (smaller than|larger than) the endpoint memo
value".

If we assume a result set like

    [1, 2, 3, 4, 5]

and the bachwards batch was retrieved for the memo value 3
(factory.getSlice(size, 3, backwards=True)), the result is:

   [1, 2]

The second call (factory.setSlice(size, 3, forwards=True)
would return

  [4, 5]

The core of the fix: sliced_list() must explicitly retrieve
the endpoint memo values for the already retrieved chunk of
data. That's just these lines of the diff:

+ partial = _PartialBatch(sliced)
+ extra_memo = (
+ self.range_factory.getEndpointMemos(partial))
     extra = self.range_factory.getSlice(needed,
- self.range_memo, forwards=True)
+ extra_memo[1], forwards=True)

The problem with these innocent lines: A method
IRangeFactory.getEndpointMemos(batch) can use _any_
parameters from batch...

So we need to build again an object which implements the
full IBatch interface (at least formally -- the methods
prevBatch() and nextBatch() are irrelevant, hopefully also
in the future...) so that any range factory has access
to whatever batching related attribute or method it wants...

This "partial batch" (is there a better name?) is implemented
by, well, class _PartialBatch.

I also added the methods __getitem__, sliced_list, trueSize
to IBatch, because these sliced_list is explicitly used by
StormRangeFactory to retrieve results. And ListRangeFactory
wants access to trueSize.

I first reproduced and then the problem of the "missing
element" in a backwards batch with the new method
test_last_backwards_batch_with_value_based_range_factory(),
which needs a special IRangeFactory, which in turn needed
a bit of testing (test_PartialBatch()).

That's roughly how a six lines diff exploded into 320 lines ;)

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

turtles all the way down huh.

review: approve

Revision history for this message
Graham Binns (gmb) wrote :

Marking this approved as a proxy for Robert.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/lazr/batchnavigator/tests/test_z3batching.py'
2--- src/lazr/batchnavigator/tests/test_z3batching.py 2011-08-23 13:14:41 +0000
3+++ src/lazr/batchnavigator/tests/test_z3batching.py 2011-08-24 16:24:23 +0000
4@@ -16,7 +16,13 @@
5 import operator
6 import unittest
7
8-from lazr.batchnavigator.z3batching.batch import _Batch, BATCH_SIZE
9+from zope.interface.verify import verifyObject, verifyClass
10+
11+from lazr.batchnavigator.z3batching.batch import (
12+ _Batch,
13+ BATCH_SIZE,
14+ _PartialBatch,
15+ )
16 from lazr.batchnavigator.z3batching.interfaces import IBatch
17 from lazr.batchnavigator import ListRangeFactory
18
19@@ -59,6 +65,52 @@
20 return super(ListWithIncorrectLength, self).__getslice__(start, end)
21
22
23+class RangeFactoryWithValueBasedEndpointMemos:
24+ """A RangeFactory which uses data values from a batch as endpoint memos.
25+ """
26+ def __init__(self, results):
27+ self.results = results
28+
29+ def getEndpointMemos(self, batch):
30+ """See `IRangeFactory`."""
31+ return batch[0], batch[-1]
32+
33+ getEndpointMemosFromSlice = getEndpointMemos
34+
35+ def getSlice(self, size, endpoint_memo='', forwards=True):
36+ """See `IRangeFactory`."""
37+ if size == 0:
38+ return []
39+ if endpoint_memo == '':
40+ if forwards:
41+ return self.results[:size]
42+ else:
43+ sliced = self.results[-size:]
44+ sliced.reverse()
45+ return sliced
46+
47+ if forwards:
48+ index = 0
49+ while (index < len(self.results) and
50+ endpoint_memo >= self.results[index]):
51+ index += 1
52+ return self.results[index:index+size]
53+ else:
54+ index = len(self.results) - 1
55+ while (index >= 0 and endpoint_memo < self.results[index]):
56+ index -= 1
57+ if index < 0:
58+ return []
59+ start_index = max(0, index - size)
60+ sliced = self.results[start_index:index]
61+ sliced.reverse()
62+ return sliced
63+
64+ def getSliceByIndex(self, start, end):
65+ """See `IRangeFactory`."""
66+ return self.results[start:end]
67+
68+
69 class TestingInfrastructureTest(unittest.TestCase):
70 def test_ListWithExplosiveLen(self):
71 # For some of the tests we want to be sure len() of the underlying
72@@ -93,6 +145,46 @@
73 self.assertEqual([], weird_list[-2:-1])
74 self.assertEqual([3], weird_list[-3:-2])
75
76+ def test_RangeFactoryWithValueBased_getEndpointMemos(self):
77+ data = [str(value) for value in range(10)]
78+ self.assertEqual(
79+ ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], data)
80+ factory = RangeFactoryWithValueBasedEndpointMemos(data)
81+ # The endpoint memo values are the values of the first and last
82+ # elelemnt of a batch.
83+ batch = data[:3]
84+ self.assertEqual(('0', '2'), factory.getEndpointMemos(batch))
85+ batch = data[4:8]
86+ self.assertEqual(('4', '7'), factory.getEndpointMemos(batch))
87+ # getSlice() called with an empty memo value returns the
88+ # first elements if forwards is True...
89+ self.assertEqual(
90+ ['0', '1', '2'],
91+ factory.getSlice(size=3, endpoint_memo='', forwards=True))
92+ # ...and the last elements if forwards is False.
93+ self.assertEqual(
94+ ['9', '8', '7'],
95+ factory.getSlice(size=3, endpoint_memo='', forwards=False))
96+ # A forwards slice starts with a value larger than the
97+ # given memo value.
98+ self.assertEqual(
99+ ['6', '7'],
100+ factory.getSlice(size=2, endpoint_memo='5', forwards=True))
101+ # A backwards slice starts with a value smaller than the
102+ # given memo value.
103+ self.assertEqual(
104+ ['4', '3'],
105+ factory.getSlice(size=2, endpoint_memo='5', forwards=False))
106+ # A slice is smaller than requested if the end of the results
107+ # is reached.
108+ self.assertEqual(
109+ ['8', '9'],
110+ factory.getSlice(size=3, endpoint_memo='7', forwards=True))
111+ self.assertEqual(
112+ [], factory.getSlice(size=3, endpoint_memo='A', forwards=True))
113+ self.assertEqual(
114+ [], factory.getSlice(size=3, endpoint_memo=' ', forwards=False))
115+
116
117 class RecordingFactory(ListRangeFactory):
118 def __init__(self, results):
119@@ -514,3 +606,45 @@
120 self.assertEqual([1, 2, 3, 4], batch.sliced_list)
121 # And we get a previous batch.
122 self.assertEqual([0, 1, 2, 3], new_batch.prevBatch().sliced_list)
123+
124+ def test_last_backwards_batch_with_value_based_range_factory(self):
125+ # Another slice is added in _Batch.sliced_list() when the
126+ # regular slice of a backwards batch does not return the
127+ # number of required elements. This works for range factories
128+ # which are based on the values too.
129+ data = [str(value) for value in range(10)]
130+ range_factory = RangeFactoryWithValueBasedEndpointMemos(data)
131+ batch = _Batch(
132+ data, range_factory=range_factory, size=3, range_memo='1',
133+ start=1, range_forwards=False)
134+ self.assertEqual(['0', '1', '2', '3'], batch.sliced_list)
135+
136+ def test_PartialBatch(self):
137+ # PartialBatch implements the full IBatch interface.
138+ from zope.interface.common.mapping import IItemMapping
139+ self.assertTrue(verifyClass(IBatch, _PartialBatch))
140+ partial = _PartialBatch(sliced_list=range(3))
141+ self.assertTrue(verifyObject(IBatch, partial))
142+ # trueSize is the length of sliced_list
143+ self.assertEqual(3, partial.trueSize)
144+ # sliced_list is passed by the contrucotr parameter sliced_list
145+ self.assertEqual([0, 1, 2], partial.sliced_list)
146+ # __len__() returns the length of the sliced list
147+ self.assertEqual(3, len(partial))
148+ # __iter__() iterates over sliced_list
149+ self.assertEqual([0, 1, 2], [element for element in partial])
150+ # __contains__() works.
151+ self.assertTrue(1 in partial)
152+ self.assertFalse(3 in partial)
153+ # prevBatch(), nextBatch() exost but are not implemented.
154+ self.assertRaises(NotImplementedError, partial.prevBatch)
155+ self.assertRaises(NotImplementedError, partial.nextBatch)
156+ # first and last are implemented.
157+ self.assertEqual(0, partial.first())
158+ self.assertEqual(2, partial.last())
159+ # total() return the length of sliced_list
160+ self.assertEqual(3, partial.total())
161+ # startNumber, endNumber() are implemented
162+ self.assertEqual(1, partial.startNumber())
163+ self.assertEqual(4, partial.endNumber())
164+
165
166=== modified file 'src/lazr/batchnavigator/z3batching/batch.py'
167--- src/lazr/batchnavigator/z3batching/batch.py 2011-08-22 17:05:29 +0000
168+++ src/lazr/batchnavigator/z3batching/batch.py 2011-08-24 16:24:23 +0000
169@@ -31,6 +31,85 @@
170 # as BatchNavigator. In Launchpad, we override it via a config option.
171 BATCH_SIZE = 50
172
173+class _PartialBatch:
174+ """A helper batch implementation.
175+
176+ _Batch.sliced_list() below needs to retrieve a second chunk of
177+ data when a call of range_factory.getSlice() for a backwards batch
178+ returns less elements than requested because the start of the result
179+ set is reached.
180+
181+ In this case, another (forwards) batch must be retrieved. _Batch
182+ must not assume that the memo value used to retrieve the first, too
183+ small, result set can be used to retrieve the additional data. (The
184+ class RangeFactoryWithValueBasedEndpointMemos in
185+ tests/test_z3batching.py is an example where this assumption fails.)
186+
187+ Instead, _Batch.sliced_list() must retrieve the endpoint memos for
188+ the partial data and use them to retrieve the missing part of the
189+ result set. Since
190+ - IRangeFactory.getEndpointMemos(batch) is free to use any
191+ property of IBatch,
192+ - a call like self.range_factory.getEndpointMemos(self) in
193+ _Batch.sliced_list() leads to infinite recursions if the
194+ range factory wants to access sliced_list,
195+
196+ we use this helper class for the getEndpointMemos() call in
197+ _Batch.sliced_list().
198+ """
199+ implements(IBatch)
200+
201+ def __init__(self, sliced_list):
202+ self.start = 0
203+ self.trueSize = len(sliced_list)
204+ self.sliced_list = sliced_list
205+ self.size = len(sliced_list)
206+
207+ def __len__(self):
208+ """See `IBatch`."""
209+ return len(self.sliced_list)
210+
211+ def __iter__(self):
212+ """See `IBatch`."""
213+ return iter(self.sliced_list)
214+
215+ def __getitem__(self, index):
216+ """See `IBatch`."""
217+ return self.sliced_list[index]
218+
219+ def __contains__(self, key):
220+ """See `IBatch`."""
221+ return 0 <= key < len(self.sliced_list)
222+
223+ def nextBatch(self):
224+ """See `IBatch`."""
225+ raise NotImplementedError
226+
227+ def prevBatch(self):
228+ """See `IBatch`."""
229+ raise NotImplementedError
230+
231+ def first(self):
232+ """See `IBatch`."""
233+ return self.sliced_list[0]
234+
235+ def last(self):
236+ """See `IBatch`."""
237+ return self.sliced_list[-1]
238+
239+ def total(self):
240+ """See `IBatch`."""
241+ return len(self.sliced_list)
242+
243+ def startNumber(self):
244+ """See `IBatch`."""
245+ return 1
246+
247+ def endNumber(self):
248+ """See `IBatch`."""
249+ return len(self.sliced_list) + 1
250+
251+
252 class _Batch(object):
253 implements(IBatch)
254
255@@ -171,8 +250,11 @@
256 # range memo may have constrained us. So we need to get
257 # some more results:
258 needed = size - len(sliced)
259+ partial = _PartialBatch(sliced)
260+ extra_memo = (
261+ self.range_factory.getEndpointMemos(partial))
262 extra = self.range_factory.getSlice(needed,
263- self.range_memo, forwards=True)
264+ extra_memo[1], forwards=True)
265 sliced = sliced + extra
266 self.is_first_batch = True
267 # This is the first time we get an inkling of (approximately)
268
269=== modified file 'src/lazr/batchnavigator/z3batching/interfaces.py'
270--- src/lazr/batchnavigator/z3batching/interfaces.py 2009-03-03 18:02:58 +0000
271+++ src/lazr/batchnavigator/z3batching/interfaces.py 2011-08-24 16:24:23 +0000
272@@ -15,6 +15,7 @@
273
274 $Id$
275 """
276+from zope.interface import Attribute
277 from zope.interface.common.mapping import IItemMapping
278
279 class IBatch(IItemMapping):
280@@ -34,29 +35,37 @@
281 """Creates an iterator for the contents of the batch (not the entire
282 list)."""
283
284+ def __getitem__(index):
285+ """Return the element at the given offset."""
286+
287 def __contains__(key):
288 """Checks whether the key (in our case an index) exists."""
289
290- def nextBatch(self):
291+ def nextBatch():
292 """Return the next batch. If there is no next batch, return None."""
293
294- def prevBatch(self):
295+ def prevBatch():
296 """Return the previous batch. If there is no previous batch, return
297 None."""
298
299- def first(self):
300+ def first():
301 """Return the first element of the batch."""
302
303- def last(self):
304+ def last():
305 """Return the last element of the batch."""
306
307- def total(self):
308+ def total():
309 """Return the length of the list (not the batch)."""
310
311- def startNumber(self):
312+ def startNumber():
313 """Give the start **number** of the batch, which is 1 more than the
314 start index passed in."""
315
316- def endNumber(self):
317+ def endNumber():
318 """Give the end **number** of the batch, which is 1 more than the
319 final index."""
320+
321+ sliced_list = Attribute(
322+ "A sliced list as returned by IRangeFactory.sliced_list.")
323+
324+ trueSize = Attribute("The actual size of this batch.")

Subscribers

People subscribed via source and target branches