Merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch into lp:lazr.batchnavigator

Proposed by Abel Deuring
Status: Merged
Approved by: Graham Binns
Approved revision: 48
Merged at revision: 48
Proposed branch: lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch
Merge into: lp:lazr.batchnavigator
Diff against target: 324 lines (+234/-9)
3 files modified
src/lazr/batchnavigator/tests/test_z3batching.py (+135/-1)
src/lazr/batchnavigator/z3batching/batch.py (+83/-1)
src/lazr/batchnavigator/z3batching/interfaces.py (+16/-7)
To merge this branch: bzr merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch
Reviewer Review Type Date Requested Status
Graham Binns (community) code Approve
Review via email: mp+72745@code.launchpad.net

Description of the change

The problem:

  - When _Batch.sliced_list() retrieves a slice of a
    result set (line 235 of batch.py) to get the data
    of a backwards batch:

    sliced = self.range_factory.getSlice(size, self.range_memo,
        self.range_forwards)

  - when the result of this call has less elements than
    needed

the method called

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)

in order to retrieve more elements of a result set. This
works fine, if self.range_memo, i.e., the endpoint memo
value used to retrieve the first, too small, chunk of data,
can be used as an endpoint value to retrieve a chunk of data
that follows the already existing chunk.

This works for the ListRangeFactory, which uses the regular
Python slicing protocol: The expression s[a, b] is that part
of the sequence s which starts at index a and ends before
index b.

In this case, self.range_memo is basically used like 'b' above
in

    sliced = self.range_factory.getSlice(size, self.range_memo,
        self.range_forwards)

and it is used as 'a' in

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)

But Launchpad's class StormRangeFactory works slightly
different: Its method getEndpointmemos() returns the
values of the columns used for sorting: The two values
returned by this method should be used in a query like
"return N rows from the result set where the sort columns
have values (smaller than|larger than) the endpoint memo
value".

If we assume a result set like

    [1, 2, 3, 4, 5]

and the bachwards batch was retrieved for the memo value 3
(factory.getSlice(size, 3, backwards=True)), the result is:

   [1, 2]

The second call (factory.setSlice(size, 3, forwards=True)
would return

  [4, 5]

The core of the fix: sliced_list() must explicitly retrieve
the endpoint memo values for the already retrieved chunk of
data. That's just these lines of the diff:

+ partial = _PartialBatch(sliced)
+ extra_memo = (
+ self.range_factory.getEndpointMemos(partial))
     extra = self.range_factory.getSlice(needed,
- self.range_memo, forwards=True)
+ extra_memo[1], forwards=True)

The problem with these innocent lines: A method
IRangeFactory.getEndpointMemos(batch) can use _any_
parameters from batch...

So we need to build again an object which implements the
full IBatch interface (at least formally -- the methods
prevBatch() and nextBatch() are irrelevant, hopefully also
in the future...) so that any range factory has access
to whatever batching related attribute or method it wants...

This "partial batch" (is there a better name?) is implemented
by, well, class _PartialBatch.

I also added the methods __getitem__, sliced_list, trueSize
to IBatch, because these sliced_list is explicitly used by
StormRangeFactory to retrieve results. And ListRangeFactory
wants access to trueSize.

I first reproduced and then the problem of the "missing
element" in a backwards batch with the new method
test_last_backwards_batch_with_value_based_range_factory(),
which needs a special IRangeFactory, which in turn needed
a bit of testing (test_PartialBatch()).

That's roughly how a six lines diff exploded into 320 lines ;)

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

turtles all the way down huh.

review: approve

Revision history for this message
Graham Binns (gmb) wrote :

Marking this approved as a proxy for Robert.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'src/lazr/batchnavigator/tests/test_z3batching.py'
--- src/lazr/batchnavigator/tests/test_z3batching.py 2011-08-23 13:14:41 +0000
+++ src/lazr/batchnavigator/tests/test_z3batching.py 2011-08-24 16:24:23 +0000
@@ -16,7 +16,13 @@
16import operator16import operator
17import unittest17import unittest
1818
19from lazr.batchnavigator.z3batching.batch import _Batch, BATCH_SIZE19from zope.interface.verify import verifyObject, verifyClass
20
21from lazr.batchnavigator.z3batching.batch import (
22 _Batch,
23 BATCH_SIZE,
24 _PartialBatch,
25 )
20from lazr.batchnavigator.z3batching.interfaces import IBatch26from lazr.batchnavigator.z3batching.interfaces import IBatch
21from lazr.batchnavigator import ListRangeFactory27from lazr.batchnavigator import ListRangeFactory
2228
@@ -59,6 +65,52 @@
59 return super(ListWithIncorrectLength, self).__getslice__(start, end)65 return super(ListWithIncorrectLength, self).__getslice__(start, end)
6066
6167
68class RangeFactoryWithValueBasedEndpointMemos:
69 """A RangeFactory which uses data values from a batch as endpoint memos.
70 """
71 def __init__(self, results):
72 self.results = results
73
74 def getEndpointMemos(self, batch):
75 """See `IRangeFactory`."""
76 return batch[0], batch[-1]
77
78 getEndpointMemosFromSlice = getEndpointMemos
79
80 def getSlice(self, size, endpoint_memo='', forwards=True):
81 """See `IRangeFactory`."""
82 if size == 0:
83 return []
84 if endpoint_memo == '':
85 if forwards:
86 return self.results[:size]
87 else:
88 sliced = self.results[-size:]
89 sliced.reverse()
90 return sliced
91
92 if forwards:
93 index = 0
94 while (index < len(self.results) and
95 endpoint_memo >= self.results[index]):
96 index += 1
97 return self.results[index:index+size]
98 else:
99 index = len(self.results) - 1
100 while (index >= 0 and endpoint_memo < self.results[index]):
101 index -= 1
102 if index < 0:
103 return []
104 start_index = max(0, index - size)
105 sliced = self.results[start_index:index]
106 sliced.reverse()
107 return sliced
108
109 def getSliceByIndex(self, start, end):
110 """See `IRangeFactory`."""
111 return self.results[start:end]
112
113
62class TestingInfrastructureTest(unittest.TestCase):114class TestingInfrastructureTest(unittest.TestCase):
63 def test_ListWithExplosiveLen(self):115 def test_ListWithExplosiveLen(self):
64 # For some of the tests we want to be sure len() of the underlying116 # For some of the tests we want to be sure len() of the underlying
@@ -93,6 +145,46 @@
93 self.assertEqual([], weird_list[-2:-1])145 self.assertEqual([], weird_list[-2:-1])
94 self.assertEqual([3], weird_list[-3:-2])146 self.assertEqual([3], weird_list[-3:-2])
95147
148 def test_RangeFactoryWithValueBased_getEndpointMemos(self):
149 data = [str(value) for value in range(10)]
150 self.assertEqual(
151 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], data)
152 factory = RangeFactoryWithValueBasedEndpointMemos(data)
153 # The endpoint memo values are the values of the first and last
154 # elelemnt of a batch.
155 batch = data[:3]
156 self.assertEqual(('0', '2'), factory.getEndpointMemos(batch))
157 batch = data[4:8]
158 self.assertEqual(('4', '7'), factory.getEndpointMemos(batch))
159 # getSlice() called with an empty memo value returns the
160 # first elements if forwards is True...
161 self.assertEqual(
162 ['0', '1', '2'],
163 factory.getSlice(size=3, endpoint_memo='', forwards=True))
164 # ...and the last elements if forwards is False.
165 self.assertEqual(
166 ['9', '8', '7'],
167 factory.getSlice(size=3, endpoint_memo='', forwards=False))
168 # A forwards slice starts with a value larger than the
169 # given memo value.
170 self.assertEqual(
171 ['6', '7'],
172 factory.getSlice(size=2, endpoint_memo='5', forwards=True))
173 # A backwards slice starts with a value smaller than the
174 # given memo value.
175 self.assertEqual(
176 ['4', '3'],
177 factory.getSlice(size=2, endpoint_memo='5', forwards=False))
178 # A slice is smaller than requested if the end of the results
179 # is reached.
180 self.assertEqual(
181 ['8', '9'],
182 factory.getSlice(size=3, endpoint_memo='7', forwards=True))
183 self.assertEqual(
184 [], factory.getSlice(size=3, endpoint_memo='A', forwards=True))
185 self.assertEqual(
186 [], factory.getSlice(size=3, endpoint_memo=' ', forwards=False))
187
96188
97class RecordingFactory(ListRangeFactory):189class RecordingFactory(ListRangeFactory):
98 def __init__(self, results):190 def __init__(self, results):
@@ -514,3 +606,45 @@
514 self.assertEqual([1, 2, 3, 4], batch.sliced_list)606 self.assertEqual([1, 2, 3, 4], batch.sliced_list)
515 # And we get a previous batch.607 # And we get a previous batch.
516 self.assertEqual([0, 1, 2, 3], new_batch.prevBatch().sliced_list)608 self.assertEqual([0, 1, 2, 3], new_batch.prevBatch().sliced_list)
609
610 def test_last_backwards_batch_with_value_based_range_factory(self):
611 # Another slice is added in _Batch.sliced_list() when the
612 # regular slice of a backwards batch does not return the
613 # number of required elements. This works for range factories
614 # which are based on the values too.
615 data = [str(value) for value in range(10)]
616 range_factory = RangeFactoryWithValueBasedEndpointMemos(data)
617 batch = _Batch(
618 data, range_factory=range_factory, size=3, range_memo='1',
619 start=1, range_forwards=False)
620 self.assertEqual(['0', '1', '2', '3'], batch.sliced_list)
621
622 def test_PartialBatch(self):
623 # PartialBatch implements the full IBatch interface.
624 from zope.interface.common.mapping import IItemMapping
625 self.assertTrue(verifyClass(IBatch, _PartialBatch))
626 partial = _PartialBatch(sliced_list=range(3))
627 self.assertTrue(verifyObject(IBatch, partial))
628 # trueSize is the length of sliced_list
629 self.assertEqual(3, partial.trueSize)
630 # sliced_list is passed by the contrucotr parameter sliced_list
631 self.assertEqual([0, 1, 2], partial.sliced_list)
632 # __len__() returns the length of the sliced list
633 self.assertEqual(3, len(partial))
634 # __iter__() iterates over sliced_list
635 self.assertEqual([0, 1, 2], [element for element in partial])
636 # __contains__() works.
637 self.assertTrue(1 in partial)
638 self.assertFalse(3 in partial)
639 # prevBatch(), nextBatch() exost but are not implemented.
640 self.assertRaises(NotImplementedError, partial.prevBatch)
641 self.assertRaises(NotImplementedError, partial.nextBatch)
642 # first and last are implemented.
643 self.assertEqual(0, partial.first())
644 self.assertEqual(2, partial.last())
645 # total() return the length of sliced_list
646 self.assertEqual(3, partial.total())
647 # startNumber, endNumber() are implemented
648 self.assertEqual(1, partial.startNumber())
649 self.assertEqual(4, partial.endNumber())
650
517651
=== modified file 'src/lazr/batchnavigator/z3batching/batch.py'
--- src/lazr/batchnavigator/z3batching/batch.py 2011-08-22 17:05:29 +0000
+++ src/lazr/batchnavigator/z3batching/batch.py 2011-08-24 16:24:23 +0000
@@ -31,6 +31,85 @@
31# as BatchNavigator. In Launchpad, we override it via a config option.31# as BatchNavigator. In Launchpad, we override it via a config option.
32BATCH_SIZE = 5032BATCH_SIZE = 50
3333
34class _PartialBatch:
35 """A helper batch implementation.
36
37 _Batch.sliced_list() below needs to retrieve a second chunk of
38 data when a call of range_factory.getSlice() for a backwards batch
39 returns less elements than requested because the start of the result
40 set is reached.
41
42 In this case, another (forwards) batch must be retrieved. _Batch
43 must not assume that the memo value used to retrieve the first, too
44 small, result set can be used to retrieve the additional data. (The
45 class RangeFactoryWithValueBasedEndpointMemos in
46 tests/test_z3batching.py is an example where this assumption fails.)
47
48 Instead, _Batch.sliced_list() must retrieve the endpoint memos for
49 the partial data and use them to retrieve the missing part of the
50 result set. Since
51 - IRangeFactory.getEndpointMemos(batch) is free to use any
52 property of IBatch,
53 - a call like self.range_factory.getEndpointMemos(self) in
54 _Batch.sliced_list() leads to infinite recursions if the
55 range factory wants to access sliced_list,
56
57 we use this helper class for the getEndpointMemos() call in
58 _Batch.sliced_list().
59 """
60 implements(IBatch)
61
62 def __init__(self, sliced_list):
63 self.start = 0
64 self.trueSize = len(sliced_list)
65 self.sliced_list = sliced_list
66 self.size = len(sliced_list)
67
68 def __len__(self):
69 """See `IBatch`."""
70 return len(self.sliced_list)
71
72 def __iter__(self):
73 """See `IBatch`."""
74 return iter(self.sliced_list)
75
76 def __getitem__(self, index):
77 """See `IBatch`."""
78 return self.sliced_list[index]
79
80 def __contains__(self, key):
81 """See `IBatch`."""
82 return 0 <= key < len(self.sliced_list)
83
84 def nextBatch(self):
85 """See `IBatch`."""
86 raise NotImplementedError
87
88 def prevBatch(self):
89 """See `IBatch`."""
90 raise NotImplementedError
91
92 def first(self):
93 """See `IBatch`."""
94 return self.sliced_list[0]
95
96 def last(self):
97 """See `IBatch`."""
98 return self.sliced_list[-1]
99
100 def total(self):
101 """See `IBatch`."""
102 return len(self.sliced_list)
103
104 def startNumber(self):
105 """See `IBatch`."""
106 return 1
107
108 def endNumber(self):
109 """See `IBatch`."""
110 return len(self.sliced_list) + 1
111
112
34class _Batch(object):113class _Batch(object):
35 implements(IBatch)114 implements(IBatch)
36115
@@ -171,8 +250,11 @@
171 # range memo may have constrained us. So we need to get250 # range memo may have constrained us. So we need to get
172 # some more results:251 # some more results:
173 needed = size - len(sliced)252 needed = size - len(sliced)
253 partial = _PartialBatch(sliced)
254 extra_memo = (
255 self.range_factory.getEndpointMemos(partial))
174 extra = self.range_factory.getSlice(needed,256 extra = self.range_factory.getSlice(needed,
175 self.range_memo, forwards=True)257 extra_memo[1], forwards=True)
176 sliced = sliced + extra258 sliced = sliced + extra
177 self.is_first_batch = True259 self.is_first_batch = True
178 # This is the first time we get an inkling of (approximately)260 # This is the first time we get an inkling of (approximately)
179261
=== modified file 'src/lazr/batchnavigator/z3batching/interfaces.py'
--- src/lazr/batchnavigator/z3batching/interfaces.py 2009-03-03 18:02:58 +0000
+++ src/lazr/batchnavigator/z3batching/interfaces.py 2011-08-24 16:24:23 +0000
@@ -15,6 +15,7 @@
1515
16$Id$16$Id$
17"""17"""
18from zope.interface import Attribute
18from zope.interface.common.mapping import IItemMapping19from zope.interface.common.mapping import IItemMapping
1920
20class IBatch(IItemMapping):21class IBatch(IItemMapping):
@@ -34,29 +35,37 @@
34 """Creates an iterator for the contents of the batch (not the entire35 """Creates an iterator for the contents of the batch (not the entire
35 list)."""36 list)."""
3637
38 def __getitem__(index):
39 """Return the element at the given offset."""
40
37 def __contains__(key):41 def __contains__(key):
38 """Checks whether the key (in our case an index) exists."""42 """Checks whether the key (in our case an index) exists."""
3943
40 def nextBatch(self):44 def nextBatch():
41 """Return the next batch. If there is no next batch, return None."""45 """Return the next batch. If there is no next batch, return None."""
4246
43 def prevBatch(self):47 def prevBatch():
44 """Return the previous batch. If there is no previous batch, return48 """Return the previous batch. If there is no previous batch, return
45 None."""49 None."""
4650
47 def first(self):51 def first():
48 """Return the first element of the batch."""52 """Return the first element of the batch."""
4953
50 def last(self):54 def last():
51 """Return the last element of the batch."""55 """Return the last element of the batch."""
5256
53 def total(self):57 def total():
54 """Return the length of the list (not the batch)."""58 """Return the length of the list (not the batch)."""
5559
56 def startNumber(self):60 def startNumber():
57 """Give the start **number** of the batch, which is 1 more than the61 """Give the start **number** of the batch, which is 1 more than the
58 start index passed in."""62 start index passed in."""
5963
60 def endNumber(self):64 def endNumber():
61 """Give the end **number** of the batch, which is 1 more than the65 """Give the end **number** of the batch, which is 1 more than the
62 final index."""66 final index."""
67
68 sliced_list = Attribute(
69 "A sliced list as returned by IRangeFactory.sliced_list.")
70
71 trueSize = Attribute("The actual size of this batch.")

Subscribers

People subscribed via source and target branches