lazr.batchnavigator

Merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch into lp:lazr.batchnavigator

slicing-error-for-too-short-last-backwards-batch
Merge into trunk

Proposed by Abel Deuring on 2011-08-24

Status:	Merged
Approved by:	Graham Binns on 2011-08-25
Approved revision:	48
Merged at revision:	48
Proposed branch:	lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch
Merge into:	lp:lazr.batchnavigator
Diff against target:	324 lines (+234/-9), 3 files modified:
    src/lazr/batchnavigator/tests/test_z3batching.py (+135/-1)
    src/lazr/batchnavigator/z3batching/batch.py (+83/-1)
    src/lazr/batchnavigator/z3batching/interfaces.py (+16/-7)
To merge this branch:	bzr merge lp:~adeuring/lazr.batchnavigator/slicing-error-for-too-short-last-backwards-batch

Reviewer	Review Type	Date Requested	Status
Graham Binns (community)	code	2011-08-24	Approve on 2011-08-25
Review via email: mp+72745@code.launchpad.net

Description of the change

The problem:

  - When _Batch.sliced_list() retrieves a slice of a
    result set (line 235 of batch.py) to get the data
    of a backwards batch:

        sliced = self.range_factory.getSlice(size, self.range_memo,
            self.range_forwards)

  - and the result of this call has fewer elements than
    needed,

then the method calls

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)

in order to retrieve more elements of the result set. This
works fine if self.range_memo, i.e., the endpoint memo
value used to retrieve the first, too small, chunk of data,
can also be used as an endpoint value to retrieve the chunk
of data that follows the already existing chunk.

This works for the ListRangeFactory, which uses the regular
Python slicing protocol: the expression s[a:b] is the part
of the sequence s which starts at index a and ends before
index b.

In this case, self.range_memo is basically used like 'b' above
in

    sliced = self.range_factory.getSlice(size, self.range_memo,
        self.range_forwards)

and it is used as 'a' in

    self.range_factory.getSlice(needed, self.range_memo, forwards=True)
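To make the index-based case concrete, here is a minimal sketch of a range factory whose endpoint memos are plain list indices. The class name `IndexRangeFactory` is hypothetical and only loosely modelled on ListRangeFactory; the real class may differ, for example in how it encodes memos. It shows why one memo can serve as 'b' for the too-short backwards slice and as 'a' for the forwards top-up without losing or duplicating an element.

```python
class IndexRangeFactory:
    """Hypothetical range factory whose endpoint memos are list indices.

    Loosely modelled on ListRangeFactory; not its actual implementation.
    """

    def __init__(self, results):
        self.results = results

    def getSlice(self, size, endpoint_memo, forwards=True):
        if forwards:
            # The memo is used like 'a' in results[a:b].
            return self.results[endpoint_memo:endpoint_memo + size]
        # The memo is used like 'b' in results[a:b]; backwards slices
        # are returned in reverse order.
        sliced = self.results[max(0, endpoint_memo - size):endpoint_memo]
        sliced.reverse()
        return sliced


results = [1, 2, 3, 4, 5]
factory = IndexRangeFactory(results)
size = 4
# Backwards slice ending before index 2: only two elements exist there.
sliced = factory.getSlice(size, 2, forwards=False)
sliced.reverse()                     # back to forward order: [1, 2]
needed = size - len(sliced)          # 2
# Reusing the same memo as the start index fills the gap exactly.
extra = factory.getSlice(needed, 2, forwards=True)  # [3, 4]
combined = sliced + extra
print(combined)  # [1, 2, 3, 4]
```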

But Launchpad's class StormRangeFactory works slightly
differently: its method getEndpointMemos() returns the
values of the columns used for sorting. The two values
returned by this method are meant to be used in a query like
"return N rows from the result set where the sort columns
have values (smaller than|larger than) the endpoint memo
value".
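A value-based endpoint memo corresponds to what is often called keyset pagination. The sketch below illustrates the query shape with an in-memory SQLite table. The table `items`, the column `value`, and the helper `get_slice()` are made up for illustration; they are not the queries StormRangeFactory actually issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (value INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(v,) for v in range(1, 6)])


def get_slice(size, endpoint_memo, forwards=True):
    """Hypothetical helper: up to `size` rows on one side of the memo."""
    if forwards:
        # Rows whose sort column is larger than the memo, ascending.
        query = ("SELECT value FROM items WHERE value > ? "
                 "ORDER BY value ASC LIMIT ?")
    else:
        # Rows whose sort column is smaller than the memo, descending.
        query = ("SELECT value FROM items WHERE value < ? "
                 "ORDER BY value DESC LIMIT ?")
    return [row[0] for row in conn.execute(query, (endpoint_memo, size))]


print(get_slice(2, 3, forwards=True))   # [4, 5]
print(get_slice(2, 3, forwards=False))  # [2, 1]
```

Note that the row whose value equals the memo is excluded in both directions. This is exactly what makes reusing the memo for the forwards top-up unsafe.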

If we assume a result set like

[1, 2, 3, 4, 5]

and the backwards batch was retrieved for the memo value 3
(factory.getSlice(size, 3, forwards=False)), the result is:

[1, 2]

The second call (factory.getSlice(needed, 3, forwards=True))
would return

[4, 5]

so the element 3 is skipped: neither call returns it.
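The gap can be reproduced with a few lines of plain Python. `value_get_slice()` is a hypothetical stand-in for a value-based getSlice(). To keep the output readable it returns backwards slices already in forward order, whereas the real range factories return them reversed.

```python
def value_get_slice(results, size, endpoint_memo, forwards=True):
    """Hypothetical value-based getSlice() over a sorted list.

    Forwards: the first `size` values larger than the memo.
    Backwards: the last `size` values smaller than the memo, kept in
    forward order here (real range factories return them reversed).
    """
    if forwards:
        return [v for v in results if v > endpoint_memo][:size]
    smaller = [v for v in results if v < endpoint_memo]
    return smaller[max(0, len(smaller) - size):]


results = [1, 2, 3, 4, 5]
size = 4
backwards = value_get_slice(results, size, 3, forwards=False)  # [1, 2]
needed = size - len(backwards)                                  # 2
# Buggy top-up: reuse the original memo 3 for the forwards call.
extra = value_get_slice(results, needed, 3, forwards=True)      # [4, 5]
combined = backwards + extra
print(combined)  # [1, 2, 4, 5]: the element 3 is missing
```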

The core of the fix: sliced_list() must explicitly retrieve
the endpoint memo values for the already retrieved chunk of
data. That's just these lines of the diff:

+ partial = _PartialBatch(sliced)
+ extra_memo = (
+ self.range_factory.getEndpointMemos(partial))
extra = self.range_factory.getSlice(needed,
- self.range_memo, forwards=True)
+ extra_memo[1], forwards=True)
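Applied to the result set [1, 2, 3, 4, 5] from the example above, the fix can be sketched as follows. `value_get_slice()`, `get_endpoint_memos()` and `fill_backwards_batch()` are hypothetical helpers. `get_endpoint_memos()` mimics the test factory RangeFactoryWithValueBasedEndpointMemos by using the first and last value of a chunk as its memos. The empty-chunk branch is an extra safeguard of this sketch and does not appear in the diff.

```python
def value_get_slice(results, size, endpoint_memo, forwards=True):
    """Hypothetical value-based getSlice() over a sorted list.

    Backwards slices are returned in forward order for readability.
    """
    if forwards:
        return [v for v in results if v > endpoint_memo][:size]
    smaller = [v for v in results if v < endpoint_memo]
    return smaller[max(0, len(smaller) - size):]


def get_endpoint_memos(batch):
    """Hypothetical: use the first and last value of a chunk as memos."""
    return batch[0], batch[-1]


def fill_backwards_batch(results, size, range_memo):
    """Fetch a backwards batch and top it up if it is too short."""
    sliced = value_get_slice(results, size, range_memo, forwards=False)
    if len(sliced) < size:
        needed = size - len(sliced)
        if sliced:
            # The fix: continue after the last element actually fetched,
            # not after the original range_memo.
            extra_memo = get_endpoint_memos(sliced)
            extra = value_get_slice(
                results, needed, extra_memo[1], forwards=True)
        else:
            # Nothing precedes range_memo: start at the beginning.
            extra = results[:needed]
        sliced = sliced + extra
    return sliced


print(fill_backwards_batch([1, 2, 3, 4, 5], 4, 3))  # [1, 2, 3, 4]
```

With the strings '0' to '9', size 4 and memo '1', the same function returns ['0', '1', '2', '3']. This is the list that test_last_backwards_batch_with_value_based_range_factory() expects from _Batch.sliced_list.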

The problem with these innocent lines: a method
IRangeFactory.getEndpointMemos(batch) can use _any_
attribute or method of batch...

So we need to build yet another object which implements the
full IBatch interface (at least formally; the methods
prevBatch() and nextBatch() are irrelevant, hopefully also
in the future...) so that any range factory has access
to whatever batching-related attribute or method it wants...

This "partial batch" (is there a better name?) is implemented
by, well, class _PartialBatch.
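Why not simply pass the _Batch itself to getEndpointMemos()? The _PartialBatch docstring in the diff gives the reason: a factory that reads sliced_list would re-enter the property that is still being computed. The sketch below demonstrates this with hypothetical, simplified classes (`SlicedListFactory`, `SelfReferencingBatch`, `PartialBatch`). They are not the real _Batch and _PartialBatch.

```python
class SlicedListFactory:
    """Hypothetical range factory that derives memos from sliced_list."""

    def getEndpointMemos(self, batch):
        return batch.sliced_list[0], batch.sliced_list[-1]


class SelfReferencingBatch:
    """Hypothetical batch that passes itself to getEndpointMemos()."""

    def __init__(self, results, factory):
        self.results = results
        self.factory = factory

    @property
    def sliced_list(self):
        chunk = self.results[:2]  # pretend this chunk came back too short
        # The factory reads self.sliced_list, which calls this property
        # again before it has returned: unbounded recursion.
        self.factory.getEndpointMemos(self)
        return chunk


class PartialBatch:
    """Hypothetical minimal stand-in holding only already fetched data."""

    def __init__(self, sliced_list):
        self.sliced_list = sliced_list

    def __getitem__(self, index):
        return self.sliced_list[index]

    def __len__(self):
        return len(self.sliced_list)


factory = SlicedListFactory()
try:
    SelfReferencingBatch([1, 2, 3], factory).sliced_list
    recursed = False
except RecursionError:
    recursed = True
memos = factory.getEndpointMemos(PartialBatch([1, 2]))
print(recursed, memos)  # True (1, 2)
```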

I also added the method __getitem__() and the attributes
sliced_list and trueSize to IBatch: StormRangeFactory
explicitly uses sliced_list to retrieve results, and
ListRangeFactory wants access to trueSize.

I first reproduced and then fixed the problem of the "missing
element" in a backwards batch with the new test
test_last_backwards_batch_with_value_based_range_factory(),
which needs a special IRangeFactory
(RangeFactoryWithValueBasedEndpointMemos), which in turn needed
a bit of testing (test_RangeFactoryWithValueBased_getEndpointMemos()).
_PartialBatch itself is covered by test_PartialBatch().

That's roughly how a six-line diff exploded into 320 lines ;)

Revision history for this message

Robert Collins (lifeless) wrote on 2011-08-24:

turtles all the way down huh.

review: approve

Revision history for this message

Graham Binns (gmb) wrote on 2011-08-25:

Marking this approved as a proxy for Robert.

review: Approve (code)

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Abel Deuring

LAZR Developers

Leonard Richardson

 === modified file 'src/lazr/batchnavigator/tests/test_z3batching.py'
 --- src/lazr/batchnavigator/tests/test_z3batching.py	2011-08-23 13:14:41 +0000
 +++ src/lazr/batchnavigator/tests/test_z3batching.py	2011-08-24 16:24:23 +0000
@@ -16,7 +16,13 @@
  import operator
  import unittest
--from lazr.batchnavigator.z3batching.batch import _Batch, BATCH_SIZE
++from zope.interface.verify import verifyObject, verifyClass
++
++from lazr.batchnavigator.z3batching.batch import (
++    _Batch,
++    BATCH_SIZE,
++    _PartialBatch,
++    )
  from lazr.batchnavigator.z3batching.interfaces import IBatch
  from lazr.batchnavigator import ListRangeFactory
@@ -59,6 +65,52 @@
          return super(ListWithIncorrectLength, self).__getslice__(start, end)
++class RangeFactoryWithValueBasedEndpointMemos:
++    """A RangeFactory which uses data values from a batch as endpoint memos.
++    """
++    def __init__(self, results):
++        self.results = results
++
++    def getEndpointMemos(self, batch):
++        """See `IRangeFactory`."""
++        return batch[0], batch[-1]
++
++    getEndpointMemosFromSlice = getEndpointMemos
++
++    def getSlice(self, size, endpoint_memo='', forwards=True):
++        """See `IRangeFactory`."""
++        if size == 0:
++            return []
++        if endpoint_memo == '':
++            if forwards:
++                return self.results[:size]
++            else:
++                sliced = self.results[-size:]
++                sliced.reverse()
++                return sliced
++
++        if forwards:
++            index = 0
++            while (index < len(self.results) and
++                   endpoint_memo >= self.results[index]):
++                index += 1
++            return self.results[index:index+size]
++        else:
++            index = len(self.results) - 1
++            while (index >= 0 and endpoint_memo < self.results[index]):
++                index -= 1
++            if index < 0:
++                return []
++            start_index = max(0, index - size)
++            sliced = self.results[start_index:index]
++            sliced.reverse()
++            return sliced
++
++    def getSliceByIndex(self, start, end):
++        """See `IRangeFactory`."""
++        return self.results[start:end]
++
++
  class TestingInfrastructureTest(unittest.TestCase):
      def test_ListWithExplosiveLen(self):
          # For some of the tests we want to be sure len() of the underlying
@@ -93,6 +145,46 @@
          self.assertEqual([], weird_list[-2:-1])
          self.assertEqual([3], weird_list[-3:-2])
++    def test_RangeFactoryWithValueBased_getEndpointMemos(self):
++        data = [str(value) for value in range(10)]
++        self.assertEqual(
++            ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], data)
++        factory = RangeFactoryWithValueBasedEndpointMemos(data)
++        # The endpoint memo values are the values of the first and last
++        # element of a batch.
++        batch = data[:3]
++        self.assertEqual(('0', '2'), factory.getEndpointMemos(batch))
++        batch = data[4:8]
++        self.assertEqual(('4', '7'), factory.getEndpointMemos(batch))
++        # getSlice() called with an empty memo value returns the
++        # first elements if forwards is True...
++        self.assertEqual(
++            ['0', '1', '2'],
++            factory.getSlice(size=3, endpoint_memo='', forwards=True))
++        # ...and the last elements if forwards is False.
++        self.assertEqual(
++            ['9', '8', '7'],
++            factory.getSlice(size=3, endpoint_memo='', forwards=False))
++        # A forwards slice starts with a value larger than the
++        # given memo value.
++        self.assertEqual(
++            ['6', '7'],
++            factory.getSlice(size=2, endpoint_memo='5', forwards=True))
++        # A backwards slice starts with a value smaller than the
++        # given memo value.
++        self.assertEqual(
++            ['4', '3'],
++            factory.getSlice(size=2, endpoint_memo='5', forwards=False))
++        # A slice is smaller than requested if the end of the results
++        # is reached.
++        self.assertEqual(
++            ['8', '9'],
++            factory.getSlice(size=3, endpoint_memo='7', forwards=True))
++        self.assertEqual(
++            [], factory.getSlice(size=3, endpoint_memo='A', forwards=True))
++        self.assertEqual(
++            [], factory.getSlice(size=3, endpoint_memo=' ', forwards=False))
++
  class RecordingFactory(ListRangeFactory):
      def __init__(self, results):
@@ -514,3 +606,45 @@
          self.assertEqual([1, 2, 3, 4], batch.sliced_list)
          # And we get a previous batch.
          self.assertEqual([0, 1, 2, 3], new_batch.prevBatch().sliced_list)
++
++    def test_last_backwards_batch_with_value_based_range_factory(self):
++        # Another slice is added in _Batch.sliced_list() when the
++        # regular slice of a backwards batch does not return the
++        # number of required elements. This must also work for range
++        # factories which use data values as endpoint memos.
++        data = [str(value) for value in range(10)]
++        range_factory = RangeFactoryWithValueBasedEndpointMemos(data)
++        batch = _Batch(
++            data, range_factory=range_factory, size=3, range_memo='1',
++            start=1, range_forwards=False)
++        self.assertEqual(['0', '1', '2', '3'], batch.sliced_list)
++
++    def test_PartialBatch(self):
++        # PartialBatch implements the full IBatch interface.
++        from zope.interface.common.mapping import IItemMapping
++        self.assertTrue(verifyClass(IBatch, _PartialBatch))
++        partial = _PartialBatch(sliced_list=range(3))
++        self.assertTrue(verifyObject(IBatch, partial))
++        # trueSize is the length of sliced_list
++        self.assertEqual(3, partial.trueSize)
++        # sliced_list is passed as the constructor parameter sliced_list
++        self.assertEqual([0, 1, 2], partial.sliced_list)
++        # __len__() returns the length of the sliced list
++        self.assertEqual(3, len(partial))
++        # __iter__() iterates over sliced_list
++        self.assertEqual([0, 1, 2], [element for element in partial])
++        # __contains__() works.
++        self.assertTrue(1 in partial)
++        self.assertFalse(3 in partial)
++        # prevBatch() and nextBatch() exist but are not implemented.
++        self.assertRaises(NotImplementedError, partial.prevBatch)
++        self.assertRaises(NotImplementedError, partial.nextBatch)
++        # first and last are implemented.
++        self.assertEqual(0, partial.first())
++        self.assertEqual(2, partial.last())
++        # total() returns the length of sliced_list
++        self.assertEqual(3, partial.total())
++        # startNumber, endNumber() are implemented
++        self.assertEqual(1, partial.startNumber())
++        self.assertEqual(4, partial.endNumber())
++
 === modified file 'src/lazr/batchnavigator/z3batching/batch.py'
 --- src/lazr/batchnavigator/z3batching/batch.py	2011-08-22 17:05:29 +0000
 +++ src/lazr/batchnavigator/z3batching/batch.py	2011-08-24 16:24:23 +0000
@@ -31,6 +31,85 @@
  # as BatchNavigator. In Launchpad, we override it via a config option.
  BATCH_SIZE = 50
++class _PartialBatch:
++    """A helper batch implementation.
++
++    _Batch.sliced_list() below needs to retrieve a second chunk of
++    data when a call of range_factory.getSlice() for a backwards batch
++    returns fewer elements than requested because the start of the result
++    set is reached.
++
++    In this case, another (forwards) batch must be retrieved. _Batch
++    must not assume that the memo value used to retrieve the first, too
++    small, result set can be used to retrieve the additional data. (The
++    class RangeFactoryWithValueBasedEndpointMemos in
++    tests/test_z3batching.py is an example where this assumption fails.)
++
++    Instead, _Batch.sliced_list() must retrieve the endpoint memos for
++    the partial data and use them to retrieve the missing part of the
++    result set. Since
++      - IRangeFactory.getEndpointMemos(batch) is free to use any
++        property of IBatch,
++      - a call like self.range_factory.getEndpointMemos(self) in
++        _Batch.sliced_list() leads to infinite recursion if the
++        range factory wants to access sliced_list,
++
++    we use this helper class for the getEndpointMemos() call in
++    _Batch.sliced_list().
++    """
++    implements(IBatch)
++
++    def __init__(self, sliced_list):
++        self.start = 0
++        self.trueSize = len(sliced_list)
++        self.sliced_list = sliced_list
++        self.size = len(sliced_list)
++
++    def __len__(self):
++        """See `IBatch`."""
++        return len(self.sliced_list)
++
++    def __iter__(self):
++        """See `IBatch`."""
++        return iter(self.sliced_list)
++
++    def __getitem__(self, index):
++        """See `IBatch`."""
++        return self.sliced_list[index]
++
++    def __contains__(self, key):
++        """See `IBatch`."""
++        return 0 <= key < len(self.sliced_list)
++
++    def nextBatch(self):
++        """See `IBatch`."""
++        raise NotImplementedError
++
++    def prevBatch(self):
++        """See `IBatch`."""
++        raise NotImplementedError
++
++    def first(self):
++        """See `IBatch`."""
++        return self.sliced_list[0]
++
++    def last(self):
++        """See `IBatch`."""
++        return self.sliced_list[-1]
++
++    def total(self):
++        """See `IBatch`."""
++        return len(self.sliced_list)
++
++    def startNumber(self):
++        """See `IBatch`."""
++        return 1
++
++    def endNumber(self):
++        """See `IBatch`."""
++        return len(self.sliced_list) + 1
++
++
  class _Batch(object):
      implements(IBatch)
@@ -171,8 +250,11 @@
                          # range memo may have constrained us. So we need to get
                          # some more results:
                          needed = size - len(sliced)
++                        partial = _PartialBatch(sliced)
++                        extra_memo = (
++                            self.range_factory.getEndpointMemos(partial))
                          extra = self.range_factory.getSlice(needed,
--                            self.range_memo, forwards=True)
++                            extra_memo[1], forwards=True)
                          sliced = sliced + extra
                          self.is_first_batch = True
          # This is the first time we get an inkling of (approximately)
 === modified file 'src/lazr/batchnavigator/z3batching/interfaces.py'
 --- src/lazr/batchnavigator/z3batching/interfaces.py	2009-03-03 18:02:58 +0000
 +++ src/lazr/batchnavigator/z3batching/interfaces.py	2011-08-24 16:24:23 +0000
@@ -15,6 +15,7 @@
  $Id$
  """
++from zope.interface import Attribute
  from zope.interface.common.mapping import IItemMapping
  class IBatch(IItemMapping):
@@ -34,29 +35,37 @@
          """Creates an iterator for the contents of the batch (not the entire
          list)."""
++    def __getitem__(index):
++        """Return the element at the given offset."""
++
      def __contains__(key):
          """Checks whether the key (in our case an index) exists."""
--    def nextBatch(self):
++    def nextBatch():
          """Return the next batch. If there is no next batch, return None."""
--    def prevBatch(self):
++    def prevBatch():
          """Return the previous batch. If there is no previous batch, return
          None."""
--    def first(self):
++    def first():
          """Return the first element of the batch."""
--    def last(self):
++    def last():
          """Return the last element of the batch."""
--    def total(self):
++    def total():
          """Return the length of the list (not the batch)."""
--    def startNumber(self):
++    def startNumber():
          """Give the start **number** of the batch, which is 1 more than the
          start index passed in."""
--    def endNumber(self):
++    def endNumber():
          """Give the end **number** of the batch, which is 1 more than the
          final index."""
++
++    sliced_list = Attribute(
++        "A sliced list as returned by IRangeFactory.getSlice().")
++
++    trueSize = Attribute("The actual size of this batch.")
