Merge lp:~jameinel/bzr/1.17-chk-multilevel into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Superseded
Proposed branch: lp:~jameinel/bzr/1.17-chk-multilevel
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1246 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-chk-multilevel
Reviewer: bzr-core (review pending)
Review via email: mp+8100@code.launchpad.net

This proposal has been superseded by a proposal from 2009-07-02.

John A Meinel (jameinel) wrote:

This is a slightly better fix for bug #390563. The main changes are:

1) Rewrite iter_interesting_nodes using a class, rather than maintaining state between functions
2) Split up the processing of root nodes into:
  a) Read all uninteresting roots, and find uninteresting chks and items
  b) Filter out any interesting root keys that were found in (a)
  c) Read all remaining root pages and yield them
  d) Find interesting references that are not in (a) (and interesting items)
     Note the search key prefixes for these references and items, and queue
     them up to be returned later.
  e) Filter out uninteresting chk pages based on the search key prefixes in (d)

(d) and (e) are the main differences.
For (d), Robert found an edge case where we could return a non-interesting item (which would in turn become a text key, and then cause a failure during fetch) when one side had an immediate leaf node and the other side had a deep tree. The only real change is to defer returning interesting *items* until all uninteresting pages have been read.
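
In rough pseudo-code, the deferral looks like this (a simplified sketch of what the new class does, not the literal code from the diff below):

# Simplified sketch: interesting *items* found while reading roots go into
# _interesting_item_queue; nothing is yielded from it until every
# uninteresting page has been walked.
def process(self):
    for record in self._read_all_roots():      # root records can be yielded now
        yield record, []
    while self._uninteresting_queue:            # finish the uninteresting side first
        self._process_next_uninteresting()
    for record, items in self._flush_interesting_queue():
        yield record, items                     # items are now safe to emit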

(e) Previously we used the 'intersection of interesting references'; now we filter simply based on prefix. This is likely to be a small win when there are lots of uninteresting pages (lots of merge parents that are not being transmitted).
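
Concretely, the prefix filter (paraphrasing _enqueue_uninteresting from the diff below) amounts to:

# An uninteresting ref only needs to be walked if some prefix of its search
# key is still 'interesting' (i.e. may be reachable from the new side);
# otherwise it can be dropped without ever reading it.
for prefix, ref in uninteresting_chks_to_enqueue:
    if any(prefix[:i] in interesting_prefixes
           for i in xrange(len(prefix), 0, -1)):
        self._uninteresting_queue.append(ref)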

3) After processing the root nodes:
  a) Process the uninteresting queue until completion
  b) Process the interesting queue, filtering out all uninteresting items.
     (some minor things here, like doing batch '.difference()' checks rather than calling .difference() on every record processed)
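
For example, _flush_interesting_queue filters a whole batch of refs at once:

# small.difference(large) scales with len(small), while
# small.difference_update(large) scales with len(large), so filter the
# batch of new refs in one go rather than checking each record as it arrives.
next_refs = next_refs.difference(all_uninteresting_chks)
next_refs = next_refs.difference(processed_interesting_refs)
processed_interesting_refs.update(next_refs)
refs = next_refs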

I looked at trying to use a heapq to unify things more with 'iter_changes', but the constraints are *very* different.

1) CHKMap.iter_changes() is symmetric, in that we need to return a record for (source, target, extra); iter_interesting_nodes *only* needs to return information for the 'target' side.
2) CHKMap.iter_changes() only ever involves 2 records, a single source and a single target. iter_interesting_nodes is looking at a many-to-many comparison.
3) CHKMap.iter_changes() only cares about the actual keys, and not about sending the raw byte records to be transmitted for fetch.
4) iter_interesting_nodes is much more concerned about batching, because of its many-to-many nature and the fact that it is used for fetch. We really don't want to be doing a round trip per page.
5) The only gain from a heapq would be if we could figure out an algorithm for filtering out nodes that we thought we would want to read, but actually don't need to. However,
  a) I do this at the root level, because it is fairly straightforward. So any further improvement only affects trees deeper than 2 levels (1 root, 1+ internal, 1 leaf), which only happens for trees with > 50k items.
  b) Reading extra chk pages isn't "incorrect", just "inefficient" (the stacking code ensures all referenced chk pages are present in the stacked-on branch, just as we ensure compression texts are).
  c) The best I could come up with would be to split the reads up by prefix. So you would page in the 'a' references, then the 'b' references, etc. However, that forces us to split the requests up even further, rather than issuing larger batches of requests. (more...

Robert Collins (lifeless) wrote:

Shallow immediate review:

The docstring for the class, and perhaps the class name, could be better.

In terms of sets it's:
interesting - uninteresting

So perhaps
CHKMapDifference

as a class name. And for the docstring, something like

"""Iterate the stored pages and key,value pairs for (new - old).

This class provides a generator over the stored CHK pages and the
key,value pairs that are in any of the new maps and not in any of the
old maps.
"""

-Rob

Preview Diff

1=== modified file 'bzrlib/chk_map.py'
2--- bzrlib/chk_map.py 2009-06-26 09:24:34 +0000
3+++ bzrlib/chk_map.py 2009-07-01 22:35:14 +0000
4@@ -1398,106 +1398,202 @@
5 return node
6
7
8-def _find_children_info(store, interesting_keys, uninteresting_keys, pb):
9- """Read the associated records, and determine what is interesting."""
10- uninteresting_keys = set(uninteresting_keys)
11- chks_to_read = uninteresting_keys.union(interesting_keys)
12- next_uninteresting = set()
13- next_interesting = set()
14- next_interesting_intersection = None
15- uninteresting_items = set()
16- interesting_items = set()
17- interesting_to_yield = []
18- for record in store.get_record_stream(chks_to_read, 'unordered', True):
19- # records_read.add(record.key())
20- if pb is not None:
21- pb.tick()
22- bytes = record.get_bytes_as('fulltext')
23- # We don't care about search_key_func for this code, because we only
24- # care about external references.
25- node = _deserialise(bytes, record.key, search_key_func=None)
26- if record.key in uninteresting_keys:
27- if type(node) is InternalNode:
28- next_uninteresting.update(node.refs())
29- else:
30- # We know we are at a LeafNode, so we can pass None for the
31- # store
32- uninteresting_items.update(node.iteritems(None))
33- else:
34- interesting_to_yield.append(record.key)
35- if type(node) is InternalNode:
36- if next_interesting_intersection is None:
37- next_interesting_intersection = set(node.refs())
38- else:
39- next_interesting_intersection = \
40- next_interesting_intersection.intersection(node.refs())
41- next_interesting.update(node.refs())
42- else:
43- interesting_items.update(node.iteritems(None))
44- return (next_uninteresting, uninteresting_items,
45- next_interesting, interesting_to_yield, interesting_items,
46- next_interesting_intersection)
47-
48-
49-def _find_all_uninteresting(store, interesting_root_keys,
50- uninteresting_root_keys, pb):
51- """Determine the full set of uninteresting keys."""
52- # What about duplicates between interesting_root_keys and
53- # uninteresting_root_keys?
54- if not uninteresting_root_keys:
55- # Shortcut case. We know there is nothing uninteresting to filter out
56- # So we just let the rest of the algorithm do the work
57- # We know there is nothing uninteresting, and we didn't have to read
58- # any interesting records yet.
59- return (set(), set(), set(interesting_root_keys), [], set())
60- all_uninteresting_chks = set(uninteresting_root_keys)
61- all_uninteresting_items = set()
62-
63- # First step, find the direct children of both the interesting and
64- # uninteresting set
65- (uninteresting_keys, uninteresting_items,
66- interesting_keys, interesting_to_yield,
67- interesting_items, interesting_intersection,
68- ) = _find_children_info(store, interesting_root_keys,
69- uninteresting_root_keys,
70- pb=pb)
71- all_uninteresting_chks.update(uninteresting_keys)
72- all_uninteresting_items.update(uninteresting_items)
73- del uninteresting_items
74- # Do not examine in detail pages common to all interesting trees.
75- # Pages that are common to all interesting trees will have their
76- # older versions found via the uninteresting tree traversal. Some pages
77- # found via the interesting trees traversal will be uninteresting for
78- # other of the interesting trees, which is why we require the pages to be
79- # common for us to trim them.
80- if interesting_intersection is not None:
81- uninteresting_keys.difference_update(interesting_intersection)
82-
83- # Second, find the full set of uninteresting bits reachable by the
84- # uninteresting roots
85- chks_to_read = uninteresting_keys
86- while chks_to_read:
87- next_chks = set()
88- for record in store.get_record_stream(chks_to_read, 'unordered', False):
89- # TODO: Handle 'absent'
90- if pb is not None:
91- pb.tick()
92+class InterestingNodeIterator(object):
93+ """Determine the nodes and items that are 'interesting'
94+
95+ This is defined as being present in the interesting set, but not being
96+ present in the uninteresting set.
97+ """
98+
99+ def __init__(self, store, interesting_root_keys, uninteresting_root_keys,
100+ search_key_func, pb=None):
101+ self._store = store
102+ self._interesting_root_keys = interesting_root_keys
103+ self._uninteresting_root_keys = uninteresting_root_keys
104+ self._pb = pb
105+ # All uninteresting chks that we have seen. By the time they are added
106+ # here, they should be either fully ignored, or queued up for
107+ # processing
108+ self._all_uninteresting_chks = set(self._uninteresting_root_keys)
109+ # All items that we have seen from the uninteresting_root_keys
110+ self._all_uninteresting_items = set()
111+ # These are interesting items which were either read, or already in the
112+ # interesting queue (so we don't need to walk them again)
113+ self._processed_interesting_refs = set()
114+ self._search_key_func = search_key_func
115+
116+ # The uninteresting and interesting nodes to be searched
117+ self._uninteresting_queue = []
118+ self._interesting_queue = []
119+ # Holds the (key, value) items found when processing the root nodes,
120+ # waiting for the uninteresting nodes to be walked
121+ self._interesting_item_queue = []
122+ self._state = None
123+
124+ def _read_nodes_from_store(self, keys):
125+ # We chose not to use _page_cache, because we think in terms of records
126+ # to be yielded. Also, we expect to touch each page only 1 time during
127+ # this code. (We may want to evaluate saving the raw bytes into the
128+ # page cache, which would allow a working tree update after the fetch
129+ # to not have to read the bytes again.)
130+ stream = self._store.get_record_stream(keys, 'unordered', True)
131+ for record in stream:
132+ if self._pb is not None:
133+ self._pb.tick()
134+ if record.storage_kind == 'absent':
135+ raise errors.NoSuchRevision(self._store, record.key)
136 bytes = record.get_bytes_as('fulltext')
137- # We don't care about search_key_func for this code, because we
138- # only care about external references.
139- node = _deserialise(bytes, record.key, search_key_func=None)
140+ node = _deserialise(bytes, record.key,
141+ search_key_func=self._search_key_func)
142 if type(node) is InternalNode:
143- # uninteresting_prefix_chks.update(node._items.iteritems())
144- chks = node._items.values()
145- # TODO: We remove the entries that are already in
146- # uninteresting_chks ?
147- next_chks.update(chks)
148- all_uninteresting_chks.update(chks)
149+ # Note we don't have to do node.refs() because we know that
150+ # there are no children that have been pushed into this node
151+ prefix_refs = node._items.items()
152+ items = []
153 else:
154- all_uninteresting_items.update(node._items.iteritems())
155- chks_to_read = next_chks
156- return (all_uninteresting_chks, all_uninteresting_items,
157- interesting_keys, interesting_to_yield, interesting_items)
158+ prefix_refs = []
159+ items = node._items.items()
160+ yield record, node, prefix_refs, items
161+
162+ def _read_uninteresting_roots(self):
163+ uninteresting_chks_to_enqueue = []
164+ all_uninteresting_chks = self._all_uninteresting_chks
165+ for record, node, prefix_refs, items in \
166+ self._read_nodes_from_store(self._uninteresting_root_keys):
167+ # Uninteresting node
168+ prefix_refs = [p_r for p_r in prefix_refs
169+ if p_r[1] not in all_uninteresting_chks]
170+ new_refs = [p_r[1] for p_r in prefix_refs]
171+ all_uninteresting_chks.update(new_refs)
172+ self._all_uninteresting_items.update(items)
173+ # Queue up the uninteresting references
174+ # Don't actually put them in the 'to-read' queue until we have
175+ # finished checking the interesting references
176+ uninteresting_chks_to_enqueue.extend(prefix_refs)
177+ return uninteresting_chks_to_enqueue
178+
179+ def _enqueue_uninteresting(self, interesting_prefixes,
180+ uninteresting_chks_to_enqueue):
181+ # At this point, we have read all the uninteresting and interesting
182+ # items, so we can queue up the uninteresting stuff, knowing that we've
183+ # handled the interesting ones
184+ for prefix, ref in uninteresting_chks_to_enqueue:
185+ not_interesting = True
186+ for i in xrange(len(prefix), 0, -1):
187+ if prefix[:i] in interesting_prefixes:
188+ not_interesting = False
189+ break
190+ if not_interesting:
191+ # This prefix is not part of the remaining 'interesting set'
192+ continue
193+ self._uninteresting_queue.append(ref)
194+
195+ def _read_all_roots(self):
196+ """Read the root pages.
197+
198+ This is structured as a generator, so that the root records can be
199+ yielded up to whoever needs them without any buffering.
200+ """
201+ # This is the bootstrap phase
202+ if not self._uninteresting_root_keys:
203+ # With no uninteresting_root_keys we can just shortcut and be ready
204+ # for _flush_interesting_queue
205+ self._interesting_queue = list(self._interesting_root_keys)
206+ return
207+ uninteresting_chks_to_enqueue = self._read_uninteresting_roots()
208+ # filter out any root keys that are already known to be uninteresting
209+ interesting_keys = set(self._interesting_root_keys).difference(
210+ self._all_uninteresting_chks)
211+ # These are prefixes that are present in interesting_keys that we are
212+ # thinking to yield
213+ interesting_prefixes = set()
214+ # We are about to yield all of these, so we don't want them getting
215+ # added a second time
216+ processed_interesting_refs = self._processed_interesting_refs
217+ processed_interesting_refs.update(interesting_keys)
218+ for record, node, prefix_refs, items in \
219+ self._read_nodes_from_store(interesting_keys):
220+ # At this level, we now know all the uninteresting references
221+ # So we filter and queue up whatever is remaining
222+ prefix_refs = [p_r for p_r in prefix_refs
223+ if p_r[1] not in self._all_uninteresting_chks
224+ and p_r[1] not in processed_interesting_refs]
225+ refs = [p_r[1] for p_r in prefix_refs]
226+ interesting_prefixes.update([p_r[0] for p_r in prefix_refs])
227+ self._interesting_queue.extend(refs)
228+ # TODO: We can potentially get multiple items here, however the
229+ # current design allows for this, as callers will do the work
230+ # to make the results unique. We might profile whether we
231+ # gain anything by ensuring unique return values for items
232+ interesting_items = [item for item in items
233+ if item not in self._all_uninteresting_items]
234+ self._interesting_item_queue.extend(interesting_items)
235+ interesting_prefixes.update([self._search_key_func(item[0])
236+ for item in interesting_items])
237+ processed_interesting_refs.update(refs)
238+ yield record
239+ # For interesting_prefixes we have the full length prefixes queued up.
240+ # However, we also need possible prefixes. (If we have a known ref to
241+ # 'ab', then we also need to include 'a'.) So expand the
242+ # interesting_prefixes to include all shorter prefixes
243+ for prefix in list(interesting_prefixes):
244+ interesting_prefixes.update([prefix[:i]
245+ for i in xrange(1, len(prefix))])
246+ self._enqueue_uninteresting(interesting_prefixes,
247+ uninteresting_chks_to_enqueue)
248+
249+ def _flush_interesting_queue(self):
250+ # No need to maintain the heap invariant anymore, just pull things out
251+ # and process them
252+ refs = set(self._interesting_queue)
253+ self._interesting_queue = []
254+ # First pass, flush all interesting items and convert to using direct refs
255+ all_uninteresting_chks = self._all_uninteresting_chks
256+ processed_interesting_refs = self._processed_interesting_refs
257+ all_uninteresting_items = self._all_uninteresting_items
258+ interesting_items = [item for item in self._interesting_item_queue
259+ if item not in all_uninteresting_items]
260+ self._interesting_item_queue = []
261+ if interesting_items:
262+ yield None, interesting_items
263+ refs = refs.difference(all_uninteresting_chks)
264+ while refs:
265+ next_refs = set()
266+ next_refs_update = next_refs.update
267+ # Inlining _read_nodes_from_store improves 'bzr branch bzr.dev'
268+ # from 1m54s to 1m51s. Consider it.
269+ for record, _, p_refs, items in self._read_nodes_from_store(refs):
270+ items = [item for item in items
271+ if item not in all_uninteresting_items]
272+ yield record, items
273+ next_refs_update([p_r[1] for p_r in p_refs])
274+ next_refs = next_refs.difference(all_uninteresting_chks)
275+ next_refs = next_refs.difference(processed_interesting_refs)
276+ processed_interesting_refs.update(next_refs)
277+ refs = next_refs
278+
279+ def _process_next_uninteresting(self):
280+ # Since we don't filter uninteresting any further than during
281+ # _read_all_roots, process the whole queue in a single pass.
282+ refs = self._uninteresting_queue
283+ self._uninteresting_queue = []
284+ all_uninteresting_chks = self._all_uninteresting_chks
285+ for record, _, prefix_refs, items in self._read_nodes_from_store(refs):
286+ self._all_uninteresting_items.update(items)
287+ refs = [r for _,r in prefix_refs if r not in all_uninteresting_chks]
288+ self._uninteresting_queue.extend(refs)
289+ all_uninteresting_chks.update(refs)
290+
291+ def _process_queues(self):
292+ while self._uninteresting_queue:
293+ self._process_next_uninteresting()
294+ return self._flush_interesting_queue()
295+
296+ def process(self):
297+ for record in self._read_all_roots():
298+ yield record, []
299+ for record, items in self._process_queues():
300+ yield record, items
301
302
303 def iter_interesting_nodes(store, interesting_root_keys,
304@@ -1514,72 +1610,11 @@
305 :return: Yield
306 (interesting record, {interesting key:values})
307 """
308- # TODO: consider that it may be more memory efficient to use the 20-byte
309- # sha1 string, rather than tuples of hexidecimal sha1 strings.
310- # TODO: Try to factor out a lot of the get_record_stream() calls into a
311- # helper function similar to _read_bytes. This function should be
312- # able to use nodes from the _page_cache as well as actually
313- # requesting bytes from the store.
314-
315- (all_uninteresting_chks, all_uninteresting_items, interesting_keys,
316- interesting_to_yield, interesting_items) = _find_all_uninteresting(store,
317- interesting_root_keys, uninteresting_root_keys, pb)
318-
319- # Now that we know everything uninteresting, we can yield information from
320- # our first request
321- interesting_items.difference_update(all_uninteresting_items)
322- interesting_to_yield = set(interesting_to_yield) - all_uninteresting_chks
323- if interesting_items:
324- yield None, interesting_items
325- if interesting_to_yield:
326- # We request these records again, rather than buffering the root
327- # records, most likely they are still in the _group_cache anyway.
328- for record in store.get_record_stream(interesting_to_yield,
329- 'unordered', False):
330- yield record, []
331- all_uninteresting_chks.update(interesting_to_yield)
332- interesting_keys.difference_update(all_uninteresting_chks)
333-
334- chks_to_read = interesting_keys
335- counter = 0
336- while chks_to_read:
337- next_chks = set()
338- for record in store.get_record_stream(chks_to_read, 'unordered', False):
339- counter += 1
340- if pb is not None:
341- pb.update('find chk pages', counter)
342- # TODO: Handle 'absent'?
343- bytes = record.get_bytes_as('fulltext')
344- # We don't care about search_key_func for this code, because we
345- # only care about external references.
346- node = _deserialise(bytes, record.key, search_key_func=None)
347- if type(node) is InternalNode:
348- # all_uninteresting_chks grows large, as it lists all nodes we
349- # don't want to process (including already seen interesting
350- # nodes).
351- # small.difference_update(large) scales O(large), but
352- # small.difference(large) scales O(small).
353- # Also, we know we just _deserialised this node, so we can
354- # access the dict directly.
355- chks = set(node._items.itervalues()).difference(
356- all_uninteresting_chks)
357- # Is set() and .difference_update better than:
358- # chks = [chk for chk in node.refs()
359- # if chk not in all_uninteresting_chks]
360- next_chks.update(chks)
361- # These are now uninteresting everywhere else
362- all_uninteresting_chks.update(chks)
363- interesting_items = []
364- else:
365- interesting_items = [item for item in node._items.iteritems()
366- if item not in all_uninteresting_items]
367- # TODO: Do we need to filter out items that we have already
368- # seen on other pages? We don't really want to buffer the
369- # whole thing, but it does mean that callers need to
370- # understand they may get duplicate values.
371- # all_uninteresting_items.update(interesting_items)
372- yield record, interesting_items
373- chks_to_read = next_chks
374+ iterator = InterestingNodeIterator(store, interesting_root_keys,
375+ uninteresting_root_keys,
376+ search_key_func=store._search_key_func,
377+ pb=pb)
378+ return iterator.process()
379
380
381 try:
382
383=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
384--- bzrlib/repofmt/groupcompress_repo.py 2009-06-29 14:51:13 +0000
385+++ bzrlib/repofmt/groupcompress_repo.py 2009-07-01 22:35:14 +0000
386@@ -652,6 +652,9 @@
387 parents=False, is_locked=self.is_locked,
388 inconsistency_fatal=False),
389 access=self._pack_collection.chk_index.data_access)
390+ search_key_name = self._format._serializer.search_key_name
391+ search_key_func = chk_map.search_key_registry.get(search_key_name)
392+ self.chk_bytes._search_key_func = search_key_func
393 # True when the repository object is 'write locked' (as opposed to the
394 # physical lock only taken out around changes to the pack-names list.)
395 # Another way to represent this would be a decorator around the control
396
397=== modified file 'bzrlib/tests/test_chk_map.py'
398--- bzrlib/tests/test_chk_map.py 2009-06-26 09:24:34 +0000
399+++ bzrlib/tests/test_chk_map.py 2009-07-01 22:35:14 +0000
400@@ -20,6 +20,7 @@
401
402 from bzrlib import (
403 chk_map,
404+ groupcompress,
405 osutils,
406 tests,
407 )
408@@ -59,17 +60,14 @@
409 self.assertCommonPrefix('', '', '')
410
411
412-class TestCaseWithStore(tests.TestCaseWithTransport):
413+class TestCaseWithStore(tests.TestCaseWithMemoryTransport):
414
415 def get_chk_bytes(self):
416 # The easiest way to get a CHK store is a development6 repository and
417 # then work with the chk_bytes attribute directly.
418- repo = self.make_repository(".", format="development6-rich-root")
419- repo.lock_write()
420- self.addCleanup(repo.unlock)
421- repo.start_write_group()
422- self.addCleanup(repo.abort_write_group)
423- return repo.chk_bytes
424+ factory = groupcompress.make_pack_factory(False, False, 1)
425+ self.chk_bytes = factory(self.get_transport())
426+ return self.chk_bytes
427
428 def _get_map(self, a_dict, maximum_size=0, chk_bytes=None, key_width=1,
429 search_key_func=None):
430@@ -97,6 +95,246 @@
431 return dict(node.iteritems(*args))
432
433
434+class TestCaseWithExampleMaps(TestCaseWithStore):
435+
436+ def get_chk_bytes(self):
437+ if getattr(self, '_chk_bytes', None) is None:
438+ self._chk_bytes = super(TestCaseWithExampleMaps,
439+ self).get_chk_bytes()
440+ return self._chk_bytes
441+
442+ def get_map(self, a_dict, maximum_size=100, search_key_func=None):
443+ c_map = self._get_map(a_dict, maximum_size=maximum_size,
444+ chk_bytes=self.get_chk_bytes(),
445+ search_key_func=search_key_func)
446+ return c_map
447+
448+ def make_root_only_map(self, search_key_func=None):
449+ return self.get_map({
450+ ('aaa',): 'initial aaa content',
451+ ('abb',): 'initial abb content',
452+ }, search_key_func=search_key_func)
453+
454+ def make_root_only_aaa_ddd_map(self, search_key_func=None):
455+ return self.get_map({
456+ ('aaa',): 'initial aaa content',
457+ ('ddd',): 'initial ddd content',
458+ }, search_key_func=search_key_func)
459+
460+ def make_one_deep_map(self, search_key_func=None):
461+ # Same as root_only_map, except it forces an InternalNode at the root
462+ return self.get_map({
463+ ('aaa',): 'initial aaa content',
464+ ('abb',): 'initial abb content',
465+ ('ccc',): 'initial ccc content',
466+ ('ddd',): 'initial ddd content',
467+ }, search_key_func=search_key_func)
468+
469+ def make_two_deep_map(self, search_key_func=None):
470+ # Carefully chosen so that it creates a 2-deep map for both
471+ # _search_key_plain and for _search_key_16
472+ # Also so that things line up with make_one_deep_two_prefix_map
473+ return self.get_map({
474+ ('aaa',): 'initial aaa content',
475+ ('abb',): 'initial abb content',
476+ ('acc',): 'initial acc content',
477+ ('ace',): 'initial ace content',
478+ ('add',): 'initial add content',
479+ ('adh',): 'initial adh content',
480+ ('adl',): 'initial adl content',
481+ ('ccc',): 'initial ccc content',
482+ ('ddd',): 'initial ddd content',
483+ }, search_key_func=search_key_func)
484+
485+ def make_one_deep_two_prefix_map(self, search_key_func=None):
486+ """Create a map with one internal node, but references are extra long.
487+
488+ Otherwise has similar content to make_two_deep_map.
489+ """
490+ return self.get_map({
491+ ('aaa',): 'initial aaa content',
492+ ('add',): 'initial add content',
493+ ('adh',): 'initial adh content',
494+ ('adl',): 'initial adl content',
495+ }, search_key_func=search_key_func)
496+
497+ def make_one_deep_one_prefix_map(self, search_key_func=None):
498+ """Create a map with one internal node, but references are extra long.
499+
500+ Similar to make_one_deep_two_prefix_map, except the split is at the
501+ first char, rather than the second.
502+ """
503+ return self.get_map({
504+ ('add',): 'initial add content',
505+ ('adh',): 'initial adh content',
506+ ('adl',): 'initial adl content',
507+ ('bbb',): 'initial bbb content',
508+ }, search_key_func=search_key_func)
509+
510+
511+class TestTestCaseWithExampleMaps(TestCaseWithExampleMaps):
512+ """Actual tests for the provided examples."""
513+
514+ def test_root_only_map_plain(self):
515+ c_map = self.make_root_only_map()
516+ self.assertEqualDiff(
517+ "'' LeafNode\n"
518+ " ('aaa',) 'initial aaa content'\n"
519+ " ('abb',) 'initial abb content'\n",
520+ c_map._dump_tree())
521+
522+ def test_root_only_map_16(self):
523+ c_map = self.make_root_only_map(search_key_func=chk_map._search_key_16)
524+ self.assertEqualDiff(
525+ "'' LeafNode\n"
526+ " ('aaa',) 'initial aaa content'\n"
527+ " ('abb',) 'initial abb content'\n",
528+ c_map._dump_tree())
529+
530+ def test_one_deep_map_plain(self):
531+ c_map = self.make_one_deep_map()
532+ self.assertEqualDiff(
533+ "'' InternalNode\n"
534+ " 'a' LeafNode\n"
535+ " ('aaa',) 'initial aaa content'\n"
536+ " ('abb',) 'initial abb content'\n"
537+ " 'c' LeafNode\n"
538+ " ('ccc',) 'initial ccc content'\n"
539+ " 'd' LeafNode\n"
540+ " ('ddd',) 'initial ddd content'\n",
541+ c_map._dump_tree())
542+
543+ def test_one_deep_map_16(self):
544+ c_map = self.make_one_deep_map(search_key_func=chk_map._search_key_16)
545+ self.assertEqualDiff(
546+ "'' InternalNode\n"
547+ " '2' LeafNode\n"
548+ " ('ccc',) 'initial ccc content'\n"
549+ " '4' LeafNode\n"
550+ " ('abb',) 'initial abb content'\n"
551+ " 'F' LeafNode\n"
552+ " ('aaa',) 'initial aaa content'\n"
553+ " ('ddd',) 'initial ddd content'\n",
554+ c_map._dump_tree())
555+
556+ def test_root_only_aaa_ddd_plain(self):
557+ c_map = self.make_root_only_aaa_ddd_map()
558+ self.assertEqualDiff(
559+ "'' LeafNode\n"
560+ " ('aaa',) 'initial aaa content'\n"
561+ " ('ddd',) 'initial ddd content'\n",
562+ c_map._dump_tree())
563+
564+ def test_one_deep_map_16(self):
565+ c_map = self.make_root_only_aaa_ddd_map(
566+ search_key_func=chk_map._search_key_16)
567+ # We use 'aaa' and 'ddd' because they happen to map to 'F' when using
568+ # _search_key_16
569+ self.assertEqualDiff(
570+ "'' LeafNode\n"
571+ " ('aaa',) 'initial aaa content'\n"
572+ " ('ddd',) 'initial ddd content'\n",
573+ c_map._dump_tree())
574+
575+ def test_two_deep_map_plain(self):
576+ c_map = self.make_two_deep_map()
577+ self.assertEqualDiff(
578+ "'' InternalNode\n"
579+ " 'a' InternalNode\n"
580+ " 'aa' LeafNode\n"
581+ " ('aaa',) 'initial aaa content'\n"
582+ " 'ab' LeafNode\n"
583+ " ('abb',) 'initial abb content'\n"
584+ " 'ac' LeafNode\n"
585+ " ('acc',) 'initial acc content'\n"
586+ " ('ace',) 'initial ace content'\n"
587+ " 'ad' LeafNode\n"
588+ " ('add',) 'initial add content'\n"
589+ " ('adh',) 'initial adh content'\n"
590+ " ('adl',) 'initial adl content'\n"
591+ " 'c' LeafNode\n"
592+ " ('ccc',) 'initial ccc content'\n"
593+ " 'd' LeafNode\n"
594+ " ('ddd',) 'initial ddd content'\n",
595+ c_map._dump_tree())
596+
597+ def test_two_deep_map_16(self):
598+ c_map = self.make_two_deep_map(search_key_func=chk_map._search_key_16)
599+ self.assertEqualDiff(
600+ "'' InternalNode\n"
601+ " '2' LeafNode\n"
602+ " ('acc',) 'initial acc content'\n"
603+ " ('ccc',) 'initial ccc content'\n"
604+ " '4' LeafNode\n"
605+ " ('abb',) 'initial abb content'\n"
606+ " 'C' LeafNode\n"
607+ " ('ace',) 'initial ace content'\n"
608+ " 'F' InternalNode\n"
609+ " 'F0' LeafNode\n"
610+ " ('aaa',) 'initial aaa content'\n"
611+ " 'F3' LeafNode\n"
612+ " ('adl',) 'initial adl content'\n"
613+ " 'F4' LeafNode\n"
614+ " ('adh',) 'initial adh content'\n"
615+ " 'FB' LeafNode\n"
616+ " ('ddd',) 'initial ddd content'\n"
617+ " 'FD' LeafNode\n"
618+ " ('add',) 'initial add content'\n",
619+ c_map._dump_tree())
620+
621+ def test_one_deep_two_prefix_map_plain(self):
622+ c_map = self.make_one_deep_two_prefix_map()
623+ self.assertEqualDiff(
624+ "'' InternalNode\n"
625+ " 'aa' LeafNode\n"
626+ " ('aaa',) 'initial aaa content'\n"
627+ " 'ad' LeafNode\n"
628+ " ('add',) 'initial add content'\n"
629+ " ('adh',) 'initial adh content'\n"
630+ " ('adl',) 'initial adl content'\n",
631+ c_map._dump_tree())
632+
633+ def test_one_deep_two_prefix_map_16(self):
634+ c_map = self.make_one_deep_two_prefix_map(
635+ search_key_func=chk_map._search_key_16)
636+ self.assertEqualDiff(
637+ "'' InternalNode\n"
638+ " 'F0' LeafNode\n"
639+ " ('aaa',) 'initial aaa content'\n"
640+ " 'F3' LeafNode\n"
641+ " ('adl',) 'initial adl content'\n"
642+ " 'F4' LeafNode\n"
643+ " ('adh',) 'initial adh content'\n"
644+ " 'FD' LeafNode\n"
645+ " ('add',) 'initial add content'\n",
646+ c_map._dump_tree())
647+
648+ def test_one_deep_one_prefix_map_plain(self):
649+ c_map = self.make_one_deep_one_prefix_map()
650+ self.assertEqualDiff(
651+ "'' InternalNode\n"
652+ " 'a' LeafNode\n"
653+ " ('add',) 'initial add content'\n"
654+ " ('adh',) 'initial adh content'\n"
655+ " ('adl',) 'initial adl content'\n"
656+ " 'b' LeafNode\n"
657+ " ('bbb',) 'initial bbb content'\n",
658+ c_map._dump_tree())
659+
660+ def test_one_deep_one_prefix_map_16(self):
661+ c_map = self.make_one_deep_one_prefix_map(
662+ search_key_func=chk_map._search_key_16)
663+ self.assertEqualDiff(
664+ "'' InternalNode\n"
665+ " '4' LeafNode\n"
666+ " ('bbb',) 'initial bbb content'\n"
667+ " 'F' LeafNode\n"
668+ " ('add',) 'initial add content'\n"
669+ " ('adh',) 'initial adh content'\n"
670+ " ('adl',) 'initial adl content'\n",
671+ c_map._dump_tree())
672+
673+
674 class TestMap(TestCaseWithStore):
675
676 def assertHasABMap(self, chk_bytes):
677@@ -1886,62 +2124,408 @@
678 # 1-4K get0
679
680
681-class TestIterInterestingNodes(TestCaseWithStore):
682-
683- def get_chk_bytes(self):
684- if getattr(self, '_chk_bytes', None) is None:
685- self._chk_bytes = super(TestIterInterestingNodes,
686- self).get_chk_bytes()
687- return self._chk_bytes
688+class TestInterestingNodeIterator(TestCaseWithExampleMaps):
689+
690+ def get_iterator(self, interesting_roots, uninteresting_roots,
691+ search_key_func=None):
692+ if search_key_func is None:
693+ search_key_func = chk_map._search_key_plain
694+ return chk_map.InterestingNodeIterator(self.get_chk_bytes(),
695+ interesting_roots, uninteresting_roots, search_key_func)
696+
697+ def test__init__(self):
698+ c_map = self.make_root_only_map()
699+ key1 = c_map.key()
700+ c_map.map(('aaa',), 'new aaa content')
701+ key2 = c_map._save()
702+ iterator = self.get_iterator([key2], [key1])
703+ self.assertEqual(set([key1]), iterator._all_uninteresting_chks)
704+ self.assertEqual([], iterator._uninteresting_queue)
705+ self.assertEqual([], iterator._interesting_queue)
706+
707+ def help__read_all_roots(self, search_key_func):
708+ c_map = self.make_root_only_map(search_key_func=search_key_func)
709+ key1 = c_map.key()
710+ c_map.map(('aaa',), 'new aaa content')
711+ key2 = c_map._save()
712+ iterator = self.get_iterator([key2], [key1], search_key_func)
713+ root_results = [record.key for record in iterator._read_all_roots()]
714+ self.assertEqual([key2], root_results)
715+ # We should have queued up only items that aren't in the uninteresting
716+ # set
717+ self.assertEqual([(('aaa',), 'new aaa content')],
718+ iterator._interesting_item_queue)
719+ self.assertEqual([], iterator._interesting_queue)
720+ # And there are no uninteresting references, so that queue should be
721+ # empty
722+ self.assertEqual([], iterator._uninteresting_queue)
723+
724+ def test__read_all_roots_plain(self):
725+ self.help__read_all_roots(search_key_func=chk_map._search_key_plain)
726+
727+ def test__read_all_roots_16(self):
728+ self.help__read_all_roots(search_key_func=chk_map._search_key_16)
729+
730+ def test__read_all_roots_skips_known_uninteresting(self):
731+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
732+ key1 = c_map.key()
733+ c_map2 = self.make_root_only_map(chk_map._search_key_plain)
734+ key2 = c_map2.key()
735+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
736+ root_results = [record.key for record in iterator._read_all_roots()]
737+ # We should have no results. key2 is completely contained within key1,
738+ # and we should have seen that in the first pass
739+ self.assertEqual([], root_results)
740+
741+ def test__read_all_roots_prepares_queues(self):
742+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
743+ key1 = c_map.key()
744+ c_map._dump_tree() # load everything
745+ key1_a = c_map._root_node._items['a'].key()
746+ c_map.map(('abb',), 'new abb content')
747+ key2 = c_map._save()
748+ key2_a = c_map._root_node._items['a'].key()
749+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
750+ root_results = [record.key for record in iterator._read_all_roots()]
751+ self.assertEqual([key2], root_results)
752+ # At this point, we should have queued up only the 'a' Leaf on both
753+ # sides, both 'c' and 'd' are known to not have changed on both sides
754+ self.assertEqual([key2_a], iterator._interesting_queue)
755+ self.assertEqual([], iterator._interesting_item_queue)
756+ self.assertEqual([key1_a], iterator._uninteresting_queue)
757+
758+ def test__read_all_roots_multi_interesting_prepares_queues(self):
759+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
760+ key1 = c_map.key()
761+ c_map._dump_tree() # load everything
762+ key1_a = c_map._root_node._items['a'].key()
763+ key1_c = c_map._root_node._items['c'].key()
764+ c_map.map(('abb',), 'new abb content')
765+ key2 = c_map._save()
766+ key2_a = c_map._root_node._items['a'].key()
767+ key2_c = c_map._root_node._items['c'].key()
768+ c_map = chk_map.CHKMap(self.get_chk_bytes(), key1,
769+ chk_map._search_key_plain)
770+ c_map.map(('ccc',), 'new ccc content')
771+ key3 = c_map._save()
772+ key3_a = c_map._root_node._items['a'].key()
773+ key3_c = c_map._root_node._items['c'].key()
774+ iterator = self.get_iterator([key2, key3], [key1],
775+ chk_map._search_key_plain)
776+ root_results = [record.key for record in iterator._read_all_roots()]
777+ self.assertEqual(sorted([key2, key3]), sorted(root_results))
778+ # We should have queued up key2_a, and key3_c, but not key2_c or key3_c
779+ self.assertEqual([key2_a, key3_c], iterator._interesting_queue)
780+ self.assertEqual([], iterator._interesting_item_queue)
781+ # And we should have queued up both a and c for the uninteresting set
782+ self.assertEqual([key1_a, key1_c], iterator._uninteresting_queue)
783+
784+ def test__read_all_roots_different_depths(self):
785+ c_map = self.make_two_deep_map(chk_map._search_key_plain)
786+ c_map._dump_tree() # load everything
787+ key1 = c_map.key()
788+ key1_a = c_map._root_node._items['a'].key()
789+ key1_c = c_map._root_node._items['c'].key()
790+ key1_d = c_map._root_node._items['d'].key()
791+
792+ c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
793+ c_map2._dump_tree()
794+ key2 = c_map2.key()
795+ key2_aa = c_map2._root_node._items['aa'].key()
796+ key2_ad = c_map2._root_node._items['ad'].key()
797+
798+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
799+ root_results = [record.key for record in iterator._read_all_roots()]
800+ self.assertEqual([key2], root_results)
801+ # Only the 'a' subset should be queued up, since 'c' and 'd' cannot be
802+ # present
803+ self.assertEqual([key1_a], iterator._uninteresting_queue)
804+ self.assertEqual([key2_aa, key2_ad], iterator._interesting_queue)
805+ self.assertEqual([], iterator._interesting_item_queue)
806+
807+ iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain)
808+ root_results = [record.key for record in iterator._read_all_roots()]
809+ self.assertEqual([key1], root_results)
810+
811+ self.assertEqual([key2_aa, key2_ad], iterator._uninteresting_queue)
812+ self.assertEqual([key1_a, key1_c, key1_d], iterator._interesting_queue)
813+ self.assertEqual([], iterator._interesting_item_queue)
814+
815+ def test__read_all_roots_different_depths_16(self):
816+ c_map = self.make_two_deep_map(chk_map._search_key_16)
817+ c_map._dump_tree() # load everything
818+ key1 = c_map.key()
819+ key1_2 = c_map._root_node._items['2'].key()
820+ key1_4 = c_map._root_node._items['4'].key()
821+ key1_C = c_map._root_node._items['C'].key()
822+ key1_F = c_map._root_node._items['F'].key()
823+
824+ c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_16)
825+ c_map2._dump_tree()
826+ key2 = c_map2.key()
827+ key2_F0 = c_map2._root_node._items['F0'].key()
828+ key2_F3 = c_map2._root_node._items['F3'].key()
829+ key2_F4 = c_map2._root_node._items['F4'].key()
830+ key2_FD = c_map2._root_node._items['FD'].key()
831+
832+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_16)
833+ root_results = [record.key for record in iterator._read_all_roots()]
834+ self.assertEqual([key2], root_results)
835+ # Only the subset of keys that may be present should be queued up.
836+ self.assertEqual([key1_F], iterator._uninteresting_queue)
837+ self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
838+ sorted(iterator._interesting_queue))
839+ self.assertEqual([], iterator._interesting_item_queue)
840+
841+ iterator = self.get_iterator([key1], [key2], chk_map._search_key_16)
842+ root_results = [record.key for record in iterator._read_all_roots()]
843+ self.assertEqual([key1], root_results)
844+
845+ self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
846+ sorted(iterator._uninteresting_queue))
847+ self.assertEqual(sorted([key1_2, key1_4, key1_C, key1_F]),
848+ sorted(iterator._interesting_queue))
849+ self.assertEqual([], iterator._interesting_item_queue)
850+
851+ def test__read_all_roots_mixed_depth(self):
852+ c_map = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
853+ c_map._dump_tree() # load everything
854+ key1 = c_map.key()
855+ key1_aa = c_map._root_node._items['aa'].key()
856+ key1_ad = c_map._root_node._items['ad'].key()
857+
858+ c_map2 = self.make_one_deep_one_prefix_map(chk_map._search_key_plain)
859+ c_map2._dump_tree()
860+ key2 = c_map2.key()
861+ key2_a = c_map2._root_node._items['a'].key()
862+ key2_b = c_map2._root_node._items['b'].key()
863+
864+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
865+ root_results = [record.key for record in iterator._read_all_roots()]
866+ self.assertEqual([key2], root_results)
867+ # 'ad' matches exactly 'a' on the other side, so it should be removed,
868+ # and neither side should have it queued for walking
869+ self.assertEqual([], iterator._uninteresting_queue)
870+ self.assertEqual([key2_b], iterator._interesting_queue)
871+ self.assertEqual([], iterator._interesting_item_queue)
872+
873+ iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain)
874+ root_results = [record.key for record in iterator._read_all_roots()]
875+ self.assertEqual([key1], root_results)
876+ # Note: This is technically not the 'true minimal' set that we could
877+ # use The reason is that 'a' was matched exactly to 'ad' (by sha
878+ # sum). However, the code gets complicated in the case of more
879+ # than one interesting key, so for now, we live with this
880+ # Consider revising, though benchmarking showing it to be a
881+ # real-world issue should be done
882+ self.assertEqual([key2_a], iterator._uninteresting_queue)
883+ # self.assertEqual([], iterator._uninteresting_queue)
884+ self.assertEqual([key1_aa], iterator._interesting_queue)
885+ self.assertEqual([], iterator._interesting_item_queue)
886+
887+ def test__read_all_roots_yields_extra_deep_records(self):
888+ # This is slightly controversial, as we will yield a chk page that we
889+ # might later on find out could be filtered out. (If a root node is
890+ # referenced deeper in the uninteresting set.)
891+ # However, even with stacking, we always have all chk pages that we
892+ # will need. So as long as we filter out the referenced keys, we'll
893+ # never run into problems.
894+ # This allows us to yield a root node record immediately, without any
895+ # buffering.
896+ c_map = self.make_two_deep_map(chk_map._search_key_plain)
897+ c_map._dump_tree() # load all keys
898+ key1 = c_map.key()
899+ key1_a = c_map._root_node._items['a'].key()
900+ c_map2 = self.get_map({
901+ ('acc',): 'initial acc content',
902+ ('ace',): 'initial ace content',
903+ }, maximum_size=100)
904+ self.assertEqualDiff(
905+ "'' LeafNode\n"
906+ " ('acc',) 'initial acc content'\n"
907+ " ('ace',) 'initial ace content'\n",
908+ c_map2._dump_tree())
909+ key2 = c_map2.key()
910+ iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
911+ root_results = [record.key for record in iterator._read_all_roots()]
912+ self.assertEqual([key2], root_results)
913+ # However, even though we have yielded the root node to be fetched,
914+ # we should have enqued all of the chk pages to be walked, so that we
915+ # can find the keys if they are present
916+ self.assertEqual([key1_a], iterator._uninteresting_queue)
917+ self.assertEqual([(('acc',), 'initial acc content'),
918+ (('ace',), 'initial ace content'),
919+ ], iterator._interesting_item_queue)
920+
921+ def test__read_all_roots_multiple_targets(self):
922+ c_map = self.make_root_only_map()
923+ key1 = c_map.key()
924+ c_map = self.make_one_deep_map()
925+ key2 = c_map.key()
926+ c_map._dump_tree()
927+ key2_c = c_map._root_node._items['c'].key()
928+ key2_d = c_map._root_node._items['d'].key()
929+ c_map.map(('ccc',), 'new ccc value')
930+ key3 = c_map._save()
931+ key3_c = c_map._root_node._items['c'].key()
932+ iterator = self.get_iterator([key2, key3], [key1],
933+ chk_map._search_key_plain)
934+ root_results = [record.key for record in iterator._read_all_roots()]
935+ self.assertEqual(sorted([key2, key3]), sorted(root_results))
936+ self.assertEqual([], iterator._uninteresting_queue)
937+ # the key 'd' is interesting from key2 and key3, but should only be
938+ # entered into the queue 1 time
939+ self.assertEqual(sorted([key2_c, key3_c, key2_d]),
940+ sorted(iterator._interesting_queue))
941+ self.assertEqual([], iterator._interesting_item_queue)
942+
943+ def test__read_all_roots_no_uninteresting(self):
944+ # This is the 'initial branch' case. With nothing in the uninteresting
945+ # set, we can just queue up all root nodes into interesting queue, and
946+ # then have them fast-path flushed via _flush_interesting_queue
947+ c_map = self.make_two_deep_map()
948+ key1 = c_map.key()
949+ iterator = self.get_iterator([key1], [], chk_map._search_key_plain)
950+ root_results = [record.key for record in iterator._read_all_roots()]
951+ self.assertEqual([], root_results)
952+ self.assertEqual([], iterator._uninteresting_queue)
953+ self.assertEqual([key1], iterator._interesting_queue)
954+ self.assertEqual([], iterator._interesting_item_queue)
955+
956+ c_map2 = self.make_one_deep_map()
957+ key2 = c_map2.key()
958+ iterator = self.get_iterator([key1, key2], [],
959+ chk_map._search_key_plain)
960+ root_results = [record.key for record in iterator._read_all_roots()]
961+ self.assertEqual([], root_results)
962+ self.assertEqual([], iterator._uninteresting_queue)
963+ self.assertEqual(sorted([key1, key2]),
964+ sorted(iterator._interesting_queue))
965+ self.assertEqual([], iterator._interesting_item_queue)
966+
967+ def test__read_all_roots_no_uninteresting_16(self):
968+ c_map = self.make_two_deep_map(chk_map._search_key_16)
969+ key1 = c_map.key()
970+ iterator = self.get_iterator([key1], [], chk_map._search_key_16)
971+ root_results = [record.key for record in iterator._read_all_roots()]
972+ self.assertEqual([], root_results)
973+ self.assertEqual([], iterator._uninteresting_queue)
974+ self.assertEqual([key1], iterator._interesting_queue)
975+ self.assertEqual([], iterator._interesting_item_queue)
976+
977+ c_map2 = self.make_one_deep_map(chk_map._search_key_16)
978+ key2 = c_map2.key()
979+ iterator = self.get_iterator([key1, key2], [],
980+ chk_map._search_key_16)
981+ root_results = [record.key for record in iterator._read_all_roots()]
982+ self.assertEqual([], root_results)
983+ self.assertEqual([], iterator._uninteresting_queue)
984+ self.assertEqual(sorted([key1, key2]),
985+ sorted(iterator._interesting_queue))
986+ self.assertEqual([], iterator._interesting_item_queue)
987+
988+ def test__read_all_roots_multiple_uninteresting(self):
989+ c_map = self.make_two_deep_map()
990+ key1 = c_map.key()
991+ c_map._dump_tree() # load everything
992+ key1_a = c_map._root_node._items['a'].key()
993+ c_map.map(('ccc',), 'new ccc value')
994+ key2 = c_map._save()
995+ key2_a = c_map._root_node._items['a'].key()
996+ c_map.map(('add',), 'new add value')
997+ key3 = c_map._save()
998+ key3_a = c_map._root_node._items['a'].key()
999+ iterator = self.get_iterator([key3], [key1, key2],
1000+ chk_map._search_key_plain)
1001+ root_results = [record.key for record in iterator._read_all_roots()]
1002+ self.assertEqual([key3], root_results)
1003+ # the 'a' keys should not be queued up 2 times, since they are
1004+ # identical
1005+ self.assertEqual([key1_a], iterator._uninteresting_queue)
1006+ self.assertEqual([key3_a], iterator._interesting_queue)
1007+ self.assertEqual([], iterator._interesting_item_queue)
1008+
1009+ def test__process_next_uninteresting_batched_no_dupes(self):
1010+ c_map = self.make_two_deep_map()
1011+ key1 = c_map.key()
1012+ c_map._dump_tree() # load everything
1013+ key1_a = c_map._root_node._items['a'].key()
1014+ key1_aa = c_map._root_node._items['a']._items['aa'].key()
1015+ key1_ab = c_map._root_node._items['a']._items['ab'].key()
1016+ key1_ac = c_map._root_node._items['a']._items['ac'].key()
1017+ key1_ad = c_map._root_node._items['a']._items['ad'].key()
1018+ c_map.map(('aaa',), 'new aaa value')
1019+ key2 = c_map._save()
1020+ key2_a = c_map._root_node._items['a'].key()
1021+ key2_aa = c_map._root_node._items['a']._items['aa'].key()
1022+ c_map.map(('acc',), 'new acc content')
1023+ key3 = c_map._save()
1024+ key3_a = c_map._root_node._items['a'].key()
1025+ key3_ac = c_map._root_node._items['a']._items['ac'].key()
1026+ iterator = self.get_iterator([key3], [key1, key2],
1027+ chk_map._search_key_plain)
1028+ root_results = [record.key for record in iterator._read_all_roots()]
1029+ self.assertEqual([key3], root_results)
1030+ self.assertEqual(sorted([key1_a, key2_a]),
1031+ sorted(iterator._uninteresting_queue))
1032+ self.assertEqual([key3_a], iterator._interesting_queue)
1033+ self.assertEqual([], iterator._interesting_item_queue)
1034+ iterator._process_next_uninteresting()
1035+ # All of the uninteresting records should be brought in and queued up,
1036+ # but we should not have any duplicates
1037+ self.assertEqual(sorted([key1_aa, key1_ab, key1_ac, key1_ad, key2_aa]),
1038+ sorted(iterator._uninteresting_queue))
1039+
1040+
1041+class TestIterInterestingNodes(TestCaseWithExampleMaps):
1042
1043 def get_map_key(self, a_dict, maximum_size=10):
1044- c_map = self._get_map(a_dict, maximum_size=maximum_size,
1045- chk_bytes=self.get_chk_bytes())
1046+ c_map = self.get_map(a_dict, maximum_size=maximum_size)
1047 return c_map.key()
1048
1049- def assertIterInteresting(self, expected, interesting_keys,
1050+ def assertIterInteresting(self, records, items, interesting_keys,
1051 uninteresting_keys):
1052 """Check the result of iter_interesting_nodes.
1053
1054- :param expected: A list of (record_keys, interesting_chk_pages,
1055- interesting key value pairs)
1056+ Note that we no longer care how many steps are taken, etc, just that
1057+ the right contents are returned.
1058+
1059+ :param records: A list of record keys that should be yielded
1060+ :param items: A list of items (key,value) that should be yielded.
1061 """
1062 store = self.get_chk_bytes()
1063+ store._search_key_func = chk_map._search_key_plain
1064 iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys,
1065 uninteresting_keys)
1066- nodes = list(iter_nodes)
1067- for count, (exp, act) in enumerate(izip(expected, nodes)):
1068- exp_record, exp_items = exp
1069- record, items = act
1070- exp_tuple = (exp_record, sorted(exp_items))
1071- if record is None:
1072- act_tuple = (None, sorted(items))
1073- else:
1074- act_tuple = (record.key, sorted(items))
1075- self.assertEqual(exp_tuple, act_tuple,
1076- 'entry %d did not match expected' % count)
1077- self.assertEqual(len(expected), len(nodes))
1078+ record_keys = []
1079+ all_items = []
1080+ for record, new_items in iter_nodes:
1081+ if record is not None:
1082+ record_keys.append(record.key)
1083+ if new_items:
1084+ all_items.extend(new_items)
1085+ self.assertEqual(sorted(records), sorted(record_keys))
1086+ self.assertEqual(sorted(items), sorted(all_items))
1087
1088 def test_empty_to_one_keys(self):
1089 target = self.get_map_key({('a',): 'content'})
1090- self.assertIterInteresting(
1091- [(target, [(('a',), 'content')]),
1092- ], [target], [])
1093+ self.assertIterInteresting([target],
1094+ [(('a',), 'content')],
1095+ [target], [])
1096
1097 def test_none_to_one_key(self):
1098 basis = self.get_map_key({})
1099 target = self.get_map_key({('a',): 'content'})
1100- self.assertIterInteresting(
1101- [(None, [(('a',), 'content')]),
1102- (target, []),
1103- ], [target], [basis])
1104+ self.assertIterInteresting([target],
1105+ [(('a',), 'content')],
1106+ [target], [basis])
1107
1108 def test_one_to_none_key(self):
1109 basis = self.get_map_key({('a',): 'content'})
1110 target = self.get_map_key({})
1111- self.assertIterInteresting(
1112- [(target, [])],
1113- [target], [basis])
1114+ self.assertIterInteresting([target],
1115+ [],
1116+ [target], [basis])
1117
1118 def test_common_pages(self):
1119 basis = self.get_map_key({('a',): 'content',
1120@@ -1964,10 +2548,9 @@
1121 target_map._dump_tree())
1122 b_key = target_map._root_node._items['b'].key()
1123 # This should return the root node, and the node for the 'b' key
1124- self.assertIterInteresting(
1125- [(target, []),
1126- (b_key, [(('b',), 'other content')])],
1127- [target], [basis])
1128+ self.assertIterInteresting([target, b_key],
1129+ [(('b',), 'other content')],
1130+ [target], [basis])
1131
1132 def test_common_sub_page(self):
1133 basis = self.get_map_key({('aaa',): 'common',
1134@@ -1991,12 +2574,11 @@
1135 # The key for the internal aa node
1136 a_key = target_map._root_node._items['a'].key()
1137 # The key for the leaf aab node
1138+ # aaa_key = target_map._root_node._items['a']._items['aaa'].key()
1139 aab_key = target_map._root_node._items['a']._items['aab'].key()
1140- self.assertIterInteresting(
1141- [(target, []),
1142- (a_key, []),
1143- (aab_key, [(('aab',), 'new')])],
1144- [target], [basis])
1145+ self.assertIterInteresting([target, a_key, aab_key],
1146+ [(('aab',), 'new')],
1147+ [target], [basis])
1148
1149 def test_common_leaf(self):
1150 basis = self.get_map_key({})
1151@@ -2040,29 +2622,22 @@
1152 a_key = target3_map._root_node._items['a'].key()
1153 aac_key = target3_map._root_node._items['a']._items['aac'].key()
1154 self.assertIterInteresting(
1155- [(None, [(('aaa',), 'common')]),
1156- (target1, []),
1157- (target2, []),
1158- (target3, []),
1159- (b_key, [(('bbb',), 'new')]),
1160- (a_key, []),
1161- (aac_key, [(('aac',), 'other')]),
1162- ], [target1, target2, target3], [basis])
1163-
1164- self.assertIterInteresting(
1165- [(target2, []),
1166- (target3, []),
1167- (b_key, [(('bbb',), 'new')]),
1168- (a_key, []),
1169- (aac_key, [(('aac',), 'other')]),
1170- ], [target2, target3], [target1])
1171-
1172- # This may be a case that we relax. A root node is a deep child of the
1173- # excluded set. The cost is buffering root nodes until we have
1174- # determined all possible exclusions. (Because a prefix of '', cannot
1175- # be excluded.)
1176- self.assertIterInteresting(
1177- [], [target1], [target3])
1178+ [target1, target2, target3, a_key, aac_key, b_key],
1179+ [(('aaa',), 'common'), (('bbb',), 'new'), (('aac',), 'other')],
1180+ [target1, target2, target3], [basis])
1181+
1182+ self.assertIterInteresting(
1183+ [target2, target3, a_key, aac_key, b_key],
1184+ [(('bbb',), 'new'), (('aac',), 'other')],
1185+ [target2, target3], [target1])
1186+
1187+ # Technically, target1 could be filtered out, but since it is a root
1188+ # node, we yield it immediately, rather than waiting to find out much
1189+ # later on.
1190+ self.assertIterInteresting(
1191+ [target1],
1192+ [],
1193+ [target1], [target3])
1194
1195 def test_multiple_maps(self):
1196 basis1 = self.get_map_key({('aaa',): 'common',
1197@@ -2111,13 +2686,9 @@
1198 # The key for the leaf bba node
1199 bba_key = target2_map._root_node._items['b']._items['bba'].key()
1200 self.assertIterInteresting(
1201- [(target1, []),
1202- (target2, []),
1203- (a_key, []),
1204- (b_key, []),
1205- (aac_key, [(('aac',), 'target1')]),
1206- (bba_key, [(('bba',), 'target2')]),
1207- ], [target1, target2], [basis1, basis2])
1208+ [target1, target2, a_key, aac_key, b_key, bba_key],
1209+ [(('aac',), 'target1'), (('bba',), 'target2')],
1210+ [target1, target2], [basis1, basis2])
1211
1212 def test_multiple_maps_overlapping_common_new(self):
1213 # Test that when a node found through the interesting_keys iteration
1214@@ -2188,17 +2759,10 @@
1215 right_map._dump_tree())
1216 # Keys from the right side target - none, the root is enough.
1217 # Test behaviour
1218- self.expectFailure("we don't properly filter different depths",
1219- self.assertIterInteresting,
1220- [(left, []),
1221- (right, []),
1222- (l_d_key, [(('ddd',), 'change')]),
1223- ], [left, right], [basis])
1224 self.assertIterInteresting(
1225- [(left, []),
1226- (right, []),
1227- (l_d_key, [(('ddd',), 'change')]),
1228- ], [left, right], [basis])
1229+ [right, left, l_d_key],
1230+ [(('ddd',), 'change')],
1231+ [left, right], [basis])
1232
1233 def test_multiple_maps_similar(self):
1234 # We want to have a depth=2 tree, with multiple entries in each leaf
1235@@ -2259,8 +2823,6 @@
1236 r_a_key = right_map._root_node._items['a'].key()
1237 r_c_key = right_map._root_node._items['c'].key()
1238 self.assertIterInteresting(
1239- [(left, []),
1240- (right, []),
1241- (l_a_key, [(('abb',), 'changed left')]),
1242- (r_c_key, [(('cbb',), 'changed right')]),
1243- ], [left, right], [basis])
1244+ [right, left, l_a_key, r_c_key],
1245+ [(('abb',), 'changed left'), (('cbb',), 'changed right')],
1246+ [left, right], [basis])