Merge lp:~jameinel/bzr/1.17-chk-multilevel into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Approved by: Ian Clatworthy
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.17-chk-multilevel
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1253 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-chk-multilevel
Reviewer Review Type Date Requested Status
Robert Collins (community) Approve
Review via email: mp+8157@code.launchpad.net

This proposal supersedes a proposal from 2009-07-01.

Revision history for this message
John A Meinel (jameinel) wrote : Posted in a previous version of this proposal

This is a slightly better fix for bug #390563. The main changes are:

1) Rewrite iter_interesting_nodes using a class, rather than maintaining state between functions
2) Split up the processing of root nodes into:
  a) Read all uninteresting roots, and find uninteresting chks and items
  b) Filter out any interesting root keys that were found in (a)
  c) Read all remaining root pages and yield them
  d) Find interesting references that are not in (a) (and interesting items)
     Note the search key prefixes for these references and items, and queue
     them up to be returned later.
  e) Filter out uninteresting chk pages based on the search key prefixes in (d)

(d) and (e) are the main differences.
For (d), Robert found an edge case where we could return a non-interesting item (which would, in turn, be treated as a text key, and would then cause a fetch failure) when one side had an immediate leaf node and the other side had a deep tree. The only real change is to defer returning interesting *items* until all uninteresting pages have been read.

For (e), we previously used the 'intersection of interesting references', whereas now we filter simply based on prefix. This is likely to be a small win when there are lots of uninteresting pages (lots of merge parents that are not being transmitted).
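
For illustration, a minimal sketch of the prefix check in (e); it paraphrases the _enqueue_old method in the diff below, and the standalone-function form and name are only illustrative:

    def enqueue_old(old_queue, new_prefixes, old_chks_to_enqueue):
        # An old reference is only worth walking if some prefix of its search
        # key is still 'live' on the new side. (In the diff, new_prefixes has
        # already been expanded to include every shorter form of the new
        # prefixes, so a plain membership test of prefix[:i] is enough.)
        for prefix, ref in old_chks_to_enqueue:
            if any(prefix[:i] in new_prefixes
                   for i in xrange(len(prefix), 0, -1)):
                old_queue.append(ref)
            # Otherwise nothing new can live under this prefix, so the old
            # page cannot overlap anything we still intend to yield.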

3) After processing root nodes
  a) Process the uninteresting queue until completion
  b) Process the interesting queue, filtering out all uninteresting items.
     (some minor things here, like doing batched '.difference()' checks rather than calling .difference() on every record processed; see the sketch below)
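
A minimal sketch of the batched set arithmetic referred to in (b); it mirrors the end of _flush_new_queue in the diff below, and the standalone function and its name are only illustrative:

    def filter_new_refs(new_refs, all_old_chks, processed_new_refs):
        # Subtract whole batches rather than testing one ref at a time:
        # small.difference(large) scales with the size of the small set.
        refs = set(new_refs)
        refs = refs.difference(all_old_chks)        # drop anything known old
        refs = refs.difference(processed_new_refs)  # drop anything already queued
        processed_new_refs.update(refs)
        return refs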

I looked at trying to use a heapq to unify things more with 'iter_changes', but the constraints are *very* different.

1) CHKMap.iter_changes() is symmetric, in that we need to return a record for (source, target, extra); iter_interesting_nodes only needs to return information for the 'target' side.
2) CHKMap.iter_changes() only ever affects 2 records, a single source and a single target. iter_interesting_nodes is looking at a many-to-many comparison.
3) CHKMap.iter_changes() only cares about the actual keys, and not about sending the raw byte records to be transmitted for fetch.
4) iter_interesting_nodes is much more concerned about batching because of its many-to-many nature, and the fact that it is used for fetch. We really don't want to be doing a round trip per page.
5) The only gain for doing heapq is if we could figure out the algorithm for filtering out nodes that we thought we would want to read, but turn out not to need. However,
  a) I do this at the root level, because it is fairly straightforward. So any further improvements only affect trees that are deeper than 2 (1 root, 1+ internal, 1 leaf); however, this only happens for trees > 50k items.
  b) Reading extra chk pages isn't "incorrect", just "inefficient" (the stacking code ensures all referenced chk pages are present in the stacked-on branch, just like we ensure compression texts.)
  c) The best I could come up with would be to split the reads up by prefix. So you would page in the 'a' references, then the 'b' references, etc. However, that forces us to split the requests up even further, rather than making larger batches of requests. (more...


Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

Shallow immediate review:

The docstring for the class, and perhaps the class name, could be better.

In terms of sets it's:
interesting - uninteresting

So perhaps
CHKMapDifference

as a class name. And for the docstring, something like

"""Iterate the stored pages and key,value pairs for (new - old).

This class provides a generator over the stored CHK pages and the
key,value pairs that are in any of the new maps and not in any of the
old maps.
"""

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

Just a follow-up from Robert's "immediate" suggestions. This only renames the class and variables. Using "new" and "old" for the sets actually makes it a bit more readable for me, partly because it is shorter, but also because "interesting" and "uninteresting" are 85% the same word.

Revision history for this message
Robert Collins (lifeless) wrote :

 review +1

Some thoughts

process() could just be an itertools.chain:
return itertools.chain(self._yield_all_roots(), self._process_queues())
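
For reference, a sketch of how that might look against the class as written in the diff; _read_all_roots() yields bare records, so a generator expression adapts them to the (record, items) shape (the _yield_all_roots name above is a hypothetical helper for the same adaptation):

    import itertools

    def process(self):
        # Wrap the root records to match the (record, items) pairs produced
        # by _process_queues(), then chain the two generators together.
        root_pairs = ((record, []) for record in self._read_all_roots())
        return itertools.chain(root_pairs, self._process_queues())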

There is some code still using [un]interesting that you might want to
change (just grep the diff) to make it clearer.

review: Approve

Preview Diff

1=== modified file 'bzrlib/chk_map.py'
2--- bzrlib/chk_map.py 2009-07-01 10:52:23 +0000
3+++ bzrlib/chk_map.py 2009-07-09 07:36:30 +0000
4@@ -1396,106 +1396,202 @@
5 return node
6
7
8-def _find_children_info(store, interesting_keys, uninteresting_keys, pb):
9- """Read the associated records, and determine what is interesting."""
10- uninteresting_keys = set(uninteresting_keys)
11- chks_to_read = uninteresting_keys.union(interesting_keys)
12- next_uninteresting = set()
13- next_interesting = set()
14- next_interesting_intersection = None
15- uninteresting_items = set()
16- interesting_items = set()
17- interesting_to_yield = []
18- for record in store.get_record_stream(chks_to_read, 'unordered', True):
19- # records_read.add(record.key())
20- if pb is not None:
21- pb.tick()
22- bytes = record.get_bytes_as('fulltext')
23- # We don't care about search_key_func for this code, because we only
24- # care about external references.
25- node = _deserialise(bytes, record.key, search_key_func=None)
26- if record.key in uninteresting_keys:
27- if type(node) is InternalNode:
28- next_uninteresting.update(node.refs())
29- else:
30- # We know we are at a LeafNode, so we can pass None for the
31- # store
32- uninteresting_items.update(node.iteritems(None))
33- else:
34- interesting_to_yield.append(record.key)
35- if type(node) is InternalNode:
36- if next_interesting_intersection is None:
37- next_interesting_intersection = set(node.refs())
38- else:
39- next_interesting_intersection = \
40- next_interesting_intersection.intersection(node.refs())
41- next_interesting.update(node.refs())
42- else:
43- interesting_items.update(node.iteritems(None))
44- return (next_uninteresting, uninteresting_items,
45- next_interesting, interesting_to_yield, interesting_items,
46- next_interesting_intersection)
47-
48-
49-def _find_all_uninteresting(store, interesting_root_keys,
50- uninteresting_root_keys, pb):
51- """Determine the full set of uninteresting keys."""
52- # What about duplicates between interesting_root_keys and
53- # uninteresting_root_keys?
54- if not uninteresting_root_keys:
55- # Shortcut case. We know there is nothing uninteresting to filter out
56- # So we just let the rest of the algorithm do the work
57- # We know there is nothing uninteresting, and we didn't have to read
58- # any interesting records yet.
59- return (set(), set(), set(interesting_root_keys), [], set())
60- all_uninteresting_chks = set(uninteresting_root_keys)
61- all_uninteresting_items = set()
62-
63- # First step, find the direct children of both the interesting and
64- # uninteresting set
65- (uninteresting_keys, uninteresting_items,
66- interesting_keys, interesting_to_yield,
67- interesting_items, interesting_intersection,
68- ) = _find_children_info(store, interesting_root_keys,
69- uninteresting_root_keys,
70- pb=pb)
71- all_uninteresting_chks.update(uninteresting_keys)
72- all_uninteresting_items.update(uninteresting_items)
73- del uninteresting_items
74- # Do not examine in detail pages common to all interesting trees.
75- # Pages that are common to all interesting trees will have their
76- # older versions found via the uninteresting tree traversal. Some pages
77- # found via the interesting trees traversal will be uninteresting for
78- # other of the interesting trees, which is why we require the pages to be
79- # common for us to trim them.
80- if interesting_intersection is not None:
81- uninteresting_keys.difference_update(interesting_intersection)
82-
83- # Second, find the full set of uninteresting bits reachable by the
84- # uninteresting roots
85- chks_to_read = uninteresting_keys
86- while chks_to_read:
87- next_chks = set()
88- for record in store.get_record_stream(chks_to_read, 'unordered', False):
89- # TODO: Handle 'absent'
90- if pb is not None:
91- pb.tick()
92+class CHKMapDifference(object):
93+ """Iterate the stored pages and key,value pairs for (new - old).
94+
95+ This class provides a generator over the stored CHK pages and the
96+ (key, value) pairs that are in any of the new maps and not in any of the
97+ old maps.
98+
99+ Note that it may yield chk pages that are common (especially root nodes),
100+ but it won't yield (key,value) pairs that are common.
101+ """
102+
103+ def __init__(self, store, new_root_keys, old_root_keys,
104+ search_key_func, pb=None):
105+ self._store = store
106+ self._new_root_keys = new_root_keys
107+ self._old_root_keys = old_root_keys
108+ self._pb = pb
109+ # All uninteresting chks that we have seen. By the time they are added
110+ # here, they should be either fully ignored, or queued up for
111+ # processing
112+ self._all_old_chks = set(self._old_root_keys)
113+ # All items that we have seen from the old_root_keys
114+ self._all_old_items = set()
115+ # These are interesting items which were either read, or already in the
116+ # interesting queue (so we don't need to walk them again)
117+ self._processed_new_refs = set()
118+ self._search_key_func = search_key_func
119+
120+ # The uninteresting and interesting nodes to be searched
121+ self._old_queue = []
122+ self._new_queue = []
123+ # Holds the (key, value) items found when processing the root nodes,
124+ # waiting for the uninteresting nodes to be walked
125+ self._new_item_queue = []
126+ self._state = None
127+
128+ def _read_nodes_from_store(self, keys):
129+ # We chose not to use _page_cache, because we think in terms of records
130+ # to be yielded. Also, we expect to touch each page only 1 time during
131+ # this code. (We may want to evaluate saving the raw bytes into the
132+ # page cache, which would allow a working tree update after the fetch
133+ # to not have to read the bytes again.)
134+ stream = self._store.get_record_stream(keys, 'unordered', True)
135+ for record in stream:
136+ if self._pb is not None:
137+ self._pb.tick()
138+ if record.storage_kind == 'absent':
139+ raise errors.NoSuchRevision(self._store, record.key)
140 bytes = record.get_bytes_as('fulltext')
141- # We don't care about search_key_func for this code, because we
142- # only care about external references.
143- node = _deserialise(bytes, record.key, search_key_func=None)
144+ node = _deserialise(bytes, record.key,
145+ search_key_func=self._search_key_func)
146 if type(node) is InternalNode:
147- # uninteresting_prefix_chks.update(node._items.iteritems())
148- chks = node._items.values()
149- # TODO: We remove the entries that are already in
150- # uninteresting_chks ?
151- next_chks.update(chks)
152- all_uninteresting_chks.update(chks)
153+ # Note we don't have to do node.refs() because we know that
154+ # there are no children that have been pushed into this node
155+ prefix_refs = node._items.items()
156+ items = []
157 else:
158- all_uninteresting_items.update(node._items.iteritems())
159- chks_to_read = next_chks
160- return (all_uninteresting_chks, all_uninteresting_items,
161- interesting_keys, interesting_to_yield, interesting_items)
162+ prefix_refs = []
163+ items = node._items.items()
164+ yield record, node, prefix_refs, items
165+
166+ def _read_old_roots(self):
167+ old_chks_to_enqueue = []
168+ all_old_chks = self._all_old_chks
169+ for record, node, prefix_refs, items in \
170+ self._read_nodes_from_store(self._old_root_keys):
171+ # Uninteresting node
172+ prefix_refs = [p_r for p_r in prefix_refs
173+ if p_r[1] not in all_old_chks]
174+ new_refs = [p_r[1] for p_r in prefix_refs]
175+ all_old_chks.update(new_refs)
176+ self._all_old_items.update(items)
177+ # Queue up the uninteresting references
178+ # Don't actually put them in the 'to-read' queue until we have
179+ # finished checking the interesting references
180+ old_chks_to_enqueue.extend(prefix_refs)
181+ return old_chks_to_enqueue
182+
183+ def _enqueue_old(self, new_prefixes, old_chks_to_enqueue):
184+ # At this point, we have read all the uninteresting and interesting
185+ # items, so we can queue up the uninteresting stuff, knowing that we've
186+ # handled the interesting ones
187+ for prefix, ref in old_chks_to_enqueue:
188+ not_interesting = True
189+ for i in xrange(len(prefix), 0, -1):
190+ if prefix[:i] in new_prefixes:
191+ not_interesting = False
192+ break
193+ if not_interesting:
194+ # This prefix is not part of the remaining 'interesting set'
195+ continue
196+ self._old_queue.append(ref)
197+
198+ def _read_all_roots(self):
199+ """Read the root pages.
200+
201+ This is structured as a generator, so that the root records can be
202+ yielded up to whoever needs them without any buffering.
203+ """
204+ # This is the bootstrap phase
205+ if not self._old_root_keys:
206+ # With no old_root_keys we can just shortcut and be ready
207+ # for _flush_new_queue
208+ self._new_queue = list(self._new_root_keys)
209+ return
210+ old_chks_to_enqueue = self._read_old_roots()
211+ # filter out any root keys that are already known to be uninteresting
212+ new_keys = set(self._new_root_keys).difference(self._all_old_chks)
213+ # These are prefixes that are present in new_keys that we are
214+ # thinking to yield
215+ new_prefixes = set()
216+ # We are about to yield all of these, so we don't want them getting
217+ # added a second time
218+ processed_new_refs = self._processed_new_refs
219+ processed_new_refs.update(new_keys)
220+ for record, node, prefix_refs, items in \
221+ self._read_nodes_from_store(new_keys):
222+ # At this level, we now know all the uninteresting references
223+ # So we filter and queue up whatever is remaining
224+ prefix_refs = [p_r for p_r in prefix_refs
225+ if p_r[1] not in self._all_old_chks
226+ and p_r[1] not in processed_new_refs]
227+ refs = [p_r[1] for p_r in prefix_refs]
228+ new_prefixes.update([p_r[0] for p_r in prefix_refs])
229+ self._new_queue.extend(refs)
230+ # TODO: We can potentially get multiple items here, however the
231+ # current design allows for this, as callers will do the work
232+ # to make the results unique. We might profile whether we
233+ # gain anything by ensuring unique return values for items
234+ new_items = [item for item in items
235+ if item not in self._all_old_items]
236+ self._new_item_queue.extend(new_items)
237+ new_prefixes.update([self._search_key_func(item[0])
238+ for item in new_items])
239+ processed_new_refs.update(refs)
240+ yield record
241+ # For new_prefixes we have the full length prefixes queued up.
242+ # However, we also need possible prefixes. (If we have a known ref to
243+ # 'ab', then we also need to include 'a'.) So expand the
244+ # new_prefixes to include all shorter prefixes
245+ for prefix in list(new_prefixes):
246+ new_prefixes.update([prefix[:i] for i in xrange(1, len(prefix))])
247+ self._enqueue_old(new_prefixes, old_chks_to_enqueue)
248+
249+ def _flush_new_queue(self):
250+ # No need to maintain the heap invariant anymore, just pull things out
251+ # and process them
252+ refs = set(self._new_queue)
253+ self._new_queue = []
254+ # First pass, flush all interesting items and convert to using direct refs
255+ all_old_chks = self._all_old_chks
256+ processed_new_refs = self._processed_new_refs
257+ all_old_items = self._all_old_items
258+ new_items = [item for item in self._new_item_queue
259+ if item not in all_old_items]
260+ self._new_item_queue = []
261+ if new_items:
262+ yield None, new_items
263+ refs = refs.difference(all_old_chks)
264+ while refs:
265+ next_refs = set()
266+ next_refs_update = next_refs.update
267+ # Inlining _read_nodes_from_store improves 'bzr branch bzr.dev'
268+ # from 1m54s to 1m51s. Consider it.
269+ for record, _, p_refs, items in self._read_nodes_from_store(refs):
270+ items = [item for item in items
271+ if item not in all_old_items]
272+ yield record, items
273+ next_refs_update([p_r[1] for p_r in p_refs])
274+ next_refs = next_refs.difference(all_old_chks)
275+ next_refs = next_refs.difference(processed_new_refs)
276+ processed_new_refs.update(next_refs)
277+ refs = next_refs
278+
279+ def _process_next_old(self):
280+ # Since we don't filter uninteresting any further than during
281+ # _read_all_roots, process the whole queue in a single pass.
282+ refs = self._old_queue
283+ self._old_queue = []
284+ all_old_chks = self._all_old_chks
285+ for record, _, prefix_refs, items in self._read_nodes_from_store(refs):
286+ self._all_old_items.update(items)
287+ refs = [r for _,r in prefix_refs if r not in all_old_chks]
288+ self._old_queue.extend(refs)
289+ all_old_chks.update(refs)
290+
291+ def _process_queues(self):
292+ while self._old_queue:
293+ self._process_next_old()
294+ return self._flush_new_queue()
295+
296+ def process(self):
297+ for record in self._read_all_roots():
298+ yield record, []
299+ for record, items in self._process_queues():
300+ yield record, items
301
302
303 def iter_interesting_nodes(store, interesting_root_keys,
304@@ -1512,72 +1608,11 @@
305 :return: Yield
306 (interesting record, {interesting key:values})
307 """
308- # TODO: consider that it may be more memory efficient to use the 20-byte
309- # sha1 string, rather than tuples of hexidecimal sha1 strings.
310- # TODO: Try to factor out a lot of the get_record_stream() calls into a
311- # helper function similar to _read_bytes. This function should be
312- # able to use nodes from the _page_cache as well as actually
313- # requesting bytes from the store.
314-
315- (all_uninteresting_chks, all_uninteresting_items, interesting_keys,
316- interesting_to_yield, interesting_items) = _find_all_uninteresting(store,
317- interesting_root_keys, uninteresting_root_keys, pb)
318-
319- # Now that we know everything uninteresting, we can yield information from
320- # our first request
321- interesting_items.difference_update(all_uninteresting_items)
322- interesting_to_yield = set(interesting_to_yield) - all_uninteresting_chks
323- if interesting_items:
324- yield None, interesting_items
325- if interesting_to_yield:
326- # We request these records again, rather than buffering the root
327- # records, most likely they are still in the _group_cache anyway.
328- for record in store.get_record_stream(interesting_to_yield,
329- 'unordered', False):
330- yield record, []
331- all_uninteresting_chks.update(interesting_to_yield)
332- interesting_keys.difference_update(all_uninteresting_chks)
333-
334- chks_to_read = interesting_keys
335- counter = 0
336- while chks_to_read:
337- next_chks = set()
338- for record in store.get_record_stream(chks_to_read, 'unordered', False):
339- counter += 1
340- if pb is not None:
341- pb.update('find chk pages', counter)
342- # TODO: Handle 'absent'?
343- bytes = record.get_bytes_as('fulltext')
344- # We don't care about search_key_func for this code, because we
345- # only care about external references.
346- node = _deserialise(bytes, record.key, search_key_func=None)
347- if type(node) is InternalNode:
348- # all_uninteresting_chks grows large, as it lists all nodes we
349- # don't want to process (including already seen interesting
350- # nodes).
351- # small.difference_update(large) scales O(large), but
352- # small.difference(large) scales O(small).
353- # Also, we know we just _deserialised this node, so we can
354- # access the dict directly.
355- chks = set(node._items.itervalues()).difference(
356- all_uninteresting_chks)
357- # Is set() and .difference_update better than:
358- # chks = [chk for chk in node.refs()
359- # if chk not in all_uninteresting_chks]
360- next_chks.update(chks)
361- # These are now uninteresting everywhere else
362- all_uninteresting_chks.update(chks)
363- interesting_items = []
364- else:
365- interesting_items = [item for item in node._items.iteritems()
366- if item not in all_uninteresting_items]
367- # TODO: Do we need to filter out items that we have already
368- # seen on other pages? We don't really want to buffer the
369- # whole thing, but it does mean that callers need to
370- # understand they may get duplicate values.
371- # all_uninteresting_items.update(interesting_items)
372- yield record, interesting_items
373- chks_to_read = next_chks
374+ iterator = CHKMapDifference(store, interesting_root_keys,
375+ uninteresting_root_keys,
376+ search_key_func=store._search_key_func,
377+ pb=pb)
378+ return iterator.process()
379
380
381 try:
382
383=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
384--- bzrlib/repofmt/groupcompress_repo.py 2009-07-06 14:47:36 +0000
385+++ bzrlib/repofmt/groupcompress_repo.py 2009-07-09 07:36:30 +0000
386@@ -650,6 +650,9 @@
387 parents=False, is_locked=self.is_locked,
388 inconsistency_fatal=False),
389 access=self._pack_collection.chk_index.data_access)
390+ search_key_name = self._format._serializer.search_key_name
391+ search_key_func = chk_map.search_key_registry.get(search_key_name)
392+ self.chk_bytes._search_key_func = search_key_func
393 # True when the repository object is 'write locked' (as opposed to the
394 # physical lock only taken out around changes to the pack-names list.)
395 # Another way to represent this would be a decorator around the control
396
397=== modified file 'bzrlib/tests/test_chk_map.py'
398--- bzrlib/tests/test_chk_map.py 2009-06-26 09:24:34 +0000
399+++ bzrlib/tests/test_chk_map.py 2009-07-09 07:36:31 +0000
400@@ -20,6 +20,7 @@
401
402 from bzrlib import (
403 chk_map,
404+ groupcompress,
405 osutils,
406 tests,
407 )
408@@ -59,17 +60,14 @@
409 self.assertCommonPrefix('', '', '')
410
411
412-class TestCaseWithStore(tests.TestCaseWithTransport):
413+class TestCaseWithStore(tests.TestCaseWithMemoryTransport):
414
415 def get_chk_bytes(self):
416 # The easiest way to get a CHK store is a development6 repository and
417 # then work with the chk_bytes attribute directly.
418- repo = self.make_repository(".", format="development6-rich-root")
419- repo.lock_write()
420- self.addCleanup(repo.unlock)
421- repo.start_write_group()
422- self.addCleanup(repo.abort_write_group)
423- return repo.chk_bytes
424+ factory = groupcompress.make_pack_factory(False, False, 1)
425+ self.chk_bytes = factory(self.get_transport())
426+ return self.chk_bytes
427
428 def _get_map(self, a_dict, maximum_size=0, chk_bytes=None, key_width=1,
429 search_key_func=None):
430@@ -97,6 +95,246 @@
431 return dict(node.iteritems(*args))
432
433
434+class TestCaseWithExampleMaps(TestCaseWithStore):
435+
436+ def get_chk_bytes(self):
437+ if getattr(self, '_chk_bytes', None) is None:
438+ self._chk_bytes = super(TestCaseWithExampleMaps,
439+ self).get_chk_bytes()
440+ return self._chk_bytes
441+
442+ def get_map(self, a_dict, maximum_size=100, search_key_func=None):
443+ c_map = self._get_map(a_dict, maximum_size=maximum_size,
444+ chk_bytes=self.get_chk_bytes(),
445+ search_key_func=search_key_func)
446+ return c_map
447+
448+ def make_root_only_map(self, search_key_func=None):
449+ return self.get_map({
450+ ('aaa',): 'initial aaa content',
451+ ('abb',): 'initial abb content',
452+ }, search_key_func=search_key_func)
453+
454+ def make_root_only_aaa_ddd_map(self, search_key_func=None):
455+ return self.get_map({
456+ ('aaa',): 'initial aaa content',
457+ ('ddd',): 'initial ddd content',
458+ }, search_key_func=search_key_func)
459+
460+ def make_one_deep_map(self, search_key_func=None):
461+ # Same as root_only_map, except it forces an InternalNode at the root
462+ return self.get_map({
463+ ('aaa',): 'initial aaa content',
464+ ('abb',): 'initial abb content',
465+ ('ccc',): 'initial ccc content',
466+ ('ddd',): 'initial ddd content',
467+ }, search_key_func=search_key_func)
468+
469+ def make_two_deep_map(self, search_key_func=None):
470+ # Carefully chosen so that it creates a 2-deep map for both
471+ # _search_key_plain and for _search_key_16
472+ # Also so that things line up with make_one_deep_two_prefix_map
473+ return self.get_map({
474+ ('aaa',): 'initial aaa content',
475+ ('abb',): 'initial abb content',
476+ ('acc',): 'initial acc content',
477+ ('ace',): 'initial ace content',
478+ ('add',): 'initial add content',
479+ ('adh',): 'initial adh content',
480+ ('adl',): 'initial adl content',
481+ ('ccc',): 'initial ccc content',
482+ ('ddd',): 'initial ddd content',
483+ }, search_key_func=search_key_func)
484+
485+ def make_one_deep_two_prefix_map(self, search_key_func=None):
486+ """Create a map with one internal node, but references are extra long.
487+
488+ Otherwise has similar content to make_two_deep_map.
489+ """
490+ return self.get_map({
491+ ('aaa',): 'initial aaa content',
492+ ('add',): 'initial add content',
493+ ('adh',): 'initial adh content',
494+ ('adl',): 'initial adl content',
495+ }, search_key_func=search_key_func)
496+
497+ def make_one_deep_one_prefix_map(self, search_key_func=None):
498+ """Create a map with one internal node, but references are extra long.
499+
500+ Similar to make_one_deep_two_prefix_map, except the split is at the
501+ first char, rather than the second.
502+ """
503+ return self.get_map({
504+ ('add',): 'initial add content',
505+ ('adh',): 'initial adh content',
506+ ('adl',): 'initial adl content',
507+ ('bbb',): 'initial bbb content',
508+ }, search_key_func=search_key_func)
509+
510+
511+class TestTestCaseWithExampleMaps(TestCaseWithExampleMaps):
512+ """Actual tests for the provided examples."""
513+
514+ def test_root_only_map_plain(self):
515+ c_map = self.make_root_only_map()
516+ self.assertEqualDiff(
517+ "'' LeafNode\n"
518+ " ('aaa',) 'initial aaa content'\n"
519+ " ('abb',) 'initial abb content'\n",
520+ c_map._dump_tree())
521+
522+ def test_root_only_map_16(self):
523+ c_map = self.make_root_only_map(search_key_func=chk_map._search_key_16)
524+ self.assertEqualDiff(
525+ "'' LeafNode\n"
526+ " ('aaa',) 'initial aaa content'\n"
527+ " ('abb',) 'initial abb content'\n",
528+ c_map._dump_tree())
529+
530+ def test_one_deep_map_plain(self):
531+ c_map = self.make_one_deep_map()
532+ self.assertEqualDiff(
533+ "'' InternalNode\n"
534+ " 'a' LeafNode\n"
535+ " ('aaa',) 'initial aaa content'\n"
536+ " ('abb',) 'initial abb content'\n"
537+ " 'c' LeafNode\n"
538+ " ('ccc',) 'initial ccc content'\n"
539+ " 'd' LeafNode\n"
540+ " ('ddd',) 'initial ddd content'\n",
541+ c_map._dump_tree())
542+
543+ def test_one_deep_map_16(self):
544+ c_map = self.make_one_deep_map(search_key_func=chk_map._search_key_16)
545+ self.assertEqualDiff(
546+ "'' InternalNode\n"
547+ " '2' LeafNode\n"
548+ " ('ccc',) 'initial ccc content'\n"
549+ " '4' LeafNode\n"
550+ " ('abb',) 'initial abb content'\n"
551+ " 'F' LeafNode\n"
552+ " ('aaa',) 'initial aaa content'\n"
553+ " ('ddd',) 'initial ddd content'\n",
554+ c_map._dump_tree())
555+
556+ def test_root_only_aaa_ddd_plain(self):
557+ c_map = self.make_root_only_aaa_ddd_map()
558+ self.assertEqualDiff(
559+ "'' LeafNode\n"
560+ " ('aaa',) 'initial aaa content'\n"
561+ " ('ddd',) 'initial ddd content'\n",
562+ c_map._dump_tree())
563+
564+ def test_one_deep_map_16(self):
565+ c_map = self.make_root_only_aaa_ddd_map(
566+ search_key_func=chk_map._search_key_16)
567+ # We use 'aaa' and 'ddd' because they happen to map to 'F' when using
568+ # _search_key_16
569+ self.assertEqualDiff(
570+ "'' LeafNode\n"
571+ " ('aaa',) 'initial aaa content'\n"
572+ " ('ddd',) 'initial ddd content'\n",
573+ c_map._dump_tree())
574+
575+ def test_two_deep_map_plain(self):
576+ c_map = self.make_two_deep_map()
577+ self.assertEqualDiff(
578+ "'' InternalNode\n"
579+ " 'a' InternalNode\n"
580+ " 'aa' LeafNode\n"
581+ " ('aaa',) 'initial aaa content'\n"
582+ " 'ab' LeafNode\n"
583+ " ('abb',) 'initial abb content'\n"
584+ " 'ac' LeafNode\n"
585+ " ('acc',) 'initial acc content'\n"
586+ " ('ace',) 'initial ace content'\n"
587+ " 'ad' LeafNode\n"
588+ " ('add',) 'initial add content'\n"
589+ " ('adh',) 'initial adh content'\n"
590+ " ('adl',) 'initial adl content'\n"
591+ " 'c' LeafNode\n"
592+ " ('ccc',) 'initial ccc content'\n"
593+ " 'd' LeafNode\n"
594+ " ('ddd',) 'initial ddd content'\n",
595+ c_map._dump_tree())
596+
597+ def test_two_deep_map_16(self):
598+ c_map = self.make_two_deep_map(search_key_func=chk_map._search_key_16)
599+ self.assertEqualDiff(
600+ "'' InternalNode\n"
601+ " '2' LeafNode\n"
602+ " ('acc',) 'initial acc content'\n"
603+ " ('ccc',) 'initial ccc content'\n"
604+ " '4' LeafNode\n"
605+ " ('abb',) 'initial abb content'\n"
606+ " 'C' LeafNode\n"
607+ " ('ace',) 'initial ace content'\n"
608+ " 'F' InternalNode\n"
609+ " 'F0' LeafNode\n"
610+ " ('aaa',) 'initial aaa content'\n"
611+ " 'F3' LeafNode\n"
612+ " ('adl',) 'initial adl content'\n"
613+ " 'F4' LeafNode\n"
614+ " ('adh',) 'initial adh content'\n"
615+ " 'FB' LeafNode\n"
616+ " ('ddd',) 'initial ddd content'\n"
617+ " 'FD' LeafNode\n"
618+ " ('add',) 'initial add content'\n",
619+ c_map._dump_tree())
620+
621+ def test_one_deep_two_prefix_map_plain(self):
622+ c_map = self.make_one_deep_two_prefix_map()
623+ self.assertEqualDiff(
624+ "'' InternalNode\n"
625+ " 'aa' LeafNode\n"
626+ " ('aaa',) 'initial aaa content'\n"
627+ " 'ad' LeafNode\n"
628+ " ('add',) 'initial add content'\n"
629+ " ('adh',) 'initial adh content'\n"
630+ " ('adl',) 'initial adl content'\n",
631+ c_map._dump_tree())
632+
633+ def test_one_deep_two_prefix_map_16(self):
634+ c_map = self.make_one_deep_two_prefix_map(
635+ search_key_func=chk_map._search_key_16)
636+ self.assertEqualDiff(
637+ "'' InternalNode\n"
638+ " 'F0' LeafNode\n"
639+ " ('aaa',) 'initial aaa content'\n"
640+ " 'F3' LeafNode\n"
641+ " ('adl',) 'initial adl content'\n"
642+ " 'F4' LeafNode\n"
643+ " ('adh',) 'initial adh content'\n"
644+ " 'FD' LeafNode\n"
645+ " ('add',) 'initial add content'\n",
646+ c_map._dump_tree())
647+
648+ def test_one_deep_one_prefix_map_plain(self):
649+ c_map = self.make_one_deep_one_prefix_map()
650+ self.assertEqualDiff(
651+ "'' InternalNode\n"
652+ " 'a' LeafNode\n"
653+ " ('add',) 'initial add content'\n"
654+ " ('adh',) 'initial adh content'\n"
655+ " ('adl',) 'initial adl content'\n"
656+ " 'b' LeafNode\n"
657+ " ('bbb',) 'initial bbb content'\n",
658+ c_map._dump_tree())
659+
660+ def test_one_deep_one_prefix_map_16(self):
661+ c_map = self.make_one_deep_one_prefix_map(
662+ search_key_func=chk_map._search_key_16)
663+ self.assertEqualDiff(
664+ "'' InternalNode\n"
665+ " '4' LeafNode\n"
666+ " ('bbb',) 'initial bbb content'\n"
667+ " 'F' LeafNode\n"
668+ " ('add',) 'initial add content'\n"
669+ " ('adh',) 'initial adh content'\n"
670+ " ('adl',) 'initial adl content'\n",
671+ c_map._dump_tree())
672+
673+
674 class TestMap(TestCaseWithStore):
675
676 def assertHasABMap(self, chk_bytes):
677@@ -1886,62 +2124,406 @@
678 # 1-4K get0
679
680
681-class TestIterInterestingNodes(TestCaseWithStore):
682-
683- def get_chk_bytes(self):
684- if getattr(self, '_chk_bytes', None) is None:
685- self._chk_bytes = super(TestIterInterestingNodes,
686- self).get_chk_bytes()
687- return self._chk_bytes
688+class TestCHKMapDifference(TestCaseWithExampleMaps):
689+
690+ def get_difference(self, new_roots, old_roots,
691+ search_key_func=None):
692+ if search_key_func is None:
693+ search_key_func = chk_map._search_key_plain
694+ return chk_map.CHKMapDifference(self.get_chk_bytes(),
695+ new_roots, old_roots, search_key_func)
696+
697+ def test__init__(self):
698+ c_map = self.make_root_only_map()
699+ key1 = c_map.key()
700+ c_map.map(('aaa',), 'new aaa content')
701+ key2 = c_map._save()
702+ diff = self.get_difference([key2], [key1])
703+ self.assertEqual(set([key1]), diff._all_old_chks)
704+ self.assertEqual([], diff._old_queue)
705+ self.assertEqual([], diff._new_queue)
706+
707+ def help__read_all_roots(self, search_key_func):
708+ c_map = self.make_root_only_map(search_key_func=search_key_func)
709+ key1 = c_map.key()
710+ c_map.map(('aaa',), 'new aaa content')
711+ key2 = c_map._save()
712+ diff = self.get_difference([key2], [key1], search_key_func)
713+ root_results = [record.key for record in diff._read_all_roots()]
714+ self.assertEqual([key2], root_results)
715+ # We should have queued up only items that aren't in the old
716+ # set
717+ self.assertEqual([(('aaa',), 'new aaa content')],
718+ diff._new_item_queue)
719+ self.assertEqual([], diff._new_queue)
720+ # And there are no old references, so that queue should be
721+ # empty
722+ self.assertEqual([], diff._old_queue)
723+
724+ def test__read_all_roots_plain(self):
725+ self.help__read_all_roots(search_key_func=chk_map._search_key_plain)
726+
727+ def test__read_all_roots_16(self):
728+ self.help__read_all_roots(search_key_func=chk_map._search_key_16)
729+
730+ def test__read_all_roots_skips_known_old(self):
731+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
732+ key1 = c_map.key()
733+ c_map2 = self.make_root_only_map(chk_map._search_key_plain)
734+ key2 = c_map2.key()
735+ diff = self.get_difference([key2], [key1], chk_map._search_key_plain)
736+ root_results = [record.key for record in diff._read_all_roots()]
737+ # We should have no results. key2 is completely contained within key1,
738+ # and we should have seen that in the first pass
739+ self.assertEqual([], root_results)
740+
741+ def test__read_all_roots_prepares_queues(self):
742+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
743+ key1 = c_map.key()
744+ c_map._dump_tree() # load everything
745+ key1_a = c_map._root_node._items['a'].key()
746+ c_map.map(('abb',), 'new abb content')
747+ key2 = c_map._save()
748+ key2_a = c_map._root_node._items['a'].key()
749+ diff = self.get_difference([key2], [key1], chk_map._search_key_plain)
750+ root_results = [record.key for record in diff._read_all_roots()]
751+ self.assertEqual([key2], root_results)
752+ # At this point, we should have queued up only the 'a' Leaf on both
753+ # sides, both 'c' and 'd' are known to not have changed on both sides
754+ self.assertEqual([key2_a], diff._new_queue)
755+ self.assertEqual([], diff._new_item_queue)
756+ self.assertEqual([key1_a], diff._old_queue)
757+
758+ def test__read_all_roots_multi_new_prepares_queues(self):
759+ c_map = self.make_one_deep_map(chk_map._search_key_plain)
760+ key1 = c_map.key()
761+ c_map._dump_tree() # load everything
762+ key1_a = c_map._root_node._items['a'].key()
763+ key1_c = c_map._root_node._items['c'].key()
764+ c_map.map(('abb',), 'new abb content')
765+ key2 = c_map._save()
766+ key2_a = c_map._root_node._items['a'].key()
767+ key2_c = c_map._root_node._items['c'].key()
768+ c_map = chk_map.CHKMap(self.get_chk_bytes(), key1,
769+ chk_map._search_key_plain)
770+ c_map.map(('ccc',), 'new ccc content')
771+ key3 = c_map._save()
772+ key3_a = c_map._root_node._items['a'].key()
773+ key3_c = c_map._root_node._items['c'].key()
774+ diff = self.get_difference([key2, key3], [key1],
775+ chk_map._search_key_plain)
776+ root_results = [record.key for record in diff._read_all_roots()]
777+ self.assertEqual(sorted([key2, key3]), sorted(root_results))
778+ # We should have queued up key2_a, and key3_c, but not key2_c or key3_c
779+ self.assertEqual([key2_a, key3_c], diff._new_queue)
780+ self.assertEqual([], diff._new_item_queue)
781+ # And we should have queued up both a and c for the old set
782+ self.assertEqual([key1_a, key1_c], diff._old_queue)
783+
784+ def test__read_all_roots_different_depths(self):
785+ c_map = self.make_two_deep_map(chk_map._search_key_plain)
786+ c_map._dump_tree() # load everything
787+ key1 = c_map.key()
788+ key1_a = c_map._root_node._items['a'].key()
789+ key1_c = c_map._root_node._items['c'].key()
790+ key1_d = c_map._root_node._items['d'].key()
791+
792+ c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
793+ c_map2._dump_tree()
794+ key2 = c_map2.key()
795+ key2_aa = c_map2._root_node._items['aa'].key()
796+ key2_ad = c_map2._root_node._items['ad'].key()
797+
798+ diff = self.get_difference([key2], [key1], chk_map._search_key_plain)
799+ root_results = [record.key for record in diff._read_all_roots()]
800+ self.assertEqual([key2], root_results)
801+ # Only the 'a' subset should be queued up, since 'c' and 'd' cannot be
802+ # present
803+ self.assertEqual([key1_a], diff._old_queue)
804+ self.assertEqual([key2_aa, key2_ad], diff._new_queue)
805+ self.assertEqual([], diff._new_item_queue)
806+
807+ diff = self.get_difference([key1], [key2], chk_map._search_key_plain)
808+ root_results = [record.key for record in diff._read_all_roots()]
809+ self.assertEqual([key1], root_results)
810+
811+ self.assertEqual([key2_aa, key2_ad], diff._old_queue)
812+ self.assertEqual([key1_a, key1_c, key1_d], diff._new_queue)
813+ self.assertEqual([], diff._new_item_queue)
814+
815+ def test__read_all_roots_different_depths_16(self):
816+ c_map = self.make_two_deep_map(chk_map._search_key_16)
817+ c_map._dump_tree() # load everything
818+ key1 = c_map.key()
819+ key1_2 = c_map._root_node._items['2'].key()
820+ key1_4 = c_map._root_node._items['4'].key()
821+ key1_C = c_map._root_node._items['C'].key()
822+ key1_F = c_map._root_node._items['F'].key()
823+
824+ c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_16)
825+ c_map2._dump_tree()
826+ key2 = c_map2.key()
827+ key2_F0 = c_map2._root_node._items['F0'].key()
828+ key2_F3 = c_map2._root_node._items['F3'].key()
829+ key2_F4 = c_map2._root_node._items['F4'].key()
830+ key2_FD = c_map2._root_node._items['FD'].key()
831+
832+ diff = self.get_difference([key2], [key1], chk_map._search_key_16)
833+ root_results = [record.key for record in diff._read_all_roots()]
834+ self.assertEqual([key2], root_results)
835+ # Only the subset of keys that may be present should be queued up.
836+ self.assertEqual([key1_F], diff._old_queue)
837+ self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
838+ sorted(diff._new_queue))
839+ self.assertEqual([], diff._new_item_queue)
840+
841+ diff = self.get_difference([key1], [key2], chk_map._search_key_16)
842+ root_results = [record.key for record in diff._read_all_roots()]
843+ self.assertEqual([key1], root_results)
844+
845+ self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
846+ sorted(diff._old_queue))
847+ self.assertEqual(sorted([key1_2, key1_4, key1_C, key1_F]),
848+ sorted(diff._new_queue))
849+ self.assertEqual([], diff._new_item_queue)
850+
851+ def test__read_all_roots_mixed_depth(self):
852+ c_map = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
853+ c_map._dump_tree() # load everything
854+ key1 = c_map.key()
855+ key1_aa = c_map._root_node._items['aa'].key()
856+ key1_ad = c_map._root_node._items['ad'].key()
857+
858+ c_map2 = self.make_one_deep_one_prefix_map(chk_map._search_key_plain)
859+ c_map2._dump_tree()
860+ key2 = c_map2.key()
861+ key2_a = c_map2._root_node._items['a'].key()
862+ key2_b = c_map2._root_node._items['b'].key()
863+
864+ diff = self.get_difference([key2], [key1], chk_map._search_key_plain)
865+ root_results = [record.key for record in diff._read_all_roots()]
866+ self.assertEqual([key2], root_results)
867+ # 'ad' matches exactly 'a' on the other side, so it should be removed,
868+ # and neither side should have it queued for walking
869+ self.assertEqual([], diff._old_queue)
870+ self.assertEqual([key2_b], diff._new_queue)
871+ self.assertEqual([], diff._new_item_queue)
872+
873+ diff = self.get_difference([key1], [key2], chk_map._search_key_plain)
874+ root_results = [record.key for record in diff._read_all_roots()]
875+ self.assertEqual([key1], root_results)
876+ # Note: This is technically not the 'true minimal' set that we could
877+ # use The reason is that 'a' was matched exactly to 'ad' (by sha
878+ # sum). However, the code gets complicated in the case of more
879+ # than one interesting key, so for now, we live with this
880+ # Consider revising, though benchmarking showing it to be a
881+ # real-world issue should be done
882+ self.assertEqual([key2_a], diff._old_queue)
883+ # self.assertEqual([], diff._old_queue)
884+ self.assertEqual([key1_aa], diff._new_queue)
885+ self.assertEqual([], diff._new_item_queue)
886+
887+ def test__read_all_roots_yields_extra_deep_records(self):
888+ # This is slightly controversial, as we will yield a chk page that we
889+ # might later on find out could be filtered out. (If a root node is
890+ # referenced deeper in the old set.)
891+ # However, even with stacking, we always have all chk pages that we
892+ # will need. So as long as we filter out the referenced keys, we'll
893+ # never run into problems.
894+ # This allows us to yield a root node record immediately, without any
895+ # buffering.
896+ c_map = self.make_two_deep_map(chk_map._search_key_plain)
897+ c_map._dump_tree() # load all keys
898+ key1 = c_map.key()
899+ key1_a = c_map._root_node._items['a'].key()
900+ c_map2 = self.get_map({
901+ ('acc',): 'initial acc content',
902+ ('ace',): 'initial ace content',
903+ }, maximum_size=100)
904+ self.assertEqualDiff(
905+ "'' LeafNode\n"
906+ " ('acc',) 'initial acc content'\n"
907+ " ('ace',) 'initial ace content'\n",
908+ c_map2._dump_tree())
909+ key2 = c_map2.key()
910+ diff = self.get_difference([key2], [key1], chk_map._search_key_plain)
911+ root_results = [record.key for record in diff._read_all_roots()]
912+ self.assertEqual([key2], root_results)
913+ # However, even though we have yielded the root node to be fetched,
914+ # we should have enqued all of the chk pages to be walked, so that we
915+ # can find the keys if they are present
916+ self.assertEqual([key1_a], diff._old_queue)
917+ self.assertEqual([(('acc',), 'initial acc content'),
918+ (('ace',), 'initial ace content'),
919+ ], diff._new_item_queue)
920+
921+ def test__read_all_roots_multiple_targets(self):
922+ c_map = self.make_root_only_map()
923+ key1 = c_map.key()
924+ c_map = self.make_one_deep_map()
925+ key2 = c_map.key()
926+ c_map._dump_tree()
927+ key2_c = c_map._root_node._items['c'].key()
928+ key2_d = c_map._root_node._items['d'].key()
929+ c_map.map(('ccc',), 'new ccc value')
930+ key3 = c_map._save()
931+ key3_c = c_map._root_node._items['c'].key()
932+ diff = self.get_difference([key2, key3], [key1],
933+ chk_map._search_key_plain)
934+ root_results = [record.key for record in diff._read_all_roots()]
935+ self.assertEqual(sorted([key2, key3]), sorted(root_results))
936+ self.assertEqual([], diff._old_queue)
937+ # the key 'd' is interesting from key2 and key3, but should only be
938+ # entered into the queue 1 time
939+ self.assertEqual(sorted([key2_c, key3_c, key2_d]),
940+ sorted(diff._new_queue))
941+ self.assertEqual([], diff._new_item_queue)
942+
943+ def test__read_all_roots_no_old(self):
944+ # This is the 'initial branch' case. With nothing in the old
945+ # set, we can just queue up all root nodes into interesting queue, and
946+ # then have them fast-path flushed via _flush_new_queue
947+ c_map = self.make_two_deep_map()
948+ key1 = c_map.key()
949+ diff = self.get_difference([key1], [], chk_map._search_key_plain)
950+ root_results = [record.key for record in diff._read_all_roots()]
951+ self.assertEqual([], root_results)
952+ self.assertEqual([], diff._old_queue)
953+ self.assertEqual([key1], diff._new_queue)
954+ self.assertEqual([], diff._new_item_queue)
955+
956+ c_map2 = self.make_one_deep_map()
957+ key2 = c_map2.key()
958+ diff = self.get_difference([key1, key2], [], chk_map._search_key_plain)
959+ root_results = [record.key for record in diff._read_all_roots()]
960+ self.assertEqual([], root_results)
961+ self.assertEqual([], diff._old_queue)
962+ self.assertEqual(sorted([key1, key2]), sorted(diff._new_queue))
963+ self.assertEqual([], diff._new_item_queue)
964+
965+ def test__read_all_roots_no_old_16(self):
966+ c_map = self.make_two_deep_map(chk_map._search_key_16)
967+ key1 = c_map.key()
968+ diff = self.get_difference([key1], [], chk_map._search_key_16)
969+ root_results = [record.key for record in diff._read_all_roots()]
970+ self.assertEqual([], root_results)
971+ self.assertEqual([], diff._old_queue)
972+ self.assertEqual([key1], diff._new_queue)
973+ self.assertEqual([], diff._new_item_queue)
974+
975+ c_map2 = self.make_one_deep_map(chk_map._search_key_16)
976+ key2 = c_map2.key()
977+ diff = self.get_difference([key1, key2], [],
978+ chk_map._search_key_16)
979+ root_results = [record.key for record in diff._read_all_roots()]
980+ self.assertEqual([], root_results)
981+ self.assertEqual([], diff._old_queue)
982+ self.assertEqual(sorted([key1, key2]),
983+ sorted(diff._new_queue))
984+ self.assertEqual([], diff._new_item_queue)
985+
986+ def test__read_all_roots_multiple_old(self):
987+ c_map = self.make_two_deep_map()
988+ key1 = c_map.key()
989+ c_map._dump_tree() # load everything
990+ key1_a = c_map._root_node._items['a'].key()
991+ c_map.map(('ccc',), 'new ccc value')
992+ key2 = c_map._save()
993+ key2_a = c_map._root_node._items['a'].key()
994+ c_map.map(('add',), 'new add value')
995+ key3 = c_map._save()
996+ key3_a = c_map._root_node._items['a'].key()
997+ diff = self.get_difference([key3], [key1, key2],
998+ chk_map._search_key_plain)
999+ root_results = [record.key for record in diff._read_all_roots()]
1000+ self.assertEqual([key3], root_results)
1001+ # the 'a' keys should not be queued up 2 times, since they are
1002+ # identical
1003+ self.assertEqual([key1_a], diff._old_queue)
1004+ self.assertEqual([key3_a], diff._new_queue)
1005+ self.assertEqual([], diff._new_item_queue)
1006+
1007+ def test__process_next_old_batched_no_dupes(self):
1008+ c_map = self.make_two_deep_map()
1009+ key1 = c_map.key()
1010+ c_map._dump_tree() # load everything
1011+ key1_a = c_map._root_node._items['a'].key()
1012+ key1_aa = c_map._root_node._items['a']._items['aa'].key()
1013+ key1_ab = c_map._root_node._items['a']._items['ab'].key()
1014+ key1_ac = c_map._root_node._items['a']._items['ac'].key()
1015+ key1_ad = c_map._root_node._items['a']._items['ad'].key()
1016+ c_map.map(('aaa',), 'new aaa value')
1017+ key2 = c_map._save()
1018+ key2_a = c_map._root_node._items['a'].key()
1019+ key2_aa = c_map._root_node._items['a']._items['aa'].key()
1020+ c_map.map(('acc',), 'new acc content')
1021+ key3 = c_map._save()
1022+ key3_a = c_map._root_node._items['a'].key()
1023+ key3_ac = c_map._root_node._items['a']._items['ac'].key()
1024+ diff = self.get_difference([key3], [key1, key2],
1025+ chk_map._search_key_plain)
1026+ root_results = [record.key for record in diff._read_all_roots()]
1027+ self.assertEqual([key3], root_results)
1028+ self.assertEqual(sorted([key1_a, key2_a]),
1029+ sorted(diff._old_queue))
1030+ self.assertEqual([key3_a], diff._new_queue)
1031+ self.assertEqual([], diff._new_item_queue)
1032+ diff._process_next_old()
1033+ # All of the old records should be brought in and queued up,
1034+ # but we should not have any duplicates
1035+ self.assertEqual(sorted([key1_aa, key1_ab, key1_ac, key1_ad, key2_aa]),
1036+ sorted(diff._old_queue))
1037+
1038+
1039+class TestIterInterestingNodes(TestCaseWithExampleMaps):
1040
1041 def get_map_key(self, a_dict, maximum_size=10):
1042- c_map = self._get_map(a_dict, maximum_size=maximum_size,
1043- chk_bytes=self.get_chk_bytes())
1044+ c_map = self.get_map(a_dict, maximum_size=maximum_size)
1045 return c_map.key()
1046
1047- def assertIterInteresting(self, expected, interesting_keys,
1048- uninteresting_keys):
1049+ def assertIterInteresting(self, records, items, interesting_keys,
1050+ old_keys):
1051 """Check the result of iter_interesting_nodes.
1052
1053- :param expected: A list of (record_keys, interesting_chk_pages,
1054- interesting key value pairs)
1055+ Note that we no longer care how many steps are taken, etc, just that
1056+ the right contents are returned.
1057+
1058+ :param records: A list of record keys that should be yielded
1059+ :param items: A list of items (key,value) that should be yielded.
1060 """
1061 store = self.get_chk_bytes()
1062+ store._search_key_func = chk_map._search_key_plain
1063 iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys,
1064- uninteresting_keys)
1065- nodes = list(iter_nodes)
1066- for count, (exp, act) in enumerate(izip(expected, nodes)):
1067- exp_record, exp_items = exp
1068- record, items = act
1069- exp_tuple = (exp_record, sorted(exp_items))
1070- if record is None:
1071- act_tuple = (None, sorted(items))
1072- else:
1073- act_tuple = (record.key, sorted(items))
1074- self.assertEqual(exp_tuple, act_tuple,
1075- 'entry %d did not match expected' % count)
1076- self.assertEqual(len(expected), len(nodes))
1077+ old_keys)
1078+ record_keys = []
1079+ all_items = []
1080+ for record, new_items in iter_nodes:
1081+ if record is not None:
1082+ record_keys.append(record.key)
1083+ if new_items:
1084+ all_items.extend(new_items)
1085+ self.assertEqual(sorted(records), sorted(record_keys))
1086+ self.assertEqual(sorted(items), sorted(all_items))
1087
1088 def test_empty_to_one_keys(self):
1089 target = self.get_map_key({('a',): 'content'})
1090- self.assertIterInteresting(
1091- [(target, [(('a',), 'content')]),
1092- ], [target], [])
1093+ self.assertIterInteresting([target],
1094+ [(('a',), 'content')],
1095+ [target], [])
1096
1097 def test_none_to_one_key(self):
1098 basis = self.get_map_key({})
1099 target = self.get_map_key({('a',): 'content'})
1100- self.assertIterInteresting(
1101- [(None, [(('a',), 'content')]),
1102- (target, []),
1103- ], [target], [basis])
1104+ self.assertIterInteresting([target],
1105+ [(('a',), 'content')],
1106+ [target], [basis])
1107
1108 def test_one_to_none_key(self):
1109 basis = self.get_map_key({('a',): 'content'})
1110 target = self.get_map_key({})
1111- self.assertIterInteresting(
1112- [(target, [])],
1113- [target], [basis])
1114+ self.assertIterInteresting([target],
1115+ [],
1116+ [target], [basis])
1117
1118 def test_common_pages(self):
1119 basis = self.get_map_key({('a',): 'content',
1120@@ -1964,10 +2546,9 @@
1121 target_map._dump_tree())
1122 b_key = target_map._root_node._items['b'].key()
1123 # This should return the root node, and the node for the 'b' key
1124- self.assertIterInteresting(
1125- [(target, []),
1126- (b_key, [(('b',), 'other content')])],
1127- [target], [basis])
1128+ self.assertIterInteresting([target, b_key],
1129+ [(('b',), 'other content')],
1130+ [target], [basis])
1131
1132 def test_common_sub_page(self):
1133 basis = self.get_map_key({('aaa',): 'common',
1134@@ -1991,12 +2572,11 @@
1135 # The key for the internal aa node
1136 a_key = target_map._root_node._items['a'].key()
1137 # The key for the leaf aab node
1138+ # aaa_key = target_map._root_node._items['a']._items['aaa'].key()
1139 aab_key = target_map._root_node._items['a']._items['aab'].key()
1140- self.assertIterInteresting(
1141- [(target, []),
1142- (a_key, []),
1143- (aab_key, [(('aab',), 'new')])],
1144- [target], [basis])
1145+ self.assertIterInteresting([target, a_key, aab_key],
1146+ [(('aab',), 'new')],
1147+ [target], [basis])
1148
1149 def test_common_leaf(self):
1150 basis = self.get_map_key({})
1151@@ -2040,29 +2620,22 @@
1152 a_key = target3_map._root_node._items['a'].key()
1153 aac_key = target3_map._root_node._items['a']._items['aac'].key()
1154 self.assertIterInteresting(
1155- [(None, [(('aaa',), 'common')]),
1156- (target1, []),
1157- (target2, []),
1158- (target3, []),
1159- (b_key, [(('bbb',), 'new')]),
1160- (a_key, []),
1161- (aac_key, [(('aac',), 'other')]),
1162- ], [target1, target2, target3], [basis])
1163-
1164- self.assertIterInteresting(
1165- [(target2, []),
1166- (target3, []),
1167- (b_key, [(('bbb',), 'new')]),
1168- (a_key, []),
1169- (aac_key, [(('aac',), 'other')]),
1170- ], [target2, target3], [target1])
1171-
1172- # This may be a case that we relax. A root node is a deep child of the
1173- # excluded set. The cost is buffering root nodes until we have
1174- # determined all possible exclusions. (Because a prefix of '', cannot
1175- # be excluded.)
1176- self.assertIterInteresting(
1177- [], [target1], [target3])
1178+ [target1, target2, target3, a_key, aac_key, b_key],
1179+ [(('aaa',), 'common'), (('bbb',), 'new'), (('aac',), 'other')],
1180+ [target1, target2, target3], [basis])
1181+
1182+ self.assertIterInteresting(
1183+ [target2, target3, a_key, aac_key, b_key],
1184+ [(('bbb',), 'new'), (('aac',), 'other')],
1185+ [target2, target3], [target1])
1186+
1187+ # Technically, target1 could be filtered out, but since it is a root
1188+ # node, we yield it immediately, rather than waiting to find out much
1189+ # later on.
1190+ self.assertIterInteresting(
1191+ [target1],
1192+ [],
1193+ [target1], [target3])
1194
1195 def test_multiple_maps(self):
1196 basis1 = self.get_map_key({('aaa',): 'common',
1197@@ -2111,18 +2684,14 @@
1198 # The key for the leaf bba node
1199 bba_key = target2_map._root_node._items['b']._items['bba'].key()
1200 self.assertIterInteresting(
1201- [(target1, []),
1202- (target2, []),
1203- (a_key, []),
1204- (b_key, []),
1205- (aac_key, [(('aac',), 'target1')]),
1206- (bba_key, [(('bba',), 'target2')]),
1207- ], [target1, target2], [basis1, basis2])
1208+ [target1, target2, a_key, aac_key, b_key, bba_key],
1209+ [(('aac',), 'target1'), (('bba',), 'target2')],
1210+ [target1, target2], [basis1, basis2])
1211
1212 def test_multiple_maps_overlapping_common_new(self):
1213 # Test that when a node found through the interesting_keys iteration
1214- # for *some roots* and also via the uninteresting keys iteration, that
1215- # it is still scanned for uninteresting refs and items, because its
1216+ # for *some roots* and also via the old keys iteration, that
1217+ # it is still scanned for old refs and items, because its
1218 # not truely new. This requires 2 levels of InternalNodes to expose,
1219 # because of the way the bootstrap in _find_children_info works.
1220 # This suggests that the code is probably amenable to/benefit from
1221@@ -2188,17 +2757,10 @@
1222 right_map._dump_tree())
1223 # Keys from the right side target - none, the root is enough.
1224 # Test behaviour
1225- self.expectFailure("we don't properly filter different depths",
1226- self.assertIterInteresting,
1227- [(left, []),
1228- (right, []),
1229- (l_d_key, [(('ddd',), 'change')]),
1230- ], [left, right], [basis])
1231 self.assertIterInteresting(
1232- [(left, []),
1233- (right, []),
1234- (l_d_key, [(('ddd',), 'change')]),
1235- ], [left, right], [basis])
1236+ [right, left, l_d_key],
1237+ [(('ddd',), 'change')],
1238+ [left, right], [basis])
1239
1240 def test_multiple_maps_similar(self):
1241 # We want to have a depth=2 tree, with multiple entries in each leaf
1242@@ -2259,8 +2821,6 @@
1243 r_a_key = right_map._root_node._items['a'].key()
1244 r_c_key = right_map._root_node._items['c'].key()
1245 self.assertIterInteresting(
1246- [(left, []),
1247- (right, []),
1248- (l_a_key, [(('abb',), 'changed left')]),
1249- (r_c_key, [(('cbb',), 'changed right')]),
1250- ], [left, right], [basis])
1251+ [right, left, l_a_key, r_c_key],
1252+ [(('abb',), 'changed left'), (('cbb',), 'changed right')],
1253+ [left, right], [basis])