Merge lp:~jameinel/bzr/1.17-chk-multilevel into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Superseded
Proposed branch: lp:~jameinel/bzr/1.17-chk-multilevel
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1246 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-chk-multilevel
Reviewer: bzr-core (status: Pending)
Review via email: mp+8100@code.launchpad.net

This proposal has been superseded by a proposal from 2009-07-02.

John A Meinel (jameinel) wrote:

This is a slightly better fix for bug #390563. The main changes are:

1) Rewrite iter_interesting_nodes using a class, rather than maintaining state between functions
2) Split up the processing of root nodes into:
  a) Read all uninteresting roots, and find uninteresting chks and items
  b) Filter out any interesting root keys that were found in (a)
  c) Read all remaining root pages and yield them
  d) Find interesting references that are not in (a) (and interesting items)
     Note the search key prefixes for these references and items, and queue
     them up to be returned later.
  e) Filter out uninteresting chk pages based on the search key prefixes in (d)

(d) and (e) are the main differences.
For (d), Robert found an edge case where we could return a non-interesting item (which would, in turn, be treated as a text key, and would then cause a failure during fetch) when one side had an immediate leaf node and the other side had a deep tree. The only real change is to defer returning interesting *items* until all uninteresting pages have been read.

(e) Previously we used the 'intersection of interesting references', whereas now we filter simply based on prefix. This is likely to be a small win when there are lots of uninteresting pages (lots of merge parents that are not being transmitted). A sketch of this prefix filter follows the change list below.

3) After processing the root nodes:
  a) Process the uninteresting queue until completion
  b) Process the interesting queue, filtering out all uninteresting items.
     (Some minor tweaks here, like doing batched '.difference()' checks rather than a per-record membership test; see the second sketch below.)
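
To illustrate (2e), here is a rough standalone sketch of the prefix filter, modelled on the _enqueue_uninteresting step in the diff below. The function and parameter names are simplified stand-ins, not the patch's own:

def enqueue_old_refs(old_prefix_refs, new_prefixes, old_queue):
    """Queue old-side chk references for walking, skipping any whose
    search-key prefix cannot overlap a prefix still wanted on the new side.

    :param old_prefix_refs: iterable of (prefix, chk_key) pairs found under
        the old (uninteresting) roots.
    :param new_prefixes: prefixes referenced from the new (interesting)
        roots, expanded to include every shorter leading substring as well.
    :param old_queue: list to receive the chk keys that still need walking.
    """
    for prefix, ref in old_prefix_refs:
        # If no leading substring of this prefix is wanted on the new side,
        # nothing under this old page can hide something we are about to
        # yield, so the page never needs to be read.
        if not any(prefix[:i] in new_prefixes
                   for i in range(len(prefix), 0, -1)):
            continue
        old_queue.append(ref)

# With only 'a...' keys changed on the new side, the old 'c' page is skipped:
queue = []
enqueue_old_refs([('a', ('sha1:old-a',)), ('c', ('sha1:old-c',))],
                 new_prefixes=set(['a', 'ab']), old_queue=queue)
# queue == [('sha1:old-a',)]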

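And a sketch of the queue processing in (3), again with hypothetical names; read_nodes(keys) stands in for a get_record_stream()-based read that yields (record, items, child_refs) for each stored page:

def flush_new_queue(read_nodes, new_refs, all_old_chks, all_old_items):
    """Walk the queued new-side references, pruning against the old side
    with one set.difference() call per batch rather than testing each
    record as it is processed.
    """
    refs = set(new_refs).difference(all_old_chks)
    seen = set(refs)
    while refs:
        next_refs = set()
        for record, items, child_refs in read_nodes(refs):
            # Leaf items still get filtered one page at a time...
            yield record, [item for item in items
                           if item not in all_old_items]
            next_refs.update(child_refs)
        # ...but chk references are pruned once per batch:
        # small.difference(large) scales with the small set.
        next_refs = next_refs.difference(all_old_chks)
        next_refs = next_refs.difference(seen)
        seen.update(next_refs)
        refs = next_refs
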
I looked at trying to use a heapq to unify things more with 'iter_changes', but the constraints are *very* different.

1) CHKMap.iter_changes() is symmetric, in that it needs to return a record for (source, target, extra); iter_interesting_nodes *only* needs to return information for the 'target' side.
2) CHKMap.iter_changes() only ever affects 2 records, a single source and a single target. iter_interesting_nodes is looking at a many-to-many comparison.
3) CHKMap.iter_changes() only cares about the actual keys, and not about sending the raw byte records to be transmitted for fetch.
4) iter_interesting_nodes is much more concerned about batching, because of its many-to-many nature and the fact that it is used for fetch. We really don't want to be doing a round trip per page.
5) The only gain from a heapq would be if we could figure out an algorithm for filtering out nodes that we thought we would want to read, but actually don't need. However,
  a) I do this at the root level, because it is fairly straightforward. So any further improvements only affect trees that are deeper than 2 levels (1 root, 1+ internal, 1 leaf), which only happens for trees with > 50k items.
  b) Reading extra chk pages isn't "incorrect", just "inefficient" (the stacking code ensures all referenced chk pages are present in the stacked-on branch, just like we ensure compression texts.)
  c) The best I could come up with would be to split the reads up by prefix. So you would page in the 'a' references, then the 'b' references, etc. However, that forces us to split the requests up even further, rather than issuing larger batched requests. (more...)


Robert Collins (lifeless) wrote:

Shallow immediate review:

The docstring for the class, and perhaps the class name, could be better.

In terms of sets, it's:
interesting - uninteresting

So perhaps
CHKMapDifference

as a class name. And for the docstring, something like

"""Iterate the stored pages and key,value pairs for (new - old).

This class provides a generator over the stored CHK pages and the
key,value pairs that are in any of the new maps and not in any of the
old maps.
"""

-Rob

Preview Diff

=== modified file 'bzrlib/chk_map.py'
--- bzrlib/chk_map.py 2009-06-26 09:24:34 +0000
+++ bzrlib/chk_map.py 2009-07-01 22:35:14 +0000
@@ -1398,106 +1398,202 @@
1398 return node1398 return node
13991399
14001400
1401def _find_children_info(store, interesting_keys, uninteresting_keys, pb):1401class InterestingNodeIterator(object):
1402 """Read the associated records, and determine what is interesting."""1402 """Determine the nodes and items that are 'interesting'
1403 uninteresting_keys = set(uninteresting_keys)1403
1404 chks_to_read = uninteresting_keys.union(interesting_keys)1404 This is defined as being present in the interesting set, but not being
1405 next_uninteresting = set()1405 present in the uninteresting set.
1406 next_interesting = set()1406 """
1407 next_interesting_intersection = None1407
1408 uninteresting_items = set()1408 def __init__(self, store, interesting_root_keys, uninteresting_root_keys,
1409 interesting_items = set()1409 search_key_func, pb=None):
1410 interesting_to_yield = []1410 self._store = store
1411 for record in store.get_record_stream(chks_to_read, 'unordered', True):1411 self._interesting_root_keys = interesting_root_keys
1412 # records_read.add(record.key())1412 self._uninteresting_root_keys = uninteresting_root_keys
1413 if pb is not None:1413 self._pb = pb
1414 pb.tick()1414 # All uninteresting chks that we have seen. By the time they are added
1415 bytes = record.get_bytes_as('fulltext')1415 # here, they should be either fully ignored, or queued up for
1416 # We don't care about search_key_func for this code, because we only1416 # processing
1417 # care about external references.1417 self._all_uninteresting_chks = set(self._uninteresting_root_keys)
1418 node = _deserialise(bytes, record.key, search_key_func=None)1418 # All items that we have seen from the uninteresting_root_keys
1419 if record.key in uninteresting_keys:1419 self._all_uninteresting_items = set()
1420 if type(node) is InternalNode:1420 # These are interesting items which were either read, or already in the
1421 next_uninteresting.update(node.refs())1421 # interesting queue (so we don't need to walk them again)
1422 else:1422 self._processed_interesting_refs = set()
1423 # We know we are at a LeafNode, so we can pass None for the1423 self._search_key_func = search_key_func
1424 # store1424
1425 uninteresting_items.update(node.iteritems(None))1425 # The uninteresting and interesting nodes to be searched
1426 else:1426 self._uninteresting_queue = []
1427 interesting_to_yield.append(record.key)1427 self._interesting_queue = []
1428 if type(node) is InternalNode:1428 # Holds the (key, value) items found when processing the root nodes,
1429 if next_interesting_intersection is None:1429 # waiting for the uninteresting nodes to be walked
1430 next_interesting_intersection = set(node.refs())1430 self._interesting_item_queue = []
1431 else:1431 self._state = None
1432 next_interesting_intersection = \1432
1433 next_interesting_intersection.intersection(node.refs())1433 def _read_nodes_from_store(self, keys):
1434 next_interesting.update(node.refs())1434 # We chose not to use _page_cache, because we think in terms of records
1435 else:1435 # to be yielded. Also, we expect to touch each page only 1 time during
1436 interesting_items.update(node.iteritems(None))1436 # this code. (We may want to evaluate saving the raw bytes into the
1437 return (next_uninteresting, uninteresting_items,1437 # page cache, which would allow a working tree update after the fetch
1438 next_interesting, interesting_to_yield, interesting_items,1438 # to not have to read the bytes again.)
1439 next_interesting_intersection)1439 stream = self._store.get_record_stream(keys, 'unordered', True)
14401440 for record in stream:
14411441 if self._pb is not None:
1442def _find_all_uninteresting(store, interesting_root_keys,1442 self._pb.tick()
1443 uninteresting_root_keys, pb):1443 if record.storage_kind == 'absent':
1444 """Determine the full set of uninteresting keys."""1444 raise errors.NoSuchRevision(self._store, record.key)
1445 # What about duplicates between interesting_root_keys and
1446 # uninteresting_root_keys?
1447 if not uninteresting_root_keys:
1448 # Shortcut case. We know there is nothing uninteresting to filter out
1449 # So we just let the rest of the algorithm do the work
1450 # We know there is nothing uninteresting, and we didn't have to read
1451 # any interesting records yet.
1452 return (set(), set(), set(interesting_root_keys), [], set())
1453 all_uninteresting_chks = set(uninteresting_root_keys)
1454 all_uninteresting_items = set()
1455
1456 # First step, find the direct children of both the interesting and
1457 # uninteresting set
1458 (uninteresting_keys, uninteresting_items,
1459 interesting_keys, interesting_to_yield,
1460 interesting_items, interesting_intersection,
1461 ) = _find_children_info(store, interesting_root_keys,
1462 uninteresting_root_keys,
1463 pb=pb)
1464 all_uninteresting_chks.update(uninteresting_keys)
1465 all_uninteresting_items.update(uninteresting_items)
1466 del uninteresting_items
1467 # Do not examine in detail pages common to all interesting trees.
1468 # Pages that are common to all interesting trees will have their
1469 # older versions found via the uninteresting tree traversal. Some pages
1470 # found via the interesting trees traversal will be uninteresting for
1471 # other of the interesting trees, which is why we require the pages to be
1472 # common for us to trim them.
1473 if interesting_intersection is not None:
1474 uninteresting_keys.difference_update(interesting_intersection)
1475
1476 # Second, find the full set of uninteresting bits reachable by the
1477 # uninteresting roots
1478 chks_to_read = uninteresting_keys
1479 while chks_to_read:
1480 next_chks = set()
1481 for record in store.get_record_stream(chks_to_read, 'unordered', False):
1482 # TODO: Handle 'absent'
1483 if pb is not None:
1484 pb.tick()
1485 bytes = record.get_bytes_as('fulltext')1445 bytes = record.get_bytes_as('fulltext')
1486 # We don't care about search_key_func for this code, because we1446 node = _deserialise(bytes, record.key,
1487 # only care about external references.1447 search_key_func=self._search_key_func)
1488 node = _deserialise(bytes, record.key, search_key_func=None)
1489 if type(node) is InternalNode:1448 if type(node) is InternalNode:
1490 # uninteresting_prefix_chks.update(node._items.iteritems())1449 # Note we don't have to do node.refs() because we know that
1491 chks = node._items.values()1450 # there are no children that have been pushed into this node
1492 # TODO: We remove the entries that are already in1451 prefix_refs = node._items.items()
1493 # uninteresting_chks ?1452 items = []
1494 next_chks.update(chks)
1495 all_uninteresting_chks.update(chks)
1496 else:1453 else:
1497 all_uninteresting_items.update(node._items.iteritems())1454 prefix_refs = []
1498 chks_to_read = next_chks1455 items = node._items.items()
1499 return (all_uninteresting_chks, all_uninteresting_items,1456 yield record, node, prefix_refs, items
1500 interesting_keys, interesting_to_yield, interesting_items)1457
1458 def _read_uninteresting_roots(self):
1459 uninteresting_chks_to_enqueue = []
1460 all_uninteresting_chks = self._all_uninteresting_chks
1461 for record, node, prefix_refs, items in \
1462 self._read_nodes_from_store(self._uninteresting_root_keys):
1463 # Uninteresting node
1464 prefix_refs = [p_r for p_r in prefix_refs
1465 if p_r[1] not in all_uninteresting_chks]
1466 new_refs = [p_r[1] for p_r in prefix_refs]
1467 all_uninteresting_chks.update(new_refs)
1468 self._all_uninteresting_items.update(items)
1469 # Queue up the uninteresting references
1470 # Don't actually put them in the 'to-read' queue until we have
1471 # finished checking the interesting references
1472 uninteresting_chks_to_enqueue.extend(prefix_refs)
1473 return uninteresting_chks_to_enqueue
1474
1475 def _enqueue_uninteresting(self, interesting_prefixes,
1476 uninteresting_chks_to_enqueue):
1477 # At this point, we have read all the uninteresting and interesting
1478 # items, so we can queue up the uninteresting stuff, knowing that we've
1479 # handled the interesting ones
1480 for prefix, ref in uninteresting_chks_to_enqueue:
1481 not_interesting = True
1482 for i in xrange(len(prefix), 0, -1):
1483 if prefix[:i] in interesting_prefixes:
1484 not_interesting = False
1485 break
1486 if not_interesting:
1487 # This prefix is not part of the remaining 'interesting set'
1488 continue
1489 self._uninteresting_queue.append(ref)
1490
1491 def _read_all_roots(self):
1492 """Read the root pages.
1493
1494 This is structured as a generator, so that the root records can be
1495 yielded up to whoever needs them without any buffering.
1496 """
1497 # This is the bootstrap phase
1498 if not self._uninteresting_root_keys:
1499 # With no uninteresting_root_keys we can just shortcut and be ready
1500 # for _flush_interesting_queue
1501 self._interesting_queue = list(self._interesting_root_keys)
1502 return
1503 uninteresting_chks_to_enqueue = self._read_uninteresting_roots()
1504 # filter out any root keys that are already known to be uninteresting
1505 interesting_keys = set(self._interesting_root_keys).difference(
1506 self._all_uninteresting_chks)
1507 # These are prefixes that are present in interesting_keys that we are
1508 # thinking to yield
1509 interesting_prefixes = set()
1510 # We are about to yield all of these, so we don't want them getting
1511 # added a second time
1512 processed_interesting_refs = self._processed_interesting_refs
1513 processed_interesting_refs.update(interesting_keys)
1514 for record, node, prefix_refs, items in \
1515 self._read_nodes_from_store(interesting_keys):
1516 # At this level, we now know all the uninteresting references
1517 # So we filter and queue up whatever is remaining
1518 prefix_refs = [p_r for p_r in prefix_refs
1519 if p_r[1] not in self._all_uninteresting_chks
1520 and p_r[1] not in processed_interesting_refs]
1521 refs = [p_r[1] for p_r in prefix_refs]
1522 interesting_prefixes.update([p_r[0] for p_r in prefix_refs])
1523 self._interesting_queue.extend(refs)
1524 # TODO: We can potentially get multiple items here, however the
1525 # current design allows for this, as callers will do the work
1526 # to make the results unique. We might profile whether we
1527 # gain anything by ensuring unique return values for items
1528 interesting_items = [item for item in items
1529 if item not in self._all_uninteresting_items]
1530 self._interesting_item_queue.extend(interesting_items)
1531 interesting_prefixes.update([self._search_key_func(item[0])
1532 for item in interesting_items])
1533 processed_interesting_refs.update(refs)
1534 yield record
1535 # For interesting_prefixes we have the full length prefixes queued up.
1536 # However, we also need possible prefixes. (If we have a known ref to
1537 # 'ab', then we also need to include 'a'.) So expand the
1538 # interesting_prefixes to include all shorter prefixes
1539 for prefix in list(interesting_prefixes):
1540 interesting_prefixes.update([prefix[:i]
1541 for i in xrange(1, len(prefix))])
1542 self._enqueue_uninteresting(interesting_prefixes,
1543 uninteresting_chks_to_enqueue)
1544
1545 def _flush_interesting_queue(self):
1546 # No need to maintain the heap invariant anymore, just pull things out
1547 # and process them
1548 refs = set(self._interesting_queue)
1549 self._interesting_queue = []
1550 # First pass, flush all interesting items and convert to using direct refs
1551 all_uninteresting_chks = self._all_uninteresting_chks
1552 processed_interesting_refs = self._processed_interesting_refs
1553 all_uninteresting_items = self._all_uninteresting_items
1554 interesting_items = [item for item in self._interesting_item_queue
1555 if item not in all_uninteresting_items]
1556 self._interesting_item_queue = []
1557 if interesting_items:
1558 yield None, interesting_items
1559 refs = refs.difference(all_uninteresting_chks)
1560 while refs:
1561 next_refs = set()
1562 next_refs_update = next_refs.update
1563 # Inlining _read_nodes_from_store improves 'bzr branch bzr.dev'
1564 # from 1m54s to 1m51s. Consider it.
1565 for record, _, p_refs, items in self._read_nodes_from_store(refs):
1566 items = [item for item in items
1567 if item not in all_uninteresting_items]
1568 yield record, items
1569 next_refs_update([p_r[1] for p_r in p_refs])
1570 next_refs = next_refs.difference(all_uninteresting_chks)
1571 next_refs = next_refs.difference(processed_interesting_refs)
1572 processed_interesting_refs.update(next_refs)
1573 refs = next_refs
1574
1575 def _process_next_uninteresting(self):
1576 # Since we don't filter uninteresting any further than during
1577 # _read_all_roots, process the whole queue in a single pass.
1578 refs = self._uninteresting_queue
1579 self._uninteresting_queue = []
1580 all_uninteresting_chks = self._all_uninteresting_chks
1581 for record, _, prefix_refs, items in self._read_nodes_from_store(refs):
1582 self._all_uninteresting_items.update(items)
1583 refs = [r for _,r in prefix_refs if r not in all_uninteresting_chks]
1584 self._uninteresting_queue.extend(refs)
1585 all_uninteresting_chks.update(refs)
1586
1587 def _process_queues(self):
1588 while self._uninteresting_queue:
1589 self._process_next_uninteresting()
1590 return self._flush_interesting_queue()
1591
1592 def process(self):
1593 for record in self._read_all_roots():
1594 yield record, []
1595 for record, items in self._process_queues():
1596 yield record, items
15011597
15021598
1503def iter_interesting_nodes(store, interesting_root_keys,1599def iter_interesting_nodes(store, interesting_root_keys,
@@ -1514,72 +1610,11 @@
1514 :return: Yield1610 :return: Yield
1515 (interesting record, {interesting key:values})1611 (interesting record, {interesting key:values})
1516 """1612 """
1517 # TODO: consider that it may be more memory efficient to use the 20-byte1613 iterator = InterestingNodeIterator(store, interesting_root_keys,
1518 # sha1 string, rather than tuples of hexidecimal sha1 strings.1614 uninteresting_root_keys,
1519 # TODO: Try to factor out a lot of the get_record_stream() calls into a1615 search_key_func=store._search_key_func,
1520 # helper function similar to _read_bytes. This function should be1616 pb=pb)
1521 # able to use nodes from the _page_cache as well as actually1617 return iterator.process()
1522 # requesting bytes from the store.
1523
1524 (all_uninteresting_chks, all_uninteresting_items, interesting_keys,
1525 interesting_to_yield, interesting_items) = _find_all_uninteresting(store,
1526 interesting_root_keys, uninteresting_root_keys, pb)
1527
1528 # Now that we know everything uninteresting, we can yield information from
1529 # our first request
1530 interesting_items.difference_update(all_uninteresting_items)
1531 interesting_to_yield = set(interesting_to_yield) - all_uninteresting_chks
1532 if interesting_items:
1533 yield None, interesting_items
1534 if interesting_to_yield:
1535 # We request these records again, rather than buffering the root
1536 # records, most likely they are still in the _group_cache anyway.
1537 for record in store.get_record_stream(interesting_to_yield,
1538 'unordered', False):
1539 yield record, []
1540 all_uninteresting_chks.update(interesting_to_yield)
1541 interesting_keys.difference_update(all_uninteresting_chks)
1542
1543 chks_to_read = interesting_keys
1544 counter = 0
1545 while chks_to_read:
1546 next_chks = set()
1547 for record in store.get_record_stream(chks_to_read, 'unordered', False):
1548 counter += 1
1549 if pb is not None:
1550 pb.update('find chk pages', counter)
1551 # TODO: Handle 'absent'?
1552 bytes = record.get_bytes_as('fulltext')
1553 # We don't care about search_key_func for this code, because we
1554 # only care about external references.
1555 node = _deserialise(bytes, record.key, search_key_func=None)
1556 if type(node) is InternalNode:
1557 # all_uninteresting_chks grows large, as it lists all nodes we
1558 # don't want to process (including already seen interesting
1559 # nodes).
1560 # small.difference_update(large) scales O(large), but
1561 # small.difference(large) scales O(small).
1562 # Also, we know we just _deserialised this node, so we can
1563 # access the dict directly.
1564 chks = set(node._items.itervalues()).difference(
1565 all_uninteresting_chks)
1566 # Is set() and .difference_update better than:
1567 # chks = [chk for chk in node.refs()
1568 # if chk not in all_uninteresting_chks]
1569 next_chks.update(chks)
1570 # These are now uninteresting everywhere else
1571 all_uninteresting_chks.update(chks)
1572 interesting_items = []
1573 else:
1574 interesting_items = [item for item in node._items.iteritems()
1575 if item not in all_uninteresting_items]
1576 # TODO: Do we need to filter out items that we have already
1577 # seen on other pages? We don't really want to buffer the
1578 # whole thing, but it does mean that callers need to
1579 # understand they may get duplicate values.
1580 # all_uninteresting_items.update(interesting_items)
1581 yield record, interesting_items
1582 chks_to_read = next_chks
15831618
15841619
1585try:1620try:
15861621
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-06-29 14:51:13 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-07-01 22:35:14 +0000
@@ -652,6 +652,9 @@
652 parents=False, is_locked=self.is_locked,652 parents=False, is_locked=self.is_locked,
653 inconsistency_fatal=False),653 inconsistency_fatal=False),
654 access=self._pack_collection.chk_index.data_access)654 access=self._pack_collection.chk_index.data_access)
655 search_key_name = self._format._serializer.search_key_name
656 search_key_func = chk_map.search_key_registry.get(search_key_name)
657 self.chk_bytes._search_key_func = search_key_func
655 # True when the repository object is 'write locked' (as opposed to the658 # True when the repository object is 'write locked' (as opposed to the
656 # physical lock only taken out around changes to the pack-names list.)659 # physical lock only taken out around changes to the pack-names list.)
657 # Another way to represent this would be a decorator around the control660 # Another way to represent this would be a decorator around the control
658661
=== modified file 'bzrlib/tests/test_chk_map.py'
--- bzrlib/tests/test_chk_map.py 2009-06-26 09:24:34 +0000
+++ bzrlib/tests/test_chk_map.py 2009-07-01 22:35:14 +0000
@@ -20,6 +20,7 @@
2020
21from bzrlib import (21from bzrlib import (
22 chk_map,22 chk_map,
23 groupcompress,
23 osutils,24 osutils,
24 tests,25 tests,
25 )26 )
@@ -59,17 +60,14 @@
59 self.assertCommonPrefix('', '', '')60 self.assertCommonPrefix('', '', '')
6061
6162
62class TestCaseWithStore(tests.TestCaseWithTransport):63class TestCaseWithStore(tests.TestCaseWithMemoryTransport):
6364
64 def get_chk_bytes(self):65 def get_chk_bytes(self):
65 # The easiest way to get a CHK store is a development6 repository and66 # The easiest way to get a CHK store is a development6 repository and
66 # then work with the chk_bytes attribute directly.67 # then work with the chk_bytes attribute directly.
67 repo = self.make_repository(".", format="development6-rich-root")68 factory = groupcompress.make_pack_factory(False, False, 1)
68 repo.lock_write()69 self.chk_bytes = factory(self.get_transport())
69 self.addCleanup(repo.unlock)70 return self.chk_bytes
70 repo.start_write_group()
71 self.addCleanup(repo.abort_write_group)
72 return repo.chk_bytes
7371
74 def _get_map(self, a_dict, maximum_size=0, chk_bytes=None, key_width=1,72 def _get_map(self, a_dict, maximum_size=0, chk_bytes=None, key_width=1,
75 search_key_func=None):73 search_key_func=None):
@@ -97,6 +95,246 @@
97 return dict(node.iteritems(*args))95 return dict(node.iteritems(*args))
9896
9997
98class TestCaseWithExampleMaps(TestCaseWithStore):
99
100 def get_chk_bytes(self):
101 if getattr(self, '_chk_bytes', None) is None:
102 self._chk_bytes = super(TestCaseWithExampleMaps,
103 self).get_chk_bytes()
104 return self._chk_bytes
105
106 def get_map(self, a_dict, maximum_size=100, search_key_func=None):
107 c_map = self._get_map(a_dict, maximum_size=maximum_size,
108 chk_bytes=self.get_chk_bytes(),
109 search_key_func=search_key_func)
110 return c_map
111
112 def make_root_only_map(self, search_key_func=None):
113 return self.get_map({
114 ('aaa',): 'initial aaa content',
115 ('abb',): 'initial abb content',
116 }, search_key_func=search_key_func)
117
118 def make_root_only_aaa_ddd_map(self, search_key_func=None):
119 return self.get_map({
120 ('aaa',): 'initial aaa content',
121 ('ddd',): 'initial ddd content',
122 }, search_key_func=search_key_func)
123
124 def make_one_deep_map(self, search_key_func=None):
125 # Same as root_only_map, except it forces an InternalNode at the root
126 return self.get_map({
127 ('aaa',): 'initial aaa content',
128 ('abb',): 'initial abb content',
129 ('ccc',): 'initial ccc content',
130 ('ddd',): 'initial ddd content',
131 }, search_key_func=search_key_func)
132
133 def make_two_deep_map(self, search_key_func=None):
134 # Carefully chosen so that it creates a 2-deep map for both
135 # _search_key_plain and for _search_key_16
136 # Also so that things line up with make_one_deep_two_prefix_map
137 return self.get_map({
138 ('aaa',): 'initial aaa content',
139 ('abb',): 'initial abb content',
140 ('acc',): 'initial acc content',
141 ('ace',): 'initial ace content',
142 ('add',): 'initial add content',
143 ('adh',): 'initial adh content',
144 ('adl',): 'initial adl content',
145 ('ccc',): 'initial ccc content',
146 ('ddd',): 'initial ddd content',
147 }, search_key_func=search_key_func)
148
149 def make_one_deep_two_prefix_map(self, search_key_func=None):
150 """Create a map with one internal node, but references are extra long.
151
152 Otherwise has similar content to make_two_deep_map.
153 """
154 return self.get_map({
155 ('aaa',): 'initial aaa content',
156 ('add',): 'initial add content',
157 ('adh',): 'initial adh content',
158 ('adl',): 'initial adl content',
159 }, search_key_func=search_key_func)
160
161 def make_one_deep_one_prefix_map(self, search_key_func=None):
162 """Create a map with one internal node, but references are extra long.
163
164 Similar to make_one_deep_two_prefix_map, except the split is at the
165 first char, rather than the second.
166 """
167 return self.get_map({
168 ('add',): 'initial add content',
169 ('adh',): 'initial adh content',
170 ('adl',): 'initial adl content',
171 ('bbb',): 'initial bbb content',
172 }, search_key_func=search_key_func)
173
174
175class TestTestCaseWithExampleMaps(TestCaseWithExampleMaps):
176 """Actual tests for the provided examples."""
177
178 def test_root_only_map_plain(self):
179 c_map = self.make_root_only_map()
180 self.assertEqualDiff(
181 "'' LeafNode\n"
182 " ('aaa',) 'initial aaa content'\n"
183 " ('abb',) 'initial abb content'\n",
184 c_map._dump_tree())
185
186 def test_root_only_map_16(self):
187 c_map = self.make_root_only_map(search_key_func=chk_map._search_key_16)
188 self.assertEqualDiff(
189 "'' LeafNode\n"
190 " ('aaa',) 'initial aaa content'\n"
191 " ('abb',) 'initial abb content'\n",
192 c_map._dump_tree())
193
194 def test_one_deep_map_plain(self):
195 c_map = self.make_one_deep_map()
196 self.assertEqualDiff(
197 "'' InternalNode\n"
198 " 'a' LeafNode\n"
199 " ('aaa',) 'initial aaa content'\n"
200 " ('abb',) 'initial abb content'\n"
201 " 'c' LeafNode\n"
202 " ('ccc',) 'initial ccc content'\n"
203 " 'd' LeafNode\n"
204 " ('ddd',) 'initial ddd content'\n",
205 c_map._dump_tree())
206
207 def test_one_deep_map_16(self):
208 c_map = self.make_one_deep_map(search_key_func=chk_map._search_key_16)
209 self.assertEqualDiff(
210 "'' InternalNode\n"
211 " '2' LeafNode\n"
212 " ('ccc',) 'initial ccc content'\n"
213 " '4' LeafNode\n"
214 " ('abb',) 'initial abb content'\n"
215 " 'F' LeafNode\n"
216 " ('aaa',) 'initial aaa content'\n"
217 " ('ddd',) 'initial ddd content'\n",
218 c_map._dump_tree())
219
220 def test_root_only_aaa_ddd_plain(self):
221 c_map = self.make_root_only_aaa_ddd_map()
222 self.assertEqualDiff(
223 "'' LeafNode\n"
224 " ('aaa',) 'initial aaa content'\n"
225 " ('ddd',) 'initial ddd content'\n",
226 c_map._dump_tree())
227
228 def test_one_deep_map_16(self):
229 c_map = self.make_root_only_aaa_ddd_map(
230 search_key_func=chk_map._search_key_16)
231 # We use 'aaa' and 'ddd' because they happen to map to 'F' when using
232 # _search_key_16
233 self.assertEqualDiff(
234 "'' LeafNode\n"
235 " ('aaa',) 'initial aaa content'\n"
236 " ('ddd',) 'initial ddd content'\n",
237 c_map._dump_tree())
238
239 def test_two_deep_map_plain(self):
240 c_map = self.make_two_deep_map()
241 self.assertEqualDiff(
242 "'' InternalNode\n"
243 " 'a' InternalNode\n"
244 " 'aa' LeafNode\n"
245 " ('aaa',) 'initial aaa content'\n"
246 " 'ab' LeafNode\n"
247 " ('abb',) 'initial abb content'\n"
248 " 'ac' LeafNode\n"
249 " ('acc',) 'initial acc content'\n"
250 " ('ace',) 'initial ace content'\n"
251 " 'ad' LeafNode\n"
252 " ('add',) 'initial add content'\n"
253 " ('adh',) 'initial adh content'\n"
254 " ('adl',) 'initial adl content'\n"
255 " 'c' LeafNode\n"
256 " ('ccc',) 'initial ccc content'\n"
257 " 'd' LeafNode\n"
258 " ('ddd',) 'initial ddd content'\n",
259 c_map._dump_tree())
260
261 def test_two_deep_map_16(self):
262 c_map = self.make_two_deep_map(search_key_func=chk_map._search_key_16)
263 self.assertEqualDiff(
264 "'' InternalNode\n"
265 " '2' LeafNode\n"
266 " ('acc',) 'initial acc content'\n"
267 " ('ccc',) 'initial ccc content'\n"
268 " '4' LeafNode\n"
269 " ('abb',) 'initial abb content'\n"
270 " 'C' LeafNode\n"
271 " ('ace',) 'initial ace content'\n"
272 " 'F' InternalNode\n"
273 " 'F0' LeafNode\n"
274 " ('aaa',) 'initial aaa content'\n"
275 " 'F3' LeafNode\n"
276 " ('adl',) 'initial adl content'\n"
277 " 'F4' LeafNode\n"
278 " ('adh',) 'initial adh content'\n"
279 " 'FB' LeafNode\n"
280 " ('ddd',) 'initial ddd content'\n"
281 " 'FD' LeafNode\n"
282 " ('add',) 'initial add content'\n",
283 c_map._dump_tree())
284
285 def test_one_deep_two_prefix_map_plain(self):
286 c_map = self.make_one_deep_two_prefix_map()
287 self.assertEqualDiff(
288 "'' InternalNode\n"
289 " 'aa' LeafNode\n"
290 " ('aaa',) 'initial aaa content'\n"
291 " 'ad' LeafNode\n"
292 " ('add',) 'initial add content'\n"
293 " ('adh',) 'initial adh content'\n"
294 " ('adl',) 'initial adl content'\n",
295 c_map._dump_tree())
296
297 def test_one_deep_two_prefix_map_16(self):
298 c_map = self.make_one_deep_two_prefix_map(
299 search_key_func=chk_map._search_key_16)
300 self.assertEqualDiff(
301 "'' InternalNode\n"
302 " 'F0' LeafNode\n"
303 " ('aaa',) 'initial aaa content'\n"
304 " 'F3' LeafNode\n"
305 " ('adl',) 'initial adl content'\n"
306 " 'F4' LeafNode\n"
307 " ('adh',) 'initial adh content'\n"
308 " 'FD' LeafNode\n"
309 " ('add',) 'initial add content'\n",
310 c_map._dump_tree())
311
312 def test_one_deep_one_prefix_map_plain(self):
313 c_map = self.make_one_deep_one_prefix_map()
314 self.assertEqualDiff(
315 "'' InternalNode\n"
316 " 'a' LeafNode\n"
317 " ('add',) 'initial add content'\n"
318 " ('adh',) 'initial adh content'\n"
319 " ('adl',) 'initial adl content'\n"
320 " 'b' LeafNode\n"
321 " ('bbb',) 'initial bbb content'\n",
322 c_map._dump_tree())
323
324 def test_one_deep_one_prefix_map_16(self):
325 c_map = self.make_one_deep_one_prefix_map(
326 search_key_func=chk_map._search_key_16)
327 self.assertEqualDiff(
328 "'' InternalNode\n"
329 " '4' LeafNode\n"
330 " ('bbb',) 'initial bbb content'\n"
331 " 'F' LeafNode\n"
332 " ('add',) 'initial add content'\n"
333 " ('adh',) 'initial adh content'\n"
334 " ('adl',) 'initial adl content'\n",
335 c_map._dump_tree())
336
337
100class TestMap(TestCaseWithStore):338class TestMap(TestCaseWithStore):
101339
102 def assertHasABMap(self, chk_bytes):340 def assertHasABMap(self, chk_bytes):
@@ -1886,62 +2124,408 @@
1886# 1-4K get02124# 1-4K get0
18872125
18882126
1889class TestIterInterestingNodes(TestCaseWithStore):2127class TestInterestingNodeIterator(TestCaseWithExampleMaps):
18902128
1891 def get_chk_bytes(self):2129 def get_iterator(self, interesting_roots, uninteresting_roots,
1892 if getattr(self, '_chk_bytes', None) is None:2130 search_key_func=None):
1893 self._chk_bytes = super(TestIterInterestingNodes,2131 if search_key_func is None:
1894 self).get_chk_bytes()2132 search_key_func = chk_map._search_key_plain
1895 return self._chk_bytes2133 return chk_map.InterestingNodeIterator(self.get_chk_bytes(),
2134 interesting_roots, uninteresting_roots, search_key_func)
2135
2136 def test__init__(self):
2137 c_map = self.make_root_only_map()
2138 key1 = c_map.key()
2139 c_map.map(('aaa',), 'new aaa content')
2140 key2 = c_map._save()
2141 iterator = self.get_iterator([key2], [key1])
2142 self.assertEqual(set([key1]), iterator._all_uninteresting_chks)
2143 self.assertEqual([], iterator._uninteresting_queue)
2144 self.assertEqual([], iterator._interesting_queue)
2145
2146 def help__read_all_roots(self, search_key_func):
2147 c_map = self.make_root_only_map(search_key_func=search_key_func)
2148 key1 = c_map.key()
2149 c_map.map(('aaa',), 'new aaa content')
2150 key2 = c_map._save()
2151 iterator = self.get_iterator([key2], [key1], search_key_func)
2152 root_results = [record.key for record in iterator._read_all_roots()]
2153 self.assertEqual([key2], root_results)
2154 # We should have queued up only items that aren't in the uninteresting
2155 # set
2156 self.assertEqual([(('aaa',), 'new aaa content')],
2157 iterator._interesting_item_queue)
2158 self.assertEqual([], iterator._interesting_queue)
2159 # And there are no uninteresting references, so that queue should be
2160 # empty
2161 self.assertEqual([], iterator._uninteresting_queue)
2162
2163 def test__read_all_roots_plain(self):
2164 self.help__read_all_roots(search_key_func=chk_map._search_key_plain)
2165
2166 def test__read_all_roots_16(self):
2167 self.help__read_all_roots(search_key_func=chk_map._search_key_16)
2168
2169 def test__read_all_roots_skips_known_uninteresting(self):
2170 c_map = self.make_one_deep_map(chk_map._search_key_plain)
2171 key1 = c_map.key()
2172 c_map2 = self.make_root_only_map(chk_map._search_key_plain)
2173 key2 = c_map2.key()
2174 iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
2175 root_results = [record.key for record in iterator._read_all_roots()]
2176 # We should have no results. key2 is completely contained within key1,
2177 # and we should have seen that in the first pass
2178 self.assertEqual([], root_results)
2179
2180 def test__read_all_roots_prepares_queues(self):
2181 c_map = self.make_one_deep_map(chk_map._search_key_plain)
2182 key1 = c_map.key()
2183 c_map._dump_tree() # load everything
2184 key1_a = c_map._root_node._items['a'].key()
2185 c_map.map(('abb',), 'new abb content')
2186 key2 = c_map._save()
2187 key2_a = c_map._root_node._items['a'].key()
2188 iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
2189 root_results = [record.key for record in iterator._read_all_roots()]
2190 self.assertEqual([key2], root_results)
2191 # At this point, we should have queued up only the 'a' Leaf on both
2192 # sides, both 'c' and 'd' are known to not have changed on both sides
2193 self.assertEqual([key2_a], iterator._interesting_queue)
2194 self.assertEqual([], iterator._interesting_item_queue)
2195 self.assertEqual([key1_a], iterator._uninteresting_queue)
2196
2197 def test__read_all_roots_multi_interesting_prepares_queues(self):
2198 c_map = self.make_one_deep_map(chk_map._search_key_plain)
2199 key1 = c_map.key()
2200 c_map._dump_tree() # load everything
2201 key1_a = c_map._root_node._items['a'].key()
2202 key1_c = c_map._root_node._items['c'].key()
2203 c_map.map(('abb',), 'new abb content')
2204 key2 = c_map._save()
2205 key2_a = c_map._root_node._items['a'].key()
2206 key2_c = c_map._root_node._items['c'].key()
2207 c_map = chk_map.CHKMap(self.get_chk_bytes(), key1,
2208 chk_map._search_key_plain)
2209 c_map.map(('ccc',), 'new ccc content')
2210 key3 = c_map._save()
2211 key3_a = c_map._root_node._items['a'].key()
2212 key3_c = c_map._root_node._items['c'].key()
2213 iterator = self.get_iterator([key2, key3], [key1],
2214 chk_map._search_key_plain)
2215 root_results = [record.key for record in iterator._read_all_roots()]
2216 self.assertEqual(sorted([key2, key3]), sorted(root_results))
2217 # We should have queued up key2_a, and key3_c, but not key2_c or key3_c
2218 self.assertEqual([key2_a, key3_c], iterator._interesting_queue)
2219 self.assertEqual([], iterator._interesting_item_queue)
2220 # And we should have queued up both a and c for the uninteresting set
2221 self.assertEqual([key1_a, key1_c], iterator._uninteresting_queue)
2222
2223 def test__read_all_roots_different_depths(self):
2224 c_map = self.make_two_deep_map(chk_map._search_key_plain)
2225 c_map._dump_tree() # load everything
2226 key1 = c_map.key()
2227 key1_a = c_map._root_node._items['a'].key()
2228 key1_c = c_map._root_node._items['c'].key()
2229 key1_d = c_map._root_node._items['d'].key()
2230
2231 c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
2232 c_map2._dump_tree()
2233 key2 = c_map2.key()
2234 key2_aa = c_map2._root_node._items['aa'].key()
2235 key2_ad = c_map2._root_node._items['ad'].key()
2236
2237 iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
2238 root_results = [record.key for record in iterator._read_all_roots()]
2239 self.assertEqual([key2], root_results)
2240 # Only the 'a' subset should be queued up, since 'c' and 'd' cannot be
2241 # present
2242 self.assertEqual([key1_a], iterator._uninteresting_queue)
2243 self.assertEqual([key2_aa, key2_ad], iterator._interesting_queue)
2244 self.assertEqual([], iterator._interesting_item_queue)
2245
2246 iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain)
2247 root_results = [record.key for record in iterator._read_all_roots()]
2248 self.assertEqual([key1], root_results)
2249
2250 self.assertEqual([key2_aa, key2_ad], iterator._uninteresting_queue)
2251 self.assertEqual([key1_a, key1_c, key1_d], iterator._interesting_queue)
2252 self.assertEqual([], iterator._interesting_item_queue)
2253
2254 def test__read_all_roots_different_depths_16(self):
2255 c_map = self.make_two_deep_map(chk_map._search_key_16)
2256 c_map._dump_tree() # load everything
2257 key1 = c_map.key()
2258 key1_2 = c_map._root_node._items['2'].key()
2259 key1_4 = c_map._root_node._items['4'].key()
2260 key1_C = c_map._root_node._items['C'].key()
2261 key1_F = c_map._root_node._items['F'].key()
2262
2263 c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_16)
2264 c_map2._dump_tree()
2265 key2 = c_map2.key()
2266 key2_F0 = c_map2._root_node._items['F0'].key()
2267 key2_F3 = c_map2._root_node._items['F3'].key()
2268 key2_F4 = c_map2._root_node._items['F4'].key()
2269 key2_FD = c_map2._root_node._items['FD'].key()
2270
2271 iterator = self.get_iterator([key2], [key1], chk_map._search_key_16)
2272 root_results = [record.key for record in iterator._read_all_roots()]
2273 self.assertEqual([key2], root_results)
2274 # Only the subset of keys that may be present should be queued up.
2275 self.assertEqual([key1_F], iterator._uninteresting_queue)
2276 self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
2277 sorted(iterator._interesting_queue))
2278 self.assertEqual([], iterator._interesting_item_queue)
2279
2280 iterator = self.get_iterator([key1], [key2], chk_map._search_key_16)
2281 root_results = [record.key for record in iterator._read_all_roots()]
2282 self.assertEqual([key1], root_results)
2283
2284 self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]),
2285 sorted(iterator._uninteresting_queue))
2286 self.assertEqual(sorted([key1_2, key1_4, key1_C, key1_F]),
2287 sorted(iterator._interesting_queue))
2288 self.assertEqual([], iterator._interesting_item_queue)
2289
2290 def test__read_all_roots_mixed_depth(self):
2291 c_map = self.make_one_deep_two_prefix_map(chk_map._search_key_plain)
2292 c_map._dump_tree() # load everything
2293 key1 = c_map.key()
2294 key1_aa = c_map._root_node._items['aa'].key()
2295 key1_ad = c_map._root_node._items['ad'].key()
2296
2297 c_map2 = self.make_one_deep_one_prefix_map(chk_map._search_key_plain)
2298 c_map2._dump_tree()
2299 key2 = c_map2.key()
2300 key2_a = c_map2._root_node._items['a'].key()
2301 key2_b = c_map2._root_node._items['b'].key()
2302
2303 iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
2304 root_results = [record.key for record in iterator._read_all_roots()]
2305 self.assertEqual([key2], root_results)
2306 # 'ad' matches exactly 'a' on the other side, so it should be removed,
2307 # and neither side should have it queued for walking
2308 self.assertEqual([], iterator._uninteresting_queue)
2309 self.assertEqual([key2_b], iterator._interesting_queue)
2310 self.assertEqual([], iterator._interesting_item_queue)
2311
2312 iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain)
2313 root_results = [record.key for record in iterator._read_all_roots()]
2314 self.assertEqual([key1], root_results)
2315 # Note: This is technically not the 'true minimal' set that we could
2316 # use The reason is that 'a' was matched exactly to 'ad' (by sha
2317 # sum). However, the code gets complicated in the case of more
2318 # than one interesting key, so for now, we live with this
2319 # Consider revising, though benchmarking showing it to be a
2320 # real-world issue should be done
2321 self.assertEqual([key2_a], iterator._uninteresting_queue)
2322 # self.assertEqual([], iterator._uninteresting_queue)
2323 self.assertEqual([key1_aa], iterator._interesting_queue)
2324 self.assertEqual([], iterator._interesting_item_queue)
2325
2326 def test__read_all_roots_yields_extra_deep_records(self):
2327 # This is slightly controversial, as we will yield a chk page that we
2328 # might later on find out could be filtered out. (If a root node is
2329 # referenced deeper in the uninteresting set.)
2330 # However, even with stacking, we always have all chk pages that we
2331 # will need. So as long as we filter out the referenced keys, we'll
2332 # never run into problems.
2333 # This allows us to yield a root node record immediately, without any
2334 # buffering.
2335 c_map = self.make_two_deep_map(chk_map._search_key_plain)
2336 c_map._dump_tree() # load all keys
2337 key1 = c_map.key()
2338 key1_a = c_map._root_node._items['a'].key()
2339 c_map2 = self.get_map({
2340 ('acc',): 'initial acc content',
2341 ('ace',): 'initial ace content',
2342 }, maximum_size=100)
2343 self.assertEqualDiff(
2344 "'' LeafNode\n"
2345 " ('acc',) 'initial acc content'\n"
2346 " ('ace',) 'initial ace content'\n",
2347 c_map2._dump_tree())
2348 key2 = c_map2.key()
2349 iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain)
2350 root_results = [record.key for record in iterator._read_all_roots()]
2351 self.assertEqual([key2], root_results)
2352 # However, even though we have yielded the root node to be fetched,
2353 # we should have enqued all of the chk pages to be walked, so that we
2354 # can find the keys if they are present
2355 self.assertEqual([key1_a], iterator._uninteresting_queue)
2356 self.assertEqual([(('acc',), 'initial acc content'),
2357 (('ace',), 'initial ace content'),
2358 ], iterator._interesting_item_queue)
2359
2360 def test__read_all_roots_multiple_targets(self):
2361 c_map = self.make_root_only_map()
2362 key1 = c_map.key()
2363 c_map = self.make_one_deep_map()
2364 key2 = c_map.key()
2365 c_map._dump_tree()
2366 key2_c = c_map._root_node._items['c'].key()
2367 key2_d = c_map._root_node._items['d'].key()
2368 c_map.map(('ccc',), 'new ccc value')
2369 key3 = c_map._save()
2370 key3_c = c_map._root_node._items['c'].key()
2371 iterator = self.get_iterator([key2, key3], [key1],
2372 chk_map._search_key_plain)
2373 root_results = [record.key for record in iterator._read_all_roots()]
2374 self.assertEqual(sorted([key2, key3]), sorted(root_results))
2375 self.assertEqual([], iterator._uninteresting_queue)
2376 # the key 'd' is interesting from key2 and key3, but should only be
2377 # entered into the queue 1 time
2378 self.assertEqual(sorted([key2_c, key3_c, key2_d]),
2379 sorted(iterator._interesting_queue))
2380 self.assertEqual([], iterator._interesting_item_queue)
2381
2382 def test__read_all_roots_no_uninteresting(self):
2383 # This is the 'initial branch' case. With nothing in the uninteresting
2384 # set, we can just queue up all root nodes into interesting queue, and
2385 # then have them fast-path flushed via _flush_interesting_queue
2386 c_map = self.make_two_deep_map()
2387 key1 = c_map.key()
2388 iterator = self.get_iterator([key1], [], chk_map._search_key_plain)
2389 root_results = [record.key for record in iterator._read_all_roots()]
2390 self.assertEqual([], root_results)
2391 self.assertEqual([], iterator._uninteresting_queue)
2392 self.assertEqual([key1], iterator._interesting_queue)
2393 self.assertEqual([], iterator._interesting_item_queue)
2394
2395 c_map2 = self.make_one_deep_map()
2396 key2 = c_map2.key()
2397 iterator = self.get_iterator([key1, key2], [],
2398 chk_map._search_key_plain)
2399 root_results = [record.key for record in iterator._read_all_roots()]
2400 self.assertEqual([], root_results)
2401 self.assertEqual([], iterator._uninteresting_queue)
2402 self.assertEqual(sorted([key1, key2]),
2403 sorted(iterator._interesting_queue))
2404 self.assertEqual([], iterator._interesting_item_queue)
2405
2406 def test__read_all_roots_no_uninteresting_16(self):
2407 c_map = self.make_two_deep_map(chk_map._search_key_16)
2408 key1 = c_map.key()
2409 iterator = self.get_iterator([key1], [], chk_map._search_key_16)
2410 root_results = [record.key for record in iterator._read_all_roots()]
2411 self.assertEqual([], root_results)
2412 self.assertEqual([], iterator._uninteresting_queue)
2413 self.assertEqual([key1], iterator._interesting_queue)
2414 self.assertEqual([], iterator._interesting_item_queue)
2415
2416 c_map2 = self.make_one_deep_map(chk_map._search_key_16)
2417 key2 = c_map2.key()
2418 iterator = self.get_iterator([key1, key2], [],
2419 chk_map._search_key_16)
2420 root_results = [record.key for record in iterator._read_all_roots()]
2421 self.assertEqual([], root_results)
2422 self.assertEqual([], iterator._uninteresting_queue)
2423 self.assertEqual(sorted([key1, key2]),
2424 sorted(iterator._interesting_queue))
2425 self.assertEqual([], iterator._interesting_item_queue)
2426
2427 def test__read_all_roots_multiple_uninteresting(self):
2428 c_map = self.make_two_deep_map()
2429 key1 = c_map.key()
2430 c_map._dump_tree() # load everything
2431 key1_a = c_map._root_node._items['a'].key()
2432 c_map.map(('ccc',), 'new ccc value')
2433 key2 = c_map._save()
2434 key2_a = c_map._root_node._items['a'].key()
2435 c_map.map(('add',), 'new add value')
2436 key3 = c_map._save()
2437 key3_a = c_map._root_node._items['a'].key()
2438 iterator = self.get_iterator([key3], [key1, key2],
2439 chk_map._search_key_plain)
2440 root_results = [record.key for record in iterator._read_all_roots()]
2441 self.assertEqual([key3], root_results)
2442 # the 'a' keys should not be queued up 2 times, since they are
2443 # identical
2444 self.assertEqual([key1_a], iterator._uninteresting_queue)
2445 self.assertEqual([key3_a], iterator._interesting_queue)
2446 self.assertEqual([], iterator._interesting_item_queue)
2447
2448 def test__process_next_uninteresting_batched_no_dupes(self):
2449 c_map = self.make_two_deep_map()
2450 key1 = c_map.key()
2451 c_map._dump_tree() # load everything
2452 key1_a = c_map._root_node._items['a'].key()
2453 key1_aa = c_map._root_node._items['a']._items['aa'].key()
2454 key1_ab = c_map._root_node._items['a']._items['ab'].key()
2455 key1_ac = c_map._root_node._items['a']._items['ac'].key()
2456 key1_ad = c_map._root_node._items['a']._items['ad'].key()
2457 c_map.map(('aaa',), 'new aaa value')
2458 key2 = c_map._save()
2459 key2_a = c_map._root_node._items['a'].key()
2460 key2_aa = c_map._root_node._items['a']._items['aa'].key()
2461 c_map.map(('acc',), 'new acc content')
2462 key3 = c_map._save()
2463 key3_a = c_map._root_node._items['a'].key()
2464 key3_ac = c_map._root_node._items['a']._items['ac'].key()
2465 iterator = self.get_iterator([key3], [key1, key2],
2466 chk_map._search_key_plain)
2467 root_results = [record.key for record in iterator._read_all_roots()]
2468 self.assertEqual([key3], root_results)
2469 self.assertEqual(sorted([key1_a, key2_a]),
2470 sorted(iterator._uninteresting_queue))
2471 self.assertEqual([key3_a], iterator._interesting_queue)
2472 self.assertEqual([], iterator._interesting_item_queue)
2473 iterator._process_next_uninteresting()
2474 # All of the uninteresting records should be brought in and queued up,
2475 # but we should not have any duplicates
2476 self.assertEqual(sorted([key1_aa, key1_ab, key1_ac, key1_ad, key2_aa]),
2477 sorted(iterator._uninteresting_queue))
2478
2479
2480class TestIterInterestingNodes(TestCaseWithExampleMaps):
18962481
1897 def get_map_key(self, a_dict, maximum_size=10):2482 def get_map_key(self, a_dict, maximum_size=10):
1898 c_map = self._get_map(a_dict, maximum_size=maximum_size,2483 c_map = self.get_map(a_dict, maximum_size=maximum_size)
1899 chk_bytes=self.get_chk_bytes())
1900 return c_map.key()2484 return c_map.key()
19012485
1902 def assertIterInteresting(self, expected, interesting_keys,2486 def assertIterInteresting(self, records, items, interesting_keys,
1903 uninteresting_keys):2487 uninteresting_keys):
1904 """Check the result of iter_interesting_nodes.2488 """Check the result of iter_interesting_nodes.
19052489
1906 :param expected: A list of (record_keys, interesting_chk_pages,2490 Note that we no longer care how many steps are taken, etc, just that
1907 interesting key value pairs)2491 the right contents are returned.
2492
2493 :param records: A list of record keys that should be yielded
2494 :param items: A list of items (key,value) that should be yielded.
1908 """2495 """
1909 store = self.get_chk_bytes()2496 store = self.get_chk_bytes()
2497 store._search_key_func = chk_map._search_key_plain
1910 iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys,2498 iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys,
1911 uninteresting_keys)2499 uninteresting_keys)
1912 nodes = list(iter_nodes)2500 record_keys = []
1913 for count, (exp, act) in enumerate(izip(expected, nodes)):2501 all_items = []
1914 exp_record, exp_items = exp2502 for record, new_items in iter_nodes:
1915 record, items = act2503 if record is not None:
1916 exp_tuple = (exp_record, sorted(exp_items))2504 record_keys.append(record.key)
1917 if record is None:2505 if new_items:
1918 act_tuple = (None, sorted(items))2506 all_items.extend(new_items)
1919 else:2507 self.assertEqual(sorted(records), sorted(record_keys))
1920 act_tuple = (record.key, sorted(items))2508 self.assertEqual(sorted(items), sorted(all_items))
1921 self.assertEqual(exp_tuple, act_tuple,
1922 'entry %d did not match expected' % count)
1923 self.assertEqual(len(expected), len(nodes))
19242509
1925 def test_empty_to_one_keys(self):2510 def test_empty_to_one_keys(self):
1926 target = self.get_map_key({('a',): 'content'})2511 target = self.get_map_key({('a',): 'content'})
1927 self.assertIterInteresting(2512 self.assertIterInteresting([target],
1928 [(target, [(('a',), 'content')]),2513 [(('a',), 'content')],
1929 ], [target], [])2514 [target], [])
19302515
1931 def test_none_to_one_key(self):2516 def test_none_to_one_key(self):
1932 basis = self.get_map_key({})2517 basis = self.get_map_key({})
1933 target = self.get_map_key({('a',): 'content'})2518 target = self.get_map_key({('a',): 'content'})
1934 self.assertIterInteresting(2519 self.assertIterInteresting([target],
1935 [(None, [(('a',), 'content')]),2520 [(('a',), 'content')],
1936 (target, []),2521 [target], [basis])
1937 ], [target], [basis])
19382522
1939 def test_one_to_none_key(self):2523 def test_one_to_none_key(self):
1940 basis = self.get_map_key({('a',): 'content'})2524 basis = self.get_map_key({('a',): 'content'})
1941 target = self.get_map_key({})2525 target = self.get_map_key({})
1942 self.assertIterInteresting(2526 self.assertIterInteresting([target],
1943 [(target, [])],2527 [],
1944 [target], [basis])2528 [target], [basis])
19452529
1946 def test_common_pages(self):2530 def test_common_pages(self):
1947 basis = self.get_map_key({('a',): 'content',2531 basis = self.get_map_key({('a',): 'content',
@@ -1964,10 +2548,9 @@
1964 target_map._dump_tree())2548 target_map._dump_tree())
1965 b_key = target_map._root_node._items['b'].key()2549 b_key = target_map._root_node._items['b'].key()
1966 # This should return the root node, and the node for the 'b' key2550 # This should return the root node, and the node for the 'b' key
1967 self.assertIterInteresting(2551 self.assertIterInteresting([target, b_key],
1968 [(target, []),2552 [(('b',), 'other content')],
1969 (b_key, [(('b',), 'other content')])],2553 [target], [basis])
1970 [target], [basis])
19712554
1972 def test_common_sub_page(self):2555 def test_common_sub_page(self):
1973 basis = self.get_map_key({('aaa',): 'common',2556 basis = self.get_map_key({('aaa',): 'common',
@@ -1991,12 +2574,11 @@
1991 # The key for the internal aa node2574 # The key for the internal aa node
1992 a_key = target_map._root_node._items['a'].key()2575 a_key = target_map._root_node._items['a'].key()
1993 # The key for the leaf aab node2576 # The key for the leaf aab node
2577 # aaa_key = target_map._root_node._items['a']._items['aaa'].key()
1994 aab_key = target_map._root_node._items['a']._items['aab'].key()2578 aab_key = target_map._root_node._items['a']._items['aab'].key()
1995 self.assertIterInteresting(2579 self.assertIterInteresting([target, a_key, aab_key],
1996 [(target, []),2580 [(('aab',), 'new')],
1997 (a_key, []),2581 [target], [basis])
1998 (aab_key, [(('aab',), 'new')])],
1999 [target], [basis])
20002582
2001 def test_common_leaf(self):2583 def test_common_leaf(self):
2002 basis = self.get_map_key({})2584 basis = self.get_map_key({})
@@ -2040,29 +2622,22 @@
2040 a_key = target3_map._root_node._items['a'].key()2622 a_key = target3_map._root_node._items['a'].key()
2041 aac_key = target3_map._root_node._items['a']._items['aac'].key()2623 aac_key = target3_map._root_node._items['a']._items['aac'].key()
2042 self.assertIterInteresting(2624 self.assertIterInteresting(
2043 [(None, [(('aaa',), 'common')]),2625 [target1, target2, target3, a_key, aac_key, b_key],
2044 (target1, []),2626 [(('aaa',), 'common'), (('bbb',), 'new'), (('aac',), 'other')],
2045 (target2, []),2627 [target1, target2, target3], [basis])
2046 (target3, []),2628
2047 (b_key, [(('bbb',), 'new')]),2629 self.assertIterInteresting(
2048 (a_key, []),2630 [target2, target3, a_key, aac_key, b_key],
2049 (aac_key, [(('aac',), 'other')]),2631 [(('bbb',), 'new'), (('aac',), 'other')],
2050 ], [target1, target2, target3], [basis])2632 [target2, target3], [target1])
20512633
2052 self.assertIterInteresting(2634 # Technically, target1 could be filtered out, but since it is a root
2053 [(target2, []),2635 # node, we yield it immediately, rather than waiting to find out much
2054 (target3, []),2636 # later on.
2055 (b_key, [(('bbb',), 'new')]),2637 self.assertIterInteresting(
2056 (a_key, []),2638 [target1],
2057 (aac_key, [(('aac',), 'other')]),2639 [],
2058 ], [target2, target3], [target1])2640 [target1], [target3])
2059
2060 # This may be a case that we relax. A root node is a deep child of the
2061 # excluded set. The cost is buffering root nodes until we have
2062 # determined all possible exclusions. (Because a prefix of '', cannot
2063 # be excluded.)
2064 self.assertIterInteresting(
2065 [], [target1], [target3])
20662641
2067 def test_multiple_maps(self):2642 def test_multiple_maps(self):
2068 basis1 = self.get_map_key({('aaa',): 'common',2643 basis1 = self.get_map_key({('aaa',): 'common',
@@ -2111,13 +2686,9 @@
2111 # The key for the leaf bba node2686 # The key for the leaf bba node
2112 bba_key = target2_map._root_node._items['b']._items['bba'].key()2687 bba_key = target2_map._root_node._items['b']._items['bba'].key()
2113 self.assertIterInteresting(2688 self.assertIterInteresting(
2114 [(target1, []),2689 [target1, target2, a_key, aac_key, b_key, bba_key],
2115 (target2, []),2690 [(('aac',), 'target1'), (('bba',), 'target2')],
2116 (a_key, []),2691 [target1, target2], [basis1, basis2])
2117 (b_key, []),
2118 (aac_key, [(('aac',), 'target1')]),
2119 (bba_key, [(('bba',), 'target2')]),
2120 ], [target1, target2], [basis1, basis2])
21212692
2122 def test_multiple_maps_overlapping_common_new(self):2693 def test_multiple_maps_overlapping_common_new(self):
2123 # Test that when a node found through the interesting_keys iteration2694 # Test that when a node found through the interesting_keys iteration
@@ -2188,17 +2759,10 @@
2188 right_map._dump_tree())2759 right_map._dump_tree())
2189 # Keys from the right side target - none, the root is enough.2760 # Keys from the right side target - none, the root is enough.
2190 # Test behaviour2761 # Test behaviour
2191 self.expectFailure("we don't properly filter different depths",
2192 self.assertIterInteresting,
2193 [(left, []),
2194 (right, []),
2195 (l_d_key, [(('ddd',), 'change')]),
2196 ], [left, right], [basis])
2197 self.assertIterInteresting(2762 self.assertIterInteresting(
2198 [(left, []),2763 [right, left, l_d_key],
2199 (right, []),2764 [(('ddd',), 'change')],
2200 (l_d_key, [(('ddd',), 'change')]),2765 [left, right], [basis])
2201 ], [left, right], [basis])
22022766
2203 def test_multiple_maps_similar(self):2767 def test_multiple_maps_similar(self):
2204 # We want to have a depth=2 tree, with multiple entries in each leaf2768 # We want to have a depth=2 tree, with multiple entries in each leaf
@@ -2259,8 +2823,6 @@
2259 r_a_key = right_map._root_node._items['a'].key()2823 r_a_key = right_map._root_node._items['a'].key()
2260 r_c_key = right_map._root_node._items['c'].key()2824 r_c_key = right_map._root_node._items['c'].key()
2261 self.assertIterInteresting(2825 self.assertIterInteresting(
2262 [(left, []),2826 [right, left, l_a_key, r_c_key],
2263 (right, []),2827 [(('abb',), 'changed left'), (('cbb',), 'changed right')],
2264 (l_a_key, [(('abb',), 'changed left')]),2828 [left, right], [basis])
2265 (r_c_key, [(('cbb',), 'changed right')]),
2266 ], [left, right], [basis])