Merge lp:~jameinel/bzr/1.17-chk-multilevel into lp:~bzr/bzr/trunk-old
Proposed by: John A Meinel
Status: Superseded
Proposed branch: lp:~jameinel/bzr/1.17-chk-multilevel
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1246 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.17-chk-multilevel
Related bugs:
Reviewer: bzr-core — Pending
Review via email: mp+8100@code.launchpad.net
This proposal has been superseded by a proposal from 2009-07-02.
Robert Collins (lifeless) wrote:
Shallow immediate review:
The docstring for the class, and perhaps the class name, could be better.
In terms of sets, it's:
interesting - uninteresting
So perhaps
CHKMapDifference
as a class name. And for the docstring, something like
"""Iterate the stored pages and key,value pairs for (new - old).
This class provides a generator over the stored CHK pages and the
key,value pairs that are in any of the new maps and not in any of the
old maps.
"""
-Rob
Preview Diff
1 | === modified file 'bzrlib/chk_map.py' |
2 | --- bzrlib/chk_map.py 2009-06-26 09:24:34 +0000 |
3 | +++ bzrlib/chk_map.py 2009-07-01 22:35:14 +0000 |
4 | @@ -1398,106 +1398,202 @@ |
5 | return node |
6 | |
7 | |
8 | -def _find_children_info(store, interesting_keys, uninteresting_keys, pb): |
9 | - """Read the associated records, and determine what is interesting.""" |
10 | - uninteresting_keys = set(uninteresting_keys) |
11 | - chks_to_read = uninteresting_keys.union(interesting_keys) |
12 | - next_uninteresting = set() |
13 | - next_interesting = set() |
14 | - next_interesting_intersection = None |
15 | - uninteresting_items = set() |
16 | - interesting_items = set() |
17 | - interesting_to_yield = [] |
18 | - for record in store.get_record_stream(chks_to_read, 'unordered', True): |
19 | - # records_read.add(record.key()) |
20 | - if pb is not None: |
21 | - pb.tick() |
22 | - bytes = record.get_bytes_as('fulltext') |
23 | - # We don't care about search_key_func for this code, because we only |
24 | - # care about external references. |
25 | - node = _deserialise(bytes, record.key, search_key_func=None) |
26 | - if record.key in uninteresting_keys: |
27 | - if type(node) is InternalNode: |
28 | - next_uninteresting.update(node.refs()) |
29 | - else: |
30 | - # We know we are at a LeafNode, so we can pass None for the |
31 | - # store |
32 | - uninteresting_items.update(node.iteritems(None)) |
33 | - else: |
34 | - interesting_to_yield.append(record.key) |
35 | - if type(node) is InternalNode: |
36 | - if next_interesting_intersection is None: |
37 | - next_interesting_intersection = set(node.refs()) |
38 | - else: |
39 | - next_interesting_intersection = \ |
40 | - next_interesting_intersection.intersection(node.refs()) |
41 | - next_interesting.update(node.refs()) |
42 | - else: |
43 | - interesting_items.update(node.iteritems(None)) |
44 | - return (next_uninteresting, uninteresting_items, |
45 | - next_interesting, interesting_to_yield, interesting_items, |
46 | - next_interesting_intersection) |
47 | - |
48 | - |
49 | -def _find_all_uninteresting(store, interesting_root_keys, |
50 | - uninteresting_root_keys, pb): |
51 | - """Determine the full set of uninteresting keys.""" |
52 | - # What about duplicates between interesting_root_keys and |
53 | - # uninteresting_root_keys? |
54 | - if not uninteresting_root_keys: |
55 | - # Shortcut case. We know there is nothing uninteresting to filter out |
56 | - # So we just let the rest of the algorithm do the work |
57 | - # We know there is nothing uninteresting, and we didn't have to read |
58 | - # any interesting records yet. |
59 | - return (set(), set(), set(interesting_root_keys), [], set()) |
60 | - all_uninteresting_chks = set(uninteresting_root_keys) |
61 | - all_uninteresting_items = set() |
62 | - |
63 | - # First step, find the direct children of both the interesting and |
64 | - # uninteresting set |
65 | - (uninteresting_keys, uninteresting_items, |
66 | - interesting_keys, interesting_to_yield, |
67 | - interesting_items, interesting_intersection, |
68 | - ) = _find_children_info(store, interesting_root_keys, |
69 | - uninteresting_root_keys, |
70 | - pb=pb) |
71 | - all_uninteresting_chks.update(uninteresting_keys) |
72 | - all_uninteresting_items.update(uninteresting_items) |
73 | - del uninteresting_items |
74 | - # Do not examine in detail pages common to all interesting trees. |
75 | - # Pages that are common to all interesting trees will have their |
76 | - # older versions found via the uninteresting tree traversal. Some pages |
77 | - # found via the interesting trees traversal will be uninteresting for |
78 | - # other of the interesting trees, which is why we require the pages to be |
79 | - # common for us to trim them. |
80 | - if interesting_intersection is not None: |
81 | - uninteresting_keys.difference_update(interesting_intersection) |
82 | - |
83 | - # Second, find the full set of uninteresting bits reachable by the |
84 | - # uninteresting roots |
85 | - chks_to_read = uninteresting_keys |
86 | - while chks_to_read: |
87 | - next_chks = set() |
88 | - for record in store.get_record_stream(chks_to_read, 'unordered', False): |
89 | - # TODO: Handle 'absent' |
90 | - if pb is not None: |
91 | - pb.tick() |
92 | +class InterestingNodeIterator(object): |
93 | + """Determine the nodes and items that are 'interesting' |
94 | + |
95 | + This is defined as being present in the interesting set, but not being |
96 | + present in the uninteresting set. |
97 | + """ |
98 | + |
99 | + def __init__(self, store, interesting_root_keys, uninteresting_root_keys, |
100 | + search_key_func, pb=None): |
101 | + self._store = store |
102 | + self._interesting_root_keys = interesting_root_keys |
103 | + self._uninteresting_root_keys = uninteresting_root_keys |
104 | + self._pb = pb |
105 | + # All uninteresting chks that we have seen. By the time they are added |
106 | + # here, they should be either fully ignored, or queued up for |
107 | + # processing |
108 | + self._all_uninteresting_chks = set(self._uninteresting_root_keys) |
109 | + # All items that we have seen from the uninteresting_root_keys |
110 | + self._all_uninteresting_items = set() |
111 | + # These are interesting items which were either read, or already in the |
112 | + # interesting queue (so we don't need to walk them again) |
113 | + self._processed_interesting_refs = set() |
114 | + self._search_key_func = search_key_func |
115 | + |
116 | + # The uninteresting and interesting nodes to be searched |
117 | + self._uninteresting_queue = [] |
118 | + self._interesting_queue = [] |
119 | + # Holds the (key, value) items found when processing the root nodes, |
120 | + # waiting for the uninteresting nodes to be walked |
121 | + self._interesting_item_queue = [] |
122 | + self._state = None |
123 | + |
124 | + def _read_nodes_from_store(self, keys): |
125 | + # We chose not to use _page_cache, because we think in terms of records |
126 | + # to be yielded. Also, we expect to touch each page only 1 time during |
127 | + # this code. (We may want to evaluate saving the raw bytes into the |
128 | + # page cache, which would allow a working tree update after the fetch |
129 | + # to not have to read the bytes again.) |
130 | + stream = self._store.get_record_stream(keys, 'unordered', True) |
131 | + for record in stream: |
132 | + if self._pb is not None: |
133 | + self._pb.tick() |
134 | + if record.storage_kind == 'absent': |
135 | + raise errors.NoSuchRevision(self._store, record.key) |
136 | bytes = record.get_bytes_as('fulltext') |
137 | - # We don't care about search_key_func for this code, because we |
138 | - # only care about external references. |
139 | - node = _deserialise(bytes, record.key, search_key_func=None) |
140 | + node = _deserialise(bytes, record.key, |
141 | + search_key_func=self._search_key_func) |
142 | if type(node) is InternalNode: |
143 | - # uninteresting_prefix_chks.update(node._items.iteritems()) |
144 | - chks = node._items.values() |
145 | - # TODO: We remove the entries that are already in |
146 | - # uninteresting_chks ? |
147 | - next_chks.update(chks) |
148 | - all_uninteresting_chks.update(chks) |
149 | + # Note we don't have to do node.refs() because we know that |
150 | + # there are no children that have been pushed into this node |
151 | + prefix_refs = node._items.items() |
152 | + items = [] |
153 | else: |
154 | - all_uninteresting_items.update(node._items.iteritems()) |
155 | - chks_to_read = next_chks |
156 | - return (all_uninteresting_chks, all_uninteresting_items, |
157 | - interesting_keys, interesting_to_yield, interesting_items) |
158 | + prefix_refs = [] |
159 | + items = node._items.items() |
160 | + yield record, node, prefix_refs, items |
161 | + |
162 | + def _read_uninteresting_roots(self): |
163 | + uninteresting_chks_to_enqueue = [] |
164 | + all_uninteresting_chks = self._all_uninteresting_chks |
165 | + for record, node, prefix_refs, items in \ |
166 | + self._read_nodes_from_store(self._uninteresting_root_keys): |
167 | + # Uninteresting node |
168 | + prefix_refs = [p_r for p_r in prefix_refs |
169 | + if p_r[1] not in all_uninteresting_chks] |
170 | + new_refs = [p_r[1] for p_r in prefix_refs] |
171 | + all_uninteresting_chks.update(new_refs) |
172 | + self._all_uninteresting_items.update(items) |
173 | + # Queue up the uninteresting references |
174 | + # Don't actually put them in the 'to-read' queue until we have |
175 | + # finished checking the interesting references |
176 | + uninteresting_chks_to_enqueue.extend(prefix_refs) |
177 | + return uninteresting_chks_to_enqueue |
178 | + |
179 | + def _enqueue_uninteresting(self, interesting_prefixes, |
180 | + uninteresting_chks_to_enqueue): |
181 | + # At this point, we have read all the uninteresting and interesting |
182 | + # items, so we can queue up the uninteresting stuff, knowing that we've |
183 | + # handled the interesting ones |
184 | + for prefix, ref in uninteresting_chks_to_enqueue: |
185 | + not_interesting = True |
186 | + for i in xrange(len(prefix), 0, -1): |
187 | + if prefix[:i] in interesting_prefixes: |
188 | + not_interesting = False |
189 | + break |
190 | + if not_interesting: |
191 | + # This prefix is not part of the remaining 'interesting set' |
192 | + continue |
193 | + self._uninteresting_queue.append(ref) |
194 | + |
195 | + def _read_all_roots(self): |
196 | + """Read the root pages. |
197 | + |
198 | + This is structured as a generator, so that the root records can be |
199 | + yielded up to whoever needs them without any buffering. |
200 | + """ |
201 | + # This is the bootstrap phase |
202 | + if not self._uninteresting_root_keys: |
203 | + # With no uninteresting_root_keys we can just shortcut and be ready |
204 | + # for _flush_interesting_queue |
205 | + self._interesting_queue = list(self._interesting_root_keys) |
206 | + return |
207 | + uninteresting_chks_to_enqueue = self._read_uninteresting_roots() |
208 | + # filter out any root keys that are already known to be uninteresting |
209 | + interesting_keys = set(self._interesting_root_keys).difference( |
210 | + self._all_uninteresting_chks) |
211 | + # These are prefixes that are present in interesting_keys that we are |
212 | + # thinking to yield |
213 | + interesting_prefixes = set() |
214 | + # We are about to yield all of these, so we don't want them getting |
215 | + # added a second time |
216 | + processed_interesting_refs = self._processed_interesting_refs |
217 | + processed_interesting_refs.update(interesting_keys) |
218 | + for record, node, prefix_refs, items in \ |
219 | + self._read_nodes_from_store(interesting_keys): |
220 | + # At this level, we now know all the uninteresting references |
221 | + # So we filter and queue up whatever is remaining |
222 | + prefix_refs = [p_r for p_r in prefix_refs |
223 | + if p_r[1] not in self._all_uninteresting_chks |
224 | + and p_r[1] not in processed_interesting_refs] |
225 | + refs = [p_r[1] for p_r in prefix_refs] |
226 | + interesting_prefixes.update([p_r[0] for p_r in prefix_refs]) |
227 | + self._interesting_queue.extend(refs) |
228 | + # TODO: We can potentially get multiple items here, however the |
229 | + # current design allows for this, as callers will do the work |
230 | + # to make the results unique. We might profile whether we |
231 | + # gain anything by ensuring unique return values for items |
232 | + interesting_items = [item for item in items |
233 | + if item not in self._all_uninteresting_items] |
234 | + self._interesting_item_queue.extend(interesting_items) |
235 | + interesting_prefixes.update([self._search_key_func(item[0]) |
236 | + for item in interesting_items]) |
237 | + processed_interesting_refs.update(refs) |
238 | + yield record |
239 | + # For interesting_prefixes we have the full length prefixes queued up. |
240 | + # However, we also need possible prefixes. (If we have a known ref to |
241 | + # 'ab', then we also need to include 'a'.) So expand the |
242 | + # interesting_prefixes to include all shorter prefixes |
243 | + for prefix in list(interesting_prefixes): |
244 | + interesting_prefixes.update([prefix[:i] |
245 | + for i in xrange(1, len(prefix))]) |
246 | + self._enqueue_uninteresting(interesting_prefixes, |
247 | + uninteresting_chks_to_enqueue) |
248 | + |
249 | + def _flush_interesting_queue(self): |
250 | + # No need to maintain the heap invariant anymore, just pull things out |
251 | + # and process them |
252 | + refs = set(self._interesting_queue) |
253 | + self._interesting_queue = [] |
254 | + # First pass, flush all interesting items and convert to using direct refs |
255 | + all_uninteresting_chks = self._all_uninteresting_chks |
256 | + processed_interesting_refs = self._processed_interesting_refs |
257 | + all_uninteresting_items = self._all_uninteresting_items |
258 | + interesting_items = [item for item in self._interesting_item_queue |
259 | + if item not in all_uninteresting_items] |
260 | + self._interesting_item_queue = [] |
261 | + if interesting_items: |
262 | + yield None, interesting_items |
263 | + refs = refs.difference(all_uninteresting_chks) |
264 | + while refs: |
265 | + next_refs = set() |
266 | + next_refs_update = next_refs.update |
267 | + # Inlining _read_nodes_from_store improves 'bzr branch bzr.dev' |
268 | + # from 1m54s to 1m51s. Consider it. |
269 | + for record, _, p_refs, items in self._read_nodes_from_store(refs): |
270 | + items = [item for item in items |
271 | + if item not in all_uninteresting_items] |
272 | + yield record, items |
273 | + next_refs_update([p_r[1] for p_r in p_refs]) |
274 | + next_refs = next_refs.difference(all_uninteresting_chks) |
275 | + next_refs = next_refs.difference(processed_interesting_refs) |
276 | + processed_interesting_refs.update(next_refs) |
277 | + refs = next_refs |
278 | + |
279 | + def _process_next_uninteresting(self): |
280 | + # Since we don't filter uninteresting any further than during |
281 | + # _read_all_roots, process the whole queue in a single pass. |
282 | + refs = self._uninteresting_queue |
283 | + self._uninteresting_queue = [] |
284 | + all_uninteresting_chks = self._all_uninteresting_chks |
285 | + for record, _, prefix_refs, items in self._read_nodes_from_store(refs): |
286 | + self._all_uninteresting_items.update(items) |
287 | + refs = [r for _,r in prefix_refs if r not in all_uninteresting_chks] |
288 | + self._uninteresting_queue.extend(refs) |
289 | + all_uninteresting_chks.update(refs) |
290 | + |
291 | + def _process_queues(self): |
292 | + while self._uninteresting_queue: |
293 | + self._process_next_uninteresting() |
294 | + return self._flush_interesting_queue() |
295 | + |
296 | + def process(self): |
297 | + for record in self._read_all_roots(): |
298 | + yield record, [] |
299 | + for record, items in self._process_queues(): |
300 | + yield record, items |
301 | |
302 | |
303 | def iter_interesting_nodes(store, interesting_root_keys, |
304 | @@ -1514,72 +1610,11 @@ |
305 | :return: Yield |
306 | (interesting record, {interesting key:values}) |
307 | """ |
308 | - # TODO: consider that it may be more memory efficient to use the 20-byte |
309 | - # sha1 string, rather than tuples of hexidecimal sha1 strings. |
310 | - # TODO: Try to factor out a lot of the get_record_stream() calls into a |
311 | - # helper function similar to _read_bytes. This function should be |
312 | - # able to use nodes from the _page_cache as well as actually |
313 | - # requesting bytes from the store. |
314 | - |
315 | - (all_uninteresting_chks, all_uninteresting_items, interesting_keys, |
316 | - interesting_to_yield, interesting_items) = _find_all_uninteresting(store, |
317 | - interesting_root_keys, uninteresting_root_keys, pb) |
318 | - |
319 | - # Now that we know everything uninteresting, we can yield information from |
320 | - # our first request |
321 | - interesting_items.difference_update(all_uninteresting_items) |
322 | - interesting_to_yield = set(interesting_to_yield) - all_uninteresting_chks |
323 | - if interesting_items: |
324 | - yield None, interesting_items |
325 | - if interesting_to_yield: |
326 | - # We request these records again, rather than buffering the root |
327 | - # records, most likely they are still in the _group_cache anyway. |
328 | - for record in store.get_record_stream(interesting_to_yield, |
329 | - 'unordered', False): |
330 | - yield record, [] |
331 | - all_uninteresting_chks.update(interesting_to_yield) |
332 | - interesting_keys.difference_update(all_uninteresting_chks) |
333 | - |
334 | - chks_to_read = interesting_keys |
335 | - counter = 0 |
336 | - while chks_to_read: |
337 | - next_chks = set() |
338 | - for record in store.get_record_stream(chks_to_read, 'unordered', False): |
339 | - counter += 1 |
340 | - if pb is not None: |
341 | - pb.update('find chk pages', counter) |
342 | - # TODO: Handle 'absent'? |
343 | - bytes = record.get_bytes_as('fulltext') |
344 | - # We don't care about search_key_func for this code, because we |
345 | - # only care about external references. |
346 | - node = _deserialise(bytes, record.key, search_key_func=None) |
347 | - if type(node) is InternalNode: |
348 | - # all_uninteresting_chks grows large, as it lists all nodes we |
349 | - # don't want to process (including already seen interesting |
350 | - # nodes). |
351 | - # small.difference_update(large) scales O(large), but |
352 | - # small.difference(large) scales O(small). |
353 | - # Also, we know we just _deserialised this node, so we can |
354 | - # access the dict directly. |
355 | - chks = set(node._items.itervalues()).difference( |
356 | - all_uninteresting_chks) |
357 | - # Is set() and .difference_update better than: |
358 | - # chks = [chk for chk in node.refs() |
359 | - # if chk not in all_uninteresting_chks] |
360 | - next_chks.update(chks) |
361 | - # These are now uninteresting everywhere else |
362 | - all_uninteresting_chks.update(chks) |
363 | - interesting_items = [] |
364 | - else: |
365 | - interesting_items = [item for item in node._items.iteritems() |
366 | - if item not in all_uninteresting_items] |
367 | - # TODO: Do we need to filter out items that we have already |
368 | - # seen on other pages? We don't really want to buffer the |
369 | - # whole thing, but it does mean that callers need to |
370 | - # understand they may get duplicate values. |
371 | - # all_uninteresting_items.update(interesting_items) |
372 | - yield record, interesting_items |
373 | - chks_to_read = next_chks |
374 | + iterator = InterestingNodeIterator(store, interesting_root_keys, |
375 | + uninteresting_root_keys, |
376 | + search_key_func=store._search_key_func, |
377 | + pb=pb) |
378 | + return iterator.process() |
379 | |
380 | |
381 | try: |
382 | |
383 | === modified file 'bzrlib/repofmt/groupcompress_repo.py' |
384 | --- bzrlib/repofmt/groupcompress_repo.py 2009-06-29 14:51:13 +0000 |
385 | +++ bzrlib/repofmt/groupcompress_repo.py 2009-07-01 22:35:14 +0000 |
386 | @@ -652,6 +652,9 @@ |
387 | parents=False, is_locked=self.is_locked, |
388 | inconsistency_fatal=False), |
389 | access=self._pack_collection.chk_index.data_access) |
390 | + search_key_name = self._format._serializer.search_key_name |
391 | + search_key_func = chk_map.search_key_registry.get(search_key_name) |
392 | + self.chk_bytes._search_key_func = search_key_func |
393 | # True when the repository object is 'write locked' (as opposed to the |
394 | # physical lock only taken out around changes to the pack-names list.) |
395 | # Another way to represent this would be a decorator around the control |
396 | |
397 | === modified file 'bzrlib/tests/test_chk_map.py' |
398 | --- bzrlib/tests/test_chk_map.py 2009-06-26 09:24:34 +0000 |
399 | +++ bzrlib/tests/test_chk_map.py 2009-07-01 22:35:14 +0000 |
400 | @@ -20,6 +20,7 @@ |
401 | |
402 | from bzrlib import ( |
403 | chk_map, |
404 | + groupcompress, |
405 | osutils, |
406 | tests, |
407 | ) |
408 | @@ -59,17 +60,14 @@ |
409 | self.assertCommonPrefix('', '', '') |
410 | |
411 | |
412 | -class TestCaseWithStore(tests.TestCaseWithTransport): |
413 | +class TestCaseWithStore(tests.TestCaseWithMemoryTransport): |
414 | |
415 | def get_chk_bytes(self): |
416 | # The easiest way to get a CHK store is a development6 repository and |
417 | # then work with the chk_bytes attribute directly. |
418 | - repo = self.make_repository(".", format="development6-rich-root") |
419 | - repo.lock_write() |
420 | - self.addCleanup(repo.unlock) |
421 | - repo.start_write_group() |
422 | - self.addCleanup(repo.abort_write_group) |
423 | - return repo.chk_bytes |
424 | + factory = groupcompress.make_pack_factory(False, False, 1) |
425 | + self.chk_bytes = factory(self.get_transport()) |
426 | + return self.chk_bytes |
427 | |
428 | def _get_map(self, a_dict, maximum_size=0, chk_bytes=None, key_width=1, |
429 | search_key_func=None): |
430 | @@ -97,6 +95,246 @@ |
431 | return dict(node.iteritems(*args)) |
432 | |
433 | |
434 | +class TestCaseWithExampleMaps(TestCaseWithStore): |
435 | + |
436 | + def get_chk_bytes(self): |
437 | + if getattr(self, '_chk_bytes', None) is None: |
438 | + self._chk_bytes = super(TestCaseWithExampleMaps, |
439 | + self).get_chk_bytes() |
440 | + return self._chk_bytes |
441 | + |
442 | + def get_map(self, a_dict, maximum_size=100, search_key_func=None): |
443 | + c_map = self._get_map(a_dict, maximum_size=maximum_size, |
444 | + chk_bytes=self.get_chk_bytes(), |
445 | + search_key_func=search_key_func) |
446 | + return c_map |
447 | + |
448 | + def make_root_only_map(self, search_key_func=None): |
449 | + return self.get_map({ |
450 | + ('aaa',): 'initial aaa content', |
451 | + ('abb',): 'initial abb content', |
452 | + }, search_key_func=search_key_func) |
453 | + |
454 | + def make_root_only_aaa_ddd_map(self, search_key_func=None): |
455 | + return self.get_map({ |
456 | + ('aaa',): 'initial aaa content', |
457 | + ('ddd',): 'initial ddd content', |
458 | + }, search_key_func=search_key_func) |
459 | + |
460 | + def make_one_deep_map(self, search_key_func=None): |
461 | + # Same as root_only_map, except it forces an InternalNode at the root |
462 | + return self.get_map({ |
463 | + ('aaa',): 'initial aaa content', |
464 | + ('abb',): 'initial abb content', |
465 | + ('ccc',): 'initial ccc content', |
466 | + ('ddd',): 'initial ddd content', |
467 | + }, search_key_func=search_key_func) |
468 | + |
469 | + def make_two_deep_map(self, search_key_func=None): |
470 | + # Carefully chosen so that it creates a 2-deep map for both |
471 | + # _search_key_plain and for _search_key_16 |
472 | + # Also so that things line up with make_one_deep_two_prefix_map |
473 | + return self.get_map({ |
474 | + ('aaa',): 'initial aaa content', |
475 | + ('abb',): 'initial abb content', |
476 | + ('acc',): 'initial acc content', |
477 | + ('ace',): 'initial ace content', |
478 | + ('add',): 'initial add content', |
479 | + ('adh',): 'initial adh content', |
480 | + ('adl',): 'initial adl content', |
481 | + ('ccc',): 'initial ccc content', |
482 | + ('ddd',): 'initial ddd content', |
483 | + }, search_key_func=search_key_func) |
484 | + |
485 | + def make_one_deep_two_prefix_map(self, search_key_func=None): |
486 | + """Create a map with one internal node, but references are extra long. |
487 | + |
488 | + Otherwise has similar content to make_two_deep_map. |
489 | + """ |
490 | + return self.get_map({ |
491 | + ('aaa',): 'initial aaa content', |
492 | + ('add',): 'initial add content', |
493 | + ('adh',): 'initial adh content', |
494 | + ('adl',): 'initial adl content', |
495 | + }, search_key_func=search_key_func) |
496 | + |
497 | + def make_one_deep_one_prefix_map(self, search_key_func=None): |
498 | + """Create a map with one internal node, but references are extra long. |
499 | + |
500 | + Similar to make_one_deep_two_prefix_map, except the split is at the |
501 | + first char, rather than the second. |
502 | + """ |
503 | + return self.get_map({ |
504 | + ('add',): 'initial add content', |
505 | + ('adh',): 'initial adh content', |
506 | + ('adl',): 'initial adl content', |
507 | + ('bbb',): 'initial bbb content', |
508 | + }, search_key_func=search_key_func) |
509 | + |
510 | + |
511 | +class TestTestCaseWithExampleMaps(TestCaseWithExampleMaps): |
512 | + """Actual tests for the provided examples.""" |
513 | + |
514 | + def test_root_only_map_plain(self): |
515 | + c_map = self.make_root_only_map() |
516 | + self.assertEqualDiff( |
517 | + "'' LeafNode\n" |
518 | + " ('aaa',) 'initial aaa content'\n" |
519 | + " ('abb',) 'initial abb content'\n", |
520 | + c_map._dump_tree()) |
521 | + |
522 | + def test_root_only_map_16(self): |
523 | + c_map = self.make_root_only_map(search_key_func=chk_map._search_key_16) |
524 | + self.assertEqualDiff( |
525 | + "'' LeafNode\n" |
526 | + " ('aaa',) 'initial aaa content'\n" |
527 | + " ('abb',) 'initial abb content'\n", |
528 | + c_map._dump_tree()) |
529 | + |
530 | + def test_one_deep_map_plain(self): |
531 | + c_map = self.make_one_deep_map() |
532 | + self.assertEqualDiff( |
533 | + "'' InternalNode\n" |
534 | + " 'a' LeafNode\n" |
535 | + " ('aaa',) 'initial aaa content'\n" |
536 | + " ('abb',) 'initial abb content'\n" |
537 | + " 'c' LeafNode\n" |
538 | + " ('ccc',) 'initial ccc content'\n" |
539 | + " 'd' LeafNode\n" |
540 | + " ('ddd',) 'initial ddd content'\n", |
541 | + c_map._dump_tree()) |
542 | + |
543 | + def test_one_deep_map_16(self): |
544 | + c_map = self.make_one_deep_map(search_key_func=chk_map._search_key_16) |
545 | + self.assertEqualDiff( |
546 | + "'' InternalNode\n" |
547 | + " '2' LeafNode\n" |
548 | + " ('ccc',) 'initial ccc content'\n" |
549 | + " '4' LeafNode\n" |
550 | + " ('abb',) 'initial abb content'\n" |
551 | + " 'F' LeafNode\n" |
552 | + " ('aaa',) 'initial aaa content'\n" |
553 | + " ('ddd',) 'initial ddd content'\n", |
554 | + c_map._dump_tree()) |
555 | + |
556 | + def test_root_only_aaa_ddd_plain(self): |
557 | + c_map = self.make_root_only_aaa_ddd_map() |
558 | + self.assertEqualDiff( |
559 | + "'' LeafNode\n" |
560 | + " ('aaa',) 'initial aaa content'\n" |
561 | + " ('ddd',) 'initial ddd content'\n", |
562 | + c_map._dump_tree()) |
563 | + |
564 | + def test_one_deep_map_16(self): |
565 | + c_map = self.make_root_only_aaa_ddd_map( |
566 | + search_key_func=chk_map._search_key_16) |
567 | + # We use 'aaa' and 'ddd' because they happen to map to 'F' when using |
568 | + # _search_key_16 |
569 | + self.assertEqualDiff( |
570 | + "'' LeafNode\n" |
571 | + " ('aaa',) 'initial aaa content'\n" |
572 | + " ('ddd',) 'initial ddd content'\n", |
573 | + c_map._dump_tree()) |
574 | + |
575 | + def test_two_deep_map_plain(self): |
576 | + c_map = self.make_two_deep_map() |
577 | + self.assertEqualDiff( |
578 | + "'' InternalNode\n" |
579 | + " 'a' InternalNode\n" |
580 | + " 'aa' LeafNode\n" |
581 | + " ('aaa',) 'initial aaa content'\n" |
582 | + " 'ab' LeafNode\n" |
583 | + " ('abb',) 'initial abb content'\n" |
584 | + " 'ac' LeafNode\n" |
585 | + " ('acc',) 'initial acc content'\n" |
586 | + " ('ace',) 'initial ace content'\n" |
587 | + " 'ad' LeafNode\n" |
588 | + " ('add',) 'initial add content'\n" |
589 | + " ('adh',) 'initial adh content'\n" |
590 | + " ('adl',) 'initial adl content'\n" |
591 | + " 'c' LeafNode\n" |
592 | + " ('ccc',) 'initial ccc content'\n" |
593 | + " 'd' LeafNode\n" |
594 | + " ('ddd',) 'initial ddd content'\n", |
595 | + c_map._dump_tree()) |
596 | + |
597 | + def test_two_deep_map_16(self): |
598 | + c_map = self.make_two_deep_map(search_key_func=chk_map._search_key_16) |
599 | + self.assertEqualDiff( |
600 | + "'' InternalNode\n" |
601 | + " '2' LeafNode\n" |
602 | + " ('acc',) 'initial acc content'\n" |
603 | + " ('ccc',) 'initial ccc content'\n" |
604 | + " '4' LeafNode\n" |
605 | + " ('abb',) 'initial abb content'\n" |
606 | + " 'C' LeafNode\n" |
607 | + " ('ace',) 'initial ace content'\n" |
608 | + " 'F' InternalNode\n" |
609 | + " 'F0' LeafNode\n" |
610 | + " ('aaa',) 'initial aaa content'\n" |
611 | + " 'F3' LeafNode\n" |
612 | + " ('adl',) 'initial adl content'\n" |
613 | + " 'F4' LeafNode\n" |
614 | + " ('adh',) 'initial adh content'\n" |
615 | + " 'FB' LeafNode\n" |
616 | + " ('ddd',) 'initial ddd content'\n" |
617 | + " 'FD' LeafNode\n" |
618 | + " ('add',) 'initial add content'\n", |
619 | + c_map._dump_tree()) |
620 | + |
621 | + def test_one_deep_two_prefix_map_plain(self): |
622 | + c_map = self.make_one_deep_two_prefix_map() |
623 | + self.assertEqualDiff( |
624 | + "'' InternalNode\n" |
625 | + " 'aa' LeafNode\n" |
626 | + " ('aaa',) 'initial aaa content'\n" |
627 | + " 'ad' LeafNode\n" |
628 | + " ('add',) 'initial add content'\n" |
629 | + " ('adh',) 'initial adh content'\n" |
630 | + " ('adl',) 'initial adl content'\n", |
631 | + c_map._dump_tree()) |
632 | + |
633 | + def test_one_deep_two_prefix_map_16(self): |
634 | + c_map = self.make_one_deep_two_prefix_map( |
635 | + search_key_func=chk_map._search_key_16) |
636 | + self.assertEqualDiff( |
637 | + "'' InternalNode\n" |
638 | + " 'F0' LeafNode\n" |
639 | + " ('aaa',) 'initial aaa content'\n" |
640 | + " 'F3' LeafNode\n" |
641 | + " ('adl',) 'initial adl content'\n" |
642 | + " 'F4' LeafNode\n" |
643 | + " ('adh',) 'initial adh content'\n" |
644 | + " 'FD' LeafNode\n" |
645 | + " ('add',) 'initial add content'\n", |
646 | + c_map._dump_tree()) |
647 | + |
648 | + def test_one_deep_one_prefix_map_plain(self): |
649 | + c_map = self.make_one_deep_one_prefix_map() |
650 | + self.assertEqualDiff( |
651 | + "'' InternalNode\n" |
652 | + " 'a' LeafNode\n" |
653 | + " ('add',) 'initial add content'\n" |
654 | + " ('adh',) 'initial adh content'\n" |
655 | + " ('adl',) 'initial adl content'\n" |
656 | + " 'b' LeafNode\n" |
657 | + " ('bbb',) 'initial bbb content'\n", |
658 | + c_map._dump_tree()) |
659 | + |
660 | + def test_one_deep_one_prefix_map_16(self): |
661 | + c_map = self.make_one_deep_one_prefix_map( |
662 | + search_key_func=chk_map._search_key_16) |
663 | + self.assertEqualDiff( |
664 | + "'' InternalNode\n" |
665 | + " '4' LeafNode\n" |
666 | + " ('bbb',) 'initial bbb content'\n" |
667 | + " 'F' LeafNode\n" |
668 | + " ('add',) 'initial add content'\n" |
669 | + " ('adh',) 'initial adh content'\n" |
670 | + " ('adl',) 'initial adl content'\n", |
671 | + c_map._dump_tree()) |
672 | + |
673 | + |
674 | class TestMap(TestCaseWithStore): |
675 | |
676 | def assertHasABMap(self, chk_bytes): |
677 | @@ -1886,62 +2124,408 @@ |
678 | # 1-4K get0 |
679 | |
680 | |
681 | -class TestIterInterestingNodes(TestCaseWithStore): |
682 | - |
683 | - def get_chk_bytes(self): |
684 | - if getattr(self, '_chk_bytes', None) is None: |
685 | - self._chk_bytes = super(TestIterInterestingNodes, |
686 | - self).get_chk_bytes() |
687 | - return self._chk_bytes |
688 | +class TestInterestingNodeIterator(TestCaseWithExampleMaps): |
689 | + |
690 | + def get_iterator(self, interesting_roots, uninteresting_roots, |
691 | + search_key_func=None): |
692 | + if search_key_func is None: |
693 | + search_key_func = chk_map._search_key_plain |
694 | + return chk_map.InterestingNodeIterator(self.get_chk_bytes(), |
695 | + interesting_roots, uninteresting_roots, search_key_func) |
696 | + |
697 | + def test__init__(self): |
698 | + c_map = self.make_root_only_map() |
699 | + key1 = c_map.key() |
700 | + c_map.map(('aaa',), 'new aaa content') |
701 | + key2 = c_map._save() |
702 | + iterator = self.get_iterator([key2], [key1]) |
703 | + self.assertEqual(set([key1]), iterator._all_uninteresting_chks) |
704 | + self.assertEqual([], iterator._uninteresting_queue) |
705 | + self.assertEqual([], iterator._interesting_queue) |
706 | + |
707 | + def help__read_all_roots(self, search_key_func): |
708 | + c_map = self.make_root_only_map(search_key_func=search_key_func) |
709 | + key1 = c_map.key() |
710 | + c_map.map(('aaa',), 'new aaa content') |
711 | + key2 = c_map._save() |
712 | + iterator = self.get_iterator([key2], [key1], search_key_func) |
713 | + root_results = [record.key for record in iterator._read_all_roots()] |
714 | + self.assertEqual([key2], root_results) |
715 | + # We should have queued up only items that aren't in the uninteresting |
716 | + # set |
717 | + self.assertEqual([(('aaa',), 'new aaa content')], |
718 | + iterator._interesting_item_queue) |
719 | + self.assertEqual([], iterator._interesting_queue) |
720 | + # And there are no uninteresting references, so that queue should be |
721 | + # empty |
722 | + self.assertEqual([], iterator._uninteresting_queue) |
723 | + |
724 | + def test__read_all_roots_plain(self): |
725 | + self.help__read_all_roots(search_key_func=chk_map._search_key_plain) |
726 | + |
727 | + def test__read_all_roots_16(self): |
728 | + self.help__read_all_roots(search_key_func=chk_map._search_key_16) |
729 | + |
730 | + def test__read_all_roots_skips_known_uninteresting(self): |
731 | + c_map = self.make_one_deep_map(chk_map._search_key_plain) |
732 | + key1 = c_map.key() |
733 | + c_map2 = self.make_root_only_map(chk_map._search_key_plain) |
734 | + key2 = c_map2.key() |
735 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain) |
736 | + root_results = [record.key for record in iterator._read_all_roots()] |
737 | + # We should have no results. key2 is completely contained within key1, |
738 | + # and we should have seen that in the first pass |
739 | + self.assertEqual([], root_results) |
740 | + |
741 | + def test__read_all_roots_prepares_queues(self): |
742 | + c_map = self.make_one_deep_map(chk_map._search_key_plain) |
743 | + key1 = c_map.key() |
744 | + c_map._dump_tree() # load everything |
745 | + key1_a = c_map._root_node._items['a'].key() |
746 | + c_map.map(('abb',), 'new abb content') |
747 | + key2 = c_map._save() |
748 | + key2_a = c_map._root_node._items['a'].key() |
749 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain) |
750 | + root_results = [record.key for record in iterator._read_all_roots()] |
751 | + self.assertEqual([key2], root_results) |
752 | + # At this point, we should have queued up only the 'a' Leaf on both |
753 | + # sides, both 'c' and 'd' are known to not have changed on both sides |
754 | + self.assertEqual([key2_a], iterator._interesting_queue) |
755 | + self.assertEqual([], iterator._interesting_item_queue) |
756 | + self.assertEqual([key1_a], iterator._uninteresting_queue) |
757 | + |
758 | + def test__read_all_roots_multi_interesting_prepares_queues(self): |
759 | + c_map = self.make_one_deep_map(chk_map._search_key_plain) |
760 | + key1 = c_map.key() |
761 | + c_map._dump_tree() # load everything |
762 | + key1_a = c_map._root_node._items['a'].key() |
763 | + key1_c = c_map._root_node._items['c'].key() |
764 | + c_map.map(('abb',), 'new abb content') |
765 | + key2 = c_map._save() |
766 | + key2_a = c_map._root_node._items['a'].key() |
767 | + key2_c = c_map._root_node._items['c'].key() |
768 | + c_map = chk_map.CHKMap(self.get_chk_bytes(), key1, |
769 | + chk_map._search_key_plain) |
770 | + c_map.map(('ccc',), 'new ccc content') |
771 | + key3 = c_map._save() |
772 | + key3_a = c_map._root_node._items['a'].key() |
773 | + key3_c = c_map._root_node._items['c'].key() |
774 | + iterator = self.get_iterator([key2, key3], [key1], |
775 | + chk_map._search_key_plain) |
776 | + root_results = [record.key for record in iterator._read_all_roots()] |
777 | + self.assertEqual(sorted([key2, key3]), sorted(root_results)) |
778 | + # We should have queued up key2_a, and key3_c, but not key2_c or key3_c |
779 | + self.assertEqual([key2_a, key3_c], iterator._interesting_queue) |
780 | + self.assertEqual([], iterator._interesting_item_queue) |
781 | + # And we should have queued up both a and c for the uninteresting set |
782 | + self.assertEqual([key1_a, key1_c], iterator._uninteresting_queue) |
783 | + |
784 | + def test__read_all_roots_different_depths(self): |
785 | + c_map = self.make_two_deep_map(chk_map._search_key_plain) |
786 | + c_map._dump_tree() # load everything |
787 | + key1 = c_map.key() |
788 | + key1_a = c_map._root_node._items['a'].key() |
789 | + key1_c = c_map._root_node._items['c'].key() |
790 | + key1_d = c_map._root_node._items['d'].key() |
791 | + |
792 | + c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_plain) |
793 | + c_map2._dump_tree() |
794 | + key2 = c_map2.key() |
795 | + key2_aa = c_map2._root_node._items['aa'].key() |
796 | + key2_ad = c_map2._root_node._items['ad'].key() |
797 | + |
798 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain) |
799 | + root_results = [record.key for record in iterator._read_all_roots()] |
800 | + self.assertEqual([key2], root_results) |
801 | + # Only the 'a' subset should be queued up, since 'c' and 'd' cannot be |
802 | + # present |
803 | + self.assertEqual([key1_a], iterator._uninteresting_queue) |
804 | + self.assertEqual([key2_aa, key2_ad], iterator._interesting_queue) |
805 | + self.assertEqual([], iterator._interesting_item_queue) |
806 | + |
807 | + iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain) |
808 | + root_results = [record.key for record in iterator._read_all_roots()] |
809 | + self.assertEqual([key1], root_results) |
810 | + |
811 | + self.assertEqual([key2_aa, key2_ad], iterator._uninteresting_queue) |
812 | + self.assertEqual([key1_a, key1_c, key1_d], iterator._interesting_queue) |
813 | + self.assertEqual([], iterator._interesting_item_queue) |
814 | + |
815 | + def test__read_all_roots_different_depths_16(self): |
816 | + c_map = self.make_two_deep_map(chk_map._search_key_16) |
817 | + c_map._dump_tree() # load everything |
818 | + key1 = c_map.key() |
819 | + key1_2 = c_map._root_node._items['2'].key() |
820 | + key1_4 = c_map._root_node._items['4'].key() |
821 | + key1_C = c_map._root_node._items['C'].key() |
822 | + key1_F = c_map._root_node._items['F'].key() |
823 | + |
824 | + c_map2 = self.make_one_deep_two_prefix_map(chk_map._search_key_16) |
825 | + c_map2._dump_tree() |
826 | + key2 = c_map2.key() |
827 | + key2_F0 = c_map2._root_node._items['F0'].key() |
828 | + key2_F3 = c_map2._root_node._items['F3'].key() |
829 | + key2_F4 = c_map2._root_node._items['F4'].key() |
830 | + key2_FD = c_map2._root_node._items['FD'].key() |
831 | + |
832 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_16) |
833 | + root_results = [record.key for record in iterator._read_all_roots()] |
834 | + self.assertEqual([key2], root_results) |
835 | + # Only the subset of keys that may be present should be queued up. |
836 | + self.assertEqual([key1_F], iterator._uninteresting_queue) |
837 | + self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]), |
838 | + sorted(iterator._interesting_queue)) |
839 | + self.assertEqual([], iterator._interesting_item_queue) |
840 | + |
841 | + iterator = self.get_iterator([key1], [key2], chk_map._search_key_16) |
842 | + root_results = [record.key for record in iterator._read_all_roots()] |
843 | + self.assertEqual([key1], root_results) |
844 | + |
845 | + self.assertEqual(sorted([key2_F0, key2_F3, key2_F4, key2_FD]), |
846 | + sorted(iterator._uninteresting_queue)) |
847 | + self.assertEqual(sorted([key1_2, key1_4, key1_C, key1_F]), |
848 | + sorted(iterator._interesting_queue)) |
849 | + self.assertEqual([], iterator._interesting_item_queue) |
850 | + |
851 | + def test__read_all_roots_mixed_depth(self): |
852 | + c_map = self.make_one_deep_two_prefix_map(chk_map._search_key_plain) |
853 | + c_map._dump_tree() # load everything |
854 | + key1 = c_map.key() |
855 | + key1_aa = c_map._root_node._items['aa'].key() |
856 | + key1_ad = c_map._root_node._items['ad'].key() |
857 | + |
858 | + c_map2 = self.make_one_deep_one_prefix_map(chk_map._search_key_plain) |
859 | + c_map2._dump_tree() |
860 | + key2 = c_map2.key() |
861 | + key2_a = c_map2._root_node._items['a'].key() |
862 | + key2_b = c_map2._root_node._items['b'].key() |
863 | + |
864 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain) |
865 | + root_results = [record.key for record in iterator._read_all_roots()] |
866 | + self.assertEqual([key2], root_results) |
867 | + # 'ad' matches exactly 'a' on the other side, so it should be removed, |
868 | + # and neither side should have it queued for walking |
869 | + self.assertEqual([], iterator._uninteresting_queue) |
870 | + self.assertEqual([key2_b], iterator._interesting_queue) |
871 | + self.assertEqual([], iterator._interesting_item_queue) |
872 | + |
873 | + iterator = self.get_iterator([key1], [key2], chk_map._search_key_plain) |
874 | + root_results = [record.key for record in iterator._read_all_roots()] |
875 | + self.assertEqual([key1], root_results) |
876 | + # Note: This is technically not the 'true minimal' set that we could |
877 | + # use The reason is that 'a' was matched exactly to 'ad' (by sha |
878 | + # sum). However, the code gets complicated in the case of more |
879 | + # than one interesting key, so for now, we live with this |
880 | + # Consider revising, though benchmarking showing it to be a |
881 | + # real-world issue should be done |
882 | + self.assertEqual([key2_a], iterator._uninteresting_queue) |
883 | + # self.assertEqual([], iterator._uninteresting_queue) |
884 | + self.assertEqual([key1_aa], iterator._interesting_queue) |
885 | + self.assertEqual([], iterator._interesting_item_queue) |
886 | + |
887 | + def test__read_all_roots_yields_extra_deep_records(self): |
888 | + # This is slightly controversial, as we will yield a chk page that we |
889 | + # might later on find out could be filtered out. (If a root node is |
890 | + # referenced deeper in the uninteresting set.) |
891 | + # However, even with stacking, we always have all chk pages that we |
892 | + # will need. So as long as we filter out the referenced keys, we'll |
893 | + # never run into problems. |
894 | + # This allows us to yield a root node record immediately, without any |
895 | + # buffering. |
896 | + c_map = self.make_two_deep_map(chk_map._search_key_plain) |
897 | + c_map._dump_tree() # load all keys |
898 | + key1 = c_map.key() |
899 | + key1_a = c_map._root_node._items['a'].key() |
900 | + c_map2 = self.get_map({ |
901 | + ('acc',): 'initial acc content', |
902 | + ('ace',): 'initial ace content', |
903 | + }, maximum_size=100) |
904 | + self.assertEqualDiff( |
905 | + "'' LeafNode\n" |
906 | + " ('acc',) 'initial acc content'\n" |
907 | + " ('ace',) 'initial ace content'\n", |
908 | + c_map2._dump_tree()) |
909 | + key2 = c_map2.key() |
910 | + iterator = self.get_iterator([key2], [key1], chk_map._search_key_plain) |
911 | + root_results = [record.key for record in iterator._read_all_roots()] |
912 | + self.assertEqual([key2], root_results) |
913 | + # However, even though we have yielded the root node to be fetched, |
914 | + # we should have enqued all of the chk pages to be walked, so that we |
915 | + # can find the keys if they are present |
916 | + self.assertEqual([key1_a], iterator._uninteresting_queue) |
917 | + self.assertEqual([(('acc',), 'initial acc content'), |
918 | + (('ace',), 'initial ace content'), |
919 | + ], iterator._interesting_item_queue) |
920 | + |
921 | + def test__read_all_roots_multiple_targets(self): |
922 | + c_map = self.make_root_only_map() |
923 | + key1 = c_map.key() |
924 | + c_map = self.make_one_deep_map() |
925 | + key2 = c_map.key() |
926 | + c_map._dump_tree() |
927 | + key2_c = c_map._root_node._items['c'].key() |
928 | + key2_d = c_map._root_node._items['d'].key() |
929 | + c_map.map(('ccc',), 'new ccc value') |
930 | + key3 = c_map._save() |
931 | + key3_c = c_map._root_node._items['c'].key() |
932 | + iterator = self.get_iterator([key2, key3], [key1], |
933 | + chk_map._search_key_plain) |
934 | + root_results = [record.key for record in iterator._read_all_roots()] |
935 | + self.assertEqual(sorted([key2, key3]), sorted(root_results)) |
936 | + self.assertEqual([], iterator._uninteresting_queue) |
937 | + # the key 'd' is interesting from key2 and key3, but should only be |
938 | + # entered into the queue 1 time |
939 | + self.assertEqual(sorted([key2_c, key3_c, key2_d]), |
940 | + sorted(iterator._interesting_queue)) |
941 | + self.assertEqual([], iterator._interesting_item_queue) |
942 | + |
943 | + def test__read_all_roots_no_uninteresting(self): |
944 | + # This is the 'initial branch' case. With nothing in the uninteresting |
945 | + # set, we can just queue up all root nodes into interesting queue, and |
946 | + # then have them fast-path flushed via _flush_interesting_queue |
947 | + c_map = self.make_two_deep_map() |
948 | + key1 = c_map.key() |
949 | + iterator = self.get_iterator([key1], [], chk_map._search_key_plain) |
950 | + root_results = [record.key for record in iterator._read_all_roots()] |
951 | + self.assertEqual([], root_results) |
952 | + self.assertEqual([], iterator._uninteresting_queue) |
953 | + self.assertEqual([key1], iterator._interesting_queue) |
954 | + self.assertEqual([], iterator._interesting_item_queue) |
955 | + |
956 | + c_map2 = self.make_one_deep_map() |
957 | + key2 = c_map2.key() |
958 | + iterator = self.get_iterator([key1, key2], [], |
959 | + chk_map._search_key_plain) |
960 | + root_results = [record.key for record in iterator._read_all_roots()] |
961 | + self.assertEqual([], root_results) |
962 | + self.assertEqual([], iterator._uninteresting_queue) |
963 | + self.assertEqual(sorted([key1, key2]), |
964 | + sorted(iterator._interesting_queue)) |
965 | + self.assertEqual([], iterator._interesting_item_queue) |
966 | + |
967 | + def test__read_all_roots_no_uninteresting_16(self): |
968 | + c_map = self.make_two_deep_map(chk_map._search_key_16) |
969 | + key1 = c_map.key() |
970 | + iterator = self.get_iterator([key1], [], chk_map._search_key_16) |
971 | + root_results = [record.key for record in iterator._read_all_roots()] |
972 | + self.assertEqual([], root_results) |
973 | + self.assertEqual([], iterator._uninteresting_queue) |
974 | + self.assertEqual([key1], iterator._interesting_queue) |
975 | + self.assertEqual([], iterator._interesting_item_queue) |
976 | + |
977 | + c_map2 = self.make_one_deep_map(chk_map._search_key_16) |
978 | + key2 = c_map2.key() |
979 | + iterator = self.get_iterator([key1, key2], [], |
980 | + chk_map._search_key_16) |
981 | + root_results = [record.key for record in iterator._read_all_roots()] |
982 | + self.assertEqual([], root_results) |
983 | + self.assertEqual([], iterator._uninteresting_queue) |
984 | + self.assertEqual(sorted([key1, key2]), |
985 | + sorted(iterator._interesting_queue)) |
986 | + self.assertEqual([], iterator._interesting_item_queue) |
987 | + |
988 | + def test__read_all_roots_multiple_uninteresting(self): |
989 | + c_map = self.make_two_deep_map() |
990 | + key1 = c_map.key() |
991 | + c_map._dump_tree() # load everything |
992 | + key1_a = c_map._root_node._items['a'].key() |
993 | + c_map.map(('ccc',), 'new ccc value') |
994 | + key2 = c_map._save() |
995 | + key2_a = c_map._root_node._items['a'].key() |
996 | + c_map.map(('add',), 'new add value') |
997 | + key3 = c_map._save() |
998 | + key3_a = c_map._root_node._items['a'].key() |
999 | + iterator = self.get_iterator([key3], [key1, key2], |
1000 | + chk_map._search_key_plain) |
1001 | + root_results = [record.key for record in iterator._read_all_roots()] |
1002 | + self.assertEqual([key3], root_results) |
1003 | + # the 'a' keys should not be queued up 2 times, since they are |
1004 | + # identical |
1005 | + self.assertEqual([key1_a], iterator._uninteresting_queue) |
1006 | + self.assertEqual([key3_a], iterator._interesting_queue) |
1007 | + self.assertEqual([], iterator._interesting_item_queue) |
1008 | + |
1009 | + def test__process_next_uninteresting_batched_no_dupes(self): |
1010 | + c_map = self.make_two_deep_map() |
1011 | + key1 = c_map.key() |
1012 | + c_map._dump_tree() # load everything |
1013 | + key1_a = c_map._root_node._items['a'].key() |
1014 | + key1_aa = c_map._root_node._items['a']._items['aa'].key() |
1015 | + key1_ab = c_map._root_node._items['a']._items['ab'].key() |
1016 | + key1_ac = c_map._root_node._items['a']._items['ac'].key() |
1017 | + key1_ad = c_map._root_node._items['a']._items['ad'].key() |
1018 | + c_map.map(('aaa',), 'new aaa value') |
1019 | + key2 = c_map._save() |
1020 | + key2_a = c_map._root_node._items['a'].key() |
1021 | + key2_aa = c_map._root_node._items['a']._items['aa'].key() |
1022 | + c_map.map(('acc',), 'new acc content') |
1023 | + key3 = c_map._save() |
1024 | + key3_a = c_map._root_node._items['a'].key() |
1025 | + key3_ac = c_map._root_node._items['a']._items['ac'].key() |
1026 | + iterator = self.get_iterator([key3], [key1, key2], |
1027 | + chk_map._search_key_plain) |
1028 | + root_results = [record.key for record in iterator._read_all_roots()] |
1029 | + self.assertEqual([key3], root_results) |
1030 | + self.assertEqual(sorted([key1_a, key2_a]), |
1031 | + sorted(iterator._uninteresting_queue)) |
1032 | + self.assertEqual([key3_a], iterator._interesting_queue) |
1033 | + self.assertEqual([], iterator._interesting_item_queue) |
1034 | + iterator._process_next_uninteresting() |
1035 | + # All of the uninteresting records should be brought in and queued up, |
1036 | + # but we should not have any duplicates |
1037 | + self.assertEqual(sorted([key1_aa, key1_ab, key1_ac, key1_ad, key2_aa]), |
1038 | + sorted(iterator._uninteresting_queue)) |
1039 | + |
1040 | + |
1041 | +class TestIterInterestingNodes(TestCaseWithExampleMaps): |
1042 | |
1043 | def get_map_key(self, a_dict, maximum_size=10): |
1044 | - c_map = self._get_map(a_dict, maximum_size=maximum_size, |
1045 | - chk_bytes=self.get_chk_bytes()) |
1046 | + c_map = self.get_map(a_dict, maximum_size=maximum_size) |
1047 | return c_map.key() |
1048 | |
1049 | - def assertIterInteresting(self, expected, interesting_keys, |
1050 | + def assertIterInteresting(self, records, items, interesting_keys, |
1051 | uninteresting_keys): |
1052 | """Check the result of iter_interesting_nodes. |
1053 | |
1054 | - :param expected: A list of (record_keys, interesting_chk_pages, |
1055 | - interesting key value pairs) |
1056 | + Note that we no longer care how many steps are taken, etc, just that |
1057 | + the right contents are returned. |
1058 | + |
1059 | + :param records: A list of record keys that should be yielded |
1060 | + :param items: A list of items (key,value) that should be yielded. |
1061 | """ |
1062 | store = self.get_chk_bytes() |
1063 | + store._search_key_func = chk_map._search_key_plain |
1064 | iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys, |
1065 | uninteresting_keys) |
1066 | - nodes = list(iter_nodes) |
1067 | - for count, (exp, act) in enumerate(izip(expected, nodes)): |
1068 | - exp_record, exp_items = exp |
1069 | - record, items = act |
1070 | - exp_tuple = (exp_record, sorted(exp_items)) |
1071 | - if record is None: |
1072 | - act_tuple = (None, sorted(items)) |
1073 | - else: |
1074 | - act_tuple = (record.key, sorted(items)) |
1075 | - self.assertEqual(exp_tuple, act_tuple, |
1076 | - 'entry %d did not match expected' % count) |
1077 | - self.assertEqual(len(expected), len(nodes)) |
1078 | + record_keys = [] |
1079 | + all_items = [] |
1080 | + for record, new_items in iter_nodes: |
1081 | + if record is not None: |
1082 | + record_keys.append(record.key) |
1083 | + if new_items: |
1084 | + all_items.extend(new_items) |
1085 | + self.assertEqual(sorted(records), sorted(record_keys)) |
1086 | + self.assertEqual(sorted(items), sorted(all_items)) |
1087 | |
1088 | def test_empty_to_one_keys(self): |
1089 | target = self.get_map_key({('a',): 'content'}) |
1090 | - self.assertIterInteresting( |
1091 | - [(target, [(('a',), 'content')]), |
1092 | - ], [target], []) |
1093 | + self.assertIterInteresting([target], |
1094 | + [(('a',), 'content')], |
1095 | + [target], []) |
1096 | |
1097 | def test_none_to_one_key(self): |
1098 | basis = self.get_map_key({}) |
1099 | target = self.get_map_key({('a',): 'content'}) |
1100 | - self.assertIterInteresting( |
1101 | - [(None, [(('a',), 'content')]), |
1102 | - (target, []), |
1103 | - ], [target], [basis]) |
1104 | + self.assertIterInteresting([target], |
1105 | + [(('a',), 'content')], |
1106 | + [target], [basis]) |
1107 | |
1108 | def test_one_to_none_key(self): |
1109 | basis = self.get_map_key({('a',): 'content'}) |
1110 | target = self.get_map_key({}) |
1111 | - self.assertIterInteresting( |
1112 | - [(target, [])], |
1113 | - [target], [basis]) |
1114 | + self.assertIterInteresting([target], |
1115 | + [], |
1116 | + [target], [basis]) |
1117 | |
1118 | def test_common_pages(self): |
1119 | basis = self.get_map_key({('a',): 'content', |
1120 | @@ -1964,10 +2548,9 @@ |
1121 | target_map._dump_tree()) |
1122 | b_key = target_map._root_node._items['b'].key() |
1123 | # This should return the root node, and the node for the 'b' key |
1124 | - self.assertIterInteresting( |
1125 | - [(target, []), |
1126 | - (b_key, [(('b',), 'other content')])], |
1127 | - [target], [basis]) |
1128 | + self.assertIterInteresting([target, b_key], |
1129 | + [(('b',), 'other content')], |
1130 | + [target], [basis]) |
1131 | |
1132 | def test_common_sub_page(self): |
1133 | basis = self.get_map_key({('aaa',): 'common', |
1134 | @@ -1991,12 +2574,11 @@ |
1135 | # The key for the internal aa node |
1136 | a_key = target_map._root_node._items['a'].key() |
1137 | # The key for the leaf aab node |
1138 | + # aaa_key = target_map._root_node._items['a']._items['aaa'].key() |
1139 | aab_key = target_map._root_node._items['a']._items['aab'].key() |
1140 | - self.assertIterInteresting( |
1141 | - [(target, []), |
1142 | - (a_key, []), |
1143 | - (aab_key, [(('aab',), 'new')])], |
1144 | - [target], [basis]) |
1145 | + self.assertIterInteresting([target, a_key, aab_key], |
1146 | + [(('aab',), 'new')], |
1147 | + [target], [basis]) |
1148 | |
1149 | def test_common_leaf(self): |
1150 | basis = self.get_map_key({}) |
1151 | @@ -2040,29 +2622,22 @@ |
1152 | a_key = target3_map._root_node._items['a'].key() |
1153 | aac_key = target3_map._root_node._items['a']._items['aac'].key() |
1154 | self.assertIterInteresting( |
1155 | - [(None, [(('aaa',), 'common')]), |
1156 | - (target1, []), |
1157 | - (target2, []), |
1158 | - (target3, []), |
1159 | - (b_key, [(('bbb',), 'new')]), |
1160 | - (a_key, []), |
1161 | - (aac_key, [(('aac',), 'other')]), |
1162 | - ], [target1, target2, target3], [basis]) |
1163 | - |
1164 | - self.assertIterInteresting( |
1165 | - [(target2, []), |
1166 | - (target3, []), |
1167 | - (b_key, [(('bbb',), 'new')]), |
1168 | - (a_key, []), |
1169 | - (aac_key, [(('aac',), 'other')]), |
1170 | - ], [target2, target3], [target1]) |
1171 | - |
1172 | - # This may be a case that we relax. A root node is a deep child of the |
1173 | - # excluded set. The cost is buffering root nodes until we have |
1174 | - # determined all possible exclusions. (Because a prefix of '', cannot |
1175 | - # be excluded.) |
1176 | - self.assertIterInteresting( |
1177 | - [], [target1], [target3]) |
1178 | + [target1, target2, target3, a_key, aac_key, b_key], |
1179 | + [(('aaa',), 'common'), (('bbb',), 'new'), (('aac',), 'other')], |
1180 | + [target1, target2, target3], [basis]) |
1181 | + |
1182 | + self.assertIterInteresting( |
1183 | + [target2, target3, a_key, aac_key, b_key], |
1184 | + [(('bbb',), 'new'), (('aac',), 'other')], |
1185 | + [target2, target3], [target1]) |
1186 | + |
1187 | + # Technically, target1 could be filtered out, but since it is a root |
1188 | + # node, we yield it immediately, rather than waiting to find out much |
1189 | + # later on. |
1190 | + self.assertIterInteresting( |
1191 | + [target1], |
1192 | + [], |
1193 | + [target1], [target3]) |
1194 | |
1195 | def test_multiple_maps(self): |
1196 | basis1 = self.get_map_key({('aaa',): 'common', |
1197 | @@ -2111,13 +2686,9 @@ |
1198 | # The key for the leaf bba node |
1199 | bba_key = target2_map._root_node._items['b']._items['bba'].key() |
1200 | self.assertIterInteresting( |
1201 | - [(target1, []), |
1202 | - (target2, []), |
1203 | - (a_key, []), |
1204 | - (b_key, []), |
1205 | - (aac_key, [(('aac',), 'target1')]), |
1206 | - (bba_key, [(('bba',), 'target2')]), |
1207 | - ], [target1, target2], [basis1, basis2]) |
1208 | + [target1, target2, a_key, aac_key, b_key, bba_key], |
1209 | + [(('aac',), 'target1'), (('bba',), 'target2')], |
1210 | + [target1, target2], [basis1, basis2]) |
1211 | |
1212 | def test_multiple_maps_overlapping_common_new(self): |
1213 | # Test that when a node found through the interesting_keys iteration |
1214 | @@ -2188,17 +2759,10 @@ |
1215 | right_map._dump_tree()) |
1216 | # Keys from the right side target - none, the root is enough. |
1217 | # Test behaviour |
1218 | - self.expectFailure("we don't properly filter different depths", |
1219 | - self.assertIterInteresting, |
1220 | - [(left, []), |
1221 | - (right, []), |
1222 | - (l_d_key, [(('ddd',), 'change')]), |
1223 | - ], [left, right], [basis]) |
1224 | self.assertIterInteresting( |
1225 | - [(left, []), |
1226 | - (right, []), |
1227 | - (l_d_key, [(('ddd',), 'change')]), |
1228 | - ], [left, right], [basis]) |
1229 | + [right, left, l_d_key], |
1230 | + [(('ddd',), 'change')], |
1231 | + [left, right], [basis]) |
1232 | |
1233 | def test_multiple_maps_similar(self): |
1234 | # We want to have a depth=2 tree, with multiple entries in each leaf |
1235 | @@ -2259,8 +2823,6 @@ |
1236 | r_a_key = right_map._root_node._items['a'].key() |
1237 | r_c_key = right_map._root_node._items['c'].key() |
1238 | self.assertIterInteresting( |
1239 | - [(left, []), |
1240 | - (right, []), |
1241 | - (l_a_key, [(('abb',), 'changed left')]), |
1242 | - (r_c_key, [(('cbb',), 'changed right')]), |
1243 | - ], [left, right], [basis]) |
1244 | + [right, left, l_a_key, r_c_key], |
1245 | + [(('abb',), 'changed left'), (('cbb',), 'changed right')], |
1246 | + [left, right], [basis]) |
Description of the change

This is a slightly better fix for bug #390563. The main changes are:
1) Rewrite iter_interesting_nodes using a class, rather than maintaining state between functions
2) Split up the processing of root nodes into:
a) Read all uninteresting roots, and find uninteresting chks and items
b) Filter out any interesting root keys that were found in (a)
c) Read all remaining root pages and yield them
d) Find interesting references that are not in (a) (and interesting items)
Note the search key prefixes for these references and items, and queue
them up to be returned later.
e) Filter out uninteresting chk pages based on the search key prefixes in (d)
(d) and (e) are the main differences.
For (d), Robert found an edge case where we could return a non-interesting item (which would, in turn, be a text key, and thus a failure for fetch) when one side had an immediate leaf node and the other side had a deep tree. The only real change is to defer returning interesting *items* until all uninteresting pages have been read.
For (e), we previously used the 'intersection of interesting references', whereas now we filter simply based on prefix. This is likely a small win when there are lots of uninteresting pages (lots of merge parents that are not being transmitted).
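The shape of that prefix filter, condensed from _enqueue_uninteresting in the diff above, is roughly the sketch below; the standalone function name is invented for illustration, and it is Python 2 (xrange), matching the patch:

def filter_uninteresting_refs(uninteresting_prefix_refs, interesting_prefixes):
    # interesting_prefixes must already include every shorter prefix too
    # (for a known ref at 'ab' it also holds 'a'), as is done at the end
    # of _read_all_roots.
    for prefix, ref in uninteresting_prefix_refs:
        for i in xrange(len(prefix), 0, -1):
            if prefix[:i] in interesting_prefixes:
                # Some interesting subtree still overlaps this prefix,
                # so this uninteresting page must still be walked.
                yield ref
                break
        # Refs whose prefix never matched are simply dropped: that whole
        # uninteresting subtree cannot overlap the interesting set.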
3) After processing root nodes
a) Process the uninteresting queue until completion
b) Process the interesting queue, filtering out all uninteresting items.
(There are some minor changes here as well, like doing batched '.difference()' checks rather than testing membership as every record is processed.)
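That batching shows up in _flush_interesting_queue in the diff above; roughly, the loop looks like the following sketch (the dependencies are passed as parameters here purely to keep the sketch self-contained):

def flush_interesting_queue(read_nodes_from_store, interesting_refs,
                            all_uninteresting_chks, all_uninteresting_items):
    processed_refs = set(interesting_refs)
    refs = set(interesting_refs).difference(all_uninteresting_chks)
    while refs:
        next_refs = set()
        for record, _, prefix_refs, items in read_nodes_from_store(refs):
            yield record, [item for item in items
                           if item not in all_uninteresting_items]
            next_refs.update([p_r[1] for p_r in prefix_refs])
        # One batched set difference per pass, rather than a membership
        # check against the full uninteresting set for every record:
        next_refs = next_refs.difference(all_uninteresting_chks)
        next_refs = next_refs.difference(processed_refs)
        processed_refs.update(next_refs)
        refs = next_refs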
I looked at trying to use a heapq to unify things more with 'iter_changes' but the constraints are *very* different.
1) CHKMap.iter_changes() is symmetric, in that we need to return a record for (source, target, extra); for iter_interesting_nodes, we *only* need to return information for the 'target' side.
2) CHKMap.iter_changes() only ever affects 2 records, a single source and a single target. iter_interesting_nodes is looking at a many-to-many comparison.
3) CHKMap.iter_changes() only cares about the actual keys, and not about sending the raw byte records to be transmitted for fetch.
4) iter_interesting_nodes is much more concerned about batching, because of its many-to-many nature and the fact that it is used for fetch. We really don't want to be doing a round trip per page.
5) The only gain from using a heapq would be if we could figure out an algorithm for filtering out nodes that we thought we would want to read, but really don't. However:
a) I do this at the root level, because it is fairly straightforward. So any further improvements only affect trees that are deeper than 2 (1 root, 1+ internal, 1 leaf); however, this only happens for trees with > 50k items.
b) Reading extra chk pages isn't "incorrect", just "inefficient" (the stacking code ensures all referenced chk pages are present in the stacked-on branch, just as we ensure compression texts are).
c) The best I could come up with would be to split the reads up by prefix. So you would page in the 'a' references, then the 'b' references, etc. However, that forces us to split the requests up even further, rather than issuing larger batches of requests. (more...
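For illustration of why that alternative loses, a hypothetical sketch (this function and its name are invented for the example, not part of the patch) that groups the queued refs by the first character of their search-key prefix and issues one get_record_stream() call per group; every extra group is an extra request, the opposite of the large-batch behaviour the fetch code wants:

from itertools import groupby

def read_refs_in_prefix_batches(store, prefix_refs):
    # prefix_refs: iterable of (search_key_prefix, chk_key) pairs, as
    # queued by the iterator above.
    keyed = sorted(prefix_refs, key=lambda p_r: p_r[0][:1])
    for _prefix_char, group in groupby(keyed, key=lambda p_r: p_r[0][:1]):
        keys = [chk_key for _, chk_key in group]
        # Each prefix group becomes its own request: smaller batches,
        # and potentially one round trip per group.
        for record in store.get_record_stream(keys, 'unordered', True):
            yield record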