Merge lp:~jameinel/bzr/1.19-known-graph-sorted into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.19-known-graph-sorted
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1973 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.19-known-graph-sorted
Reviewers:
  Gary van der Merwe: Abstain
  Vincent Ladeuil: Approve
Review via email: mp+10293@code.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

This change finally brings the new ancestry extraction code up all the way so that 'bzr log' gets to use it.

It adds a member VersionedFiles.get_known_graph_ancestry(keys) which returns a KnownGraph instance.

It also implements 'merge_sort' in pyrex code on the KnownGraph object. At this point, KnownGraph.merge_sort() is quite fast, taking about 40ms to merge_sort all of bzr.dev and another 50ms to build up the KnownGraph object.

Combined with the improved extraction of ancestry, this brings "bzr log -n0 -r-10..-1" on bzr.dev from 2.5s down to about 1.0s for me.

As near as I can tell, the big win for doing merge_sort on the KnownGraph object is that you avoid a lot of dict lookup calls. 40ms for 25.6k keys is about 1.5us per key, and doing a dict lookup to get the parents costs 10ms overall (about 0.4us per key). It also brings the time for 'merge_sort' on the OOo tree down to 1.0s.

It also adds KnownGraph.topo_sort(), which turns out to only take around 10ms for all of bzr.dev (on top of the 50ms to build the KnownGraph data structure).

Because I already have an object model internally, I went ahead and exposed KnownGraph.merge_sort() as returning objects, rather than tuples. I think the api is going to be a lot easier to use, and none of my timings so far show an advantage to the tuple version. (It could be because the objects are compiled making getattr() faster...)
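To make the api change concrete, here is a minimal stand-alone sketch (this is not bzrlib's actual class, just an illustration of the tuple-vs-object difference described above; the `MergeSortNode` name and sample values are made up):

```python
class MergeSortNode(object):
    """Stand-in for bzrlib's _MergeSortNode: one entry of merge_sort()."""

    __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge')

    def __init__(self, key, merge_depth, revno, end_of_merge):
        self.key = key
        self.merge_depth = merge_depth
        self.revno = revno
        self.end_of_merge = end_of_merge


# Old api: a tuple the caller has to unpack positionally, including a
# sequence_number it usually just throws away.
seq_no, key, depth, revno, eom = (4, 'rev-2', 0, (2,), False)

# New api: named attributes at the call site, no sequence_number at all.
node = MergeSortNode('rev-2', 0, (2,), False)
```

With `__slots__` the objects stay compact, and attribute access reads better than remembering tuple positions at every call site.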

This changes tsort.topo_sort() to just be a thunk over to KnownGraph(parent_map).topo_sort(), as it is still faster than the fastest python implementation. (Though the pure python form is actually slower because of the overhead of building the KG object. I'm not worried, as it is still faster than our existing topo_sort implementation.)
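The counting scheme behind the KnownGraph-style topo_sort can be sketched in plain Python over a parent_map dict (an illustrative reimplementation, not bzrlib's code; names here are local to the example):

```python
def topo_sort(parent_map):
    """Topological sort of {key: (parent keys,)}: parents before children.

    Mirrors the described approach: start from the tails (nodes with no
    in-graph parents) and emit a child once all its parents are emitted.
    """
    children = {}     # key -> list of child keys
    num_parents = {}  # key -> count of parents that exist in the graph
    for key, parents in parent_map.items():
        num_parents.setdefault(key, 0)
        for parent in parents:
            if parent not in parent_map:
                continue  # ghost parent, not part of the graph
            num_parents[key] += 1
            children.setdefault(parent, []).append(key)
    # The tails seed the pending queue
    pending = [key for key, count in num_parents.items() if count == 0]
    order = []
    while pending:
        key = pending.pop()
        order.append(key)
        for child in children.get(key, []):
            num_parents[child] -= 1
            if num_parents[child] == 0:
                # All parents emitted, the child may now be emitted
                pending.append(child)
    if len(order) != len(parent_map):
        raise ValueError('graph contains a cycle: %r' % (parent_map,))
    return order


order = topo_sort({'A': (), 'B': ('A',), 'C': ('A',), 'D': ('B', 'C')})
```

Every node and edge is touched a constant number of times, which is why the compiled version of this loop ends up dominated by plain dict lookups.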

I was thinking of doing the same thing for 'merge_sort', but I took this opportunity to break the api: getting rid of "sequence_number", using an object model, ignoring 'mainline_revisions', etc. Note that only semi-deprecated code uses mainline_revisions anyway, and it really shouldn't anymore. (It dates from when we allowed Branch.revision_history() != lefthand_history.)

I also plan on updating KnownGraph with an 'add_node()' function, so that its implementation of 'merge_sort' can be used for annotate (which sometimes annotates the working tree). It shouldn't be hard to do.

Robert Collins (lifeless) wrote :

What impact will this have on things like my all-ubuntu repository (16K
unrelated branches in one repo) ?

-Rob

Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins writes:

    > What impact will this have on things like my
    > all-ubuntu repository (16K unrelated branches in one
    > repo) ?

You just tell us! :-)

But from the discussion with John, it should either improve
things (I'm 90% confident here)... or provide us with very nice
data! I, for one, would love to have such a repo around to play
with...

@John, how did you measure your progress ? Still using
time_graph.py ? Is it time to enhance it ?

John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
>>>>>> "robert" == Robert Collins writes:
>
> > What impact will this have on things like my
> > all-ubuntu repository (16K unrelated branches in one
> > repo) ?

It should perform approximately the same or better.

The 'find_ancestors()' code doesn't grab arbitrary node => parent
mappings, only ones that are in the ancestry of the keys that were
requested.

As such it is the same as repeated get_parent_map() calls, just without
the repeats.

The merge_sort() code is simply the same algorithm, just 3-7x faster
(depending on whether you count the time to build the KnownGraph).

Again, all of this code has the same "look at only the ancestry
requested" that the current code has. So there shouldn't be a blowout
from having lots of unrelated history. It just may not be a whole lot
faster because the other history is 'in the way'.
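The "look at only the ancestry requested" behaviour can be sketched as a walk that never asks about keys outside the requested tips' ancestry (a hypothetical `get_parent_map` callable stands in for repeated Repository.get_parent_map calls; this is an illustration, not bzrlib's find_ancestry):

```python
def find_ancestry(get_parent_map, keys):
    """Collect the full ancestry of `keys`, and any keys found missing.

    Only keys reachable from `keys` are ever passed to get_parent_map,
    so unrelated branches in the same repository add no work.
    """
    parent_map = {}
    missing = set()
    pending = set(keys)
    while pending:
        found = get_parent_map(pending)
        missing.update(pending - set(found))
        parent_map.update(found)
        # Walk only to parents we have not looked up yet
        next_pending = set()
        for parents in found.values():
            next_pending.update(parents)
        next_pending.difference_update(parent_map)
        next_pending.difference_update(missing)
        pending = next_pending
    return parent_map, missing


# Two unrelated ancestries in one "repository"; we only ask about one.
graph = {'T': ('B',), 'B': ('A',), 'A': (), 'X': ('W',), 'W': ()}
calls = []

def get_parent_map(keys):
    calls.append(set(keys))
    return dict((k, graph[k]) for k in keys if k in graph)

ancestry, missing = find_ancestry(get_parent_map, ['T'])
```

The unrelated 'X'/'W' branch never appears in any request, which is the reason lots of unrelated history costs nothing beyond sharing the indices.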

>
> You just tell us! :-)
>
> But from the discussion with John, it should either improve
> things (I'm 90% confident here)... or provide us with very nice
> data ! I, for one, will love to have such a repo around to play
> with...
>
> @John, how did you measure your progress ? Still using
> time_graph.py ? Is it time to enhance it ?

No. I did have another helper here, but mostly this is tested with:

$ PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch,
repository, tsort, graph
b = branch.Branch.open('bzr-2a-extra/bzr.dev')
b.lock_read()
l_rev = b.last_revision()
p_map, missing = b.repository.revisions._index._graph_index.find_ancestry([(l_rev,)], 0)
b.unlock()
" "kg = graph.KnownGraph(p_map);
for n in kg.merge_sort((l_rev,)):
  n.key, n.revno, n.end_of_merge, n.merge_depth
"

Or just simply running:
  time bzr log -n0 -r -10..-1 >/dev/null
John
=:->

Vincent Ladeuil (vila) wrote :

Nice job !

In tsort.py, you can get rid of 'from collections import deque'.

Nothing else to ask for :) Nice work on the tests too.

review: Approve
Gary van der Merwe (garyvdm) wrote :

Note: This branch fixes Bug 350796.

review: Abstain
Gary van der Merwe (garyvdm) wrote :

> Note: This branch fixes Bug 350796.

Sorry - I was wrong about that. Bug 350796 was fixed by rev 4260.

Preview Diff

=== modified file 'NEWS'
--- NEWS 2009-08-18 20:05:30 +0000
+++ NEWS 2009-08-19 16:35:14 +0000
@@ -9,22 +9,6 @@
 In Development
 ##############
 
-Bug Fixes
-*********
-
-* Fix a test failure on karmic by making a locale test more robust.
-  (Vincent Ladeuil, #413514)
-
-Improvements
-************
-
-* A better description of the platform is shown in crash tracebacks, ``bzr
-  --version`` and ``bzr selftest``.
-  (Martin Pool, #409137)
-
-bzr 1.18
-########
-
 Compatibility Breaks
 ********************
 
@@ -83,6 +67,9 @@
   version-3 protocol, but it does cause test suite failures when testing
   downlevel protocol behaviour. (Robert Collins)
 
+* Fix a test failure on karmic by making a locale test more robust.
+  (Vincent Ladeuil, #413514)
+
 * Fixed "Pack ... already exists" error when running ``bzr pack`` on a
   fully packed 2a repository. (Andrew Bennetts, #382463)
 
@@ -109,12 +96,21 @@
 Improvements
 ************
 
+* A better description of the platform is shown in crash tracebacks, ``bzr
+  --version`` and ``bzr selftest``.
+  (Martin Pool, #409137)
+
 * Cross-format fetches (such as between 1.9-rich-root and 2a) via the
   smart server are more efficient now. They send inventory deltas rather
   than full inventories. The smart server has two new requests,
   ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to
   support this. (Andrew Bennetts, #374738, #385826)
 
+* Extracting the full ancestry and computing the ``merge_sort`` is now
+  significantly faster. This effects things like ``bzr log -n0``. (For
+  example, ``bzr log -r -10..-1 -n0 bzr.dev`` is 2.5s down to 1.0s.
+  (John Arbash Meinel)
+
 Documentation
 *************
 
@@ -136,15 +132,20 @@
   friendly StreamSource, which now automatically does the same
   transformations as InterDifferingSerializer. (Andrew Bennetts)
 
+* ``KnownGraph`` now has a ``.topo_sort`` and ``.merge_sort`` member which
+  are implemented in pyrex and significantly faster. This is exposed along
+  with ``CombinedGraphIndex.find_ancestry()`` as
+  ``VersionedFiles.get_known_graph_ancestry(keys)``.
+  (John Arbash Meinel)
+
 * RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2
   protocols. (Robert Collins)
 
 * The index code now has some specialized routines to extract the full
   ancestry of a key in a more efficient manner.
-  ``CombinedGraphIndex.find_ancestry()``. This is not fully exposed to the
-  higher levels yet, but has the potential to improve grabbing the full
-  ancestry tremendously. (Time to get ancestry for bzr.dev drops from 1.5s
-  down to 300ms. For OOo from 33s => 10.5s) (John Arbash Meinel)
+  ``CombinedGraphIndex.find_ancestry()``. (Time to get ancestry for
+  bzr.dev drops from 1.5s down to 300ms. For OOo from 33s => 10.5s) (John
+  Arbash Meinel)
 
 Testing
 *******
 
=== modified file 'bzrlib/_known_graph_py.py'
--- bzrlib/_known_graph_py.py 2009-07-08 20:58:10 +0000
+++ bzrlib/_known_graph_py.py 2009-08-19 16:35:14 +0000
@@ -18,6 +18,7 @@
 """
 
 from bzrlib import (
+    errors,
     revision,
     )
 
@@ -40,6 +41,18 @@
             self.parent_keys, self.child_keys)
 
 
+class _MergeSortNode(object):
+    """Information about a specific node in the merge graph."""
+
+    __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge')
+
+    def __init__(self, key, merge_depth, revno, end_of_merge):
+        self.key = key
+        self.merge_depth = merge_depth
+        self.revno = revno
+        self.end_of_merge = end_of_merge
+
+
 class KnownGraph(object):
     """This is a class which assumes we already know the full graph."""
 
@@ -171,3 +184,51 @@
             self._known_heads[heads_key] = heads
         return heads
 
+    def topo_sort(self):
+        """Return the nodes in topological order.
+
+        All parents must occur before all children.
+        """
+        for node in self._nodes.itervalues():
+            if node.gdfo is None:
+                raise errors.GraphCycleError(self._nodes)
+        pending = self._find_tails()
+        pending_pop = pending.pop
+        pending_append = pending.append
+
+        topo_order = []
+        topo_order_append = topo_order.append
+
+        num_seen_parents = dict.fromkeys(self._nodes, 0)
+        while pending:
+            node = pending_pop()
+            if node.parent_keys is not None:
+                # We don't include ghost parents
+                topo_order_append(node.key)
+            for child_key in node.child_keys:
+                child_node = self._nodes[child_key]
+                seen_parents = num_seen_parents[child_key] + 1
+                if seen_parents == len(child_node.parent_keys):
+                    # All parents have been processed, enqueue this child
+                    pending_append(child_node)
+                    # This has been queued up, stop tracking it
+                    del num_seen_parents[child_key]
+                else:
+                    num_seen_parents[child_key] = seen_parents
+        # We started from the parents, so we don't need to do anymore work
+        return topo_order
+
+    def merge_sort(self, tip_key):
+        """Compute the merge sorted graph output."""
+        from bzrlib import tsort
+        as_parent_map = dict((node.key, node.parent_keys)
+                             for node in self._nodes.itervalues()
+                             if node.parent_keys is not None)
+        # We intentionally always generate revnos and never force the
+        # mainline_revisions
+        # Strip the sequence_number that merge_sort generates
+        return [_MergeSortNode(key, merge_depth, revno, end_of_merge)
+                for _, key, merge_depth, revno, end_of_merge
+                in tsort.merge_sort(as_parent_map, tip_key,
+                                    mainline_revisions=None,
+                                    generate_revno=True)]
 
=== modified file 'bzrlib/_known_graph_pyx.pyx'
--- bzrlib/_known_graph_pyx.pyx 2009-07-14 16:10:32 +0000
+++ bzrlib/_known_graph_pyx.pyx 2009-08-19 16:35:14 +0000
@@ -44,8 +44,9 @@
 
     void Py_INCREF(object)
 
+import gc
 
-from bzrlib import revision
+from bzrlib import errors, revision
 
 cdef object NULL_REVISION
 NULL_REVISION = revision.NULL_REVISION
@@ -59,10 +60,9 @@
     cdef object children
     cdef public long gdfo
     cdef int seen
+    cdef object extra
 
     def __init__(self, key):
-        cdef int i
-
         self.key = key
         self.parents = None
 
@@ -70,6 +70,7 @@
         # Greatest distance from origin
         self.gdfo = -1
         self.seen = 0
+        self.extra = None
 
     property child_keys:
         def __get__(self):
@@ -115,9 +116,7 @@
         return <_KnownGraphNode>temp_node
 
 
-# TODO: slab allocate all _KnownGraphNode objects.
-#       We already know how many we are going to need, except for a couple of
-#       ghosts that could be allocated on demand.
+cdef class _MergeSorter
 
 cdef class KnownGraph:
     """This is a class which assumes we already know the full graph."""
@@ -136,6 +135,9 @@
         # Maps {sorted(revision_id, revision_id): heads}
         self._known_heads = {}
         self.do_cache = int(do_cache)
+        # TODO: consider disabling gc since we are allocating a lot of nodes
+        #       that won't be collectable anyway. real world testing has not
+        #       shown a specific impact, yet.
         self._initialize_nodes(parent_map)
         self._find_gdfo()
 
@@ -183,11 +185,16 @@
                 parent_keys = <object>temp_parent_keys
             num_parent_keys = len(parent_keys)
             node = self._get_or_create_node(key)
-            # We know how many parents, so we could pre allocate an exact sized
-            # tuple here
+            # We know how many parents, so we pre allocate the tuple
             parent_nodes = PyTuple_New(num_parent_keys)
-            # We use iter here, because parent_keys maybe be a list or tuple
             for pos2 from 0 <= pos2 < num_parent_keys:
+                # Note: it costs us 10ms out of 40ms to lookup all of these
+                #       parents, it doesn't seem to be an allocation overhead,
+                #       but rather a lookup overhead. There doesn't seem to be
+                #       a way around it, and that is one reason why
+                #       KnownGraphNode maintains a direct pointer to the parent
+                #       node.
+                # We use [] because parent_keys may be a tuple or list
                 parent_node = self._get_or_create_node(parent_keys[pos2])
                 # PyTuple_SET_ITEM will steal a reference, so INCREF first
                 Py_INCREF(parent_node)
@@ -335,3 +342,353 @@
         if self.do_cache:
             PyDict_SetItem(self._known_heads, heads_key, heads)
         return heads
+
+    def topo_sort(self):
+        """Return the nodes in topological order.
+
+        All parents must occur before all children.
+        """
+        # This is, for the most part, the same iteration order that we used
+        # for _find_gdfo, consider finding a way to remove the duplication
+        # In general, we find the 'tails' (nodes with no parents), and then
+        # walk to the children. For children that have all of their parents
+        # yielded, we queue up the child to be yielded as well.
+        cdef _KnownGraphNode node
+        cdef _KnownGraphNode child
+        cdef PyObject *temp
+        cdef Py_ssize_t pos
+        cdef int replace
+        cdef Py_ssize_t last_item
+
+        pending = self._find_tails()
+        if PyList_GET_SIZE(pending) == 0 and len(self._nodes) > 0:
+            raise errors.GraphCycleError(self._nodes)
+
+        topo_order = []
+
+        last_item = PyList_GET_SIZE(pending) - 1
+        while last_item >= 0:
+            # Avoid pop followed by push, instead, peek, and replace
+            # timing shows this is 930ms => 770ms for OOo
+            node = _get_list_node(pending, last_item)
+            last_item = last_item - 1
+            if node.parents is not None:
+                # We don't include ghost parents
+                PyList_Append(topo_order, node.key)
+            for pos from 0 <= pos < PyList_GET_SIZE(node.children):
+                child = _get_list_node(node.children, pos)
+                if child.gdfo == -1:
+                    # We know we have a graph cycle because a node has a parent
+                    # which we couldn't find
+                    raise errors.GraphCycleError(self._nodes)
+                child.seen = child.seen + 1
+                if child.seen == PyTuple_GET_SIZE(child.parents):
+                    # All parents of this child have been yielded, queue this
+                    # one to be yielded as well
+                    last_item = last_item + 1
+                    if last_item < PyList_GET_SIZE(pending):
+                        Py_INCREF(child) # SetItem steals a ref
+                        PyList_SetItem(pending, last_item, child)
+                    else:
+                        PyList_Append(pending, child)
+                    # We have queued this node, we don't need to track it
+                    # anymore
+                    child.seen = 0
+        # We started from the parents, so we don't need to do anymore work
+        return topo_order
+
+
+    def merge_sort(self, tip_key):
+        """Compute the merge sorted graph output."""
+        cdef _MergeSorter sorter
+
+        # TODO: consider disabling gc since we are allocating a lot of nodes
+        #       that won't be collectable anyway. real world testing has not
+        #       shown a specific impact, yet.
+        sorter = _MergeSorter(self, tip_key)
+        return sorter.topo_order()
+
+
+cdef class _MergeSortNode:
+    """Tracks information about a node during the merge_sort operation."""
+
+    # Public api
+    cdef public object key
+    cdef public long merge_depth
+    cdef public object end_of_merge # True/False Is this the end of the current merge
+
+    # Private api, used while computing the information
+    cdef _KnownGraphNode left_parent
+    cdef _KnownGraphNode left_pending_parent
+    cdef object pending_parents # list of _KnownGraphNode for non-left parents
+    cdef long _revno_first
+    cdef long _revno_second
+    cdef long _revno_last
+    # TODO: turn these into flag/bit fields rather than individual members
+    cdef int is_first_child # Is this the first child?
+    cdef int seen_by_child # A child node has seen this parent
+    cdef int completed # Fully Processed
+
+    def __init__(self, key):
+        self.key = key
+        self.merge_depth = -1
+        self.left_parent = None
+        self.left_pending_parent = None
+        self.pending_parents = None
+        self._revno_first = -1
+        self._revno_second = -1
+        self._revno_last = -1
+        self.is_first_child = 0
+        self.seen_by_child = 0
+        self.completed = 0
+
+    def __repr__(self):
+        return '%s(depth:%s rev:%s,%s,%s first:%s seen:%s)' % (self.__class__.__name__,
+            self.merge_depth,
+            self._revno_first, self._revno_second, self._revno_last,
+            self.is_first_child, self.seen_by_child)
+
+    cdef int has_pending_parents(self):
+        if self.left_pending_parent is not None or self.pending_parents:
+            return 1
+        return 0
+
+    cdef object _revno(self):
+        if self._revno_first == -1:
+            if self._revno_second != -1:
+                raise RuntimeError('Something wrong with: %s' % (self,))
+            return (self._revno_last,)
+        else:
+            return (self._revno_first, self._revno_second, self._revno_last)
+
+    property revno:
+        def __get__(self):
+            return self._revno()
+
+
+cdef class _MergeSorter:
+    """This class does the work of computing the merge_sort ordering.
+
+    We have some small advantages, in that we get all the extra information
+    that KnownGraph knows, like knowing the child lists, etc.
+    """
+
+    # Current performance numbers for merge_sort(bzr_dev_parent_map):
+    #  302ms tsort.merge_sort()
+    #   91ms graph.KnownGraph().merge_sort()
+    #   40ms kg.merge_sort()
+
+    cdef KnownGraph graph
+    cdef object _depth_first_stack  # list
+    cdef Py_ssize_t _last_stack_item # offset to last item on stack
+    # cdef object _ms_nodes # dict of key => _MergeSortNode
+    cdef object _revno_to_branch_count # {revno => num child branches}
+    cdef object _scheduled_nodes # List of nodes ready to be yielded
+
+    def __init__(self, known_graph, tip_key):
+        cdef _KnownGraphNode node
+
+        self.graph = known_graph
+        # self._ms_nodes = {}
+        self._revno_to_branch_count = {}
+        self._depth_first_stack = []
+        self._last_stack_item = -1
+        self._scheduled_nodes = []
+        if (tip_key is not None and tip_key != NULL_REVISION
+            and tip_key != (NULL_REVISION,)):
+            node = self.graph._nodes[tip_key]
+            self._get_ms_node(node)
+            self._push_node(node, 0)
+
+    cdef _MergeSortNode _get_ms_node(self, _KnownGraphNode node):
+        cdef PyObject *temp_node
+        cdef _MergeSortNode ms_node
+
+        if node.extra is None:
+            ms_node = _MergeSortNode(node.key)
+            node.extra = ms_node
+        else:
+            ms_node = <_MergeSortNode>node.extra
+        return ms_node
+
+    cdef _push_node(self, _KnownGraphNode node, long merge_depth):
+        cdef _KnownGraphNode parent_node
+        cdef _MergeSortNode ms_node, ms_parent_node
+        cdef Py_ssize_t pos
+
+        ms_node = self._get_ms_node(node)
+        ms_node.merge_depth = merge_depth
+        if PyTuple_GET_SIZE(node.parents) > 0:
+            parent_node = _get_parent(node.parents, 0)
+            ms_node.left_parent = parent_node
+            ms_node.left_pending_parent = parent_node
+        if PyTuple_GET_SIZE(node.parents) > 1:
+            ms_node.pending_parents = []
+            for pos from 1 <= pos < PyTuple_GET_SIZE(node.parents):
+                parent_node = _get_parent(node.parents, pos)
+                if parent_node.parents is None: # ghost
+                    continue
+                PyList_Append(ms_node.pending_parents, parent_node)
+
+        ms_node.is_first_child = 1
+        if ms_node.left_parent is not None:
+            ms_parent_node = self._get_ms_node(ms_node.left_parent)
+            if ms_parent_node.seen_by_child:
+                ms_node.is_first_child = 0
+            ms_parent_node.seen_by_child = 1
+        self._last_stack_item = self._last_stack_item + 1
+        if self._last_stack_item < PyList_GET_SIZE(self._depth_first_stack):
+            Py_INCREF(node) # SetItem steals a ref
+            PyList_SetItem(self._depth_first_stack, self._last_stack_item,
+                           node)
+        else:
+            PyList_Append(self._depth_first_stack, node)
+
+    cdef _pop_node(self):
+        cdef PyObject *temp
+        cdef _MergeSortNode ms_node, ms_parent_node, ms_prev_node
+        cdef _KnownGraphNode node, parent_node, prev_node
+
+        node = _get_list_node(self._depth_first_stack, self._last_stack_item)
+        ms_node = <_MergeSortNode>node.extra
+        self._last_stack_item = self._last_stack_item - 1
+        if ms_node.left_parent is not None:
+            # Assign the revision number from the left-hand parent
+            ms_parent_node = <_MergeSortNode>ms_node.left_parent.extra
+            if ms_node.is_first_child:
+                # First child just increments the final digit
+                ms_node._revno_first = ms_parent_node._revno_first
+                ms_node._revno_second = ms_parent_node._revno_second
+                ms_node._revno_last = ms_parent_node._revno_last + 1
+            else:
+                # Not the first child, make a new branch
+                #  (mainline_revno, branch_count, 1)
+                if ms_parent_node._revno_first == -1:
+                    # Mainline ancestor, the increment is on the last digit
+                    base_revno = ms_parent_node._revno_last
+                else:
+                    base_revno = ms_parent_node._revno_first
+                temp = PyDict_GetItem(self._revno_to_branch_count,
+                                      base_revno)
+                if temp == NULL:
+                    branch_count = 1
+                else:
+                    branch_count = (<object>temp) + 1
+                PyDict_SetItem(self._revno_to_branch_count, base_revno,
+                               branch_count)
+                ms_node._revno_first = base_revno
+                ms_node._revno_second = branch_count
+                ms_node._revno_last = 1
+        else:
+            temp = PyDict_GetItem(self._revno_to_branch_count, 0)
+            if temp == NULL:
+                # The first root node doesn't have a 3-digit revno
+                root_count = 0
+                ms_node._revno_first = -1
+                ms_node._revno_second = -1
+                ms_node._revno_last = 1
+            else:
+                root_count = (<object>temp) + 1
+                ms_node._revno_first = 0
+                ms_node._revno_second = root_count
+                ms_node._revno_last = 1
+            PyDict_SetItem(self._revno_to_branch_count, 0, root_count)
+        ms_node.completed = 1
+        if PyList_GET_SIZE(self._scheduled_nodes) == 0:
+            # The first scheduled node is always the end of merge
+            ms_node.end_of_merge = True
+        else:
+            prev_node = _get_list_node(self._scheduled_nodes,
+                                       PyList_GET_SIZE(self._scheduled_nodes) - 1)
+            ms_prev_node = <_MergeSortNode>prev_node.extra
+            if ms_prev_node.merge_depth < ms_node.merge_depth:
+                # The previously pushed node is to our left, so this is the end
+                # of this right-hand chain
+                ms_node.end_of_merge = True
+            elif (ms_prev_node.merge_depth == ms_node.merge_depth
+                  and prev_node not in node.parents):
+                # The next node is not a direct parent of this node
+                ms_node.end_of_merge = True
+            else:
+                ms_node.end_of_merge = False
+        PyList_Append(self._scheduled_nodes, node)
+
+    cdef _schedule_stack(self):
+        cdef _KnownGraphNode last_node, next_node
+        cdef _MergeSortNode ms_node, ms_last_node, ms_next_node
+        cdef long next_merge_depth
+        ordered = []
+        while self._last_stack_item >= 0:
+            # Peek at the last item on the stack
+            last_node = _get_list_node(self._depth_first_stack,
+                                       self._last_stack_item)
+            if last_node.gdfo == -1:
+                # if _find_gdfo skipped a node, that means there is a graph
+                # cycle, error out now
+                raise errors.GraphCycleError(self.graph._nodes)
+            ms_last_node = <_MergeSortNode>last_node.extra
+            if not ms_last_node.has_pending_parents():
+                # Processed all parents, pop this node
+                self._pop_node()
+                continue
+            while ms_last_node.has_pending_parents():
+                if ms_last_node.left_pending_parent is not None:
+                    # recurse depth first into the primary parent
+                    next_node = ms_last_node.left_pending_parent
+                    ms_last_node.left_pending_parent = None
+                else:
+                    # place any merges in right-to-left order for scheduling
+                    # which gives us left-to-right order after we reverse
+                    # the scheduled queue.
+                    # Note: This has the effect of allocating common-new
+                    #       revisions to the right-most subtree rather than the
+                    #       left most, which will display nicely (you get
+                    #       smaller trees at the top of the combined merge).
+                    next_node = ms_last_node.pending_parents.pop()
+                ms_next_node = self._get_ms_node(next_node)
+                if ms_next_node.completed:
+                    # this parent was completed by a child on the
+                    # call stack. skip it.
+                    continue
+                # otherwise transfer it from the source graph into the
+                # top of the current depth first search stack.
+
+                if next_node is ms_last_node.left_parent:
+                    next_merge_depth = ms_last_node.merge_depth
+                else:
+                    next_merge_depth = ms_last_node.merge_depth + 1
+                self._push_node(next_node, next_merge_depth)
+                # and do not continue processing parents until this 'call'
+                # has recursed.
+                break
+
+    cdef topo_order(self):
+        cdef _MergeSortNode ms_node
+        cdef _KnownGraphNode node
+        cdef Py_ssize_t pos
+        cdef PyObject *temp_key, *temp_node
+
+        # Note: allocating a _MergeSortNode and deallocating it for all nodes
+        #       costs approx 8.52ms (21%) of the total runtime
+        #       We might consider moving the attributes into the base
+        #       KnownGraph object.
+        self._schedule_stack()
+
+        # We've set up the basic schedule, now we can continue processing the
+        # output.
+        # Note: This final loop costs us 40.0ms => 28.8ms (11ms, 25%) on
+        #       bzr.dev, to convert the internal Object representation into a
+        #       Tuple representation...
+        #       2ms is walking the data and computing revno tuples
+        #       7ms is computing the return tuple
+        #       4ms is PyList_Append()
+        ordered = []
+        # output the result in reverse order, and separate the generated info
+        for pos from PyList_GET_SIZE(self._scheduled_nodes) > pos >= 0:
+            node = _get_list_node(self._scheduled_nodes, pos)
+            ms_node = <_MergeSortNode>node.extra
+            PyList_Append(ordered, ms_node)
+            node.extra = None
+        # Clear out the scheduled nodes now that we're done
+        self._scheduled_nodes = []
+        return ordered
 
=== modified file 'bzrlib/annotate.py'
--- bzrlib/annotate.py 2009-07-08 17:09:03 +0000
+++ bzrlib/annotate.py 2009-08-19 16:35:14 +0000
@@ -188,6 +188,10 @@
     # or something.
     last_revision = current_rev.revision_id
     # XXX: Partially Cloned from branch, uses the old_get_graph, eep.
+    # XXX: The main difficulty is that we need to inject a single new node
+    #      (current_rev) into the graph before it gets numbered, etc.
+    #      Once KnownGraph gets an 'add_node()' function, we can use
+    #      VF.get_known_graph_ancestry().
     graph = repository.get_graph()
     revision_graph = dict(((key, value) for key, value in
         graph.iter_ancestry(current_rev.parent_ids) if value is not None))
 
=== modified file 'bzrlib/branch.py'
--- bzrlib/branch.py 2009-08-17 06:22:18 +0000
+++ bzrlib/branch.py 2009-08-19 16:35:14 +0000
@@ -446,15 +446,11 @@
         # start_revision_id.
         if self._merge_sorted_revisions_cache is None:
             last_revision = self.last_revision()
-            graph = self.repository.get_graph()
-            parent_map = dict(((key, value) for key, value in
-                graph.iter_ancestry([last_revision]) if value is not None))
-            revision_graph = repository._strip_NULL_ghosts(parent_map)
-            revs = tsort.merge_sort(revision_graph, last_revision, None,
-                generate_revno=True)
-            # Drop the sequence # before caching
-            self._merge_sorted_revisions_cache = [r[1:] for r in revs]
-
+            last_key = (last_revision,)
+            known_graph = self.repository.revisions.get_known_graph_ancestry(
+                [last_key])
+            self._merge_sorted_revisions_cache = known_graph.merge_sort(
+                last_key)
         filtered = self._filter_merge_sorted_revisions(
             self._merge_sorted_revisions_cache, start_revision_id,
             stop_revision_id, stop_rule)
@@ -470,27 +466,34 @@
470 """Iterate over an inclusive range of sorted revisions."""466 """Iterate over an inclusive range of sorted revisions."""
471 rev_iter = iter(merge_sorted_revisions)467 rev_iter = iter(merge_sorted_revisions)
472 if start_revision_id is not None:468 if start_revision_id is not None:
473 for rev_id, depth, revno, end_of_merge in rev_iter:469 for node in rev_iter:
470 rev_id = node.key[-1]
474 if rev_id != start_revision_id:471 if rev_id != start_revision_id:
475 continue472 continue
476 else:473 else:
477 # The decision to include the start or not474 # The decision to include the start or not
478 # depends on the stop_rule if a stop is provided475 # depends on the stop_rule if a stop is provided
479 rev_iter = chain(476 # so pop this node back into the iterator
480 iter([(rev_id, depth, revno, end_of_merge)]),477 rev_iter = chain(iter([node]), rev_iter)
481 rev_iter)
482 break478 break
483 if stop_revision_id is None:479 if stop_revision_id is None:
484 for rev_id, depth, revno, end_of_merge in rev_iter:480 # Yield everything
485 yield rev_id, depth, revno, end_of_merge481 for node in rev_iter:
482 rev_id = node.key[-1]
483 yield (rev_id, node.merge_depth, node.revno,
484 node.end_of_merge)
486 elif stop_rule == 'exclude':485 elif stop_rule == 'exclude':
487 for rev_id, depth, revno, end_of_merge in rev_iter:486 for node in rev_iter:
487 rev_id = node.key[-1]
488 if rev_id == stop_revision_id:488 if rev_id == stop_revision_id:
489 return489 return
490 yield rev_id, depth, revno, end_of_merge490 yield (rev_id, node.merge_depth, node.revno,
491 node.end_of_merge)
491 elif stop_rule == 'include':492 elif stop_rule == 'include':
492 for rev_id, depth, revno, end_of_merge in rev_iter:493 for node in rev_iter:
493 yield rev_id, depth, revno, end_of_merge494 rev_id = node.key[-1]
495 yield (rev_id, node.merge_depth, node.revno,
496 node.end_of_merge)
494 if rev_id == stop_revision_id:497 if rev_id == stop_revision_id:
495 return498 return
496 elif stop_rule == 'with-merges':499 elif stop_rule == 'with-merges':
@@ -499,10 +502,12 @@
                 left_parent = stop_rev.parent_ids[0]
             else:
                 left_parent = _mod_revision.NULL_REVISION
-            for rev_id, depth, revno, end_of_merge in rev_iter:
+            for node in rev_iter:
+                rev_id = node.key[-1]
                 if rev_id == left_parent:
                     return
-                yield rev_id, depth, revno, end_of_merge
+                yield (rev_id, node.merge_depth, node.revno,
+                       node.end_of_merge)
         else:
             raise ValueError('invalid stop_rule %r' % stop_rule)
 
 
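The start-revision handling above pops the matched node back onto the iterator with `itertools.chain` so the stop_rule logic can decide later whether to include it. A standalone sketch of that pattern (plain strings stand in for merge_sort nodes; the helper name is illustrative, not bzrlib API):

```python
from itertools import chain

def skip_until(items, start):
    """Advance over `items` until `start`, keeping `start` itself available.

    Returns an iterator beginning at `start`, or an empty iterator if
    `start` never appears.
    """
    it = iter(items)
    for item in it:
        if item == start:
            # Push the matched item back so the caller decides whether
            # to include it, exactly like the chain() call in the patch.
            return chain(iter([item]), it)
    return iter([])

remaining = list(skip_until(['C', 'B', 'A'], 'B'))
# remaining == ['B', 'A']
```

The same trick avoids materializing the whole merge-sorted list just to find the start point.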
=== modified file 'bzrlib/graph.py'
--- bzrlib/graph.py 2009-08-04 04:36:34 +0000
+++ bzrlib/graph.py 2009-08-19 16:35:14 +0000
@@ -21,7 +21,6 @@
     errors,
     revision,
     trace,
-    tsort,
     )
 from bzrlib.symbol_versioning import deprecated_function, deprecated_in
 
@@ -926,6 +925,7 @@
         An ancestor may sort after a descendant if the relationship is not
         visible in the supplied list of revisions.
         """
+        from bzrlib import tsort
         sorter = tsort.TopoSorter(self.get_parent_map(revisions))
         return sorter.iter_topo_order()
 
 
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py 2009-08-04 04:36:34 +0000
+++ bzrlib/groupcompress.py 2009-08-19 16:35:14 +0000
@@ -62,16 +62,15 @@
     # groupcompress ordering is approximately reverse topological,
     # properly grouped by file-id.
     per_prefix_map = {}
-    for item in parent_map.iteritems():
-        key = item[0]
+    for key, value in parent_map.iteritems():
         if isinstance(key, str) or len(key) == 1:
             prefix = ''
         else:
             prefix = key[0]
         try:
-            per_prefix_map[prefix].append(item)
+            per_prefix_map[prefix][key] = value
         except KeyError:
-            per_prefix_map[prefix] = [item]
+            per_prefix_map[prefix] = {key: value}
 
     present_keys = []
     for prefix in sorted(per_prefix_map):
@@ -1099,6 +1098,13 @@
         self._check_lines_not_unicode(lines)
         self._check_lines_are_lines(lines)
 
+    def get_known_graph_ancestry(self, keys):
+        """Get a KnownGraph instance with the ancestry of keys."""
+        parent_map, missing_keys = self._index._graph_index.find_ancestry(keys,
+            0)
+        kg = _mod_graph.KnownGraph(parent_map)
+        return kg
+
     def get_parent_map(self, keys):
         """Get a map of the graph parents of keys.
 
 
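The revised groupcompress loop above collects `parent_map` entries into one dict per key prefix rather than a list of items. The same grouping, sketched in isolation (`setdefault` replaces the patch's try/except KeyError shape, behavior unchanged):

```python
def group_by_prefix(parent_map):
    """Split a key->parents dict into one dict per key prefix.

    Single-element (or plain string) keys all share the '' prefix,
    mirroring the isinstance/len check in the patch.
    """
    per_prefix_map = {}
    for key, value in parent_map.items():
        if isinstance(key, str) or len(key) == 1:
            prefix = ''
        else:
            prefix = key[0]
        # Equivalent to the try/except KeyError in the patch.
        per_prefix_map.setdefault(prefix, {})[key] = value
    return per_prefix_map

grouped = group_by_prefix({('f1', 'r1'): (),
                           ('f1', 'r2'): (('f1', 'r1'),),
                           ('f2', 'r1'): ()})
# grouped has one sub-dict for 'f1' (two entries) and one for 'f2'
```

Keeping a dict per prefix lets later passes look entries up by key instead of scanning item lists.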
=== modified file 'bzrlib/index.py'
--- bzrlib/index.py 2009-08-13 19:56:26 +0000
+++ bzrlib/index.py 2009-08-19 16:35:14 +0000
@@ -333,6 +333,22 @@
         if combine_backing_indices is not None:
             self._combine_backing_indices = combine_backing_indices
 
+    def find_ancestry(self, keys, ref_list_num):
+        """See CombinedGraphIndex.find_ancestry()"""
+        pending = set(keys)
+        parent_map = {}
+        missing_keys = set()
+        while pending:
+            next_pending = set()
+            for _, key, value, ref_lists in self.iter_entries(pending):
+                parent_keys = ref_lists[ref_list_num]
+                parent_map[key] = parent_keys
+                next_pending.update([p for p in parent_keys if p not in
+                                     parent_map])
+            missing_keys.update(pending.difference(parent_map))
+            pending = next_pending
+        return parent_map, missing_keys
+
 
 class GraphIndex(object):
     """An index for data with embedded graphs.
 
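The `find_ancestry()` added above walks parent references breadth-first, recording any key that never resolves as missing. The core loop, sketched against a plain callable standing in for the index's `iter_entries()` lookup (names here are illustrative):

```python
def find_ancestry(get_parents, keys):
    """Breadth-first ancestry walk: return (parent_map, missing_keys).

    `get_parents` maps a key to its parent tuple, raising KeyError for
    absent keys - a stand-in for the index lookup in the patch.
    """
    pending = set(keys)
    parent_map = {}
    missing_keys = set()
    while pending:
        next_pending = set()
        for key in pending:
            try:
                parent_keys = get_parents(key)
            except KeyError:
                # Unresolved keys stay in `pending` and are flagged below.
                continue
            parent_map[key] = parent_keys
            next_pending.update(p for p in parent_keys
                                if p not in parent_map)
        # Anything requested this round but never found is missing.
        missing_keys.update(pending.difference(parent_map))
        pending = next_pending
    return parent_map, missing_keys

graph = {'C': ('A', 'B'), 'B': ('A',), 'A': ()}
pm, missing = find_ancestry(graph.__getitem__, ['C', 'ghost'])
# pm covers A, B and C; 'ghost' lands in missing
```

Batching each generation into one `pending` set is what lets the real implementation issue a single `iter_entries()` call per round instead of one lookup per key.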
=== modified file 'bzrlib/knit.py'
--- bzrlib/knit.py 2009-08-04 04:36:34 +0000
+++ bzrlib/knit.py 2009-08-19 16:35:14 +0000
@@ -1190,6 +1190,12 @@
         generator = _VFContentMapGenerator(self, [key])
         return generator._get_content(key)
 
+    def get_known_graph_ancestry(self, keys):
+        """Get a KnownGraph instance with the ancestry of keys."""
+        parent_map, missing_keys = self._index.find_ancestry(keys)
+        kg = _mod_graph.KnownGraph(parent_map)
+        return kg
+
     def get_parent_map(self, keys):
         """Get a map of the graph parents of keys.
 
@@ -2560,6 +2566,33 @@
         except KeyError:
             raise RevisionNotPresent(key, self)
 
+    def find_ancestry(self, keys):
+        """See CombinedGraphIndex.find_ancestry()"""
+        prefixes = set(key[:-1] for key in keys)
+        self._load_prefixes(prefixes)
+        result = {}
+        parent_map = {}
+        missing_keys = set()
+        pending_keys = list(keys)
+        # This assumes that keys will not reference parents in a different
+        # prefix, which is accurate so far.
+        while pending_keys:
+            key = pending_keys.pop()
+            if key in parent_map:
+                continue
+            prefix = key[:-1]
+            try:
+                suffix_parents = self._kndx_cache[prefix][0][key[-1]][4]
+            except KeyError:
+                missing_keys.add(key)
+            else:
+                parent_keys = tuple([prefix + (suffix,)
+                                     for suffix in suffix_parents])
+                parent_map[key] = parent_keys
+                pending_keys.extend([p for p in parent_keys
+                                     if p not in parent_map])
+        return parent_map, missing_keys
+
     def get_parent_map(self, keys):
         """Get a map of the parents of keys.
 
@@ -3049,6 +3082,10 @@
             options.append('no-eol')
         return options
 
+    def find_ancestry(self, keys):
+        """See CombinedGraphIndex.find_ancestry()"""
+        return self._graph_index.find_ancestry(keys, 0)
+
     def get_parent_map(self, keys):
         """Get a map of the parents of keys.
 
 
=== modified file 'bzrlib/missing.py'
--- bzrlib/missing.py 2009-03-23 14:59:43 +0000
+++ bzrlib/missing.py 2009-08-19 16:35:14 +0000
@@ -138,31 +138,13 @@
     if not ancestry: #Empty ancestry, no need to do any work
         return []
 
-    mainline_revs, rev_nos, start_rev_id, end_rev_id = log._get_mainline_revs(
-        branch, None, tip_revno)
-    if not mainline_revs:
-        return []
-
-    # This asks for all mainline revisions, which is size-of-history and
-    # should be addressed (but currently the only way to get correct
-    # revnos).
-
-    # mainline_revisions always includes an extra revision at the
-    # beginning, so don't request it.
-    parent_map = dict(((key, value) for key, value
-                       in graph.iter_ancestry(mainline_revs[1:])
-                       if value is not None))
-    # filter out ghosts; merge_sort errors on ghosts.
-    # XXX: is this needed here ? -- vila080910
-    rev_graph = _mod_repository._strip_NULL_ghosts(parent_map)
-    # XXX: what if rev_graph is empty now ? -- vila080910
-    merge_sorted_revisions = tsort.merge_sort(rev_graph, tip,
-                                              mainline_revs,
-                                              generate_revno=True)
+    merge_sorted_revisions = branch.iter_merge_sorted_revisions()
     # Now that we got the correct revnos, keep only the relevant
     # revisions.
     merge_sorted_revisions = [
-        (s, revid, n, d, e) for s, revid, n, d, e in merge_sorted_revisions
+        # log.reverse_by_depth expects seq_num to be present, but it is
+        # stripped by iter_merge_sorted_revisions()
+        (0, revid, n, d, e) for revid, n, d, e in merge_sorted_revisions
         if revid in ancestry]
     if not backward:
         merge_sorted_revisions = log.reverse_by_depth(merge_sorted_revisions)
 
=== modified file 'bzrlib/reconcile.py'
--- bzrlib/reconcile.py 2009-06-10 03:56:49 +0000
+++ bzrlib/reconcile.py 2009-08-19 16:35:14 +0000
@@ -33,7 +33,7 @@
     repofmt,
     )
 from bzrlib.trace import mutter, note
-from bzrlib.tsort import TopoSorter
+from bzrlib.tsort import topo_sort
 from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
 
 
@@ -247,8 +247,7 @@
 
         # we have topological order of revisions and non ghost parents ready.
         self._setup_steps(len(self._rev_graph))
-        revision_keys = [(rev_id,) for rev_id in
-            TopoSorter(self._rev_graph.items()).iter_topo_order()]
+        revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
         stream = self._change_inv_parents(
             self.inventory.get_record_stream(revision_keys, 'unordered', True),
             self._new_inv_parents,
@@ -378,7 +377,7 @@
         new_inventories = self.repo._temp_inventories()
         # we have topological order of revisions and non ghost parents ready.
         graph = self.revisions.get_parent_map(self.revisions.keys())
-        revision_keys = list(TopoSorter(graph).iter_topo_order())
+        revision_keys = topo_sort(graph)
         revision_ids = [key[-1] for key in revision_keys]
         self._setup_steps(len(revision_keys))
         stream = self._change_inv_parents(
 
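`topo_sort(parent_map)` replaces the `TopoSorter(...).iter_topo_order()` dance in the hunks above. A minimal sketch of what a parents-first sort over such a dict does (an iterative depth-first variant for illustration only; bzrlib's version now thunks to KnownGraph and, unlike this sketch, raises GraphCycleError on cycles):

```python
def topo_sort(parent_map):
    """Return the keys of parent_map so that every parent that is itself
    present in parent_map precedes its children.

    No cycle detection - assumes an acyclic graph (a real DVCS ancestry).
    """
    result = []
    done = set()
    for start in parent_map:
        stack = [start]
        while stack:
            node = stack.pop()
            if node in done:
                continue
            # Parents still to emit; absent (ghost) parents are skipped.
            todo = [p for p in parent_map.get(node, ())
                    if p in parent_map and p not in done]
            if todo:
                # Revisit node after its parents have been emitted.
                stack.append(node)
                stack.extend(todo)
            else:
                done.add(node)
                result.append(node)
    return result

order = topo_sort({'A': (), 'B': ('A',), 'C': ('A', 'B')})
# every parent precedes its children, e.g. ['A', 'B', 'C']
```

As the proposal notes, the thunked version wins mostly by avoiding per-node dict lookups in Python, not by a different algorithm.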
=== modified file 'bzrlib/repofmt/weaverepo.py'
--- bzrlib/repofmt/weaverepo.py 2009-08-14 11:11:29 +0000
+++ bzrlib/repofmt/weaverepo.py 2009-08-19 16:35:14 +0000
@@ -28,6 +28,7 @@
 lazy_import(globals(), """
 from bzrlib import (
     xml5,
+    graph as _mod_graph,
     )
 """)
 from bzrlib import (
@@ -663,6 +664,13 @@
             result[key] = parents
         return result
 
+    def get_known_graph_ancestry(self, keys):
+        """Get a KnownGraph instance with the ancestry of keys."""
+        keys = self.keys()
+        parent_map = self.get_parent_map(keys)
+        kg = _mod_graph.KnownGraph(parent_map)
+        return kg
+
     def get_record_stream(self, keys, sort_order, include_delta_closure):
         for key in keys:
             text, parents = self._load_text_parents(key)
 
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-08-17 23:15:55 +0000
+++ bzrlib/repository.py 2009-08-19 16:35:14 +0000
@@ -4351,7 +4351,7 @@
         phase = 'file'
         revs = search.get_keys()
         graph = self.from_repository.get_graph()
-        revs = list(graph.iter_topo_order(revs))
+        revs = tsort.topo_sort(graph.get_parent_map(revs))
         data_to_fetch = self.from_repository.item_keys_introduced_by(revs)
         text_keys = []
         for knit_kind, file_id, revisions in data_to_fetch:
 
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2009-08-18 14:20:28 +0000
+++ bzrlib/tests/__init__.py 2009-08-19 16:35:15 +0000
@@ -3434,6 +3434,7 @@
         'bzrlib.tests.per_repository',
         'bzrlib.tests.per_repository_chk',
         'bzrlib.tests.per_repository_reference',
+        'bzrlib.tests.per_versionedfile',
         'bzrlib.tests.per_workingtree',
         'bzrlib.tests.test__annotator',
         'bzrlib.tests.test__chk_map',
@@ -3585,7 +3586,6 @@
         'bzrlib.tests.test_urlutils',
         'bzrlib.tests.test_version',
         'bzrlib.tests.test_version_info',
-        'bzrlib.tests.test_versionedfile',
         'bzrlib.tests.test_weave',
         'bzrlib.tests.test_whitebox',
         'bzrlib.tests.test_win32utils',
 
=== modified file 'bzrlib/tests/blackbox/test_ancestry.py'
--- bzrlib/tests/blackbox/test_ancestry.py 2009-03-23 14:59:43 +0000
+++ bzrlib/tests/blackbox/test_ancestry.py 2009-08-19 16:35:15 +0000
@@ -43,9 +43,15 @@
 
     def _check_ancestry(self, location='', result=None):
         out = self.run_bzr(['ancestry', location])[0]
-        if result is None:
+        if result is not None:
+            self.assertEqualDiff(result, out)
+        else:
+            # A2 and B1 can be in either order, because they are parallel, and
+            # thus their topological order is not defined
             result = "A1\nB1\nA2\nA3\n"
-        self.assertEqualDiff(out, result)
+            if result != out:
+                result = "A1\nA2\nB1\nA3\n"
+            self.assertEqualDiff(result, out)
 
     def test_ancestry(self):
         """Tests 'ancestry' command"""
 
=== renamed file 'bzrlib/tests/test_versionedfile.py' => 'bzrlib/tests/per_versionedfile.py'
--- bzrlib/tests/test_versionedfile.py 2009-08-04 04:36:34 +0000
+++ bzrlib/tests/per_versionedfile.py 2009-08-19 16:35:15 +0000
@@ -26,6 +26,7 @@
 
 from bzrlib import (
     errors,
+    graph as _mod_graph,
     groupcompress,
     knit as _mod_knit,
     osutils,
@@ -1737,6 +1738,25 @@
                 f.get_record_stream([key_b], 'unordered', True
                     ).next().get_bytes_as('fulltext'))
 
+    def test_get_known_graph_ancestry(self):
+        f = self.get_versionedfiles()
+        if not self.graph:
+            raise TestNotApplicable('ancestry info only relevant with graph.')
+        key_a = self.get_simple_key('a')
+        key_b = self.get_simple_key('b')
+        key_c = self.get_simple_key('c')
+        # A
+        # |\
+        # | B
+        # |/
+        # C
+        f.add_lines(key_a, [], ['\n'])
+        f.add_lines(key_b, [key_a], ['\n'])
+        f.add_lines(key_c, [key_a, key_b], ['\n'])
+        kg = f.get_known_graph_ancestry([key_c])
+        self.assertIsInstance(kg, _mod_graph.KnownGraph)
+        self.assertEqual([key_a, key_b, key_c], list(kg.topo_sort()))
+
     def test_get_record_stream_empty(self):
         """An empty stream can be requested without error."""
         f = self.get_versionedfiles()
 
=== modified file 'bzrlib/tests/test__known_graph.py'
--- bzrlib/tests/test__known_graph.py 2009-07-08 20:58:10 +0000
+++ bzrlib/tests/test__known_graph.py 2009-08-19 16:35:15 +0000
@@ -16,6 +16,8 @@
 
 """Tests for the python and pyrex extensions of KnownGraph"""
 
+import pprint
+
 from bzrlib import (
     errors,
     graph as _mod_graph,
@@ -30,13 +32,15 @@
30 """Parameterize tests for all versions of groupcompress."""32 """Parameterize tests for all versions of groupcompress."""
31 scenarios = [33 scenarios = [
32 ('python', {'module': _known_graph_py, 'do_cache': True}),34 ('python', {'module': _known_graph_py, 'do_cache': True}),
35 ]
36 caching_scenarios = [
33 ('python-nocache', {'module': _known_graph_py, 'do_cache': False}),37 ('python-nocache', {'module': _known_graph_py, 'do_cache': False}),
34 ]38 ]
35 suite = loader.suiteClass()39 suite = loader.suiteClass()
36 if CompiledKnownGraphFeature.available():40 if CompiledKnownGraphFeature.available():
37 from bzrlib import _known_graph_pyx41 from bzrlib import _known_graph_pyx
38 scenarios.append(('C', {'module': _known_graph_pyx, 'do_cache': True}))42 scenarios.append(('C', {'module': _known_graph_pyx, 'do_cache': True}))
39 scenarios.append(('C-nocache',43 caching_scenarios.append(('C-nocache',
40 {'module': _known_graph_pyx, 'do_cache': False}))44 {'module': _known_graph_pyx, 'do_cache': False}))
41 else:45 else:
42 # the compiled module isn't available, so we add a failing test46 # the compiled module isn't available, so we add a failing test
@@ -44,8 +48,14 @@
             def test_fail(self):
                 self.requireFeature(CompiledKnownGraphFeature)
         suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature))
-    result = tests.multiply_tests(standard_tests, scenarios, suite)
-    return result
+    # TestKnownGraphHeads needs to be permutated with and without caching.
+    # All other TestKnownGraph tests only need to be tested across module
+    heads_suite, other_suite = tests.split_suite_by_condition(
+        standard_tests, tests.condition_isinstance(TestKnownGraphHeads))
+    suite = tests.multiply_tests(other_suite, scenarios, suite)
+    suite = tests.multiply_tests(heads_suite, scenarios + caching_scenarios,
+                                 suite)
+    return suite
 
 
 class _CompiledKnownGraphFeature(tests.Feature):
@@ -73,14 +83,16 @@
 alt_merge = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a', 'c']}
 
 
-class TestKnownGraph(tests.TestCase):
+class TestCaseWithKnownGraph(tests.TestCase):
 
     module = None # Set by load_tests
-    do_cache = None # Set by load_tests
 
     def make_known_graph(self, ancestry):
         return self.module.KnownGraph(ancestry, do_cache=self.do_cache)
 
+
+class TestKnownGraph(TestCaseWithKnownGraph):
+
     def assertGDFO(self, graph, rev, gdfo):
         node = graph._nodes[rev]
         self.assertEqual(gdfo, node.gdfo)
@@ -127,6 +139,11 @@
         self.assertGDFO(graph, 'a', 5)
         self.assertGDFO(graph, 'c', 5)
 
+
+class TestKnownGraphHeads(TestCaseWithKnownGraph):
+
+    do_cache = None # Set by load_tests
+
     def test_heads_null(self):
         graph = self.make_known_graph(test_graph.ancestry_1)
         self.assertEqual(set(['null:']), graph.heads(['null:']))
@@ -227,3 +244,513 @@
227 self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g']))244 self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g']))
228 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g']))245 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g']))
229 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f']))246 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f']))
247
248
249class TestKnownGraphTopoSort(TestCaseWithKnownGraph):
250
251 def assertTopoSortOrder(self, ancestry):
252 """Check topo_sort and iter_topo_order is genuinely topological order.
253
254 For every child in the graph, check if it comes after all of it's
255 parents.
256 """
257 graph = self.make_known_graph(ancestry)
258 sort_result = graph.topo_sort()
259 # We should have an entry in sort_result for every entry present in the
260 # graph.
261 self.assertEqual(len(ancestry), len(sort_result))
262 node_idx = dict((node, idx) for idx, node in enumerate(sort_result))
263 for node in sort_result:
264 parents = ancestry[node]
265 for parent in parents:
266 if parent not in ancestry:
267 # ghost
268 continue
269 if node_idx[node] <= node_idx[parent]:
270 self.fail("parent %s must come before child %s:\n%s"
271 % (parent, node, sort_result))
272
273 def test_topo_sort_empty(self):
274 """TopoSort empty list"""
275 self.assertTopoSortOrder({})
276
277 def test_topo_sort_easy(self):
278 """TopoSort list with one node"""
279 self.assertTopoSortOrder({0: []})
280
281 def test_topo_sort_cycle(self):
282 """TopoSort traps graph with cycles"""
283 g = self.make_known_graph({0: [1],
284 1: [0]})
285 self.assertRaises(errors.GraphCycleError, g.topo_sort)
286
287 def test_topo_sort_cycle_2(self):
288 """TopoSort traps graph with longer cycle"""
289 g = self.make_known_graph({0: [1],
290 1: [2],
291 2: [0]})
292 self.assertRaises(errors.GraphCycleError, g.topo_sort)
293
294 def test_topo_sort_cycle_with_tail(self):
295 """TopoSort traps graph with longer cycle"""
296 g = self.make_known_graph({0: [1],
297 1: [2],
298 2: [3, 4],
299 3: [0],
300 4: []})
301 self.assertRaises(errors.GraphCycleError, g.topo_sort)
302
303 def test_topo_sort_1(self):
304 """TopoSort simple nontrivial graph"""
305 self.assertTopoSortOrder({0: [3],
306 1: [4],
307 2: [1, 4],
308 3: [],
309 4: [0, 3]})
310
311 def test_topo_sort_partial(self):
312 """Topological sort with partial ordering.
313
314 Multiple correct orderings are possible, so test for
315 correctness, not for exact match on the resulting list.
316 """
317 self.assertTopoSortOrder({0: [],
318 1: [0],
319 2: [0],
320 3: [0],
321 4: [1, 2, 3],
322 5: [1, 2],
323 6: [1, 2],
324 7: [2, 3],
325 8: [0, 1, 4, 5, 6]})
326
327 def test_topo_sort_ghost_parent(self):
328 """Sort nodes, but don't include some parents in the output"""
329 self.assertTopoSortOrder({0: [1],
330 1: [2]})
331
332
333class TestKnownGraphMergeSort(TestCaseWithKnownGraph):
334
335 def assertSortAndIterate(self, ancestry, branch_tip, result_list):
336 """Check that merge based sorting and iter_topo_order on graph works."""
337 graph = self.make_known_graph(ancestry)
338 value = graph.merge_sort(branch_tip)
339 value = [(n.key, n.merge_depth, n.revno, n.end_of_merge)
340 for n in value]
341 if result_list != value:
342 self.assertEqualDiff(pprint.pformat(result_list),
343 pprint.pformat(value))
344
345 def test_merge_sort_empty(self):
346 # sorting of an emptygraph does not error
347 self.assertSortAndIterate({}, None, [])
348 self.assertSortAndIterate({}, NULL_REVISION, [])
349 self.assertSortAndIterate({}, (NULL_REVISION,), [])
350
351 def test_merge_sort_not_empty_no_tip(self):
352 # merge sorting of a branch starting with None should result
353 # in an empty list: no revisions are dragged in.
354 self.assertSortAndIterate({0: []}, None, [])
355 self.assertSortAndIterate({0: []}, NULL_REVISION, [])
356 self.assertSortAndIterate({0: []}, (NULL_REVISION,), [])
357
358 def test_merge_sort_one_revision(self):
359 # sorting with one revision as the tip returns the correct fields:
360 # sequence - 0, revision id, merge depth - 0, end_of_merge
361 self.assertSortAndIterate({'id': []},
362 'id',
363 [('id', 0, (1,), True)])
364
365 def test_sequence_numbers_increase_no_merges(self):
366 # emit a few revisions with no merges to check the sequence
367 # numbering works in trivial cases
368 self.assertSortAndIterate(
369 {'A': [],
370 'B': ['A'],
371 'C': ['B']},
372 'C',
373 [('C', 0, (3,), False),
374 ('B', 0, (2,), False),
375 ('A', 0, (1,), True),
376 ],
377 )
378
379 def test_sequence_numbers_increase_with_merges(self):
380 # test that sequence numbers increase across merges
381 self.assertSortAndIterate(
382 {'A': [],
383 'B': ['A'],
384 'C': ['A', 'B']},
385 'C',
386 [('C', 0, (2,), False),
387 ('B', 1, (1,1,1), True),
388 ('A', 0, (1,), True),
389 ],
390 )
391
392 def test_merge_sort_race(self):
393 # A
394 # |
395 # B-.
396 # |\ \
397 # | | C
398 # | |/
399 # | D
400 # |/
401 # F
402 graph = {'A': [],
403 'B': ['A'],
404 'C': ['B'],
405 'D': ['B', 'C'],
406 'F': ['B', 'D'],
407 }
408 self.assertSortAndIterate(graph, 'F',
409 [('F', 0, (3,), False),
410 ('D', 1, (2,2,1), False),
411 ('C', 2, (2,1,1), True),
412 ('B', 0, (2,), False),
413 ('A', 0, (1,), True),
414 ])
415 # A
416 # |
417 # B-.
418 # |\ \
419 # | X C
420 # | |/
421 # | D
422 # |/
423 # F
424 graph = {'A': [],
425 'B': ['A'],
426 'C': ['B'],
427 'X': ['B'],
428 'D': ['X', 'C'],
429 'F': ['B', 'D'],
430 }
431 self.assertSortAndIterate(graph, 'F',
432 [('F', 0, (3,), False),
433 ('D', 1, (2,1,2), False),
434 ('C', 2, (2,2,1), True),
435 ('X', 1, (2,1,1), True),
436 ('B', 0, (2,), False),
437 ('A', 0, (1,), True),
438 ])
439
440 def test_merge_depth_with_nested_merges(self):
441 # the merge depth marker should reflect the depth of the revision
442 # in terms of merges out from the mainline
443 # revid, depth, parents:
444 # A 0 [D, B]
445 # B 1 [C, F]
446 # C 1 [H]
447 # D 0 [H, E]
448 # E 1 [G, F]
449 # F 2 [G]
450 # G 1 [H]
451 # H 0
452 self.assertSortAndIterate(
453 {'A': ['D', 'B'],
454 'B': ['C', 'F'],
455 'C': ['H'],
456 'D': ['H', 'E'],
457 'E': ['G', 'F'],
458 'F': ['G'],
459 'G': ['H'],
460 'H': []
461 },
462 'A',
463 [('A', 0, (3,), False),
464 ('B', 1, (1,3,2), False),
465 ('C', 1, (1,3,1), True),
466 ('D', 0, (2,), False),
467 ('E', 1, (1,1,2), False),
468 ('F', 2, (1,2,1), True),
469 ('G', 1, (1,1,1), True),
470 ('H', 0, (1,), True),
471 ],
472 )
473
474 def test_dotted_revnos_with_simple_merges(self):
475 # A 1
476 # |\
477 # B C 2, 1.1.1
478 # | |\
479 # D E F 3, 1.1.2, 1.2.1
480 # |/ /|
481 # G H I 4, 1.2.2, 1.3.1
482 # |/ /
483 # J K 5, 1.3.2
484 # |/
485 # L 6
486 self.assertSortAndIterate(
487 {'A': [],
488 'B': ['A'],
489 'C': ['A'],
490 'D': ['B'],
491 'E': ['C'],
492 'F': ['C'],
493 'G': ['D', 'E'],
494 'H': ['F'],
495 'I': ['F'],
496 'J': ['G', 'H'],
497 'K': ['I'],
498 'L': ['J', 'K'],
499 },
500 'L',
501 [('L', 0, (6,), False),
502 ('K', 1, (1,3,2), False),
503 ('I', 1, (1,3,1), True),
504 ('J', 0, (5,), False),
505 ('H', 1, (1,2,2), False),
506 ('F', 1, (1,2,1), True),
507 ('G', 0, (4,), False),
508 ('E', 1, (1,1,2), False),
509 ('C', 1, (1,1,1), True),
510 ('D', 0, (3,), False),
511 ('B', 0, (2,), False),
512 ('A', 0, (1,), True),
513 ],
514 )
515 # Adding a shortcut from the first revision should not change any of
516 # the existing numbers
517 self.assertSortAndIterate(
518 {'A': [],
519 'B': ['A'],
520 'C': ['A'],
521 'D': ['B'],
522 'E': ['C'],
523 'F': ['C'],
524 'G': ['D', 'E'],
525 'H': ['F'],
526 'I': ['F'],
527 'J': ['G', 'H'],
528 'K': ['I'],
529 'L': ['J', 'K'],
530 'M': ['A'],
531 'N': ['L', 'M'],
532 },
533 'N',
534 [('N', 0, (7,), False),
535 ('M', 1, (1,4,1), True),
536 ('L', 0, (6,), False),
+             ('K', 1, (1,3,2), False),
+             ('I', 1, (1,3,1), True),
+             ('J', 0, (5,), False),
+             ('H', 1, (1,2,2), False),
+             ('F', 1, (1,2,1), True),
+             ('G', 0, (4,), False),
+             ('E', 1, (1,1,2), False),
+             ('C', 1, (1,1,1), True),
+             ('D', 0, (3,), False),
+             ('B', 0, (2,), False),
+             ('A', 0, (1,), True),
+             ],
+            )
+
+    def test_end_of_merge_not_last_revision_in_branch(self):
+        # within a branch only the last revision gets an
+        # end of merge marker.
+        self.assertSortAndIterate(
+            {'A': ['B'],
+             'B': [],
+            },
+            'A',
+            [('A', 0, (2,), False),
+             ('B', 0, (1,), True)
+            ],
+            )
+
+    def test_end_of_merge_multiple_revisions_merged_at_once(self):
+        # when multiple branches are merged at once, both of their
+        # branch-endpoints should be listed as end-of-merge.
+        # Also, the order of the multiple merges should be
+        # left-right shown top to bottom.
+        # * means end of merge
+        # A 0 [H, B, E]
+        # B 1 [D, C]
+        # C 2 [D] *
+        # D 1 [H] *
+        # E 1 [G, F]
+        # F 2 [G] *
+        # G 1 [H] *
+        # H 0 [] *
+        self.assertSortAndIterate(
+            {'A': ['H', 'B', 'E'],
+             'B': ['D', 'C'],
+             'C': ['D'],
+             'D': ['H'],
+             'E': ['G', 'F'],
+             'F': ['G'],
+             'G': ['H'],
+             'H': [],
+            },
+            'A',
+            [('A', 0, (2,), False),
+             ('B', 1, (1,3,2), False),
+             ('C', 2, (1,4,1), True),
+             ('D', 1, (1,3,1), True),
+             ('E', 1, (1,1,2), False),
+             ('F', 2, (1,2,1), True),
+             ('G', 1, (1,1,1), True),
+             ('H', 0, (1,), True),
+            ],
+            )
+
+    def test_parallel_root_sequence_numbers_increase_with_merges(self):
+        """When there are parallel roots, check their revnos."""
+        self.assertSortAndIterate(
+            {'A': [],
+             'B': [],
+             'C': ['A', 'B']},
+            'C',
+            [('C', 0, (2,), False),
+             ('B', 1, (0,1,1), True),
+             ('A', 0, (1,), True),
+            ],
+            )
+
+    def test_revnos_are_globally_assigned(self):
+        """revnos are assigned according to the revision they derive from."""
+        # in this test we setup a number of branches that all derive from
+        # the first revision, and then merge them one at a time, which
+        # should give the revisions as they merge numbers still deriving from
+        # the revision were based on.
+        # merge 3: J: ['G', 'I']
+        # branch 3:
+        # I: ['H']
+        # H: ['A']
+        # merge 2: G: ['D', 'F']
+        # branch 2:
+        # F: ['E']
+        # E: ['A']
+        # merge 1: D: ['A', 'C']
+        # branch 1:
+        # C: ['B']
+        # B: ['A']
+        # root: A: []
+        self.assertSortAndIterate(
+            {'J': ['G', 'I'],
+             'I': ['H',],
+             'H': ['A'],
+             'G': ['D', 'F'],
+             'F': ['E'],
+             'E': ['A'],
+             'D': ['A', 'C'],
+             'C': ['B'],
+             'B': ['A'],
+             'A': [],
+            },
+            'J',
+            [('J', 0, (4,), False),
+             ('I', 1, (1,3,2), False),
+             ('H', 1, (1,3,1), True),
+             ('G', 0, (3,), False),
+             ('F', 1, (1,2,2), False),
+             ('E', 1, (1,2,1), True),
+             ('D', 0, (2,), False),
+             ('C', 1, (1,1,2), False),
+             ('B', 1, (1,1,1), True),
+             ('A', 0, (1,), True),
+            ],
+            )
+
+    def test_roots_and_sub_branches_versus_ghosts(self):
+        """Extra roots and their mini branches use the same numbering.
+
+        All of them use the 0-node numbering.
+        """
+        # A D K
+        # | |\ |\
+        # B E F L M
+        # | |/ |/
+        # C G N
+        # |/ |\
+        # H I O P
+        # |/ |/
+        # J Q
+        # |.---'
+        # R
+        self.assertSortAndIterate(
+            {'A': [],
+             'B': ['A'],
+             'C': ['B'],
+             'D': [],
+             'E': ['D'],
+             'F': ['D'],
+             'G': ['E', 'F'],
+             'H': ['C', 'G'],
+             'I': [],
+             'J': ['H', 'I'],
+             'K': [],
+             'L': ['K'],
+             'M': ['K'],
+             'N': ['L', 'M'],
+             'O': ['N'],
+             'P': ['N'],
+             'Q': ['O', 'P'],
+             'R': ['J', 'Q'],
+            },
+            'R',
+            [('R', 0, (6,), False),
+             ('Q', 1, (0,4,5), False),
+             ('P', 2, (0,6,1), True),
+             ('O', 1, (0,4,4), False),
+             ('N', 1, (0,4,3), False),
+             ('M', 2, (0,5,1), True),
+             ('L', 1, (0,4,2), False),
+             ('K', 1, (0,4,1), True),
+             ('J', 0, (5,), False),
+             ('I', 1, (0,3,1), True),
+             ('H', 0, (4,), False),
+             ('G', 1, (0,1,3), False),
+             ('F', 2, (0,2,1), True),
+             ('E', 1, (0,1,2), False),
+             ('D', 1, (0,1,1), True),
+             ('C', 0, (3,), False),
+             ('B', 0, (2,), False),
+             ('A', 0, (1,), True),
+            ],
+            )
+
+    def test_ghost(self):
+        # merge_sort should be able to ignore ghosts
+        # A
+        # |
+        # B ghost
+        # |/
+        # C
+        self.assertSortAndIterate(
+            {'A': [],
+             'B': ['A'],
+             'C': ['B', 'ghost'],
+            },
+            'C',
+            [('C', 0, (3,), False),
+             ('B', 0, (2,), False),
+             ('A', 0, (1,), True),
+            ])
+
+    def test_graph_cycle(self):
+        # merge_sort should fail with a simple error when a graph cycle is
+        # encountered.
+        #
+        # A
+        # |,-.
+        # B |
+        # | |
+        # C ^
+        # | |
+        # D |
+        # |'-'
+        # E
+        self.assertRaises(errors.GraphCycleError,
+            self.assertSortAndIterate,
+            {'A': [],
+             'B': ['D'],
+             'C': ['B'],
+             'D': ['C'],
+             'E': ['D'],
+            },
+            'E',
+            [])
 
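An aside on the test_ghost case above: it documents that merge_sort silently ignores ghost parents (parents named in the graph but not present as nodes). Outside of merge_sort, the same effect can be had for a plain topological sort by pre-filtering the parent map. This is an illustrative sketch, not part of the diff; filter_ghosts is a hypothetical helper name:

```python
def filter_ghosts(parent_map):
    """Drop parent references that are not themselves nodes in the map.

    parent_map: dict of node -> list of parents. Returns a new dict in
    which every listed parent is guaranteed to be a key of the map.
    """
    return dict((node, [p for p in parents if p in parent_map])
                for node, parents in parent_map.items())
```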
=== modified file 'bzrlib/tests/test_tsort.py'
--- bzrlib/tests/test_tsort.py 2009-08-17 15:26:18 +0000
+++ bzrlib/tests/test_tsort.py 2009-08-19 16:35:15 +0000
@@ -17,6 +17,7 @@
 
 """Tests for topological sort."""
 
+import pprint
 
 from bzrlib.tests import TestCase
 from bzrlib.tsort import topo_sort, TopoSorter, MergeSorter, merge_sort
@@ -39,6 +40,23 @@
                           list,
                           TopoSorter(graph).iter_topo_order())
 
+    def assertSortAndIterateOrder(self, graph):
+        """Check topo_sort and iter_topo_order is genuinely topological order.
+
+        For every child in the graph, check if it comes after all of it's
+        parents.
+        """
+        sort_result = topo_sort(graph)
+        iter_result = list(TopoSorter(graph).iter_topo_order())
+        for (node, parents) in graph:
+            for parent in parents:
+                if sort_result.index(node) < sort_result.index(parent):
+                    self.fail("parent %s must come before child %s:\n%s"
+                              % (parent, node, sort_result))
+                if iter_result.index(node) < iter_result.index(parent):
+                    self.fail("parent %s must come before child %s:\n%s"
+                              % (parent, node, iter_result))
+
     def test_tsort_empty(self):
         """TopoSort empty list"""
         self.assertSortAndIterate([], [])
@@ -60,6 +78,15 @@
                                         1: [2],
                                         2: [0]}.items())
 
+    def test_topo_sort_cycle_with_tail(self):
+        """TopoSort traps graph with longer cycle"""
+        self.assertSortAndIterateRaise(GraphCycleError,
+                                       {0: [1],
+                                        1: [2],
+                                        2: [3, 4],
+                                        3: [0],
+                                        4: []}.items())
+
     def test_tsort_1(self):
         """TopoSort simple nontrivial graph"""
         self.assertSortAndIterate({0: [3],
@@ -72,10 +99,10 @@
     def test_tsort_partial(self):
         """Topological sort with partial ordering.
 
-        If the graph does not give an order between two nodes, they are
-        returned in lexicographical order.
+        Multiple correct orderings are possible, so test for
+        correctness, not for exact match on the resulting list.
         """
-        self.assertSortAndIterate(([(0, []),
+        self.assertSortAndIterateOrder([(0, []),
                                     (1, [0]),
                                     (2, [0]),
                                     (3, [0]),
@@ -83,8 +110,7 @@
                                     (5, [1, 2]),
                                     (6, [1, 2]),
                                     (7, [2, 3]),
-                                    (8, [0, 1, 4, 5, 6])]),
-                                   [0, 1, 2, 3, 4, 5, 6, 7, 8])
+                                    (8, [0, 1, 4, 5, 6])])
 
     def test_tsort_unincluded_parent(self):
         """Sort nodes, but don't include some parents in the output"""
@@ -102,12 +128,8 @@
             mainline_revisions=mainline_revisions,
             generate_revno=generate_revno)
         if result_list != value:
-            import pprint
             self.assertEqualDiff(pprint.pformat(result_list),
                                  pprint.pformat(value))
-        self.assertEquals(result_list,
-            merge_sort(graph, branch_tip, mainline_revisions=mainline_revisions,
-                generate_revno=generate_revno))
         self.assertEqual(result_list,
                          list(MergeSorter(
                               graph,
 
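The new assertSortAndIterateOrder helper above replaces exact-list assertions with a validity check, since a partially ordered graph admits several correct topological orders. The same check as a standalone sketch (is_topological_order is a hypothetical name; it uses a position map rather than the repeated list.index calls of the test helper):

```python
def is_topological_order(graph, order):
    """Return True if every node in order appears after all of its parents.

    graph: iterable of (node, parents) pairs.
    order: a candidate sorted list containing every node exactly once.
    """
    # Map each node to its position so the check is a dict lookup,
    # not an O(n) list scan per comparison.
    position = dict((node, idx) for idx, node in enumerate(order))
    for node, parents in graph:
        for parent in parents:
            # A parent sorted after its child breaks the ordering.
            if position[node] < position[parent]:
                return False
    return True
```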
=== modified file 'bzrlib/tsort.py'
--- bzrlib/tsort.py 2009-08-17 15:26:18 +0000
+++ bzrlib/tsort.py 2009-08-19 16:35:14 +0000
@@ -18,8 +18,11 @@
 """Topological sorting routines."""
 
 
-from bzrlib import errors
-import bzrlib.revision as _mod_revision
+from bzrlib import (
+    errors,
+    graph as _mod_graph,
+    revision as _mod_revision,
+    )
 
 
 __all__ = ["topo_sort", "TopoSorter", "merge_sort", "MergeSorter"]
@@ -30,12 +33,21 @@
 
     graph -- sequence of pairs of node->parents_list.
 
-    The result is a list of node names, such that all parents come before
-    their children.
+    The result is a list of node names, such that all parents come before their
+    children.
 
     node identifiers can be any hashable object, and are typically strings.
+
+    This function has the same purpose as the TopoSorter class, but uses a
+    different algorithm to sort the graph. That means that while both return a
+    list with parents before their child nodes, the exact ordering can be
+    different.
+
+    topo_sort is faster when the whole list is needed, while when iterating
+    over a part of the list, TopoSorter.iter_topo_order should be used.
     """
-    return TopoSorter(graph).sorted()
+    kg = _mod_graph.KnownGraph(dict(graph))
+    return kg.topo_sort()
 
 
 class TopoSorter(object):
@@ -60,22 +72,8 @@
         iteration or sorting may raise GraphCycleError if a cycle is present
         in the graph.
         """
-        # a dict of the graph.
+        # store a dict of the graph.
         self._graph = dict(graph)
-        self._visitable = set(self._graph)
-        ### if debugging:
-        # self._original_graph = dict(graph)
-
-        # this is a stack storing the depth first search into the graph.
-        self._node_name_stack = []
-        # at each level of 'recursion' we have to check each parent. This
-        # stack stores the parents we have not yet checked for the node at the
-        # matching depth in _node_name_stack
-        self._pending_parents_stack = []
-        # this is a set of the completed nodes for fast checking whether a
-        # parent in a node we are processing on the stack has already been
-        # emitted and thus can be skipped.
-        self._completed_node_names = set()
 
     def sorted(self):
         """Sort the graph and return as a list.
@@ -100,67 +98,64 @@
         After finishing iteration the sorter is empty and you cannot continue
         iteration.
         """
-        while self._graph:
+        graph = self._graph
+        visitable = set(graph)
+
+        # this is a stack storing the depth first search into the graph.
+        pending_node_stack = []
+        # at each level of 'recursion' we have to check each parent. This
+        # stack stores the parents we have not yet checked for the node at the
+        # matching depth in pending_node_stack
+        pending_parents_stack = []
+
+        # this is a set of the completed nodes for fast checking whether a
+        # parent in a node we are processing on the stack has already been
+        # emitted and thus can be skipped.
+        completed_node_names = set()
+
+        while graph:
             # now pick a random node in the source graph, and transfer it to the
-            # top of the depth first search stack.
-            node_name, parents = self._graph.popitem()
-            self._push_node(node_name, parents)
-            while self._node_name_stack:
-                # loop until this call completes.
-                parents_to_visit = self._pending_parents_stack[-1]
-                # if all parents are done, the revision is done
+            # top of the depth first search stack of pending nodes.
+            node_name, parents = graph.popitem()
+            pending_node_stack.append(node_name)
+            pending_parents_stack.append(list(parents))
+
+            # loop until pending_node_stack is empty
+            while pending_node_stack:
+                parents_to_visit = pending_parents_stack[-1]
+                # if there are no parents left, the revision is done
                 if not parents_to_visit:
                     # append the revision to the topo sorted list
-                    # all the nodes parents have been added to the output, now
-                    # we can add it to the output.
-                    yield self._pop_node()
+                    # all the nodes parents have been added to the output,
+                    # now we can add it to the output.
+                    popped_node = pending_node_stack.pop()
+                    pending_parents_stack.pop()
+                    completed_node_names.add(popped_node)
+                    yield popped_node
                 else:
-                    while self._pending_parents_stack[-1]:
-                        # recurse depth first into a single parent
-                        next_node_name = self._pending_parents_stack[-1].pop()
-                        if next_node_name in self._completed_node_names:
-                            # this parent was completed by a child on the
-                            # call stack. skip it.
-                            continue
-                        if next_node_name not in self._visitable:
-                            continue
-                        # otherwise transfer it from the source graph into the
-                        # top of the current depth first search stack.
-                        try:
-                            parents = self._graph.pop(next_node_name)
-                        except KeyError:
-                            # if the next node is not in the source graph it has
-                            # already been popped from it and placed into the
-                            # current search stack (but not completed or we would
-                            # have hit the continue 4 lines up.
-                            # this indicates a cycle.
-                            raise errors.GraphCycleError(self._node_name_stack)
-                        self._push_node(next_node_name, parents)
-                        # and do not continue processing parents until this 'call'
-                        # has recursed.
-                        break
-
-    def _push_node(self, node_name, parents):
-        """Add node_name to the pending node stack.
-
-        Names in this stack will get emitted into the output as they are popped
-        off the stack.
-        """
-        self._node_name_stack.append(node_name)
-        self._pending_parents_stack.append(list(parents))
-
-    def _pop_node(self):
-        """Pop the top node off the stack
-
-        The node is appended to the sorted output.
-        """
-        # we are returning from the flattened call frame:
-        # pop off the local variables
-        node_name = self._node_name_stack.pop()
-        self._pending_parents_stack.pop()
-
-        self._completed_node_names.add(node_name)
-        return node_name
+                    # recurse depth first into a single parent
+                    next_node_name = parents_to_visit.pop()
+
+                    if next_node_name in completed_node_names:
+                        # parent was already completed by a child, skip it.
+                        continue
+                    if next_node_name not in visitable:
+                        # parent is not a node in the original graph, skip it.
+                        continue
+
+                    # transfer it along with its parents from the source graph
+                    # into the top of the current depth first search stack.
+                    try:
+                        parents = graph.pop(next_node_name)
+                    except KeyError:
+                        # if the next node is not in the source graph it has
+                        # already been popped from it and placed into the
+                        # current search stack (but not completed or we would
+                        # have hit the continue 6 lines up). this indicates a
+                        # cycle.
+                        raise errors.GraphCycleError(pending_node_stack)
+                    pending_node_stack.append(next_node_name)
+                    pending_parents_stack.append(list(parents))
 
 
 def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False):
@@ -414,7 +409,8 @@
 
         # seed the search with the tip of the branch
         if (branch_tip is not None and
-            branch_tip != _mod_revision.NULL_REVISION):
+            branch_tip != _mod_revision.NULL_REVISION and
+            branch_tip != (_mod_revision.NULL_REVISION,)):
             parents = self._graph.pop(branch_tip)
             self._push_node(branch_tip, 0, parents)
 
@@ -571,7 +567,11 @@
                     # current search stack (but not completed or we would
                     # have hit the continue 4 lines up.
                     # this indicates a cycle.
-                    raise errors.GraphCycleError(node_name_stack)
+                    if next_node_name in self._original_graph:
+                        raise errors.GraphCycleError(node_name_stack)
+                    else:
+                        # This is just a ghost parent, ignore it
+                        continue
                 next_merge_depth = 0
                 if is_left_subtree:
                     # a new child branch from name_stack[-1]
@@ -673,11 +673,12 @@
             else:
                 # no parents, use the root sequence
-                root_count = self._revno_to_branch_count.get(0, 0)
+                root_count = self._revno_to_branch_count.get(0, -1)
+                root_count += 1
                 if root_count:
                     revno = (0, root_count, 1)
                 else:
                     revno = (1,)
-                root_count += 1
                 self._revno_to_branch_count[0] = root_count
 
         # store the revno for this node for future reference
 
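The rewritten iter_topo_order above inlines the old _push_node/_pop_node helpers into local variables, trading method calls for plain list operations. A self-contained sketch of the resulting iterative depth-first algorithm (it raises ValueError where the real code raises errors.GraphCycleError, and otherwise follows the diff's logic):

```python
def iter_topo_order(graph):
    """Yield nodes of a parent graph so parents come before children.

    graph: mapping of node -> list of parents. Parents that are not
    nodes themselves (ghosts) are skipped; a cycle raises ValueError.
    """
    graph = dict(graph)
    visitable = set(graph)
    pending_node_stack = []       # depth first search stack of nodes
    pending_parents_stack = []    # unvisited parents per stacked node
    completed = set()             # nodes already emitted
    while graph:
        # pick an arbitrary node and push it onto the search stack
        node_name, parents = graph.popitem()
        pending_node_stack.append(node_name)
        pending_parents_stack.append(list(parents))
        while pending_node_stack:
            parents_to_visit = pending_parents_stack[-1]
            if not parents_to_visit:
                # all parents emitted; the node itself is done
                popped = pending_node_stack.pop()
                pending_parents_stack.pop()
                completed.add(popped)
                yield popped
            else:
                next_node = parents_to_visit.pop()
                if next_node in completed or next_node not in visitable:
                    continue
                try:
                    next_parents = graph.pop(next_node)
                except KeyError:
                    # already on the stack but not completed: a cycle
                    raise ValueError("cycle involving %r" % pending_node_stack)
                pending_node_stack.append(next_node)
                pending_parents_stack.append(list(next_parents))
```

Keeping the two stacks as plain locals rather than instance attributes is what lets the inner loop run without attribute lookups on every iteration.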
=== modified file 'bzrlib/versionedfile.py'
--- bzrlib/versionedfile.py 2009-08-07 05:56:29 +0000
+++ bzrlib/versionedfile.py 2009-08-19 16:35:14 +0000
@@ -32,6 +32,7 @@
 from bzrlib import (
     annotate,
     errors,
+    graph as _mod_graph,
     groupcompress,
     index,
     knit,
@@ -941,6 +942,20 @@
             if '\n' in line[:-1]:
                 raise errors.BzrBadParameterContainsNewline("lines")
 
+    def get_known_graph_ancestry(self, keys):
+        """Get a KnownGraph instance with the ancestry of keys."""
+        # most basic implementation is a loop around get_parent_map
+        pending = set(keys)
+        parent_map = {}
+        while pending:
+            this_parent_map = self.get_parent_map(pending)
+            parent_map.update(this_parent_map)
+            pending = set()
+            map(pending.update, this_parent_map.itervalues())
+            pending = pending.difference(parent_map)
+        kg = _mod_graph.KnownGraph(parent_map)
+        return kg
+
     def get_parent_map(self, keys):
         """Get a map of the parents of keys.
 
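The fallback in get_known_graph_ancestry above builds the complete ancestry by looping get_parent_map until no unresolved keys remain, then hands the result to KnownGraph. The same walk as a standalone sketch (collect_ancestry and the get_parent_map callable parameter are illustrative names, not part of the diff):

```python
def collect_ancestry(get_parent_map, keys):
    """Transitively expand keys into a full parent_map.

    get_parent_map(keys) must return {key: parents} for the keys that
    exist; missing keys (ghosts) are simply absent from the result, so
    the walk stops chasing them.
    """
    pending = set(keys)
    parent_map = {}
    while pending:
        this_parent_map = get_parent_map(pending)
        parent_map.update(this_parent_map)
        # gather every parent mentioned this round...
        pending = set()
        for parents in this_parent_map.values():
            pending.update(parents)
        # ...but only chase the ones not already resolved
        pending = pending.difference(parent_map)
    return parent_map
```

Each round issues one batched get_parent_map call for all newly discovered parents, so the number of round trips is bounded by the longest ancestry chain rather than the number of keys.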