Merge lp:~jameinel/bzr/1.19-known-graph-sorted into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/1.19-known-graph-sorted
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1973 lines
To merge this branch: bzr merge lp:~jameinel/bzr/1.19-known-graph-sorted
Reviewer Review Type Date Requested Status
Gary van der Merwe Abstain
Vincent Ladeuil Approve
Review via email: mp+10293@code.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

This change finally brings the new ancestry extraction code up all the way so that 'bzr log' gets to use it.

It adds a member VersionedFiles.get_known_graph_ancestry(keys) which returns a KnownGraph instance.

It also implements 'merge_sort' in pyrex code on the KnownGraph object. At this point, KnownGraph.merge_sort() is quite fast, taking about 40ms to merge_sort all of bzr.dev and another 50ms to build up the KnownGraph object.

Combined with the improved extraction of ancestry, this brings "bzr log -n0 -r-10..-1" on bzr.dev from 2.5s down to right at 1.0s for me.

As near as I can tell, the big win for doing merge_sort on the KnownGraph object is that you get to avoid a lot of dict lookup calls. 40ms for 25.6k keys is 1.5us per key. And doing a dict lookup to get the parents costs 10ms overall (.4us per key). It also brings the time for 'merge_sort' on the OOo tree down to 1.0s.

It also adds KnownGraph.topo_sort(), which turns out to only take around 10ms for all of bzr.dev (on top of the 50ms to build the KnownGraph data structure).

Because I already have an object model internally, I went ahead and exposed KnownGraph.merge_sort() as returning objects, rather than tuples. I think the api is going to be a lot easier to use, and none of my timings so far show an advantage to the tuple version. (It could be because the objects are compiled making getattr() faster...)
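
To make the API change concrete, here is a toy sketch (not the actual bzrlib code) contrasting the old tuple-based merge_sort result with the object-based result described above; the `MergeSortNode` class is a hypothetical stand-in for the compiled node type:

```python
class MergeSortNode(object):
    """Hypothetical stand-in for the compiled merge_sort node type."""
    __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge')

    def __init__(self, key, merge_depth, revno, end_of_merge):
        self.key = key
        self.merge_depth = merge_depth
        self.revno = revno
        self.end_of_merge = end_of_merge

# Old style: callers unpack 5-tuples, sequence_number first.
old_result = [(0, ('rev-2',), 0, (2,), False),
              (1, ('rev-1',), 0, (1,), True)]
keys_old = [entry[1] for entry in old_result]

# New style: callers read named attributes; no sequence_number.
new_result = [MergeSortNode(('rev-2',), 0, (2,), False),
              MergeSortNode(('rev-1',), 0, (1,), True)]
keys_new = [node.key for node in new_result]
```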

This changes tsort.topo_sort() to just be a thunk over to KnownGraph(parent_map).topo_sort(), as it is still faster than the fastest python implementation. (Though the pure python form is actually slower because of the overhead of building the KG object. I'm not worried, as it is still faster than our existing topo_sort implementation.)
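
In rough pure-python terms, the algorithm behind the new topo_sort (find the tails, then walk to children, emitting a child once all of its present parents have been emitted) looks like this. This is a toy sketch, not the bzrlib implementation; it ignores ghosts by counting only parents that appear in the map:

```python
def topo_sort(parent_map):
    """Return keys of parent_map with every parent before its children."""
    child_map = {}
    num_parents = {}
    for key, parents in parent_map.items():
        # Ghost parents (absent from the map) are not counted.
        num_parents[key] = len([p for p in parents if p in parent_map])
        for p in parents:
            child_map.setdefault(p, []).append(key)
    # Tails: nodes with no present parents.
    pending = [k for k, n in num_parents.items() if n == 0]
    topo_order = []
    while pending:
        node = pending.pop()
        topo_order.append(node)
        for child in child_map.get(node, ()):
            num_parents[child] -= 1
            if num_parents[child] == 0:
                # All parents emitted, this child is now ready.
                pending.append(child)
    if len(topo_order) != len(parent_map):
        raise ValueError('graph contains a cycle: %r' % (parent_map,))
    return topo_order

order = topo_sort({'A': (), 'B': ('A',), 'C': ('A', 'B'), 'D': ('C',)})
```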

I was thinking of doing the same thing for 'merge_sort', but I took this opportunity to break the api, getting rid of "sequence_number", using an object model, ignoring 'mainline_revisions', etc. Note that only semi-deprecated code uses mainline_revisions anyway, and it really shouldn't anymore. (It was back when we allowed Branch.revision_history() != lefthand_history.)

I also plan on updating KnownGraph with an 'add_node()' function, so that its implementation of 'merge_sort' can be used for annotate (which sometimes annotates the working tree). It shouldn't be hard to do.

Revision history for this message
Robert Collins (lifeless) wrote :

What impact will this have on things like my all-ubuntu repository (16K
unrelated branches in one repo)?

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins writes:

    > What impact will this have on things like my
    > all-ubuntu repository (16K unrelated branches in one
    > repo) ?

You just tell us! :-)

But from the discussion with John, it should either improve
things (I'm 90% confident here)... or provide us with very nice
data! I, for one, will love to have such a repo around to play
with...

@John, how did you measure your progress? Still using
time_graph.py? Is it time to enhance it?

Revision history for this message
John A Meinel (jameinel) wrote :


Vincent Ladeuil wrote:
>>>>>> "robert" == Robert Collins writes:
>
> > What impact will this have on things like my
> > all-ubuntu repository (16K unrelated branches in one
> > repo) ?

It should perform approximately the same or better.

The 'find_ancestors()' code doesn't grab arbitrary node => parent
mappings, only ones that are in the ancestry of the keys that were
requested.

As such it is the same as repeated get_parent_map() calls, just without
the repeats.

The merge_sort() code is simply the same algorithm, just 3-7x faster
(depending on whether you count the time to build the KnownGraph).

Again, all of this code has the same "look at only the ancestry
requested" that the current code has. So there shouldn't be a blowout
from having lots of unrelated history. It just may not be a whole lot
faster because the other history is 'in the way'.
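
A toy illustration of the "look at only the ancestry requested" behaviour: starting from the requested keys, walk parents until none remain; unrelated branches in the same repository are never touched. The `get_parent_map` callback here is an assumed stand-in for the repeated get_parent_map() calls mentioned above, not the real index API:

```python
def find_ancestry(get_parent_map, keys):
    """Walk from keys to all reachable ancestors.

    get_parent_map(keys) -> {key: parent_keys} for the keys that exist.
    Returns (parent_map, missing) where missing holds ghost keys.
    """
    parent_map = {}
    missing = set()
    pending = set(keys)
    while pending:
        found = get_parent_map(pending)
        # Anything requested but not returned is a ghost.
        missing.update(pending - set(found))
        parent_map.update(found)
        pending = set()
        for parents in found.values():
            pending.update(p for p in parents
                           if p not in parent_map and p not in missing)
    return parent_map, missing

# 'unrelated' shares the repository but is never visited.
_history = {'tip': ('mid',), 'mid': ('base', 'ghost'),
            'base': (), 'unrelated': ()}

def _get_parent_map(keys):
    return dict((k, _history[k]) for k in keys if k in _history)

parent_map, missing = find_ancestry(_get_parent_map, ['tip'])
```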

>
> You just tell us! :-)
>
> But from the discussion with John, it should either improve
> things (I'm 90% confident here)... or provide us with very nice
> data! I, for one, will love to have such a repo around to play
> with...
>
> @John, how did you measure your progress? Still using
> time_graph.py? Is it time to enhance it?

No. I did have another helper here, but mostly this is tested with:

$ PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch, repository, tsort, graph
b = branch.Branch.open('bzr-2a-extra/bzr.dev')
b.lock_read()
l_rev = b.last_revision()
p_map, missing = b.repository.revisions._index._graph_index.find_ancestry([(l_rev,)], 0)
b.unlock()
" "kg = graph.KnownGraph(p_map)
for n in kg.merge_sort((l_rev,)):
    n.key, n.revno, n.end_of_merge, n.merge_depth
"

Or just simply running:
  time bzr log -n0 -r -10..-1 >/dev/null
John
=:->

Revision history for this message
Vincent Ladeuil (vila) wrote :

Nice job !

In tsort.py, you can get rid of 'from collections import deque'.

Nothing else to ask for :) Nice work on the tests too.

review: Approve
Revision history for this message
Gary van der Merwe (garyvdm) wrote :

Note: This branch fixes Bug 350796.

review: Abstain
Revision history for this message
Gary van der Merwe (garyvdm) wrote :

> Note: This branch fixes Bug 350796.

Sorry - I was wrong about that. Bug 350796 was fixed by rev 4260.

Preview Diff

1=== modified file 'NEWS'
2--- NEWS 2009-08-18 20:05:30 +0000
3+++ NEWS 2009-08-19 16:35:14 +0000
4@@ -9,22 +9,6 @@
5 In Development
6 ##############
7
8-Bug Fixes
9-*********
10-
11-* Fix a test failure on karmic by making a locale test more robust.
12- (Vincent Ladeuil, #413514)
13-
14-Improvements
15-************
16-
17-* A better description of the platform is shown in crash tracebacks, ``bzr
18- --version`` and ``bzr selftest``.
19- (Martin Pool, #409137)
20-
21-bzr 1.18
22-########
23-
24 Compatibility Breaks
25 ********************
26
27@@ -83,6 +67,9 @@
28 version-3 protocol, but it does cause test suite failures when testing
29 downlevel protocol behaviour. (Robert Collins)
30
31+* Fix a test failure on karmic by making a locale test more robust.
32+ (Vincent Ladeuil, #413514)
33+
34 * Fixed "Pack ... already exists" error when running ``bzr pack`` on a
35 fully packed 2a repository. (Andrew Bennetts, #382463)
36
37@@ -109,12 +96,21 @@
38 Improvements
39 ************
40
41+* A better description of the platform is shown in crash tracebacks, ``bzr
42+ --version`` and ``bzr selftest``.
43+ (Martin Pool, #409137)
44+
45 * Cross-format fetches (such as between 1.9-rich-root and 2a) via the
46 smart server are more efficient now. They send inventory deltas rather
47 than full inventories. The smart server has two new requests,
48 ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to
49 support this. (Andrew Bennetts, #374738, #385826)
50
51+* Extracting the full ancestry and computing the ``merge_sort`` is now
52+ significantly faster. This affects things like ``bzr log -n0``. (For
53+ example, ``bzr log -r -10..-1 -n0 bzr.dev`` drops from 2.5s to 1.0s.)
54+ (John Arbash Meinel)
55+
56 Documentation
57 *************
58
59@@ -136,15 +132,20 @@
60 friendly StreamSource, which now automatically does the same
61 transformations as InterDifferingSerializer. (Andrew Bennetts)
62
63+* ``KnownGraph`` now has ``.topo_sort`` and ``.merge_sort`` members which
64+ are implemented in pyrex and significantly faster. This is exposed along
65+ with ``CombinedGraphIndex.find_ancestry()`` as
66+ ``VersionedFiles.get_known_graph_ancestry(keys)``.
67+ (John Arbash Meinel)
68+
69 * RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2
70 protocols. (Robert Collins)
71
72 * The index code now has some specialized routines to extract the full
73 ancestry of a key in a more efficient manner.
74- ``CombinedGraphIndex.find_ancestry()``. This is not fully exposed to the
75- higher levels yet, but has the potential to improve grabbing the full
76- ancestry tremendously. (Time to get ancestry for bzr.dev drops from 1.5s
77- down to 300ms. For OOo from 33s => 10.5s) (John Arbash Meinel)
78+ ``CombinedGraphIndex.find_ancestry()``. (Time to get ancestry for
79+ bzr.dev drops from 1.5s down to 300ms. For OOo from 33s => 10.5s) (John
80+ Arbash Meinel)
81
82 Testing
83 *******
84
85=== modified file 'bzrlib/_known_graph_py.py'
86--- bzrlib/_known_graph_py.py 2009-07-08 20:58:10 +0000
87+++ bzrlib/_known_graph_py.py 2009-08-19 16:35:14 +0000
88@@ -18,6 +18,7 @@
89 """
90
91 from bzrlib import (
92+ errors,
93 revision,
94 )
95
96@@ -40,6 +41,18 @@
97 self.parent_keys, self.child_keys)
98
99
100+class _MergeSortNode(object):
101+ """Information about a specific node in the merge graph."""
102+
103+ __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge')
104+
105+ def __init__(self, key, merge_depth, revno, end_of_merge):
106+ self.key = key
107+ self.merge_depth = merge_depth
108+ self.revno = revno
109+ self.end_of_merge = end_of_merge
110+
111+
112 class KnownGraph(object):
113 """This is a class which assumes we already know the full graph."""
114
115@@ -171,3 +184,51 @@
116 self._known_heads[heads_key] = heads
117 return heads
118
119+ def topo_sort(self):
120+ """Return the nodes in topological order.
121+
122+ All parents must occur before all children.
123+ """
124+ for node in self._nodes.itervalues():
125+ if node.gdfo is None:
126+ raise errors.GraphCycleError(self._nodes)
127+ pending = self._find_tails()
128+ pending_pop = pending.pop
129+ pending_append = pending.append
130+
131+ topo_order = []
132+ topo_order_append = topo_order.append
133+
134+ num_seen_parents = dict.fromkeys(self._nodes, 0)
135+ while pending:
136+ node = pending_pop()
137+ if node.parent_keys is not None:
138+ # We don't include ghost parents
139+ topo_order_append(node.key)
140+ for child_key in node.child_keys:
141+ child_node = self._nodes[child_key]
142+ seen_parents = num_seen_parents[child_key] + 1
143+ if seen_parents == len(child_node.parent_keys):
144+ # All parents have been processed, enqueue this child
145+ pending_append(child_node)
146+ # This has been queued up, stop tracking it
147+ del num_seen_parents[child_key]
148+ else:
149+ num_seen_parents[child_key] = seen_parents
150+ # We started from the parents, so we don't need to do anymore work
151+ return topo_order
152+
153+ def merge_sort(self, tip_key):
154+ """Compute the merge sorted graph output."""
155+ from bzrlib import tsort
156+ as_parent_map = dict((node.key, node.parent_keys)
157+ for node in self._nodes.itervalues()
158+ if node.parent_keys is not None)
159+ # We intentionally always generate revnos and never force the
160+ # mainline_revisions
161+ # Strip the sequence_number that merge_sort generates
162+ return [_MergeSortNode(key, merge_depth, revno, end_of_merge)
163+ for _, key, merge_depth, revno, end_of_merge
164+ in tsort.merge_sort(as_parent_map, tip_key,
165+ mainline_revisions=None,
166+ generate_revno=True)]
167
168=== modified file 'bzrlib/_known_graph_pyx.pyx'
169--- bzrlib/_known_graph_pyx.pyx 2009-07-14 16:10:32 +0000
170+++ bzrlib/_known_graph_pyx.pyx 2009-08-19 16:35:14 +0000
171@@ -44,8 +44,9 @@
172
173 void Py_INCREF(object)
174
175+import gc
176
177-from bzrlib import revision
178+from bzrlib import errors, revision
179
180 cdef object NULL_REVISION
181 NULL_REVISION = revision.NULL_REVISION
182@@ -59,10 +60,9 @@
183 cdef object children
184 cdef public long gdfo
185 cdef int seen
186+ cdef object extra
187
188 def __init__(self, key):
189- cdef int i
190-
191 self.key = key
192 self.parents = None
193
194@@ -70,6 +70,7 @@
195 # Greatest distance from origin
196 self.gdfo = -1
197 self.seen = 0
198+ self.extra = None
199
200 property child_keys:
201 def __get__(self):
202@@ -115,9 +116,7 @@
203 return <_KnownGraphNode>temp_node
204
205
206-# TODO: slab allocate all _KnownGraphNode objects.
207-# We already know how many we are going to need, except for a couple of
208-# ghosts that could be allocated on demand.
209+cdef class _MergeSorter
210
211 cdef class KnownGraph:
212 """This is a class which assumes we already know the full graph."""
213@@ -136,6 +135,9 @@
214 # Maps {sorted(revision_id, revision_id): heads}
215 self._known_heads = {}
216 self.do_cache = int(do_cache)
217+ # TODO: consider disabling gc since we are allocating a lot of nodes
218+ # that won't be collectable anyway. real world testing has not
219+ # shown a specific impact, yet.
220 self._initialize_nodes(parent_map)
221 self._find_gdfo()
222
223@@ -183,11 +185,16 @@
224 parent_keys = <object>temp_parent_keys
225 num_parent_keys = len(parent_keys)
226 node = self._get_or_create_node(key)
227- # We know how many parents, so we could pre allocate an exact sized
228- # tuple here
229+ # We know how many parents, so we pre allocate the tuple
230 parent_nodes = PyTuple_New(num_parent_keys)
231- # We use iter here, because parent_keys maybe be a list or tuple
232 for pos2 from 0 <= pos2 < num_parent_keys:
233+ # Note: it costs us 10ms out of 40ms to lookup all of these
234+ # parents, it doesn't seem to be an allocation overhead,
235+ # but rather a lookup overhead. There doesn't seem to be
236+ # a way around it, and that is one reason why
237+ # KnownGraphNode maintains a direct pointer to the parent
238+ # node.
239+ # We use [] because parent_keys may be a tuple or list
240 parent_node = self._get_or_create_node(parent_keys[pos2])
241 # PyTuple_SET_ITEM will steal a reference, so INCREF first
242 Py_INCREF(parent_node)
243@@ -335,3 +342,353 @@
244 if self.do_cache:
245 PyDict_SetItem(self._known_heads, heads_key, heads)
246 return heads
247+
248+ def topo_sort(self):
249+ """Return the nodes in topological order.
250+
251+ All parents must occur before all children.
252+ """
253+ # This is, for the most part, the same iteration order that we used for
254+ # _find_gdfo, consider finding a way to remove the duplication
255+ # In general, we find the 'tails' (nodes with no parents), and then
256+ # walk to the children. For children that have all of their parents
257+ # yielded, we queue up the child to be yielded as well.
258+ cdef _KnownGraphNode node
259+ cdef _KnownGraphNode child
260+ cdef PyObject *temp
261+ cdef Py_ssize_t pos
262+ cdef int replace
263+ cdef Py_ssize_t last_item
264+
265+ pending = self._find_tails()
266+ if PyList_GET_SIZE(pending) == 0 and len(self._nodes) > 0:
267+ raise errors.GraphCycleError(self._nodes)
268+
269+ topo_order = []
270+
271+ last_item = PyList_GET_SIZE(pending) - 1
272+ while last_item >= 0:
273+ # Avoid pop followed by push, instead, peek, and replace
274+ # timing shows this is 930ms => 770ms for OOo
275+ node = _get_list_node(pending, last_item)
276+ last_item = last_item - 1
277+ if node.parents is not None:
278+ # We don't include ghost parents
279+ PyList_Append(topo_order, node.key)
280+ for pos from 0 <= pos < PyList_GET_SIZE(node.children):
281+ child = _get_list_node(node.children, pos)
282+ if child.gdfo == -1:
283+ # We know we have a graph cycle because a node has a parent
284+ # which we couldn't find
285+ raise errors.GraphCycleError(self._nodes)
286+ child.seen = child.seen + 1
287+ if child.seen == PyTuple_GET_SIZE(child.parents):
288+ # All parents of this child have been yielded, queue this
289+ # one to be yielded as well
290+ last_item = last_item + 1
291+ if last_item < PyList_GET_SIZE(pending):
292+ Py_INCREF(child) # SetItem steals a ref
293+ PyList_SetItem(pending, last_item, child)
294+ else:
295+ PyList_Append(pending, child)
296+ # We have queued this node, we don't need to track it
297+ # anymore
298+ child.seen = 0
299+ # We started from the parents, so we don't need to do anymore work
300+ return topo_order
301+
302+
303+ def merge_sort(self, tip_key):
304+ """Compute the merge sorted graph output."""
305+ cdef _MergeSorter sorter
306+
307+ # TODO: consider disabling gc since we are allocating a lot of nodes
308+ # that won't be collectable anyway. real world testing has not
309+ # shown a specific impact, yet.
310+ sorter = _MergeSorter(self, tip_key)
311+ return sorter.topo_order()
312+
313+
314+cdef class _MergeSortNode:
315+ """Tracks information about a node during the merge_sort operation."""
316+
317+ # Public api
318+ cdef public object key
319+ cdef public long merge_depth
320+ cdef public object end_of_merge # True/False Is this the end of the current merge
321+
322+ # Private api, used while computing the information
323+ cdef _KnownGraphNode left_parent
324+ cdef _KnownGraphNode left_pending_parent
325+ cdef object pending_parents # list of _KnownGraphNode for non-left parents
326+ cdef long _revno_first
327+ cdef long _revno_second
328+ cdef long _revno_last
329+ # TODO: turn these into flag/bit fields rather than individual members
330+ cdef int is_first_child # Is this the first child?
331+ cdef int seen_by_child # A child node has seen this parent
332+ cdef int completed # Fully Processed
333+
334+ def __init__(self, key):
335+ self.key = key
336+ self.merge_depth = -1
337+ self.left_parent = None
338+ self.left_pending_parent = None
339+ self.pending_parents = None
340+ self._revno_first = -1
341+ self._revno_second = -1
342+ self._revno_last = -1
343+ self.is_first_child = 0
344+ self.seen_by_child = 0
345+ self.completed = 0
346+
347+ def __repr__(self):
348+ return '%s(depth:%s rev:%s,%s,%s first:%s seen:%s)' % (self.__class__.__name__,
349+ self.merge_depth,
350+ self._revno_first, self._revno_second, self._revno_last,
351+ self.is_first_child, self.seen_by_child)
352+
353+ cdef int has_pending_parents(self):
354+ if self.left_pending_parent is not None or self.pending_parents:
355+ return 1
356+ return 0
357+
358+ cdef object _revno(self):
359+ if self._revno_first == -1:
360+ if self._revno_second != -1:
361+ raise RuntimeError('Something wrong with: %s' % (self,))
362+ return (self._revno_last,)
363+ else:
364+ return (self._revno_first, self._revno_second, self._revno_last)
365+
366+ property revno:
367+ def __get__(self):
368+ return self._revno()
369+
370+
371+cdef class _MergeSorter:
372+ """This class does the work of computing the merge_sort ordering.
373+
374+ We have some small advantages, in that we get all the extra information
375+ that KnownGraph knows, like knowing the child lists, etc.
376+ """
377+
378+ # Current performance numbers for merge_sort(bzr_dev_parent_map):
379+ # 302ms tsort.merge_sort()
380+ # 91ms graph.KnownGraph().merge_sort()
381+ # 40ms kg.merge_sort()
382+
383+ cdef KnownGraph graph
384+ cdef object _depth_first_stack # list
385+ cdef Py_ssize_t _last_stack_item # offset to last item on stack
386+ # cdef object _ms_nodes # dict of key => _MergeSortNode
387+ cdef object _revno_to_branch_count # {revno => num child branches}
388+ cdef object _scheduled_nodes # List of nodes ready to be yielded
389+
390+ def __init__(self, known_graph, tip_key):
391+ cdef _KnownGraphNode node
392+
393+ self.graph = known_graph
394+ # self._ms_nodes = {}
395+ self._revno_to_branch_count = {}
396+ self._depth_first_stack = []
397+ self._last_stack_item = -1
398+ self._scheduled_nodes = []
399+ if (tip_key is not None and tip_key != NULL_REVISION
400+ and tip_key != (NULL_REVISION,)):
401+ node = self.graph._nodes[tip_key]
402+ self._get_ms_node(node)
403+ self._push_node(node, 0)
404+
405+ cdef _MergeSortNode _get_ms_node(self, _KnownGraphNode node):
406+ cdef PyObject *temp_node
407+ cdef _MergeSortNode ms_node
408+
409+ if node.extra is None:
410+ ms_node = _MergeSortNode(node.key)
411+ node.extra = ms_node
412+ else:
413+ ms_node = <_MergeSortNode>node.extra
414+ return ms_node
415+
416+ cdef _push_node(self, _KnownGraphNode node, long merge_depth):
417+ cdef _KnownGraphNode parent_node
418+ cdef _MergeSortNode ms_node, ms_parent_node
419+ cdef Py_ssize_t pos
420+
421+ ms_node = self._get_ms_node(node)
422+ ms_node.merge_depth = merge_depth
423+ if PyTuple_GET_SIZE(node.parents) > 0:
424+ parent_node = _get_parent(node.parents, 0)
425+ ms_node.left_parent = parent_node
426+ ms_node.left_pending_parent = parent_node
427+ if PyTuple_GET_SIZE(node.parents) > 1:
428+ ms_node.pending_parents = []
429+ for pos from 1 <= pos < PyTuple_GET_SIZE(node.parents):
430+ parent_node = _get_parent(node.parents, pos)
431+ if parent_node.parents is None: # ghost
432+ continue
433+ PyList_Append(ms_node.pending_parents, parent_node)
434+
435+ ms_node.is_first_child = 1
436+ if ms_node.left_parent is not None:
437+ ms_parent_node = self._get_ms_node(ms_node.left_parent)
438+ if ms_parent_node.seen_by_child:
439+ ms_node.is_first_child = 0
440+ ms_parent_node.seen_by_child = 1
441+ self._last_stack_item = self._last_stack_item + 1
442+ if self._last_stack_item < PyList_GET_SIZE(self._depth_first_stack):
443+ Py_INCREF(node) # SetItem steals a ref
444+ PyList_SetItem(self._depth_first_stack, self._last_stack_item,
445+ node)
446+ else:
447+ PyList_Append(self._depth_first_stack, node)
448+
449+ cdef _pop_node(self):
450+ cdef PyObject *temp
451+ cdef _MergeSortNode ms_node, ms_parent_node, ms_prev_node
452+ cdef _KnownGraphNode node, parent_node, prev_node
453+
454+ node = _get_list_node(self._depth_first_stack, self._last_stack_item)
455+ ms_node = <_MergeSortNode>node.extra
456+ self._last_stack_item = self._last_stack_item - 1
457+ if ms_node.left_parent is not None:
458+ # Assign the revision number from the left-hand parent
459+ ms_parent_node = <_MergeSortNode>ms_node.left_parent.extra
460+ if ms_node.is_first_child:
461+ # First child just increments the final digit
462+ ms_node._revno_first = ms_parent_node._revno_first
463+ ms_node._revno_second = ms_parent_node._revno_second
464+ ms_node._revno_last = ms_parent_node._revno_last + 1
465+ else:
466+ # Not the first child, make a new branch
467+ # (mainline_revno, branch_count, 1)
468+ if ms_parent_node._revno_first == -1:
469+ # Mainline ancestor, the increment is on the last digit
470+ base_revno = ms_parent_node._revno_last
471+ else:
472+ base_revno = ms_parent_node._revno_first
473+ temp = PyDict_GetItem(self._revno_to_branch_count,
474+ base_revno)
475+ if temp == NULL:
476+ branch_count = 1
477+ else:
478+ branch_count = (<object>temp) + 1
479+ PyDict_SetItem(self._revno_to_branch_count, base_revno,
480+ branch_count)
481+ ms_node._revno_first = base_revno
482+ ms_node._revno_second = branch_count
483+ ms_node._revno_last = 1
484+ else:
485+ temp = PyDict_GetItem(self._revno_to_branch_count, 0)
486+ if temp == NULL:
487+ # The first root node doesn't have a 3-digit revno
488+ root_count = 0
489+ ms_node._revno_first = -1
490+ ms_node._revno_second = -1
491+ ms_node._revno_last = 1
492+ else:
493+ root_count = (<object>temp) + 1
494+ ms_node._revno_first = 0
495+ ms_node._revno_second = root_count
496+ ms_node._revno_last = 1
497+ PyDict_SetItem(self._revno_to_branch_count, 0, root_count)
498+ ms_node.completed = 1
499+ if PyList_GET_SIZE(self._scheduled_nodes) == 0:
500+ # The first scheduled node is always the end of merge
501+ ms_node.end_of_merge = True
502+ else:
503+ prev_node = _get_list_node(self._scheduled_nodes,
504+ PyList_GET_SIZE(self._scheduled_nodes) - 1)
505+ ms_prev_node = <_MergeSortNode>prev_node.extra
506+ if ms_prev_node.merge_depth < ms_node.merge_depth:
507+ # The previously pushed node is to our left, so this is the end
508+ # of this right-hand chain
509+ ms_node.end_of_merge = True
510+ elif (ms_prev_node.merge_depth == ms_node.merge_depth
511+ and prev_node not in node.parents):
512+ # The next node is not a direct parent of this node
513+ ms_node.end_of_merge = True
514+ else:
515+ ms_node.end_of_merge = False
516+ PyList_Append(self._scheduled_nodes, node)
517+
518+ cdef _schedule_stack(self):
519+ cdef _KnownGraphNode last_node, next_node
520+ cdef _MergeSortNode ms_node, ms_last_node, ms_next_node
521+ cdef long next_merge_depth
522+ ordered = []
523+ while self._last_stack_item >= 0:
524+ # Peek at the last item on the stack
525+ last_node = _get_list_node(self._depth_first_stack,
526+ self._last_stack_item)
527+ if last_node.gdfo == -1:
528+ # if _find_gdfo skipped a node, that means there is a graph
529+ # cycle, error out now
530+ raise errors.GraphCycleError(self.graph._nodes)
531+ ms_last_node = <_MergeSortNode>last_node.extra
532+ if not ms_last_node.has_pending_parents():
533+ # Processed all parents, pop this node
534+ self._pop_node()
535+ continue
536+ while ms_last_node.has_pending_parents():
537+ if ms_last_node.left_pending_parent is not None:
538+ # recurse depth first into the primary parent
539+ next_node = ms_last_node.left_pending_parent
540+ ms_last_node.left_pending_parent = None
541+ else:
542+ # place any merges in right-to-left order for scheduling
543+ # which gives us left-to-right order after we reverse
544+ # the scheduled queue.
545+ # Note: This has the effect of allocating common-new
546+ # revisions to the right-most subtree rather than the
547+ # left most, which will display nicely (you get
548+ # smaller trees at the top of the combined merge).
549+ next_node = ms_last_node.pending_parents.pop()
550+ ms_next_node = self._get_ms_node(next_node)
551+ if ms_next_node.completed:
552+ # this parent was completed by a child on the
553+ # call stack. skip it.
554+ continue
555+ # otherwise transfer it from the source graph into the
556+ # top of the current depth first search stack.
557+
558+ if next_node is ms_last_node.left_parent:
559+ next_merge_depth = ms_last_node.merge_depth
560+ else:
561+ next_merge_depth = ms_last_node.merge_depth + 1
562+ self._push_node(next_node, next_merge_depth)
563+ # and do not continue processing parents until this 'call'
564+ # has recursed.
565+ break
566+
567+ cdef topo_order(self):
568+ cdef _MergeSortNode ms_node
569+ cdef _KnownGraphNode node
570+ cdef Py_ssize_t pos
571+ cdef PyObject *temp_key, *temp_node
572+
573+ # Note: allocating a _MergeSortNode and deallocating it for all nodes
574+ # costs approx 8.52ms (21%) of the total runtime
575+ # We might consider moving the attributes into the base
576+ # KnownGraph object.
577+ self._schedule_stack()
578+
579+ # We've set up the basic schedule, now we can continue processing the
580+ # output.
581+ # Note: This final loop costs us 40.0ms => 28.8ms (11ms, 25%) on
582+ # bzr.dev, to convert the internal Object representation into a
583+ # Tuple representation...
584+ # 2ms is walking the data and computing revno tuples
585+ # 7ms is computing the return tuple
586+ # 4ms is PyList_Append()
587+ ordered = []
588+ # output the result in reverse order, and separate the generated info
589+ for pos from PyList_GET_SIZE(self._scheduled_nodes) > pos >= 0:
590+ node = _get_list_node(self._scheduled_nodes, pos)
591+ ms_node = <_MergeSortNode>node.extra
592+ PyList_Append(ordered, ms_node)
593+ node.extra = None
594+ # Clear out the scheduled nodes now that we're done
595+ self._scheduled_nodes = []
596+ return ordered
597
598=== modified file 'bzrlib/annotate.py'
599--- bzrlib/annotate.py 2009-07-08 17:09:03 +0000
600+++ bzrlib/annotate.py 2009-08-19 16:35:14 +0000
601@@ -188,6 +188,10 @@
602 # or something.
603 last_revision = current_rev.revision_id
604 # XXX: Partially Cloned from branch, uses the old_get_graph, eep.
605+ # XXX: The main difficulty is that we need to inject a single new node
606+ # (current_rev) into the graph before it gets numbered, etc.
607+ # Once KnownGraph gets an 'add_node()' function, we can use
608+ # VF.get_known_graph_ancestry().
609 graph = repository.get_graph()
610 revision_graph = dict(((key, value) for key, value in
611 graph.iter_ancestry(current_rev.parent_ids) if value is not None))
612
613=== modified file 'bzrlib/branch.py'
614--- bzrlib/branch.py 2009-08-17 06:22:18 +0000
615+++ bzrlib/branch.py 2009-08-19 16:35:14 +0000
616@@ -446,15 +446,11 @@
617 # start_revision_id.
618 if self._merge_sorted_revisions_cache is None:
619 last_revision = self.last_revision()
620- graph = self.repository.get_graph()
621- parent_map = dict(((key, value) for key, value in
622- graph.iter_ancestry([last_revision]) if value is not None))
623- revision_graph = repository._strip_NULL_ghosts(parent_map)
624- revs = tsort.merge_sort(revision_graph, last_revision, None,
625- generate_revno=True)
626- # Drop the sequence # before caching
627- self._merge_sorted_revisions_cache = [r[1:] for r in revs]
628-
629+ last_key = (last_revision,)
630+ known_graph = self.repository.revisions.get_known_graph_ancestry(
631+ [last_key])
632+ self._merge_sorted_revisions_cache = known_graph.merge_sort(
633+ last_key)
634 filtered = self._filter_merge_sorted_revisions(
635 self._merge_sorted_revisions_cache, start_revision_id,
636 stop_revision_id, stop_rule)
637@@ -470,27 +466,34 @@
638 """Iterate over an inclusive range of sorted revisions."""
639 rev_iter = iter(merge_sorted_revisions)
640 if start_revision_id is not None:
641- for rev_id, depth, revno, end_of_merge in rev_iter:
642+ for node in rev_iter:
643+ rev_id = node.key[-1]
644 if rev_id != start_revision_id:
645 continue
646 else:
647 # The decision to include the start or not
648 # depends on the stop_rule if a stop is provided
649- rev_iter = chain(
650- iter([(rev_id, depth, revno, end_of_merge)]),
651- rev_iter)
652+ # so pop this node back into the iterator
653+ rev_iter = chain(iter([node]), rev_iter)
654 break
655 if stop_revision_id is None:
656- for rev_id, depth, revno, end_of_merge in rev_iter:
657- yield rev_id, depth, revno, end_of_merge
658+ # Yield everything
659+ for node in rev_iter:
660+ rev_id = node.key[-1]
661+ yield (rev_id, node.merge_depth, node.revno,
662+ node.end_of_merge)
663 elif stop_rule == 'exclude':
664- for rev_id, depth, revno, end_of_merge in rev_iter:
665+ for node in rev_iter:
666+ rev_id = node.key[-1]
667 if rev_id == stop_revision_id:
668 return
669- yield rev_id, depth, revno, end_of_merge
670+ yield (rev_id, node.merge_depth, node.revno,
671+ node.end_of_merge)
672 elif stop_rule == 'include':
673- for rev_id, depth, revno, end_of_merge in rev_iter:
674- yield rev_id, depth, revno, end_of_merge
675+ for node in rev_iter:
676+ rev_id = node.key[-1]
677+ yield (rev_id, node.merge_depth, node.revno,
678+ node.end_of_merge)
679 if rev_id == stop_revision_id:
680 return
681 elif stop_rule == 'with-merges':
682@@ -499,10 +502,12 @@
683 left_parent = stop_rev.parent_ids[0]
684 else:
685 left_parent = _mod_revision.NULL_REVISION
686- for rev_id, depth, revno, end_of_merge in rev_iter:
687+ for node in rev_iter:
688+ rev_id = node.key[-1]
689 if rev_id == left_parent:
690 return
691- yield rev_id, depth, revno, end_of_merge
692+ yield (rev_id, node.merge_depth, node.revno,
693+ node.end_of_merge)
694 else:
695 raise ValueError('invalid stop_rule %r' % stop_rule)
696
697
698=== modified file 'bzrlib/graph.py'
699--- bzrlib/graph.py 2009-08-04 04:36:34 +0000
700+++ bzrlib/graph.py 2009-08-19 16:35:14 +0000
701@@ -21,7 +21,6 @@
702 errors,
703 revision,
704 trace,
705- tsort,
706 )
707 from bzrlib.symbol_versioning import deprecated_function, deprecated_in
708
709@@ -926,6 +925,7 @@
710 An ancestor may sort after a descendant if the relationship is not
711 visible in the supplied list of revisions.
712 """
713+ from bzrlib import tsort
714 sorter = tsort.TopoSorter(self.get_parent_map(revisions))
715 return sorter.iter_topo_order()
716
717
718=== modified file 'bzrlib/groupcompress.py'
719--- bzrlib/groupcompress.py 2009-08-04 04:36:34 +0000
720+++ bzrlib/groupcompress.py 2009-08-19 16:35:14 +0000
721@@ -62,16 +62,15 @@
722 # groupcompress ordering is approximately reverse topological,
723 # properly grouped by file-id.
724 per_prefix_map = {}
725- for item in parent_map.iteritems():
726- key = item[0]
727+ for key, value in parent_map.iteritems():
728 if isinstance(key, str) or len(key) == 1:
729 prefix = ''
730 else:
731 prefix = key[0]
732 try:
733- per_prefix_map[prefix].append(item)
734+ per_prefix_map[prefix][key] = value
735 except KeyError:
736- per_prefix_map[prefix] = [item]
737+ per_prefix_map[prefix] = {key: value}
738
739 present_keys = []
740 for prefix in sorted(per_prefix_map):
741@@ -1099,6 +1098,13 @@
742 self._check_lines_not_unicode(lines)
743 self._check_lines_are_lines(lines)
744
745+ def get_known_graph_ancestry(self, keys):
746+ """Get a KnownGraph instance with the ancestry of keys."""
747+ parent_map, missing_keys = self._index._graph_index.find_ancestry(keys,
748+ 0)
749+ kg = _mod_graph.KnownGraph(parent_map)
750+ return kg
751+
752 def get_parent_map(self, keys):
753 """Get a map of the graph parents of keys.
754
755
756=== modified file 'bzrlib/index.py'
757--- bzrlib/index.py 2009-08-13 19:56:26 +0000
758+++ bzrlib/index.py 2009-08-19 16:35:14 +0000
759@@ -333,6 +333,22 @@
760 if combine_backing_indices is not None:
761 self._combine_backing_indices = combine_backing_indices
762
763+ def find_ancestry(self, keys, ref_list_num):
764+ """See CombinedGraphIndex.find_ancestry()"""
765+ pending = set(keys)
766+ parent_map = {}
767+ missing_keys = set()
768+ while pending:
769+ next_pending = set()
770+ for _, key, value, ref_lists in self.iter_entries(pending):
771+ parent_keys = ref_lists[ref_list_num]
772+ parent_map[key] = parent_keys
773+ next_pending.update([p for p in parent_keys if p not in
774+ parent_map])
775+ missing_keys.update(pending.difference(parent_map))
776+ pending = next_pending
777+ return parent_map, missing_keys
778+
779
780 class GraphIndex(object):
781 """An index for data with embedded graphs.
782
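The breadth-first walk in `find_ancestry()` above can be sketched without the index machinery. Here `get_parents` is a hypothetical stand-in for `iter_entries()` that raises KeyError for keys absent from the index (the real method simply never yields an entry for them):

```python
def find_ancestry(keys, get_parents):
    """Walk from keys to all reachable parents, breadth-first.

    Returns (parent_map, missing_keys); keys with no entry (ghosts)
    end up in missing_keys, mirroring the method above.
    """
    pending = set(keys)
    parent_map = {}
    missing_keys = set()
    while pending:
        next_pending = set()
        for key in pending:
            try:
                parent_keys = get_parents(key)
            except KeyError:
                continue  # absent from the "index"; collected as missing below
            parent_map[key] = parent_keys
            # only queue parents we have not resolved yet
            next_pending.update(p for p in parent_keys if p not in parent_map)
        # anything requested but not resolved in this pass is a ghost
        missing_keys.update(pending.difference(parent_map))
        pending = next_pending
    return parent_map, missing_keys

graph = {'C': ('B', 'ghost'), 'B': ('A',), 'A': ()}
parent_map, missing = find_ancestry(['C'], graph.__getitem__)
```

Walking one whole generation per loop iteration means one batched `iter_entries()` call per ancestry "layer", rather than one lookup per key.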
783=== modified file 'bzrlib/knit.py'
784--- bzrlib/knit.py 2009-08-04 04:36:34 +0000
785+++ bzrlib/knit.py 2009-08-19 16:35:14 +0000
786@@ -1190,6 +1190,12 @@
787 generator = _VFContentMapGenerator(self, [key])
788 return generator._get_content(key)
789
790+ def get_known_graph_ancestry(self, keys):
791+ """Get a KnownGraph instance with the ancestry of keys."""
792+ parent_map, missing_keys = self._index.find_ancestry(keys)
793+ kg = _mod_graph.KnownGraph(parent_map)
794+ return kg
795+
796 def get_parent_map(self, keys):
797 """Get a map of the graph parents of keys.
798
799@@ -2560,6 +2566,33 @@
800 except KeyError:
801 raise RevisionNotPresent(key, self)
802
803+ def find_ancestry(self, keys):
804+ """See CombinedGraphIndex.find_ancestry()"""
805+ prefixes = set(key[:-1] for key in keys)
806+ self._load_prefixes(prefixes)
807+ result = {}
808+ parent_map = {}
809+ missing_keys = set()
810+ pending_keys = list(keys)
811+ # This assumes that keys will not reference parents in a different
812+ # prefix, which is accurate so far.
813+ while pending_keys:
814+ key = pending_keys.pop()
815+ if key in parent_map:
816+ continue
817+ prefix = key[:-1]
818+ try:
819+ suffix_parents = self._kndx_cache[prefix][0][key[-1]][4]
820+ except KeyError:
821+ missing_keys.add(key)
822+ else:
823+ parent_keys = tuple([prefix + (suffix,)
824+ for suffix in suffix_parents])
825+ parent_map[key] = parent_keys
826+ pending_keys.extend([p for p in parent_keys
827+ if p not in parent_map])
828+ return parent_map, missing_keys
829+
830 def get_parent_map(self, keys):
831 """Get a map of the parents of keys.
832
833@@ -3049,6 +3082,10 @@
834 options.append('no-eol')
835 return options
836
837+ def find_ancestry(self, keys):
838+ """See CombinedGraphIndex.find_ancestry()"""
839+ return self._graph_index.find_ancestry(keys, 0)
840+
841 def get_parent_map(self, keys):
842 """Get a map of the parents of keys.
843
844
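The knit index variant above walks with a stack instead, and has to rebuild full keys because parents are stored as bare suffixes. A standalone sketch, where `suffix_parents_for` is a hypothetical stand-in for the `_kndx_cache` lookup:

```python
def find_ancestry(keys, suffix_parents_for):
    """Depth-first walk from keys, rebuilding full keys from suffix parents."""
    parent_map = {}
    missing_keys = set()
    pending_keys = list(keys)
    # As in the diff, this assumes parents share their child's prefix.
    while pending_keys:
        key = pending_keys.pop()
        if key in parent_map:
            continue
        prefix = key[:-1]
        try:
            suffix_parents = suffix_parents_for(key)
        except KeyError:
            missing_keys.add(key)
        else:
            # parents are stored as bare suffixes; re-attach the prefix
            parent_keys = tuple(prefix + (suffix,)
                                for suffix in suffix_parents)
            parent_map[key] = parent_keys
            pending_keys.extend(p for p in parent_keys
                                if p not in parent_map)
    return parent_map, missing_keys

store = {('f', 'c'): ['b'], ('f', 'b'): ['a'], ('f', 'a'): ['ghost']}
parent_map, missing = find_ancestry([('f', 'c')], lambda key: store[key])
```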
845=== modified file 'bzrlib/missing.py'
846--- bzrlib/missing.py 2009-03-23 14:59:43 +0000
847+++ bzrlib/missing.py 2009-08-19 16:35:14 +0000
848@@ -138,31 +138,13 @@
849 if not ancestry: #Empty ancestry, no need to do any work
850 return []
851
852- mainline_revs, rev_nos, start_rev_id, end_rev_id = log._get_mainline_revs(
853- branch, None, tip_revno)
854- if not mainline_revs:
855- return []
856-
857- # This asks for all mainline revisions, which is size-of-history and
858- # should be addressed (but currently the only way to get correct
859- # revnos).
860-
861- # mainline_revisions always includes an extra revision at the
862- # beginning, so don't request it.
863- parent_map = dict(((key, value) for key, value
864- in graph.iter_ancestry(mainline_revs[1:])
865- if value is not None))
866- # filter out ghosts; merge_sort errors on ghosts.
867- # XXX: is this needed here ? -- vila080910
868- rev_graph = _mod_repository._strip_NULL_ghosts(parent_map)
869- # XXX: what if rev_graph is empty now ? -- vila080910
870- merge_sorted_revisions = tsort.merge_sort(rev_graph, tip,
871- mainline_revs,
872- generate_revno=True)
873+ merge_sorted_revisions = branch.iter_merge_sorted_revisions()
874 # Now that we got the correct revnos, keep only the relevant
875 # revisions.
876 merge_sorted_revisions = [
877- (s, revid, n, d, e) for s, revid, n, d, e in merge_sorted_revisions
878+ # log.reverse_by_depth expects seq_num to be present, but it is
879+ # stripped by iter_merge_sorted_revisions()
880+ (0, revid, n, d, e) for revid, n, d, e in merge_sorted_revisions
881 if revid in ancestry]
882 if not backward:
883 merge_sorted_revisions = log.reverse_by_depth(merge_sorted_revisions)
884
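The tuple reshaping in the missing.py hunk above is small but easy to misread: `iter_merge_sorted_revisions()` yields 4-tuples with no seq_num, while `log.reverse_by_depth()` expects a leading seq_num, so a dummy 0 is prepended. A sketch with made-up revision ids:

```python
# Hypothetical output of branch.iter_merge_sorted_revisions():
# (rev_id, merge_depth, revno, end_of_merge) 4-tuples.
merge_sorted = [('rev-2', 0, (2,), False),
                ('rev-1.1.1', 1, (1, 1, 1), True),
                ('rev-1', 0, (1,), True)]
ancestry = set(['rev-1', 'rev-2'])  # rev-1.1.1 gets filtered out
# Prepend a dummy seq_num of 0 for log.reverse_by_depth()'s benefit.
adapted = [(0, revid, n, d, e) for revid, n, d, e in merge_sorted
           if revid in ancestry]
```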
885=== modified file 'bzrlib/reconcile.py'
886--- bzrlib/reconcile.py 2009-06-10 03:56:49 +0000
887+++ bzrlib/reconcile.py 2009-08-19 16:35:14 +0000
888@@ -33,7 +33,7 @@
889 repofmt,
890 )
891 from bzrlib.trace import mutter, note
892-from bzrlib.tsort import TopoSorter
893+from bzrlib.tsort import topo_sort
894 from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory
895
896
897@@ -247,8 +247,7 @@
898
899 # we have topological order of revisions and non ghost parents ready.
900 self._setup_steps(len(self._rev_graph))
901- revision_keys = [(rev_id,) for rev_id in
902- TopoSorter(self._rev_graph.items()).iter_topo_order()]
903+ revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)]
904 stream = self._change_inv_parents(
905 self.inventory.get_record_stream(revision_keys, 'unordered', True),
906 self._new_inv_parents,
907@@ -378,7 +377,7 @@
908 new_inventories = self.repo._temp_inventories()
909 # we have topological order of revisions and non ghost parents ready.
910 graph = self.revisions.get_parent_map(self.revisions.keys())
911- revision_keys = list(TopoSorter(graph).iter_topo_order())
912+ revision_keys = topo_sort(graph)
913 revision_ids = [key[-1] for key in revision_keys]
914 self._setup_steps(len(revision_keys))
915 stream = self._change_inv_parents(
916
917=== modified file 'bzrlib/repofmt/weaverepo.py'
918--- bzrlib/repofmt/weaverepo.py 2009-08-14 11:11:29 +0000
919+++ bzrlib/repofmt/weaverepo.py 2009-08-19 16:35:14 +0000
920@@ -28,6 +28,7 @@
921 lazy_import(globals(), """
922 from bzrlib import (
923 xml5,
924+ graph as _mod_graph,
925 )
926 """)
927 from bzrlib import (
928@@ -663,6 +664,13 @@
929 result[key] = parents
930 return result
931
932+ def get_known_graph_ancestry(self, keys):
933+ """Get a KnownGraph instance with the ancestry of keys."""
934+ keys = self.keys()
935+ parent_map = self.get_parent_map(keys)
936+ kg = _mod_graph.KnownGraph(parent_map)
937+ return kg
938+
939 def get_record_stream(self, keys, sort_order, include_delta_closure):
940 for key in keys:
941 text, parents = self._load_text_parents(key)
942
943=== modified file 'bzrlib/repository.py'
944--- bzrlib/repository.py 2009-08-17 23:15:55 +0000
945+++ bzrlib/repository.py 2009-08-19 16:35:14 +0000
946@@ -4351,7 +4351,7 @@
947 phase = 'file'
948 revs = search.get_keys()
949 graph = self.from_repository.get_graph()
950- revs = list(graph.iter_topo_order(revs))
951+ revs = tsort.topo_sort(graph.get_parent_map(revs))
952 data_to_fetch = self.from_repository.item_keys_introduced_by(revs)
953 text_keys = []
954 for knit_kind, file_id, revisions in data_to_fetch:
955
956=== modified file 'bzrlib/tests/__init__.py'
957--- bzrlib/tests/__init__.py 2009-08-18 14:20:28 +0000
958+++ bzrlib/tests/__init__.py 2009-08-19 16:35:15 +0000
959@@ -3434,6 +3434,7 @@
960 'bzrlib.tests.per_repository',
961 'bzrlib.tests.per_repository_chk',
962 'bzrlib.tests.per_repository_reference',
963+ 'bzrlib.tests.per_versionedfile',
964 'bzrlib.tests.per_workingtree',
965 'bzrlib.tests.test__annotator',
966 'bzrlib.tests.test__chk_map',
967@@ -3585,7 +3586,6 @@
968 'bzrlib.tests.test_urlutils',
969 'bzrlib.tests.test_version',
970 'bzrlib.tests.test_version_info',
971- 'bzrlib.tests.test_versionedfile',
972 'bzrlib.tests.test_weave',
973 'bzrlib.tests.test_whitebox',
974 'bzrlib.tests.test_win32utils',
975
976=== modified file 'bzrlib/tests/blackbox/test_ancestry.py'
977--- bzrlib/tests/blackbox/test_ancestry.py 2009-03-23 14:59:43 +0000
978+++ bzrlib/tests/blackbox/test_ancestry.py 2009-08-19 16:35:15 +0000
979@@ -43,9 +43,15 @@
980
981 def _check_ancestry(self, location='', result=None):
982 out = self.run_bzr(['ancestry', location])[0]
983- if result is None:
984+ if result is not None:
985+ self.assertEqualDiff(result, out)
986+ else:
987+ # A2 and B1 can be in either order, because they are parallel, and
988+ # thus their topological order is not defined
989 result = "A1\nB1\nA2\nA3\n"
990- self.assertEqualDiff(out, result)
991+ if result != out:
992+ result = "A1\nA2\nB1\nA3\n"
993+ self.assertEqualDiff(result, out)
994
995 def test_ancestry(self):
996 """Tests 'ancestry' command"""
997
998=== renamed file 'bzrlib/tests/test_versionedfile.py' => 'bzrlib/tests/per_versionedfile.py'
999--- bzrlib/tests/test_versionedfile.py 2009-08-04 04:36:34 +0000
1000+++ bzrlib/tests/per_versionedfile.py 2009-08-19 16:35:15 +0000
1001@@ -26,6 +26,7 @@
1002
1003 from bzrlib import (
1004 errors,
1005+ graph as _mod_graph,
1006 groupcompress,
1007 knit as _mod_knit,
1008 osutils,
1009@@ -1737,6 +1738,25 @@
1010 f.get_record_stream([key_b], 'unordered', True
1011 ).next().get_bytes_as('fulltext'))
1012
1013+ def test_get_known_graph_ancestry(self):
1014+ f = self.get_versionedfiles()
1015+ if not self.graph:
1016+ raise TestNotApplicable('ancestry info only relevant with graph.')
1017+ key_a = self.get_simple_key('a')
1018+ key_b = self.get_simple_key('b')
1019+ key_c = self.get_simple_key('c')
1020+ # A
1021+ # |\
1022+ # | B
1023+ # |/
1024+ # C
1025+ f.add_lines(key_a, [], ['\n'])
1026+ f.add_lines(key_b, [key_a], ['\n'])
1027+ f.add_lines(key_c, [key_a, key_b], ['\n'])
1028+ kg = f.get_known_graph_ancestry([key_c])
1029+ self.assertIsInstance(kg, _mod_graph.KnownGraph)
1030+ self.assertEqual([key_a, key_b, key_c], list(kg.topo_sort()))
1031+
1032 def test_get_record_stream_empty(self):
1033 """An empty stream can be requested without error."""
1034 f = self.get_versionedfiles()
1035
1036=== modified file 'bzrlib/tests/test__known_graph.py'
1037--- bzrlib/tests/test__known_graph.py 2009-07-08 20:58:10 +0000
1038+++ bzrlib/tests/test__known_graph.py 2009-08-19 16:35:15 +0000
1039@@ -16,6 +16,8 @@
1040
1041 """Tests for the python and pyrex extensions of KnownGraph"""
1042
1043+import pprint
1044+
1045 from bzrlib import (
1046 errors,
1047 graph as _mod_graph,
1048@@ -30,13 +32,15 @@
1049 """Parameterize tests for all versions of groupcompress."""
1050 scenarios = [
1051 ('python', {'module': _known_graph_py, 'do_cache': True}),
1052+ ]
1053+ caching_scenarios = [
1054 ('python-nocache', {'module': _known_graph_py, 'do_cache': False}),
1055 ]
1056 suite = loader.suiteClass()
1057 if CompiledKnownGraphFeature.available():
1058 from bzrlib import _known_graph_pyx
1059 scenarios.append(('C', {'module': _known_graph_pyx, 'do_cache': True}))
1060- scenarios.append(('C-nocache',
1061+ caching_scenarios.append(('C-nocache',
1062 {'module': _known_graph_pyx, 'do_cache': False}))
1063 else:
1064 # the compiled module isn't available, so we add a failing test
1065@@ -44,8 +48,14 @@
1066 def test_fail(self):
1067 self.requireFeature(CompiledKnownGraphFeature)
1068 suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature))
1069- result = tests.multiply_tests(standard_tests, scenarios, suite)
1070- return result
1071+ # TestKnownGraphHeads needs to be permutated with and without caching.
1072+ # All other TestKnownGraph tests only need to be tested across module
1073+ heads_suite, other_suite = tests.split_suite_by_condition(
1074+ standard_tests, tests.condition_isinstance(TestKnownGraphHeads))
1075+ suite = tests.multiply_tests(other_suite, scenarios, suite)
1076+ suite = tests.multiply_tests(heads_suite, scenarios + caching_scenarios,
1077+ suite)
1078+ return suite
1079
1080
1081 class _CompiledKnownGraphFeature(tests.Feature):
1082@@ -73,14 +83,16 @@
1083 alt_merge = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a', 'c']}
1084
1085
1086-class TestKnownGraph(tests.TestCase):
1087+class TestCaseWithKnownGraph(tests.TestCase):
1088
1089 module = None # Set by load_tests
1090- do_cache = None # Set by load_tests
1091
1092 def make_known_graph(self, ancestry):
1093 return self.module.KnownGraph(ancestry, do_cache=self.do_cache)
1094
1095+
1096+class TestKnownGraph(TestCaseWithKnownGraph):
1097+
1098 def assertGDFO(self, graph, rev, gdfo):
1099 node = graph._nodes[rev]
1100 self.assertEqual(gdfo, node.gdfo)
1101@@ -127,6 +139,11 @@
1102 self.assertGDFO(graph, 'a', 5)
1103 self.assertGDFO(graph, 'c', 5)
1104
1105+
1106+class TestKnownGraphHeads(TestCaseWithKnownGraph):
1107+
1108+ do_cache = None # Set by load_tests
1109+
1110 def test_heads_null(self):
1111 graph = self.make_known_graph(test_graph.ancestry_1)
1112 self.assertEqual(set(['null:']), graph.heads(['null:']))
1113@@ -227,3 +244,513 @@
1114 self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g']))
1115 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g']))
1116 self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f']))
1117+
1118+
1119+class TestKnownGraphTopoSort(TestCaseWithKnownGraph):
1120+
1121+ def assertTopoSortOrder(self, ancestry):
1122+ """Check topo_sort and iter_topo_order is genuinely topological order.
1123+
1124+ For every child in the graph, check if it comes after all of it's
1125+ parents.
1126+ """
1127+ graph = self.make_known_graph(ancestry)
1128+ sort_result = graph.topo_sort()
1129+ # We should have an entry in sort_result for every entry present in the
1130+ # graph.
1131+ self.assertEqual(len(ancestry), len(sort_result))
1132+ node_idx = dict((node, idx) for idx, node in enumerate(sort_result))
1133+ for node in sort_result:
1134+ parents = ancestry[node]
1135+ for parent in parents:
1136+ if parent not in ancestry:
1137+ # ghost
1138+ continue
1139+ if node_idx[node] <= node_idx[parent]:
1140+ self.fail("parent %s must come before child %s:\n%s"
1141+ % (parent, node, sort_result))
1142+
1143+ def test_topo_sort_empty(self):
1144+ """TopoSort empty list"""
1145+ self.assertTopoSortOrder({})
1146+
1147+ def test_topo_sort_easy(self):
1148+ """TopoSort list with one node"""
1149+ self.assertTopoSortOrder({0: []})
1150+
1151+ def test_topo_sort_cycle(self):
1152+ """TopoSort traps graph with cycles"""
1153+ g = self.make_known_graph({0: [1],
1154+ 1: [0]})
1155+ self.assertRaises(errors.GraphCycleError, g.topo_sort)
1156+
1157+ def test_topo_sort_cycle_2(self):
1158+ """TopoSort traps graph with longer cycle"""
1159+ g = self.make_known_graph({0: [1],
1160+ 1: [2],
1161+ 2: [0]})
1162+ self.assertRaises(errors.GraphCycleError, g.topo_sort)
1163+
1164+ def test_topo_sort_cycle_with_tail(self):
1165+ """TopoSort traps graph with longer cycle"""
1166+ g = self.make_known_graph({0: [1],
1167+ 1: [2],
1168+ 2: [3, 4],
1169+ 3: [0],
1170+ 4: []})
1171+ self.assertRaises(errors.GraphCycleError, g.topo_sort)
1172+
1173+ def test_topo_sort_1(self):
1174+ """TopoSort simple nontrivial graph"""
1175+ self.assertTopoSortOrder({0: [3],
1176+ 1: [4],
1177+ 2: [1, 4],
1178+ 3: [],
1179+ 4: [0, 3]})
1180+
1181+ def test_topo_sort_partial(self):
1182+ """Topological sort with partial ordering.
1183+
1184+ Multiple correct orderings are possible, so test for
1185+ correctness, not for exact match on the resulting list.
1186+ """
1187+ self.assertTopoSortOrder({0: [],
1188+ 1: [0],
1189+ 2: [0],
1190+ 3: [0],
1191+ 4: [1, 2, 3],
1192+ 5: [1, 2],
1193+ 6: [1, 2],
1194+ 7: [2, 3],
1195+ 8: [0, 1, 4, 5, 6]})
1196+
1197+ def test_topo_sort_ghost_parent(self):
1198+ """Sort nodes, but don't include some parents in the output"""
1199+ self.assertTopoSortOrder({0: [1],
1200+ 1: [2]})
1201+
1202+
1203+class TestKnownGraphMergeSort(TestCaseWithKnownGraph):
1204+
1205+ def assertSortAndIterate(self, ancestry, branch_tip, result_list):
1206+ """Check that merge based sorting and iter_topo_order on graph works."""
1207+ graph = self.make_known_graph(ancestry)
1208+ value = graph.merge_sort(branch_tip)
1209+ value = [(n.key, n.merge_depth, n.revno, n.end_of_merge)
1210+ for n in value]
1211+ if result_list != value:
1212+ self.assertEqualDiff(pprint.pformat(result_list),
1213+ pprint.pformat(value))
1214+
1215+ def test_merge_sort_empty(self):
1216+ # sorting of an empty graph does not error
1217+ self.assertSortAndIterate({}, None, [])
1218+ self.assertSortAndIterate({}, NULL_REVISION, [])
1219+ self.assertSortAndIterate({}, (NULL_REVISION,), [])
1220+
1221+ def test_merge_sort_not_empty_no_tip(self):
1222+ # merge sorting of a branch starting with None should result
1223+ # in an empty list: no revisions are dragged in.
1224+ self.assertSortAndIterate({0: []}, None, [])
1225+ self.assertSortAndIterate({0: []}, NULL_REVISION, [])
1226+ self.assertSortAndIterate({0: []}, (NULL_REVISION,), [])
1227+
1228+ def test_merge_sort_one_revision(self):
1229+ # sorting with one revision as the tip returns the correct fields:
1230+ # revision id, merge depth - 0, revno - (1,), end_of_merge - True
1231+ self.assertSortAndIterate({'id': []},
1232+ 'id',
1233+ [('id', 0, (1,), True)])
1234+
1235+ def test_sequence_numbers_increase_no_merges(self):
1236+ # emit a few revisions with no merges to check the sequence
1237+ # numbering works in trivial cases
1238+ self.assertSortAndIterate(
1239+ {'A': [],
1240+ 'B': ['A'],
1241+ 'C': ['B']},
1242+ 'C',
1243+ [('C', 0, (3,), False),
1244+ ('B', 0, (2,), False),
1245+ ('A', 0, (1,), True),
1246+ ],
1247+ )
1248+
1249+ def test_sequence_numbers_increase_with_merges(self):
1250+ # test that sequence numbers increase across merges
1251+ self.assertSortAndIterate(
1252+ {'A': [],
1253+ 'B': ['A'],
1254+ 'C': ['A', 'B']},
1255+ 'C',
1256+ [('C', 0, (2,), False),
1257+ ('B', 1, (1,1,1), True),
1258+ ('A', 0, (1,), True),
1259+ ],
1260+ )
1261+
1262+ def test_merge_sort_race(self):
1263+ # A
1264+ # |
1265+ # B-.
1266+ # |\ \
1267+ # | | C
1268+ # | |/
1269+ # | D
1270+ # |/
1271+ # F
1272+ graph = {'A': [],
1273+ 'B': ['A'],
1274+ 'C': ['B'],
1275+ 'D': ['B', 'C'],
1276+ 'F': ['B', 'D'],
1277+ }
1278+ self.assertSortAndIterate(graph, 'F',
1279+ [('F', 0, (3,), False),
1280+ ('D', 1, (2,2,1), False),
1281+ ('C', 2, (2,1,1), True),
1282+ ('B', 0, (2,), False),
1283+ ('A', 0, (1,), True),
1284+ ])
1285+ # A
1286+ # |
1287+ # B-.
1288+ # |\ \
1289+ # | X C
1290+ # | |/
1291+ # | D
1292+ # |/
1293+ # F
1294+ graph = {'A': [],
1295+ 'B': ['A'],
1296+ 'C': ['B'],
1297+ 'X': ['B'],
1298+ 'D': ['X', 'C'],
1299+ 'F': ['B', 'D'],
1300+ }
1301+ self.assertSortAndIterate(graph, 'F',
1302+ [('F', 0, (3,), False),
1303+ ('D', 1, (2,1,2), False),
1304+ ('C', 2, (2,2,1), True),
1305+ ('X', 1, (2,1,1), True),
1306+ ('B', 0, (2,), False),
1307+ ('A', 0, (1,), True),
1308+ ])
1309+
1310+ def test_merge_depth_with_nested_merges(self):
1311+ # the merge depth marker should reflect the depth of the revision
1312+ # in terms of merges out from the mainline
1313+ # revid, depth, parents:
1314+ # A 0 [D, B]
1315+ # B 1 [C, F]
1316+ # C 1 [H]
1317+ # D 0 [H, E]
1318+ # E 1 [G, F]
1319+ # F 2 [G]
1320+ # G 1 [H]
1321+ # H 0
1322+ self.assertSortAndIterate(
1323+ {'A': ['D', 'B'],
1324+ 'B': ['C', 'F'],
1325+ 'C': ['H'],
1326+ 'D': ['H', 'E'],
1327+ 'E': ['G', 'F'],
1328+ 'F': ['G'],
1329+ 'G': ['H'],
1330+ 'H': []
1331+ },
1332+ 'A',
1333+ [('A', 0, (3,), False),
1334+ ('B', 1, (1,3,2), False),
1335+ ('C', 1, (1,3,1), True),
1336+ ('D', 0, (2,), False),
1337+ ('E', 1, (1,1,2), False),
1338+ ('F', 2, (1,2,1), True),
1339+ ('G', 1, (1,1,1), True),
1340+ ('H', 0, (1,), True),
1341+ ],
1342+ )
1343+
1344+ def test_dotted_revnos_with_simple_merges(self):
1345+ # A 1
1346+ # |\
1347+ # B C 2, 1.1.1
1348+ # | |\
1349+ # D E F 3, 1.1.2, 1.2.1
1350+ # |/ /|
1351+ # G H I 4, 1.2.2, 1.3.1
1352+ # |/ /
1353+ # J K 5, 1.3.2
1354+ # |/
1355+ # L 6
1356+ self.assertSortAndIterate(
1357+ {'A': [],
1358+ 'B': ['A'],
1359+ 'C': ['A'],
1360+ 'D': ['B'],
1361+ 'E': ['C'],
1362+ 'F': ['C'],
1363+ 'G': ['D', 'E'],
1364+ 'H': ['F'],
1365+ 'I': ['F'],
1366+ 'J': ['G', 'H'],
1367+ 'K': ['I'],
1368+ 'L': ['J', 'K'],
1369+ },
1370+ 'L',
1371+ [('L', 0, (6,), False),
1372+ ('K', 1, (1,3,2), False),
1373+ ('I', 1, (1,3,1), True),
1374+ ('J', 0, (5,), False),
1375+ ('H', 1, (1,2,2), False),
1376+ ('F', 1, (1,2,1), True),
1377+ ('G', 0, (4,), False),
1378+ ('E', 1, (1,1,2), False),
1379+ ('C', 1, (1,1,1), True),
1380+ ('D', 0, (3,), False),
1381+ ('B', 0, (2,), False),
1382+ ('A', 0, (1,), True),
1383+ ],
1384+ )
1385+ # Adding a shortcut from the first revision should not change any of
1386+ # the existing numbers
1387+ self.assertSortAndIterate(
1388+ {'A': [],
1389+ 'B': ['A'],
1390+ 'C': ['A'],
1391+ 'D': ['B'],
1392+ 'E': ['C'],
1393+ 'F': ['C'],
1394+ 'G': ['D', 'E'],
1395+ 'H': ['F'],
1396+ 'I': ['F'],
1397+ 'J': ['G', 'H'],
1398+ 'K': ['I'],
1399+ 'L': ['J', 'K'],
1400+ 'M': ['A'],
1401+ 'N': ['L', 'M'],
1402+ },
1403+ 'N',
1404+ [('N', 0, (7,), False),
1405+ ('M', 1, (1,4,1), True),
1406+ ('L', 0, (6,), False),
1407+ ('K', 1, (1,3,2), False),
1408+ ('I', 1, (1,3,1), True),
1409+ ('J', 0, (5,), False),
1410+ ('H', 1, (1,2,2), False),
1411+ ('F', 1, (1,2,1), True),
1412+ ('G', 0, (4,), False),
1413+ ('E', 1, (1,1,2), False),
1414+ ('C', 1, (1,1,1), True),
1415+ ('D', 0, (3,), False),
1416+ ('B', 0, (2,), False),
1417+ ('A', 0, (1,), True),
1418+ ],
1419+ )
1420+
1421+ def test_end_of_merge_not_last_revision_in_branch(self):
1422+ # within a branch only the last revision gets an
1423+ # end of merge marker.
1424+ self.assertSortAndIterate(
1425+ {'A': ['B'],
1426+ 'B': [],
1427+ },
1428+ 'A',
1429+ [('A', 0, (2,), False),
1430+ ('B', 0, (1,), True)
1431+ ],
1432+ )
1433+
1434+ def test_end_of_merge_multiple_revisions_merged_at_once(self):
1435+ # when multiple branches are merged at once, both of their
1436+ # branch-endpoints should be listed as end-of-merge.
1437+ # Also, the order of the multiple merges should be
1438+ # left-right shown top to bottom.
1439+ # * means end of merge
1440+ # A 0 [H, B, E]
1441+ # B 1 [D, C]
1442+ # C 2 [D] *
1443+ # D 1 [H] *
1444+ # E 1 [G, F]
1445+ # F 2 [G] *
1446+ # G 1 [H] *
1447+ # H 0 [] *
1448+ self.assertSortAndIterate(
1449+ {'A': ['H', 'B', 'E'],
1450+ 'B': ['D', 'C'],
1451+ 'C': ['D'],
1452+ 'D': ['H'],
1453+ 'E': ['G', 'F'],
1454+ 'F': ['G'],
1455+ 'G': ['H'],
1456+ 'H': [],
1457+ },
1458+ 'A',
1459+ [('A', 0, (2,), False),
1460+ ('B', 1, (1,3,2), False),
1461+ ('C', 2, (1,4,1), True),
1462+ ('D', 1, (1,3,1), True),
1463+ ('E', 1, (1,1,2), False),
1464+ ('F', 2, (1,2,1), True),
1465+ ('G', 1, (1,1,1), True),
1466+ ('H', 0, (1,), True),
1467+ ],
1468+ )
1469+
1470+ def test_parallel_root_sequence_numbers_increase_with_merges(self):
1471+ """When there are parallel roots, check their revnos."""
1472+ self.assertSortAndIterate(
1473+ {'A': [],
1474+ 'B': [],
1475+ 'C': ['A', 'B']},
1476+ 'C',
1477+ [('C', 0, (2,), False),
1478+ ('B', 1, (0,1,1), True),
1479+ ('A', 0, (1,), True),
1480+ ],
1481+ )
1482+
1483+ def test_revnos_are_globally_assigned(self):
1484+ """revnos are assigned according to the revision they derive from."""
1485+ # in this test we set up a number of branches that all derive from
1486+ # the first revision, and then merge them one at a time; the merged
1487+ # revisions should keep revnos that still derive from the revision
1488+ # they were based on.
1489+ # merge 3: J: ['G', 'I']
1490+ # branch 3:
1491+ # I: ['H']
1492+ # H: ['A']
1493+ # merge 2: G: ['D', 'F']
1494+ # branch 2:
1495+ # F: ['E']
1496+ # E: ['A']
1497+ # merge 1: D: ['A', 'C']
1498+ # branch 1:
1499+ # C: ['B']
1500+ # B: ['A']
1501+ # root: A: []
1502+ self.assertSortAndIterate(
1503+ {'J': ['G', 'I'],
1504+ 'I': ['H',],
1505+ 'H': ['A'],
1506+ 'G': ['D', 'F'],
1507+ 'F': ['E'],
1508+ 'E': ['A'],
1509+ 'D': ['A', 'C'],
1510+ 'C': ['B'],
1511+ 'B': ['A'],
1512+ 'A': [],
1513+ },
1514+ 'J',
1515+ [('J', 0, (4,), False),
1516+ ('I', 1, (1,3,2), False),
1517+ ('H', 1, (1,3,1), True),
1518+ ('G', 0, (3,), False),
1519+ ('F', 1, (1,2,2), False),
1520+ ('E', 1, (1,2,1), True),
1521+ ('D', 0, (2,), False),
1522+ ('C', 1, (1,1,2), False),
1523+ ('B', 1, (1,1,1), True),
1524+ ('A', 0, (1,), True),
1525+ ],
1526+ )
1527+
1528+ def test_roots_and_sub_branches_versus_ghosts(self):
1529+ """Extra roots and their mini branches use the same numbering.
1530+
1531+ All of them use the 0-node numbering.
1532+ """
1533+ # A D K
1534+ # | |\ |\
1535+ # B E F L M
1536+ # | |/ |/
1537+ # C G N
1538+ # |/ |\
1539+ # H I O P
1540+ # |/ |/
1541+ # J Q
1542+ # |.---'
1543+ # R
1544+ self.assertSortAndIterate(
1545+ {'A': [],
1546+ 'B': ['A'],
1547+ 'C': ['B'],
1548+ 'D': [],
1549+ 'E': ['D'],
1550+ 'F': ['D'],
1551+ 'G': ['E', 'F'],
1552+ 'H': ['C', 'G'],
1553+ 'I': [],
1554+ 'J': ['H', 'I'],
1555+ 'K': [],
1556+ 'L': ['K'],
1557+ 'M': ['K'],
1558+ 'N': ['L', 'M'],
1559+ 'O': ['N'],
1560+ 'P': ['N'],
1561+ 'Q': ['O', 'P'],
1562+ 'R': ['J', 'Q'],
1563+ },
1564+ 'R',
1565+ [('R', 0, (6,), False),
1566+ ('Q', 1, (0,4,5), False),
1567+ ('P', 2, (0,6,1), True),
1568+ ('O', 1, (0,4,4), False),
1569+ ('N', 1, (0,4,3), False),
1570+ ('M', 2, (0,5,1), True),
1571+ ('L', 1, (0,4,2), False),
1572+ ('K', 1, (0,4,1), True),
1573+ ('J', 0, (5,), False),
1574+ ('I', 1, (0,3,1), True),
1575+ ('H', 0, (4,), False),
1576+ ('G', 1, (0,1,3), False),
1577+ ('F', 2, (0,2,1), True),
1578+ ('E', 1, (0,1,2), False),
1579+ ('D', 1, (0,1,1), True),
1580+ ('C', 0, (3,), False),
1581+ ('B', 0, (2,), False),
1582+ ('A', 0, (1,), True),
1583+ ],
1584+ )
1585+
1586+ def test_ghost(self):
1587+ # merge_sort should be able to ignore ghosts
1588+ # A
1589+ # |
1590+ # B ghost
1591+ # |/
1592+ # C
1593+ self.assertSortAndIterate(
1594+ {'A': [],
1595+ 'B': ['A'],
1596+ 'C': ['B', 'ghost'],
1597+ },
1598+ 'C',
1599+ [('C', 0, (3,), False),
1600+ ('B', 0, (2,), False),
1601+ ('A', 0, (1,), True),
1602+ ])
1603+
1604+ def test_graph_cycle(self):
1605+ # merge_sort should fail with a simple error when a graph cycle is
1606+ # encountered.
1607+ #
1608+ # A
1609+ # |,-.
1610+ # B |
1611+ # | |
1612+ # C ^
1613+ # | |
1614+ # D |
1615+ # |'-'
1616+ # E
1617+ self.assertRaises(errors.GraphCycleError,
1618+ self.assertSortAndIterate,
1619+ {'A': [],
1620+ 'B': ['D'],
1621+ 'C': ['B'],
1622+ 'D': ['C'],
1623+ 'E': ['D'],
1624+ },
1625+ 'E',
1626+ [])
1627
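The behaviour these tests expect from `KnownGraph.topo_sort()` — parents before children, ghost parents ignored, an error on cycles — can be sketched with a standalone Kahn-style sort. This is an illustration, not bzrlib's implementation, and it raises ValueError where bzrlib raises `errors.GraphCycleError`:

```python
def topo_sort(parent_map):
    """Return the nodes of {node: parents} with parents before children."""
    # Count only parents that are actually present; absent ones are ghosts.
    pending = dict((node, sum(1 for p in parents if p in parent_map))
                   for node, parents in parent_map.items())
    children = {}
    for node, parents in parent_map.items():
        for p in parents:
            if p in parent_map:
                children.setdefault(p, []).append(node)
    ready = [node for node, count in pending.items() if count == 0]
    result = []
    while ready:
        node = ready.pop()
        result.append(node)
        for child in children.get(node, ()):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(result) != len(parent_map):
        # bzrlib raises errors.GraphCycleError here
        raise ValueError('graph contains a cycle')
    return result

order = topo_sort({0: [3], 1: [4], 2: [1, 4], 3: [], 4: [0, 3]})
```

As in the tests, multiple orderings can be correct, so callers should check the parent-before-child property rather than an exact list.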
1628=== modified file 'bzrlib/tests/test_tsort.py'
1629--- bzrlib/tests/test_tsort.py 2009-08-17 15:26:18 +0000
1630+++ bzrlib/tests/test_tsort.py 2009-08-19 16:35:15 +0000
1631@@ -17,6 +17,7 @@
1632
1633 """Tests for topological sort."""
1634
1635+import pprint
1636
1637 from bzrlib.tests import TestCase
1638 from bzrlib.tsort import topo_sort, TopoSorter, MergeSorter, merge_sort
1639@@ -39,6 +40,23 @@
1640 list,
1641 TopoSorter(graph).iter_topo_order())
1642
1643+ def assertSortAndIterateOrder(self, graph):
1644+ """Check topo_sort and iter_topo_order is genuinely topological order.
1645+
1646+ For every child in the graph, check if it comes after all of it's
1647+ parents.
1648+ """
1649+ sort_result = topo_sort(graph)
1650+ iter_result = list(TopoSorter(graph).iter_topo_order())
1651+ for (node, parents) in graph:
1652+ for parent in parents:
1653+ if sort_result.index(node) < sort_result.index(parent):
1654+ self.fail("parent %s must come before child %s:\n%s"
1655+ % (parent, node, sort_result))
1656+ if iter_result.index(node) < iter_result.index(parent):
1657+ self.fail("parent %s must come before child %s:\n%s"
1658+ % (parent, node, iter_result))
1659+
1660 def test_tsort_empty(self):
1661 """TopoSort empty list"""
1662 self.assertSortAndIterate([], [])
1663@@ -60,6 +78,15 @@
1664 1: [2],
1665 2: [0]}.items())
1666
1667+ def test_topo_sort_cycle_with_tail(self):
1668+ """TopoSort traps graph with longer cycle"""
1669+ self.assertSortAndIterateRaise(GraphCycleError,
1670+ {0: [1],
1671+ 1: [2],
1672+ 2: [3, 4],
1673+ 3: [0],
1674+ 4: []}.items())
1675+
1676 def test_tsort_1(self):
1677 """TopoSort simple nontrivial graph"""
1678 self.assertSortAndIterate({0: [3],
1679@@ -72,10 +99,10 @@
1680 def test_tsort_partial(self):
1681 """Topological sort with partial ordering.
1682
1683- If the graph does not give an order between two nodes, they are
1684- returned in lexicographical order.
1685+ Multiple correct orderings are possible, so test for
1686+ correctness, not for exact match on the resulting list.
1687 """
1688- self.assertSortAndIterate(([(0, []),
1689+ self.assertSortAndIterateOrder([(0, []),
1690 (1, [0]),
1691 (2, [0]),
1692 (3, [0]),
1693@@ -83,8 +110,7 @@
1694 (5, [1, 2]),
1695 (6, [1, 2]),
1696 (7, [2, 3]),
1697- (8, [0, 1, 4, 5, 6])]),
1698- [0, 1, 2, 3, 4, 5, 6, 7, 8])
1699+ (8, [0, 1, 4, 5, 6])])
1700
1701 def test_tsort_unincluded_parent(self):
1702 """Sort nodes, but don't include some parents in the output"""
1703@@ -102,12 +128,8 @@
1704 mainline_revisions=mainline_revisions,
1705 generate_revno=generate_revno)
1706 if result_list != value:
1707- import pprint
1708 self.assertEqualDiff(pprint.pformat(result_list),
1709 pprint.pformat(value))
1710- self.assertEquals(result_list,
1711- merge_sort(graph, branch_tip, mainline_revisions=mainline_revisions,
1712- generate_revno=generate_revno))
1713 self.assertEqual(result_list,
1714 list(MergeSorter(
1715 graph,
1716
1717=== modified file 'bzrlib/tsort.py'
1718--- bzrlib/tsort.py 2009-08-17 15:26:18 +0000
1719+++ bzrlib/tsort.py 2009-08-19 16:35:14 +0000
1720@@ -18,8 +18,11 @@
1721 """Topological sorting routines."""
1722
1723
1724-from bzrlib import errors
1725-import bzrlib.revision as _mod_revision
1726+from bzrlib import (
1727+ errors,
1728+ graph as _mod_graph,
1729+ revision as _mod_revision,
1730+ )
1731
1732
1733 __all__ = ["topo_sort", "TopoSorter", "merge_sort", "MergeSorter"]
1734@@ -30,12 +33,21 @@
1735
1736 graph -- sequence of pairs of node->parents_list.
1737
1738- The result is a list of node names, such that all parents come before
1739- their children.
1740+ The result is a list of node names, such that all parents come before their
1741+ children.
1742
1743 node identifiers can be any hashable object, and are typically strings.
1744+
1745+ This function has the same purpose as the TopoSorter class, but uses a
1746+ different algorithm to sort the graph. That means that while both return a
1747+ list with parents before their child nodes, the exact ordering can be
1748+ different.
1749+
1750+ topo_sort is faster when the whole list is needed, while when iterating
1751+ over a part of the list, TopoSorter.iter_topo_order should be used.
1752 """
1753- return TopoSorter(graph).sorted()
1754+ kg = _mod_graph.KnownGraph(dict(graph))
1755+ return kg.topo_sort()
1756
1757
1758 class TopoSorter(object):
1759@@ -60,22 +72,8 @@
1760 iteration or sorting may raise GraphCycleError if a cycle is present
1761 in the graph.
1762 """
1763- # a dict of the graph.
1764+ # store a dict of the graph.
1765 self._graph = dict(graph)
1766- self._visitable = set(self._graph)
1767- ### if debugging:
1768- # self._original_graph = dict(graph)
1769-
1770- # this is a stack storing the depth first search into the graph.
1771- self._node_name_stack = []
1772- # at each level of 'recursion' we have to check each parent. This
1773- # stack stores the parents we have not yet checked for the node at the
1774- # matching depth in _node_name_stack
1775- self._pending_parents_stack = []
1776- # this is a set of the completed nodes for fast checking whether a
1777- # parent in a node we are processing on the stack has already been
1778- # emitted and thus can be skipped.
1779- self._completed_node_names = set()
1780
1781 def sorted(self):
1782 """Sort the graph and return as a list.
1783@@ -100,67 +98,64 @@
1784 After finishing iteration the sorter is empty and you cannot continue
1785 iteration.
1786 """
1787- while self._graph:
1788+ graph = self._graph
1789+ visitable = set(graph)
1790+
1791+ # this is a stack storing the depth first search into the graph.
1792+ pending_node_stack = []
1793+ # at each level of 'recursion' we have to check each parent. This
1794+ # stack stores the parents we have not yet checked for the node at the
1795+ # matching depth in pending_node_stack
1796+ pending_parents_stack = []
1797+
1798+ # this is a set of the completed nodes for fast checking whether a
1799+ # parent in a node we are processing on the stack has already been
1800+ # emitted and thus can be skipped.
1801+ completed_node_names = set()
1802+
1803+ while graph:
1804 # now pick a random node in the source graph, and transfer it to the
1805- # top of the depth first search stack.
1806- node_name, parents = self._graph.popitem()
1807- self._push_node(node_name, parents)
1808- while self._node_name_stack:
1809- # loop until this call completes.
1810- parents_to_visit = self._pending_parents_stack[-1]
1811- # if all parents are done, the revision is done
1812+ # top of the depth first search stack of pending nodes.
1813+ node_name, parents = graph.popitem()
1814+ pending_node_stack.append(node_name)
1815+ pending_parents_stack.append(list(parents))
1816+
1817+ # loop until pending_node_stack is empty
1818+ while pending_node_stack:
1819+ parents_to_visit = pending_parents_stack[-1]
1820+ # if there are no parents left, the revision is done
1821 if not parents_to_visit:
1822 # append the revision to the topo sorted list
1823- # all the nodes parents have been added to the output, now
1824- # we can add it to the output.
1825- yield self._pop_node()
1826+ # all the node's parents have been added to the output,
1827+ # now we can add this node as well.
1828+ popped_node = pending_node_stack.pop()
1829+ pending_parents_stack.pop()
1830+ completed_node_names.add(popped_node)
1831+ yield popped_node
1832 else:
1833- while self._pending_parents_stack[-1]:
1834- # recurse depth first into a single parent
1835- next_node_name = self._pending_parents_stack[-1].pop()
1836- if next_node_name in self._completed_node_names:
1837- # this parent was completed by a child on the
1838- # call stack. skip it.
1839- continue
1840- if next_node_name not in self._visitable:
1841- continue
1842- # otherwise transfer it from the source graph into the
1843- # top of the current depth first search stack.
1844- try:
1845- parents = self._graph.pop(next_node_name)
1846- except KeyError:
1847- # if the next node is not in the source graph it has
1848- # already been popped from it and placed into the
1849- # current search stack (but not completed or we would
1850- # have hit the continue 4 lines up.
1851- # this indicates a cycle.
1852- raise errors.GraphCycleError(self._node_name_stack)
1853- self._push_node(next_node_name, parents)
1854- # and do not continue processing parents until this 'call'
1855- # has recursed.
1856- break
1857-
1858- def _push_node(self, node_name, parents):
1859- """Add node_name to the pending node stack.
1860-
1861- Names in this stack will get emitted into the output as they are popped
1862- off the stack.
1863- """
1864- self._node_name_stack.append(node_name)
1865- self._pending_parents_stack.append(list(parents))
1866-
1867- def _pop_node(self):
1868- """Pop the top node off the stack
1869-
1870- The node is appended to the sorted output.
1871- """
1872- # we are returning from the flattened call frame:
1873- # pop off the local variables
1874- node_name = self._node_name_stack.pop()
1875- self._pending_parents_stack.pop()
1876-
1877- self._completed_node_names.add(node_name)
1878- return node_name
1879+ # recurse depth first into a single parent
1880+ next_node_name = parents_to_visit.pop()
1881+
1882+ if next_node_name in completed_node_names:
1883+ # parent was already completed by a child, skip it.
1884+ continue
1885+ if next_node_name not in visitable:
1886+ # parent is not a node in the original graph, skip it.
1887+ continue
1888+
1889+ # transfer it along with its parents from the source graph
1890+ # into the top of the current depth first search stack.
1891+ try:
1892+ parents = graph.pop(next_node_name)
1893+ except KeyError:
1894+ # if the next node is not in the source graph it has
1895+ # already been popped from it and placed into the
1896+ # current search stack (but not completed or we would
1897+ # have hit the continue 6 lines up). this indicates a
1898+ # cycle.
1899+ raise errors.GraphCycleError(pending_node_stack)
1900+ pending_node_stack.append(next_node_name)
1901+ pending_parents_stack.append(list(parents))
1902
1903
1904 def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False):
1905@@ -414,7 +409,8 @@
1906
1907 # seed the search with the tip of the branch
1908 if (branch_tip is not None and
1909- branch_tip != _mod_revision.NULL_REVISION):
1910+ branch_tip != _mod_revision.NULL_REVISION and
1911+ branch_tip != (_mod_revision.NULL_REVISION,)):
1912 parents = self._graph.pop(branch_tip)
1913 self._push_node(branch_tip, 0, parents)
1914
1915@@ -571,7 +567,11 @@
1916 # current search stack (but not completed or we would
1917 # have hit the continue 4 lines up.
1918 # this indicates a cycle.
1919- raise errors.GraphCycleError(node_name_stack)
1920+ if next_node_name in self._original_graph:
1921+ raise errors.GraphCycleError(node_name_stack)
1922+ else:
1923+ # This is just a ghost parent, ignore it
1924+ continue
1925 next_merge_depth = 0
1926 if is_left_subtree:
1927 # a new child branch from name_stack[-1]
1928@@ -673,11 +673,12 @@
1929 else:
1930 # no parents, use the root sequence
1931 root_count = self._revno_to_branch_count.get(0, 0)
1932+ root_count = self._revno_to_branch_count.get(0, -1)
1933+ root_count += 1
1934 if root_count:
1935 revno = (0, root_count, 1)
1936 else:
1937 revno = (1,)
1938- root_count += 1
1939 self._revno_to_branch_count[0] = root_count
1940
1941 # store the revno for this node for future reference
1942
1943=== modified file 'bzrlib/versionedfile.py'
1944--- bzrlib/versionedfile.py 2009-08-07 05:56:29 +0000
1945+++ bzrlib/versionedfile.py 2009-08-19 16:35:14 +0000
1946@@ -32,6 +32,7 @@
1947 from bzrlib import (
1948 annotate,
1949 errors,
1950+ graph as _mod_graph,
1951 groupcompress,
1952 index,
1953 knit,
1954@@ -941,6 +942,20 @@
1955 if '\n' in line[:-1]:
1956 raise errors.BzrBadParameterContainsNewline("lines")
1957
1958+ def get_known_graph_ancestry(self, keys):
1959+ """Get a KnownGraph instance with the ancestry of keys."""
1960+ # most basic implementation is a loop around get_parent_map
1961+ pending = set(keys)
1962+ parent_map = {}
1963+ while pending:
1964+ this_parent_map = self.get_parent_map(pending)
1965+ parent_map.update(this_parent_map)
1966+ pending = set()
1967+ map(pending.update, this_parent_map.itervalues())
1968+ pending = pending.difference(parent_map)
1969+ kg = _mod_graph.KnownGraph(parent_map)
1970+ return kg
1971+
1972 def get_parent_map(self, keys):
1973 """Get a map of the parents of keys.
1974
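
The fallback implementation of get_known_graph_ancestry added above is just a loop around get_parent_map: keep a frontier of pending keys, fetch their parents, and repeat until the transitive closure is collected (keys whose parents are never returned, i.e. ghosts, simply drop out of the frontier). A minimal standalone sketch of that walk, using a plain dict in place of a real VersionedFiles store (the store and key names here are illustrative, not part of the patch):

```python
def collect_ancestry(get_parent_map, keys):
    # Mirror of the loop in VersionedFiles.get_known_graph_ancestry:
    # expand the pending frontier until no unseen parents remain.
    pending = set(keys)
    parent_map = {}
    while pending:
        this_parent_map = get_parent_map(pending)
        parent_map.update(this_parent_map)
        pending = set()
        for parents in this_parent_map.values():
            pending.update(parents)
        # only keys we have not already resolved stay on the frontier
        pending = pending.difference(parent_map)
    return parent_map

# Toy store: C's parent is B, B's parent is A, A is a root.
graph = {'A': (), 'B': ('A',), 'C': ('B',)}

def get_parent_map(keys):
    # like the real API, entries are only returned for present keys
    return dict((k, graph[k]) for k in keys if k in graph)

ancestry = collect_ancestry(get_parent_map, ['C'])
# ancestry now holds the full parent_map for C's ancestry,
# ready to be handed to KnownGraph(parent_map).
```

In the patch the resulting parent_map is passed to `_mod_graph.KnownGraph`, whose compiled merge_sort/topo_sort then operate on it without further get_parent_map calls.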