Merge lp:~jameinel/bzr/1.19-known-graph-sorted into lp:~bzr/bzr/trunk-old
| Status: | Merged |
|---|---|
| Merged at revision: | not available |
| Proposed branch: | lp:~jameinel/bzr/1.19-known-graph-sorted |
| Merge into: | lp:~bzr/bzr/trunk-old |
| Diff against target: | 1973 lines |
| To merge this branch: | bzr merge lp:~jameinel/bzr/1.19-known-graph-sorted |
| Related bugs: | |

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Gary van der Merwe | Abstain | | |
| Vincent Ladeuil | Approve | | |

Review via email: mp+10293@code.launchpad.net
Commit message
Description of the change
John A Meinel (jameinel) wrote:
Robert Collins (lifeless) wrote:
What impact will this have on things like my all-ubuntu repository (16K
unrelated branches in one repo) ?
-Rob
Vincent Ladeuil (vila) wrote:
>>>>> "robert" == Robert Collins writes:
> What impact will this have on things like my
> all-ubuntu repository (16K unrelated branches in one
> repo) ?
You tell us! :-)
But from the discussion with John, it should either improve
things (I'm 90% confident here)... or provide us with very nice
data! I, for one, would love to have such a repo around to play
with...
@John, how did you measure your progress? Still using
time_graph.py? Is it time to enhance it?
John A Meinel (jameinel) wrote:
Vincent Ladeuil wrote:
>>>>>> "robert" == Robert Collins writes:
>
> > What impact will this have on things like my
> > all-ubuntu repository (16K unrelated branches in one
> > repo) ?
It should perform approximately the same or better.
The 'find_ancestors()' code doesn't grab arbitrary node => parent
mappings, only ones that are in the ancestry of the keys that were
requested.
As such it is the same as repeated get_parent_map() calls, just without
the repeats.
The merge_sort() code is simply the same algorithm, just 3-7x faster
(depending on whether you count the time to build the KnownGraph).
Again, all of this code has the same "look at only the requested
ancestry" behaviour that the current code has. So there shouldn't be a
blowout from having lots of unrelated history; it just may not be a
whole lot faster, because the other history is 'in the way'.
>
> You tell us! :-)
>
> But from the discussion with John, it should either improve
> things (I'm 90% confident here)... or provide us with very nice
> data! I, for one, would love to have such a repo around to play
> with...
>
> @John, how did you measure your progress? Still using
> time_graph.py? Is it time to enhance it?
No. I did have another helper here, but mostly this is tested with
(elided pieces are marked with ...):

    $ PYTHONPATH=... python -c "
    from bzrlib import branch, repository, tsort, graph
    b = branch.Branch.open('...')
    b.lock_read()
    l_rev = b.last_revision()
    p_map, missing = b.repository...
    b.unlock()
    kg = graph.KnownGraph(p_map)
    for n in kg.merge_sort((l_rev,)):
        print n.key, n.revno, n.end_of_merge, n.merge_depth
    "
Or just simply running:
time bzr log -n0 -r -10..-1 >/dev/null
John
=:->
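The `topo_sort` in this patch (both the Python and the Pyrex version) starts from the tails, the nodes with no parents, and releases a child only once all of its parents have been emitted. A minimal standalone sketch of that counting scheme (a hypothetical helper for illustration, not the bzrlib API; ghost handling omitted):

```python
def topo_sort(parent_map):
    """Return keys in parents-before-children order.

    Mirrors the patch's scheme: start from the tails, count emitted
    parents per child, and queue a child once the count matches its
    number of parents.
    """
    # Build child lists and find the tails (nodes with no parents).
    children = dict((key, []) for key in parent_map)
    for key, parents in parent_map.items():
        for p in parents:
            children[p].append(key)
    pending = [k for k, parents in parent_map.items() if not parents]
    if not pending and parent_map:
        raise ValueError('graph contains a cycle')
    seen = dict.fromkeys(parent_map, 0)
    order = []
    while pending:
        key = pending.pop()
        order.append(key)
        for child in children[key]:
            seen[child] += 1
            if seen[child] == len(parent_map[child]):
                # All parents emitted; this child is ready.
                pending.append(child)
    if len(order) != len(parent_map):
        raise ValueError('graph contains a cycle')
    return order

order = topo_sort({'A': (), 'B': ('A',), 'C': ('A', 'B')})
assert order.index('A') < order.index('B') < order.index('C')
```

The Pyrex version additionally avoids pop-followed-by-append churn by peeking at the last stack slot and replacing it in place; the inline comment credits that with 930ms => 770ms on the OOo graph.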
Vincent Ladeuil (vila) wrote:
Nice job !
In tsort.py, you can get rid of 'from collections import deque'.
Nothing else to ask for :) Nice work on the tests too.
Gary van der Merwe (garyvdm) wrote:
Note: This branch fixes Bug 350796.
Gary van der Merwe (garyvdm) wrote:
> Note: This branch fixes Bug 350796.
Sorry - I was wrong about that. Bug 350796 was fixed by rev 4260.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-08-18 20:05:30 +0000 |
3 | +++ NEWS 2009-08-19 16:35:14 +0000 |
4 | @@ -9,22 +9,6 @@ |
5 | In Development |
6 | ############## |
7 | |
8 | -Bug Fixes |
9 | -********* |
10 | - |
11 | -* Fix a test failure on karmic by making a locale test more robust. |
12 | - (Vincent Ladeuil, #413514) |
13 | - |
14 | -Improvements |
15 | -************ |
16 | - |
17 | -* A better description of the platform is shown in crash tracebacks, ``bzr |
18 | - --version`` and ``bzr selftest``. |
19 | - (Martin Pool, #409137) |
20 | - |
21 | -bzr 1.18 |
22 | -######## |
23 | - |
24 | Compatibility Breaks |
25 | ******************** |
26 | |
27 | @@ -83,6 +67,9 @@ |
28 | version-3 protocol, but it does cause test suite failures when testing |
29 | downlevel protocol behaviour. (Robert Collins) |
30 | |
31 | +* Fix a test failure on karmic by making a locale test more robust. |
32 | + (Vincent Ladeuil, #413514) |
33 | + |
34 | * Fixed "Pack ... already exists" error when running ``bzr pack`` on a |
35 | fully packed 2a repository. (Andrew Bennetts, #382463) |
36 | |
37 | @@ -109,12 +96,21 @@ |
38 | Improvements |
39 | ************ |
40 | |
41 | +* A better description of the platform is shown in crash tracebacks, ``bzr |
42 | + --version`` and ``bzr selftest``. |
43 | + (Martin Pool, #409137) |
44 | + |
45 | * Cross-format fetches (such as between 1.9-rich-root and 2a) via the |
46 | smart server are more efficient now. They send inventory deltas rather |
47 | than full inventories. The smart server has two new requests, |
48 | ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to |
49 | support this. (Andrew Bennetts, #374738, #385826) |
50 | |
51 | +* Extracting the full ancestry and computing the ``merge_sort`` is now |
52 | + significantly faster. This effects things like ``bzr log -n0``. (For |
53 | + example, ``bzr log -r -10..-1 -n0 bzr.dev`` is 2.5s down to 1.0s. |
54 | + (John Arbash Meinel) |
55 | + |
56 | Documentation |
57 | ************* |
58 | |
59 | @@ -136,15 +132,20 @@ |
60 | friendly StreamSource, which now automatically does the same |
61 | transformations as InterDifferingSerializer. (Andrew Bennetts) |
62 | |
63 | +* ``KnownGraph`` now has a ``.topo_sort`` and ``.merge_sort`` member which |
64 | + are implemented in pyrex and significantly faster. This is exposed along |
65 | + with ``CombinedGraphIndex.find_ancestry()`` as |
66 | + ``VersionedFiles.get_known_graph_ancestry(keys)``. |
67 | + (John Arbash Meinel) |
68 | + |
69 | * RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2 |
70 | protocols. (Robert Collins) |
71 | |
72 | * The index code now has some specialized routines to extract the full |
73 | ancestry of a key in a more efficient manner. |
74 | - ``CombinedGraphIndex.find_ancestry()``. This is not fully exposed to the |
75 | - higher levels yet, but has the potential to improve grabbing the full |
76 | - ancestry tremendously. (Time to get ancestry for bzr.dev drops from 1.5s |
77 | - down to 300ms. For OOo from 33s => 10.5s) (John Arbash Meinel) |
78 | + ``CombinedGraphIndex.find_ancestry()``. (Time to get ancestry for |
79 | + bzr.dev drops from 1.5s down to 300ms. For OOo from 33s => 10.5s) (John |
80 | + Arbash Meinel) |
81 | |
82 | Testing |
83 | ******* |
84 | |
85 | === modified file 'bzrlib/_known_graph_py.py' |
86 | --- bzrlib/_known_graph_py.py 2009-07-08 20:58:10 +0000 |
87 | +++ bzrlib/_known_graph_py.py 2009-08-19 16:35:14 +0000 |
88 | @@ -18,6 +18,7 @@ |
89 | """ |
90 | |
91 | from bzrlib import ( |
92 | + errors, |
93 | revision, |
94 | ) |
95 | |
96 | @@ -40,6 +41,18 @@ |
97 | self.parent_keys, self.child_keys) |
98 | |
99 | |
100 | +class _MergeSortNode(object): |
101 | + """Information about a specific node in the merge graph.""" |
102 | + |
103 | + __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge') |
104 | + |
105 | + def __init__(self, key, merge_depth, revno, end_of_merge): |
106 | + self.key = key |
107 | + self.merge_depth = merge_depth |
108 | + self.revno = revno |
109 | + self.end_of_merge = end_of_merge |
110 | + |
111 | + |
112 | class KnownGraph(object): |
113 | """This is a class which assumes we already know the full graph.""" |
114 | |
115 | @@ -171,3 +184,51 @@ |
116 | self._known_heads[heads_key] = heads |
117 | return heads |
118 | |
119 | + def topo_sort(self): |
120 | + """Return the nodes in topological order. |
121 | + |
122 | + All parents must occur before all children. |
123 | + """ |
124 | + for node in self._nodes.itervalues(): |
125 | + if node.gdfo is None: |
126 | + raise errors.GraphCycleError(self._nodes) |
127 | + pending = self._find_tails() |
128 | + pending_pop = pending.pop |
129 | + pending_append = pending.append |
130 | + |
131 | + topo_order = [] |
132 | + topo_order_append = topo_order.append |
133 | + |
134 | + num_seen_parents = dict.fromkeys(self._nodes, 0) |
135 | + while pending: |
136 | + node = pending_pop() |
137 | + if node.parent_keys is not None: |
138 | + # We don't include ghost parents |
139 | + topo_order_append(node.key) |
140 | + for child_key in node.child_keys: |
141 | + child_node = self._nodes[child_key] |
142 | + seen_parents = num_seen_parents[child_key] + 1 |
143 | + if seen_parents == len(child_node.parent_keys): |
144 | + # All parents have been processed, enqueue this child |
145 | + pending_append(child_node) |
146 | + # This has been queued up, stop tracking it |
147 | + del num_seen_parents[child_key] |
148 | + else: |
149 | + num_seen_parents[child_key] = seen_parents |
150 | + # We started from the parents, so we don't need to do anymore work |
151 | + return topo_order |
152 | + |
153 | + def merge_sort(self, tip_key): |
154 | + """Compute the merge sorted graph output.""" |
155 | + from bzrlib import tsort |
156 | + as_parent_map = dict((node.key, node.parent_keys) |
157 | + for node in self._nodes.itervalues() |
158 | + if node.parent_keys is not None) |
159 | + # We intentionally always generate revnos and never force the |
160 | + # mainline_revisions |
161 | + # Strip the sequence_number that merge_sort generates |
162 | + return [_MergeSortNode(key, merge_depth, revno, end_of_merge) |
163 | + for _, key, merge_depth, revno, end_of_merge |
164 | + in tsort.merge_sort(as_parent_map, tip_key, |
165 | + mainline_revisions=None, |
166 | + generate_revno=True)] |
167 | |
168 | === modified file 'bzrlib/_known_graph_pyx.pyx' |
169 | --- bzrlib/_known_graph_pyx.pyx 2009-07-14 16:10:32 +0000 |
170 | +++ bzrlib/_known_graph_pyx.pyx 2009-08-19 16:35:14 +0000 |
171 | @@ -44,8 +44,9 @@ |
172 | |
173 | void Py_INCREF(object) |
174 | |
175 | +import gc |
176 | |
177 | -from bzrlib import revision |
178 | +from bzrlib import errors, revision |
179 | |
180 | cdef object NULL_REVISION |
181 | NULL_REVISION = revision.NULL_REVISION |
182 | @@ -59,10 +60,9 @@ |
183 | cdef object children |
184 | cdef public long gdfo |
185 | cdef int seen |
186 | + cdef object extra |
187 | |
188 | def __init__(self, key): |
189 | - cdef int i |
190 | - |
191 | self.key = key |
192 | self.parents = None |
193 | |
194 | @@ -70,6 +70,7 @@ |
195 | # Greatest distance from origin |
196 | self.gdfo = -1 |
197 | self.seen = 0 |
198 | + self.extra = None |
199 | |
200 | property child_keys: |
201 | def __get__(self): |
202 | @@ -115,9 +116,7 @@ |
203 | return <_KnownGraphNode>temp_node |
204 | |
205 | |
206 | -# TODO: slab allocate all _KnownGraphNode objects. |
207 | -# We already know how many we are going to need, except for a couple of |
208 | -# ghosts that could be allocated on demand. |
209 | +cdef class _MergeSorter |
210 | |
211 | cdef class KnownGraph: |
212 | """This is a class which assumes we already know the full graph.""" |
213 | @@ -136,6 +135,9 @@ |
214 | # Maps {sorted(revision_id, revision_id): heads} |
215 | self._known_heads = {} |
216 | self.do_cache = int(do_cache) |
217 | + # TODO: consider disabling gc since we are allocating a lot of nodes |
218 | + # that won't be collectable anyway. real world testing has not |
219 | + # shown a specific impact, yet. |
220 | self._initialize_nodes(parent_map) |
221 | self._find_gdfo() |
222 | |
223 | @@ -183,11 +185,16 @@ |
224 | parent_keys = <object>temp_parent_keys |
225 | num_parent_keys = len(parent_keys) |
226 | node = self._get_or_create_node(key) |
227 | - # We know how many parents, so we could pre allocate an exact sized |
228 | - # tuple here |
229 | + # We know how many parents, so we pre allocate the tuple |
230 | parent_nodes = PyTuple_New(num_parent_keys) |
231 | - # We use iter here, because parent_keys maybe be a list or tuple |
232 | for pos2 from 0 <= pos2 < num_parent_keys: |
233 | + # Note: it costs us 10ms out of 40ms to lookup all of these |
234 | + # parents, it doesn't seem to be an allocation overhead, |
235 | + # but rather a lookup overhead. There doesn't seem to be |
236 | + # a way around it, and that is one reason why |
237 | + # KnownGraphNode maintains a direct pointer to the parent |
238 | + # node. |
239 | + # We use [] because parent_keys may be a tuple or list |
240 | parent_node = self._get_or_create_node(parent_keys[pos2]) |
241 | # PyTuple_SET_ITEM will steal a reference, so INCREF first |
242 | Py_INCREF(parent_node) |
243 | @@ -335,3 +342,353 @@ |
244 | if self.do_cache: |
245 | PyDict_SetItem(self._known_heads, heads_key, heads) |
246 | return heads |
247 | + |
248 | + def topo_sort(self): |
249 | + """Return the nodes in topological order. |
250 | + |
251 | + All parents must occur before all children. |
252 | + """ |
253 | + # This is, for the most part, the same iteration order that we used for |
254 | + # _find_gdfo, consider finding a way to remove the duplication |
255 | + # In general, we find the 'tails' (nodes with no parents), and then |
256 | + # walk to the children. For children that have all of their parents |
257 | + # yielded, we queue up the child to be yielded as well. |
258 | + cdef _KnownGraphNode node |
259 | + cdef _KnownGraphNode child |
260 | + cdef PyObject *temp |
261 | + cdef Py_ssize_t pos |
262 | + cdef int replace |
263 | + cdef Py_ssize_t last_item |
264 | + |
265 | + pending = self._find_tails() |
266 | + if PyList_GET_SIZE(pending) == 0 and len(self._nodes) > 0: |
267 | + raise errors.GraphCycleError(self._nodes) |
268 | + |
269 | + topo_order = [] |
270 | + |
271 | + last_item = PyList_GET_SIZE(pending) - 1 |
272 | + while last_item >= 0: |
273 | + # Avoid pop followed by push, instead, peek, and replace |
274 | + # timing shows this is 930ms => 770ms for OOo |
275 | + node = _get_list_node(pending, last_item) |
276 | + last_item = last_item - 1 |
277 | + if node.parents is not None: |
278 | + # We don't include ghost parents |
279 | + PyList_Append(topo_order, node.key) |
280 | + for pos from 0 <= pos < PyList_GET_SIZE(node.children): |
281 | + child = _get_list_node(node.children, pos) |
282 | + if child.gdfo == -1: |
283 | + # We know we have a graph cycle because a node has a parent |
284 | + # which we couldn't find |
285 | + raise errors.GraphCycleError(self._nodes) |
286 | + child.seen = child.seen + 1 |
287 | + if child.seen == PyTuple_GET_SIZE(child.parents): |
288 | + # All parents of this child have been yielded, queue this |
289 | + # one to be yielded as well |
290 | + last_item = last_item + 1 |
291 | + if last_item < PyList_GET_SIZE(pending): |
292 | + Py_INCREF(child) # SetItem steals a ref |
293 | + PyList_SetItem(pending, last_item, child) |
294 | + else: |
295 | + PyList_Append(pending, child) |
296 | + # We have queued this node, we don't need to track it |
297 | + # anymore |
298 | + child.seen = 0 |
299 | + # We started from the parents, so we don't need to do anymore work |
300 | + return topo_order |
301 | + |
302 | + |
303 | + def merge_sort(self, tip_key): |
304 | + """Compute the merge sorted graph output.""" |
305 | + cdef _MergeSorter sorter |
306 | + |
307 | + # TODO: consider disabling gc since we are allocating a lot of nodes |
308 | + # that won't be collectable anyway. real world testing has not |
309 | + # shown a specific impact, yet. |
310 | + sorter = _MergeSorter(self, tip_key) |
311 | + return sorter.topo_order() |
312 | + |
313 | + |
314 | +cdef class _MergeSortNode: |
315 | + """Tracks information about a node during the merge_sort operation.""" |
316 | + |
317 | + # Public api |
318 | + cdef public object key |
319 | + cdef public long merge_depth |
320 | + cdef public object end_of_merge # True/False Is this the end of the current merge |
321 | + |
322 | + # Private api, used while computing the information |
323 | + cdef _KnownGraphNode left_parent |
324 | + cdef _KnownGraphNode left_pending_parent |
325 | + cdef object pending_parents # list of _KnownGraphNode for non-left parents |
326 | + cdef long _revno_first |
327 | + cdef long _revno_second |
328 | + cdef long _revno_last |
329 | + # TODO: turn these into flag/bit fields rather than individual members |
330 | + cdef int is_first_child # Is this the first child? |
331 | + cdef int seen_by_child # A child node has seen this parent |
332 | + cdef int completed # Fully Processed |
333 | + |
334 | + def __init__(self, key): |
335 | + self.key = key |
336 | + self.merge_depth = -1 |
337 | + self.left_parent = None |
338 | + self.left_pending_parent = None |
339 | + self.pending_parents = None |
340 | + self._revno_first = -1 |
341 | + self._revno_second = -1 |
342 | + self._revno_last = -1 |
343 | + self.is_first_child = 0 |
344 | + self.seen_by_child = 0 |
345 | + self.completed = 0 |
346 | + |
347 | + def __repr__(self): |
348 | + return '%s(depth:%s rev:%s,%s,%s first:%s seen:%s)' % (self.__class__.__name__, |
349 | + self.merge_depth, |
350 | + self._revno_first, self._revno_second, self._revno_last, |
351 | + self.is_first_child, self.seen_by_child) |
352 | + |
353 | + cdef int has_pending_parents(self): |
354 | + if self.left_pending_parent is not None or self.pending_parents: |
355 | + return 1 |
356 | + return 0 |
357 | + |
358 | + cdef object _revno(self): |
359 | + if self._revno_first == -1: |
360 | + if self._revno_second != -1: |
361 | + raise RuntimeError('Something wrong with: %s' % (self,)) |
362 | + return (self._revno_last,) |
363 | + else: |
364 | + return (self._revno_first, self._revno_second, self._revno_last) |
365 | + |
366 | + property revno: |
367 | + def __get__(self): |
368 | + return self._revno() |
369 | + |
370 | + |
371 | +cdef class _MergeSorter: |
372 | + """This class does the work of computing the merge_sort ordering. |
373 | + |
374 | + We have some small advantages, in that we get all the extra information |
375 | + that KnownGraph knows, like knowing the child lists, etc. |
376 | + """ |
377 | + |
378 | + # Current performance numbers for merge_sort(bzr_dev_parent_map): |
379 | + # 302ms tsort.merge_sort() |
380 | + # 91ms graph.KnownGraph().merge_sort() |
381 | + # 40ms kg.merge_sort() |
382 | + |
383 | + cdef KnownGraph graph |
384 | + cdef object _depth_first_stack # list |
385 | + cdef Py_ssize_t _last_stack_item # offset to last item on stack |
386 | + # cdef object _ms_nodes # dict of key => _MergeSortNode |
387 | + cdef object _revno_to_branch_count # {revno => num child branches} |
388 | + cdef object _scheduled_nodes # List of nodes ready to be yielded |
389 | + |
390 | + def __init__(self, known_graph, tip_key): |
391 | + cdef _KnownGraphNode node |
392 | + |
393 | + self.graph = known_graph |
394 | + # self._ms_nodes = {} |
395 | + self._revno_to_branch_count = {} |
396 | + self._depth_first_stack = [] |
397 | + self._last_stack_item = -1 |
398 | + self._scheduled_nodes = [] |
399 | + if (tip_key is not None and tip_key != NULL_REVISION |
400 | + and tip_key != (NULL_REVISION,)): |
401 | + node = self.graph._nodes[tip_key] |
402 | + self._get_ms_node(node) |
403 | + self._push_node(node, 0) |
404 | + |
405 | + cdef _MergeSortNode _get_ms_node(self, _KnownGraphNode node): |
406 | + cdef PyObject *temp_node |
407 | + cdef _MergeSortNode ms_node |
408 | + |
409 | + if node.extra is None: |
410 | + ms_node = _MergeSortNode(node.key) |
411 | + node.extra = ms_node |
412 | + else: |
413 | + ms_node = <_MergeSortNode>node.extra |
414 | + return ms_node |
415 | + |
416 | + cdef _push_node(self, _KnownGraphNode node, long merge_depth): |
417 | + cdef _KnownGraphNode parent_node |
418 | + cdef _MergeSortNode ms_node, ms_parent_node |
419 | + cdef Py_ssize_t pos |
420 | + |
421 | + ms_node = self._get_ms_node(node) |
422 | + ms_node.merge_depth = merge_depth |
423 | + if PyTuple_GET_SIZE(node.parents) > 0: |
424 | + parent_node = _get_parent(node.parents, 0) |
425 | + ms_node.left_parent = parent_node |
426 | + ms_node.left_pending_parent = parent_node |
427 | + if PyTuple_GET_SIZE(node.parents) > 1: |
428 | + ms_node.pending_parents = [] |
429 | + for pos from 1 <= pos < PyTuple_GET_SIZE(node.parents): |
430 | + parent_node = _get_parent(node.parents, pos) |
431 | + if parent_node.parents is None: # ghost |
432 | + continue |
433 | + PyList_Append(ms_node.pending_parents, parent_node) |
434 | + |
435 | + ms_node.is_first_child = 1 |
436 | + if ms_node.left_parent is not None: |
437 | + ms_parent_node = self._get_ms_node(ms_node.left_parent) |
438 | + if ms_parent_node.seen_by_child: |
439 | + ms_node.is_first_child = 0 |
440 | + ms_parent_node.seen_by_child = 1 |
441 | + self._last_stack_item = self._last_stack_item + 1 |
442 | + if self._last_stack_item < PyList_GET_SIZE(self._depth_first_stack): |
443 | + Py_INCREF(node) # SetItem steals a ref |
444 | + PyList_SetItem(self._depth_first_stack, self._last_stack_item, |
445 | + node) |
446 | + else: |
447 | + PyList_Append(self._depth_first_stack, node) |
448 | + |
449 | + cdef _pop_node(self): |
450 | + cdef PyObject *temp |
451 | + cdef _MergeSortNode ms_node, ms_parent_node, ms_prev_node |
452 | + cdef _KnownGraphNode node, parent_node, prev_node |
453 | + |
454 | + node = _get_list_node(self._depth_first_stack, self._last_stack_item) |
455 | + ms_node = <_MergeSortNode>node.extra |
456 | + self._last_stack_item = self._last_stack_item - 1 |
457 | + if ms_node.left_parent is not None: |
458 | + # Assign the revision number from the left-hand parent |
459 | + ms_parent_node = <_MergeSortNode>ms_node.left_parent.extra |
460 | + if ms_node.is_first_child: |
461 | + # First child just increments the final digit |
462 | + ms_node._revno_first = ms_parent_node._revno_first |
463 | + ms_node._revno_second = ms_parent_node._revno_second |
464 | + ms_node._revno_last = ms_parent_node._revno_last + 1 |
465 | + else: |
466 | + # Not the first child, make a new branch |
467 | + # (mainline_revno, branch_count, 1) |
468 | + if ms_parent_node._revno_first == -1: |
469 | + # Mainline ancestor, the increment is on the last digit |
470 | + base_revno = ms_parent_node._revno_last |
471 | + else: |
472 | + base_revno = ms_parent_node._revno_first |
473 | + temp = PyDict_GetItem(self._revno_to_branch_count, |
474 | + base_revno) |
475 | + if temp == NULL: |
476 | + branch_count = 1 |
477 | + else: |
478 | + branch_count = (<object>temp) + 1 |
479 | + PyDict_SetItem(self._revno_to_branch_count, base_revno, |
480 | + branch_count) |
481 | + ms_node._revno_first = base_revno |
482 | + ms_node._revno_second = branch_count |
483 | + ms_node._revno_last = 1 |
484 | + else: |
485 | + temp = PyDict_GetItem(self._revno_to_branch_count, 0) |
486 | + if temp == NULL: |
487 | + # The first root node doesn't have a 3-digit revno |
488 | + root_count = 0 |
489 | + ms_node._revno_first = -1 |
490 | + ms_node._revno_second = -1 |
491 | + ms_node._revno_last = 1 |
492 | + else: |
493 | + root_count = (<object>temp) + 1 |
494 | + ms_node._revno_first = 0 |
495 | + ms_node._revno_second = root_count |
496 | + ms_node._revno_last = 1 |
497 | + PyDict_SetItem(self._revno_to_branch_count, 0, root_count) |
498 | + ms_node.completed = 1 |
499 | + if PyList_GET_SIZE(self._scheduled_nodes) == 0: |
500 | + # The first scheduled node is always the end of merge |
501 | + ms_node.end_of_merge = True |
502 | + else: |
503 | + prev_node = _get_list_node(self._scheduled_nodes, |
504 | + PyList_GET_SIZE(self._scheduled_nodes) - 1) |
505 | + ms_prev_node = <_MergeSortNode>prev_node.extra |
506 | + if ms_prev_node.merge_depth < ms_node.merge_depth: |
507 | + # The previously pushed node is to our left, so this is the end |
508 | + # of this right-hand chain |
509 | + ms_node.end_of_merge = True |
510 | + elif (ms_prev_node.merge_depth == ms_node.merge_depth |
511 | + and prev_node not in node.parents): |
512 | + # The next node is not a direct parent of this node |
513 | + ms_node.end_of_merge = True |
514 | + else: |
515 | + ms_node.end_of_merge = False |
516 | + PyList_Append(self._scheduled_nodes, node) |
517 | + |
518 | + cdef _schedule_stack(self): |
519 | + cdef _KnownGraphNode last_node, next_node |
520 | + cdef _MergeSortNode ms_node, ms_last_node, ms_next_node |
521 | + cdef long next_merge_depth |
522 | + ordered = [] |
523 | + while self._last_stack_item >= 0: |
524 | + # Peek at the last item on the stack |
525 | + last_node = _get_list_node(self._depth_first_stack, |
526 | + self._last_stack_item) |
527 | + if last_node.gdfo == -1: |
528 | + # if _find_gdfo skipped a node, that means there is a graph |
529 | + # cycle, error out now |
530 | + raise errors.GraphCycleError(self.graph._nodes) |
531 | + ms_last_node = <_MergeSortNode>last_node.extra |
532 | + if not ms_last_node.has_pending_parents(): |
533 | + # Processed all parents, pop this node |
534 | + self._pop_node() |
535 | + continue |
536 | + while ms_last_node.has_pending_parents(): |
537 | + if ms_last_node.left_pending_parent is not None: |
538 | + # recurse depth first into the primary parent |
539 | + next_node = ms_last_node.left_pending_parent |
540 | + ms_last_node.left_pending_parent = None |
541 | + else: |
542 | + # place any merges in right-to-left order for scheduling |
543 | + # which gives us left-to-right order after we reverse |
544 | + # the scheduled queue. |
545 | + # Note: This has the effect of allocating common-new |
546 | + # revisions to the right-most subtree rather than the |
547 | + # left most, which will display nicely (you get |
548 | + # smaller trees at the top of the combined merge). |
549 | + next_node = ms_last_node.pending_parents.pop() |
550 | + ms_next_node = self._get_ms_node(next_node) |
551 | + if ms_next_node.completed: |
552 | + # this parent was completed by a child on the |
553 | + # call stack. skip it. |
554 | + continue |
555 | + # otherwise transfer it from the source graph into the |
556 | + # top of the current depth first search stack. |
557 | + |
558 | + if next_node is ms_last_node.left_parent: |
559 | + next_merge_depth = ms_last_node.merge_depth |
560 | + else: |
561 | + next_merge_depth = ms_last_node.merge_depth + 1 |
562 | + self._push_node(next_node, next_merge_depth) |
563 | + # and do not continue processing parents until this 'call' |
564 | + # has recursed. |
565 | + break |
566 | + |
567 | + cdef topo_order(self): |
568 | + cdef _MergeSortNode ms_node |
569 | + cdef _KnownGraphNode node |
570 | + cdef Py_ssize_t pos |
571 | + cdef PyObject *temp_key, *temp_node |
572 | + |
573 | + # Note: allocating a _MergeSortNode and deallocating it for all nodes |
574 | + # costs approx 8.52ms (21%) of the total runtime |
575 | + # We might consider moving the attributes into the base |
576 | + # KnownGraph object. |
577 | + self._schedule_stack() |
578 | + |
579 | + # We've set up the basic schedule, now we can continue processing the |
580 | + # output. |
581 | + # Note: This final loop costs us 40.0ms => 28.8ms (11ms, 25%) on |
582 | + # bzr.dev, to convert the internal Object representation into a |
583 | + # Tuple representation... |
584 | + # 2ms is walking the data and computing revno tuples |
585 | + # 7ms is computing the return tuple |
586 | + # 4ms is PyList_Append() |
587 | + ordered = [] |
588 | + # output the result in reverse order, and separate the generated info |
589 | + for pos from PyList_GET_SIZE(self._scheduled_nodes) > pos >= 0: |
590 | + node = _get_list_node(self._scheduled_nodes, pos) |
591 | + ms_node = <_MergeSortNode>node.extra |
592 | + PyList_Append(ordered, ms_node) |
593 | + node.extra = None |
594 | + # Clear out the scheduled nodes now that we're done |
595 | + self._scheduled_nodes = [] |
596 | + return ordered |
597 | |
598 | === modified file 'bzrlib/annotate.py' |
599 | --- bzrlib/annotate.py 2009-07-08 17:09:03 +0000 |
600 | +++ bzrlib/annotate.py 2009-08-19 16:35:14 +0000 |
601 | @@ -188,6 +188,10 @@ |
602 | # or something. |
603 | last_revision = current_rev.revision_id |
604 | # XXX: Partially Cloned from branch, uses the old_get_graph, eep. |
605 | + # XXX: The main difficulty is that we need to inject a single new node |
606 | + # (current_rev) into the graph before it gets numbered, etc. |
607 | + # Once KnownGraph gets an 'add_node()' function, we can use |
608 | + # VF.get_known_graph_ancestry(). |
609 | graph = repository.get_graph() |
610 | revision_graph = dict(((key, value) for key, value in |
611 | graph.iter_ancestry(current_rev.parent_ids) if value is not None)) |
612 | |
613 | === modified file 'bzrlib/branch.py' |
614 | --- bzrlib/branch.py 2009-08-17 06:22:18 +0000 |
615 | +++ bzrlib/branch.py 2009-08-19 16:35:14 +0000 |
616 | @@ -446,15 +446,11 @@ |
617 | # start_revision_id. |
618 | if self._merge_sorted_revisions_cache is None: |
619 | last_revision = self.last_revision() |
620 | - graph = self.repository.get_graph() |
621 | - parent_map = dict(((key, value) for key, value in |
622 | - graph.iter_ancestry([last_revision]) if value is not None)) |
623 | - revision_graph = repository._strip_NULL_ghosts(parent_map) |
624 | - revs = tsort.merge_sort(revision_graph, last_revision, None, |
625 | - generate_revno=True) |
626 | - # Drop the sequence # before caching |
627 | - self._merge_sorted_revisions_cache = [r[1:] for r in revs] |
628 | - |
629 | + last_key = (last_revision,) |
630 | + known_graph = self.repository.revisions.get_known_graph_ancestry( |
631 | + [last_key]) |
632 | + self._merge_sorted_revisions_cache = known_graph.merge_sort( |
633 | + last_key) |
634 | filtered = self._filter_merge_sorted_revisions( |
635 | self._merge_sorted_revisions_cache, start_revision_id, |
636 | stop_revision_id, stop_rule) |
637 | @@ -470,27 +466,34 @@ |
638 | """Iterate over an inclusive range of sorted revisions.""" |
639 | rev_iter = iter(merge_sorted_revisions) |
640 | if start_revision_id is not None: |
641 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
642 | + for node in rev_iter: |
643 | + rev_id = node.key[-1] |
644 | if rev_id != start_revision_id: |
645 | continue |
646 | else: |
647 | # The decision to include the start or not |
648 | # depends on the stop_rule if a stop is provided |
649 | - rev_iter = chain( |
650 | - iter([(rev_id, depth, revno, end_of_merge)]), |
651 | - rev_iter) |
652 | + # so pop this node back into the iterator |
653 | + rev_iter = chain(iter([node]), rev_iter) |
654 | break |
655 | if stop_revision_id is None: |
656 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
657 | - yield rev_id, depth, revno, end_of_merge |
658 | + # Yield everything |
659 | + for node in rev_iter: |
660 | + rev_id = node.key[-1] |
661 | + yield (rev_id, node.merge_depth, node.revno, |
662 | + node.end_of_merge) |
663 | elif stop_rule == 'exclude': |
664 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
665 | + for node in rev_iter: |
666 | + rev_id = node.key[-1] |
667 | if rev_id == stop_revision_id: |
668 | return |
669 | - yield rev_id, depth, revno, end_of_merge |
670 | + yield (rev_id, node.merge_depth, node.revno, |
671 | + node.end_of_merge) |
672 | elif stop_rule == 'include': |
673 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
674 | - yield rev_id, depth, revno, end_of_merge |
675 | + for node in rev_iter: |
676 | + rev_id = node.key[-1] |
677 | + yield (rev_id, node.merge_depth, node.revno, |
678 | + node.end_of_merge) |
679 | if rev_id == stop_revision_id: |
680 | return |
681 | elif stop_rule == 'with-merges': |
682 | @@ -499,10 +502,12 @@ |
683 | left_parent = stop_rev.parent_ids[0] |
684 | else: |
685 | left_parent = _mod_revision.NULL_REVISION |
686 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
687 | + for node in rev_iter: |
688 | + rev_id = node.key[-1] |
689 | if rev_id == left_parent: |
690 | return |
691 | - yield rev_id, depth, revno, end_of_merge |
692 | + yield (rev_id, node.merge_depth, node.revno, |
693 | + node.end_of_merge) |
694 | else: |
695 | raise ValueError('invalid stop_rule %r' % stop_rule) |
696 | |
697 | |
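The branch.py hunk above replaces tuple unpacking with attribute access on the objects returned by merge_sort(), re-deriving rev_id from node.key[-1]. A minimal sketch of that adaptation, using a hypothetical namedtuple stand-in for the real node type:

```python
from collections import namedtuple

# Hypothetical stand-in for the node objects returned by merge_sort();
# the real nodes carry key, merge_depth, revno and end_of_merge.
MergeSortNode = namedtuple('MergeSortNode',
                           'key merge_depth revno end_of_merge')

def unpack_nodes(rev_iter):
    # Re-derive the (rev_id, depth, revno, end_of_merge) tuples the old
    # iterator yielded; rev_id is the last element of the node's key.
    for node in rev_iter:
        yield (node.key[-1], node.merge_depth, node.revno,
               node.end_of_merge)
```

This is the same adaptation each of the stop_rule branches above performs inline.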
698 | === modified file 'bzrlib/graph.py' |
699 | --- bzrlib/graph.py 2009-08-04 04:36:34 +0000 |
700 | +++ bzrlib/graph.py 2009-08-19 16:35:14 +0000 |
701 | @@ -21,7 +21,6 @@ |
702 | errors, |
703 | revision, |
704 | trace, |
705 | - tsort, |
706 | ) |
707 | from bzrlib.symbol_versioning import deprecated_function, deprecated_in |
708 | |
709 | @@ -926,6 +925,7 @@ |
710 | An ancestor may sort after a descendant if the relationship is not |
711 | visible in the supplied list of revisions. |
712 | """ |
713 | + from bzrlib import tsort |
714 | sorter = tsort.TopoSorter(self.get_parent_map(revisions)) |
715 | return sorter.iter_topo_order() |
716 | |
717 | |
718 | === modified file 'bzrlib/groupcompress.py' |
719 | --- bzrlib/groupcompress.py 2009-08-04 04:36:34 +0000 |
720 | +++ bzrlib/groupcompress.py 2009-08-19 16:35:14 +0000 |
721 | @@ -62,16 +62,15 @@ |
722 | # groupcompress ordering is approximately reverse topological, |
723 | # properly grouped by file-id. |
724 | per_prefix_map = {} |
725 | - for item in parent_map.iteritems(): |
726 | - key = item[0] |
727 | + for key, value in parent_map.iteritems(): |
728 | if isinstance(key, str) or len(key) == 1: |
729 | prefix = '' |
730 | else: |
731 | prefix = key[0] |
732 | try: |
733 | - per_prefix_map[prefix].append(item) |
734 | + per_prefix_map[prefix][key] = value |
735 | except KeyError: |
736 | - per_prefix_map[prefix] = [item] |
737 | + per_prefix_map[prefix] = {key: value} |
738 | |
739 | present_keys = [] |
740 | for prefix in sorted(per_prefix_map): |
741 | @@ -1099,6 +1098,13 @@ |
742 | self._check_lines_not_unicode(lines) |
743 | self._check_lines_are_lines(lines) |
744 | |
745 | + def get_known_graph_ancestry(self, keys): |
746 | + """Get a KnownGraph instance with the ancestry of keys.""" |
747 | + parent_map, missing_keys = self._index._graph_index.find_ancestry(keys, |
748 | + 0) |
749 | + kg = _mod_graph.KnownGraph(parent_map) |
750 | + return kg |
751 | + |
752 | def get_parent_map(self, keys): |
753 | """Get a map of the graph parents of keys. |
754 | |
755 | |
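The groupcompress hunk above changes per_prefix_map from holding lists of (key, value) items to holding dicts keyed by the full key. A self-contained sketch of that grouping pattern (Python 3 spelling, with made-up keys):

```python
def group_by_prefix(parent_map):
    """Group a {key: parents} map into per-prefix sub-maps.

    Plain-string or single-element keys fall under the '' prefix,
    matching the condition in the diff above.
    """
    per_prefix_map = {}
    for key, value in parent_map.items():
        if isinstance(key, str) or len(key) == 1:
            prefix = ''
        else:
            prefix = key[0]
        try:
            per_prefix_map[prefix][key] = value
        except KeyError:
            # First key seen for this prefix: start its sub-map.
            per_prefix_map[prefix] = {key: value}
    return per_prefix_map
```

Using a dict per prefix makes later per-key lookups O(1) instead of a scan of the item list.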
756 | === modified file 'bzrlib/index.py' |
757 | --- bzrlib/index.py 2009-08-13 19:56:26 +0000 |
758 | +++ bzrlib/index.py 2009-08-19 16:35:14 +0000 |
759 | @@ -333,6 +333,22 @@ |
760 | if combine_backing_indices is not None: |
761 | self._combine_backing_indices = combine_backing_indices |
762 | |
763 | + def find_ancestry(self, keys, ref_list_num): |
764 | + """See CombinedGraphIndex.find_ancestry()""" |
765 | + pending = set(keys) |
766 | + parent_map = {} |
767 | + missing_keys = set() |
768 | + while pending: |
769 | + next_pending = set() |
770 | + for _, key, value, ref_lists in self.iter_entries(pending): |
771 | + parent_keys = ref_lists[ref_list_num] |
772 | + parent_map[key] = parent_keys |
773 | + next_pending.update([p for p in parent_keys if p not in |
774 | + parent_map]) |
775 | + missing_keys.update(pending.difference(parent_map)) |
776 | + pending = next_pending |
777 | + return parent_map, missing_keys |
778 | + |
779 | |
780 | class GraphIndex(object): |
781 | """An index for data with embedded graphs. |
782 | |
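The new find_ancestry() above is a breadth-first walk that batches parent lookups one generation at a time. Here is a standalone sketch of the same loop, with the index lookup abstracted into a get_parent_map callable (an assumption for illustration; the real code iterates index entries):

```python
def find_ancestry(get_parent_map, keys):
    """Walk the ancestry of keys breadth-first.

    get_parent_map takes an iterable of keys and returns
    {key: parent_keys} for the keys it can find; anything never found
    ends up in missing_keys.
    """
    pending = set(keys)
    parent_map = {}
    missing_keys = set()
    while pending:
        next_pending = set()
        for key, parent_keys in get_parent_map(pending).items():
            parent_map[key] = parent_keys
            # Only queue parents we have not already resolved.
            next_pending.update(p for p in parent_keys
                                if p not in parent_map)
        # Whatever was requested this round but never resolved is missing.
        missing_keys.update(pending.difference(parent_map))
        pending = next_pending
    return parent_map, missing_keys
```

Batching per generation is what lets this replace many repeated get_parent_map() round trips with one pass per ancestry depth.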
783 | === modified file 'bzrlib/knit.py' |
784 | --- bzrlib/knit.py 2009-08-04 04:36:34 +0000 |
785 | +++ bzrlib/knit.py 2009-08-19 16:35:14 +0000 |
786 | @@ -1190,6 +1190,12 @@ |
787 | generator = _VFContentMapGenerator(self, [key]) |
788 | return generator._get_content(key) |
789 | |
790 | + def get_known_graph_ancestry(self, keys): |
791 | + """Get a KnownGraph instance with the ancestry of keys.""" |
792 | + parent_map, missing_keys = self._index.find_ancestry(keys) |
793 | + kg = _mod_graph.KnownGraph(parent_map) |
794 | + return kg |
795 | + |
796 | def get_parent_map(self, keys): |
797 | """Get a map of the graph parents of keys. |
798 | |
799 | @@ -2560,6 +2566,33 @@ |
800 | except KeyError: |
801 | raise RevisionNotPresent(key, self) |
802 | |
803 | + def find_ancestry(self, keys): |
804 | + """See CombinedGraphIndex.find_ancestry()""" |
805 | + prefixes = set(key[:-1] for key in keys) |
806 | + self._load_prefixes(prefixes) |
807 | + result = {} |
808 | + parent_map = {} |
809 | + missing_keys = set() |
810 | + pending_keys = list(keys) |
811 | + # This assumes that keys will not reference parents in a different |
812 | + # prefix, which is accurate so far. |
813 | + while pending_keys: |
814 | + key = pending_keys.pop() |
815 | + if key in parent_map: |
816 | + continue |
817 | + prefix = key[:-1] |
818 | + try: |
819 | + suffix_parents = self._kndx_cache[prefix][0][key[-1]][4] |
820 | + except KeyError: |
821 | + missing_keys.add(key) |
822 | + else: |
823 | + parent_keys = tuple([prefix + (suffix,) |
824 | + for suffix in suffix_parents]) |
825 | + parent_map[key] = parent_keys |
826 | + pending_keys.extend([p for p in parent_keys |
827 | + if p not in parent_map]) |
828 | + return parent_map, missing_keys |
829 | + |
830 | def get_parent_map(self, keys): |
831 | """Get a map of the parents of keys. |
832 | |
833 | @@ -3049,6 +3082,10 @@ |
834 | options.append('no-eol') |
835 | return options |
836 | |
837 | + def find_ancestry(self, keys): |
838 | + """See CombinedGraphIndex.find_ancestry()""" |
839 | + return self._graph_index.find_ancestry(keys, 0) |
840 | + |
841 | def get_parent_map(self, keys): |
842 | """Get a map of the parents of keys. |
843 | |
844 | |
845 | === modified file 'bzrlib/missing.py' |
846 | --- bzrlib/missing.py 2009-03-23 14:59:43 +0000 |
847 | +++ bzrlib/missing.py 2009-08-19 16:35:14 +0000 |
848 | @@ -138,31 +138,13 @@ |
849 | if not ancestry: #Empty ancestry, no need to do any work |
850 | return [] |
851 | |
852 | - mainline_revs, rev_nos, start_rev_id, end_rev_id = log._get_mainline_revs( |
853 | - branch, None, tip_revno) |
854 | - if not mainline_revs: |
855 | - return [] |
856 | - |
857 | - # This asks for all mainline revisions, which is size-of-history and |
858 | - # should be addressed (but currently the only way to get correct |
859 | - # revnos). |
860 | - |
861 | - # mainline_revisions always includes an extra revision at the |
862 | - # beginning, so don't request it. |
863 | - parent_map = dict(((key, value) for key, value |
864 | - in graph.iter_ancestry(mainline_revs[1:]) |
865 | - if value is not None)) |
866 | - # filter out ghosts; merge_sort errors on ghosts. |
867 | - # XXX: is this needed here ? -- vila080910 |
868 | - rev_graph = _mod_repository._strip_NULL_ghosts(parent_map) |
869 | - # XXX: what if rev_graph is empty now ? -- vila080910 |
870 | - merge_sorted_revisions = tsort.merge_sort(rev_graph, tip, |
871 | - mainline_revs, |
872 | - generate_revno=True) |
873 | + merge_sorted_revisions = branch.iter_merge_sorted_revisions() |
874 | # Now that we got the correct revnos, keep only the relevant |
875 | # revisions. |
876 | merge_sorted_revisions = [ |
877 | - (s, revid, n, d, e) for s, revid, n, d, e in merge_sorted_revisions |
878 | + # log.reverse_by_depth expects seq_num to be present, but it is |
879 | + # stripped by iter_merge_sorted_revisions() |
880 | + (0, revid, n, d, e) for revid, n, d, e in merge_sorted_revisions |
881 | if revid in ancestry] |
882 | if not backward: |
883 | merge_sorted_revisions = log.reverse_by_depth(merge_sorted_revisions) |
884 | |
885 | === modified file 'bzrlib/reconcile.py' |
886 | --- bzrlib/reconcile.py 2009-06-10 03:56:49 +0000 |
887 | +++ bzrlib/reconcile.py 2009-08-19 16:35:14 +0000 |
888 | @@ -33,7 +33,7 @@ |
889 | repofmt, |
890 | ) |
891 | from bzrlib.trace import mutter, note |
892 | -from bzrlib.tsort import TopoSorter |
893 | +from bzrlib.tsort import topo_sort |
894 | from bzrlib.versionedfile import AdapterFactory, FulltextContentFactory |
895 | |
896 | |
897 | @@ -247,8 +247,7 @@ |
898 | |
899 | # we have topological order of revisions and non ghost parents ready. |
900 | self._setup_steps(len(self._rev_graph)) |
901 | - revision_keys = [(rev_id,) for rev_id in |
902 | - TopoSorter(self._rev_graph.items()).iter_topo_order()] |
903 | + revision_keys = [(rev_id,) for rev_id in topo_sort(self._rev_graph)] |
904 | stream = self._change_inv_parents( |
905 | self.inventory.get_record_stream(revision_keys, 'unordered', True), |
906 | self._new_inv_parents, |
907 | @@ -378,7 +377,7 @@ |
908 | new_inventories = self.repo._temp_inventories() |
909 | # we have topological order of revisions and non ghost parents ready. |
910 | graph = self.revisions.get_parent_map(self.revisions.keys()) |
911 | - revision_keys = list(TopoSorter(graph).iter_topo_order()) |
912 | + revision_keys = topo_sort(graph) |
913 | revision_ids = [key[-1] for key in revision_keys] |
914 | self._setup_steps(len(revision_keys)) |
915 | stream = self._change_inv_parents( |
916 | |
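The reconcile.py hunk swaps the TopoSorter class for the topo_sort() function, which returns a list directly. As a rough, simplified sketch of what such a sort does (not bzrlib's implementation; it raises ValueError here rather than GraphCycleError): given {node: parents}, emit every parent before its children and fail on cycles, skipping ghost parents absent from the map:

```python
def topo_sort(graph):
    """Return the nodes of {node: parents} with parents before children."""
    result = []
    done = set()
    visiting = set()
    for start in graph:
        stack = [start]
        while stack:
            node = stack[-1]
            if node in done:
                stack.pop()
                continue
            visiting.add(node)
            ready = True
            for parent in graph.get(node, ()):
                if parent in done or parent not in graph:
                    continue  # already emitted, or a ghost: skip
                if parent in visiting:
                    raise ValueError('graph cycle involving %r' % (node,))
                stack.append(parent)
                ready = False
            if ready:
                # All parents emitted: this node can be emitted too.
                visiting.discard(node)
                done.add(node)
                result.append(node)
                stack.pop()
    return result
```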
917 | === modified file 'bzrlib/repofmt/weaverepo.py' |
918 | --- bzrlib/repofmt/weaverepo.py 2009-08-14 11:11:29 +0000 |
919 | +++ bzrlib/repofmt/weaverepo.py 2009-08-19 16:35:14 +0000 |
920 | @@ -28,6 +28,7 @@ |
921 | lazy_import(globals(), """ |
922 | from bzrlib import ( |
923 | xml5, |
924 | + graph as _mod_graph, |
925 | ) |
926 | """) |
927 | from bzrlib import ( |
928 | @@ -663,6 +664,13 @@ |
929 | result[key] = parents |
930 | return result |
931 | |
932 | + def get_known_graph_ancestry(self, keys): |
933 | + """Get a KnownGraph instance with the ancestry of keys.""" |
934 | + keys = self.keys() |
935 | + parent_map = self.get_parent_map(keys) |
936 | + kg = _mod_graph.KnownGraph(parent_map) |
937 | + return kg |
938 | + |
939 | def get_record_stream(self, keys, sort_order, include_delta_closure): |
940 | for key in keys: |
941 | text, parents = self._load_text_parents(key) |
942 | |
943 | === modified file 'bzrlib/repository.py' |
944 | --- bzrlib/repository.py 2009-08-17 23:15:55 +0000 |
945 | +++ bzrlib/repository.py 2009-08-19 16:35:14 +0000 |
946 | @@ -4351,7 +4351,7 @@ |
947 | phase = 'file' |
948 | revs = search.get_keys() |
949 | graph = self.from_repository.get_graph() |
950 | - revs = list(graph.iter_topo_order(revs)) |
951 | + revs = tsort.topo_sort(graph.get_parent_map(revs)) |
952 | data_to_fetch = self.from_repository.item_keys_introduced_by(revs) |
953 | text_keys = [] |
954 | for knit_kind, file_id, revisions in data_to_fetch: |
955 | |
956 | === modified file 'bzrlib/tests/__init__.py' |
957 | --- bzrlib/tests/__init__.py 2009-08-18 14:20:28 +0000 |
958 | +++ bzrlib/tests/__init__.py 2009-08-19 16:35:15 +0000 |
959 | @@ -3434,6 +3434,7 @@ |
960 | 'bzrlib.tests.per_repository', |
961 | 'bzrlib.tests.per_repository_chk', |
962 | 'bzrlib.tests.per_repository_reference', |
963 | + 'bzrlib.tests.per_versionedfile', |
964 | 'bzrlib.tests.per_workingtree', |
965 | 'bzrlib.tests.test__annotator', |
966 | 'bzrlib.tests.test__chk_map', |
967 | @@ -3585,7 +3586,6 @@ |
968 | 'bzrlib.tests.test_urlutils', |
969 | 'bzrlib.tests.test_version', |
970 | 'bzrlib.tests.test_version_info', |
971 | - 'bzrlib.tests.test_versionedfile', |
972 | 'bzrlib.tests.test_weave', |
973 | 'bzrlib.tests.test_whitebox', |
974 | 'bzrlib.tests.test_win32utils', |
975 | |
976 | === modified file 'bzrlib/tests/blackbox/test_ancestry.py' |
977 | --- bzrlib/tests/blackbox/test_ancestry.py 2009-03-23 14:59:43 +0000 |
978 | +++ bzrlib/tests/blackbox/test_ancestry.py 2009-08-19 16:35:15 +0000 |
979 | @@ -43,9 +43,15 @@ |
980 | |
981 | def _check_ancestry(self, location='', result=None): |
982 | out = self.run_bzr(['ancestry', location])[0] |
983 | - if result is None: |
984 | + if result is not None: |
985 | + self.assertEqualDiff(result, out) |
986 | + else: |
987 | + # A2 and B1 can be in either order, because they are parallel, and |
988 | + # thus their topological order is not defined |
989 | result = "A1\nB1\nA2\nA3\n" |
990 | - self.assertEqualDiff(out, result) |
991 | + if result != out: |
992 | + result = "A1\nA2\nB1\nA3\n" |
993 | + self.assertEqualDiff(result, out) |
994 | |
995 | def test_ancestry(self): |
996 | """Tests 'ancestry' command""" |
997 | |
998 | === renamed file 'bzrlib/tests/test_versionedfile.py' => 'bzrlib/tests/per_versionedfile.py' |
999 | --- bzrlib/tests/test_versionedfile.py 2009-08-04 04:36:34 +0000 |
1000 | +++ bzrlib/tests/per_versionedfile.py 2009-08-19 16:35:15 +0000 |
1001 | @@ -26,6 +26,7 @@ |
1002 | |
1003 | from bzrlib import ( |
1004 | errors, |
1005 | + graph as _mod_graph, |
1006 | groupcompress, |
1007 | knit as _mod_knit, |
1008 | osutils, |
1009 | @@ -1737,6 +1738,25 @@ |
1010 | f.get_record_stream([key_b], 'unordered', True |
1011 | ).next().get_bytes_as('fulltext')) |
1012 | |
1013 | + def test_get_known_graph_ancestry(self): |
1014 | + f = self.get_versionedfiles() |
1015 | + if not self.graph: |
1016 | + raise TestNotApplicable('ancestry info only relevant with graph.') |
1017 | + key_a = self.get_simple_key('a') |
1018 | + key_b = self.get_simple_key('b') |
1019 | + key_c = self.get_simple_key('c') |
1020 | + # A |
1021 | + # |\ |
1022 | + # | B |
1023 | + # |/ |
1024 | + # C |
1025 | + f.add_lines(key_a, [], ['\n']) |
1026 | + f.add_lines(key_b, [key_a], ['\n']) |
1027 | + f.add_lines(key_c, [key_a, key_b], ['\n']) |
1028 | + kg = f.get_known_graph_ancestry([key_c]) |
1029 | + self.assertIsInstance(kg, _mod_graph.KnownGraph) |
1030 | + self.assertEqual([key_a, key_b, key_c], list(kg.topo_sort())) |
1031 | + |
1032 | def test_get_record_stream_empty(self): |
1033 | """An empty stream can be requested without error.""" |
1034 | f = self.get_versionedfiles() |
1035 | |
1036 | === modified file 'bzrlib/tests/test__known_graph.py' |
1037 | --- bzrlib/tests/test__known_graph.py 2009-07-08 20:58:10 +0000 |
1038 | +++ bzrlib/tests/test__known_graph.py 2009-08-19 16:35:15 +0000 |
1039 | @@ -16,6 +16,8 @@ |
1040 | |
1041 | """Tests for the python and pyrex extensions of KnownGraph""" |
1042 | |
1043 | +import pprint |
1044 | + |
1045 | from bzrlib import ( |
1046 | errors, |
1047 | graph as _mod_graph, |
1048 | @@ -30,13 +32,15 @@ |
1049 | """Parameterize tests for all versions of groupcompress.""" |
1050 | scenarios = [ |
1051 | ('python', {'module': _known_graph_py, 'do_cache': True}), |
1052 | + ] |
1053 | + caching_scenarios = [ |
1054 | ('python-nocache', {'module': _known_graph_py, 'do_cache': False}), |
1055 | ] |
1056 | suite = loader.suiteClass() |
1057 | if CompiledKnownGraphFeature.available(): |
1058 | from bzrlib import _known_graph_pyx |
1059 | scenarios.append(('C', {'module': _known_graph_pyx, 'do_cache': True})) |
1060 | - scenarios.append(('C-nocache', |
1061 | + caching_scenarios.append(('C-nocache', |
1062 | {'module': _known_graph_pyx, 'do_cache': False})) |
1063 | else: |
1064 | # the compiled module isn't available, so we add a failing test |
1065 | @@ -44,8 +48,14 @@ |
1066 | def test_fail(self): |
1067 | self.requireFeature(CompiledKnownGraphFeature) |
1068 | suite.addTest(loader.loadTestsFromTestCase(FailWithoutFeature)) |
1069 | - result = tests.multiply_tests(standard_tests, scenarios, suite) |
1070 | - return result |
1071 | + # TestKnownGraphHeads needs to be permuted with and without caching. |
1072 | + # All other TestKnownGraph tests only need to be tested across modules |
1073 | + heads_suite, other_suite = tests.split_suite_by_condition( |
1074 | + standard_tests, tests.condition_isinstance(TestKnownGraphHeads)) |
1075 | + suite = tests.multiply_tests(other_suite, scenarios, suite) |
1076 | + suite = tests.multiply_tests(heads_suite, scenarios + caching_scenarios, |
1077 | + suite) |
1078 | + return suite |
1079 | |
1080 | |
1081 | class _CompiledKnownGraphFeature(tests.Feature): |
1082 | @@ -73,14 +83,16 @@ |
1083 | alt_merge = {'a': [], 'b': ['a'], 'c': ['b'], 'd': ['a', 'c']} |
1084 | |
1085 | |
1086 | -class TestKnownGraph(tests.TestCase): |
1087 | +class TestCaseWithKnownGraph(tests.TestCase): |
1088 | |
1089 | module = None # Set by load_tests |
1090 | - do_cache = None # Set by load_tests |
1091 | |
1092 | def make_known_graph(self, ancestry): |
1093 | return self.module.KnownGraph(ancestry, do_cache=self.do_cache) |
1094 | |
1095 | + |
1096 | +class TestKnownGraph(TestCaseWithKnownGraph): |
1097 | + |
1098 | def assertGDFO(self, graph, rev, gdfo): |
1099 | node = graph._nodes[rev] |
1100 | self.assertEqual(gdfo, node.gdfo) |
1101 | @@ -127,6 +139,11 @@ |
1102 | self.assertGDFO(graph, 'a', 5) |
1103 | self.assertGDFO(graph, 'c', 5) |
1104 | |
1105 | + |
1106 | +class TestKnownGraphHeads(TestCaseWithKnownGraph): |
1107 | + |
1108 | + do_cache = None # Set by load_tests |
1109 | + |
1110 | def test_heads_null(self): |
1111 | graph = self.make_known_graph(test_graph.ancestry_1) |
1112 | self.assertEqual(set(['null:']), graph.heads(['null:'])) |
1113 | @@ -227,3 +244,513 @@ |
1114 | self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g'])) |
1115 | self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g'])) |
1116 | self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f'])) |
1117 | + |
1118 | + |
1119 | +class TestKnownGraphTopoSort(TestCaseWithKnownGraph): |
1120 | + |
1121 | + def assertTopoSortOrder(self, ancestry): |
1122 | + """Check topo_sort and iter_topo_order give a genuinely topological order. |
1123 | + |
1124 | + For every child in the graph, check that it comes after all of its |
1125 | + parents. |
1126 | + """ |
1127 | + graph = self.make_known_graph(ancestry) |
1128 | + sort_result = graph.topo_sort() |
1129 | + # We should have an entry in sort_result for every entry present in the |
1130 | + # graph. |
1131 | + self.assertEqual(len(ancestry), len(sort_result)) |
1132 | + node_idx = dict((node, idx) for idx, node in enumerate(sort_result)) |
1133 | + for node in sort_result: |
1134 | + parents = ancestry[node] |
1135 | + for parent in parents: |
1136 | + if parent not in ancestry: |
1137 | + # ghost |
1138 | + continue |
1139 | + if node_idx[node] <= node_idx[parent]: |
1140 | + self.fail("parent %s must come before child %s:\n%s" |
1141 | + % (parent, node, sort_result)) |
1142 | + |
1143 | + def test_topo_sort_empty(self): |
1144 | + """TopoSort empty list""" |
1145 | + self.assertTopoSortOrder({}) |
1146 | + |
1147 | + def test_topo_sort_easy(self): |
1148 | + """TopoSort list with one node""" |
1149 | + self.assertTopoSortOrder({0: []}) |
1150 | + |
1151 | + def test_topo_sort_cycle(self): |
1152 | + """TopoSort traps graph with cycles""" |
1153 | + g = self.make_known_graph({0: [1], |
1154 | + 1: [0]}) |
1155 | + self.assertRaises(errors.GraphCycleError, g.topo_sort) |
1156 | + |
1157 | + def test_topo_sort_cycle_2(self): |
1158 | + """TopoSort traps graph with longer cycle""" |
1159 | + g = self.make_known_graph({0: [1], |
1160 | + 1: [2], |
1161 | + 2: [0]}) |
1162 | + self.assertRaises(errors.GraphCycleError, g.topo_sort) |
1163 | + |
1164 | + def test_topo_sort_cycle_with_tail(self): |
1165 | + """TopoSort traps graph with longer cycle""" |
1166 | + g = self.make_known_graph({0: [1], |
1167 | + 1: [2], |
1168 | + 2: [3, 4], |
1169 | + 3: [0], |
1170 | + 4: []}) |
1171 | + self.assertRaises(errors.GraphCycleError, g.topo_sort) |
1172 | + |
1173 | + def test_topo_sort_1(self): |
1174 | + """TopoSort simple nontrivial graph""" |
1175 | + self.assertTopoSortOrder({0: [3], |
1176 | + 1: [4], |
1177 | + 2: [1, 4], |
1178 | + 3: [], |
1179 | + 4: [0, 3]}) |
1180 | + |
1181 | + def test_topo_sort_partial(self): |
1182 | + """Topological sort with partial ordering. |
1183 | + |
1184 | + Multiple correct orderings are possible, so test for |
1185 | + correctness, not for exact match on the resulting list. |
1186 | + """ |
1187 | + self.assertTopoSortOrder({0: [], |
1188 | + 1: [0], |
1189 | + 2: [0], |
1190 | + 3: [0], |
1191 | + 4: [1, 2, 3], |
1192 | + 5: [1, 2], |
1193 | + 6: [1, 2], |
1194 | + 7: [2, 3], |
1195 | + 8: [0, 1, 4, 5, 6]}) |
1196 | + |
1197 | + def test_topo_sort_ghost_parent(self): |
1198 | + """Sort nodes, but don't include some parents in the output""" |
1199 | + self.assertTopoSortOrder({0: [1], |
1200 | + 1: [2]}) |
1201 | + |
1202 | + |
1203 | +class TestKnownGraphMergeSort(TestCaseWithKnownGraph): |
1204 | + |
1205 | + def assertSortAndIterate(self, ancestry, branch_tip, result_list): |
1206 | + """Check that merge-based sorting and iter_topo_order on the graph work.""" |
1207 | + graph = self.make_known_graph(ancestry) |
1208 | + value = graph.merge_sort(branch_tip) |
1209 | + value = [(n.key, n.merge_depth, n.revno, n.end_of_merge) |
1210 | + for n in value] |
1211 | + if result_list != value: |
1212 | + self.assertEqualDiff(pprint.pformat(result_list), |
1213 | + pprint.pformat(value)) |
1214 | + |
1215 | + def test_merge_sort_empty(self): |
1216 | + # sorting of an empty graph does not error |
1217 | + self.assertSortAndIterate({}, None, []) |
1218 | + self.assertSortAndIterate({}, NULL_REVISION, []) |
1219 | + self.assertSortAndIterate({}, (NULL_REVISION,), []) |
1220 | + |
1221 | + def test_merge_sort_not_empty_no_tip(self): |
1222 | + # merge sorting of a branch starting with None should result |
1223 | + # in an empty list: no revisions are dragged in. |
1224 | + self.assertSortAndIterate({0: []}, None, []) |
1225 | + self.assertSortAndIterate({0: []}, NULL_REVISION, []) |
1226 | + self.assertSortAndIterate({0: []}, (NULL_REVISION,), []) |
1227 | + |
1228 | + def test_merge_sort_one_revision(self): |
1229 | + # sorting with one revision as the tip returns the correct fields: |
1230 | + # sequence - 0, revision id, merge depth - 0, end_of_merge |
1231 | + self.assertSortAndIterate({'id': []}, |
1232 | + 'id', |
1233 | + [('id', 0, (1,), True)]) |
1234 | + |
1235 | + def test_sequence_numbers_increase_no_merges(self): |
1236 | + # emit a few revisions with no merges to check the sequence |
1237 | + # numbering works in trivial cases |
1238 | + self.assertSortAndIterate( |
1239 | + {'A': [], |
1240 | + 'B': ['A'], |
1241 | + 'C': ['B']}, |
1242 | + 'C', |
1243 | + [('C', 0, (3,), False), |
1244 | + ('B', 0, (2,), False), |
1245 | + ('A', 0, (1,), True), |
1246 | + ], |
1247 | + ) |
1248 | + |
1249 | + def test_sequence_numbers_increase_with_merges(self): |
1250 | + # test that sequence numbers increase across merges |
1251 | + self.assertSortAndIterate( |
1252 | + {'A': [], |
1253 | + 'B': ['A'], |
1254 | + 'C': ['A', 'B']}, |
1255 | + 'C', |
1256 | + [('C', 0, (2,), False), |
1257 | + ('B', 1, (1,1,1), True), |
1258 | + ('A', 0, (1,), True), |
1259 | + ], |
1260 | + ) |
1261 | + |
1262 | + def test_merge_sort_race(self): |
1263 | + # A |
1264 | + # | |
1265 | + # B-. |
1266 | + # |\ \ |
1267 | + # | | C |
1268 | + # | |/ |
1269 | + # | D |
1270 | + # |/ |
1271 | + # F |
1272 | + graph = {'A': [], |
1273 | + 'B': ['A'], |
1274 | + 'C': ['B'], |
1275 | + 'D': ['B', 'C'], |
1276 | + 'F': ['B', 'D'], |
1277 | + } |
1278 | + self.assertSortAndIterate(graph, 'F', |
1279 | + [('F', 0, (3,), False), |
1280 | + ('D', 1, (2,2,1), False), |
1281 | + ('C', 2, (2,1,1), True), |
1282 | + ('B', 0, (2,), False), |
1283 | + ('A', 0, (1,), True), |
1284 | + ]) |
1285 | + # A |
1286 | + # | |
1287 | + # B-. |
1288 | + # |\ \ |
1289 | + # | X C |
1290 | + # | |/ |
1291 | + # | D |
1292 | + # |/ |
1293 | + # F |
1294 | + graph = {'A': [], |
1295 | + 'B': ['A'], |
1296 | + 'C': ['B'], |
1297 | + 'X': ['B'], |
1298 | + 'D': ['X', 'C'], |
1299 | + 'F': ['B', 'D'], |
1300 | + } |
1301 | + self.assertSortAndIterate(graph, 'F', |
1302 | + [('F', 0, (3,), False), |
1303 | + ('D', 1, (2,1,2), False), |
1304 | + ('C', 2, (2,2,1), True), |
1305 | + ('X', 1, (2,1,1), True), |
1306 | + ('B', 0, (2,), False), |
1307 | + ('A', 0, (1,), True), |
1308 | + ]) |
1309 | + |
1310 | + def test_merge_depth_with_nested_merges(self): |
1311 | + # the merge depth marker should reflect the depth of the revision |
1312 | + # in terms of merges out from the mainline |
1313 | + # revid, depth, parents: |
1314 | + # A 0 [D, B] |
1315 | + # B 1 [C, F] |
1316 | + # C 1 [H] |
1317 | + # D 0 [H, E] |
1318 | + # E 1 [G, F] |
1319 | + # F 2 [G] |
1320 | + # G 1 [H] |
1321 | + # H 0 |
1322 | + self.assertSortAndIterate( |
1323 | + {'A': ['D', 'B'], |
1324 | + 'B': ['C', 'F'], |
1325 | + 'C': ['H'], |
1326 | + 'D': ['H', 'E'], |
1327 | + 'E': ['G', 'F'], |
1328 | + 'F': ['G'], |
1329 | + 'G': ['H'], |
1330 | + 'H': [] |
1331 | + }, |
1332 | + 'A', |
1333 | + [('A', 0, (3,), False), |
1334 | + ('B', 1, (1,3,2), False), |
1335 | + ('C', 1, (1,3,1), True), |
1336 | + ('D', 0, (2,), False), |
1337 | + ('E', 1, (1,1,2), False), |
1338 | + ('F', 2, (1,2,1), True), |
1339 | + ('G', 1, (1,1,1), True), |
1340 | + ('H', 0, (1,), True), |
1341 | + ], |
1342 | + ) |
1343 | + |
1344 | + def test_dotted_revnos_with_simple_merges(self): |
1345 | + # A 1 |
1346 | + # |\ |
1347 | + # B C 2, 1.1.1 |
1348 | + # | |\ |
1349 | + # D E F 3, 1.1.2, 1.2.1 |
1350 | + # |/ /| |
1351 | + # G H I 4, 1.2.2, 1.3.1 |
1352 | + # |/ / |
1353 | + # J K 5, 1.3.2 |
1354 | + # |/ |
1355 | + # L 6 |
1356 | + self.assertSortAndIterate( |
1357 | + {'A': [], |
1358 | + 'B': ['A'], |
1359 | + 'C': ['A'], |
1360 | + 'D': ['B'], |
1361 | + 'E': ['C'], |
1362 | + 'F': ['C'], |
1363 | + 'G': ['D', 'E'], |
1364 | + 'H': ['F'], |
1365 | + 'I': ['F'], |
1366 | + 'J': ['G', 'H'], |
1367 | + 'K': ['I'], |
1368 | + 'L': ['J', 'K'], |
1369 | + }, |
1370 | + 'L', |
1371 | + [('L', 0, (6,), False), |
1372 | + ('K', 1, (1,3,2), False), |
1373 | + ('I', 1, (1,3,1), True), |
1374 | + ('J', 0, (5,), False), |
1375 | + ('H', 1, (1,2,2), False), |
1376 | + ('F', 1, (1,2,1), True), |
1377 | + ('G', 0, (4,), False), |
1378 | + ('E', 1, (1,1,2), False), |
1379 | + ('C', 1, (1,1,1), True), |
1380 | + ('D', 0, (3,), False), |
1381 | + ('B', 0, (2,), False), |
1382 | + ('A', 0, (1,), True), |
1383 | + ], |
1384 | + ) |
1385 | + # Adding a shortcut from the first revision should not change any of |
1386 | + # the existing numbers |
1387 | + self.assertSortAndIterate( |
1388 | + {'A': [], |
1389 | + 'B': ['A'], |
1390 | + 'C': ['A'], |
1391 | + 'D': ['B'], |
1392 | + 'E': ['C'], |
1393 | + 'F': ['C'], |
1394 | + 'G': ['D', 'E'], |
1395 | + 'H': ['F'], |
1396 | + 'I': ['F'], |
1397 | + 'J': ['G', 'H'], |
1398 | + 'K': ['I'], |
1399 | + 'L': ['J', 'K'], |
1400 | + 'M': ['A'], |
1401 | + 'N': ['L', 'M'], |
1402 | + }, |
1403 | + 'N', |
1404 | + [('N', 0, (7,), False), |
1405 | + ('M', 1, (1,4,1), True), |
1406 | + ('L', 0, (6,), False), |
1407 | + ('K', 1, (1,3,2), False), |
1408 | + ('I', 1, (1,3,1), True), |
1409 | + ('J', 0, (5,), False), |
1410 | + ('H', 1, (1,2,2), False), |
1411 | + ('F', 1, (1,2,1), True), |
1412 | + ('G', 0, (4,), False), |
1413 | + ('E', 1, (1,1,2), False), |
1414 | + ('C', 1, (1,1,1), True), |
1415 | + ('D', 0, (3,), False), |
1416 | + ('B', 0, (2,), False), |
1417 | + ('A', 0, (1,), True), |
1418 | + ], |
1419 | + ) |
1420 | + |
1421 | + def test_end_of_merge_not_last_revision_in_branch(self): |
1422 | + # within a branch only the last revision gets an |
1423 | + # end of merge marker. |
1424 | + self.assertSortAndIterate( |
1425 | + {'A': ['B'], |
1426 | + 'B': [], |
1427 | + }, |
1428 | + 'A', |
1429 | + [('A', 0, (2,), False), |
1430 | + ('B', 0, (1,), True) |
1431 | + ], |
1432 | + ) |
1433 | + |
1434 | + def test_end_of_merge_multiple_revisions_merged_at_once(self): |
1435 | + # when multiple branches are merged at once, both of their |
1436 | + # branch-endpoints should be listed as end-of-merge. |
1437 | + # Also, the order of the multiple merges should be |
1438 | + # left-to-right, shown top to bottom. |
1439 | + # * means end of merge |
1440 | + # A 0 [H, B, E] |
1441 | + # B 1 [D, C] |
1442 | + # C 2 [D] * |
1443 | + # D 1 [H] * |
1444 | + # E 1 [G, F] |
1445 | + # F 2 [G] * |
1446 | + # G 1 [H] * |
1447 | + # H 0 [] * |
1448 | + self.assertSortAndIterate( |
1449 | + {'A': ['H', 'B', 'E'], |
1450 | + 'B': ['D', 'C'], |
1451 | + 'C': ['D'], |
1452 | + 'D': ['H'], |
1453 | + 'E': ['G', 'F'], |
1454 | + 'F': ['G'], |
1455 | + 'G': ['H'], |
1456 | + 'H': [], |
1457 | + }, |
1458 | + 'A', |
1459 | + [('A', 0, (2,), False), |
1460 | + ('B', 1, (1,3,2), False), |
1461 | + ('C', 2, (1,4,1), True), |
1462 | + ('D', 1, (1,3,1), True), |
1463 | + ('E', 1, (1,1,2), False), |
1464 | + ('F', 2, (1,2,1), True), |
1465 | + ('G', 1, (1,1,1), True), |
1466 | + ('H', 0, (1,), True), |
1467 | + ], |
1468 | + ) |
1469 | + |
1470 | + def test_parallel_root_sequence_numbers_increase_with_merges(self): |
1471 | + """When there are parallel roots, check their revnos.""" |
1472 | + self.assertSortAndIterate( |
1473 | + {'A': [], |
1474 | + 'B': [], |
1475 | + 'C': ['A', 'B']}, |
1476 | + 'C', |
1477 | + [('C', 0, (2,), False), |
1478 | + ('B', 1, (0,1,1), True), |
1479 | + ('A', 0, (1,), True), |
1480 | + ], |
1481 | + ) |
1482 | + |
1483 | + def test_revnos_are_globally_assigned(self): |
1484 | + """revnos are assigned according to the revision they derive from.""" |
1485 | + # in this test we setup a number of branches that all derive from |
1486 | + # the first revision, and then merge them one at a time, which |
1487 | + # should give the revisions as they merge numbers still deriving from |
1488 | + # the revision were based on. |
1489 | + # merge 3: J: ['G', 'I'] |
1490 | + # branch 3: |
1491 | + # I: ['H'] |
1492 | + # H: ['A'] |
1493 | + # merge 2: G: ['D', 'F'] |
1494 | + # branch 2: |
1495 | + # F: ['E'] |
1496 | + # E: ['A'] |
1497 | + # merge 1: D: ['A', 'C'] |
1498 | + # branch 1: |
1499 | + # C: ['B'] |
1500 | + # B: ['A'] |
1501 | + # root: A: [] |
1502 | + self.assertSortAndIterate( |
1503 | + {'J': ['G', 'I'], |
1504 | + 'I': ['H',], |
1505 | + 'H': ['A'], |
1506 | + 'G': ['D', 'F'], |
1507 | + 'F': ['E'], |
1508 | + 'E': ['A'], |
1509 | + 'D': ['A', 'C'], |
1510 | + 'C': ['B'], |
1511 | + 'B': ['A'], |
1512 | + 'A': [], |
1513 | + }, |
1514 | + 'J', |
1515 | + [('J', 0, (4,), False), |
1516 | + ('I', 1, (1,3,2), False), |
1517 | + ('H', 1, (1,3,1), True), |
1518 | + ('G', 0, (3,), False), |
1519 | + ('F', 1, (1,2,2), False), |
1520 | + ('E', 1, (1,2,1), True), |
1521 | + ('D', 0, (2,), False), |
1522 | + ('C', 1, (1,1,2), False), |
1523 | + ('B', 1, (1,1,1), True), |
1524 | + ('A', 0, (1,), True), |
1525 | + ], |
1526 | + ) |
1527 | + |
1528 | + def test_roots_and_sub_branches_versus_ghosts(self): |
1529 | + """Extra roots and their mini branches use the same numbering. |
1530 | + |
1531 | + All of them use the 0-node numbering. |
1532 | + """ |
1533 | + # A D K |
1534 | + # | |\ |\ |
1535 | + # B E F L M |
1536 | + # | |/ |/ |
1537 | + # C G N |
1538 | + # |/ |\ |
1539 | + # H I O P |
1540 | + # |/ |/ |
1541 | + # J Q |
1542 | + # |.---' |
1543 | + # R |
1544 | + self.assertSortAndIterate( |
1545 | + {'A': [], |
1546 | + 'B': ['A'], |
1547 | + 'C': ['B'], |
1548 | + 'D': [], |
1549 | + 'E': ['D'], |
1550 | + 'F': ['D'], |
1551 | + 'G': ['E', 'F'], |
1552 | + 'H': ['C', 'G'], |
1553 | + 'I': [], |
1554 | + 'J': ['H', 'I'], |
1555 | + 'K': [], |
1556 | + 'L': ['K'], |
1557 | + 'M': ['K'], |
1558 | + 'N': ['L', 'M'], |
1559 | + 'O': ['N'], |
1560 | + 'P': ['N'], |
1561 | + 'Q': ['O', 'P'], |
1562 | + 'R': ['J', 'Q'], |
1563 | + }, |
1564 | + 'R', |
1565 | + [('R', 0, (6,), False), |
1566 | + ('Q', 1, (0,4,5), False), |
1567 | + ('P', 2, (0,6,1), True), |
1568 | + ('O', 1, (0,4,4), False), |
1569 | + ('N', 1, (0,4,3), False), |
1570 | + ('M', 2, (0,5,1), True), |
1571 | + ('L', 1, (0,4,2), False), |
1572 | + ('K', 1, (0,4,1), True), |
1573 | + ('J', 0, (5,), False), |
1574 | + ('I', 1, (0,3,1), True), |
1575 | + ('H', 0, (4,), False), |
1576 | + ('G', 1, (0,1,3), False), |
1577 | + ('F', 2, (0,2,1), True), |
1578 | + ('E', 1, (0,1,2), False), |
1579 | + ('D', 1, (0,1,1), True), |
1580 | + ('C', 0, (3,), False), |
1581 | + ('B', 0, (2,), False), |
1582 | + ('A', 0, (1,), True), |
1583 | + ], |
1584 | + ) |
1585 | + |
1586 | + def test_ghost(self): |
1587 | + # merge_sort should be able to ignore ghosts |
1588 | + # A |
1589 | + # | |
1590 | + # B ghost |
1591 | + # |/ |
1592 | + # C |
1593 | + self.assertSortAndIterate( |
1594 | + {'A': [], |
1595 | + 'B': ['A'], |
1596 | + 'C': ['B', 'ghost'], |
1597 | + }, |
1598 | + 'C', |
1599 | + [('C', 0, (3,), False), |
1600 | + ('B', 0, (2,), False), |
1601 | + ('A', 0, (1,), True), |
1602 | + ]) |
1603 | + |
1604 | + def test_graph_cycle(self): |
1605 | + # merge_sort should fail with a simple error when a graph cycle is |
1606 | + # encountered. |
1607 | + # |
1608 | + # A |
1609 | + # |,-. |
1610 | + # B | |
1611 | + # | | |
1612 | + # C ^ |
1613 | + # | | |
1614 | + # D | |
1615 | + # |'-' |
1616 | + # E |
1617 | + self.assertRaises(errors.GraphCycleError, |
1618 | + self.assertSortAndIterate, |
1619 | + {'A': [], |
1620 | + 'B': ['D'], |
1621 | + 'C': ['B'], |
1622 | + 'D': ['C'], |
1623 | + 'E': ['D'], |
1624 | + }, |
1625 | + 'E', |
1626 | + []) |
1627 | |
1628 | === modified file 'bzrlib/tests/test_tsort.py' |
1629 | --- bzrlib/tests/test_tsort.py 2009-08-17 15:26:18 +0000 |
1630 | +++ bzrlib/tests/test_tsort.py 2009-08-19 16:35:15 +0000 |
1631 | @@ -17,6 +17,7 @@ |
1632 | |
1633 | """Tests for topological sort.""" |
1634 | |
1635 | +import pprint |
1636 | |
1637 | from bzrlib.tests import TestCase |
1638 | from bzrlib.tsort import topo_sort, TopoSorter, MergeSorter, merge_sort |
1639 | @@ -39,6 +40,23 @@ |
1640 | list, |
1641 | TopoSorter(graph).iter_topo_order()) |
1642 | |
1643 | + def assertSortAndIterateOrder(self, graph): |
1644 | + """Check topo_sort and iter_topo_order is genuinely topological order. |
1645 | + |
1646 | + For every child in the graph, check if it comes after all of it's |
1647 | + parents. |
1648 | + """ |
1649 | + sort_result = topo_sort(graph) |
1650 | + iter_result = list(TopoSorter(graph).iter_topo_order()) |
1651 | + for (node, parents) in graph: |
1652 | + for parent in parents: |
1653 | + if sort_result.index(node) < sort_result.index(parent): |
1654 | + self.fail("parent %s must come before child %s:\n%s" |
1655 | + % (parent, node, sort_result)) |
1656 | + if iter_result.index(node) < iter_result.index(parent): |
1657 | + self.fail("parent %s must come before child %s:\n%s" |
1658 | + % (parent, node, iter_result)) |
1659 | + |
1660 | def test_tsort_empty(self): |
1661 | """TopoSort empty list""" |
1662 | self.assertSortAndIterate([], []) |
1663 | @@ -60,6 +78,15 @@ |
1664 | 1: [2], |
1665 | 2: [0]}.items()) |
1666 | |
1667 | + def test_topo_sort_cycle_with_tail(self): |
1668 | + """TopoSort traps graph with longer cycle""" |
1669 | + self.assertSortAndIterateRaise(GraphCycleError, |
1670 | + {0: [1], |
1671 | + 1: [2], |
1672 | + 2: [3, 4], |
1673 | + 3: [0], |
1674 | + 4: []}.items()) |
1675 | + |
1676 | def test_tsort_1(self): |
1677 | """TopoSort simple nontrivial graph""" |
1678 | self.assertSortAndIterate({0: [3], |
1679 | @@ -72,10 +99,10 @@ |
1680 | def test_tsort_partial(self): |
1681 | """Topological sort with partial ordering. |
1682 | |
1683 | - If the graph does not give an order between two nodes, they are |
1684 | - returned in lexicographical order. |
1685 | + Multiple correct orderings are possible, so test for |
1686 | + correctness, not for exact match on the resulting list. |
1687 | """ |
1688 | - self.assertSortAndIterate(([(0, []), |
1689 | + self.assertSortAndIterateOrder([(0, []), |
1690 | (1, [0]), |
1691 | (2, [0]), |
1692 | (3, [0]), |
1693 | @@ -83,8 +110,7 @@ |
1694 | (5, [1, 2]), |
1695 | (6, [1, 2]), |
1696 | (7, [2, 3]), |
1697 | - (8, [0, 1, 4, 5, 6])]), |
1698 | - [0, 1, 2, 3, 4, 5, 6, 7, 8]) |
1699 | + (8, [0, 1, 4, 5, 6])]) |
1700 | |
1701 | def test_tsort_unincluded_parent(self): |
1702 | """Sort nodes, but don't include some parents in the output""" |
1703 | @@ -102,12 +128,8 @@ |
1704 | mainline_revisions=mainline_revisions, |
1705 | generate_revno=generate_revno) |
1706 | if result_list != value: |
1707 | - import pprint |
1708 | self.assertEqualDiff(pprint.pformat(result_list), |
1709 | pprint.pformat(value)) |
1710 | - self.assertEquals(result_list, |
1711 | - merge_sort(graph, branch_tip, mainline_revisions=mainline_revisions, |
1712 | - generate_revno=generate_revno)) |
1713 | self.assertEqual(result_list, |
1714 | list(MergeSorter( |
1715 | graph, |
1716 | |
1717 | === modified file 'bzrlib/tsort.py' |
1718 | --- bzrlib/tsort.py 2009-08-17 15:26:18 +0000 |
1719 | +++ bzrlib/tsort.py 2009-08-19 16:35:14 +0000 |
1720 | @@ -18,8 +18,11 @@ |
1721 | """Topological sorting routines.""" |
1722 | |
1723 | |
1724 | -from bzrlib import errors |
1725 | -import bzrlib.revision as _mod_revision |
1726 | +from bzrlib import ( |
1727 | + errors, |
1728 | + graph as _mod_graph, |
1729 | + revision as _mod_revision, |
1730 | + ) |
1731 | |
1732 | |
1733 | __all__ = ["topo_sort", "TopoSorter", "merge_sort", "MergeSorter"] |
1734 | @@ -30,12 +33,21 @@ |
1735 | |
1736 | graph -- sequence of pairs of node->parents_list. |
1737 | |
1738 | - The result is a list of node names, such that all parents come before |
1739 | - their children. |
1740 | + The result is a list of node names, such that all parents come before their |
1741 | + children. |
1742 | |
1743 | node identifiers can be any hashable object, and are typically strings. |
1744 | + |
1745 | + This function has the same purpose as the TopoSorter class, but uses a |
1746 | + different algorithm to sort the graph. That means that while both return a |
1747 | + list with parents before their child nodes, the exact ordering can be |
1748 | + different. |
1749 | + |
1750 | + topo_sort is faster when the whole list is needed, while when iterating |
1751 | + over a part of the list, TopoSorter.iter_topo_order should be used. |
1752 | """ |
1753 | - return TopoSorter(graph).sorted() |
1754 | + kg = _mod_graph.KnownGraph(dict(graph)) |
1755 | + return kg.topo_sort() |
1756 | |
1757 | |
1758 | class TopoSorter(object): |
1759 | @@ -60,22 +72,8 @@ |
1760 | iteration or sorting may raise GraphCycleError if a cycle is present |
1761 | in the graph. |
1762 | """ |
1763 | - # a dict of the graph. |
1764 | + # store a dict of the graph. |
1765 | self._graph = dict(graph) |
1766 | - self._visitable = set(self._graph) |
1767 | - ### if debugging: |
1768 | - # self._original_graph = dict(graph) |
1769 | - |
1770 | - # this is a stack storing the depth first search into the graph. |
1771 | - self._node_name_stack = [] |
1772 | - # at each level of 'recursion' we have to check each parent. This |
1773 | - # stack stores the parents we have not yet checked for the node at the |
1774 | - # matching depth in _node_name_stack |
1775 | - self._pending_parents_stack = [] |
1776 | - # this is a set of the completed nodes for fast checking whether a |
1777 | - # parent in a node we are processing on the stack has already been |
1778 | - # emitted and thus can be skipped. |
1779 | - self._completed_node_names = set() |
1780 | |
1781 | def sorted(self): |
1782 | """Sort the graph and return as a list. |
1783 | @@ -100,67 +98,64 @@ |
1784 | After finishing iteration the sorter is empty and you cannot continue |
1785 | iteration. |
1786 | """ |
1787 | - while self._graph: |
1788 | + graph = self._graph |
1789 | + visitable = set(graph) |
1790 | + |
1791 | + # this is a stack storing the depth first search into the graph. |
1792 | + pending_node_stack = [] |
1793 | + # at each level of 'recursion' we have to check each parent. This |
1794 | + # stack stores the parents we have not yet checked for the node at the |
1795 | + # matching depth in pending_node_stack |
1796 | + pending_parents_stack = [] |
1797 | + |
1798 | + # this is a set of the completed nodes for fast checking whether a |
1799 | + # parent in a node we are processing on the stack has already been |
1800 | + # emitted and thus can be skipped. |
1801 | + completed_node_names = set() |
1802 | + |
1803 | + while graph: |
1804 | # now pick a random node in the source graph, and transfer it to the |
1805 | - # top of the depth first search stack. |
1806 | - node_name, parents = self._graph.popitem() |
1807 | - self._push_node(node_name, parents) |
1808 | - while self._node_name_stack: |
1809 | - # loop until this call completes. |
1810 | - parents_to_visit = self._pending_parents_stack[-1] |
1811 | - # if all parents are done, the revision is done |
1812 | + # top of the depth first search stack of pending nodes. |
1813 | + node_name, parents = graph.popitem() |
1814 | + pending_node_stack.append(node_name) |
1815 | + pending_parents_stack.append(list(parents)) |
1816 | + |
1817 | + # loop until pending_node_stack is empty |
1818 | + while pending_node_stack: |
1819 | + parents_to_visit = pending_parents_stack[-1] |
1820 | + # if there are no parents left, the revision is done |
1821 | if not parents_to_visit: |
1822 | # append the revision to the topo sorted list |
1823 | - # all the nodes parents have been added to the output, now |
1824 | - # we can add it to the output. |
1825 | - yield self._pop_node() |
1826 | + # all the nodes parents have been added to the output, |
1827 | + # now we can add it to the output. |
1828 | + popped_node = pending_node_stack.pop() |
1829 | + pending_parents_stack.pop() |
1830 | + completed_node_names.add(popped_node) |
1831 | + yield popped_node |
1832 | else: |
1833 | - while self._pending_parents_stack[-1]: |
1834 | - # recurse depth first into a single parent |
1835 | - next_node_name = self._pending_parents_stack[-1].pop() |
1836 | - if next_node_name in self._completed_node_names: |
1837 | - # this parent was completed by a child on the |
1838 | - # call stack. skip it. |
1839 | - continue |
1840 | - if next_node_name not in self._visitable: |
1841 | - continue |
1842 | - # otherwise transfer it from the source graph into the |
1843 | - # top of the current depth first search stack. |
1844 | - try: |
1845 | - parents = self._graph.pop(next_node_name) |
1846 | - except KeyError: |
1847 | - # if the next node is not in the source graph it has |
1848 | - # already been popped from it and placed into the |
1849 | - # current search stack (but not completed or we would |
1850 | - # have hit the continue 4 lines up. |
1851 | - # this indicates a cycle. |
1852 | - raise errors.GraphCycleError(self._node_name_stack) |
1853 | - self._push_node(next_node_name, parents) |
1854 | - # and do not continue processing parents until this 'call' |
1855 | - # has recursed. |
1856 | - break |
1857 | - |
1858 | - def _push_node(self, node_name, parents): |
1859 | - """Add node_name to the pending node stack. |
1860 | - |
1861 | - Names in this stack will get emitted into the output as they are popped |
1862 | - off the stack. |
1863 | - """ |
1864 | - self._node_name_stack.append(node_name) |
1865 | - self._pending_parents_stack.append(list(parents)) |
1866 | - |
1867 | - def _pop_node(self): |
1868 | - """Pop the top node off the stack |
1869 | - |
1870 | - The node is appended to the sorted output. |
1871 | - """ |
1872 | - # we are returning from the flattened call frame: |
1873 | - # pop off the local variables |
1874 | - node_name = self._node_name_stack.pop() |
1875 | - self._pending_parents_stack.pop() |
1876 | - |
1877 | - self._completed_node_names.add(node_name) |
1878 | - return node_name |
1879 | + # recurse depth first into a single parent |
1880 | + next_node_name = parents_to_visit.pop() |
1881 | + |
1882 | + if next_node_name in completed_node_names: |
1883 | + # parent was already completed by a child, skip it. |
1884 | + continue |
1885 | + if next_node_name not in visitable: |
1886 | + # parent is not a node in the original graph, skip it. |
1887 | + continue |
1888 | + |
1889 | + # transfer it along with its parents from the source graph |
1890 | + # into the top of the current depth first search stack. |
1891 | + try: |
1892 | + parents = graph.pop(next_node_name) |
1893 | + except KeyError: |
1894 | + # if the next node is not in the source graph it has |
1895 | + # already been popped from it and placed into the |
1896 | + # current search stack (but not completed or we would |
1897 | + # have hit the continue 6 lines up). this indicates a |
1898 | + # cycle. |
1899 | + raise errors.GraphCycleError(pending_node_stack) |
1900 | + pending_node_stack.append(next_node_name) |
1901 | + pending_parents_stack.append(list(parents)) |
1902 | |
1903 | |
1904 | def merge_sort(graph, branch_tip, mainline_revisions=None, generate_revno=False): |
1905 | @@ -414,7 +409,8 @@ |
1906 | |
1907 | # seed the search with the tip of the branch |
1908 | if (branch_tip is not None and |
1909 | - branch_tip != _mod_revision.NULL_REVISION): |
1910 | + branch_tip != _mod_revision.NULL_REVISION and |
1911 | + branch_tip != (_mod_revision.NULL_REVISION,)): |
1912 | parents = self._graph.pop(branch_tip) |
1913 | self._push_node(branch_tip, 0, parents) |
1914 | |
1915 | @@ -571,7 +567,11 @@ |
1916 | # current search stack (but not completed or we would |
1917 | # have hit the continue 4 lines up. |
1918 | # this indicates a cycle. |
1919 | - raise errors.GraphCycleError(node_name_stack) |
1920 | + if next_node_name in self._original_graph: |
1921 | + raise errors.GraphCycleError(node_name_stack) |
1922 | + else: |
1923 | + # This is just a ghost parent, ignore it |
1924 | + continue |
1925 | next_merge_depth = 0 |
1926 | if is_left_subtree: |
1927 | # a new child branch from name_stack[-1] |
1928 | @@ -673,11 +673,12 @@ |
1929 | else: |
1930 | # no parents, use the root sequence |
1931 | root_count = self._revno_to_branch_count.get(0, 0) |
1932 | + root_count = self._revno_to_branch_count.get(0, -1) |
1933 | + root_count += 1 |
1934 | if root_count: |
1935 | revno = (0, root_count, 1) |
1936 | else: |
1937 | revno = (1,) |
1938 | - root_count += 1 |
1939 | self._revno_to_branch_count[0] = root_count |
1940 | |
1941 | # store the revno for this node for future reference |
1942 | |
1943 | === modified file 'bzrlib/versionedfile.py' |
1944 | --- bzrlib/versionedfile.py 2009-08-07 05:56:29 +0000 |
1945 | +++ bzrlib/versionedfile.py 2009-08-19 16:35:14 +0000 |
1946 | @@ -32,6 +32,7 @@ |
1947 | from bzrlib import ( |
1948 | annotate, |
1949 | errors, |
1950 | + graph as _mod_graph, |
1951 | groupcompress, |
1952 | index, |
1953 | knit, |
1954 | @@ -941,6 +942,20 @@ |
1955 | if '\n' in line[:-1]: |
1956 | raise errors.BzrBadParameterContainsNewline("lines") |
1957 | |
1958 | + def get_known_graph_ancestry(self, keys): |
1959 | + """Get a KnownGraph instance with the ancestry of keys.""" |
1960 | + # most basic implementation is a loop around get_parent_map |
1961 | + pending = set(keys) |
1962 | + parent_map = {} |
1963 | + while pending: |
1964 | + this_parent_map = self.get_parent_map(pending) |
1965 | + parent_map.update(this_parent_map) |
1966 | + pending = set() |
1967 | + map(pending.update, this_parent_map.itervalues()) |
1968 | + pending = pending.difference(parent_map) |
1969 | + kg = _mod_graph.KnownGraph(parent_map) |
1970 | + return kg |
1971 | + |
1972 | def get_parent_map(self, keys): |
1973 | """Get a map of the parents of keys. |
1974 |
This change finally brings the new ancestry extraction code up all the way so that 'bzr log' gets to use it.
It adds a member VersionedFiles.get_known_graph_ancestry(keys) which returns a KnownGraph instance.
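The generic VersionedFiles fallback shown in the diff is just a loop around get_parent_map() that keeps asking for the parents it hasn't resolved yet. A rough standalone sketch of the same idea — here `fake_get_parent_map` and the toy graph are hypothetical stand-ins for the real index-backed method:

```python
# Sketch of the ancestry-walking fallback: repeatedly call
# get_parent_map() on the still-unknown keys until the whole
# ancestry of the requested keys has been collected.

def collect_ancestry(get_parent_map, keys):
    pending = set(keys)
    parent_map = {}
    while pending:
        this_parent_map = get_parent_map(pending)
        parent_map.update(this_parent_map)
        # queue every parent we have not resolved yet
        pending = set()
        for parents in this_parent_map.values():
            pending.update(parents)
        pending.difference_update(parent_map)
    return parent_map

# toy graph: C -> B -> A, with an unrelated side branch D -> A
graph = {'A': (), 'B': ('A',), 'C': ('B',), 'D': ('A',)}

def fake_get_parent_map(keys):
    # stand-in for VersionedFiles.get_parent_map()
    return dict((k, graph[k]) for k in keys if k in graph)

# only C's ancestry is pulled in; D is never touched
print(collect_ancestry(fake_get_parent_map, ['C']))
```

This is also why the earlier question about huge shared repositories shouldn't worry anyone: the walk only ever touches the ancestry of the requested keys, exactly like repeated get_parent_map() calls minus the repeats.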
It also implements 'merge_sort' in Pyrex code on the KnownGraph object. At this point, KnownGraph.merge_sort() is quite fast, taking about 40ms to merge_sort all of bzr.dev and another 50ms to build up the KnownGraph object.
Combined with the improved extraction of ancestry, this brings "bzr log -n0 -r-10..-1" on bzr.dev from 2.5s down to right at 1.0s for me.
As near as I can tell, the big win for doing merge_sort on the KnownGraph object is that you get to avoid a lot of dict lookup calls. 40ms for 25.6k keys is 1.5us per key, and doing a dict lookup to get the parents costs 10ms overall (0.4us per key). It also brings the time for 'merge_sort' on the OOo tree down to 1.0s.
It also adds KnownGraph.topo_sort(), which turns out to only take around 10ms for all of bzr.dev (on top of the 50ms to build the KnownGraph data structure).
Because I already have an object model internally, I went ahead and exposed KnownGraph.merge_sort() as returning objects, rather than tuples. I think the api is going to be a lot easier to use, and none of my timings so far show an advantage to the tuple version. (It could be because the objects are compiled, making getattr() faster...)
This changes tsort.topo_sort() to just be a thunk over to KnownGraph(parent_map).topo_sort(), as it is still faster than the fastest python implementation. (Though the pure python form is actually slower because of the overhead of building the KG object. I'm not worried, as it is still faster than our existing topo_sort implementation.)
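The contract topo_sort keeps — and the only thing the updated test_tsort_partial now checks — is that every parent precedes all of its children; when the graph doesn't constrain two nodes, any relative order is acceptable. A minimal sketch of that property, using a throwaway iterative depth-first sort as a stand-in for the real KnownGraph.topo_sort():

```python
# Throwaway iterative topological sort (stand-in for KnownGraph.topo_sort):
# graph is a dict of node -> list of parents, result lists parents first.

def toy_topo_sort(graph):
    result = []
    done = set()
    for start in list(graph):
        stack = [start]
        while stack:
            node = stack[-1]
            # parents still to emit (ignore ones outside the graph, like ghosts)
            todo = [p for p in graph.get(node, ())
                    if p not in done and p in graph]
            if todo:
                if todo[0] in stack:
                    # a parent is also a descendant on the current path
                    raise ValueError('cycle involving %r' % (todo[0],))
                stack.append(todo[0])
            else:
                stack.pop()
                if node not in done:
                    done.add(node)
                    result.append(node)
    return result

graph = {0: [3], 1: [4], 2: [1, 4], 3: [], 4: [0, 3]}
order = toy_topo_sort(graph)
# the property the tests verify: parents come before their children
for node, parents in graph.items():
    for parent in parents:
        assert order.index(parent) < order.index(node)
```

The same property check is what assertSortAndIterateOrder in the updated test_tsort.py does, so the test no longer pins down one arbitrary-but-valid ordering.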
I was thinking to do the same thing for 'merge_sort' but I took this opportunity to break the api, getting rid of "sequence_number", using an object model, ignoring 'mainline_revisions', etc. Note that only semi-deprecated code uses mainline_revisions anyway, and it really shouldn't be anymore. (It was back when we allowed Branch.revision_history() != lefthand_history.)
I also plan on updating KnownGraph with an 'add_node()' function, so that its implementation of 'merge_sort' can be used for annotate (which sometimes annotates the working tree). It shouldn't be hard to do.