Merge lp:~vila/bzr/gdfo-heads into lp:~bzr/bzr/trunk-old

Proposed by Vincent Ladeuil
Status: Merged
Merged at revision: not available
Proposed branch: lp:~vila/bzr/gdfo-heads
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1180 lines
To merge this branch: bzr merge lp:~vila/bzr/gdfo-heads
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing
Review via email: mp+7651@code.launchpad.net
Revision history for this message
Vincent Ladeuil (vila) wrote :

This is a new implementation of KnownGraph.heads() and a companion _find_gdfo().

The main target is annotate, this doesn't address the revno
calculation which is certainly the next target (all numbers below
referring to annotate are with --show-ids (aka, the improvements
are due to the per-file graph better processing, not the revision
graph)).
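The branch is built around gdfo ("greatest distance from origin"). A minimal standalone Python sketch of the computation it introduces — dict-based and ghost-free for brevity, so names and signatures here are illustrative, not bzrlib's actual API:

```python
def find_gdfo(parent_map):
    """Compute gdfo (greatest distance from origin) for every key.

    parent_map: {key: tuple of parent keys}. Tails (no parents) get
    gdfo=1; every other node gets 1 + max(parent gdfos).
    """
    # Build the child map so we can walk from tails towards tips.
    children = {}
    for key, parents in parent_map.items():
        children.setdefault(key, [])
        for p in parents:
            children.setdefault(p, []).append(key)
    gdfo = {}
    known_parent_gdfos = {}  # how many parents of a node are resolved
    pending = [k for k, parents in parent_map.items() if not parents]
    for k in pending:
        gdfo[k] = 1
    while pending:
        key = pending.pop()
        for child in children[key]:
            known_parent_gdfos[child] = known_parent_gdfos.get(child, 0) + 1
            if gdfo.get(child, 0) < gdfo[key] + 1:
                gdfo[child] = gdfo[key] + 1
            # Only walk on once every parent of the child is counted,
            # so each node is enqueued exactly once.
            if known_parent_gdfos[child] == len(parent_map[child]):
                pending.append(child)
    return gdfo
```

This mirrors the `_find_gdfo` in the preview diff below, minus the node objects and ghost handling.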

Climbing on John's shoulders (all the tests and the pyrex plumbing
were there, waiting ;-P) I was able to improve the performance for
"dense" graphs (think mysql). The performance for bzr improves
too, but in a less spectacular way.

This patch also includes a fix by John for his implementation
that also gives some spectacular results; keep that in mind when
reading the numbers.

Below are some results for 1.16, bzr.dev@4459, John's version
(aka an unreleased one, see
revid:<email address hidden> in
this branch) and this proposed version (which supersedes John's
one).

I've used NEWS for bzr as a reference point in the bzr.dev@4459
tree, and sql/mysqld.cc in dev6 conversion of mysql-6.0@2791.

I've also used tools/time_graph.py to measure the performance
more precisely.

Annotate bzr NEWS:
1.16: real 12.36 user 12.27 sys 0.04
trunk: real 6.84 user 6.80 sys 0.03
john: real 6.76 user 6.11 sys 0.12
this: real 6.00 user 5.96 sys 0.03

Annotate mysql sql/mysqld.cc:
1.16: real 21.66 user 21.52 sys 0.14
trunk: real
john: real 19.97 user 19.62 sys 0.17
this: real 18.69 user 18.56 sys 0.14

time_graph bzr, 25378 nodes, 7586 combinations:
1.16: N.A.
trunk: python 127.070s pyrex 49.800s
john: python 1.030s pyrex 0.700s
this: python 0.960s pyrex 0.540s

time_graph mysql, 67633 nodes, 33740 combinations:
1.16: N.A.
trunk: python 3470.620s pyrex 1751.760s
john: python 34.190s pyrex 36.480s
this: python 33.750s pyrex 17.670s

So while John's fixed version vastly improves that specific part
(gdfo rules), the proposed new algorithm goes even further for
'bushy' graphs.

Unlike John, I couldn't find a way to make good use of 'linear
dominators' (either my pyrex fu is weak (two days old) or they'll
be useful elsewhere (I'm sure about that ;-)). Anyway, trying to
use them only decreased the performance and increased the memory
consumption, so I finally got rid of them.

There are certainly better optimisations to put in place at the
pyrex level, but I thought the improvement was worth a review.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
> Vincent Ladeuil has proposed merging lp:~vila/bzr/gdfo-heads into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This is a new implementation of KnownGraph.heads() and a companion _find_gdfo().

...

>
> Climbing on John's shoulders (all the tests and the pyrex plumbing
> were there, waiting ;-P) I was able to improve the performance for
> "dense" graphs (think mysql). The performance for bzr improves
> too, but in a less spectacular way.
>
> This patch also includes a fix by John for his implementation
> that also gives some spectacular results; keep that in mind when
> reading the numbers.

Well, you were the one that realized we could stop at "min_gdfo" which
was the big win. I just implemented it in the existing pyrex code.
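That min_gdfo stopping rule, in a plain-Python sketch (illustrative names, simplified from the real KnownGraph.heads(); caching, ghosts, and NULL_REVISION handling omitted):

```python
def heads(parent_map, gdfo, keys):
    """Return the keys that are not ancestors of any other key."""
    candidates = set(keys)
    if len(candidates) < 2:
        return frozenset(candidates)
    # Every candidate has gdfo >= min_gdfo. Any strict ancestor of a
    # node whose gdfo is already <= min_gdfo has a smaller gdfo still,
    # so it cannot be a candidate: we can stop walking there.
    min_gdfo = min(gdfo[k] for k in candidates)
    seen = set()
    pending = []
    for k in candidates:
        pending.extend(parent_map[k])
    while pending:
        key = pending.pop()
        if key in seen:
            continue
        seen.add(key)
        if gdfo[key] <= min_gdfo:
            continue
        pending.extend(parent_map[key])
    # Anything we reached is an ancestor of some candidate, hence not a head.
    return frozenset(candidates - seen)
```
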

>
> Below are some results for 1.16, bzr.dev@4459, john's version
> (aka an unreleased one, see
> revid:<email address hidden> in
> this branch) and this proposed version (which supersedes John's
> one).
>
> I've used NEWS for bzr as a reference point in the bzr.dev@4459
> tree, and sql/mysqld.cc in dev6 conversion of mysql-6.0@2791.
>
> I've also used tools/time_graph.py to measure the performance
> more precisely.

Actually, time_graph measures something different. Specifically, it
measures interesting nodes in the ancestry of a branch (merge parents),
while annotate is doing things in a per-file graph.

Both are potentially useful, but they really aren't measuring the same
thing.

since you are no longer using heapq you can actually improve the pyrex
quite a bit with:

     cdef public long gdfo # Int

Pyrex will automatically wrap a long with the appropriate to/from PyLong
wrappers (actually Python does it when you declare the class object).
And at that point you get to access gdfo as a raw C long, rather than
casting to/from PyInt. (With the downside that you can't handle graphs
larger than 2^31, which I'm not worried about.)
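The suggested declaration, in context (a Pyrex sketch of the class attribute, not the exact bzrlib source):

```pyrex
cdef class _KnownGraphNode:
    # 'public' makes Pyrex generate the to/from PyInt wrappers at the
    # Python boundary; cdef code sees gdfo as a raw C long (with the
    # 2^31 graph-depth limit noted above).
    cdef public long gdfo
```
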

In general, I'm pretty sure the pyrex version is almost identical to the
python version, and I would expect it to not really be any faster. (You
aren't directly accessing List items, etc. And in fact, you deleted my
helpers for doing so.)

I also wasn't actually planning on including 'time_graph.py' in bzr.dev,
that was more of an accident.

Anyway, the algorithm changes are nice, and if you want I'll take some
time to profile and improve the pyrex code.

(this: python 0.960s pyrex 0.540s I guess the pyrex is slightly
faster, but I was seeing a lot more than this in the past...)

John
=:->

So I'll take over some Pyrex perf work tomorrow. Otherwise I certainly approve.

  review: needs_fixing

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAko6sH4ACgkQJdeBCYSNAAM3UgCfdYEW5kol4Bl4/X1j7BSL2UvY
HvUAoLDMCysO6m27BtpB95bKAFykUrXW
=3nmR
-----END PGP SIGNATURE-----

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

<snip/>

    jam> Well, you were the one that realized we could stop at
    jam> "min_gdfo" which was the big win.

Right, having ideas doesn't cost much ;-)

    jam> I just implemented it in the existing pyrex code.

And rightly deserve credit for that ;-)

<snip/>

    jam> Actually, time_graph measures something
    jam> different. Specifically, it measures interesting nodes
    jam> in the ancestry of a branch (merge parents), while
    jam> annotate is doing things in a per-file graph.

I think they are an easy-to-find, yet pretty representative, sample of heads() calls.

    jam> Both are potentially useful, but they really aren't
    jam> measuring the same thing.

I understand that, but I thought they were valid enough to
compare different implementations.

    jam> since you are no longer using heapq you can actually improve the pyrex
    jam> quite a bit with:

    jam> cdef public long gdfo # Int

    jam> Pyrex will automatically wrap a long with the
    jam> appropriate to/from PyLong wrappers (actually Python
    jam> does it when you declare the class object). And at that
    jam> point you get to access gdfo as a raw C long, rather
    jam> than casting to/from PyInt. (With the downside that you
    jam> can't handle graphs larger than 2^31, which I'm not
    jam> worried about.)

AIUI, the main potential gain when using pyrex is avoiding
conversions to/from python objects.

As such, I'm damn *sure* we can do far better.

Yet, I still feel it's a bit premature to do it right now
(*especially* considering I deleted the linear dominators related
code, which I still think will be useful).

    jam> In general, I'm pretty sure the pyrex version is almost
    jam> identical to the python version, and I would expect it
    jam> to not really be any faster. (You aren't directly
    jam> accessing List items, etc. And in fact, you deleted my
    jam> helpers for doing so.)

Yes, I tried to keep both implementations as close as possible.

    jam> I also wasn't actually planning on including
    jam> 'time_graph.py' in bzr.dev, that was more of an
    jam> accident.

Keep having accidents like that; time_graph was a key part of
that experiment!

hack && make && selftest && time_graph && rinse && repeat

    jam> Anyway, the algorithm changes are nice, and if you want
    jam> I'll take some time to profile and improve the pyrex
    jam> code.

As said above, I don't want to optimize prematurely.

I'd like to see this landed as it gives a nice improvement in
every case (unless someone can prove us wrong here).

But it also shows (and your experiment more than mine) that micro
tuning has potential at the cost of making the code hard to read
(I may change my mind by practicing pyrex a bit more, but that's
how I feel so far).

The next steps I have in mind are:

First, I'd like to apply the KnownGraph strategy to the revno
computation. That will still be an O(history) operation, but if we
can make it less crazily long...

From there, we can evaluate caching gdfo (ghost filling problems
yet to be addressed) and make these algorithms work with
lazy-loaded graphs.

    jam> (this: pytho...


Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "vila" == Vincent Ladeuil <email address hidden> writes:

<snip/>

    jam> And in fact, you deleted my helpers for doing so.)

I failed to mention that my intent was to keep the code as
minimal as possible. I value these helpers, know where they are,
and can restore them the day I need them.

Preview Diff

1=== modified file 'bzrlib/_known_graph_py.py'
2--- bzrlib/_known_graph_py.py 2009-06-16 15:35:14 +0000
3+++ bzrlib/_known_graph_py.py 2009-06-19 02:35:18 +0000
4@@ -17,8 +17,6 @@
5 """Implementation of Graph algorithms when we have already loaded everything.
6 """
7
8-import heapq
9-
10 from bzrlib import (
11 revision,
12 )
13@@ -27,27 +25,19 @@
14 class _KnownGraphNode(object):
15 """Represents a single object in the known graph."""
16
17- __slots__ = ('key', 'parent_keys', 'child_keys', 'linear_dominator',
18- 'gdfo', 'ancestor_of')
19+ __slots__ = ('key', 'parent_keys', 'child_keys', 'gdfo')
20
21 def __init__(self, key, parent_keys):
22 self.key = key
23 self.parent_keys = parent_keys
24 self.child_keys = []
25- # oldest ancestor, such that no parents between here and there have >1
26- # child or >1 parent.
27- self.linear_dominator = None
28 # Greatest distance from origin
29 self.gdfo = None
30- # This will become a tuple of known heads that have this node as an
31- # ancestor
32- self.ancestor_of = None
33
34 def __repr__(self):
35- return '%s(%s gdfo:%s par:%s child:%s %s)' % (
36+ return '%s(%s gdfo:%s par:%s child:%s)' % (
37 self.__class__.__name__, self.key, self.gdfo,
38- self.parent_keys, self.child_keys,
39- self.linear_dominator)
40+ self.parent_keys, self.child_keys)
41
42
43 class KnownGraph(object):
44@@ -63,21 +53,28 @@
45 self._known_heads = {}
46 self.do_cache = do_cache
47 self._initialize_nodes(parent_map)
48- self._find_linear_dominators()
49 self._find_gdfo()
50
51 def _initialize_nodes(self, parent_map):
52 """Populate self._nodes.
53
54- After this has finished, self._nodes will have an entry for every entry
55- in parent_map. Ghosts will have a parent_keys = None, all nodes found
56- will also have .child_keys populated with all known child_keys.
57+ After this has finished:
58+ - self._nodes will have an entry for every entry in parent_map.
59+ - ghosts will have a parent_keys = None,
60+ - all nodes found will also have .child_keys populated with all known
61+ child_keys,
62+ - self._tails will list all the nodes without parents.
63 """
64+ tails = self._tails = set()
65 nodes = self._nodes
66 for key, parent_keys in parent_map.iteritems():
67 if key in nodes:
68 node = nodes[key]
69 node.parent_keys = parent_keys
70+ if parent_keys:
71+ # This node has been added before being seen in parent_map
72+ # (see below)
73+ tails.remove(node)
74 else:
75 node = _KnownGraphNode(key, parent_keys)
76 nodes[key] = node
77@@ -87,129 +84,34 @@
78 except KeyError:
79 parent_node = _KnownGraphNode(parent_key, None)
80 nodes[parent_key] = parent_node
81+ # Potentially a tail, if we're wrong we'll remove it later
82+ # (see above)
83+ tails.add(parent_node)
84 parent_node.child_keys.append(key)
85
86- def _find_linear_dominators(self):
87- """For each node in the set, find any linear dominators.
88-
89- For any given node, the 'linear dominator' is an ancestor, such that
90- all parents between this node and that one have a single parent, and a
91- single child. So if A->B->C->D then B,C,D all have a linear dominator
92- of A.
93-
94- There are two main benefits:
95- 1) When walking the graph, we can jump to the nearest linear dominator,
96- rather than walking all of the nodes inbetween.
97- 2) When caching heads() results, dominators give the "same" results as
98- their children. (If the dominator is a head, then the descendant is
99- a head, if the dominator is not a head, then the child isn't
100- either.)
101- """
102- def check_node(node):
103- if node.parent_keys is None or len(node.parent_keys) != 1:
104- # This node is either a ghost, a tail, or has multiple parents
105- # It its own dominator
106- node.linear_dominator = node.key
107- return None
108- parent_node = self._nodes[node.parent_keys[0]]
109- if len(parent_node.child_keys) > 1:
110- # The parent has multiple children, so *this* node is the
111- # dominator
112- node.linear_dominator = node.key
113- return None
114- # The parent is already filled in, so add and continue
115- if parent_node.linear_dominator is not None:
116- node.linear_dominator = parent_node.linear_dominator
117- return None
118- # We don't know this node, or its parent node, so start walking to
119- # next
120- return parent_node
121-
122- for node in self._nodes.itervalues():
123- # The parent is not filled in, so walk until we get somewhere
124- if node.linear_dominator is not None: #already done
125- continue
126- next_node = check_node(node)
127- if next_node is None:
128- # Nothing more needs to be done
129- continue
130- stack = []
131- while next_node is not None:
132- stack.append(node)
133- node = next_node
134- next_node = check_node(node)
135- # The stack now contains the linear chain, and 'node' should have
136- # been labeled
137- dominator = node.linear_dominator
138- while stack:
139- next_node = stack.pop()
140- next_node.linear_dominator = dominator
141- node = next_node
142-
143 def _find_gdfo(self):
144- def find_tails():
145- return [node for node in self._nodes.itervalues()
146- if not node.parent_keys]
147- tails = find_tails()
148- todo = []
149- heappush = heapq.heappush
150- heappop = heapq.heappop
151 nodes = self._nodes
152- for node in tails:
153+ known_parent_gdfos = {}
154+ pending = []
155+
156+ for node in self._tails:
157 node.gdfo = 1
158- heappush(todo, (1, node))
159- processed = 0
160- while todo:
161- gdfo, next = heappop(todo)
162- processed += 1
163- if next.gdfo is not None and gdfo < next.gdfo:
164- # This node was reached from a longer path, we assume it was
165- # enqued correctly with the longer gdfo, so don't continue
166- # processing now
167- continue
168- next_gdfo = gdfo + 1
169- for child_key in next.child_keys:
170- child_node = nodes[child_key]
171- if child_node.gdfo is None or child_node.gdfo < next_gdfo:
172- # Only enque children when all of their parents have been
173- # resolved
174- for parent_key in child_node.parent_keys:
175- # We know that 'this' parent is counted
176- if parent_key != next.key:
177- parent_node = nodes[parent_key]
178- if parent_node.gdfo is None:
179- break
180- else:
181- child_node.gdfo = next_gdfo
182- heappush(todo, (next_gdfo, child_node))
183-
184- def _get_dominators_to_nodes(self, candidate_nodes):
185- """Get the reverse mapping from dominator_key => candidate_nodes.
186-
187- As a side effect, this can also remove potential candidate nodes if we
188- determine that they share a dominator.
189- """
190- dom_to_node = {}
191- keys_to_remove = []
192- for node in candidate_nodes.values():
193- if node.linear_dominator in dom_to_node:
194- # This node already exists, resolve which node supersedes the
195- # other
196- other_node = dom_to_node[node.linear_dominator]
197- # There should be no way that nodes sharing a dominator could
198- # 'tie' for gdfo
199- if other_node.gdfo > node.gdfo:
200- # The other node has this node as an ancestor
201- keys_to_remove.append(node.key)
202- else:
203- # Replace the other node, and set this as the new key
204- keys_to_remove.append(other_node.key)
205- dom_to_node[node.linear_dominator] = node
206- else:
207- dom_to_node[node.linear_dominator] = node
208- for key in keys_to_remove:
209- candidate_nodes.pop(key)
210- return dom_to_node
211+ pending.append(node)
212+
213+ while pending:
214+ node = pending.pop()
215+ for child_key in node.child_keys:
216+ child = nodes[child_key]
217+ try:
218+ known_parent_gdfos[child_key] += 1
219+ except KeyError:
220+ known_parent_gdfos[child_key] = 1
221+ if child.gdfo is None or node.gdfo + 1 > child.gdfo:
222+ child.gdfo = node.gdfo + 1
223+ if known_parent_gdfos[child_key] == len(child.parent_keys):
224+ # We are the last parent updating that node, we can
225+ # continue from there
226+ pending.append(child)
227
228 def heads(self, keys):
229 """Return the heads from amongst keys.
230@@ -217,9 +119,8 @@
231 This is done by searching the ancestries of each key. Any key that is
232 reachable from another key is not returned; all the others are.
233
234- This operation scales with the relative depth between any two keys. If
235- any two keys are completely disconnected all ancestry of both sides
236- will be retrieved.
237+ This operation scales with the relative depth between any two keys. It
238+ uses gdfo to avoid walking all ancestry.
239
240 :param keys: An iterable of keys.
241 :return: A set of the heads. Note that as a set there is no ordering
242@@ -231,114 +132,43 @@
243 # NULL_REVISION is only a head if it is the only entry
244 candidate_nodes.pop(revision.NULL_REVISION)
245 if not candidate_nodes:
246- return set([revision.NULL_REVISION])
247+ return frozenset([revision.NULL_REVISION])
248 if len(candidate_nodes) < 2:
249+ # No or only one candidate
250 return frozenset(candidate_nodes)
251 heads_key = frozenset(candidate_nodes)
252 if heads_key != frozenset(keys):
253+ # Mention duplicates
254 note('%s != %s', heads_key, frozenset(keys))
255+ # Do we have a cached result ?
256 try:
257 heads = self._known_heads[heads_key]
258 return heads
259 except KeyError:
260- pass # compute it ourselves
261- dom_to_node = self._get_dominators_to_nodes(candidate_nodes)
262- if len(candidate_nodes) < 2:
263- # We shrunk candidate_nodes and determined a new head
264- return frozenset(candidate_nodes)
265- dom_heads_key = None
266- # Check the linear dominators of these keys, to see if we already
267- # know the heads answer
268- dom_heads_key = frozenset([node.linear_dominator
269- for node in candidate_nodes.itervalues()])
270- if dom_heads_key in self._known_heads:
271- # map back into the original keys
272- heads = self._known_heads[dom_heads_key]
273- heads = frozenset([dom_to_node[key].key for key in heads])
274- return heads
275- heads = self._heads_from_candidate_nodes(candidate_nodes, dom_to_node)
276+ pass
277+ # Let's compute the heads
278+ seen = set()
279+ pending = []
280+ min_gdfo = None
281+ for node in candidate_nodes.values():
282+ if node.parent_keys:
283+ pending.extend(node.parent_keys)
284+ if min_gdfo is None or node.gdfo < min_gdfo:
285+ min_gdfo = node.gdfo
286+ nodes = self._nodes
287+ while pending:
288+ node_key = pending.pop()
289+ if node_key in seen:
290+ # node already appears in some ancestry
291+ continue
292+ seen.add(node_key)
293+ node = nodes[node_key]
294+ if node.gdfo <= min_gdfo:
295+ continue
296+ if node.parent_keys:
297+ pending.extend(node.parent_keys)
298+ heads = heads_key.difference(seen)
299 if self.do_cache:
300 self._known_heads[heads_key] = heads
301- # Cache the dominator heads
302- if dom_heads_key is not None:
303- dom_heads = frozenset([candidate_nodes[key].linear_dominator
304- for key in heads])
305- self._known_heads[dom_heads_key] = dom_heads
306 return heads
307
308- def _heads_from_candidate_nodes(self, candidate_nodes, dom_to_node):
309- queue = []
310- to_cleanup = []
311- to_cleanup_append = to_cleanup.append
312- for node in candidate_nodes.itervalues():
313- node.ancestor_of = (node.key,)
314- queue.append((-node.gdfo, node))
315- to_cleanup_append(node)
316- heapq.heapify(queue)
317- # These are nodes that we determined are 'common' that we are no longer
318- # walking
319- # Now we walk nodes until all nodes that are being walked are 'common'
320- num_candidates = len(candidate_nodes)
321- nodes = self._nodes
322- heappop = heapq.heappop
323- heappush = heapq.heappush
324- while queue and len(candidate_nodes) > 1:
325- _, node = heappop(queue)
326- next_ancestor_of = node.ancestor_of
327- if len(next_ancestor_of) == num_candidates:
328- # This node is now considered 'common'
329- # Make sure all parent nodes are marked as such
330- for parent_key in node.parent_keys:
331- parent_node = nodes[parent_key]
332- if parent_node.ancestor_of is not None:
333- parent_node.ancestor_of = next_ancestor_of
334- if node.linear_dominator != node.key:
335- parent_node = nodes[node.linear_dominator]
336- if parent_node.ancestor_of is not None:
337- parent_node.ancestor_of = next_ancestor_of
338- continue
339- if node.parent_keys is None:
340- # This is a ghost
341- continue
342- # Now project the current nodes ancestor list to the parent nodes,
343- # and queue them up to be walked
344- # Note: using linear_dominator speeds things up quite a bit
345- # enough that we actually start to be slightly faster
346- # than the default heads() implementation
347- if node.linear_dominator != node.key:
348- # We are at the tip of a long linear region
349- # We know that there is nothing between here and the tail
350- # that is interesting, so skip to the end
351- parent_keys = [node.linear_dominator]
352- else:
353- parent_keys = node.parent_keys
354- for parent_key in parent_keys:
355- if parent_key in candidate_nodes:
356- candidate_nodes.pop(parent_key)
357- if len(candidate_nodes) <= 1:
358- break
359- elif parent_key in dom_to_node:
360- orig_node = dom_to_node[parent_key]
361- if orig_node is not node:
362- if orig_node.key in candidate_nodes:
363- candidate_nodes.pop(orig_node.key)
364- if len(candidate_nodes) <= 1:
365- break
366- parent_node = nodes[parent_key]
367- ancestor_of = parent_node.ancestor_of
368- if ancestor_of is None:
369- # This node hasn't been walked yet
370- parent_node.ancestor_of = next_ancestor_of
371- # Enqueue this node
372- heappush(queue, (-parent_node.gdfo, parent_node))
373- to_cleanup_append(parent_node)
374- elif ancestor_of != next_ancestor_of:
375- # Combine to get the full set of parents
376- all_ancestors = set(ancestor_of)
377- all_ancestors.update(next_ancestor_of)
378- parent_node.ancestor_of = tuple(sorted(all_ancestors))
379- def cleanup():
380- for node in to_cleanup:
381- node.ancestor_of = None
382- cleanup()
383- return frozenset(candidate_nodes)
384
385=== modified file 'bzrlib/_known_graph_pyx.pyx'
386--- bzrlib/_known_graph_pyx.pyx 2009-06-16 15:35:14 +0000
387+++ bzrlib/_known_graph_pyx.pyx 2009-06-19 02:35:18 +0000
388@@ -42,28 +42,15 @@
389 void Py_INCREF(object)
390
391
392-import heapq
393-
394 from bzrlib import revision
395
396-# Define these as cdef objects, so we don't have to getattr them later
397-cdef object heappush, heappop, heapify, heapreplace
398-heappush = heapq.heappush
399-heappop = heapq.heappop
400-heapify = heapq.heapify
401-heapreplace = heapq.heapreplace
402-
403-
404 cdef class _KnownGraphNode:
405 """Represents a single object in the known graph."""
406
407 cdef object key
408 cdef object parents
409 cdef object children
410- cdef _KnownGraphNode linear_dominator_node
411 cdef public object gdfo # Int
412- # This could also be simplified
413- cdef object ancestor_of
414
415 def __init__(self, key):
416 cdef int i
417@@ -72,14 +59,8 @@
418 self.parents = None
419
420 self.children = []
421- # oldest ancestor, such that no parents between here and there have >1
422- # child or >1 parent.
423- self.linear_dominator_node = None
424 # Greatest distance from origin
425 self.gdfo = -1
426- # This will become a tuple of known heads that have this node as an
427- # ancestor
428- self.ancestor_of = None
429
430 property child_keys:
431 def __get__(self):
432@@ -90,17 +71,9 @@
433 PyList_Append(keys, child.key)
434 return keys
435
436- property linear_dominator:
437- def __get__(self):
438- if self.linear_dominator_node is None:
439- return None
440- else:
441- return self.linear_dominator_node.key
442-
443 cdef clear_references(self):
444 self.parents = None
445 self.children = None
446- self.linear_dominator_node = None
447
448 def __repr__(self):
449 cdef _KnownGraphNode node
450@@ -113,34 +86,10 @@
451 if self.children is not None:
452 for node in self.children:
453 child_keys.append(node.key)
454- return '%s(%s gdfo:%s par:%s child:%s %s)' % (
455+ return '%s(%s gdfo:%s par:%s child:%s)' % (
456 self.__class__.__name__, self.key, self.gdfo,
457- parent_keys, child_keys,
458- self.linear_dominator)
459-
460-
461-cdef _KnownGraphNode _get_list_node(lst, Py_ssize_t pos):
462- cdef PyObject *temp_node
463-
464- temp_node = PyList_GET_ITEM(lst, pos)
465- return <_KnownGraphNode>temp_node
466-
467-
468-cdef _KnownGraphNode _get_parent(parents, Py_ssize_t pos):
469- cdef PyObject *temp_node
470- cdef _KnownGraphNode node
471-
472- temp_node = PyTuple_GET_ITEM(parents, pos)
473- return <_KnownGraphNode>temp_node
474-
475-
476-cdef _KnownGraphNode _peek_node(queue):
477- cdef PyObject *temp_node
478- cdef _KnownGraphNode node
479-
480- temp_node = PyTuple_GET_ITEM(<object>PyList_GET_ITEM(queue, 0), 1)
481- node = <_KnownGraphNode>temp_node
482- return node
483+ parent_keys, child_keys)
484+
485
486 # TODO: slab allocate all _KnownGraphNode objects.
487 # We already know how many we are going to need, except for a couple of
488@@ -150,11 +99,9 @@
489 """This is a class which assumes we already know the full graph."""
490
491 cdef public object _nodes
492+ cdef public object _tails
493 cdef object _known_heads
494 cdef public int do_cache
495- # Nodes we've touched that we'll need to reset their info when heads() is
496- # done
497- cdef object _to_cleanup
498
499 def __init__(self, parent_map, do_cache=True):
500 """Create a new KnownGraph instance.
501@@ -164,10 +111,8 @@
502 self._nodes = {}
503 # Maps {sorted(revision_id, revision_id): heads}
504 self._known_heads = {}
505- self._to_cleanup = []
506 self.do_cache = int(do_cache)
507 self._initialize_nodes(parent_map)
508- self._find_linear_dominators()
509 self._find_gdfo()
510
511 def __dealloc__(self):
512@@ -179,7 +124,7 @@
513 child = <_KnownGraphNode>temp_node
514 child.clear_references()
515
516- cdef _KnownGraphNode _get_or_create_node(self, key):
517+ cdef _KnownGraphNode _get_or_create_node(self, key, int *created):
518 cdef PyObject *temp_node
519 cdef _KnownGraphNode node
520
521@@ -187,22 +132,29 @@
522 if temp_node == NULL:
523 node = _KnownGraphNode(key)
524 PyDict_SetItem(self._nodes, key, node)
525+ created[0] = 1 # True
526 else:
527 node = <_KnownGraphNode>temp_node
528+ created[0] = 0 # False
529 return node
530
531 def _initialize_nodes(self, parent_map):
532 """Populate self._nodes.
533
534- After this has finished, self._nodes will have an entry for every entry
535- in parent_map. Ghosts will have a parent_keys = None, all nodes found
536- will also have .child_keys populated with all known child_keys.
537+ After this has finished:
538+ - self._nodes will have an entry for every entry in parent_map.
539+ - ghosts will have a parent_keys = None,
540+ - all nodes found will also have .child_keys populated with all known
541+ child_keys,
542+ - self._tails will list all the nodes without parents.
543 """
544 cdef PyObject *temp_key, *temp_parent_keys, *temp_node
545 cdef Py_ssize_t pos, pos2, num_parent_keys
546 cdef _KnownGraphNode node
547 cdef _KnownGraphNode parent_node
548+ cdef int created
549
550+ tails = self._tails = set()
551 nodes = self._nodes
552
553 if not PyDict_CheckExact(parent_map):
554@@ -212,151 +164,56 @@
555 while PyDict_Next(parent_map, &pos, &temp_key, &temp_parent_keys):
556 key = <object>temp_key
557 parent_keys = <object>temp_parent_keys
558- node = self._get_or_create_node(key)
559+ num_parent_keys = len(parent_keys)
560+ node = self._get_or_create_node(key, &created)
561+ if not created and num_parent_keys != 0:
562+ # This node has been added before being seen in parent_map (see
563+ # below)
564+ tails.remove(node)
565 # We know how many parents, so we could pre allocate an exact sized
566 # tuple here
567- num_parent_keys = len(parent_keys)
568 parent_nodes = PyTuple_New(num_parent_keys)
569 # We use iter here, because parent_keys maybe be a list or tuple
570 for pos2 from 0 <= pos2 < num_parent_keys:
571- parent_key = parent_keys[pos2]
572- parent_node = self._get_or_create_node(parent_keys[pos2])
573+ parent_node = self._get_or_create_node(parent_keys[pos2],
574+ &created)
575+ if created:
576+ # Potentially a tail, if we're wrong we'll remove it later
577+ # (see above)
578+ tails.add(parent_node)
579 # PyTuple_SET_ITEM will steal a reference, so INCREF first
580 Py_INCREF(parent_node)
581 PyTuple_SET_ITEM(parent_nodes, pos2, parent_node)
582 PyList_Append(parent_node.children, node)
583 node.parents = parent_nodes
584
585- cdef _KnownGraphNode _check_is_linear(self, _KnownGraphNode node):
586- """Check to see if a given node is part of a linear chain."""
587- cdef _KnownGraphNode parent_node
588- if node.parents is None or PyTuple_GET_SIZE(node.parents) != 1:
589- # This node is either a ghost, a tail, or has multiple parents
590- # It its own dominator
591- node.linear_dominator_node = node
592- return None
593- parent_node = _get_parent(node.parents, 0)
594- if PyList_GET_SIZE(parent_node.children) > 1:
595- # The parent has multiple children, so *this* node is the
596- # dominator
597- node.linear_dominator_node = node
598- return None
599- # The parent is already filled in, so add and continue
600- if parent_node.linear_dominator_node is not None:
601- node.linear_dominator_node = parent_node.linear_dominator_node
602- return None
603- # We don't know this node, or its parent node, so start walking to
604- # next
605- return parent_node
606-
607- def _find_linear_dominators(self):
608- """
609- For any given node, the 'linear dominator' is an ancestor, such that
610- all parents between this node and that one have a single parent, and a
611- single child. So if A->B->C->D then B,C,D all have a linear dominator
612- of A.
613-
614- There are two main benefits:
615- 1) When walking the graph, we can jump to the nearest linear dominator,
616- rather than walking all of the nodes inbetween.
617- 2) When caching heads() results, dominators give the "same" results as
618- their children. (If the dominator is a head, then the descendant is
619- a head, if the dominator is not a head, then the child isn't
620- either.)
621- """
622- cdef PyObject *temp_node
623- cdef Py_ssize_t pos
624- cdef _KnownGraphNode node
625- cdef _KnownGraphNode next_node
626- cdef _KnownGraphNode dominator
627- cdef int i, num_elements
628-
629- pos = 0
630- while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
631- node = <_KnownGraphNode>temp_node
632- # The parent is not filled in, so walk until we get somewhere
633- if node.linear_dominator_node is not None: #already done
634- continue
635- next_node = self._check_is_linear(node)
636- if next_node is None:
637- # Nothing more needs to be done
638- continue
639- stack = []
640- while next_node is not None:
641- PyList_Append(stack, node)
642- node = next_node
643- next_node = self._check_is_linear(node)
644- # The stack now contains the linear chain, and 'node' should have
645- # been labeled
646- dominator = node.linear_dominator_node
647- num_elements = len(stack)
648- for i from num_elements > i >= 0:
649- next_node = _get_list_node(stack, i)
650- next_node.linear_dominator_node = dominator
651- node = next_node
652-
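The docstring above describes the linear-dominator cache that this patch removes. For readers unfamiliar with the idea, here is a rough pure-Python sketch of the same rules (plain dicts standing in for KnownGraph's node table; all names are illustrative, not bzrlib API):

```python
def find_linear_dominators(parent_map):
    # parent_map: key -> tuple of parent keys (illustrative stand-in for
    # KnownGraph's node table; ghosts simply do not appear as keys).
    num_children = {}
    for parents in parent_map.values():
        for p in parents:
            num_children[p] = num_children.get(p, 0) + 1

    dominators = {}

    def step(key):
        # Label `key` if its dominator is now known (return None), or
        # return the next node up a strictly linear parent chain.
        parents = parent_map.get(key)
        if not parents or len(parents) > 1:
            dominators[key] = key        # tail or merge: its own dominator
            return None
        parent = parents[0]
        if parent not in parent_map or num_children.get(parent, 0) > 1:
            dominators[key] = key        # parent is a ghost or forks
            return None
        if parent in dominators:
            dominators[key] = dominators[parent]
            return None
        return parent

    for start in parent_map:
        if start in dominators:
            continue
        node = start
        stack = []
        nxt = step(node)
        while nxt is not None:
            stack.append(node)
            node = nxt
            nxt = step(node)
        # `node` got labeled inside step(); propagate down the chain.
        for k in stack:
            dominators[k] = dominators[node]
    return dominators
```

So for a linear run A->B->C->D (A a tail), B, C and D all share dominator A, exactly as the docstring describes.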
653- cdef object _find_tails(self):
654- cdef object tails
655- cdef PyObject *temp_node
656- cdef Py_ssize_t pos
657- cdef _KnownGraphNode node
658-
659- tails = []
660- pos = 0
661- while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
662- node = <_KnownGraphNode>temp_node
663- if node.parents is None or PyTuple_GET_SIZE(node.parents) == 0:
664- PyList_Append(tails, node)
665- return tails
666-
667 def _find_gdfo(self):
668- cdef Py_ssize_t pos, pos2
669 cdef _KnownGraphNode node
670- cdef _KnownGraphNode child_node
671- cdef _KnownGraphNode parent_node
672- cdef int replace_node, missing_parent
673-
674- tails = self._find_tails()
675- todo = []
676- for pos from 0 <= pos < PyList_GET_SIZE(tails):
677- node = _get_list_node(tails, pos)
678+ cdef _KnownGraphNode child
679+
680+ nodes = self._nodes
681+ pending = []
682+ known_parent_gdfos = {}
683+
684+ for node in self._tails:
685 node.gdfo = 1
686- PyList_Append(todo, (1, node))
687- # No need to heapify, because all tails have priority=1
688- while PyList_GET_SIZE(todo) > 0:
689- node = _peek_node(todo)
690- next_gdfo = node.gdfo + 1
691- replace_node = 1
692- for pos from 0 <= pos < PyList_GET_SIZE(node.children):
693- child_node = _get_list_node(node.children, pos)
694- # We should never have numbered children before we numbered
695- # a parent
696- if child_node.gdfo != -1:
697- continue
698-                # Only enqueue children when all of their parents have been
699- # resolved. With a single parent, we can just take 'this' value
700- child_gdfo = next_gdfo
701- if PyTuple_GET_SIZE(child_node.parents) > 1:
702- missing_parent = 0
703- for pos2 from 0 <= pos2 < PyTuple_GET_SIZE(child_node.parents):
704- parent_node = _get_parent(child_node.parents, pos2)
705- if parent_node.gdfo == -1:
706- missing_parent = 1
707- break
708- if parent_node.gdfo >= child_gdfo:
709- child_gdfo = parent_node.gdfo + 1
710- if missing_parent:
711- # One of the parents is not numbered, so wait until we get
712- # back here
713- continue
714- child_node.gdfo = child_gdfo
715- if replace_node:
716- heapreplace(todo, (child_gdfo, child_node))
717- replace_node = 0
718- else:
719- heappush(todo, (child_gdfo, child_node))
720- if replace_node:
721- heappop(todo)
722+ known_parent_gdfos[node] = 0
723+ pending.append(node)
724+
725+ while pending:
726+ node = <_KnownGraphNode>pending.pop()
727+ for child in node.children:
728+ try:
729+ known_parents = known_parent_gdfos[child.key]
730+ except KeyError:
731+ known_parents = 0
732+ known_parent_gdfos[child.key] = known_parents + 1
733+ if child.gdfo is None or node.gdfo + 1 > child.gdfo:
734+ child.gdfo = node.gdfo + 1
735+ if known_parent_gdfos[child.key] == len(child.parents):
736+                    # We are the last parent updating that node; we can
737+                    # continue from there.
738+ pending.append(child)
739
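The replacement `_find_gdfo` numbers every node with its generation distance from origin (gdfo): tails get 1, every other node gets max(parent gdfos) + 1, and a node is only queued once all of its parents are numbered. A rough pure-Python sketch of that counting scheme (dict-based; names are illustrative, not the bzrlib API):

```python
def find_gdfo(parent_map):
    # parent_map: key -> tuple of parent keys. Build the child lists and
    # collect the tails (nodes with no parents), as _find_tails does.
    children = {k: [] for k in parent_map}
    tails = []
    for key, parents in parent_map.items():
        if not parents:
            tails.append(key)
        for p in parents:
            children[p].append(key)

    gdfo = {}
    seen_parents = {}        # counts numbered parents, like known_parent_gdfos
    pending = list(tails)
    for t in tails:
        gdfo[t] = 1          # tails start the numbering at 1
    while pending:
        key = pending.pop()
        for child in children[key]:
            seen_parents[child] = seen_parents.get(child, 0) + 1
            if gdfo.get(child, 0) < gdfo[key] + 1:
                gdfo[child] = gdfo[key] + 1
            if seen_parents[child] == len(parent_map[child]):
                # Last parent numbered: safe to continue from this child.
                pending.append(child)
    return gdfo
```

Unlike the removed heap-based version, nothing ever needs re-numbering: deferring a child until its last parent arrives guarantees its gdfo is final when it is queued.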
740 def heads(self, keys):
741 """Return the heads from amongst keys.
742@@ -364,9 +221,8 @@
743 This is done by searching the ancestries of each key. Any key that is
744 reachable from another key is not returned; all the others are.
745
746- This operation scales with the relative depth between any two keys. If
747- any two keys are completely disconnected all ancestry of both sides
748- will be retrieved.
749+ This operation scales with the relative depth between any two keys. It
750+ uses gdfo to avoid walking all ancestry.
751
752 :param keys: An iterable of keys.
753 :return: A set of the heads. Note that as a set there is no ordering
754@@ -375,12 +231,13 @@
755 """
756 cdef PyObject *maybe_node
757 cdef PyObject *maybe_heads
758+ cdef PyObject *temp_node
759+ cdef _KnownGraphNode node
760
761 heads_key = PyFrozenSet_New(keys)
762 maybe_heads = PyDict_GetItem(self._known_heads, heads_key)
763 if maybe_heads != NULL:
764 return <object>maybe_heads
765-
766 # Not cached, compute it ourselves
767 candidate_nodes = {}
768 nodes = self._nodes
769@@ -398,208 +255,30 @@
770 heads_key = PyFrozenSet_New(candidate_nodes)
771 if len(candidate_nodes) < 2:
772 return heads_key
773- dom_to_node = self._get_dominators_to_nodes(candidate_nodes)
774- if PyDict_Size(candidate_nodes) < 2:
775- return frozenset(candidate_nodes)
776- dom_lookup_key, heads = self._heads_from_dominators(candidate_nodes,
777- dom_to_node)
778- if heads is not None:
779- if self.do_cache:
780- # This heads was not in the cache, or it would have been caught
781- # earlier, but the dom head *was*, so do the simple cache
782- PyDict_SetItem(self._known_heads, heads_key, heads)
783- return heads
784- heads = self._heads_from_candidate_nodes(candidate_nodes, dom_to_node)
785+
786+ seen = set()
787+ pending = []
788+ cdef Py_ssize_t pos
789+ pos = 0
790+ min_gdfo = None
791+ while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
792+ node = <_KnownGraphNode>temp_node
793+ if node.parents is not None:
794+ pending.extend(node.parents)
795+ if min_gdfo is None or node.gdfo < min_gdfo:
796+ min_gdfo = node.gdfo
797+ nodes = self._nodes
798+ while pending:
799+ node = pending.pop()
800+ if node.key in seen:
801+ # node already appears in some ancestry
802+ continue
803+ seen.add(node.key)
804+ if node.gdfo <= min_gdfo:
805+ continue
806+ if node.parents:
807+ pending.extend(node.parents)
808+ heads = heads_key.difference(seen)
809 if self.do_cache:
810- self._cache_heads(heads, heads_key, dom_lookup_key, candidate_nodes)
811+ self._known_heads[heads_key] = heads
812 return heads
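The rewritten `heads()` uses gdfo for pruning: once the walk reaches a node whose gdfo is at or below the shallowest candidate's, none of that node's ancestors can be one of the candidates, so the search stops there. A rough dict-based sketch of the same walk (hypothetical names; assumes gdfo values have already been assigned, tails = 1 and child = max(parent gdfos) + 1):

```python
def heads(parent_map, gdfo, keys):
    # A key is a head unless it is reachable from another key.
    candidates = frozenset(keys)
    if len(candidates) < 2:
        return candidates
    min_gdfo = min(gdfo[k] for k in candidates)
    pending = []
    for k in candidates:
        pending.extend(parent_map.get(k, ()))
    seen = set()
    while pending:
        key = pending.pop()
        if key in seen:
            # already appears in some ancestry
            continue
        seen.add(key)
        if gdfo.get(key, 0) <= min_gdfo:
            # Every ancestor of this node has a strictly smaller gdfo
            # than the shallowest candidate, so stop walking here.
            continue
        pending.extend(parent_map.get(key, ()))
    # Anything walked over is an ancestor of some candidate, not a head.
    return candidates - seen
```

Note that a candidate that is itself reached during the walk lands in `seen` before the gdfo check, so it is correctly dropped even when it sits at `min_gdfo`.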
813-
814- cdef object _cache_heads(self, heads, heads_key, dom_lookup_key,
815- candidate_nodes):
816- cdef PyObject *maybe_node
817- cdef _KnownGraphNode node
818-
819- PyDict_SetItem(self._known_heads, heads_key, heads)
820- dom_heads = []
821- for key in heads:
822- maybe_node = PyDict_GetItem(candidate_nodes, key)
823- if maybe_node == NULL:
824- raise KeyError
825- node = <_KnownGraphNode>maybe_node
826- PyList_Append(dom_heads, node.linear_dominator_node.key)
827- PyDict_SetItem(self._known_heads, dom_lookup_key,
828- PyFrozenSet_New(dom_heads))
829-
830- cdef _get_dominators_to_nodes(self, candidate_nodes):
831- """Get the reverse mapping from dominator_key => candidate_nodes.
832-
833- As a side effect, this can also remove potential candidate nodes if we
834- determine that they share a dominator.
835- """
836- cdef Py_ssize_t pos
837- cdef _KnownGraphNode node, other_node
838- cdef PyObject *temp_node
839- cdef PyObject *maybe_node
840-
841- dom_to_node = {}
842- keys_to_remove = []
843- pos = 0
844- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
845- node = <_KnownGraphNode>temp_node
846- dom_key = node.linear_dominator_node.key
847- maybe_node = PyDict_GetItem(dom_to_node, dom_key)
848- if maybe_node == NULL:
849- PyDict_SetItem(dom_to_node, dom_key, node)
850- else:
851- other_node = <_KnownGraphNode>maybe_node
852- # These nodes share a dominator, one of them obviously
853- # supersedes the other, figure out which
854- if other_node.gdfo > node.gdfo:
855- PyList_Append(keys_to_remove, node.key)
856- else:
857- # This wins, replace the other
858- PyList_Append(keys_to_remove, other_node.key)
859- PyDict_SetItem(dom_to_node, dom_key, node)
860- for pos from 0 <= pos < PyList_GET_SIZE(keys_to_remove):
861- key = <object>PyList_GET_ITEM(keys_to_remove, pos)
862- candidate_nodes.pop(key)
863- return dom_to_node
864-
865- cdef object _heads_from_dominators(self, candidate_nodes, dom_to_node):
866- cdef PyObject *maybe_heads
867- cdef PyObject *maybe_node
868- cdef _KnownGraphNode node
869- cdef Py_ssize_t pos
870- cdef PyObject *temp_node
871-
872- dom_list_key = []
873- pos = 0
874- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
875- node = <_KnownGraphNode>temp_node
876- PyList_Append(dom_list_key, node.linear_dominator_node.key)
877- dom_lookup_key = PyFrozenSet_New(dom_list_key)
878- maybe_heads = PyDict_GetItem(self._known_heads, dom_lookup_key)
879- if maybe_heads == NULL:
880- return dom_lookup_key, None
881- # We need to map back from the dominator head to the original keys
882- dom_heads = <object>maybe_heads
883- heads = []
884- for dom_key in dom_heads:
885- maybe_node = PyDict_GetItem(dom_to_node, dom_key)
886- if maybe_node == NULL:
887- # Should never happen
888- raise KeyError
889- node = <_KnownGraphNode>maybe_node
890- PyList_Append(heads, node.key)
891- return dom_lookup_key, PyFrozenSet_New(heads)
892-
893- cdef int _process_parent(self, _KnownGraphNode node,
894- _KnownGraphNode parent_node,
895- candidate_nodes, dom_to_node,
896- queue, int *replace_item) except -1:
897- """Process the parent of a node, seeing if we need to walk it."""
898- cdef PyObject *maybe_candidate
899- cdef PyObject *maybe_node
900- cdef _KnownGraphNode dom_child_node
901- maybe_candidate = PyDict_GetItem(candidate_nodes, parent_node.key)
902- if maybe_candidate != NULL:
903- candidate_nodes.pop(parent_node.key)
904- # We could pass up a flag that tells the caller to stop processing,
905- # but it doesn't help much, and makes the code uglier
906- return 0
907- maybe_node = PyDict_GetItem(dom_to_node, parent_node.key)
908- if maybe_node != NULL:
909- # This is a dominator of a node
910- dom_child_node = <_KnownGraphNode>maybe_node
911- if dom_child_node is not node:
912- # It isn't a dominator of a node we are searching, so we should
913- # remove it from the search
914- maybe_candidate = PyDict_GetItem(candidate_nodes, dom_child_node.key)
915- if maybe_candidate != NULL:
916- candidate_nodes.pop(dom_child_node.key)
917- return 0
918- if parent_node.ancestor_of is None:
919- # This node hasn't been walked yet, so just project node's ancestor
920- # info directly to parent_node, and enqueue it for later processing
921- parent_node.ancestor_of = node.ancestor_of
922- if replace_item[0]:
923- heapreplace(queue, (-parent_node.gdfo, parent_node))
924- replace_item[0] = 0
925- else:
926- heappush(queue, (-parent_node.gdfo, parent_node))
927- PyList_Append(self._to_cleanup, parent_node)
928- elif parent_node.ancestor_of != node.ancestor_of:
929- # Combine to get the full set of parents
930- # Rewrite using PySet_* functions, unfortunately you have to use
931- # PySet_Add since there is no PySet_Update... :(
932- all_ancestors = set(parent_node.ancestor_of)
933- for k in node.ancestor_of:
934- PySet_Add(all_ancestors, k)
935- parent_node.ancestor_of = tuple(sorted(all_ancestors))
936- return 0
937-
938- cdef object _heads_from_candidate_nodes(self, candidate_nodes, dom_to_node):
939- cdef _KnownGraphNode node
940- cdef _KnownGraphNode parent_node
941- cdef Py_ssize_t num_candidates
942- cdef int num_parents, replace_item
943- cdef Py_ssize_t pos
944- cdef PyObject *temp_node
945-
946- queue = []
947- pos = 0
948- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
949- node = <_KnownGraphNode>temp_node
950- node.ancestor_of = (node.key,)
951- PyList_Append(queue, (-node.gdfo, node))
952- PyList_Append(self._to_cleanup, node)
953- heapify(queue)
954- # These are nodes that we determined are 'common' that we are no longer
955- # walking
956- # Now we walk nodes until all nodes that are being walked are 'common'
957- num_candidates = len(candidate_nodes)
958- replace_item = 0
959- while PyList_GET_SIZE(queue) > 0 and PyDict_Size(candidate_nodes) > 1:
960- if replace_item:
961- # We still need to pop the smallest member out of the queue
962- # before we peek again
963- heappop(queue)
964- if PyList_GET_SIZE(queue) == 0:
965- break
966- # peek at the smallest item. We don't pop, because we expect we'll
967- # need to push more things into the queue anyway
968- node = _peek_node(queue)
969- replace_item = 1
970- if PyTuple_GET_SIZE(node.ancestor_of) == num_candidates:
971- # This node is now considered 'common'
972- # Make sure all parent nodes are marked as such
973- for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
974- parent_node = _get_parent(node.parents, pos)
975- if parent_node.ancestor_of is not None:
976- parent_node.ancestor_of = node.ancestor_of
977- if node.linear_dominator_node is not node:
978- parent_node = node.linear_dominator_node
979- if parent_node.ancestor_of is not None:
980- parent_node.ancestor_of = node.ancestor_of
981- continue
982- if node.parents is None:
983- # This is a ghost
984- continue
985- # Now project the current nodes ancestor list to the parent nodes,
986- # and queue them up to be walked
987- if node.linear_dominator_node is not node:
988- # We are at the tip of a long linear region
989- # We know that there is nothing between here and the tail
990- # that is interesting, so skip to the end
991- self._process_parent(node, node.linear_dominator_node,
992- candidate_nodes, dom_to_node, queue, &replace_item)
993- else:
994- for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
995- parent_node = _get_parent(node.parents, pos)
996- self._process_parent(node, parent_node, candidate_nodes,
997- dom_to_node, queue, &replace_item)
998- for pos from 0 <= pos < PyList_GET_SIZE(self._to_cleanup):
999- node = _get_list_node(self._to_cleanup, pos)
1000- node.ancestor_of = None
1001- self._to_cleanup = []
1002- return PyFrozenSet_New(candidate_nodes)
1003
1004=== modified file 'bzrlib/tests/test__known_graph.py'
1005--- bzrlib/tests/test__known_graph.py 2009-06-16 15:35:14 +0000
1006+++ bzrlib/tests/test__known_graph.py 2009-06-19 02:35:18 +0000
1007@@ -71,10 +71,6 @@
1008 def make_known_graph(self, ancestry):
1009 return self.module.KnownGraph(ancestry, do_cache=self.do_cache)
1010
1011- def assertDominator(self, graph, rev, dominator):
1012- node = graph._nodes[rev]
1013- self.assertEqual(dominator, node.linear_dominator)
1014-
1015 def assertGDFO(self, graph, rev, gdfo):
1016 node = graph._nodes[rev]
1017 self.assertEqual(gdfo, node.gdfo)
1018@@ -88,29 +84,6 @@
1019 self.assertEqual(['rev4'], sorted(graph._nodes['rev3'].child_keys))
1020 self.assertEqual(['rev4'], sorted(graph._nodes['rev2b'].child_keys))
1021
1022- def test_dominators_ancestry_1(self):
1023- graph = self.make_known_graph(test_graph.ancestry_1)
1024- self.assertDominator(graph, 'rev1', NULL_REVISION)
1025- self.assertDominator(graph, 'rev2b', 'rev2b')
1026- self.assertDominator(graph, 'rev2a', 'rev2a')
1027- self.assertDominator(graph, 'rev3', 'rev2a')
1028- self.assertDominator(graph, 'rev4', 'rev4')
1029-
1030- def test_dominators_feature_branch(self):
1031- graph = self.make_known_graph(test_graph.feature_branch)
1032- self.assertDominator(graph, 'rev1', NULL_REVISION)
1033- self.assertDominator(graph, 'rev2b', NULL_REVISION)
1034- self.assertDominator(graph, 'rev3b', NULL_REVISION)
1035-
1036- def test_dominators_extended_history_shortcut(self):
1037- graph = self.make_known_graph(test_graph.extended_history_shortcut)
1038- self.assertDominator(graph, 'a', NULL_REVISION)
1039- self.assertDominator(graph, 'b', 'b')
1040- self.assertDominator(graph, 'c', 'b')
1041- self.assertDominator(graph, 'd', 'b')
1042- self.assertDominator(graph, 'e', 'e')
1043- self.assertDominator(graph, 'f', 'f')
1044-
1045 def test_gdfo_ancestry_1(self):
1046 graph = self.make_known_graph(test_graph.ancestry_1)
1047 self.assertGDFO(graph, 'rev1', 2)
1048@@ -229,3 +202,14 @@
1049 self.assertEqual(set(['z']), graph.heads(['w', 's', 'z']))
1050 self.assertEqual(set(['w', 'q']), graph.heads(['w', 's', 'q']))
1051 self.assertEqual(set(['z']), graph.heads(['s', 'z']))
1052+
1053+ def test_heads_with_ghost(self):
1054+ graph = self.make_known_graph(test_graph.with_ghost)
1055+ self.assertEqual(set(['e', 'g']), graph.heads(['e', 'g']))
1056+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c']))
1057+ self.assertEqual(set(['a', 'g']), graph.heads(['a', 'g']))
1058+ self.assertEqual(set(['f', 'g']), graph.heads(['f', 'g']))
1059+ self.assertEqual(set(['c']), graph.heads(['c', 'g']))
1060+ self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g']))
1061+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g']))
1062+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f']))
1063
1064=== modified file 'tools/time_graph.py'
1065--- tools/time_graph.py 2009-06-12 18:05:15 +0000
1066+++ tools/time_graph.py 2009-06-19 02:35:18 +0000
1067@@ -16,12 +16,15 @@
1068 from bzrlib.ui import text
1069
1070 p = optparse.OptionParser()
1071+p.add_option('--quick', default=False, action='store_true')
1072 p.add_option('--max-combinations', default=500, type=int)
1073 p.add_option('--lsprof', default=None, type=str)
1074 opts, args = p.parse_args(sys.argv[1:])
1075+
1076 trace.enable_default_logging()
1077 ui.ui_factory = text.TextUIFactory()
1078
1079+begin = time.clock()
1080 if len(args) >= 1:
1081 b = branch.Branch.open(args[0])
1082 else:
1083@@ -33,8 +36,9 @@
1084 if p[1] is not None)
1085 finally:
1086 b.unlock()
1087+end = time.clock()
1088
1089-print 'Found %d nodes' % (len(parent_map),)
1090+print 'Found %d nodes, loaded in %.3fs' % (len(parent_map), end - begin)
1091
1092 def all_heads_comp(g, combinations):
1093 h = []
1094@@ -47,6 +51,7 @@
1095 finally:
1096 pb.finished()
1097 return h
1098+
1099 combinations = []
1100 # parents = parent_map.keys()
1101 # for p1 in parents:
1102@@ -65,33 +70,48 @@
1103 combinations = random.sample(combinations, opts.max_combinations)
1104
1105 print ' %d combinations' % (len(combinations),)
1106-t1 = time.clock()
1107-known_g = _known_graph_py.KnownGraph(parent_map)
1108-if opts.lsprof is not None:
1109- h_known = commands.apply_lsprofiled(opts.lsprof,
1110- all_heads_comp, known_g, combinations)
1111-else:
1112- h_known = all_heads_comp(known_g, combinations)
1113-t2 = time.clock()
1114-print "Known: %.3fs" % (t2-t1,)
1115-print " %s" % (graph._counters,)
1116-t1 = time.clock()
1117-known_g = _known_graph_pyx.KnownGraph(parent_map)
1118-if opts.lsprof is not None:
1119- h_known = commands.apply_lsprofiled(opts.lsprof,
1120- all_heads_comp, known_g, combinations)
1121-else:
1122- h_known = all_heads_comp(known_g, combinations)
1123-t2 = time.clock()
1124-print "Known (pyx): %.3fs" % (t2-t1,)
1125-print " %s" % (graph._counters,)
1126-simple_g = graph.Graph(graph.DictParentsProvider(parent_map))
1127-graph._counters[1] = 0
1128-graph._counters[2] = 0
1129-h_simple = all_heads_comp(simple_g, combinations)
1130-t3 = time.clock()
1131-print "Orig: %.3fs" % (t3-t2,)
1132-print " %s" % (graph._counters,)
1133-if h_simple != h_known:
1134- import pdb; pdb.set_trace()
1135-print 'ratio: %.3fs' % ((t2-t1) / (t3-t2))
1136+
1137+def combi_graph(graph_klass, comb):
1138+ # DEBUG
1139+ graph._counters[1] = 0
1140+ graph._counters[2] = 0
1141+
1142+ begin = time.clock()
1143+ g = graph_klass(parent_map)
1144+ if opts.lsprof is not None:
1145+ heads = commands.apply_lsprofiled(opts.lsprof, all_heads_comp, g, comb)
1146+ else:
1147+ heads = all_heads_comp(g, comb)
1148+ end = time.clock()
1149+ return dict(elapsed=(end - begin), graph=g, heads=heads)
1150+
1151+def report(name, g):
1152+ print '%s: %.3fs' % (name, g['elapsed'])
1153+ counters_used = False
1154+ for c in graph._counters:
1155+ if c:
1156+ counters_used = True
1157+ if counters_used:
1158+ print ' %s' % (graph._counters,)
1159+
1160+known_python = combi_graph(_known_graph_py.KnownGraph, combinations)
1161+report('Known', known_python)
1162+
1163+known_pyrex = combi_graph(_known_graph_pyx.KnownGraph, combinations)
1164+report('Known (pyx)', known_pyrex)
1165+
1166+def _simple_graph(parent_map):
1167+ return graph.Graph(graph.DictParentsProvider(parent_map))
1168+
1169+if opts.quick:
1170+ if known_python['heads'] != known_pyrex['heads']:
1171+ import pdb; pdb.set_trace()
1172+    print 'ratio: %.3f' % (known_pyrex['elapsed'] / known_python['elapsed'])
1173+else:
1174+ orig = combi_graph(_simple_graph, combinations)
1175+ report('Orig', orig)
1176+
1177+ if orig['heads'] != known_pyrex['heads']:
1178+ import pdb; pdb.set_trace()
1179+
1180+    print 'ratio: %.3f' % (known_pyrex['elapsed'] / orig['elapsed'])