Merge lp:~jameinel/bzr/jam-gdfo-heads into lp:~bzr/bzr/trunk-old

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/jam-gdfo-heads
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 1241 lines
To merge this branch: bzr merge lp:~jameinel/bzr/jam-gdfo-heads
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+7694@code.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

So this is my followup to Vincent's new algorithm for KnownGraph.heads(). The only real changes are to tweak the Pyrex code to use direct member access.

There are some wins from direct access to members, but the big win is changing the algorithm to avoid calling list.pop(). Instead we just track the index of the item we are on and overwrite that slot; if we find we need more space, we append. (Note there is a specific edge case when popping short lists: CPython reallocs whenever the size shrinks to 1/2 the allocated size. If you only have 2 items, that is *every* pop, and then every append following it triggers another realloc.)
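The slot-reuse pattern can be sketched in plain Python (a simplified illustration of the idea only, not the Pyrex code from this branch; the `children` key on the work items is made up for the example):

```python
def drain(pending):
    """Process a worklist without calling list.pop().

    Rather than popping (which can trigger a realloc each time a short
    list shrinks), track the index of the last live item.  When queueing
    a new item, overwrite the next slot if it already exists, and only
    append when the list genuinely needs to grow.
    """
    processed = []
    last = len(pending) - 1
    while last >= 0:
        item = pending[last]   # peek instead of pop
        last -= 1
        processed.append(item)
        for child in item.get('children', ()):
            last += 1
            if last < len(pending):
                pending[last] = child   # reuse an existing slot
            else:
                pending.append(child)   # grow only when necessary
    return processed
```

The Pyrex version in the diff below does the same thing with PyList_SetItem/PyList_Append, which is what avoids the realloc churn described above.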

Also, this changes away from using a 'seen' dictionary to using a .seen attribute.

Here are my times:

            annotate NEWS    time_graph (all 7k)
bzr.dev     11.6s            71.4s
vila pyx    11.5s             1.102s
jam pyx      9.7s             0.362s

So while Vincent's update is godlike for whole-ancestry heads() calls, it doesn't show much of an effect for NEWS. I'm pretty happy to find that with a bit of tweaking the new algorithm still gives another 10% of total annotate performance (note that 'resolve right' is only 8.6s, so we are only spending ~1s resolving something like 200k heads now.)

The main difference between my old heads() and Vincent's is that I was effectively doing a 'find_unique_ancestors' walk, while he realized that you don't have to walk any ancestors of the lowest (by gdfo) node.
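The gdfo bound can be sketched in pure Python (a minimal illustration assuming a plain dict-based parent_map and a precomputed gdfo dict; the real implementations are in the diff below):

```python
def heads(parent_map, gdfo, keys):
    """Return the subset of keys not reachable from another key.

    gdfo maps key -> greatest distance from origin.  Every ancestor of
    a node has a strictly smaller gdfo, so once a walked node's gdfo
    drops to the minimum candidate gdfo, nothing below it can be one of
    our candidates and we can stop expanding it.
    """
    candidates = set(keys)
    min_gdfo = min(gdfo[k] for k in candidates)
    seen = set()
    pending = [p for k in candidates for p in parent_map.get(k, ())]
    while pending:
        key = pending.pop()
        if key in seen:
            continue
        seen.add(key)
        if gdfo[key] <= min_gdfo:
            # Too shallow to have any candidate as a descendant ancestor
            continue
        pending.extend(parent_map.get(key, ()))
    return candidates - seen
```

With two completely disconnected keys of similar gdfo, `pending` drains almost immediately, which is why the disconnected-heads case above costs milliseconds instead of a whole-ancestry walk.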

One other bit of interest in the new code: we track how many nodes we have to walk in order to compute heads() (the count used to live in 'cleanup', then in 'seen', and is now in 'cleanup' again).
The bzr entry walking the most nodes is:
    <email address hidden>
    <email address hidden>

(landing the groupcompress code into brisbane-core, IIRC)
We walk 22k revisions (out of 25.4k total...) because the plugin gdfo is 182, while the bzr gdfo is 7150. So something very close to null versus something about as far away as we can get in bzr. The time to compute heads() between the two... 5ms. Yes, that is 0.005s to walk 22k revisions after computing gdfo and other such bits, and determine that neither revision is an ancestor of the other.

I did end up spending some time to tweak the _find_gdfo() code as well. Once you are around 400ms for doing heads() for every merge revision, suddenly the 100ms or so spent in _find_gdfo starts to be a factor :).
I found that computing the tails 'for free' using a set was *not* free, and was, in fact, a lot slower than walking the dict after the fact. The issue is that parent_map iteration is effectively random, so you end up pushing a lot of nodes into the set only to remove them again. In contrast, it takes only 40ms to walk 200k nodes in the OOo dict and find the small handful of tails we actually care about.
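Both strategies can be sketched in plain Python (an illustration of the trade-off only; ghost handling is simplified compared to the real _initialize_nodes):

```python
def tails_incremental(parent_map):
    """Maintain a candidate-tails set while scanning parent_map.

    Nodes first met as a parent are provisionally tails and get evicted
    when a later entry shows they have parents of their own.  With
    effectively random dict ordering this causes a lot of add/discard
    churn, which is what made it slower in practice.
    """
    tails = set()
    have_parents = set()
    for key, parents in parent_map.items():
        if parents:
            have_parents.add(key)
            tails.discard(key)
        else:
            tails.add(key)
        for parent in parents:
            if parent not in have_parents:
                tails.add(parent)
    return tails


def tails_afterwards(parent_map):
    """One cheap pass over the finished map: keys with no parents, plus
    ghost parents that never appear as keys themselves."""
    tails = set(k for k, ps in parent_map.items() if not ps)
    for ps in parent_map.values():
        tails.update(p for p in ps if p not in parent_map)
    return tails
```

Both return the same set; the second just defers the work until the dict is fully built, which is the approach that won here.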

_find_gdfo is down to around 9ms for bzr.dev. The bulk of the time in __init__ is now in _initialize_nodes, which is doing mallocs for all the objects. Removing _find_gdfo gives 35.5ms for bzr.dev and 407ms for OOo, versus 44.4ms and 563ms with it.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> John A Meinel has proposed merging
    jam> lp:~jameinel/bzr/jam-gdfo-heads into lp:bzr.

 review approve

    jam> Requested reviews:
    jam> bzr-core (bzr-core)

    jam> So this is my followup to Vincent's new algorithm for
    jam> KnownGraph.heads() The only real changes are to tweak the Pyrex
    jam> code to use direct access.

    jam> There are some wins by changing to using direct access to
    jam> members, and the big win is to change the algorithm to avoid
    jam> calling list.pop(). Instead we just track what item we are on and
    jam> insert an item there. If we find we need more space then we
    jam> append. (Note there is a specific edge case when you are popping
    jam> short lists as it reallocs() whenever the size shrinks to 1/2 the
    jam> max size. However if you only have 2 items, that is *every* pop,
    jam> and then every append following it will trigger another realloc.)

    jam> Also, this changes away from using a 'seen' dictionary to using a
    jam> .seen attribute.

    jam> Here are my times:

    jam> annotate NEWS time_graph (all 7k)
    jam> bzr.dev 11.6s 71.4s
    jam> vila pyx 11.5s 1.102s
    jam> jam pyx 9.7s 0.362s

Blam :) Well done !

But while I can reproduce the time_graph times (taking into
account that my test config is faster than yours), I can't
reproduce the NEWS ones; are you sure you compiled the extensions
before measuring these times? Here are the ones I can reproduce:

            annotate NEWS    time_graph (all 7k)
bzr.dev pyx  6.62s           45.390s
vila pyx     5.92s            0.510s
jam pyx      5.45s            0.180s

If we can't reliably reproduce each other's experiments, I think
it's worth investigating why... I used bzr.dev@4459; what about
yours, for example? Could it be that using an SSD introduces such
a distortion (worth isolating IO-related penalties, though I can't
see where in the code touched by this patch)? By the way, I
measure annotate with --show-ids and >/dev/null; I'm pretty sure
you do the same, but just in case.

    jam> So while Vincent's update is godlike for whole-ancestry heads()
    jam> calls, it doesn't show much of an effect for NEWS.

And for mysql:

            annotate sql/mysqld.cc    time_graph (all 34k)
bzr.dev pyx 18.87s                    1751.760s
vila pyx    18.42s                      17.190s
jam pyx     18.43s                       4.030s

Note that I took a lot of measurements here to break the tie, so
the difference is a bit less obvious than for NEWS. Your version
*is* better on average; mine got the single best score, but the
variation is high and I don't have a good explanation for that
(except that the time spent in heads is now so low that we don't
measure it properly in that case).

<snip/>

    jam> (note that 'resolve right' is only 8.6s, so we are only
    jam> spending ~1s resolving something like 200k heads now.)

And being qualitatively better !

<snip/>

    jam> I did end up spending some time to tweak the _find_gdfo() code as
    jam> well. Once you are around 400ms for doing heads() for e...


review: Approve

Preview Diff

1=== modified file 'bzrlib/_known_graph_py.py'
2--- bzrlib/_known_graph_py.py 2009-06-16 15:35:14 +0000
3+++ bzrlib/_known_graph_py.py 2009-06-20 08:35:13 +0000
4@@ -17,8 +17,6 @@
5 """Implementation of Graph algorithms when we have already loaded everything.
6 """
7
8-import heapq
9-
10 from bzrlib import (
11 revision,
12 )
13@@ -27,27 +25,19 @@
14 class _KnownGraphNode(object):
15 """Represents a single object in the known graph."""
16
17- __slots__ = ('key', 'parent_keys', 'child_keys', 'linear_dominator',
18- 'gdfo', 'ancestor_of')
19+ __slots__ = ('key', 'parent_keys', 'child_keys', 'gdfo')
20
21 def __init__(self, key, parent_keys):
22 self.key = key
23 self.parent_keys = parent_keys
24 self.child_keys = []
25- # oldest ancestor, such that no parents between here and there have >1
26- # child or >1 parent.
27- self.linear_dominator = None
28 # Greatest distance from origin
29 self.gdfo = None
30- # This will become a tuple of known heads that have this node as an
31- # ancestor
32- self.ancestor_of = None
33
34 def __repr__(self):
35- return '%s(%s gdfo:%s par:%s child:%s %s)' % (
36+ return '%s(%s gdfo:%s par:%s child:%s)' % (
37 self.__class__.__name__, self.key, self.gdfo,
38- self.parent_keys, self.child_keys,
39- self.linear_dominator)
40+ self.parent_keys, self.child_keys)
41
42
43 class KnownGraph(object):
44@@ -63,21 +53,28 @@
45 self._known_heads = {}
46 self.do_cache = do_cache
47 self._initialize_nodes(parent_map)
48- self._find_linear_dominators()
49 self._find_gdfo()
50
51 def _initialize_nodes(self, parent_map):
52 """Populate self._nodes.
53
54- After this has finished, self._nodes will have an entry for every entry
55- in parent_map. Ghosts will have a parent_keys = None, all nodes found
56- will also have .child_keys populated with all known child_keys.
57+ After this has finished:
58+ - self._nodes will have an entry for every entry in parent_map.
59+ - ghosts will have a parent_keys = None,
60+ - all nodes found will also have .child_keys populated with all known
61+ child_keys,
62+ - self._tails will list all the nodes without parents.
63 """
64+ tails = self._tails = set()
65 nodes = self._nodes
66 for key, parent_keys in parent_map.iteritems():
67 if key in nodes:
68 node = nodes[key]
69 node.parent_keys = parent_keys
70+ if parent_keys:
71+ # This node has been added before being seen in parent_map
72+ # (see below)
73+ tails.remove(node)
74 else:
75 node = _KnownGraphNode(key, parent_keys)
76 nodes[key] = node
77@@ -87,129 +84,41 @@
78 except KeyError:
79 parent_node = _KnownGraphNode(parent_key, None)
80 nodes[parent_key] = parent_node
81+ # Potentially a tail, if we're wrong we'll remove it later
82+ # (see above)
83+ tails.add(parent_node)
84 parent_node.child_keys.append(key)
85
86- def _find_linear_dominators(self):
87- """For each node in the set, find any linear dominators.
88-
89- For any given node, the 'linear dominator' is an ancestor, such that
90- all parents between this node and that one have a single parent, and a
91- single child. So if A->B->C->D then B,C,D all have a linear dominator
92- of A.
93-
94- There are two main benefits:
95- 1) When walking the graph, we can jump to the nearest linear dominator,
96- rather than walking all of the nodes inbetween.
97- 2) When caching heads() results, dominators give the "same" results as
98- their children. (If the dominator is a head, then the descendant is
99- a head, if the dominator is not a head, then the child isn't
100- either.)
101- """
102- def check_node(node):
103- if node.parent_keys is None or len(node.parent_keys) != 1:
104- # This node is either a ghost, a tail, or has multiple parents
105- # It its own dominator
106- node.linear_dominator = node.key
107- return None
108- parent_node = self._nodes[node.parent_keys[0]]
109- if len(parent_node.child_keys) > 1:
110- # The parent has multiple children, so *this* node is the
111- # dominator
112- node.linear_dominator = node.key
113- return None
114- # The parent is already filled in, so add and continue
115- if parent_node.linear_dominator is not None:
116- node.linear_dominator = parent_node.linear_dominator
117- return None
118- # We don't know this node, or its parent node, so start walking to
119- # next
120- return parent_node
121-
122- for node in self._nodes.itervalues():
123- # The parent is not filled in, so walk until we get somewhere
124- if node.linear_dominator is not None: #already done
125- continue
126- next_node = check_node(node)
127- if next_node is None:
128- # Nothing more needs to be done
129- continue
130- stack = []
131- while next_node is not None:
132- stack.append(node)
133- node = next_node
134- next_node = check_node(node)
135- # The stack now contains the linear chain, and 'node' should have
136- # been labeled
137- dominator = node.linear_dominator
138- while stack:
139- next_node = stack.pop()
140- next_node.linear_dominator = dominator
141- node = next_node
142-
143 def _find_gdfo(self):
144- def find_tails():
145- return [node for node in self._nodes.itervalues()
146- if not node.parent_keys]
147- tails = find_tails()
148- todo = []
149- heappush = heapq.heappush
150- heappop = heapq.heappop
151 nodes = self._nodes
152- for node in tails:
153+ known_parent_gdfos = {}
154+ pending = []
155+
156+ for node in self._tails:
157 node.gdfo = 1
158- heappush(todo, (1, node))
159- processed = 0
160- while todo:
161- gdfo, next = heappop(todo)
162- processed += 1
163- if next.gdfo is not None and gdfo < next.gdfo:
164- # This node was reached from a longer path, we assume it was
165- # enqued correctly with the longer gdfo, so don't continue
166- # processing now
167- continue
168- next_gdfo = gdfo + 1
169- for child_key in next.child_keys:
170- child_node = nodes[child_key]
171- if child_node.gdfo is None or child_node.gdfo < next_gdfo:
172- # Only enque children when all of their parents have been
173- # resolved
174- for parent_key in child_node.parent_keys:
175- # We know that 'this' parent is counted
176- if parent_key != next.key:
177- parent_node = nodes[parent_key]
178- if parent_node.gdfo is None:
179- break
180- else:
181- child_node.gdfo = next_gdfo
182- heappush(todo, (next_gdfo, child_node))
183-
184- def _get_dominators_to_nodes(self, candidate_nodes):
185- """Get the reverse mapping from dominator_key => candidate_nodes.
186-
187- As a side effect, this can also remove potential candidate nodes if we
188- determine that they share a dominator.
189- """
190- dom_to_node = {}
191- keys_to_remove = []
192- for node in candidate_nodes.values():
193- if node.linear_dominator in dom_to_node:
194- # This node already exists, resolve which node supersedes the
195- # other
196- other_node = dom_to_node[node.linear_dominator]
197- # There should be no way that nodes sharing a dominator could
198- # 'tie' for gdfo
199- if other_node.gdfo > node.gdfo:
200- # The other node has this node as an ancestor
201- keys_to_remove.append(node.key)
202- else:
203- # Replace the other node, and set this as the new key
204- keys_to_remove.append(other_node.key)
205- dom_to_node[node.linear_dominator] = node
206- else:
207- dom_to_node[node.linear_dominator] = node
208- for key in keys_to_remove:
209- candidate_nodes.pop(key)
210- return dom_to_node
211+ pending.append(node)
212+
213+ while pending:
214+ node = pending.pop()
215+ for child_key in node.child_keys:
216+ child = nodes[child_key]
217+ if child_key in known_parent_gdfos:
218+ known_gdfo = known_parent_gdfos[child_key] + 1
219+ present = True
220+ else:
221+ known_gdfo = 1
222+ present = False
223+ if child.gdfo is None or node.gdfo + 1 > child.gdfo:
224+ child.gdfo = node.gdfo + 1
225+ if known_gdfo == len(child.parent_keys):
226+ # We are the last parent updating that node, we can
227+ # continue from there
228+ pending.append(child)
229+ if present:
230+ del known_parent_gdfos[child_key]
231+ else:
232+ # Update known_parent_gdfos for a key we couldn't process
233+ known_parent_gdfos[child_key] = known_gdfo
234
235 def heads(self, keys):
236 """Return the heads from amongst keys.
237@@ -217,9 +126,8 @@
238 This is done by searching the ancestries of each key. Any key that is
239 reachable from another key is not returned; all the others are.
240
241- This operation scales with the relative depth between any two keys. If
242- any two keys are completely disconnected all ancestry of both sides
243- will be retrieved.
244+ This operation scales with the relative depth between any two keys. It
245+ uses gdfo to avoid walking all ancestry.
246
247 :param keys: An iterable of keys.
248 :return: A set of the heads. Note that as a set there is no ordering
249@@ -231,114 +139,43 @@
250 # NULL_REVISION is only a head if it is the only entry
251 candidate_nodes.pop(revision.NULL_REVISION)
252 if not candidate_nodes:
253- return set([revision.NULL_REVISION])
254+ return frozenset([revision.NULL_REVISION])
255 if len(candidate_nodes) < 2:
256+ # No or only one candidate
257 return frozenset(candidate_nodes)
258 heads_key = frozenset(candidate_nodes)
259 if heads_key != frozenset(keys):
260+ # Mention duplicates
261 note('%s != %s', heads_key, frozenset(keys))
262+ # Do we have a cached result ?
263 try:
264 heads = self._known_heads[heads_key]
265 return heads
266 except KeyError:
267- pass # compute it ourselves
268- dom_to_node = self._get_dominators_to_nodes(candidate_nodes)
269- if len(candidate_nodes) < 2:
270- # We shrunk candidate_nodes and determined a new head
271- return frozenset(candidate_nodes)
272- dom_heads_key = None
273- # Check the linear dominators of these keys, to see if we already
274- # know the heads answer
275- dom_heads_key = frozenset([node.linear_dominator
276- for node in candidate_nodes.itervalues()])
277- if dom_heads_key in self._known_heads:
278- # map back into the original keys
279- heads = self._known_heads[dom_heads_key]
280- heads = frozenset([dom_to_node[key].key for key in heads])
281- return heads
282- heads = self._heads_from_candidate_nodes(candidate_nodes, dom_to_node)
283+ pass
284+ # Let's compute the heads
285+ seen = set()
286+ pending = []
287+ min_gdfo = None
288+ for node in candidate_nodes.values():
289+ if node.parent_keys:
290+ pending.extend(node.parent_keys)
291+ if min_gdfo is None or node.gdfo < min_gdfo:
292+ min_gdfo = node.gdfo
293+ nodes = self._nodes
294+ while pending:
295+ node_key = pending.pop()
296+ if node_key in seen:
297+ # node already appears in some ancestry
298+ continue
299+ seen.add(node_key)
300+ node = nodes[node_key]
301+ if node.gdfo <= min_gdfo:
302+ continue
303+ if node.parent_keys:
304+ pending.extend(node.parent_keys)
305+ heads = heads_key.difference(seen)
306 if self.do_cache:
307 self._known_heads[heads_key] = heads
308- # Cache the dominator heads
309- if dom_heads_key is not None:
310- dom_heads = frozenset([candidate_nodes[key].linear_dominator
311- for key in heads])
312- self._known_heads[dom_heads_key] = dom_heads
313 return heads
314
315- def _heads_from_candidate_nodes(self, candidate_nodes, dom_to_node):
316- queue = []
317- to_cleanup = []
318- to_cleanup_append = to_cleanup.append
319- for node in candidate_nodes.itervalues():
320- node.ancestor_of = (node.key,)
321- queue.append((-node.gdfo, node))
322- to_cleanup_append(node)
323- heapq.heapify(queue)
324- # These are nodes that we determined are 'common' that we are no longer
325- # walking
326- # Now we walk nodes until all nodes that are being walked are 'common'
327- num_candidates = len(candidate_nodes)
328- nodes = self._nodes
329- heappop = heapq.heappop
330- heappush = heapq.heappush
331- while queue and len(candidate_nodes) > 1:
332- _, node = heappop(queue)
333- next_ancestor_of = node.ancestor_of
334- if len(next_ancestor_of) == num_candidates:
335- # This node is now considered 'common'
336- # Make sure all parent nodes are marked as such
337- for parent_key in node.parent_keys:
338- parent_node = nodes[parent_key]
339- if parent_node.ancestor_of is not None:
340- parent_node.ancestor_of = next_ancestor_of
341- if node.linear_dominator != node.key:
342- parent_node = nodes[node.linear_dominator]
343- if parent_node.ancestor_of is not None:
344- parent_node.ancestor_of = next_ancestor_of
345- continue
346- if node.parent_keys is None:
347- # This is a ghost
348- continue
349- # Now project the current nodes ancestor list to the parent nodes,
350- # and queue them up to be walked
351- # Note: using linear_dominator speeds things up quite a bit
352- # enough that we actually start to be slightly faster
353- # than the default heads() implementation
354- if node.linear_dominator != node.key:
355- # We are at the tip of a long linear region
356- # We know that there is nothing between here and the tail
357- # that is interesting, so skip to the end
358- parent_keys = [node.linear_dominator]
359- else:
360- parent_keys = node.parent_keys
361- for parent_key in parent_keys:
362- if parent_key in candidate_nodes:
363- candidate_nodes.pop(parent_key)
364- if len(candidate_nodes) <= 1:
365- break
366- elif parent_key in dom_to_node:
367- orig_node = dom_to_node[parent_key]
368- if orig_node is not node:
369- if orig_node.key in candidate_nodes:
370- candidate_nodes.pop(orig_node.key)
371- if len(candidate_nodes) <= 1:
372- break
373- parent_node = nodes[parent_key]
374- ancestor_of = parent_node.ancestor_of
375- if ancestor_of is None:
376- # This node hasn't been walked yet
377- parent_node.ancestor_of = next_ancestor_of
378- # Enqueue this node
379- heappush(queue, (-parent_node.gdfo, parent_node))
380- to_cleanup_append(parent_node)
381- elif ancestor_of != next_ancestor_of:
382- # Combine to get the full set of parents
383- all_ancestors = set(ancestor_of)
384- all_ancestors.update(next_ancestor_of)
385- parent_node.ancestor_of = tuple(sorted(all_ancestors))
386- def cleanup():
387- for node in to_cleanup:
388- node.ancestor_of = None
389- cleanup()
390- return frozenset(candidate_nodes)
391
392=== modified file 'bzrlib/_known_graph_pyx.pyx'
393--- bzrlib/_known_graph_pyx.pyx 2009-06-16 15:35:14 +0000
394+++ bzrlib/_known_graph_pyx.pyx 2009-06-20 08:35:13 +0000
395@@ -26,32 +26,31 @@
396 pass
397
398 object PyFrozenSet_New(object)
399+
400 object PyTuple_New(Py_ssize_t n)
401+ Py_ssize_t PyTuple_GET_SIZE(object t)
402+ PyObject * PyTuple_GET_ITEM(object t, Py_ssize_t o)
403 void PyTuple_SET_ITEM(object t, Py_ssize_t o, object v)
404- PyObject * PyTuple_GET_ITEM(object t, Py_ssize_t o)
405- Py_ssize_t PyTuple_GET_SIZE(object t)
406+
407+ Py_ssize_t PyList_GET_SIZE(object l)
408+ PyObject * PyList_GET_ITEM(object l, Py_ssize_t o)
409+ int PyList_SetItem(object l, Py_ssize_t o, object l) except -1
410+ int PyList_Append(object l, object v) except -1
411+
412+ int PyDict_CheckExact(object d)
413+ Py_ssize_t PyDict_Size(object d) except -1
414 PyObject * PyDict_GetItem(object d, object k)
415- Py_ssize_t PyDict_Size(object d) except -1
416- int PyDict_CheckExact(object d)
417+ int PyDict_SetItem(object d, object k, object v) except -1
418+ int PyDict_DelItem(object d, object k) except -1
419 int PyDict_Next(object d, Py_ssize_t *pos, PyObject **k, PyObject **v)
420- int PyList_Append(object l, object v) except -1
421- PyObject * PyList_GET_ITEM(object l, Py_ssize_t o)
422- Py_ssize_t PyList_GET_SIZE(object l)
423- int PyDict_SetItem(object d, object k, object v) except -1
424- int PySet_Add(object s, object k) except -1
425+
426 void Py_INCREF(object)
427
428
429-import heapq
430-
431 from bzrlib import revision
432
433-# Define these as cdef objects, so we don't have to getattr them later
434-cdef object heappush, heappop, heapify, heapreplace
435-heappush = heapq.heappush
436-heappop = heapq.heappop
437-heapify = heapq.heapify
438-heapreplace = heapq.heapreplace
439+cdef object NULL_REVISION
440+NULL_REVISION = revision.NULL_REVISION
441
442
443 cdef class _KnownGraphNode:
444@@ -60,10 +59,8 @@
445 cdef object key
446 cdef object parents
447 cdef object children
448- cdef _KnownGraphNode linear_dominator_node
449- cdef public object gdfo # Int
450- # This could also be simplified
451- cdef object ancestor_of
452+ cdef public long gdfo
453+ cdef int seen
454
455 def __init__(self, key):
456 cdef int i
457@@ -72,14 +69,9 @@
458 self.parents = None
459
460 self.children = []
461- # oldest ancestor, such that no parents between here and there have >1
462- # child or >1 parent.
463- self.linear_dominator_node = None
464 # Greatest distance from origin
465 self.gdfo = -1
466- # This will become a tuple of known heads that have this node as an
467- # ancestor
468- self.ancestor_of = None
469+ self.seen = 0
470
471 property child_keys:
472 def __get__(self):
473@@ -90,17 +82,9 @@
474 PyList_Append(keys, child.key)
475 return keys
476
477- property linear_dominator:
478- def __get__(self):
479- if self.linear_dominator_node is None:
480- return None
481- else:
482- return self.linear_dominator_node.key
483-
484 cdef clear_references(self):
485 self.parents = None
486 self.children = None
487- self.linear_dominator_node = None
488
489 def __repr__(self):
490 cdef _KnownGraphNode node
491@@ -113,10 +97,9 @@
492 if self.children is not None:
493 for node in self.children:
494 child_keys.append(node.key)
495- return '%s(%s gdfo:%s par:%s child:%s %s)' % (
496+ return '%s(%s gdfo:%s par:%s child:%s)' % (
497 self.__class__.__name__, self.key, self.gdfo,
498- parent_keys, child_keys,
499- self.linear_dominator)
500+ parent_keys, child_keys)
501
502
503 cdef _KnownGraphNode _get_list_node(lst, Py_ssize_t pos):
504@@ -134,14 +117,6 @@
505 return <_KnownGraphNode>temp_node
506
507
508-cdef _KnownGraphNode _peek_node(queue):
509- cdef PyObject *temp_node
510- cdef _KnownGraphNode node
511-
512- temp_node = PyTuple_GET_ITEM(<object>PyList_GET_ITEM(queue, 0), 1)
513- node = <_KnownGraphNode>temp_node
514- return node
515-
516 # TODO: slab allocate all _KnownGraphNode objects.
517 # We already know how many we are going to need, except for a couple of
518 # ghosts that could be allocated on demand.
519@@ -152,22 +127,18 @@
520 cdef public object _nodes
521 cdef object _known_heads
522 cdef public int do_cache
523- # Nodes we've touched that we'll need to reset their info when heads() is
524- # done
525- cdef object _to_cleanup
526
527 def __init__(self, parent_map, do_cache=True):
528 """Create a new KnownGraph instance.
529
530 :param parent_map: A dictionary mapping key => parent_keys
531 """
532+ # tests at pre-allocating the node dict actually slowed things down
533 self._nodes = {}
534 # Maps {sorted(revision_id, revision_id): heads}
535 self._known_heads = {}
536- self._to_cleanup = []
537 self.do_cache = int(do_cache)
538 self._initialize_nodes(parent_map)
539- self._find_linear_dominators()
540 self._find_gdfo()
541
542 def __dealloc__(self):
543@@ -194,17 +165,17 @@
544 def _initialize_nodes(self, parent_map):
545 """Populate self._nodes.
546
547- After this has finished, self._nodes will have an entry for every entry
548- in parent_map. Ghosts will have a parent_keys = None, all nodes found
549- will also have .child_keys populated with all known child_keys.
550+ After this has finished:
551+ - self._nodes will have an entry for every entry in parent_map.
552+ - ghosts will have a parent_keys = None,
553+ - all nodes found will also have child_keys populated with all known
554+ child keys,
555 """
556 cdef PyObject *temp_key, *temp_parent_keys, *temp_node
557 cdef Py_ssize_t pos, pos2, num_parent_keys
558 cdef _KnownGraphNode node
559 cdef _KnownGraphNode parent_node
560
561- nodes = self._nodes
562-
563 if not PyDict_CheckExact(parent_map):
564 raise TypeError('parent_map should be a dict of {key:parent_keys}')
565 # for key, parent_keys in parent_map.iteritems():
566@@ -212,14 +183,13 @@
567 while PyDict_Next(parent_map, &pos, &temp_key, &temp_parent_keys):
568 key = <object>temp_key
569 parent_keys = <object>temp_parent_keys
570+ num_parent_keys = len(parent_keys)
571 node = self._get_or_create_node(key)
572 # We know how many parents, so we could pre allocate an exact sized
573 # tuple here
574- num_parent_keys = len(parent_keys)
575 parent_nodes = PyTuple_New(num_parent_keys)
576 # We use iter here, because parent_keys maybe be a list or tuple
577 for pos2 from 0 <= pos2 < num_parent_keys:
578- parent_key = parent_keys[pos2]
579 parent_node = self._get_or_create_node(parent_keys[pos2])
580 # PyTuple_SET_ITEM will steal a reference, so INCREF first
581 Py_INCREF(parent_node)
582@@ -227,136 +197,54 @@
583 PyList_Append(parent_node.children, node)
584 node.parents = parent_nodes
585
586- cdef _KnownGraphNode _check_is_linear(self, _KnownGraphNode node):
587- """Check to see if a given node is part of a linear chain."""
588- cdef _KnownGraphNode parent_node
589- if node.parents is None or PyTuple_GET_SIZE(node.parents) != 1:
590- # This node is either a ghost, a tail, or has multiple parents
591- # It its own dominator
592- node.linear_dominator_node = node
593- return None
594- parent_node = _get_parent(node.parents, 0)
595- if PyList_GET_SIZE(parent_node.children) > 1:
596- # The parent has multiple children, so *this* node is the
597- # dominator
598- node.linear_dominator_node = node
599- return None
600- # The parent is already filled in, so add and continue
601- if parent_node.linear_dominator_node is not None:
602- node.linear_dominator_node = parent_node.linear_dominator_node
603- return None
604- # We don't know this node, or its parent node, so start walking to
605- # next
606- return parent_node
607-
608- def _find_linear_dominators(self):
609- """
610- For any given node, the 'linear dominator' is an ancestor, such that
611- all parents between this node and that one have a single parent, and a
612- single child. So if A->B->C->D then B,C,D all have a linear dominator
613- of A.
614-
615- There are two main benefits:
616- 1) When walking the graph, we can jump to the nearest linear dominator,
617- rather than walking all of the nodes inbetween.
618- 2) When caching heads() results, dominators give the "same" results as
619- their children. (If the dominator is a head, then the descendant is
620- a head, if the dominator is not a head, then the child isn't
621- either.)
622- """
623- cdef PyObject *temp_node
624- cdef Py_ssize_t pos
625- cdef _KnownGraphNode node
626- cdef _KnownGraphNode next_node
627- cdef _KnownGraphNode dominator
628- cdef int i, num_elements
629-
630- pos = 0
631- while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
632- node = <_KnownGraphNode>temp_node
633- # The parent is not filled in, so walk until we get somewhere
634- if node.linear_dominator_node is not None: #already done
635- continue
636- next_node = self._check_is_linear(node)
637- if next_node is None:
638- # Nothing more needs to be done
639- continue
640- stack = []
641- while next_node is not None:
642- PyList_Append(stack, node)
643- node = next_node
644- next_node = self._check_is_linear(node)
645- # The stack now contains the linear chain, and 'node' should have
646- # been labeled
647- dominator = node.linear_dominator_node
648- num_elements = len(stack)
649- for i from num_elements > i >= 0:
650- next_node = _get_list_node(stack, i)
651- next_node.linear_dominator_node = dominator
652- node = next_node
653-
654- cdef object _find_tails(self):
655- cdef object tails
656- cdef PyObject *temp_node
657- cdef Py_ssize_t pos
658- cdef _KnownGraphNode node
659+ def _find_tails(self):
660+ cdef PyObject *temp_node
661+ cdef _KnownGraphNode node
662+ cdef Py_ssize_t pos
663
664 tails = []
665 pos = 0
666 while PyDict_Next(self._nodes, &pos, NULL, &temp_node):
667 node = <_KnownGraphNode>temp_node
668 if node.parents is None or PyTuple_GET_SIZE(node.parents) == 0:
669+ node.gdfo = 1
670 PyList_Append(tails, node)
671 return tails
672
673 def _find_gdfo(self):
674- cdef Py_ssize_t pos, pos2
675 cdef _KnownGraphNode node
676- cdef _KnownGraphNode child_node
677- cdef _KnownGraphNode parent_node
678- cdef int replace_node, missing_parent
679-
680- tails = self._find_tails()
681- todo = []
682- for pos from 0 <= pos < PyList_GET_SIZE(tails):
683- node = _get_list_node(tails, pos)
684- node.gdfo = 1
685- PyList_Append(todo, (1, node))
686- # No need to heapify, because all tails have priority=1
687- while PyList_GET_SIZE(todo) > 0:
688- node = _peek_node(todo)
689+ cdef _KnownGraphNode child
690+ cdef PyObject *temp
691+ cdef Py_ssize_t pos
692+ cdef int replace
693+ cdef Py_ssize_t last_item
694+ cdef long next_gdfo
695+
696+ pending = self._find_tails()
697+
698+ last_item = PyList_GET_SIZE(pending) - 1
699+ while last_item >= 0:
700+ # Avoid pop followed by push, instead, peek, and replace
701+ # timing shows this is 930ms => 770ms for OOo
702+ node = _get_list_node(pending, last_item)
703+ last_item = last_item - 1
704 next_gdfo = node.gdfo + 1
705- replace_node = 1
706 for pos from 0 <= pos < PyList_GET_SIZE(node.children):
707- child_node = _get_list_node(node.children, pos)
708- # We should never have numbered children before we numbered
709- # a parent
710- if child_node.gdfo != -1:
711- continue
712- # Only enque children when all of their parents have been
713- # resolved. With a single parent, we can just take 'this' value
714- child_gdfo = next_gdfo
715- if PyTuple_GET_SIZE(child_node.parents) > 1:
716- missing_parent = 0
717- for pos2 from 0 <= pos2 < PyTuple_GET_SIZE(child_node.parents):
718- parent_node = _get_parent(child_node.parents, pos2)
719- if parent_node.gdfo == -1:
720- missing_parent = 1
721- break
722- if parent_node.gdfo >= child_gdfo:
723- child_gdfo = parent_node.gdfo + 1
724- if missing_parent:
725- # One of the parents is not numbered, so wait until we get
726- # back here
727- continue
728- child_node.gdfo = child_gdfo
729- if replace_node:
730- heapreplace(todo, (child_gdfo, child_node))
731- replace_node = 0
732- else:
733- heappush(todo, (child_gdfo, child_node))
734- if replace_node:
735- heappop(todo)
736+ child = _get_list_node(node.children, pos)
737+ if next_gdfo > child.gdfo:
738+ child.gdfo = next_gdfo
739+ child.seen = child.seen + 1
740+ if child.seen == PyTuple_GET_SIZE(child.parents):
741+                # This child is fully populated; queue it to be walked
742+ last_item = last_item + 1
743+ if last_item < PyList_GET_SIZE(pending):
744+ Py_INCREF(child) # SetItem steals a ref
745+ PyList_SetItem(pending, last_item, child)
746+ else:
747+ PyList_Append(pending, child)
748+                # We have queued this node; we don't need to track it
749+                # anymore
750+ child.seen = 0
751
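
The replacement `_find_gdfo` above walks from the tails toward children, numbering a child only once all of its parents are numbered, and reuses list slots instead of pairing `pop()` with `append()`. A pure-Python sketch of the same idea (`Node` here is a hypothetical stand-in for `_KnownGraphNode`, not the real class):

```python
class Node:
    """Minimal hypothetical stand-in for _KnownGraphNode."""
    def __init__(self, key, parents):
        self.key = key
        self.parents = parents  # tuple of Node, () for a tail
        self.children = []
        self.gdfo = -1
        self.seen = 0           # counts parents numbered so far

def find_gdfo(nodes):
    """Assign each node its greatest-distance-from-origin."""
    pending = [n for n in nodes if not n.parents]
    for tail in pending:
        tail.gdfo = 1
    last = len(pending) - 1
    while last >= 0:
        node = pending[last]    # peek at the last slot instead of pop()
        last -= 1
        next_gdfo = node.gdfo + 1
        for child in node.children:
            if next_gdfo > child.gdfo:
                child.gdfo = next_gdfo
            child.seen += 1
            if child.seen == len(child.parents):
                # All parents numbered: queue the child, overwriting a
                # free slot when one exists so the list never shrinks.
                last += 1
                if last < len(pending):
                    pending[last] = child
                else:
                    pending.append(child)
                child.seen = 0  # reset so heads() can reuse the flag
```

Because the list only ever grows, the realloc churn described in the cover text (every `pop()` on a 2-element list shrinking past half capacity) never happens.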
752 def heads(self, keys):
753 """Return the heads from amongst keys.
754@@ -364,9 +252,8 @@
755 This is done by searching the ancestries of each key. Any key that is
756 reachable from another key is not returned; all the others are.
757
758- This operation scales with the relative depth between any two keys. If
759- any two keys are completely disconnected all ancestry of both sides
760- will be retrieved.
761+        This operation scales with the relative depth between any two keys. It
762+        uses each node's gdfo to avoid walking the entire ancestry.
763
764 :param keys: An iterable of keys.
765 :return: A set of the heads. Note that as a set there is no ordering
766@@ -375,231 +262,78 @@
767 """
768 cdef PyObject *maybe_node
769 cdef PyObject *maybe_heads
770+ cdef PyObject *temp_node
771+ cdef _KnownGraphNode node
772+ cdef Py_ssize_t pos, last_item
773+ cdef long min_gdfo
774
775 heads_key = PyFrozenSet_New(keys)
776 maybe_heads = PyDict_GetItem(self._known_heads, heads_key)
777 if maybe_heads != NULL:
778 return <object>maybe_heads
779-
780 # Not cached, compute it ourselves
781 candidate_nodes = {}
782- nodes = self._nodes
783 for key in keys:
784- maybe_node = PyDict_GetItem(nodes, key)
785+ maybe_node = PyDict_GetItem(self._nodes, key)
786 if maybe_node == NULL:
787 raise KeyError('key %s not in nodes' % (key,))
788 PyDict_SetItem(candidate_nodes, key, <object>maybe_node)
789- if revision.NULL_REVISION in candidate_nodes:
790+ maybe_node = PyDict_GetItem(candidate_nodes, NULL_REVISION)
791+ if maybe_node != NULL:
792 # NULL_REVISION is only a head if it is the only entry
793- candidate_nodes.pop(revision.NULL_REVISION)
794+ candidate_nodes.pop(NULL_REVISION)
795 if not candidate_nodes:
796- return set([revision.NULL_REVISION])
797+ return frozenset([NULL_REVISION])
798 # The keys changed, so recalculate heads_key
799 heads_key = PyFrozenSet_New(candidate_nodes)
800- if len(candidate_nodes) < 2:
801+ if PyDict_Size(candidate_nodes) < 2:
802 return heads_key
803- dom_to_node = self._get_dominators_to_nodes(candidate_nodes)
804- if PyDict_Size(candidate_nodes) < 2:
805- return frozenset(candidate_nodes)
806- dom_lookup_key, heads = self._heads_from_dominators(candidate_nodes,
807- dom_to_node)
808- if heads is not None:
809- if self.do_cache:
810- # This heads was not in the cache, or it would have been caught
811- # earlier, but the dom head *was*, so do the simple cache
812- PyDict_SetItem(self._known_heads, heads_key, heads)
813- return heads
814- heads = self._heads_from_candidate_nodes(candidate_nodes, dom_to_node)
815+
816+ cleanup = []
817+ pending = []
819+        # we know no gdfo can exceed the length of a linear chain of all nodes
819+ min_gdfo = PyDict_Size(self._nodes) + 1
820+        # Build up the nodes that need to be walked; note that the starting
821+        # nodes are not marked as seen
822+ pos = 0
823+ while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
824+ node = <_KnownGraphNode>temp_node
825+ if node.parents is not None:
826+ pending.extend(node.parents)
827+ if node.gdfo < min_gdfo:
828+ min_gdfo = node.gdfo
829+
830+ # Now do all the real work
831+ last_item = PyList_GET_SIZE(pending) - 1
832+ while last_item >= 0:
833+ node = _get_list_node(pending, last_item)
834+ last_item = last_item - 1
835+ if node.seen:
836+ # node already appears in some ancestry
837+ continue
838+ PyList_Append(cleanup, node)
839+ node.seen = 1
840+ if node.gdfo <= min_gdfo:
841+ continue
842+ if node.parents is not None and PyTuple_GET_SIZE(node.parents) > 0:
843+ for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
844+ parent_node = _get_parent(node.parents, pos)
845+ last_item = last_item + 1
846+ if last_item < PyList_GET_SIZE(pending):
847+ Py_INCREF(parent_node) # SetItem steals a ref
848+ PyList_SetItem(pending, last_item, parent_node)
849+ else:
850+ PyList_Append(pending, parent_node)
851+ heads = []
852+ pos = 0
853+ while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
854+ node = <_KnownGraphNode>temp_node
855+ if not node.seen:
856+ PyList_Append(heads, node.key)
857+ heads = PyFrozenSet_New(heads)
858+ for pos from 0 <= pos < PyList_GET_SIZE(cleanup):
859+ node = _get_list_node(cleanup, pos)
860+ node.seen = 0
861 if self.do_cache:
862- self._cache_heads(heads, heads_key, dom_lookup_key, candidate_nodes)
863+ PyDict_SetItem(self._known_heads, heads_key, heads)
864 return heads
865-
866- cdef object _cache_heads(self, heads, heads_key, dom_lookup_key,
867- candidate_nodes):
868- cdef PyObject *maybe_node
869- cdef _KnownGraphNode node
870-
871- PyDict_SetItem(self._known_heads, heads_key, heads)
872- dom_heads = []
873- for key in heads:
874- maybe_node = PyDict_GetItem(candidate_nodes, key)
875- if maybe_node == NULL:
876- raise KeyError
877- node = <_KnownGraphNode>maybe_node
878- PyList_Append(dom_heads, node.linear_dominator_node.key)
879- PyDict_SetItem(self._known_heads, dom_lookup_key,
880- PyFrozenSet_New(dom_heads))
881-
882- cdef _get_dominators_to_nodes(self, candidate_nodes):
883- """Get the reverse mapping from dominator_key => candidate_nodes.
884-
885- As a side effect, this can also remove potential candidate nodes if we
886- determine that they share a dominator.
887- """
888- cdef Py_ssize_t pos
889- cdef _KnownGraphNode node, other_node
890- cdef PyObject *temp_node
891- cdef PyObject *maybe_node
892-
893- dom_to_node = {}
894- keys_to_remove = []
895- pos = 0
896- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
897- node = <_KnownGraphNode>temp_node
898- dom_key = node.linear_dominator_node.key
899- maybe_node = PyDict_GetItem(dom_to_node, dom_key)
900- if maybe_node == NULL:
901- PyDict_SetItem(dom_to_node, dom_key, node)
902- else:
903- other_node = <_KnownGraphNode>maybe_node
904- # These nodes share a dominator, one of them obviously
905- # supersedes the other, figure out which
906- if other_node.gdfo > node.gdfo:
907- PyList_Append(keys_to_remove, node.key)
908- else:
909- # This wins, replace the other
910- PyList_Append(keys_to_remove, other_node.key)
911- PyDict_SetItem(dom_to_node, dom_key, node)
912- for pos from 0 <= pos < PyList_GET_SIZE(keys_to_remove):
913- key = <object>PyList_GET_ITEM(keys_to_remove, pos)
914- candidate_nodes.pop(key)
915- return dom_to_node
916-
917- cdef object _heads_from_dominators(self, candidate_nodes, dom_to_node):
918- cdef PyObject *maybe_heads
919- cdef PyObject *maybe_node
920- cdef _KnownGraphNode node
921- cdef Py_ssize_t pos
922- cdef PyObject *temp_node
923-
924- dom_list_key = []
925- pos = 0
926- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
927- node = <_KnownGraphNode>temp_node
928- PyList_Append(dom_list_key, node.linear_dominator_node.key)
929- dom_lookup_key = PyFrozenSet_New(dom_list_key)
930- maybe_heads = PyDict_GetItem(self._known_heads, dom_lookup_key)
931- if maybe_heads == NULL:
932- return dom_lookup_key, None
933- # We need to map back from the dominator head to the original keys
934- dom_heads = <object>maybe_heads
935- heads = []
936- for dom_key in dom_heads:
937- maybe_node = PyDict_GetItem(dom_to_node, dom_key)
938- if maybe_node == NULL:
939- # Should never happen
940- raise KeyError
941- node = <_KnownGraphNode>maybe_node
942- PyList_Append(heads, node.key)
943- return dom_lookup_key, PyFrozenSet_New(heads)
944-
945- cdef int _process_parent(self, _KnownGraphNode node,
946- _KnownGraphNode parent_node,
947- candidate_nodes, dom_to_node,
948- queue, int *replace_item) except -1:
949- """Process the parent of a node, seeing if we need to walk it."""
950- cdef PyObject *maybe_candidate
951- cdef PyObject *maybe_node
952- cdef _KnownGraphNode dom_child_node
953- maybe_candidate = PyDict_GetItem(candidate_nodes, parent_node.key)
954- if maybe_candidate != NULL:
955- candidate_nodes.pop(parent_node.key)
956- # We could pass up a flag that tells the caller to stop processing,
957- # but it doesn't help much, and makes the code uglier
958- return 0
959- maybe_node = PyDict_GetItem(dom_to_node, parent_node.key)
960- if maybe_node != NULL:
961- # This is a dominator of a node
962- dom_child_node = <_KnownGraphNode>maybe_node
963- if dom_child_node is not node:
964- # It isn't a dominator of a node we are searching, so we should
965- # remove it from the search
966- maybe_candidate = PyDict_GetItem(candidate_nodes, dom_child_node.key)
967- if maybe_candidate != NULL:
968- candidate_nodes.pop(dom_child_node.key)
969- return 0
970- if parent_node.ancestor_of is None:
971- # This node hasn't been walked yet, so just project node's ancestor
972- # info directly to parent_node, and enqueue it for later processing
973- parent_node.ancestor_of = node.ancestor_of
974- if replace_item[0]:
975- heapreplace(queue, (-parent_node.gdfo, parent_node))
976- replace_item[0] = 0
977- else:
978- heappush(queue, (-parent_node.gdfo, parent_node))
979- PyList_Append(self._to_cleanup, parent_node)
980- elif parent_node.ancestor_of != node.ancestor_of:
981- # Combine to get the full set of parents
982- # Rewrite using PySet_* functions, unfortunately you have to use
983- # PySet_Add since there is no PySet_Update... :(
984- all_ancestors = set(parent_node.ancestor_of)
985- for k in node.ancestor_of:
986- PySet_Add(all_ancestors, k)
987- parent_node.ancestor_of = tuple(sorted(all_ancestors))
988- return 0
989-
990- cdef object _heads_from_candidate_nodes(self, candidate_nodes, dom_to_node):
991- cdef _KnownGraphNode node
992- cdef _KnownGraphNode parent_node
993- cdef Py_ssize_t num_candidates
994- cdef int num_parents, replace_item
995- cdef Py_ssize_t pos
996- cdef PyObject *temp_node
997-
998- queue = []
999- pos = 0
1000- while PyDict_Next(candidate_nodes, &pos, NULL, &temp_node):
1001- node = <_KnownGraphNode>temp_node
1002- node.ancestor_of = (node.key,)
1003- PyList_Append(queue, (-node.gdfo, node))
1004- PyList_Append(self._to_cleanup, node)
1005- heapify(queue)
1006- # These are nodes that we determined are 'common' that we are no longer
1007- # walking
1008- # Now we walk nodes until all nodes that are being walked are 'common'
1009- num_candidates = len(candidate_nodes)
1010- replace_item = 0
1011- while PyList_GET_SIZE(queue) > 0 and PyDict_Size(candidate_nodes) > 1:
1012- if replace_item:
1013- # We still need to pop the smallest member out of the queue
1014- # before we peek again
1015- heappop(queue)
1016- if PyList_GET_SIZE(queue) == 0:
1017- break
1018- # peek at the smallest item. We don't pop, because we expect we'll
1019- # need to push more things into the queue anyway
1020- node = _peek_node(queue)
1021- replace_item = 1
1022- if PyTuple_GET_SIZE(node.ancestor_of) == num_candidates:
1023- # This node is now considered 'common'
1024- # Make sure all parent nodes are marked as such
1025- for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
1026- parent_node = _get_parent(node.parents, pos)
1027- if parent_node.ancestor_of is not None:
1028- parent_node.ancestor_of = node.ancestor_of
1029- if node.linear_dominator_node is not node:
1030- parent_node = node.linear_dominator_node
1031- if parent_node.ancestor_of is not None:
1032- parent_node.ancestor_of = node.ancestor_of
1033- continue
1034- if node.parents is None:
1035- # This is a ghost
1036- continue
1037- # Now project the current nodes ancestor list to the parent nodes,
1038- # and queue them up to be walked
1039- if node.linear_dominator_node is not node:
1040- # We are at the tip of a long linear region
1041- # We know that there is nothing between here and the tail
1042- # that is interesting, so skip to the end
1043- self._process_parent(node, node.linear_dominator_node,
1044- candidate_nodes, dom_to_node, queue, &replace_item)
1045- else:
1046- for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents):
1047- parent_node = _get_parent(node.parents, pos)
1048- self._process_parent(node, parent_node, candidate_nodes,
1049- dom_to_node, queue, &replace_item)
1050- for pos from 0 <= pos < PyList_GET_SIZE(self._to_cleanup):
1051- node = _get_list_node(self._to_cleanup, pos)
1052- node.ancestor_of = None
1053- self._to_cleanup = []
1054- return PyFrozenSet_New(candidate_nodes)
1055
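
The rewritten `heads()` in the hunk above is easier to follow without the C API calls. A hedged pure-Python sketch of the same gdfo-bounded walk (`Node` is a hypothetical stand-in for `_KnownGraphNode` with gdfo already computed; ghosts and the NULL_REVISION special case are omitted):

```python
class Node:
    """Minimal hypothetical stand-in for _KnownGraphNode."""
    def __init__(self, key, parents, gdfo):
        self.key = key
        self.parents = parents  # tuple of Node
        self.gdfo = gdfo        # assumed already computed (see _find_gdfo)
        self.seen = False

def heads(candidates):
    """Return the keys of candidates not reachable from another candidate."""
    # Key insight from the cover text: no ancestor of the lowest-gdfo
    # candidate needs walking, since every ancestor has a strictly
    # smaller gdfo than its descendants.
    min_gdfo = min(node.gdfo for node in candidates)
    pending = []
    for node in candidates:
        pending.extend(node.parents)
    cleanup = []
    last = len(pending) - 1
    while last >= 0:
        node = pending[last]    # peek; the slot is reused below
        last -= 1
        if node.seen:
            continue            # already known to be an ancestor
        node.seen = True
        cleanup.append(node)
        if node.gdfo <= min_gdfo:
            continue            # its ancestors cannot be candidates
        for parent in node.parents:
            last += 1
            if last < len(pending):
                pending[last] = parent
            else:
                pending.append(parent)
    result = frozenset(n.key for n in candidates if not n.seen)
    for node in cleanup:        # reset flags for the next heads() call
        node.seen = False
    return result
```

Any candidate reached during the walk gets `seen` set and drops out of the result; resetting only the `cleanup` list afterwards keeps the per-call cost proportional to the nodes actually walked.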
1056=== modified file 'bzrlib/tests/test__known_graph.py'
1057--- bzrlib/tests/test__known_graph.py 2009-06-16 15:35:14 +0000
1058+++ bzrlib/tests/test__known_graph.py 2009-06-20 08:35:13 +0000
1059@@ -71,10 +71,6 @@
1060 def make_known_graph(self, ancestry):
1061 return self.module.KnownGraph(ancestry, do_cache=self.do_cache)
1062
1063- def assertDominator(self, graph, rev, dominator):
1064- node = graph._nodes[rev]
1065- self.assertEqual(dominator, node.linear_dominator)
1066-
1067 def assertGDFO(self, graph, rev, gdfo):
1068 node = graph._nodes[rev]
1069 self.assertEqual(gdfo, node.gdfo)
1070@@ -88,29 +84,6 @@
1071 self.assertEqual(['rev4'], sorted(graph._nodes['rev3'].child_keys))
1072 self.assertEqual(['rev4'], sorted(graph._nodes['rev2b'].child_keys))
1073
1074- def test_dominators_ancestry_1(self):
1075- graph = self.make_known_graph(test_graph.ancestry_1)
1076- self.assertDominator(graph, 'rev1', NULL_REVISION)
1077- self.assertDominator(graph, 'rev2b', 'rev2b')
1078- self.assertDominator(graph, 'rev2a', 'rev2a')
1079- self.assertDominator(graph, 'rev3', 'rev2a')
1080- self.assertDominator(graph, 'rev4', 'rev4')
1081-
1082- def test_dominators_feature_branch(self):
1083- graph = self.make_known_graph(test_graph.feature_branch)
1084- self.assertDominator(graph, 'rev1', NULL_REVISION)
1085- self.assertDominator(graph, 'rev2b', NULL_REVISION)
1086- self.assertDominator(graph, 'rev3b', NULL_REVISION)
1087-
1088- def test_dominators_extended_history_shortcut(self):
1089- graph = self.make_known_graph(test_graph.extended_history_shortcut)
1090- self.assertDominator(graph, 'a', NULL_REVISION)
1091- self.assertDominator(graph, 'b', 'b')
1092- self.assertDominator(graph, 'c', 'b')
1093- self.assertDominator(graph, 'd', 'b')
1094- self.assertDominator(graph, 'e', 'e')
1095- self.assertDominator(graph, 'f', 'f')
1096-
1097 def test_gdfo_ancestry_1(self):
1098 graph = self.make_known_graph(test_graph.ancestry_1)
1099 self.assertGDFO(graph, 'rev1', 2)
1100@@ -229,3 +202,14 @@
1101 self.assertEqual(set(['z']), graph.heads(['w', 's', 'z']))
1102 self.assertEqual(set(['w', 'q']), graph.heads(['w', 's', 'q']))
1103 self.assertEqual(set(['z']), graph.heads(['s', 'z']))
1104+
1105+ def test_heads_with_ghost(self):
1106+ graph = self.make_known_graph(test_graph.with_ghost)
1107+ self.assertEqual(set(['e', 'g']), graph.heads(['e', 'g']))
1108+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c']))
1109+ self.assertEqual(set(['a', 'g']), graph.heads(['a', 'g']))
1110+ self.assertEqual(set(['f', 'g']), graph.heads(['f', 'g']))
1111+ self.assertEqual(set(['c']), graph.heads(['c', 'g']))
1112+ self.assertEqual(set(['c']), graph.heads(['c', 'b', 'd', 'g']))
1113+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'e', 'g']))
1114+ self.assertEqual(set(['a', 'c']), graph.heads(['a', 'c', 'f']))
1115
1116=== modified file 'tools/time_graph.py'
1117--- tools/time_graph.py 2009-06-12 18:05:15 +0000
1118+++ tools/time_graph.py 2009-06-20 08:35:13 +0000
1119@@ -16,12 +16,15 @@
1120 from bzrlib.ui import text
1121
1122 p = optparse.OptionParser()
1123+p.add_option('--quick', default=False, action='store_true')
1124 p.add_option('--max-combinations', default=500, type=int)
1125 p.add_option('--lsprof', default=None, type=str)
1126 opts, args = p.parse_args(sys.argv[1:])
1127+
1128 trace.enable_default_logging()
1129 ui.ui_factory = text.TextUIFactory()
1130
1131+begin = time.clock()
1132 if len(args) >= 1:
1133 b = branch.Branch.open(args[0])
1134 else:
1135@@ -33,8 +36,9 @@
1136 if p[1] is not None)
1137 finally:
1138 b.unlock()
1139+end = time.clock()
1140
1141-print 'Found %d nodes' % (len(parent_map),)
1142+print 'Found %d nodes, loaded in %.3fs' % (len(parent_map), end - begin)
1143
1144 def all_heads_comp(g, combinations):
1145 h = []
1146@@ -47,6 +51,7 @@
1147 finally:
1148 pb.finished()
1149 return h
1150+
1151 combinations = []
1152 # parents = parent_map.keys()
1153 # for p1 in parents:
1154@@ -61,37 +66,57 @@
1155 for revision_id, parent_ids in parent_map.iteritems():
1156 if parent_ids is not None and len(parent_ids) > 1:
1157 combinations.append(parent_ids)
1158+# The largest portion of the graph that has to be walked for a heads() check
1159+# combinations = [('john@arbash-meinel.com-20090312021943-tu6tcog48aiujx4s',
1160+# 'john@arbash-meinel.com-20090312130552-09xa2xsitf6rilzc')]
1161 if opts.max_combinations > 0 and len(combinations) > opts.max_combinations:
1162 combinations = random.sample(combinations, opts.max_combinations)
1163
1164 print ' %d combinations' % (len(combinations),)
1165-t1 = time.clock()
1166-known_g = _known_graph_py.KnownGraph(parent_map)
1167-if opts.lsprof is not None:
1168- h_known = commands.apply_lsprofiled(opts.lsprof,
1169- all_heads_comp, known_g, combinations)
1170-else:
1171- h_known = all_heads_comp(known_g, combinations)
1172-t2 = time.clock()
1173-print "Known: %.3fs" % (t2-t1,)
1174-print " %s" % (graph._counters,)
1175-t1 = time.clock()
1176-known_g = _known_graph_pyx.KnownGraph(parent_map)
1177-if opts.lsprof is not None:
1178- h_known = commands.apply_lsprofiled(opts.lsprof,
1179- all_heads_comp, known_g, combinations)
1180-else:
1181- h_known = all_heads_comp(known_g, combinations)
1182-t2 = time.clock()
1183-print "Known (pyx): %.3fs" % (t2-t1,)
1184-print " %s" % (graph._counters,)
1185-simple_g = graph.Graph(graph.DictParentsProvider(parent_map))
1186-graph._counters[1] = 0
1187-graph._counters[2] = 0
1188-h_simple = all_heads_comp(simple_g, combinations)
1189-t3 = time.clock()
1190-print "Orig: %.3fs" % (t3-t2,)
1191-print " %s" % (graph._counters,)
1192-if h_simple != h_known:
1193- import pdb; pdb.set_trace()
1194-print 'ratio: %.3fs' % ((t2-t1) / (t3-t2))
1195+
1196+def combi_graph(graph_klass, comb):
1197+ # DEBUG
1198+ graph._counters[1] = 0
1199+ graph._counters[2] = 0
1200+
1201+ begin = time.clock()
1202+ g = graph_klass(parent_map)
1203+ if opts.lsprof is not None:
1204+ heads = commands.apply_lsprofiled(opts.lsprof, all_heads_comp, g, comb)
1205+ else:
1206+ heads = all_heads_comp(g, comb)
1207+ end = time.clock()
1208+ return dict(elapsed=(end - begin), graph=g, heads=heads)
1209+
1210+def report(name, g):
1211+ print '%s: %.3fs' % (name, g['elapsed'])
1212+ counters_used = False
1213+ for c in graph._counters:
1214+ if c:
1215+ counters_used = True
1216+ if counters_used:
1217+ print ' %s' % (graph._counters,)
1218+
1219+known_python = combi_graph(_known_graph_py.KnownGraph, combinations)
1220+report('Known', known_python)
1221+
1222+known_pyrex = combi_graph(_known_graph_pyx.KnownGraph, combinations)
1223+report('Known (pyx)', known_pyrex)
1224+
1225+def _simple_graph(parent_map):
1226+ return graph.Graph(graph.DictParentsProvider(parent_map))
1227+
1228+if opts.quick:
1229+ if known_python['heads'] != known_pyrex['heads']:
1230+ import pdb; pdb.set_trace()
1231+ print 'ratio: %.1f:1 faster' % (
1232+ known_python['elapsed'] / known_pyrex['elapsed'],)
1233+else:
1234+ orig = combi_graph(_simple_graph, combinations)
1235+ report('Orig', orig)
1236+
1237+ if orig['heads'] != known_pyrex['heads']:
1238+ import pdb; pdb.set_trace()
1239+
1240+ print 'ratio: %.1f:1 faster' % (
1241+ orig['elapsed'] / known_pyrex['elapsed'],)