Code review comment for lp:~jameinel/bzr/1.19-known-graph-sorted

Revision history for this message
John A Meinel (jameinel) wrote :

This change finally brings the new ancestry extraction code up all the way so that 'bzr log' gets to use it.

It adds a member VersionedFiles.get_known_graph_ancestry(keys) which returns a KnownGraph instance.

It also implements 'merge_sort' in pyrex code on the KnownGraph object. At this point, KnownGraph.merge_sort() is quite fast, taking about 40ms to merge_sort all of bzr.dev and another 50ms to build up the KnownGraph object.

Combined with the improved extraction of ancestry, this brings "bzr log -n0 -r-10..-1" on bzr.dev down from 2.5s down to right at 1.0s for me.

As near as I can tell, the big win for doing merge_sort on the KnownGraph object is that you get to avoid a lot of dict lookup calls. 40ms for 25.6k keys is 1.5us per key. And doing a dict lookup to get the parents costs 10ms overall (.4us per key). It also brings the time for 'merge_sort' on the OOo tree down to 1.0s.

It also adds KnownGraph.topo_sort(), which turns out to only take around 10ms for all of bzr.dev (on top of the 50ms to build the KnownGraph data structure).

Because I already have an object model internally, I went ahead and exposed KnownGraph.merge_sort() as returning objects, rather than tuples. I think the api is going to be a lot easier to use, and none of my timings so far show an advantage to the tuple version. (It could be because the objects are compiled making getattr() faster...)

This changes tsort.topo_sort() to just be a thunk over to KnownGraph(parent_map).topo_sort(), as it is still faster than the fastest python implementation. (Though the pure python form is actually slower because of the overhead of building the KG object. I'm not worried, as it is still faster than our existing topo_sort implementation.)

I was thinking to do the same thing for 'merge_sort' but I took this opportunity to break the api, getting rid of "sequence_number", using an object model, ignoring 'mainline_revisions', etc. Note that only semi-deprecated code uses mainline_revisions anyway, and it really shouldn't be anymore. (It was back when we allowed Branch.revision_history() != lefthand_history.)

I also plan on updating KnownGraph with an 'add_node()' function, so that its implementation of 'merge_sort' can be used for annotate (which sometimes annotates the working tree). It shouldn't be hard to do.

« Back to merge proposal