Code review comment for lp:~jameinel/bzr/jam-gdfo-heads

Vincent Ladeuil (vila) wrote:

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> John A Meinel has proposed merging
    jam> lp:~jameinel/bzr/jam-gdfo-heads into lp:bzr.

 review approve

    jam> Requested reviews:
    jam> bzr-core (bzr-core)

    jam> So this is my followup to Vincent's new algorithm for
    jam> KnownGraph.heads() The only real changes are to tweak the Pyrex
    jam> code to use direct access.

    jam> There are some wins by changing to using direct access to
    jam> members, and the big win is to change the algorithm to avoid
    jam> calling list.pop(). Instead we just track what item we are on and
    jam> insert an item there. If we find we need more space then we
    jam> append. (Note there is a specific edge case when you are popping
    jam> short lists as it reallocs() whenever the size shrinks to 1/2 the
    jam> max size. However if you only have 2 items, that is *every* pop,
    jam> and then every append following it will trigger another realloc.)

    jam> Also, this changes away from using a 'seen' dictionary to using a
    jam> .seen attribute.
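
If I read the new loop correctly, the pattern is roughly the
following; a plain-Python sketch with made-up names, the real code
being the Pyrex version with direct member access and the per-node
.seen flag:

    def walk_ancestry(start_nodes, get_parents):
        """Walk ancestry reusing one list instead of pop()/append() churn.

        'pending' never shrinks inside the loop; 'last' tracks the index
        of the logical top of the stack, so taking an item is just a
        decrement and pushing overwrites a free slot instead of popping
        and re-appending (which can realloc on every step for very short
        lists).
        """
        pending = list(start_nodes)
        last = len(pending) - 1
        while last >= 0:
            node = pending[last]          # "pop" by reading the top slot...
            last -= 1                     # ...and lowering the logical end
            if node.seen:                 # .seen attribute, not a dict lookup
                continue
            node.seen = True
            yield node
            for parent in get_parents(node):
                last += 1
                if last < len(pending):
                    pending[last] = parent    # reuse an existing slot
                else:
                    pending.append(parent)    # only grow when really needed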

    jam> Here are my times:

    jam>              annotate NEWS   time_graph (all 7k)
    jam> bzr.dev      11.6s           71.4s
    jam> vila pyx     11.5s            1.102s
    jam> jam pyx       9.7s            0.362s

Blam :) Well done !

But while I can reproduce the time_graph times (taking into
account that my test config is faster than yours), I can't
reproduce the NEWS ones. Are you sure you compiled the extensions
before measuring? Here are the ones I can reproduce:

             annotate NEWS   time_graph (all 7k)
bzr.dev pyx  6.62s           45.390s
vila pyx     5.92s            0.510s
jam pyx      5.45s            0.180s

If we can't reliably reproduce each other's experiments, I think
it's worth investigating why... I used bzr.dev@4459, what about
yours, for example? Could it be that using an SSD introduces such
a distortion (worth isolating IO-related penalties, but I can't
see where they would apply in the code concerned by this patch)?
By the way, I measure annotate with --show-ids and >/dev/null;
I'm pretty sure you do the same, but just in case.

    jam> So while Vincent's update is godlike for whole-ancestry heads()
    jam> calls, it doesn't show much of an effect for NEWS.

And for mysql:

             annotate sql/mysqld.cc   time_graph (all 34k)
bzr.dev pyx  18.87s                   1751.760s
vila pyx     18.42s                     17.190s
jam pyx      18.43s                      4.030s

Note that I made a lot of measurements here to break the tie, so
the difference is a bit less obvious than for NEWS. Your version
*is* better on average, mine got the best single score, but
obviously the variation is high and I don't have a good
explanation for that (except that the time spent in heads() is
now so low that we don't measure it properly in that case).

<snip/>

    jam> (note that 'resolve right' is only 8.6s, so we are only
    jam> spending ~1s resolving something like 200k heads now.)

And being qualitatively better !

<snip/>

    jam> I did end up spending some time to tweak the _find_gdfo() code as
    jam> well. Once you are around 400ms for doing heads() for every merge
    jam> revision, suddenly the 100ms or so spent in _find_gdfo starts to
    jam> be a factor :).

:-)

    jam> I found that computing the tails 'for free' using a set was *not*
    jam> free, and was, in fact, a lot slower than walking the dict after
    jam> the fact.

Aaargh, big mistake! I introduced that while cleaning up the
code, so I didn't measure it! Beginner's mistake! Gimme a rock I
can hide under :D Thanks for catching that!

    jam> The issue is that parent_map is purely random walking,
    jam> so you end up pushing a lot of nodes into the set that
    jam> you just remove. In contrast it takes only 40ms to walk
    jam> 200k nodes in the OOo dict, and find the small handful
    jam> of tails we will care about.

I'd like to note that we can still find an upper bound for the
tails size with that method (or just replace the add/remove with
an increment/decrement and be precise?). And since it's only
needed temporarily, we don't care about wasting some space if we
can do a single allocation.
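
For reference, my understanding of the "walk the dict after the
fact" variant is basically a single scan like this (plain-Python
sketch again, the attribute names are invented):

    def find_tails(nodes):
        """One linear pass over the node dict once all nodes exist.

        No tails set is maintained while ingesting the randomly
        ordered parent_map (where most candidates get added only to
        be discarded again shortly after); we simply scan for nodes
        without parents once everything has been created.
        """
        return [node for node in nodes.values() if not node.parent_keys]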

    jam> _find_gdfo is down around 9ms time for bzr.dev. The bulk of the
    jam> time in __init__ is now in _initialize_nodes which is doing
    jam> mallocs for all the objects. Removing _find_gdfo is 35.5ms
    jam> bzr.dev, 407ms OOo, versus 44.4ms, 563ms w/.

Bingo !

Another note then: my idea about _initialize_nodes is that we
could do it in two passes over the parent_map:
1 - count everything we need for step 2
2 - allocate all the needed data structures once
3 - populate the structures knowing that we won't need to realloc them

To achieve that you sometimes need to switch from lists (as we
use them in Python) for, say, the parents, to a *single* array
containing all parents (as pointers to nodes), with each node
storing the index of its first parent and its number of parents.

Given your experiment above, that should provide a nice
improvement.
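
Concretely, I mean something like this (plain Python just to
illustrate the layout; in the Pyrex version the flat array and the
counts would be C-level, and all the names here are invented):

    def initialize_flat(parent_map):
        # Pass 1 over parent_map: count, so that everything can be
        # allocated exactly once.
        total_parents = sum(len(parents)
                            for parents in parent_map.values())

        # Single allocation for all parents, plus one index entry per
        # node: key -> (offset of first parent, number of parents).
        flat_parents = [None] * total_parents
        node_index = {}

        # Pass 2 over parent_map: populate the pre-sized structures,
        # knowing no realloc can happen anymore.
        offset = 0
        for key, parents in parent_map.items():
            node_index[key] = (offset, len(parents))
            for parent_key in parents:
                flat_parents[offset] = parent_key
                offset += 1
        return flat_parents, node_index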

Oh, and thanks for the detailed commit messages. I didn't read
them all in parallel with the diffs, but rest assured I will; I
want that patch to land, though, so I'm sending this review first ;-)

So a big thank you for doing that pyrex part and documenting it
that way; that will help me a lot in getting up to speed in that
area.

     Vincent

review: Approve
