Merge lp:~ian-clatworthy/bzr/faster-tags into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy
Status: Merged
Approved by: Ian Clatworthy
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp:~ian-clatworthy/bzr/faster-tags
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 49 lines
To merge this branch: bzr merge lp:~ian-clatworthy/bzr/faster-tags
Reviewer Review Type Date Requested Status
Martin Pool Needs Information
Review via email: mp+6875@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

This 5 minute patch drops the time of 'bzr tags' on OOo down from 29.3 seconds to 0.4 seconds, simply by avoiding generating the full revid-to-revno map explicitly. (It will still be done implicitly when needed.)

Revision history for this message
Martin Pool (mbp) wrote :

It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.

I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer?

If that's true, shouldn't a test have failed?

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Review: Needs Information
> It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.
>
> I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer?
>
> If that's true, shouldn't a test have failed?

 Review: approve

This only benefits the case of "no actual tags present". I thought we
actually had something like this, but whatever.

I'll mention that "bzr tags --show-ids" should always be the same speed
(fast), and this is just avoiding loading the revision_id=>revno map
until actually needed.

We honestly need a better answer for the case when there really are
tags, but for now...

Unless revision_id_to_revno() also knows to try to determine mainline
revnos first (which it can do by walking backwards from tip). So there
is the chance that if all your tags are on the mainline (and recent) it
won't load as much history.

Anyway, I think this is an improvement, I don't know that it is complete.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkofkzcACgkQJdeBCYSNAAPmMwCgvaO9ZmRymH+fFTpu78GiyRrz
4+4AoNVnMOXceTbdDV+jJDb+dpZq2sQP
=1GIz
-----END PGP SIGNATURE-----

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Martin Pool wrote:
> Review: Needs Information
> It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.
>
Sure. The numbers can be quite variable but I'm happy to say something
like "up to 70X faster" as a tweak. I can do the same for the other
things I've put into the one NEWS item as well.
> I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer
tags is a list of tuples, not a dict. The code and tests are all fine. I
checked all the tests (for tags) passed before I submitted the patch (as
I do for 95% of patches I send in).

Ian C.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

John A Meinel wrote:
>
> This only benefits the case of "no actual tags present". I thought we
> actually had something like this, but whatever.
>
Not true. OOo has tags (10-20) and its works fine.
> I'll mention that "bzr tags --show-ids" should always be the same speed
> (fast), and this is just avoiding loading the revision_id=>revno map
> until actually needed.
Right. But showing revnos is default behaviour so saying to users "but
tags --show-ids is fast" doesn't seem a good enough response.
> Unless revision_id_to_revno() also knows to try to determine mainline
> revnos first (which it can do by walking backwards from tip). So there
> is the chance that if all your tags are on the mainline (and recent) it
> won't load as much history.
>
It does exactly that.

Tagging things along the mainline is pretty common, particularly when
there's only a mainline. :-) So I think this is a win for many projects
using tags.

Ian C.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John A Meinel wrote:
>> This only benefits the case of "no actual tags present". I thought we
>> actually had something like this, but whatever.
>>
> Not true. OOo has tags (10-20) and its works fine.
>> I'll mention that "bzr tags --show-ids" should always be the same speed
>> (fast), and this is just avoiding loading the revision_id=>revno map
>> until actually needed.
> Right. But showing revnos is default behaviour so saying to users "but
> tags --show-ids is fast" doesn't seem a good enough response.
>> Unless revision_id_to_revno() also knows to try to determine mainline
>> revnos first (which it can do by walking backwards from tip). So there
>> is the chance that if all your tags are on the mainline (and recent) it
>> won't load as much history.
>>
> It does exactly that.
>
> Tagging things along the mainline is pretty common, particularly when
> there's only a mainline. :-) So I think this is a win for many projects
> using tags.
>
> Ian C.

Perhaps. Though having 'really old' tags in a long mainline is likely to
also be common.

If 'list(iter_reverse_revision_history(branch.last_revision()))' is much
faster than 'tsort.merge_sort' on very long linear histories, then I
think we have something wrong going on that is worth looking into.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkofnCAACgkQJdeBCYSNAAMGaACeMvKXKZVYzYbD9j22WOF9oHYg
64gAn34WHY6fejM21ipjQ88RlIftZR3G
=UGUR
-----END PGP SIGNATURE-----

Revision history for this message
Matthew Fuller (fullermd) wrote :

On Fri, May 29, 2009 at 03:35:14AM -0000 I heard the voice of
Ian Clatworthy, and lo! it spake thus:
>
> This 5 minute patch drops the time of 'bzr tags' on OOo down from
> 29.3 seconds to 0.4 seconds, simply by avoiding generating the full
> revid-to-revno map explicitly.

See also bug 249776; this may round it out, or be a step along the
way.

--
Matthew Fuller
<email address hidden>

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-05-28 18:56:55 +0000
+++ NEWS 2009-05-29 03:35:12 +0000
@@ -18,12 +18,16 @@
18Improvements18Improvements
19************19************
2020
21* ``bzr branch`` in a shared repository is now faster for21* Numerous operations are now faster for huge projects, i.e. those
22 development6-rich-root format. (Ian Clatworthy)22 with a large number of files and/or a large number of revisions,
2323 particularly when the latest development format is used. These
24* ``bzr branch --notree`` is now faster. (Ian Clatworthy)24 operations include:
2525
26* ``bzr diff`` is now faster on large trees. (Ian Clatworthy)26 * branch in a shared repo, branch --no-tree
27 * diff
28 * tags
29
30 (Ian Clatworthy)
2731
28Bug Fixes32Bug Fixes
29*********33*********
3034
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2009-05-28 18:06:43 +0000
+++ bzrlib/builtins.py 2009-05-29 03:35:12 +0000
@@ -5053,9 +5053,16 @@
5053 tags.sort(key=lambda x: timestamps[x[1]])5053 tags.sort(key=lambda x: timestamps[x[1]])
5054 if not show_ids:5054 if not show_ids:
5055 # [ (tag, revid), ... ] -> [ (tag, dotted_revno), ... ]5055 # [ (tag, revid), ... ] -> [ (tag, dotted_revno), ... ]
5056 revno_map = branch.get_revision_id_to_revno_map()5056 for index, (tag, revid) in enumerate(tags):
5057 tags = [ (tag, '.'.join(map(str, revno_map.get(revid, ('?',)))))5057 try:
5058 for tag, revid in tags ]5058 revno = branch.revision_id_to_dotted_revno(revid)
5059 if isinstance(revno, tuple):
5060 revno = '.'.join(map(str, revno))
5061 except errors.NoSuchRevision:
5062 # Bad tag data/merges can lead to tagged revisions
5063 # which are not in this branch. Fail gracefully ...
5064 revno = '?'
5065 tags[index] = (tag, revno)
5059 for tag, revspec in tags:5066 for tag, revspec in tags:
5060 self.outf.write('%-20s %s\n' % (tag, revspec))5067 self.outf.write('%-20s %s\n' % (tag, revspec))
50615068