Merge lp:~ian-clatworthy/bzr/faster-tags into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy on 2009-05-29
Status: Merged
Approved by: Ian Clatworthy on 2009-05-29
Approved revision: 4391
Merged at revision: not available
Proposed branch: lp:~ian-clatworthy/bzr/faster-tags
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 49 lines
To merge this branch: bzr merge lp:~ian-clatworthy/bzr/faster-tags
Reviewer Review Type Date Requested Status
Martin Pool 2009-05-29 Needs Information on 2009-05-29
Review via email: mp+6875@code.launchpad.net
To post a comment you must log in.
Ian Clatworthy (ian-clatworthy) wrote :

This 5 minute patch drops the time of 'bzr tags' on OOo down from 29.3 seconds to 0.4 seconds, simply by avoiding generating the full revid-to-revno map explicitly. (It will still be done implicitly when needed.)

Martin Pool (mbp) wrote :

It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.

I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer?

If that's true, shouldn't a test have failed?

review: Needs Information
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Review: Needs Information
> It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.
>
> I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer?
>
> If that's true, shouldn't a test have failed?

 Review: approve

This only benefits the case of "no actual tags present". I thought we
actually had something like this, but whatever.

I'll mention that "bzr tags --show-ids" should always be the same speed
(fast), and this is just avoiding loading the revision_id=>revno map
until actually needed.

We honestly need a better answer for the case when there really are
tags, but for now...

Unless revision_id_to_revno() also knows to try to determine mainline
revnos first (which it can do by walking backwards from tip). So there
is the chance that if all your tags are on the mainline (and recent) it
won't load as much history.

Anyway, I think this is an improvement, I don't know that it is complete.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkofkzcACgkQJdeBCYSNAAPmMwCgvaO9ZmRymH+fFTpu78GiyRrz
4+4AoNVnMOXceTbdDV+jJDb+dpZq2sQP
=1GIz
-----END PGP SIGNATURE-----

Ian Clatworthy (ian-clatworthy) wrote :

Martin Pool wrote:
> Review: Needs Information
> It's helpful when we're reporting on bzr progress to put an indicative performance number in the news like "5-10x faster". (Especially when it's as good as you've done here.) I realize it's not spads-approved and ymmv etc.
>
Sure. The numbers can be quite variable but I'm happy to say something
like "up to 70X faster" as a tweak. I can do the same for the other
things I've put into the one NEWS item as well.
> I might be missing something but it looks like in the previous code the dict 'tags' was indexed by tag name but here it's indexed by the enumeration integer
tags is a list of tuples, not a dict. The code and tests are all fine. I
checked all the tests (for tags) passed before I submitted the patch (as
I do for 95% of patches I send in).

Ian C.

Ian Clatworthy (ian-clatworthy) wrote :

John A Meinel wrote:
>
> This only benefits the case of "no actual tags present". I thought we
> actually had something like this, but whatever.
>
Not true. OOo has tags (10-20) and its works fine.
> I'll mention that "bzr tags --show-ids" should always be the same speed
> (fast), and this is just avoiding loading the revision_id=>revno map
> until actually needed.
Right. But showing revnos is default behaviour so saying to users "but
tags --show-ids is fast" doesn't seem a good enough response.
> Unless revision_id_to_revno() also knows to try to determine mainline
> revnos first (which it can do by walking backwards from tip). So there
> is the chance that if all your tags are on the mainline (and recent) it
> won't load as much history.
>
It does exactly that.

Tagging things along the mainline is pretty common, particularly when
there's only a mainline. :-) So I think this is a win for many projects
using tags.

Ian C.

John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John A Meinel wrote:
>> This only benefits the case of "no actual tags present". I thought we
>> actually had something like this, but whatever.
>>
> Not true. OOo has tags (10-20) and its works fine.
>> I'll mention that "bzr tags --show-ids" should always be the same speed
>> (fast), and this is just avoiding loading the revision_id=>revno map
>> until actually needed.
> Right. But showing revnos is default behaviour so saying to users "but
> tags --show-ids is fast" doesn't seem a good enough response.
>> Unless revision_id_to_revno() also knows to try to determine mainline
>> revnos first (which it can do by walking backwards from tip). So there
>> is the chance that if all your tags are on the mainline (and recent) it
>> won't load as much history.
>>
> It does exactly that.
>
> Tagging things along the mainline is pretty common, particularly when
> there's only a mainline. :-) So I think this is a win for many projects
> using tags.
>
> Ian C.

Perhaps. Though having 'really old' tags in a long mainline is likely to
also be common.

If 'list(iter_reverse_revision_history(branch.last_revision()))' is much
faster than 'tsort.merge_sort' on very long linear histories, then I
think we have something wrong going on that is worth looking into.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkofnCAACgkQJdeBCYSNAAMGaACeMvKXKZVYzYbD9j22WOF9oHYg
64gAn34WHY6fejM21ipjQ88RlIftZR3G
=UGUR
-----END PGP SIGNATURE-----

Matthew Fuller (fullermd) wrote :

On Fri, May 29, 2009 at 03:35:14AM -0000 I heard the voice of
Ian Clatworthy, and lo! it spake thus:
>
> This 5 minute patch drops the time of 'bzr tags' on OOo down from
> 29.3 seconds to 0.4 seconds, simply by avoiding generating the full
> revid-to-revno map explicitly.

See also bug 249776; this may round it out, or be a step along the
way.

--
Matthew Fuller
<email address hidden>

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-05-28 18:56:55 +0000
3+++ NEWS 2009-05-29 03:35:12 +0000
4@@ -18,12 +18,16 @@
5 Improvements
6 ************
7
8-* ``bzr branch`` in a shared repository is now faster for
9- development6-rich-root format. (Ian Clatworthy)
10-
11-* ``bzr branch --notree`` is now faster. (Ian Clatworthy)
12-
13-* ``bzr diff`` is now faster on large trees. (Ian Clatworthy)
14+* Numerous operations are now faster for huge projects, i.e. those
15+ with a large number of files and/or a large number of revisions,
16+ particularly when the latest development format is used. These
17+ operations include:
18+
19+ * branch in a shared repo, branch --no-tree
20+ * diff
21+ * tags
22+
23+ (Ian Clatworthy)
24
25 Bug Fixes
26 *********
27
28=== modified file 'bzrlib/builtins.py'
29--- bzrlib/builtins.py 2009-05-28 18:06:43 +0000
30+++ bzrlib/builtins.py 2009-05-29 03:35:12 +0000
31@@ -5053,9 +5053,16 @@
32 tags.sort(key=lambda x: timestamps[x[1]])
33 if not show_ids:
34 # [ (tag, revid), ... ] -> [ (tag, dotted_revno), ... ]
35- revno_map = branch.get_revision_id_to_revno_map()
36- tags = [ (tag, '.'.join(map(str, revno_map.get(revid, ('?',)))))
37- for tag, revid in tags ]
38+ for index, (tag, revid) in enumerate(tags):
39+ try:
40+ revno = branch.revision_id_to_dotted_revno(revid)
41+ if isinstance(revno, tuple):
42+ revno = '.'.join(map(str, revno))
43+ except errors.NoSuchRevision:
44+ # Bad tag data/merges can lead to tagged revisions
45+ # which are not in this branch. Fail gracefully ...
46+ revno = '?'
47+ tags[index] = (tag, revno)
48 for tag, revspec in tags:
49 self.outf.write('%-20s %s\n' % (tag, revspec))
50