Merge lp:~spiv/bzr/faster-revert-593560 into lp:bzr

Proposed by Andrew Bennetts
Status: Merged
Approved by: Andrew Bennetts
Approved revision: no longer in the source branch.
Merged at revision: 5401
Proposed branch: lp:~spiv/bzr/faster-revert-593560
Merge into: lp:bzr
Diff against target: 56 lines (+11/-3)
3 files modified
NEWS (+4/-0)
bzrlib/tree.py (+1/-1)
doc/en/whats-new/whats-new-in-2.3.txt (+6/-2)
To merge this branch: bzr merge lp:~spiv/bzr/faster-revert-593560
Reviewer Review Type Date Requested Status
bzr-core Pending
Review via email: mp+34043@code.launchpad.net

Commit message

Avoid repeatedly calling self.target.all_file_ids() in InterTree.iter_changes.

Description of the change

This simple change improves bzr revert and bzr status on large trees (i.e. gcc-linaro) with many changes (i.e. reverting from lp:gcc-linaro/4.5 to tag:gcc_4_5_0_release) by about 15% on my system. Specifically bzr revert to that tag goes from 3m 49s to 3m 18s, and bzr st on the resulting tree also drops from about 1.8s to 1.6s.

Tests still pass, and avoiding unnecessary (and repeated) calls to all_file_ids seems like a clearly good idea.

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

That's pretty nice.

Perhaps the news entry can say a bit more about what changed so we can
understand it later: "by not repeatedly building a list of file ids."

I'd like you to add this to the whatsnew-2.3 docs too, under a
performance heading.

In some ways this patch feels incomplete not to have a test or a
prevention of regressions. I guess it is reasonable to keep
all_file_ids for people who want it. But perhaps just measuring
performance is the best way to keep it up, so +1 with those doc
tweaks.

--
Martin

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/30/2010 3:55 AM, Martin Pool wrote:
> That's pretty nice.
>
> Perhaps the news entry can say a bit more about what changed so we can
> understand it later: "by not repeatedly building a list of file ids."
>
> I'd like you to add this to the whatsnew-2.3 docs too, under a
> performance heading.
>
> In some ways this patch feels incomplete not to have a test or a
> prevention of regressions. I guess it is reasonable to keep
> all_file_ids for people who want it. But perhaps just measuring
> performance is the best way to keep it up, so +1 with those doc
> tweaks.
>

Can you switch it to use 'target.has_id()' instead? I think Martin has
mentioned that he doesn't really prefer the '__contains__' syntax for
trees. Certainly it is a bit unclear what 'in tree' should be passed.
(It takes a file_id, but could be a InventoryEntry, etc.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkx7upQACgkQJdeBCYSNAAOmUwCeOvR1CsV04QEq7gaS4co/xc4X
xEcAoL5OwV2iJLq1oqrj5DCcCCflg5gt
=LQSN
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

On 31 August 2010 00:05, John Arbash Meinel <email address hidden> wrote:
> Can you switch it to use 'target.has_id()' instead? I think Martin has
> mentioned that he doesn't really prefer the '__contains__' syntax for
> trees. Certainly it is a bit unclear what 'in tree' should be passed.
> (It takes a file_id, but could be a InventoryEntry, etc.)

Yes, if we have that and can use it I think it is much more explicit.

--
Martin

Revision history for this message
Andrew Bennetts (spiv) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-08-30 10:14:21 +0000
3+++ NEWS 2010-08-31 07:14:41 +0000
4@@ -149,6 +149,10 @@
5 will be backed up (adding an extention of the form .~#~).
6 (Marius Kruger, #400554)
7
8+* ``bzr revert`` and ``bzr status`` are up to 15% faster on large trees
9+ with many changes by not repeatedly building a list of all file-ids.
10+ (Andrew Bennetts)
11+
12 * Decrease memory consumption when many chk index pages are loaded. (Such
13 as during ``bzr co`` or ``bzr ls -R`` of a large tree.) Often we need to
14 read many chk pages because the individual chk map nodes will be spread
15
16=== modified file 'bzrlib/tree.py'
17--- bzrlib/tree.py 2010-05-25 17:27:52 +0000
18+++ bzrlib/tree.py 2010-08-31 07:14:41 +0000
19@@ -1130,7 +1130,7 @@
20 if file_id in to_paths:
21 # already returned
22 continue
23- if file_id not in self.target.all_file_ids():
24+ if not self.target.has_id(file_id):
25 # common case - paths we have not emitted are not present in
26 # target.
27 to_path = None
28
29=== modified file 'doc/en/whats-new/whats-new-in-2.3.txt'
30--- doc/en/whats-new/whats-new-in-2.3.txt 2010-08-23 18:59:22 +0000
31+++ doc/en/whats-new/whats-new-in-2.3.txt 2010-08-31 07:14:41 +0000
32@@ -8,12 +8,12 @@
33 :doc:`../release-notes/index` for a full list.
34
35 Users are encouraged to upgrade from the other stable series. This
36-document outlines the improvements in Bazaar 2.2 vs Bazaar 2.1. As well as
37+document outlines the improvements in Bazaar 2.3 vs Bazaar 2.2. As well as
38 summarizing improvements made to the core product, it highlights
39 enhancements within the broader Bazaar world of potential interest to
40 those upgrading.
41
42-Bazaar 2.2.0 is fully compatible both locally and on the network with 2.0
43+Bazaar 2.3.0 is fully compatible both locally and on the network with 2.0
44 2.1, and 2.2, and can read and write repositories generated by all
45 previous versions.
46
47@@ -28,6 +28,10 @@
48 Performance improvements
49 ************************
50
51+* ``bzr revert`` and ``bzr status`` are up to 15% faster on large trees
52+ with many changes by not repeatedly building a list of all file-ids.
53+ (Andrew Bennetts)
54+
55 * ``bzr send`` uses less memory.
56 (John Arbash Meinel, #614576)
57