Merge lp:~jameinel/bzr/2.4-transform-cache-sha-740932 into lp:bzr
Status: | Merged
---|---
Approved by: | Andrew Bennetts
Approved revision: | no longer in the source branch.
Merged at revision: | 5775
Proposed branch: | lp:~jameinel/bzr/2.4-transform-cache-sha-740932
Merge into: | lp:bzr
Prerequisite: | lp:~jameinel/bzr/2.4-transform-create-file-740932
Diff against target: | 134 lines (+61/-6), 3 files modified: bzrlib/tests/test_transform.py (+48/-0), bzrlib/transform.py (+7/-6), doc/en/release-notes/bzr-2.4.txt (+6/-0)
To merge this branch: | bzr merge lp:~jameinel/bzr/2.4-transform-cache-sha-740932
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Andrew Bennetts | Approve | ||
Review via email: mp+56364@code.launchpad.net
Commit message
Change ``bzrlib.
Description of the change
This is finally something that fixes the 'bzr status' time after 'bzr checkout'.
Specifically it updates 'build_tree' to save and pass around the sha1 to Transform.
The difference is pretty noticeable:
Without:
1m13s time bzr co --lightweight
37s time bzr st
1s time bzr st
With:
1m13s time bzr co --lightweight
5s time bzr st
1s time bzr st
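The idea described above (hash the content while it is being written, then remember the result next to the file's stat data) can be sketched roughly as follows. This is an illustration only, not the bzrlib implementation: `write_and_record`, `cached_sha1`, `stat_fingerprint` and the `observed_sha1s` dict are hypothetical names, and the choice of stat fields is an assumption.

```python
import hashlib
import os


def stat_fingerprint(st):
    """Reduce an os.stat_result to the fields a later status check compares.

    The exact set of fields is an assumption made for this sketch.
    """
    return (st.st_size, int(st.st_mtime), int(st.st_ctime), st.st_ino)


def write_and_record(path, chunks, observed_sha1s):
    """Write chunks to path, hashing as we go, and cache (sha1, fingerprint).

    observed_sha1s is a plain dict keyed by path, with one entry per
    newly written file.
    """
    hasher = hashlib.sha1()
    with open(path, "wb") as f:
        for chunk in chunks:
            hasher.update(chunk)
            f.write(chunk)
    sha1 = hasher.hexdigest()
    observed_sha1s[path] = (sha1, stat_fingerprint(os.stat(path)))
    return sha1


def cached_sha1(path, observed_sha1s):
    """Return the cached sha1 if the file's stat still matches, else None."""
    entry = observed_sha1s.get(path)
    if entry is None:
        return None
    sha1, fingerprint = entry
    if stat_fingerprint(os.stat(path)) != fingerprint:
        return None  # file changed since it was written; must re-hash
    return sha1
```

With a cache like this, a later status check only needs one `os.stat` call per file whose fingerprint still matches, rather than re-reading and re-hashing the content.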
Note, we still have some overhead. Specifically, we *only* cache the stat info for files. We don't really have a way to do so for directories. So the first 'bzr st' still stats all the dirs and ends up rewriting the dirstate file. However, we *don't* spend 32s re-reading and re-sha1ing all 576MB of gcc's tree. And even if we did, there would still be *some* files that are going to be within our 3s window, unless we add a time.sleep(3) before we call _apply_
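The "3s window" can be read as a cutoff: stat data for a file touched too recently is not trusted, because the file could change again without its timestamps visibly changing. A minimal sketch of such a check, assuming a 3-second cutoff as mentioned here; `is_safe_to_cache` and `WINDOW_SECONDS` are hypothetical names, not bzrlib API.

```python
import time

WINDOW_SECONDS = 3  # assumed from the "3s window" mentioned in the description


def is_safe_to_cache(mtime, ctime, now=None, window=WINDOW_SECONDS):
    """Return True only if both timestamps are strictly older than the cutoff."""
    if now is None:
        now = time.time()
    cutoff = now - window
    return mtime < cutoff and ctime < cutoff
```

Under this rule, files written in the last few seconds of a long checkout fall inside the window, so their hashes would still be recomputed on the first status.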
The one downside is that it adds one more O(new_files) dict, which stores a tuple of sha and stat value for each file. On my 64-bit machine, the numbers are:
w/o: VmPeak: 721700 kB
w/: VmPeak: 745276 kB
Given we are already in the 700MB range, I think we can live with another 24MB. I suppose that is one of the peak-memory conditions that I should profile and reduce. :)
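For reference, the 24MB figure follows directly from the two VmPeak readings. The sketch below redoes the arithmetic; `parse_vmpeak_kb` is a hypothetical helper, and the input strings are the two readings quoted above with the "w/o:" and "w/:" labels removed.

```python
def parse_vmpeak_kb(status_text):
    """Extract the VmPeak value (in kB) from /proc/<pid>/status style text."""
    for line in status_text.splitlines():
        if line.startswith("VmPeak:"):
            return int(line.split()[1])
    raise ValueError("no VmPeak line found")


without_cache = parse_vmpeak_kb("VmPeak:   721700 kB")
with_cache = parse_vmpeak_kb("VmPeak:   745276 kB")

delta_kb = with_cache - without_cache  # 23576 kB
delta_bytes = delta_kb * 1024          # /proc reports kB in 1024-byte units
```

The difference is 23576 kB, or about 24.1 million bytes.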
Thanks for tackling this.
I wonder if we can take a further step in this direction still within
the current format by just making sure we never actually hash things
from disk: it's enough to just say that they're either identical to
the current tree, or they're unknown. Recording hashes different to
those in any of the basis trees has very little value.
I don't see why we would be recording or checking the stat values of
directories at all. I don't think it can save us any time.
Martin