Merge lp:~bzr/bzr/faster-commit-file into lp:~bzr/bzr/trunk-old
- faster-commit-file
- Merge into trunk-old
Status: | Rejected | ||||
---|---|---|---|---|---|
Rejected by: | Ian Clatworthy | ||||
Proposed branch: | lp:~bzr/bzr/faster-commit-file | ||||
Merge into: | lp:~bzr/bzr/trunk-old | ||||
Diff against target: |
313 lines (has conflicts)
Text conflict in NEWS |
||||
To merge this branch: | bzr merge lp:~bzr/bzr/faster-commit-file | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | Needs Fixing | ||
Review via email: mp+7885@code.launchpad.net |
This proposal supersedes a proposal from 2009-05-12.
Commit message
Description of the change
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
> Ian Clatworthy has proposed merging lp:~bzr/bzr/faster-commit-file
> into lp:bzr.
>
> Requested reviews:
> Bazaar Developers (bzr)
>
> This branch makes commit of selected files on development6-
> faster than it is on 1.9 format, instead of being twice as slow. All
> tests still pass and the code changes are pretty simple so I think it
> is safe for 1.15rc.
>
> It's possible to get further improvements here by pushing more
> complexity down into iter_changes as bug #347649 suggests. Given
> selective commit performance is down to 0.27 seconds for Emacs with 3K
> files and 105K revisions, I'm not sure that's necessary yet. That path
> is certainly a lot more work than this patch, given the multiple
> implementations of iter_changes() around the place and the tuning that
> has gone into them to date.
This feels like a bit of a bandaid, and I don't think that the lack of
failures is an indication of correctness - there weren't tests for the
failures that bug 347649 refers to. For instance, simple inspection of
the patch suggests to me that you can miss the root id - e.g. a
pathological case would be a selective first commit with the root
skipped.
I appreciate not wanting to monkey with iter_changes, but the problem
with doing this just up at the commit level is that you have to black
box test. I don't think its safe to put these particular changes into
commit without specific tests.
If you were to put the masking support into iter-changes as a optional
decorating generator (*)you could using test iter-changes quite easily
(using the intertree tests) - if its fast enough for commit, then as you
say its likely fast enough in general, and by preserving the layering we
can optimise later. And unit tests for iter_changes will actually speak
to the core issue rather than circumstance, which is the most a test on
commit itself could do.
e.g.
def iter_changes_
for change in changes:
if change[0][1] is_inside_
yield change
def iter_changes_
....
def iter_changes(....)
result = self._iter_
if specific:
result = iter_changes_
if exclude:
result = iter_changes_
return result
review disapprove
-Rob
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
Robert Collins wrote:
> Review: Disapprove
> On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
>
>> t's possible to get further improvements here by pushing more
>> complexity down into iter_changes as bug #347649 suggests. Given
>> selective commit performance is down to 0.27 seconds for Emacs with 3K
>> files and 105K revisions, I'm not sure that's necessary yet. That path
>> is certainly a lot more work than this patch, given the multiple
>> implementations of iter_changes() around the place and the tuning that
>> has gone into them to date.
>>
>
> This feels like a bit of a bandaid, and I don't think that the lack of
> failures is an indication of correctness - there weren't tests for the
> failures that bug 347649 refers to. For instance, simple inspection of
> the patch suggests to me that you can miss the root id - e.g. a
> pathological case would be a selective first commit with the root
> skipped.
>
>
I'd expect bzr commit -x '' to be a pointless commit. I suspect my code
will error in exactly that way. It's a *very* pathological case as well:
no sane person would do that on the command line (though it may happen
via the API).
> I appreciate not wanting to monkey with iter_changes, but the problem
> with doing this just up at the commit level is that you have to black
> box test. I don't think its safe to put these particular changes into
> commit without specific tests.
>
>
"bzr selftest commit" runs 2335 tests so I think it's test coverage is
pretty comprehensive. Lots of tests broke during the course of putting
this patch together fwiw, e.g. simply passing specific_files to
iter_changes breaks the test suite so bug 347649 is indeed trapped by tests.
> If you were to put the masking support into iter-changes as a optional
> decorating generator (*)you could using test iter-changes quite easily
> (using the intertree tests) - if its fast enough for commit, then as you
> say its likely fast enough in general, and by preserving the layering we
> can optimise later. And unit tests for iter_changes will actually speak
> to the core issue rather than circumstance, which is the most a test on
> commit itself could do.
>
> e.g.
> def iter_changes_
> for change in changes:
> if change[0][1] is_inside_
> continue
> yield change
>
> def iter_changes_
> ....
>
>
> def iter_changes(....)
> result = self._iter_
> if specific:
> result = iter_changes_
> if exclude:
> result = iter_changes_
> return result
>
>
That still implies adding 2 new options to *multiple* implementations of
iter_changes: yield_parents and exclude_files, say. That's a far bigger
change than this for no additional performance gain. Longer term, I'd be
surprised if pushing exclude_files down could gain us much. Yielding
parents in-situ could be *slightly* faster though but it would take a
large delta before any speed difference was noticed I suspect.
> review disapprove
This patch is an interim step - we really want to leverage the
...
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
On Wed, 2009-05-13 at 03:33 +0000, Ian Clatworthy wrote:
>
> I'd expect bzr commit -x '' to be a pointless commit. I suspect my
> code
> will error in exactly that way. It's a *very* pathological case as
> well:
> no sane person would do that on the command line (though it may happen
> via the API).
Thats not what I mean. What I mean is
bzr init
bzr add *
bzr commit -m 'foo' README
-> bad data.
> > I appreciate not wanting to monkey with iter_changes, but the
> problem
> > with doing this just up at the commit level is that you have to
> black
> > box test. I don't think its safe to put these particular changes
> into
> > commit without specific tests.
> >
> >
> "bzr selftest commit" runs 2335 tests so I think it's test coverage is
> pretty comprehensive. Lots of tests broke during the course of putting
> this patch together fwiw, e.g. simply passing specific_files to
> iter_changes breaks the test suite so bug 347649 is indeed trapped by
> tests.
That count includes all the micro tests that test specific behaviours of
commit builder. I'm glad that we have accidental coverage of the bug;
I'm still very sure we don't have explicit coverage.
> That still implies adding 2 new options to *multiple* implementations
> of
> iter_changes: yield_parents and exclude_files, say. That's a far
> bigger
> change than this for no additional performance gain. Longer term, I'd
> be
> surprised if pushing exclude_files down could gain us much.
We have the concept that 'diff' and 'status' show us what 'commit' will
do. Until we have commit building on the same logic we'll struggle to
make that a reality. That is another thing that pushing the logic down
will help with.
> Yielding
> parents in-situ could be *slightly* faster though but it would take a
> large delta before any speed difference was noticed I suspect.
> > review disapprove
> This patch is an interim step - we really want to leverage the
> specific_files masking at the layer below so someone running 'bzr
> commit
> foo/bar' on the whole Debian tree gets a fast response that this patch
> delivers. Having said that, I don't think this patch is a band-aid and
> it meets all our criteria for approval: code quality is good, test
> coverage is not reduced and performance is improved by a factor of
> 4(!!)
> on a common operation. I think it ought to be landed as a step forward
> while I work on a faster solution. But that's just me.
Another factor now I think about this is people using
record_
having code in commit.py to workaround limits in iter_changes won't help
with that.
I can be persuaded that we should land this as is, but:
- it definitely does not fix the bug report, because the bug is about
iter_changes
- I think its only a couple of hours work to do it right, within the
constraints of 'get the interface solid and tested'. I don't see any
benefit to not doing the work now, while its paged in and in your
mind.
-Rob
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
Resubmitted along the lines discussed on IRC.
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
I've taken a diff against bzr.dev - review follows. Look for ****
introducing my comments
=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2009-04-03 00:07:49 +0000
+++ bzrlib/commit.py 2009-05-15 02:31:26 +0000
@@ -289,12 +289,14 @@
# We can use record_iter_changes IFF iter_changes is compatible
with
# the command line parameters, and the repository has fast
delta
# generation. See bug 347649.
- self.use_
- not self.specific_files and
- not self.exclude and
- not self.branch.
and
- (self.branch.
- len(self.parents) < 2))
+ if self.branch.
+ # TODO: fix this
+ self.use_
+ else:
+ self.use_
+ self.branch.
+ or (len(self.parents) < 2 and not self.specific_files
+ and not self.exclude))
**** You've altered this if block substantially, and I think
unintentionally. Please just use the old phrase and remove the two lines
that you're address. *OR* Update the comment to match what your new
if:else: actually does.
@@ -632,13 +634,16 @@
def _update_
"""Update the commit builder with the data about what has
changed.
"""
- exclude = self.exclude
- specific_files = self.specific_files or []
- mutter("Selecting files for commit with filter %s",
specific_files)
+ mutter("Selecting files for commit with filter %s excluding %
s",
+ self.specific_
if self.use_
+ if self.specific_files or self.exclude:
+ iter_changes =
tree.filter_
+ self.specific_
+ yield_changed_
**** I don't think a flag for 'yield_
smells of YAGNI to me: its a new interface, and our only user wants
yield-changed-
@@ -955,17 +959,16 @@
def _set_specific_
- if not self.use_
- # If provided, ensure the specified files are versioned
- if self.specific_files is not None:
- # Note: This routine is being called because it raises
- # PathNotVersione
IDs. We
- # later use the ids we found as input to the working
tree
- # inventory iterator, so we only consider those ids
rather than
- # examining the whole tree again.
- # XXX: Dont we have filter_unversioned to do this more
- # cheaply?
- self.specific_
- self.specific_
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal | # |
> **** Reading this, I'm not sure how you are getting the deltas for the
> parents if they were not selected by iter changes.
>
> Perhaps I"m missing something
(but I don't think I am - I've checked back against the previous diff
and this isn't a change from the refactoring; it was present in that
version as well.)
-Rob
Ian Clatworthy (ian-clatworthy) wrote : Posted in a previous version of this proposal | # |
Follow-up IRC discussion re review and next steps ...
(14:46:07) igc: lifeless: so, I might be stupid but I still don't see the problem ...
(14:46:13) lifeless: ok
(14:46:24) igc: iter_changes only needs to return changed parents, not all of them
(14:46:46) lifeless: right, but unless its looking at parents it won't return them, because iter_changes has a bug
(14:46:52) igc: i.e. if a parent isn't in iter_changes, then it isn't needed in apply_delta IIUIC
(14:47:22) igc: lifeless: but it *is* looking at parent right now
(14:47:27) lifeless: how?
(14:47:33) igc: it looks at the whole tree, then post-filters
(14:47:59) lifeless: ow
(14:48:17) lifeless: I'm really torn
(14:48:17) igc: and osutils.
(14:48:31) igc: at the os level, '' doesn't make sense
(14:48:52) lifeless: well, other code would be cleaner if it does - as you are making sure you have [''] always
(14:49:05) igc: also, my change to when record_iter_changes was deliberate ...
(14:49:15) igc: and the comment still hold exactly as is btw
(14:49:26) igc: I benchmarked to decide when it was the right thing fwiw
(14:51:24) lifeless: why the (len(self.parents) < 2 and not self.specific_files and not self.exclude)
(14:51:27) lifeless: clause?
(14:51:46) igc: because for non-chk formats, ...
(14:52:21) igc: it's slower than the current code otherwise
(14:52:29) igc: at least on Emacs
(14:53:17) igc: I'm happy to extend the comment along those lines
(14:53:42) lifeless: So this means that the self.specific_files and self.exclude processing with record_iter_changes is _slower_ than the old full-tree code path
(14:54:02) lifeless: and that its only faster with record_iter_changes on chk repositoreis because the chk code is able to compensate
(14:54:13) lifeless: I appreciate that you've got it going faster
(14:54:17) igc: yes, according to my measurements
(14:54:47) lifeless: I'm really worried that the other changes, such as the reinstance of files_across_trees are going to make it harder for someone to come bacj and fix bug 347649 properly
(14:54:48) igc: it was consistently slower - 08. vs 04 or something like that
(14:54:48) ubottu: Launchpad bug 347649 in bzr "iter_changes missing support needed for commit" [High,Confirmed] https:/
(14:54:56) lifeless: as they will have to basically undo this patch to fix it
(14:55:28) lifeless: on the other hand I don't want to block performance improvements
(14:55:35) igc: lifeless: I did look at trying to remove the file_across_trees bit but ...
(14:55:44) igc: it broke the test suite cos ...
(14:55:45) lifeless: do you have any suggestion about how we can do both things?
(14:55:56) igc: we trap for bogus include paths
(14:56:06) igc: and iter_changes doesn't see those
(14:56:21) lifeless: file_across_trees is a problem because it upcassts the dirstate to an inventory - its spectcularly slow
(14:56:44) lifeless: emacs is what 20K files?
(14:56:59) igc: just checking, we're talking about the _set_specific_
(14:57:06) lifeless: yes
(14:57:25) igc: emacs is 8K from memory
(14:57:31) igc: 100K revisions but ...
(14:57:38) igc: that doesn't matter myuch here
(...
Ian Clatworthy (ian-clatworthy) wrote : | # |
This is effectively a resubmission of my earlier patch 6 weeks ago, with some minor clean-ups. While I agree that we can do much better than this by pushing more functionality down into iter_changes, I still believe this is a safe and valuable step forward. For example, usertest results (run tonight) on the 'selective commit' task comparing bzr.dev r4476 to r4476+patch shows an improvement from 73.4 seconds to 3.0 seconds for OOo on format 2a. In other words, this patch improves performance from "absolutely terrible on large projects" to "acceptable for most users".
FWIW, I did spent quite a bit of time over two weeks trying a different approach, namely smarter use of iter_changes - doing include filtering there - plus decorators over those results. That proved to be a dead-end because of safety concerns: race-conditions around dirstate locking mean wrong results are possible. In comparison, this patch is safe as best I can tell.
I'll leave it to others to decide whether this is enough progress on this problem for us to ship 2.0. At a minimum, I think it changes the bug priority from Critical to something less.
Robert Collins (lifeless) wrote : | # |
Do you have cold cache (hot bzr, cold tree) performance data for it?
I ask because from my reading of the patch, it could well be worse for
ooo in that [not uncommon] scenario.
-Rob
Robert Collins (lifeless) wrote : | # |
This can generate bad commit deltas too; its the primary thing that is making my doing the changes at the right level problematic as well.
While I don't object in principle to layering like this if it is faster, correctness in the commit code path has to be a high priority.
I have a fix-in-iter-changes working, I'm currently focused on examining all the cases that determine whether the code can product bad deltas that will be accepted silently, corrupting repositories.
Ian Clatworthy (ian-clatworthy) wrote : | # |
This has been superceded by Robert's work so I'll change the status to Rejected.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2009-08-18 20:05:30 +0000 |
3 | +++ NEWS 2009-08-19 02:39:06 +0000 |
4 | @@ -595,6 +595,7 @@ |
5 | or ``bzr+ssh://`` is now much faster and involves no VFS operations. |
6 | This speeds up commands like ``bzr pull -r 123``. (Andrew Bennetts) |
7 | |
8 | +<<<<<<< TREE |
9 | * ``revision-info`` now properly aligns the revnos/revids in the output |
10 | and doesn't traceback when given revisions not in the current branch. |
11 | Performance is also significantly improved when requesting multiple revs |
12 | @@ -603,6 +604,11 @@ |
13 | * Tildes are no longer escaped by Transports. (Andy Kilner) |
14 | |
15 | |
16 | +======= |
17 | +* Selective commit performance on ``2a`` is now better than it is on |
18 | + ``1.9`` format. (Ian Clatworthy) |
19 | + |
20 | +>>>>>>> MERGE-SOURCE |
21 | Documentation |
22 | ************* |
23 | |
24 | |
25 | === modified file 'bzrlib/commit.py' |
26 | --- bzrlib/commit.py 2009-07-15 05:54:37 +0000 |
27 | +++ bzrlib/commit.py 2009-08-19 02:39:06 +0000 |
28 | @@ -284,12 +284,14 @@ |
29 | # We can use record_iter_changes IFF iter_changes is compatible with |
30 | # the command line parameters, and the repository has fast delta |
31 | # generation. See bug 347649. |
32 | - self.use_record_iter_changes = ( |
33 | - not self.specific_files and |
34 | - not self.exclude and |
35 | - not self.branch.repository._format.supports_tree_reference and |
36 | - (self.branch.repository._format.fast_deltas or |
37 | - len(self.parents) < 2)) |
38 | + if self.branch.repository._format.supports_tree_reference: |
39 | + # TODO: fix this |
40 | + self.use_record_iter_changes = False |
41 | + else: |
42 | + self.use_record_iter_changes = \ |
43 | + self.branch.repository._format.fast_deltas |
44 | + #or (len(self.parents) < 2 and not self.specific_files |
45 | + #and not self.exclude)) |
46 | self.pb = bzrlib.ui.ui_factory.nested_progress_bar() |
47 | self.basis_revid = self.work_tree.last_revision() |
48 | self.basis_tree = self.work_tree.basis_tree() |
49 | @@ -580,10 +582,10 @@ |
50 | except Exception, e: |
51 | found_exception = e |
52 | if found_exception is not None: |
53 | - # don't do a plan raise, because the last exception may have been |
54 | + # don't do a plain raise, because the last exception may have been |
55 | # trashed, e is our sure-to-work exception even though it loses the |
56 | # full traceback. XXX: RBC 20060421 perhaps we could check the |
57 | - # exc_info and if its the same one do a plain raise otherwise |
58 | + # exc_info and if it's the same one do a plain raise otherwise |
59 | # 'raise e' as we do now. |
60 | raise e |
61 | |
62 | @@ -618,13 +620,16 @@ |
63 | def _update_builder_with_changes(self): |
64 | """Update the commit builder with the data about what has changed. |
65 | """ |
66 | - exclude = self.exclude |
67 | - specific_files = self.specific_files or [] |
68 | - mutter("Selecting files for commit with filter %s", specific_files) |
69 | + mutter("Selecting files for commit with filter %s excluding %s", |
70 | + self.specific_files, self.exclude) |
71 | |
72 | self._check_strict() |
73 | if self.use_record_iter_changes: |
74 | iter_changes = self.work_tree.iter_changes(self.basis_tree) |
75 | + if self.specific_files or self.exclude: |
76 | + iter_changes = tree._filter_iter_changes_by_paths(iter_changes, |
77 | + self.specific_files, self.exclude, |
78 | + yield_changed_parents=True) |
79 | iter_changes = self._filter_iter_changes(iter_changes) |
80 | for file_id, path, fs_hash in self.builder.record_iter_changes( |
81 | self.work_tree, self.basis_revid, iter_changes): |
82 | @@ -640,7 +645,7 @@ |
83 | |
84 | This method reports on the changes in iter_changes to the user, and |
85 | converts 'missing' entries in the iter_changes iterator to 'deleted' |
86 | - entries. 'missing' entries have their |
87 | + entries. |
88 | |
89 | :param iter_changes: An iter_changes to process. |
90 | :return: A generator of changes. |
91 | @@ -652,7 +657,6 @@ |
92 | if report_changes: |
93 | old_path = change[1][0] |
94 | new_path = change[1][1] |
95 | - versioned = change[3][1] |
96 | kind = change[6][1] |
97 | versioned = change[3][1] |
98 | if kind is None and versioned: |
99 | @@ -941,17 +945,16 @@ |
100 | |
101 | def _set_specific_file_ids(self): |
102 | """populate self.specific_file_ids if we will use it.""" |
103 | - if not self.use_record_iter_changes: |
104 | - # If provided, ensure the specified files are versioned |
105 | - if self.specific_files is not None: |
106 | - # Note: This routine is being called because it raises |
107 | - # PathNotVersionedError as a side effect of finding the IDs. We |
108 | - # later use the ids we found as input to the working tree |
109 | - # inventory iterator, so we only consider those ids rather than |
110 | - # examining the whole tree again. |
111 | - # XXX: Dont we have filter_unversioned to do this more |
112 | - # cheaply? |
113 | - self.specific_file_ids = tree.find_ids_across_trees( |
114 | - self.specific_files, [self.basis_tree, self.work_tree]) |
115 | - else: |
116 | - self.specific_file_ids = None |
117 | + # If provided, ensure the specified files are versioned |
118 | + if self.specific_files is not None: |
119 | + # Note: This routine is being called because it raises |
120 | + # PathNotVersionedError as a side effect of finding the IDs. We |
121 | + # later use the ids we found as input to the working tree |
122 | + # inventory iterator, so we only consider those ids rather than |
123 | + # examining the whole tree again. |
124 | + # XXX: Dont we have filter_unversioned to do this more |
125 | + # cheaply? |
126 | + self.specific_file_ids = tree.find_ids_across_trees( |
127 | + self.specific_files, [self.basis_tree, self.work_tree]) |
128 | + else: |
129 | + self.specific_file_ids = None |
130 | |
131 | === modified file 'bzrlib/repository.py' |
132 | --- bzrlib/repository.py 2009-08-17 23:15:55 +0000 |
133 | +++ bzrlib/repository.py 2009-08-19 02:39:06 +0000 |
134 | @@ -558,7 +558,7 @@ |
135 | :param iter_changes: An iter_changes iterator with the changes to apply |
136 | to basis_revision_id. The iterator must not include any items with |
137 | a current kind of None - missing items must be either filtered out |
138 | - or errored-on beefore record_iter_changes sees the item. |
139 | + or errored-on before record_iter_changes sees the item. |
140 | :param _entry_factory: Private method to bind entry_factory locally for |
141 | performance. |
142 | :return: A generator of (file_id, relpath, fs_hash) tuples for use with |
143 | |
144 | === modified file 'bzrlib/tests/test_tree.py' |
145 | --- bzrlib/tests/test_tree.py 2009-03-23 14:59:43 +0000 |
146 | +++ bzrlib/tests/test_tree.py 2009-08-19 02:39:06 +0000 |
147 | @@ -23,7 +23,7 @@ |
148 | tree as _mod_tree, |
149 | ) |
150 | from bzrlib.tests import TestCaseWithTransport |
151 | -from bzrlib.tree import InterTree |
152 | +from bzrlib.tree import InterTree, _filter_iter_changes_by_paths |
153 | |
154 | |
155 | class TestInterTree(TestCaseWithTransport): |
156 | @@ -417,3 +417,84 @@ |
157 | self.assertPathToKey(([u''], u'a'), u'a') |
158 | self.assertPathToKey(([u'a'], u'b'), u'a/b') |
159 | self.assertPathToKey(([u'a', u'b'], u'c'), u'a/b/c') |
160 | + |
161 | + |
162 | +class TestFilterIterChangesByPaths(TestCaseWithTransport): |
163 | + """Tests for tree._filter_iter_changes_by_paths().""" |
164 | + |
165 | + def make_changes_iter(self): |
166 | + wt = self.make_branch_and_tree('.') |
167 | + b = wt.branch |
168 | + self.build_tree(['foo/', 'foo/foo1', 'bar/', 'bar/bar1', 'bar/bar2', |
169 | + 'baz']) |
170 | + wt.add(['foo', 'foo/foo1', 'bar', 'bar/bar1', 'bar/bar2', 'baz'], |
171 | + ['foo-id', 'foo1-id', 'bar-id', 'bar1-id', 'bar2-id', 'baz-id']) |
172 | + wt.commit('bar/bar1', specific_files=['bar/bar1'], rev_id='1') |
173 | + basis = wt.basis_tree() |
174 | + wt.lock_read() |
175 | + self.addCleanup(wt.unlock) |
176 | + basis.lock_read() |
177 | + self.addCleanup(basis.unlock) |
178 | + return wt.iter_changes(basis) |
179 | + |
180 | + def test_no_filtering(self): |
181 | + original = self.make_changes_iter() |
182 | + original_list = list(original) |
183 | + filtered = _filter_iter_changes_by_paths(iter(original_list)) |
184 | + self.assertEqual(original_list, list(filtered)) |
185 | + |
186 | + def test_include_files(self): |
187 | + original = self.make_changes_iter() |
188 | + filtered = _filter_iter_changes_by_paths(original, |
189 | + include_files=['bar']) |
190 | + self.assertEqual([ |
191 | + ('bar2-id', (None, u'bar/bar2'), True, (False, True), |
192 | + (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)), |
193 | + ], list(filtered)) |
194 | + |
195 | + def test_include_files_yielding_changed_parents(self): |
196 | + original = self.make_changes_iter() |
197 | + filtered = _filter_iter_changes_by_paths(original, |
198 | + include_files=['bar'], yield_changed_parents=True) |
199 | + self.assertEqual([ |
200 | + ('bar2-id', (None, u'bar/bar2'), True, (False, True), |
201 | + (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)), |
202 | + ], list(filtered)) |
203 | + |
204 | + def test_include_files_yielding_changed_parents2(self): |
205 | + original = self.make_changes_iter() |
206 | + filtered = _filter_iter_changes_by_paths(original, |
207 | + include_files=['foo'], yield_changed_parents=True) |
208 | + self.assertEqual([ |
209 | + ('foo-id', (None, u'foo'), True, (False, True), |
210 | + (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)), |
211 | + ('foo1-id', (None, u'foo/foo1'), True, (False, True), |
212 | + (None, 'foo-id'), (None, u'foo1'), (None, 'file'), (None, 0)), |
213 | + ], list(filtered)) |
214 | + |
215 | + def test_exclude_files(self): |
216 | + original = self.make_changes_iter() |
217 | + filtered = _filter_iter_changes_by_paths(original, |
218 | + exclude_files=['foo']) |
219 | + self.assertEqual([ |
220 | + ('baz-id', (None, u'baz'), True, (False, True), |
221 | + (None, 'TREE_ROOT'), (None, u'baz'), (None, 'file'), (None, 0)), |
222 | + ('bar2-id', (None, u'bar/bar2'), True, (False, True), |
223 | + (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)), |
224 | + ], list(filtered)) |
225 | + pass |
226 | + |
227 | + def test_excludes_override_includes(self): |
228 | + original = self.make_changes_iter() |
229 | + filtered = _filter_iter_changes_by_paths(original, |
230 | + include_files=['bar/bar2'], exclude_files=['bar']) |
231 | + self.assertEqual([], list(filtered)) |
232 | + |
233 | + def test_excludes_override_includes2(self): |
234 | + original = self.make_changes_iter() |
235 | + filtered = _filter_iter_changes_by_paths(original, |
236 | + include_files=['foo'], exclude_files=['foo/foo1']) |
237 | + self.assertEqual([ |
238 | + ('foo-id', (None, u'foo'), True, (False, True), |
239 | + (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)), |
240 | + ], list(filtered)) |
241 | |
242 | === modified file 'bzrlib/tree.py' |
243 | --- bzrlib/tree.py 2009-07-17 06:04:35 +0000 |
244 | +++ bzrlib/tree.py 2009-08-19 02:39:06 +0000 |
245 | @@ -35,7 +35,7 @@ |
246 | from bzrlib import errors |
247 | from bzrlib.inventory import InventoryFile |
248 | from bzrlib.inter import InterObject |
249 | -from bzrlib.osutils import fingerprint_file |
250 | +from bzrlib.osutils import fingerprint_file, is_inside_any |
251 | import bzrlib.revision |
252 | from bzrlib.symbol_versioning import deprecated_function, deprecated_in |
253 | from bzrlib.trace import note |
254 | @@ -1289,3 +1289,59 @@ |
255 | other_values.append(self._lookup_by_file_id( |
256 | alt_extra, alt_tree, file_id)) |
257 | yield other_path, file_id, None, other_values |
258 | + |
259 | + |
260 | +def _filter_iter_changes_by_paths(changes, include_files=None, |
261 | + exclude_files=None, yield_changed_parents=False): |
262 | + """Filter the results from iter_changes using paths. |
263 | + |
264 | + This decorator is useful for post-processing the output from |
265 | + Tree.iter_changes(). It may also be used for post-procesing the output |
266 | + from result-compatible methods such as Inventory.iter_changes() and |
267 | + PreviewTree.iter_changes(). Note that specific-path filtering should not |
268 | + have already been applied. |
269 | + |
270 | + :param changes: the iterator of changes to filter |
271 | + :param include_files: paths of files and directories to include or None |
272 | + for no masking. |
273 | + :param exclude_files: paths of files and directories to exclude or None |
274 | + for no masking. Excludes take precedence over includes. |
275 | + :param yield_changed_parents: if True, include changed parent directories |
276 | + of included paths. |
277 | + """ |
278 | + if not (include_files or exclude_files): |
279 | + for change in changes: |
280 | + yield change |
281 | + |
282 | + # Find the sets of files to include, additional parents and excludes, |
283 | + # always including the root in the parents |
284 | + includes_parents = set(['']) |
285 | + if include_files: |
286 | + include_set = osutils.minimum_path_selection(include_files) |
287 | + if yield_changed_parents: |
288 | + for include in include_set: |
289 | + for parent in osutils.parent_directories(include): |
290 | + if parent not in include_set: |
291 | + includes_parents.add(parent) |
292 | + else: |
293 | + include_set = set(['']) |
294 | + if exclude_files: |
295 | + exclude_set = osutils.minimum_path_selection(exclude_files) |
296 | + else: |
297 | + exclude_set = set([]) |
298 | + |
299 | + for change in changes: |
300 | + # Decide which path to use |
301 | + old_path, new_path = change[1] |
302 | + if new_path is None: |
303 | + path = old_path |
304 | + else: |
305 | + path = new_path |
306 | + |
307 | + # Do the filtering |
308 | + if exclude_set and is_inside_any(exclude_set, path): |
309 | + continue |
310 | + elif is_inside_any(include_set, path): |
311 | + yield change |
312 | + elif yield_changed_parents and path in includes_parents: |
313 | + yield change |
This branch makes commit of selected files on development6- rich-root faster than it is on 1.9 format, instead of being twice as slow. All tests still pass and the code changes are pretty simple so I think it is safe for 1.15rc.
It's possible to get further improvements here by pushing more complexity down into iter_changes as bug #347649 suggests. Given selective commit performance is down to 0.27 seconds for Emacs with 3K files and 105K revisions, I'm not sure that's necessary yet. That path is certainly a lot more work than this patch, given the multiple implementations of iter_changes() around the place and the tuning that has gone into them to date.