Merge lp:~bzr/bzr/faster-commit-file into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy on 2009-05-12
Status: Superseded
Proposed branch: lp:~bzr/bzr/faster-commit-file
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 367 lines (has conflicts)
Text conflict in bzrlib/osutils.py
Text conflict in bzrlib/tests/test_osutils.py
To merge this branch: bzr merge lp:~bzr/bzr/faster-commit-file
Reviewer Review Type Date Requested Status
Robert Collins (community) 2009-05-12 Disapprove on 2009-05-15
Bazaar Developers 2009-05-13 Pending
Review via email: mp+6468@code.launchpad.net

This proposal has been superseded by a proposal from 2009-06-25.

To post a comment you must log in.
Ian Clatworthy (ian-clatworthy) wrote :

This branch makes commit of selected files on development6-rich-root faster than it is on 1.9 format, instead of being twice as slow. All tests still pass and the code changes are pretty simple so I think it is safe for 1.15rc.

It's possible to get further improvements here by pushing more complexity down into iter_changes as bug #347649 suggests. Given selective commit performance is down to 0.27 seconds for Emacs with 3K files and 105K revisions, I'm not sure that's necessary yet. That path is certainly a lot more work than this patch, given the multiple implementations of iter_changes() around the place and the tuning that has gone into them to date.

Robert Collins (lifeless) wrote :

On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
> Ian Clatworthy has proposed merging lp:~bzr/bzr/faster-commit-file
> into lp:bzr.
>
> Requested reviews:
> Bazaar Developers (bzr)
>
> This branch makes commit of selected files on development6-rich-root
> faster than it is on 1.9 format, instead of being twice as slow. All
> tests still pass and the code changes are pretty simple so I think it
> is safe for 1.15rc.
>
> It's possible to get further improvements here by pushing more
> complexity down into iter_changes as bug #347649 suggests. Given
> selective commit performance is down to 0.27 seconds for Emacs with 3K
> files and 105K revisions, I'm not sure that's necessary yet. That path
> is certainly a lot more work than this patch, given the multiple
> implementations of iter_changes() around the place and the tuning that
> has gone into them to date.

This feels like a bit of a bandaid, and I don't think that the lack of
failures is an indication of correctness - there weren't tests for the
failures that bug 347649 refers to. For instance, simple inspection of
the patch suggests to me that you can miss the root id - e.g. a
pathological case would be a selective first commit with the root
skipped.

I appreciate not wanting to monkey with iter_changes, but the problem
with doing this just up at the commit level is that you have to black
box test. I don't think its safe to put these particular changes into
commit without specific tests.

If you were to put the masking support into iter-changes as a optional
decorating generator (*)you could using test iter-changes quite easily
(using the intertree tests) - if its fast enough for commit, then as you
say its likely fast enough in general, and by preserving the layering we
can optimise later. And unit tests for iter_changes will actually speak
to the core issue rather than circumstance, which is the most a test on
commit itself could do.

e.g.
def iter_changes_exclude_paths(changes, paths)
    for change in changes:
        if change[0][1] is_inside_any(paths):
            continue
    yield change

def iter_changes_include_added_parents(changes, specific_paths):
   ....

def iter_changes(....)
    result = self._iter_changes(...)
    if specific:
        result = iter_changes_include_added_parents(result, foo)
    if exclude:
        result = iter_changes_exclude_paths(result, foo)
    return result

 review disapprove

-Rob

review: Disapprove
Ian Clatworthy (ian-clatworthy) wrote :
Download full text (3.6 KiB)

Robert Collins wrote:
> Review: Disapprove
> On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
>
>> t's possible to get further improvements here by pushing more
>> complexity down into iter_changes as bug #347649 suggests. Given
>> selective commit performance is down to 0.27 seconds for Emacs with 3K
>> files and 105K revisions, I'm not sure that's necessary yet. That path
>> is certainly a lot more work than this patch, given the multiple
>> implementations of iter_changes() around the place and the tuning that
>> has gone into them to date.
>>
>
> This feels like a bit of a bandaid, and I don't think that the lack of
> failures is an indication of correctness - there weren't tests for the
> failures that bug 347649 refers to. For instance, simple inspection of
> the patch suggests to me that you can miss the root id - e.g. a
> pathological case would be a selective first commit with the root
> skipped.
>
>
I'd expect bzr commit -x '' to be a pointless commit. I suspect my code
will error in exactly that way. It's a *very* pathological case as well:
no sane person would do that on the command line (though it may happen
via the API).
> I appreciate not wanting to monkey with iter_changes, but the problem
> with doing this just up at the commit level is that you have to black
> box test. I don't think its safe to put these particular changes into
> commit without specific tests.
>
>
"bzr selftest commit" runs 2335 tests so I think it's test coverage is
pretty comprehensive. Lots of tests broke during the course of putting
this patch together fwiw, e.g. simply passing specific_files to
iter_changes breaks the test suite so bug 347649 is indeed trapped by tests.

> If you were to put the masking support into iter-changes as a optional
> decorating generator (*)you could using test iter-changes quite easily
> (using the intertree tests) - if its fast enough for commit, then as you
> say its likely fast enough in general, and by preserving the layering we
> can optimise later. And unit tests for iter_changes will actually speak
> to the core issue rather than circumstance, which is the most a test on
> commit itself could do.
>
> e.g.
> def iter_changes_exclude_paths(changes, paths)
> for change in changes:
> if change[0][1] is_inside_any(paths):
> continue
> yield change
>
> def iter_changes_include_added_parents(changes, specific_paths):
> ....
>
>
> def iter_changes(....)
> result = self._iter_changes(...)
> if specific:
> result = iter_changes_include_added_parents(result, foo)
> if exclude:
> result = iter_changes_exclude_paths(result, foo)
> return result
>
>
That still implies adding 2 new options to *multiple* implementations of
iter_changes: yield_parents and exclude_files, say. That's a far bigger
change than this for no additional performance gain. Longer term, I'd be
surprised if pushing exclude_files down could gain us much. Yielding
parents in-situ could be *slightly* faster though but it would take a
large delta before any speed difference was noticed I suspect.
> review disapprove
This patch is an interim step - we really want to leverage the
...

Read more...

Robert Collins (lifeless) wrote :

On Wed, 2009-05-13 at 03:33 +0000, Ian Clatworthy wrote:
>
> I'd expect bzr commit -x '' to be a pointless commit. I suspect my
> code
> will error in exactly that way. It's a *very* pathological case as
> well:
> no sane person would do that on the command line (though it may happen
> via the API).

Thats not what I mean. What I mean is
bzr init
bzr add *
bzr commit -m 'foo' README
-> bad data.

> > I appreciate not wanting to monkey with iter_changes, but the
> problem
> > with doing this just up at the commit level is that you have to
> black
> > box test. I don't think its safe to put these particular changes
> into
> > commit without specific tests.
> >
> >
> "bzr selftest commit" runs 2335 tests so I think it's test coverage is
> pretty comprehensive. Lots of tests broke during the course of putting
> this patch together fwiw, e.g. simply passing specific_files to
> iter_changes breaks the test suite so bug 347649 is indeed trapped by
> tests.

That count includes all the micro tests that test specific behaviours of
commit builder. I'm glad that we have accidental coverage of the bug;
I'm still very sure we don't have explicit coverage.

> That still implies adding 2 new options to *multiple* implementations
> of
> iter_changes: yield_parents and exclude_files, say. That's a far
> bigger
> change than this for no additional performance gain. Longer term, I'd
> be
> surprised if pushing exclude_files down could gain us much.

We have the concept that 'diff' and 'status' show us what 'commit' will
do. Until we have commit building on the same logic we'll struggle to
make that a reality. That is another thing that pushing the logic down
will help with.

> Yielding
> parents in-situ could be *slightly* faster though but it would take a
> large delta before any speed difference was noticed I suspect.
> > review disapprove

> This patch is an interim step - we really want to leverage the
> specific_files masking at the layer below so someone running 'bzr
> commit
> foo/bar' on the whole Debian tree gets a fast response that this patch
> delivers. Having said that, I don't think this patch is a band-aid and
> it meets all our criteria for approval: code quality is good, test
> coverage is not reduced and performance is improved by a factor of
> 4(!!)
> on a common operation. I think it ought to be landed as a step forward
> while I work on a faster solution. But that's just me.

Another factor now I think about this is people using
record_iter_changes. The caveats for using it are not really clear. And
having code in commit.py to workaround limits in iter_changes won't help
with that.

I can be persuaded that we should land this as is, but:
 - it definitely does not fix the bug report, because the bug is about
   iter_changes
 - I think its only a couple of hours work to do it right, within the
   constraints of 'get the interface solid and tested'. I don't see any
   benefit to not doing the work now, while its paged in and in your
mind.

-Rob

lp:~bzr/bzr/faster-commit-file updated on 2009-05-13
4357. By Ian Clatworthy on 2009-05-13

move exclude & parent yielding from commit into tree.py

4358. By Ian Clatworthy on 2009-05-13

add tests and tweak names in API

Ian Clatworthy (ian-clatworthy) wrote :

Resubmitted along the lines discussed on IRC.

Robert Collins (lifeless) wrote :
Download full text (10.1 KiB)

I've taken a diff against bzr.dev - review follows. Look for ****
introducing my comments

=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2009-04-03 00:07:49 +0000
+++ bzrlib/commit.py 2009-05-15 02:31:26 +0000
@@ -289,12 +289,14 @@
         # We can use record_iter_changes IFF iter_changes is compatible
with
         # the command line parameters, and the repository has fast
delta
         # generation. See bug 347649.
- self.use_record_iter_changes = (
- not self.specific_files and
- not self.exclude and
- not self.branch.repository._format.supports_tree_reference
and
- (self.branch.repository._format.fast_deltas or
- len(self.parents) < 2))
+ if self.branch.repository._format.supports_tree_reference:
+ # TODO: fix this
+ self.use_record_iter_changes = False
+ else:
+ self.use_record_iter_changes = (
+ self.branch.repository._format.fast_deltas
+ or (len(self.parents) < 2 and not self.specific_files
+ and not self.exclude))

**** You've altered this if block substantially, and I think
unintentionally. Please just use the old phrase and remove the two lines
that you're address. *OR* Update the comment to match what your new
if:else: actually does.

@@ -632,13 +634,16 @@
     def _update_builder_with_changes(self):
         """Update the commit builder with the data about what has
changed.
         """
- exclude = self.exclude
- specific_files = self.specific_files or []
- mutter("Selecting files for commit with filter %s",
specific_files)
+ mutter("Selecting files for commit with filter %s excluding %
s",
+ self.specific_files, self.exclude)

         self._check_strict()
         if self.use_record_iter_changes:
             iter_changes = self.work_tree.iter_changes(self.basis_tree)
+ if self.specific_files or self.exclude:
+ iter_changes =
tree.filter_iter_changes_by_paths(iter_changes,
+ self.specific_files, self.exclude,
+ yield_changed_parents=True)

**** I don't think a flag for 'yield_changed_parents' is needed; it
smells of YAGNI to me: its a new interface, and our only user wants
yield-changed-parents always on.

@@ -955,17 +959,16 @@

     def _set_specific_file_ids(self):
         """populate self.specific_file_ids if we will use it."""
- if not self.use_record_iter_changes:
- # If provided, ensure the specified files are versioned
- if self.specific_files is not None:
- # Note: This routine is being called because it raises
- # PathNotVersionedError as a side effect of finding the
IDs. We
- # later use the ids we found as input to the working
tree
- # inventory iterator, so we only consider those ids
rather than
- # examining the whole tree again.
- # XXX: Dont we have filter_unversioned to do this more
- # cheaply?
- self.specific_file_ids = tree.find_ids_across_trees(
- self.specific_files,...

review: Disapprove
Robert Collins (lifeless) wrote :

> **** Reading this, I'm not sure how you are getting the deltas for the
> parents if they were not selected by iter changes.
>
> Perhaps I"m missing something

(but I don't think I am - I've checked back against the previous diff
and this isn't a change from the refactoring; it was present in that
version as well.)

-Rob

Ian Clatworthy (ian-clatworthy) wrote :
Download full text (6.7 KiB)

Follow-up IRC discussion re review and next steps ...

(14:46:07) igc: lifeless: so, I might be stupid but I still don't see the problem ...
(14:46:13) lifeless: ok
(14:46:24) igc: iter_changes only needs to return changed parents, not all of them
(14:46:46) lifeless: right, but unless its looking at parents it won't return them, because iter_changes has a bug
(14:46:52) igc: i.e. if a parent isn't in iter_changes, then it isn't needed in apply_delta IIUIC
(14:47:22) igc: lifeless: but it *is* looking at parent right now
(14:47:27) lifeless: how?
(14:47:33) igc: it looks at the whole tree, then post-filters
(14:47:59) lifeless: ow
(14:48:17) lifeless: I'm really torn
(14:48:17) igc: and osutils.parent_directories doesn't need to return the root
(14:48:31) igc: at the os level, '' doesn't make sense
(14:48:52) lifeless: well, other code would be cleaner if it does - as you are making sure you have [''] always
(14:49:05) igc: also, my change to when record_iter_changes was deliberate ...
(14:49:15) igc: and the comment still hold exactly as is btw
(14:49:26) igc: I benchmarked to decide when it was the right thing fwiw
(14:51:24) lifeless: why the (len(self.parents) < 2 and not self.specific_files and not self.exclude)
(14:51:27) lifeless: clause?
(14:51:46) igc: because for non-chk formats, ...
(14:52:21) igc: it's slower than the current code otherwise
(14:52:29) igc: at least on Emacs
(14:53:17) igc: I'm happy to extend the comment along those lines
(14:53:42) lifeless: So this means that the self.specific_files and self.exclude processing with record_iter_changes is _slower_ than the old full-tree code path
(14:54:02) lifeless: and that its only faster with record_iter_changes on chk repositoreis because the chk code is able to compensate
(14:54:13) lifeless: I appreciate that you've got it going faster
(14:54:17) igc: yes, according to my measurements
(14:54:47) lifeless: I'm really worried that the other changes, such as the reinstance of files_across_trees are going to make it harder for someone to come bacj and fix bug 347649 properly
(14:54:48) igc: it was consistently slower - 08. vs 04 or something like that
(14:54:48) ubottu: Launchpad bug 347649 in bzr "iter_changes missing support needed for commit" [High,Confirmed] https://launchpad.net/bugs/347649
(14:54:56) lifeless: as they will have to basically undo this patch to fix it
(14:55:28) lifeless: on the other hand I don't want to block performance improvements
(14:55:35) igc: lifeless: I did look at trying to remove the file_across_trees bit but ...
(14:55:44) igc: it broke the test suite cos ...
(14:55:45) lifeless: do you have any suggestion about how we can do both things?
(14:55:56) igc: we trap for bogus include paths
(14:56:06) igc: and iter_changes doesn't see those
(14:56:21) lifeless: file_across_trees is a problem because it upcassts the dirstate to an inventory - its spectcularly slow
(14:56:44) lifeless: emacs is what 20K files?
(14:56:59) igc: just checking, we're talking about the _set_specific_file_ids() method yes?
(14:57:06) lifeless: yes
(14:57:25) igc: emacs is 8K from memory
(14:57:31) igc: 100K revisions but ...
(14:57:38) igc: that doesn't matter myuch here
(...

Read more...

lp:~bzr/bzr/faster-commit-file updated on 2009-06-25
4359. By Ian Clatworthy on 2009-06-25

merge bzr.dev r4476

4360. By Ian Clatworthy on 2009-06-25

simplify trigger for using record-iter-changes & make tree helper method private

Unmerged revisions

4360. By Ian Clatworthy on 2009-06-25

simplify trigger for using record-iter-changes & make tree helper method private

4359. By Ian Clatworthy on 2009-06-25

merge bzr.dev r4476

4358. By Ian Clatworthy on 2009-05-13

add tests and tweak names in API

4357. By Ian Clatworthy on 2009-05-13

move exclude & parent yielding from commit into tree.py

4356. By Ian Clatworthy on 2009-05-12

tweak exactly when record-iter-changes is used

4355. By Ian Clatworthy on 2009-05-12

minor tweaks and performance tuning

4354. By Ian Clatworthy on 2009-05-12

first cut at using iter_changes for selective commits

4353. By Ian Clatworthy on 2009-05-12

add osutils.parent_directories() with test

4352. By Ian Clatworthy on 2009-05-12

spelling fix in comment

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-06-15 15:20:24 +0000
3+++ NEWS 2009-06-16 02:38:08 +0000
4@@ -291,6 +291,9 @@
5
6 * ``bzr rm *`` is now as fast as ``bzr rm * --keep``. (Johan Walles, #180116)
7
8+* Selective commit performance on ``development6-rich-root`` is now
9+ better than it is on ``1.9`` format. (Ian Clatworthy)
10+
11 Bug Fixes
12 *********
13
14
15=== modified file 'bzrlib/commit.py'
16--- bzrlib/commit.py 2009-06-15 19:04:38 +0000
17+++ bzrlib/commit.py 2009-06-16 02:38:08 +0000
18@@ -283,12 +283,14 @@
19 # We can use record_iter_changes IFF iter_changes is compatible with
20 # the command line parameters, and the repository has fast delta
21 # generation. See bug 347649.
22- self.use_record_iter_changes = (
23- not self.specific_files and
24- not self.exclude and
25- not self.branch.repository._format.supports_tree_reference and
26- (self.branch.repository._format.fast_deltas or
27- len(self.parents) < 2))
28+ if self.branch.repository._format.supports_tree_reference:
29+ # TODO: fix this
30+ self.use_record_iter_changes = False
31+ else:
32+ self.use_record_iter_changes = (
33+ self.branch.repository._format.fast_deltas
34+ or (len(self.parents) < 2 and not self.specific_files
35+ and not self.exclude))
36 self.pb = bzrlib.ui.ui_factory.nested_progress_bar()
37 self.basis_revid = self.work_tree.last_revision()
38 self.basis_tree = self.work_tree.basis_tree()
39@@ -576,10 +578,10 @@
40 except Exception, e:
41 found_exception = e
42 if found_exception is not None:
43- # don't do a plan raise, because the last exception may have been
44+ # don't do a plain raise, because the last exception may have been
45 # trashed, e is our sure-to-work exception even though it loses the
46 # full traceback. XXX: RBC 20060421 perhaps we could check the
47- # exc_info and if its the same one do a plain raise otherwise
48+ # exc_info and if it's the same one do a plain raise otherwise
49 # 'raise e' as we do now.
50 raise e
51
52@@ -614,13 +616,16 @@
53 def _update_builder_with_changes(self):
54 """Update the commit builder with the data about what has changed.
55 """
56- exclude = self.exclude
57- specific_files = self.specific_files or []
58- mutter("Selecting files for commit with filter %s", specific_files)
59+ mutter("Selecting files for commit with filter %s excluding %s",
60+ self.specific_files, self.exclude)
61
62 self._check_strict()
63 if self.use_record_iter_changes:
64 iter_changes = self.work_tree.iter_changes(self.basis_tree)
65+ if self.specific_files or self.exclude:
66+ iter_changes = tree.filter_iter_changes_by_paths(iter_changes,
67+ self.specific_files, self.exclude,
68+ yield_changed_parents=True)
69 iter_changes = self._filter_iter_changes(iter_changes)
70 for file_id, path, fs_hash in self.builder.record_iter_changes(
71 self.work_tree, self.basis_revid, iter_changes):
72@@ -636,7 +641,7 @@
73
74 This method reports on the changes in iter_changes to the user, and
75 converts 'missing' entries in the iter_changes iterator to 'deleted'
76- entries. 'missing' entries have their
77+ entries.
78
79 :param iter_changes: An iter_changes to process.
80 :return: A generator of changes.
81@@ -648,7 +653,6 @@
82 if report_changes:
83 old_path = change[1][0]
84 new_path = change[1][1]
85- versioned = change[3][1]
86 kind = change[6][1]
87 versioned = change[3][1]
88 if kind is None and versioned:
89@@ -937,17 +941,16 @@
90
91 def _set_specific_file_ids(self):
92 """populate self.specific_file_ids if we will use it."""
93- if not self.use_record_iter_changes:
94- # If provided, ensure the specified files are versioned
95- if self.specific_files is not None:
96- # Note: This routine is being called because it raises
97- # PathNotVersionedError as a side effect of finding the IDs. We
98- # later use the ids we found as input to the working tree
99- # inventory iterator, so we only consider those ids rather than
100- # examining the whole tree again.
101- # XXX: Dont we have filter_unversioned to do this more
102- # cheaply?
103- self.specific_file_ids = tree.find_ids_across_trees(
104- self.specific_files, [self.basis_tree, self.work_tree])
105- else:
106- self.specific_file_ids = None
107+ # If provided, ensure the specified files are versioned
108+ if self.specific_files is not None:
109+ # Note: This routine is being called because it raises
110+ # PathNotVersionedError as a side effect of finding the IDs. We
111+ # later use the ids we found as input to the working tree
112+ # inventory iterator, so we only consider those ids rather than
113+ # examining the whole tree again.
114+ # XXX: Dont we have filter_unversioned to do this more
115+ # cheaply?
116+ self.specific_file_ids = tree.find_ids_across_trees(
117+ self.specific_files, [self.basis_tree, self.work_tree])
118+ else:
119+ self.specific_file_ids = None
120
121=== modified file 'bzrlib/osutils.py'
122--- bzrlib/osutils.py 2009-06-10 06:25:56 +0000
123+++ bzrlib/osutils.py 2009-06-16 02:38:08 +0000
124@@ -868,6 +868,7 @@
125 return pathjoin(*p)
126
127
128+<<<<<<< TREE
129 def parent_directories(filename):
130 """Return the list of parent directories, deepest first.
131
132@@ -881,6 +882,18 @@
133 return parents
134
135
136+=======
137+def parent_directories(filename):
138+ """Return the list of parent directories, deepest first."""
139+ parents = []
140+ parts = splitpath(dirname(filename))
141+ while parts:
142+ parents.append(joinpath(parts))
143+ parts.pop()
144+ return parents
145+
146+
147+>>>>>>> MERGE-SOURCE
148 try:
149 from bzrlib._chunks_to_lines_pyx import chunks_to_lines
150 except ImportError:
151
152=== modified file 'bzrlib/repository.py'
153--- bzrlib/repository.py 2009-06-12 01:11:00 +0000
154+++ bzrlib/repository.py 2009-06-16 02:38:08 +0000
155@@ -563,7 +563,7 @@
156 :param iter_changes: An iter_changes iterator with the changes to apply
157 to basis_revision_id. The iterator must not include any items with
158 a current kind of None - missing items must be either filtered out
159- or errored-on beefore record_iter_changes sees the item.
160+ or errored-on before record_iter_changes sees the item.
161 :param _entry_factory: Private method to bind entry_factory locally for
162 performance.
163 :return: A generator of (file_id, relpath, fs_hash) tuples for use with
164
165=== modified file 'bzrlib/tests/test_osutils.py'
166--- bzrlib/tests/test_osutils.py 2009-06-10 03:56:49 +0000
167+++ bzrlib/tests/test_osutils.py 2009-06-16 02:38:09 +0000
168@@ -860,6 +860,7 @@
169 self.assertRaises(errors.BzrError, osutils.splitpath, 'a/../b')
170
171
172+<<<<<<< TREE
173 class TestParentDirectories(tests.TestCaseInTempDir):
174 """Test osutils.parent_directories()"""
175
176@@ -869,6 +870,20 @@
177 self.assertEqual(['a/b', 'a'], osutils.parent_directories('a/b/c'))
178
179
180+=======
181+class TestParentDirectories(tests.TestCaseInTempDir):
182+ """Test osutils.parent_directories()"""
183+
184+ def test_parent_directories(self):
185+ def check(expected, path):
186+ self.assertEqual(expected, osutils.parent_directories(path))
187+
188+ check([], 'a')
189+ check(['a'], 'a/b')
190+ check(['a/b', 'a'], 'a/b/c')
191+
192+
193+>>>>>>> MERGE-SOURCE
194 class TestMacFuncsDirs(tests.TestCaseInTempDir):
195 """Test mac special functions that require directories."""
196
197
198=== modified file 'bzrlib/tests/test_tree.py'
199--- bzrlib/tests/test_tree.py 2009-03-23 14:59:43 +0000
200+++ bzrlib/tests/test_tree.py 2009-06-16 02:38:09 +0000
201@@ -23,7 +23,7 @@
202 tree as _mod_tree,
203 )
204 from bzrlib.tests import TestCaseWithTransport
205-from bzrlib.tree import InterTree
206+from bzrlib.tree import InterTree, filter_iter_changes_by_paths
207
208
209 class TestInterTree(TestCaseWithTransport):
210@@ -417,3 +417,84 @@
211 self.assertPathToKey(([u''], u'a'), u'a')
212 self.assertPathToKey(([u'a'], u'b'), u'a/b')
213 self.assertPathToKey(([u'a', u'b'], u'c'), u'a/b/c')
214+
215+
216+class TestFilterIterChangesByPaths(TestCaseWithTransport):
217+ """Tests for tree.filter_iter_changes_by_paths()."""
218+
219+ def make_changes_iter(self):
220+ wt = self.make_branch_and_tree('.')
221+ b = wt.branch
222+ self.build_tree(['foo/', 'foo/foo1', 'bar/', 'bar/bar1', 'bar/bar2',
223+ 'baz'])
224+ wt.add(['foo', 'foo/foo1', 'bar', 'bar/bar1', 'bar/bar2', 'baz'],
225+ ['foo-id', 'foo1-id', 'bar-id', 'bar1-id', 'bar2-id', 'baz-id'])
226+ wt.commit('bar/bar1', specific_files=['bar/bar1'], rev_id='1')
227+ basis = wt.basis_tree()
228+ wt.lock_read()
229+ self.addCleanup(wt.unlock)
230+ basis.lock_read()
231+ self.addCleanup(basis.unlock)
232+ return wt.iter_changes(basis)
233+
234+ def test_no_filtering(self):
235+ original = self.make_changes_iter()
236+ original_list = list(original)
237+ filtered = filter_iter_changes_by_paths(iter(original_list))
238+ self.assertEqual(original_list, list(filtered))
239+
240+ def test_include_files(self):
241+ original = self.make_changes_iter()
242+ filtered = filter_iter_changes_by_paths(original,
243+ include_files=['bar'])
244+ self.assertEqual([
245+ ('bar2-id', (None, u'bar/bar2'), True, (False, True),
246+ (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
247+ ], list(filtered))
248+
249+ def test_include_files_yielding_changed_parents(self):
250+ original = self.make_changes_iter()
251+ filtered = filter_iter_changes_by_paths(original,
252+ include_files=['bar'], yield_changed_parents=True)
253+ self.assertEqual([
254+ ('bar2-id', (None, u'bar/bar2'), True, (False, True),
255+ (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
256+ ], list(filtered))
257+
258+ def test_include_files_yielding_changed_parents2(self):
259+ original = self.make_changes_iter()
260+ filtered = filter_iter_changes_by_paths(original,
261+ include_files=['foo'], yield_changed_parents=True)
262+ self.assertEqual([
263+ ('foo-id', (None, u'foo'), True, (False, True),
264+ (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)),
265+ ('foo1-id', (None, u'foo/foo1'), True, (False, True),
266+ (None, 'foo-id'), (None, u'foo1'), (None, 'file'), (None, 0)),
267+ ], list(filtered))
268+
269+ def test_exclude_files(self):
270+ original = self.make_changes_iter()
271+ filtered = filter_iter_changes_by_paths(original,
272+ exclude_files=['foo'])
273+ self.assertEqual([
274+ ('baz-id', (None, u'baz'), True, (False, True),
275+ (None, 'TREE_ROOT'), (None, u'baz'), (None, 'file'), (None, 0)),
276+ ('bar2-id', (None, u'bar/bar2'), True, (False, True),
277+ (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
278+ ], list(filtered))
279+ pass
280+
281+ def test_excludes_override_includes(self):
282+ original = self.make_changes_iter()
283+ filtered = filter_iter_changes_by_paths(original,
284+ include_files=['bar/bar2'], exclude_files=['bar'])
285+ self.assertEqual([], list(filtered))
286+
287+ def test_excludes_override_includes2(self):
288+ original = self.make_changes_iter()
289+ filtered = filter_iter_changes_by_paths(original,
290+ include_files=['foo'], exclude_files=['foo/foo1'])
291+ self.assertEqual([
292+ ('foo-id', (None, u'foo'), True, (False, True),
293+ (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)),
294+ ], list(filtered))
295
296=== modified file 'bzrlib/tree.py'
297--- bzrlib/tree.py 2009-06-10 03:56:49 +0000
298+++ bzrlib/tree.py 2009-06-16 02:38:09 +0000
299@@ -37,7 +37,7 @@
300 from bzrlib import errors
301 from bzrlib.inventory import Inventory, InventoryFile
302 from bzrlib.inter import InterObject
303-from bzrlib.osutils import fingerprint_file
304+from bzrlib.osutils import fingerprint_file, is_inside_any
305 import bzrlib.revision
306 from bzrlib.symbol_versioning import deprecated_function, deprecated_in
307 from bzrlib.trace import mutter, note
308@@ -1293,3 +1293,59 @@
309 other_values.append(self._lookup_by_file_id(
310 alt_extra, alt_tree, file_id))
311 yield other_path, file_id, None, other_values
312+
313+
314+def filter_iter_changes_by_paths(changes, include_files=None,
315+ exclude_files=None, yield_changed_parents=False):
316+ """Filter the results from iter_changes using paths.
317+
318+ This decorator is useful for post-processing the output from
319+ Tree.iter_changes(). It may also be used for post-procesing the output
320+ from result-compatible methods such as Inventory.iter_changes() and
321+ PreviewTree.iter_changes(). Note that specific-path filtering should not
322+ have already been applied.
323+
324+ :param changes: the iterator of changes to filter
325+ :param include_files: paths of files and directories to include or None
326+ for no masking.
327+ :param exclude_files: paths of files and directories to exclude or None
328+ for no masking. Excludes take precedence over includes.
329+ :param yield_changed_parents: if True, include changed parent directories
330+ of included paths.
331+ """
332+ if not (include_files or exclude_files):
333+ for change in changes:
334+ yield change
335+
336+ # Find the sets of files to include, additional parents and excludes,
337+ # always including the root in the parents
338+ includes_parents = set([''])
339+ if include_files:
340+ include_set = osutils.minimum_path_selection(include_files)
341+ if yield_changed_parents:
342+ for include in include_set:
343+ for parent in osutils.parent_directories(include):
344+ if parent not in include_set:
345+ includes_parents.add(parent)
346+ else:
347+ include_set = set([''])
348+ if exclude_files:
349+ exclude_set = osutils.minimum_path_selection(exclude_files)
350+ else:
351+ exclude_set = set([])
352+
353+ for change in changes:
354+ # Decide which path to use
355+ old_path, new_path = change[1]
356+ if new_path is None:
357+ path = old_path
358+ else:
359+ path = new_path
360+
361+ # Do the filtering
362+ if exclude_set and is_inside_any(exclude_set, path):
363+ continue
364+ elif is_inside_any(include_set, path):
365+ yield change
366+ elif yield_changed_parents and path in includes_parents:
367+ yield change