Merge lp:~bzr/bzr/faster-commit-file into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy
Status: Superseded
Proposed branch: lp:~bzr/bzr/faster-commit-file
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 367 lines (has conflicts)
Text conflict in bzrlib/osutils.py
Text conflict in bzrlib/tests/test_osutils.py
To merge this branch: bzr merge lp:~bzr/bzr/faster-commit-file
Reviewer Review Type Date Requested Status
Robert Collins (community) Disapprove
Bazaar Developers Pending
Review via email: mp+6468@code.launchpad.net

This proposal has been superseded by a proposal from 2009-06-25.

To post a comment you must log in.
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

This branch makes commit of selected files on development6-rich-root faster than it is on 1.9 format, instead of being twice as slow. All tests still pass and the code changes are pretty simple so I think it is safe for 1.15rc.

It's possible to get further improvements here by pushing more complexity down into iter_changes as bug #347649 suggests. Given selective commit performance is down to 0.27 seconds for Emacs with 3K files and 105K revisions, I'm not sure that's necessary yet. That path is certainly a lot more work than this patch, given the multiple implementations of iter_changes() around the place and the tuning that has gone into them to date.

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
> Ian Clatworthy has proposed merging lp:~bzr/bzr/faster-commit-file
> into lp:bzr.
>
> Requested reviews:
> Bazaar Developers (bzr)
>
> This branch makes commit of selected files on development6-rich-root
> faster than it is on 1.9 format, instead of being twice as slow. All
> tests still pass and the code changes are pretty simple so I think it
> is safe for 1.15rc.
>
> It's possible to get further improvements here by pushing more
> complexity down into iter_changes as bug #347649 suggests. Given
> selective commit performance is down to 0.27 seconds for Emacs with 3K
> files and 105K revisions, I'm not sure that's necessary yet. That path
> is certainly a lot more work than this patch, given the multiple
> implementations of iter_changes() around the place and the tuning that
> has gone into them to date.

This feels like a bit of a bandaid, and I don't think that the lack of
failures is an indication of correctness - there weren't tests for the
failures that bug 347649 refers to. For instance, simple inspection of
the patch suggests to me that you can miss the root id - e.g. a
pathological case would be a selective first commit with the root
skipped.

I appreciate not wanting to monkey with iter_changes, but the problem
with doing this just up at the commit level is that you have to black
box test. I don't think its safe to put these particular changes into
commit without specific tests.

If you were to put the masking support into iter-changes as a optional
decorating generator (*)you could using test iter-changes quite easily
(using the intertree tests) - if its fast enough for commit, then as you
say its likely fast enough in general, and by preserving the layering we
can optimise later. And unit tests for iter_changes will actually speak
to the core issue rather than circumstance, which is the most a test on
commit itself could do.

e.g.
def iter_changes_exclude_paths(changes, paths)
    for change in changes:
        if change[0][1] is_inside_any(paths):
            continue
    yield change

def iter_changes_include_added_parents(changes, specific_paths):
   ....

def iter_changes(....)
    result = self._iter_changes(...)
    if specific:
        result = iter_changes_include_added_parents(result, foo)
    if exclude:
        result = iter_changes_exclude_paths(result, foo)
    return result

 review disapprove

-Rob

review: Disapprove
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :
Download full text (3.6 KiB)

Robert Collins wrote:
> Review: Disapprove
> On Tue, 2009-05-12 at 14:05 +0000, Ian Clatworthy wrote:
>
>> t's possible to get further improvements here by pushing more
>> complexity down into iter_changes as bug #347649 suggests. Given
>> selective commit performance is down to 0.27 seconds for Emacs with 3K
>> files and 105K revisions, I'm not sure that's necessary yet. That path
>> is certainly a lot more work than this patch, given the multiple
>> implementations of iter_changes() around the place and the tuning that
>> has gone into them to date.
>>
>
> This feels like a bit of a bandaid, and I don't think that the lack of
> failures is an indication of correctness - there weren't tests for the
> failures that bug 347649 refers to. For instance, simple inspection of
> the patch suggests to me that you can miss the root id - e.g. a
> pathological case would be a selective first commit with the root
> skipped.
>
>
I'd expect bzr commit -x '' to be a pointless commit. I suspect my code
will error in exactly that way. It's a *very* pathological case as well:
no sane person would do that on the command line (though it may happen
via the API).
> I appreciate not wanting to monkey with iter_changes, but the problem
> with doing this just up at the commit level is that you have to black
> box test. I don't think its safe to put these particular changes into
> commit without specific tests.
>
>
"bzr selftest commit" runs 2335 tests so I think it's test coverage is
pretty comprehensive. Lots of tests broke during the course of putting
this patch together fwiw, e.g. simply passing specific_files to
iter_changes breaks the test suite so bug 347649 is indeed trapped by tests.

> If you were to put the masking support into iter-changes as a optional
> decorating generator (*)you could using test iter-changes quite easily
> (using the intertree tests) - if its fast enough for commit, then as you
> say its likely fast enough in general, and by preserving the layering we
> can optimise later. And unit tests for iter_changes will actually speak
> to the core issue rather than circumstance, which is the most a test on
> commit itself could do.
>
> e.g.
> def iter_changes_exclude_paths(changes, paths)
> for change in changes:
> if change[0][1] is_inside_any(paths):
> continue
> yield change
>
> def iter_changes_include_added_parents(changes, specific_paths):
> ....
>
>
> def iter_changes(....)
> result = self._iter_changes(...)
> if specific:
> result = iter_changes_include_added_parents(result, foo)
> if exclude:
> result = iter_changes_exclude_paths(result, foo)
> return result
>
>
That still implies adding 2 new options to *multiple* implementations of
iter_changes: yield_parents and exclude_files, say. That's a far bigger
change than this for no additional performance gain. Longer term, I'd be
surprised if pushing exclude_files down could gain us much. Yielding
parents in-situ could be *slightly* faster though but it would take a
large delta before any speed difference was noticed I suspect.
> review disapprove
This patch is an interim step - we really want to leverage the
...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, 2009-05-13 at 03:33 +0000, Ian Clatworthy wrote:
>
> I'd expect bzr commit -x '' to be a pointless commit. I suspect my
> code
> will error in exactly that way. It's a *very* pathological case as
> well:
> no sane person would do that on the command line (though it may happen
> via the API).

Thats not what I mean. What I mean is
bzr init
bzr add *
bzr commit -m 'foo' README
-> bad data.

> > I appreciate not wanting to monkey with iter_changes, but the
> problem
> > with doing this just up at the commit level is that you have to
> black
> > box test. I don't think its safe to put these particular changes
> into
> > commit without specific tests.
> >
> >
> "bzr selftest commit" runs 2335 tests so I think it's test coverage is
> pretty comprehensive. Lots of tests broke during the course of putting
> this patch together fwiw, e.g. simply passing specific_files to
> iter_changes breaks the test suite so bug 347649 is indeed trapped by
> tests.

That count includes all the micro tests that test specific behaviours of
commit builder. I'm glad that we have accidental coverage of the bug;
I'm still very sure we don't have explicit coverage.

> That still implies adding 2 new options to *multiple* implementations
> of
> iter_changes: yield_parents and exclude_files, say. That's a far
> bigger
> change than this for no additional performance gain. Longer term, I'd
> be
> surprised if pushing exclude_files down could gain us much.

We have the concept that 'diff' and 'status' show us what 'commit' will
do. Until we have commit building on the same logic we'll struggle to
make that a reality. That is another thing that pushing the logic down
will help with.

> Yielding
> parents in-situ could be *slightly* faster though but it would take a
> large delta before any speed difference was noticed I suspect.
> > review disapprove

> This patch is an interim step - we really want to leverage the
> specific_files masking at the layer below so someone running 'bzr
> commit
> foo/bar' on the whole Debian tree gets a fast response that this patch
> delivers. Having said that, I don't think this patch is a band-aid and
> it meets all our criteria for approval: code quality is good, test
> coverage is not reduced and performance is improved by a factor of
> 4(!!)
> on a common operation. I think it ought to be landed as a step forward
> while I work on a faster solution. But that's just me.

Another factor now I think about this is people using
record_iter_changes. The caveats for using it are not really clear. And
having code in commit.py to workaround limits in iter_changes won't help
with that.

I can be persuaded that we should land this as is, but:
 - it definitely does not fix the bug report, because the bug is about
   iter_changes
 - I think its only a couple of hours work to do it right, within the
   constraints of 'get the interface solid and tested'. I don't see any
   benefit to not doing the work now, while its paged in and in your
mind.

-Rob

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Resubmitted along the lines discussed on IRC.

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (10.1 KiB)

I've taken a diff against bzr.dev - review follows. Look for ****
introducing my comments

=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2009-04-03 00:07:49 +0000
+++ bzrlib/commit.py 2009-05-15 02:31:26 +0000
@@ -289,12 +289,14 @@
         # We can use record_iter_changes IFF iter_changes is compatible
with
         # the command line parameters, and the repository has fast
delta
         # generation. See bug 347649.
- self.use_record_iter_changes = (
- not self.specific_files and
- not self.exclude and
- not self.branch.repository._format.supports_tree_reference
and
- (self.branch.repository._format.fast_deltas or
- len(self.parents) < 2))
+ if self.branch.repository._format.supports_tree_reference:
+ # TODO: fix this
+ self.use_record_iter_changes = False
+ else:
+ self.use_record_iter_changes = (
+ self.branch.repository._format.fast_deltas
+ or (len(self.parents) < 2 and not self.specific_files
+ and not self.exclude))

**** You've altered this if block substantially, and I think
unintentionally. Please just use the old phrase and remove the two lines
that you're address. *OR* Update the comment to match what your new
if:else: actually does.

@@ -632,13 +634,16 @@
     def _update_builder_with_changes(self):
         """Update the commit builder with the data about what has
changed.
         """
- exclude = self.exclude
- specific_files = self.specific_files or []
- mutter("Selecting files for commit with filter %s",
specific_files)
+ mutter("Selecting files for commit with filter %s excluding %
s",
+ self.specific_files, self.exclude)

         self._check_strict()
         if self.use_record_iter_changes:
             iter_changes = self.work_tree.iter_changes(self.basis_tree)
+ if self.specific_files or self.exclude:
+ iter_changes =
tree.filter_iter_changes_by_paths(iter_changes,
+ self.specific_files, self.exclude,
+ yield_changed_parents=True)

**** I don't think a flag for 'yield_changed_parents' is needed; it
smells of YAGNI to me: its a new interface, and our only user wants
yield-changed-parents always on.

@@ -955,17 +959,16 @@

     def _set_specific_file_ids(self):
         """populate self.specific_file_ids if we will use it."""
- if not self.use_record_iter_changes:
- # If provided, ensure the specified files are versioned
- if self.specific_files is not None:
- # Note: This routine is being called because it raises
- # PathNotVersionedError as a side effect of finding the
IDs. We
- # later use the ids we found as input to the working
tree
- # inventory iterator, so we only consider those ids
rather than
- # examining the whole tree again.
- # XXX: Dont we have filter_unversioned to do this more
- # cheaply?
- self.specific_file_ids = tree.find_ids_across_trees(
- self.specific_files,...

review: Disapprove
Revision history for this message
Robert Collins (lifeless) wrote :

> **** Reading this, I'm not sure how you are getting the deltas for the
> parents if they were not selected by iter changes.
>
> Perhaps I"m missing something

(but I don't think I am - I've checked back against the previous diff
and this isn't a change from the refactoring; it was present in that
version as well.)

-Rob

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :
Download full text (6.7 KiB)

Follow-up IRC discussion re review and next steps ...

(14:46:07) igc: lifeless: so, I might be stupid but I still don't see the problem ...
(14:46:13) lifeless: ok
(14:46:24) igc: iter_changes only needs to return changed parents, not all of them
(14:46:46) lifeless: right, but unless its looking at parents it won't return them, because iter_changes has a bug
(14:46:52) igc: i.e. if a parent isn't in iter_changes, then it isn't needed in apply_delta IIUIC
(14:47:22) igc: lifeless: but it *is* looking at parent right now
(14:47:27) lifeless: how?
(14:47:33) igc: it looks at the whole tree, then post-filters
(14:47:59) lifeless: ow
(14:48:17) lifeless: I'm really torn
(14:48:17) igc: and osutils.parent_directories doesn't need to return the root
(14:48:31) igc: at the os level, '' doesn't make sense
(14:48:52) lifeless: well, other code would be cleaner if it does - as you are making sure you have [''] always
(14:49:05) igc: also, my change to when record_iter_changes was deliberate ...
(14:49:15) igc: and the comment still hold exactly as is btw
(14:49:26) igc: I benchmarked to decide when it was the right thing fwiw
(14:51:24) lifeless: why the (len(self.parents) < 2 and not self.specific_files and not self.exclude)
(14:51:27) lifeless: clause?
(14:51:46) igc: because for non-chk formats, ...
(14:52:21) igc: it's slower than the current code otherwise
(14:52:29) igc: at least on Emacs
(14:53:17) igc: I'm happy to extend the comment along those lines
(14:53:42) lifeless: So this means that the self.specific_files and self.exclude processing with record_iter_changes is _slower_ than the old full-tree code path
(14:54:02) lifeless: and that its only faster with record_iter_changes on chk repositoreis because the chk code is able to compensate
(14:54:13) lifeless: I appreciate that you've got it going faster
(14:54:17) igc: yes, according to my measurements
(14:54:47) lifeless: I'm really worried that the other changes, such as the reinstance of files_across_trees are going to make it harder for someone to come bacj and fix bug 347649 properly
(14:54:48) igc: it was consistently slower - 08. vs 04 or something like that
(14:54:48) ubottu: Launchpad bug 347649 in bzr "iter_changes missing support needed for commit" [High,Confirmed] https://launchpad.net/bugs/347649
(14:54:56) lifeless: as they will have to basically undo this patch to fix it
(14:55:28) lifeless: on the other hand I don't want to block performance improvements
(14:55:35) igc: lifeless: I did look at trying to remove the file_across_trees bit but ...
(14:55:44) igc: it broke the test suite cos ...
(14:55:45) lifeless: do you have any suggestion about how we can do both things?
(14:55:56) igc: we trap for bogus include paths
(14:56:06) igc: and iter_changes doesn't see those
(14:56:21) lifeless: file_across_trees is a problem because it upcassts the dirstate to an inventory - its spectcularly slow
(14:56:44) lifeless: emacs is what 20K files?
(14:56:59) igc: just checking, we're talking about the _set_specific_file_ids() method yes?
(14:57:06) lifeless: yes
(14:57:25) igc: emacs is 8K from memory
(14:57:31) igc: 100K revisions but ...
(14:57:38) igc: that doesn't matter myuch here
(...

Read more...

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'NEWS'
--- NEWS 2009-06-15 15:20:24 +0000
+++ NEWS 2009-06-16 02:38:08 +0000
@@ -291,6 +291,9 @@
291291
292* ``bzr rm *`` is now as fast as ``bzr rm * --keep``. (Johan Walles, #180116)292* ``bzr rm *`` is now as fast as ``bzr rm * --keep``. (Johan Walles, #180116)
293293
294* Selective commit performance on ``development6-rich-root`` is now
295 better than it is on ``1.9`` format. (Ian Clatworthy)
296
294Bug Fixes297Bug Fixes
295*********298*********
296299
297300
=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2009-06-15 19:04:38 +0000
+++ bzrlib/commit.py 2009-06-16 02:38:08 +0000
@@ -283,12 +283,14 @@
283 # We can use record_iter_changes IFF iter_changes is compatible with283 # We can use record_iter_changes IFF iter_changes is compatible with
284 # the command line parameters, and the repository has fast delta284 # the command line parameters, and the repository has fast delta
285 # generation. See bug 347649.285 # generation. See bug 347649.
286 self.use_record_iter_changes = (286 if self.branch.repository._format.supports_tree_reference:
287 not self.specific_files and287 # TODO: fix this
288 not self.exclude and 288 self.use_record_iter_changes = False
289 not self.branch.repository._format.supports_tree_reference and289 else:
290 (self.branch.repository._format.fast_deltas or290 self.use_record_iter_changes = (
291 len(self.parents) < 2))291 self.branch.repository._format.fast_deltas
292 or (len(self.parents) < 2 and not self.specific_files
293 and not self.exclude))
292 self.pb = bzrlib.ui.ui_factory.nested_progress_bar()294 self.pb = bzrlib.ui.ui_factory.nested_progress_bar()
293 self.basis_revid = self.work_tree.last_revision()295 self.basis_revid = self.work_tree.last_revision()
294 self.basis_tree = self.work_tree.basis_tree()296 self.basis_tree = self.work_tree.basis_tree()
@@ -576,10 +578,10 @@
576 except Exception, e:578 except Exception, e:
577 found_exception = e579 found_exception = e
578 if found_exception is not None:580 if found_exception is not None:
579 # don't do a plan raise, because the last exception may have been581 # don't do a plain raise, because the last exception may have been
580 # trashed, e is our sure-to-work exception even though it loses the582 # trashed, e is our sure-to-work exception even though it loses the
581 # full traceback. XXX: RBC 20060421 perhaps we could check the583 # full traceback. XXX: RBC 20060421 perhaps we could check the
582 # exc_info and if its the same one do a plain raise otherwise584 # exc_info and if it's the same one do a plain raise otherwise
583 # 'raise e' as we do now.585 # 'raise e' as we do now.
584 raise e586 raise e
585587
@@ -614,13 +616,16 @@
614 def _update_builder_with_changes(self):616 def _update_builder_with_changes(self):
615 """Update the commit builder with the data about what has changed.617 """Update the commit builder with the data about what has changed.
616 """618 """
617 exclude = self.exclude619 mutter("Selecting files for commit with filter %s excluding %s",
618 specific_files = self.specific_files or []620 self.specific_files, self.exclude)
619 mutter("Selecting files for commit with filter %s", specific_files)
620621
621 self._check_strict()622 self._check_strict()
622 if self.use_record_iter_changes:623 if self.use_record_iter_changes:
623 iter_changes = self.work_tree.iter_changes(self.basis_tree)624 iter_changes = self.work_tree.iter_changes(self.basis_tree)
625 if self.specific_files or self.exclude:
626 iter_changes = tree.filter_iter_changes_by_paths(iter_changes,
627 self.specific_files, self.exclude,
628 yield_changed_parents=True)
624 iter_changes = self._filter_iter_changes(iter_changes)629 iter_changes = self._filter_iter_changes(iter_changes)
625 for file_id, path, fs_hash in self.builder.record_iter_changes(630 for file_id, path, fs_hash in self.builder.record_iter_changes(
626 self.work_tree, self.basis_revid, iter_changes):631 self.work_tree, self.basis_revid, iter_changes):
@@ -636,7 +641,7 @@
636641
637 This method reports on the changes in iter_changes to the user, and 642 This method reports on the changes in iter_changes to the user, and
638 converts 'missing' entries in the iter_changes iterator to 'deleted'643 converts 'missing' entries in the iter_changes iterator to 'deleted'
639 entries. 'missing' entries have their644 entries.
640645
641 :param iter_changes: An iter_changes to process.646 :param iter_changes: An iter_changes to process.
642 :return: A generator of changes.647 :return: A generator of changes.
@@ -648,7 +653,6 @@
648 if report_changes:653 if report_changes:
649 old_path = change[1][0]654 old_path = change[1][0]
650 new_path = change[1][1]655 new_path = change[1][1]
651 versioned = change[3][1]
652 kind = change[6][1]656 kind = change[6][1]
653 versioned = change[3][1]657 versioned = change[3][1]
654 if kind is None and versioned:658 if kind is None and versioned:
@@ -937,17 +941,16 @@
937941
938 def _set_specific_file_ids(self):942 def _set_specific_file_ids(self):
939 """populate self.specific_file_ids if we will use it."""943 """populate self.specific_file_ids if we will use it."""
940 if not self.use_record_iter_changes:944 # If provided, ensure the specified files are versioned
941 # If provided, ensure the specified files are versioned945 if self.specific_files is not None:
942 if self.specific_files is not None:946 # Note: This routine is being called because it raises
943 # Note: This routine is being called because it raises947 # PathNotVersionedError as a side effect of finding the IDs. We
944 # PathNotVersionedError as a side effect of finding the IDs. We948 # later use the ids we found as input to the working tree
945 # later use the ids we found as input to the working tree949 # inventory iterator, so we only consider those ids rather than
946 # inventory iterator, so we only consider those ids rather than950 # examining the whole tree again.
947 # examining the whole tree again.951 # XXX: Dont we have filter_unversioned to do this more
948 # XXX: Dont we have filter_unversioned to do this more952 # cheaply?
949 # cheaply?953 self.specific_file_ids = tree.find_ids_across_trees(
950 self.specific_file_ids = tree.find_ids_across_trees(954 self.specific_files, [self.basis_tree, self.work_tree])
951 self.specific_files, [self.basis_tree, self.work_tree])955 else:
952 else:956 self.specific_file_ids = None
953 self.specific_file_ids = None
954957
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2009-06-10 06:25:56 +0000
+++ bzrlib/osutils.py 2009-06-16 02:38:08 +0000
@@ -868,6 +868,7 @@
868 return pathjoin(*p)868 return pathjoin(*p)
869869
870870
871<<<<<<< TREE
871def parent_directories(filename):872def parent_directories(filename):
872 """Return the list of parent directories, deepest first.873 """Return the list of parent directories, deepest first.
873 874
@@ -881,6 +882,18 @@
881 return parents882 return parents
882883
883884
885=======
886def parent_directories(filename):
887 """Return the list of parent directories, deepest first."""
888 parents = []
889 parts = splitpath(dirname(filename))
890 while parts:
891 parents.append(joinpath(parts))
892 parts.pop()
893 return parents
894
895
896>>>>>>> MERGE-SOURCE
884try:897try:
885 from bzrlib._chunks_to_lines_pyx import chunks_to_lines898 from bzrlib._chunks_to_lines_pyx import chunks_to_lines
886except ImportError:899except ImportError:
887900
=== modified file 'bzrlib/repository.py'
--- bzrlib/repository.py 2009-06-12 01:11:00 +0000
+++ bzrlib/repository.py 2009-06-16 02:38:08 +0000
@@ -563,7 +563,7 @@
563 :param iter_changes: An iter_changes iterator with the changes to apply563 :param iter_changes: An iter_changes iterator with the changes to apply
564 to basis_revision_id. The iterator must not include any items with564 to basis_revision_id. The iterator must not include any items with
565 a current kind of None - missing items must be either filtered out565 a current kind of None - missing items must be either filtered out
566 or errored-on beefore record_iter_changes sees the item.566 or errored-on before record_iter_changes sees the item.
567 :param _entry_factory: Private method to bind entry_factory locally for567 :param _entry_factory: Private method to bind entry_factory locally for
568 performance.568 performance.
569 :return: A generator of (file_id, relpath, fs_hash) tuples for use with569 :return: A generator of (file_id, relpath, fs_hash) tuples for use with
570570
=== modified file 'bzrlib/tests/test_osutils.py'
--- bzrlib/tests/test_osutils.py 2009-06-10 03:56:49 +0000
+++ bzrlib/tests/test_osutils.py 2009-06-16 02:38:09 +0000
@@ -860,6 +860,7 @@
860 self.assertRaises(errors.BzrError, osutils.splitpath, 'a/../b')860 self.assertRaises(errors.BzrError, osutils.splitpath, 'a/../b')
861861
862862
863<<<<<<< TREE
863class TestParentDirectories(tests.TestCaseInTempDir):864class TestParentDirectories(tests.TestCaseInTempDir):
864 """Test osutils.parent_directories()"""865 """Test osutils.parent_directories()"""
865866
@@ -869,6 +870,20 @@
869 self.assertEqual(['a/b', 'a'], osutils.parent_directories('a/b/c'))870 self.assertEqual(['a/b', 'a'], osutils.parent_directories('a/b/c'))
870871
871872
873=======
874class TestParentDirectories(tests.TestCaseInTempDir):
875 """Test osutils.parent_directories()"""
876
877 def test_parent_directories(self):
878 def check(expected, path):
879 self.assertEqual(expected, osutils.parent_directories(path))
880
881 check([], 'a')
882 check(['a'], 'a/b')
883 check(['a/b', 'a'], 'a/b/c')
884
885
886>>>>>>> MERGE-SOURCE
872class TestMacFuncsDirs(tests.TestCaseInTempDir):887class TestMacFuncsDirs(tests.TestCaseInTempDir):
873 """Test mac special functions that require directories."""888 """Test mac special functions that require directories."""
874889
875890
=== modified file 'bzrlib/tests/test_tree.py'
--- bzrlib/tests/test_tree.py 2009-03-23 14:59:43 +0000
+++ bzrlib/tests/test_tree.py 2009-06-16 02:38:09 +0000
@@ -23,7 +23,7 @@
23 tree as _mod_tree,23 tree as _mod_tree,
24 )24 )
25from bzrlib.tests import TestCaseWithTransport25from bzrlib.tests import TestCaseWithTransport
26from bzrlib.tree import InterTree26from bzrlib.tree import InterTree, filter_iter_changes_by_paths
2727
2828
29class TestInterTree(TestCaseWithTransport):29class TestInterTree(TestCaseWithTransport):
@@ -417,3 +417,84 @@
417 self.assertPathToKey(([u''], u'a'), u'a')417 self.assertPathToKey(([u''], u'a'), u'a')
418 self.assertPathToKey(([u'a'], u'b'), u'a/b')418 self.assertPathToKey(([u'a'], u'b'), u'a/b')
419 self.assertPathToKey(([u'a', u'b'], u'c'), u'a/b/c')419 self.assertPathToKey(([u'a', u'b'], u'c'), u'a/b/c')
420
421
422class TestFilterIterChangesByPaths(TestCaseWithTransport):
423 """Tests for tree.filter_iter_changes_by_paths()."""
424
425 def make_changes_iter(self):
426 wt = self.make_branch_and_tree('.')
427 b = wt.branch
428 self.build_tree(['foo/', 'foo/foo1', 'bar/', 'bar/bar1', 'bar/bar2',
429 'baz'])
430 wt.add(['foo', 'foo/foo1', 'bar', 'bar/bar1', 'bar/bar2', 'baz'],
431 ['foo-id', 'foo1-id', 'bar-id', 'bar1-id', 'bar2-id', 'baz-id'])
432 wt.commit('bar/bar1', specific_files=['bar/bar1'], rev_id='1')
433 basis = wt.basis_tree()
434 wt.lock_read()
435 self.addCleanup(wt.unlock)
436 basis.lock_read()
437 self.addCleanup(basis.unlock)
438 return wt.iter_changes(basis)
439
440 def test_no_filtering(self):
441 original = self.make_changes_iter()
442 original_list = list(original)
443 filtered = filter_iter_changes_by_paths(iter(original_list))
444 self.assertEqual(original_list, list(filtered))
445
446 def test_include_files(self):
447 original = self.make_changes_iter()
448 filtered = filter_iter_changes_by_paths(original,
449 include_files=['bar'])
450 self.assertEqual([
451 ('bar2-id', (None, u'bar/bar2'), True, (False, True),
452 (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
453 ], list(filtered))
454
455 def test_include_files_yielding_changed_parents(self):
456 original = self.make_changes_iter()
457 filtered = filter_iter_changes_by_paths(original,
458 include_files=['bar'], yield_changed_parents=True)
459 self.assertEqual([
460 ('bar2-id', (None, u'bar/bar2'), True, (False, True),
461 (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
462 ], list(filtered))
463
464 def test_include_files_yielding_changed_parents2(self):
465 original = self.make_changes_iter()
466 filtered = filter_iter_changes_by_paths(original,
467 include_files=['foo'], yield_changed_parents=True)
468 self.assertEqual([
469 ('foo-id', (None, u'foo'), True, (False, True),
470 (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)),
471 ('foo1-id', (None, u'foo/foo1'), True, (False, True),
472 (None, 'foo-id'), (None, u'foo1'), (None, 'file'), (None, 0)),
473 ], list(filtered))
474
475 def test_exclude_files(self):
476 original = self.make_changes_iter()
477 filtered = filter_iter_changes_by_paths(original,
478 exclude_files=['foo'])
479 self.assertEqual([
480 ('baz-id', (None, u'baz'), True, (False, True),
481 (None, 'TREE_ROOT'), (None, u'baz'), (None, 'file'), (None, 0)),
482 ('bar2-id', (None, u'bar/bar2'), True, (False, True),
483 (None, 'bar-id'), (None, u'bar2'), (None, 'file'), (None, 0)),
484 ], list(filtered))
485 pass
486
487 def test_excludes_override_includes(self):
488 original = self.make_changes_iter()
489 filtered = filter_iter_changes_by_paths(original,
490 include_files=['bar/bar2'], exclude_files=['bar'])
491 self.assertEqual([], list(filtered))
492
493 def test_excludes_override_includes2(self):
494 original = self.make_changes_iter()
495 filtered = filter_iter_changes_by_paths(original,
496 include_files=['foo'], exclude_files=['foo/foo1'])
497 self.assertEqual([
498 ('foo-id', (None, u'foo'), True, (False, True),
499 (None, 'TREE_ROOT'), (None, u'foo'), (None, 'directory'), (None, 0)),
500 ], list(filtered))
420501
=== modified file 'bzrlib/tree.py'
--- bzrlib/tree.py 2009-06-10 03:56:49 +0000
+++ bzrlib/tree.py 2009-06-16 02:38:09 +0000
@@ -37,7 +37,7 @@
37from bzrlib import errors37from bzrlib import errors
38from bzrlib.inventory import Inventory, InventoryFile38from bzrlib.inventory import Inventory, InventoryFile
39from bzrlib.inter import InterObject39from bzrlib.inter import InterObject
40from bzrlib.osutils import fingerprint_file40from bzrlib.osutils import fingerprint_file, is_inside_any
41import bzrlib.revision41import bzrlib.revision
42from bzrlib.symbol_versioning import deprecated_function, deprecated_in42from bzrlib.symbol_versioning import deprecated_function, deprecated_in
43from bzrlib.trace import mutter, note43from bzrlib.trace import mutter, note
@@ -1293,3 +1293,59 @@
1293 other_values.append(self._lookup_by_file_id(1293 other_values.append(self._lookup_by_file_id(
1294 alt_extra, alt_tree, file_id))1294 alt_extra, alt_tree, file_id))
1295 yield other_path, file_id, None, other_values1295 yield other_path, file_id, None, other_values
1296
1297
1298def filter_iter_changes_by_paths(changes, include_files=None,
1299 exclude_files=None, yield_changed_parents=False):
1300 """Filter the results from iter_changes using paths.
1301
1302 This decorator is useful for post-processing the output from
1303 Tree.iter_changes(). It may also be used for post-procesing the output
1304 from result-compatible methods such as Inventory.iter_changes() and
1305 PreviewTree.iter_changes(). Note that specific-path filtering should not
1306 have already been applied.
1307
1308 :param changes: the iterator of changes to filter
1309 :param include_files: paths of files and directories to include or None
1310 for no masking.
1311 :param exclude_files: paths of files and directories to exclude or None
1312 for no masking. Excludes take precedence over includes.
1313 :param yield_changed_parents: if True, include changed parent directories
1314 of included paths.
1315 """
1316 if not (include_files or exclude_files):
1317 for change in changes:
1318 yield change
1319
1320 # Find the sets of files to include, additional parents and excludes,
1321 # always including the root in the parents
1322 includes_parents = set([''])
1323 if include_files:
1324 include_set = osutils.minimum_path_selection(include_files)
1325 if yield_changed_parents:
1326 for include in include_set:
1327 for parent in osutils.parent_directories(include):
1328 if parent not in include_set:
1329 includes_parents.add(parent)
1330 else:
1331 include_set = set([''])
1332 if exclude_files:
1333 exclude_set = osutils.minimum_path_selection(exclude_files)
1334 else:
1335 exclude_set = set([])
1336
1337 for change in changes:
1338 # Decide which path to use
1339 old_path, new_path = change[1]
1340 if new_path is None:
1341 path = old_path
1342 else:
1343 path = new_path
1344
1345 # Do the filtering
1346 if exclude_set and is_inside_any(exclude_set, path):
1347 continue
1348 elif is_inside_any(include_set, path):
1349 yield change
1350 elif yield_changed_parents and path in includes_parents:
1351 yield change