Bazaar

Merge lp:~ian-clatworthy/bzr/eol-update-bug into lp:bzr/2.0

eol-update-bug
Merge into 2.0

Proposed by Ian Clatworthy on 2009-09-01

Status:

Merged

Merged at revision:

not available

Proposed branch:

lp:~ian-clatworthy/bzr/eol-update-bug

Merge into:

lp:bzr/2.0

Diff against target:

325 lines

To merge this branch:

bzr merge lp:~ian-clatworthy/bzr/eol-update-bug

High

Fix Released

Reviewer	Review Type	Date Requested	Status
Robert Collins (community)		2009-09-01	Needs Fixing on 2009-09-02
Review via email: mp+10959@code.launchpad.net

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-09-01:

This patch fixes content filtering so that it works after most operations, not just after the initial branch/checkout. There were 2 code paths in transform.py that needed enhancing to check for content filters: one used by 'merging' and one used by 'reverting'. The merging code path is used by lots of operations including merge, pull, update and switch.
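As a rough illustration of what content filtering means here, the sketch below pairs a reader (working tree to repository) with a writer (repository to working tree). It is standalone Python, not bzrlib code: `ContentFilter`, `filtered_output`, `filtered_input` and `filter_stack_for` are simplified stand-ins written for this example, and the `.txt` rule is borrowed from the tests in this patch.

```python
# Standalone sketch; these names are illustrative stand-ins, not the bzrlib API.

class ContentFilter:
    """Pairs a reader (tree -> repository) with a writer (repository -> tree)."""

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer


def filtered_output(text, filters):
    """Convert canonical (stored) text to the convenience form for the tree."""
    for f in reversed(filters):
        if f.writer is not None:
            text = f.writer(text)
    return text


def filtered_input(text, filters):
    """Convert convenience (tree) text back to the canonical form."""
    for f in filters:
        if f.reader is not None:
            text = f.reader(text)
    return text


def filter_stack_for(path):
    """Path-based rule: only .txt files are filtered, as in the tests."""
    if path.endswith(".txt"):
        return [ContentFilter(str.swapcase, str.swapcase)]
    return []


stored = "Foo ROCKS!"
in_tree = filtered_output(stored, filter_stack_for("file1.txt"))
print(in_tree)  # fOO rocks!
```

The point of the patch is that the writer side has to run whenever an operation writes file contents into the working tree, not only on the initial branch or checkout.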

I've added several 'blackbox-style' tests as part of this patch. It probably wouldn't hurt to add some more for other operations, e.g. shelve/unshelve, though I'm optimistic that the 2 code paths above will cover 99% of them.

In any case, I'm submitting this for review now because:

1. Without these code changes, content filtering is effectively broken for most users.

2. The 2.0 code freeze is really close and I want to maximise the time available for review of what's fixed/tested already.

3. Those tests could come in a later patch (after there's agreement on whether blackbox-style tests are acceptable here or not).

BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit. I'm going to work on that issue now, either by adding a post-commit hook to the keywords plugin or by changing commit. The latter approach implies always paying a performance cost whether it's required or not so I'll try the post-commit approach first. Either way, I feel it's best to keep that issue out of scope of this patch.

Revision history for this message

Robert Collins (lifeless) wrote on 2009-09-02:

>> BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit.

Commit does not write to the user's tree: a post-commit hook isn't appropriate [for starters, other post-commit hooks could interfere, see convenience forms themselves, etc.]. I suggest looking closely for the cause; some confusion may be going on here.

Revision history for this message

Robert Collins (lifeless) wrote on 2009-09-02:

I'm trying to channel Aaron here, and I don't know the transform code *all that well*. However, what you've done looks wrong to me.

Specifically, you look up a path in one tree, and use that to select filters in a different tree (this is at the very top of your patch).

This has the problem that when a containing directory is renamed in this tree, it will be matching on a different path than actually will be used.

Secondly, as filters are defined as being path based, *every* change to a transform that can change the path of something filterable needs to recalculate the filter stack, and possibly reapply it, for everything affected. E.g. renaming foo/baz to bar/baz when foo is filtered and bar isn't.
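The rename concern can be shown with a toy rule lookup. This is only a sketch under assumed rules: `filter_names_for`, `filters_after_rename` and the "eol" filter name are hypothetical and stand in for whatever path-based rules the working tree actually has.

```python
# Hypothetical path-based rules: files under foo/ are filtered, others are not.
def filter_names_for(path):
    """Return the names of the filters that apply to a tree path."""
    if path.startswith("foo/"):
        return ["eol"]
    return []


def filters_after_rename(old_path, new_path):
    """Compare the stack chosen at the old path with the one needed at the new."""
    chosen = filter_names_for(old_path)
    needed = filter_names_for(new_path)
    return chosen, needed, chosen != needed


chosen, needed, stale = filters_after_rename("foo/baz", "bar/baz")
print(chosen, needed, stale)  # ['eol'] [] True
```

A stack selected from the pre-rename path is stale as soon as the rename crosses a rule boundary, which is the case the review is worried about.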

I think changing the API to accept a path that can be wrong reduces clarity and shouldn't be done. We can probably get by (with a critical bug to be fixed asap after the release) without updating filters appropriately.

review: Needs Fixing

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-09-03:

Robert Collins wrote:
> Review: Needs Fixing

Thanks for the feedback. poolie is taking over this patch from here so I
can focus on the Windows installer. I'll let him look deeper but some
quick comments on your first point ...

> Specifically, you look up a path in one tree, and use that to select filters in a different tree (this is at the very top of your patch).

The patch mightn't be quite right. I can say however that this cross
tree stuff is tricky. Keep in mind that:

* in the case of reverting a removed file, its path doesn't exist
in the working tree so the only place to find that is the rev-tree

* rules aren't stored historically so, IIRC, revtrees don't know
what they are - only the WT knows that.

So looking up a path in the revtree and using the filters from the WT
may indeed be correct, in the case of revert at least.

Merging has similar challenges. A new path may only exist in the OTHER
tree say but the filters applied need to come from the working tree
being merged into.
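The revert case described above might be sketched as follows. The two classes are toy stand-ins invented for this example (they are not bzrlib's `RevisionTree` or `WorkingTree`), and `writer_for` is a hypothetical rule lookup; the only point is that the path comes from one tree and the rules from another.

```python
# Toy stand-ins for a revision tree and a working tree; not bzrlib classes.
class ToyRevisionTree:
    def __init__(self, paths_by_id):
        self._paths = dict(paths_by_id)

    def id2path(self, file_id):
        return self._paths[file_id]


class ToyWorkingTree:
    def __init__(self, paths_by_id, rules):
        self._paths = dict(paths_by_id)
        self._rules = rules  # maps a filename suffix to a writer function

    def has_id(self, file_id):
        return file_id in self._paths

    def writer_for(self, path):
        for suffix, writer in self._rules.items():
            if path.endswith(suffix):
                return writer
        return None


def revert_text(file_id, stored_text, rev_tree, working_tree):
    """Pick the path from the revision tree, the filter from the working tree."""
    path = rev_tree.id2path(file_id)
    writer = working_tree.writer_for(path)
    return path, (writer(stored_text) if writer else stored_text)


rev_tree = ToyRevisionTree({"txt-id": "file1.txt"})
wt = ToyWorkingTree({}, {".txt": str.lower})  # file was removed from the tree
path, text = revert_text("txt-id", "FOO TXT", rev_tree, wt)
print(path, text)  # file1.txt foo txt
```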

Ian C.

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-09-03:

Robert Collins wrote:
>>> BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit.
>
> Commit does not write to the user's tree: a post-commit hook isn't appropriate [for starters, other post-commit hooks could interfere, see convenience forms themselves, etc.]. I suggest looking closely for the cause; some confusion may be going on here.

Plugins like keywords do need to update the working tree, if any, after
a commit so that files just committed get keywords values updated. As
the keywords plugin is currently designed, the only files needing
potential update are the newly committed ones.

I agree a post commit hook isn't the answer. I've put a separate patch
up for a finish-commit hook on mutable trees instead.

Ian C.

Revision history for this message

Robert Collins (lifeless) wrote on 2009-09-03:

On Thu, 2009-09-03 at 00:57 +0000, Ian Clatworthy wrote:
>
> > Commit does not write to the user's tree: a post-commit hook isn't
> appropriate [for starters, other post-commit hooks could interfere, see
> convenience forms themselves, etc.]. I suggest looking closely for the
> cause; some confusion may be going on here.
>
> Plugins like keywords do need to update the working tree, if any,
> after
> a commit so that files just committed get keywords values updated. As
> the keywords plugin is currently designed, the only files needing
> potential update are the newly committed ones.
>
> I agree a post commit hook isn't the answer. I've put a separate patch
> up for a finish-commit hook on mutable trees instead.

Oh, that use case wasn't clear - I didn't get what was needed.

Branch.post_commit definitely isn't the right thing as it's on the branch.

There may be room for MutableTree.post_commit too - the difference being
whether or not <activity> should occur before the basis revision id is
changed.

So, the question to ask is 'should keywords update the files *after* the
basis changes, or *before*?' I think the answer is *after*, but you've
spent more time staring at this problem: it's up to you.

I do suggest that the hook name try to reflect where in the process it's
happening, and if it's after commit has done its stuff, it's really just a
post commit hook on a different object, so let's use the namespaces and
call it MT.post_commit
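The before/after distinction can be made concrete with a toy tree. Everything here is hypothetical: `ToyMutableTree` and the hook point name `before_basis_change` are invented for the sketch, and `post_commit` simply follows the naming suggested above rather than any existing bzrlib hook.

```python
# Toy model of the ordering question; the hook names are hypothetical.
class ToyMutableTree:
    def __init__(self):
        self.basis_revision = "rev-1"
        self.hooks = {"before_basis_change": [], "post_commit": []}
        self.log = []

    def commit(self, new_revision):
        for hook in self.hooks["before_basis_change"]:
            hook(self)  # runs while the basis is still the old revision
        self.basis_revision = new_revision
        for hook in self.hooks["post_commit"]:
            hook(self)  # runs once the basis points at the new revision


def record_basis(label):
    """Build a hook that records which basis revision it observed."""
    def hook(tree):
        tree.log.append((label, tree.basis_revision))
    return hook


tree = ToyMutableTree()
tree.hooks["before_basis_change"].append(record_basis("before"))
tree.hooks["post_commit"].append(record_basis("after"))
tree.commit("rev-2")
print(tree.log)  # [('before', 'rev-1'), ('after', 'rev-2')]
```

A keywords-style plugin that wants to expand the newly committed revision's details would need the second hook point, since only there does the basis refer to the new revision.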

-Rob

Revision history for this message

Robert Collins (lifeless) wrote on 2009-09-03:

On Thu, 2009-09-03 at 00:54 +0000, Ian Clatworthy wrote:
>
> The patch mightn't be quite right. I can say however that this cross
> tree stuff is tricky. Keep in mind that:
>
> * in the case of reverting a removed file, its path doesn't exist
> in the working tree so the only place to find that is the rev-tree

It has a path that it will end up with in the output tree; that's the one
that should be used, right? Paths in other trees are a poor proxy for
that.

> So looking up a path in the revtree and using the filters from the WT
> may indeed be correct, in the case of revert at least.

I argue that it would only be accurate in the most trivial of cases;
solving the primary problem - that a transform may need to reapply
filters some N times - will solve the other cases as well.
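One way to read this suggestion is to defer filter selection until the final output path is known. The sketch below is a toy, not the bzrlib `TreeTransform`: `ToyTransform`, its methods and `filters_for_path` are all hypothetical, and the rule that files under foo/ are upper-cased is made up for the example.

```python
# Toy transform: filters are resolved from the final path at apply time.
class ToyTransform:
    def __init__(self, filters_for_path):
        self._filters_for_path = filters_for_path
        self._pending = {}  # trans_id -> [path, stored_text]

    def new_file(self, trans_id, path, stored_text):
        self._pending[trans_id] = [path, stored_text]

    def rename_directory(self, old_dir, new_dir):
        prefix = old_dir + "/"
        for entry in self._pending.values():
            if entry[0].startswith(prefix):
                entry[0] = new_dir + "/" + entry[0][len(prefix):]

    def apply(self):
        """Return {final_path: text} with filters chosen by the final path."""
        result = {}
        for path, text in self._pending.values():
            for writer in self._filters_for_path(path):
                text = writer(text)
            result[path] = text
        return result


def filters_for_path(path):
    # Hypothetical rule: only files under foo/ are upper-cased on output.
    return [str.upper] if path.startswith("foo/") else []


tt = ToyTransform(filters_for_path)
tt.new_file("new-0", "foo/baz.txt", "hello")
tt.rename_directory("foo", "bar")
print(tt.apply())  # {'bar/baz.txt': 'hello'}
```

Because nothing is filtered until `apply()`, any number of path changes beforehand cost nothing and cannot leave a stale filter stack behind.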

> Merging has similar challenges. A new path may only exist in the OTHER
> tree say but the filters applied need to come from the working tree
> being merged into.

Right, it's exactly what I'm driving at.

-Rob

Revision history for this message

Martin Pool (mbp) wrote on 2009-10-08:

I don't think this is safe for 2.0 but it might be ok for 2.1b1

+def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
+    filter_tree_path=None):
+    """Create new file contents according to tree contents.
+
+    :param filter_tree_path: the tree path to use to lookup
+      content filters to apply to the bytes output in the working tree.
+      This only applies if the working tree supports content filtering.
+    """

I don't understand how that new parameter works:

Is it a single file? How can that work with tt working across a whole tree?

Revision history for this message

Ian Clatworthy (ian-clatworthy) wrote on 2009-10-27:

> I don't think this is safe for 2.0 but it might be ok for 2.1b1
>
> +def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
> +    filter_tree_path=None):
> +    """Create new file contents according to tree contents.
> +
> +    :param filter_tree_path: the tree path to use to lookup
> +      content filters to apply to the bytes output in the working tree.
> +      This only applies if the working tree supports content filtering.
> +    """
>
> I don't understand how that new parameter works:
>
> Is it a single file? How can that work with tt working across a whole tree?

This API is called per file-id, it's not called per tree.
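To make the per-file-id calling pattern concrete, here is a small sketch. `toy_create_from_tree` only mirrors the shape of the patched function; it and `writer_for` are hypothetical stand-ins, and the file ids and contents are taken from the test fixtures in this patch.

```python
# Toy version of a per-file helper; it mirrors the patch's shape, not bzrlib.
def toy_create_from_tree(output, trans_id, tree_texts, file_id,
                         filter_tree_path=None, writer_for=None):
    """Create one file's contents; called once per file id."""
    text = tree_texts[file_id]
    if filter_tree_path is not None and writer_for is not None:
        writer = writer_for(filter_tree_path)
        if writer is not None:
            text = writer(text)
    output[trans_id] = text


def writer_for(path):
    # Hypothetical rule: swap case for .txt files only.
    return str.swapcase if path.endswith(".txt") else None


tree_texts = {"txt-id": "Foo Txt", "bin-id": "Foo Bin"}
paths = {"txt-id": "file1.txt", "bin-id": "file2.bin"}
output = {}
for n, file_id in enumerate(sorted(tree_texts)):
    # Each call carries the path for exactly one file.
    toy_create_from_tree(output, "new-%d" % n, tree_texts, file_id,
                         filter_tree_path=paths[file_id],
                         writer_for=writer_for)
print(output)  # {'new-0': 'Foo Bin', 'new-1': 'fOO tXT'}
```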

Preview Diff

 === modified file 'NEWS'
 --- NEWS	2009-08-31 00:25:33 +0000
 +++ NEWS	2009-09-01 06:35:12 +0000
@@ -20,6 +20,9 @@
    revisions that are in the fallback repository. (Regressed in 2.0rc1).
    (John Arbash Meinel, #419241)
++* Content filters are now applied correctly after pull, merge, switch
++  and revert. (Ian Clatworthy, #385879)
++
  * Fix a segmentation fault when computing the ``merge_sort`` of a graph
    that has a ghost in the mainline ancestry.
    (John Arbash Meinel, #419241)
 === modified file 'bzrlib/merge.py'
 --- bzrlib/merge.py	2009-08-26 03:20:32 +0000
 +++ bzrlib/merge.py	2009-09-01 06:35:12 +0000
@@ -1160,8 +1160,15 @@
                  self.tt.delete_contents(trans_id)
              if file_id in self.other_tree:
                  # OTHER changed the file
++                wt = self.this_tree
++                if wt.supports_content_filtering():
++                    filter_tree_path = self.other_tree.id2path(file_id)
++                else:
++                    # Skip the id2path lookup for older formats
++                    filter_tree_path = None
                  create_from_tree(self.tt, trans_id,
--                                 self.other_tree, file_id)
++                                 self.other_tree, file_id,
++                                 filter_tree_path=filter_tree_path)
                  if not file_in_this:
                      self.tt.version_file(file_id, trans_id)
                  return "modified"
@@ -1254,12 +1261,26 @@
                  ('THIS', self.this_tree, this_lines)]
          if not no_base:
              data.append(('BASE', self.base_tree, base_lines))
++
++        # We need to use the actual path in the working tree of the file here,
++        # ignoring the conflict suffixes
++        wt = self.this_tree
++        if wt.supports_content_filtering():
++            try:
++                filter_tree_path = wt.id2path(file_id)
++            except errors.NoSuchId:
++                # file has been deleted
++                filter_tree_path = None
++        else:
++            # Skip the id2path lookup for older formats
++            filter_tree_path = None
++
          versioned = False
          file_group = []
          for suffix, tree, lines in data:
              if file_id in tree:
                  trans_id = self._conflict_file(name, parent_id, tree, file_id,
--                                               suffix, lines)
++                                               suffix, lines, filter_tree_path)
                  file_group.append(trans_id)
                  if set_version and not versioned:
                      self.tt.version_file(file_id, trans_id)
@@ -1267,11 +1288,12 @@
          return file_group
      def _conflict_file(self, name, parent_id, tree, file_id, suffix,
--                       lines=None):
++                       lines=None, filter_tree_path=None):
          """Emit a single conflict file."""
          name = name + '.' + suffix
          trans_id = self.tt.create_path(name, parent_id)
--        create_from_tree(self.tt, trans_id, tree, file_id, lines)
++        create_from_tree(self.tt, trans_id, tree, file_id, lines,
++            filter_tree_path)
          return trans_id
      def merge_executable(self, file_id, file_status):
 === modified file 'bzrlib/tests/per_workingtree/test_content_filters.py'
 --- bzrlib/tests/per_workingtree/test_content_filters.py	2009-08-25 04:43:21 +0000
 +++ bzrlib/tests/per_workingtree/test_content_filters.py	2009-09-01 06:35:12 +0000
@@ -16,7 +16,11 @@
  """Tests for content filtering conformance"""
++import os
++
++from bzrlib.bzrdir import BzrDir
  from bzrlib.filters import ContentFilter
++from bzrlib.switch import switch
  from bzrlib.workingtree import WorkingTree
  from bzrlib.tests.per_workingtree import TestCaseWithWorkingTree
@@ -64,7 +68,8 @@
  class TestWorkingTreeWithContentFilters(TestCaseWithWorkingTree):
--    def create_cf_tree(self, txt_reader, txt_writer, dir='.'):
++    def create_cf_tree(self, txt_reader, txt_writer, dir='.',
++        two_revisions=False):
          tree = self.make_branch_and_tree(dir)
          def _content_filter_stack(path=None, file_id=None):
              if path.endswith('.txt'):
@@ -77,10 +82,37 @@
              (dir + '/file2.bin', 'Foo Bin')])
          tree.add(['file1.txt', 'file2.bin'])
          tree.commit('commit raw content')
++        # Commit another revision with changed text, if requested
++        if two_revisions:
++            self.build_tree_contents([(dir + '/file1.txt', 'Foo ROCKS!')])
++            tree.commit("changed file1.txt")
          txt_fileid = tree.path2id('file1.txt')
          bin_fileid = tree.path2id('file2.bin')
          return tree, txt_fileid, bin_fileid
++    def patch_in_content_filter(self):
++        # Patch in a custom, symmetric content filter stack
++        self.real_content_filter_stack = WorkingTree._content_filter_stack
++        def restore_real_content_filter_stack():
++            WorkingTree._content_filter_stack = self.real_content_filter_stack
++        self.addCleanup(restore_real_content_filter_stack)
++        def _content_filter_stack(tree, path=None, file_id=None):
++            if path.endswith('.txt'):
++                return [ContentFilter(_swapcase, _swapcase)]
++            else:
++                return []
++        WorkingTree._content_filter_stack = _content_filter_stack
++
++    def assert_basis_content(self, expected_content, branch, file_id):
++        # Note: We need to use try/finally here instead of addCleanup()
++        # as the latter leaves the read lock in place too long
++        basis = branch.basis_tree()
++        basis.lock_read()
++        try:
++            self.assertEqual(expected_content, basis.get_file_text(file_id))
++        finally:
++            basis.unlock()
++
      def test_symmetric_content_filtering(self):
          # test handling when read then write gives back the initial content
          tree, txt_fileid, bin_fileid = self.create_cf_tree(
@@ -132,10 +164,7 @@
          if not source.supports_content_filtering():
              return
          self.assertFileEqual("Foo Txt", 'source/file1.txt')
--        basis = source.basis_tree()
--        basis.lock_read()
--        self.addCleanup(basis.unlock)
--        self.assertEqual("FOO TXT", basis.get_file_text(txt_fileid))
++        self.assert_basis_content("FOO TXT", source, txt_fileid)
          # Now branch it
          self.run_bzr('branch source target')
@@ -153,24 +182,10 @@
          if not source.supports_content_filtering():
              return
          self.assertFileEqual("Foo Txt", 'source/file1.txt')
--        basis = source.basis_tree()
--        basis.lock_read()
--        self.addCleanup(basis.unlock)
--        self.assertEqual("Foo Txt", basis.get_file_text(txt_fileid))
--
--        # Patch in a custom, symmetric content filter stack
--        self.real_content_filter_stack = WorkingTree._content_filter_stack
--        def restore_real_content_filter_stack():
--            WorkingTree._content_filter_stack = self.real_content_filter_stack
--        self.addCleanup(restore_real_content_filter_stack)
--        def _content_filter_stack(tree, path=None, file_id=None):
--            if path.endswith('.txt'):
--                return [ContentFilter(_swapcase, _swapcase)]
--            else:
--                return []
--        WorkingTree._content_filter_stack = _content_filter_stack
--
--        # Now branch it
++        self.assert_basis_content("Foo Txt", source, txt_fileid)
++
++        # Now patch in content filtering and branch the source
++        self.patch_in_content_filter()
          self.run_bzr('branch source target')
          target = WorkingTree.open('target')
          # Even though the content in source and target are different
@@ -209,3 +224,86 @@
          # we could give back the length of the canonical form, but in general
          # that will be expensive to compute, so it's acceptable to just return
          # None.
++
++    def test_content_filtering_applied_on_pull(self):
++        # Create a source branch with two revisions
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=None, txt_writer=None, dir='source', two_revisions=True)
++        if not source.supports_content_filtering():
++            return
++        self.assertFileEqual("Foo ROCKS!", 'source/file1.txt')
++        self.assert_basis_content("Foo ROCKS!", source, txt_fileid)
++
++        # Now patch in content filtering and branch from revision 1
++        self.patch_in_content_filter()
++        self.run_bzr('branch -r1 source target')
++        target = WorkingTree.open('target')
++        self.assertFileEqual("fOO tXT", 'target/file1.txt')
++        self.assert_basis_content("Foo Txt", target, txt_fileid)
++
++        # Pull the later change and check the target tree is updated
++        self.run_bzr('pull -d target')
++        self.assertFileEqual("fOO rocks!", 'target/file1.txt')
++        self.assert_basis_content("Foo ROCKS!", target, txt_fileid)
++
++    def test_content_filtering_applied_on_merge(self):
++        # Create a source branch with two revisions
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=None, txt_writer=None, dir='source', two_revisions=True)
++        if not source.supports_content_filtering():
++            return
++        self.assertFileEqual("Foo ROCKS!", 'source/file1.txt')
++        self.assert_basis_content("Foo ROCKS!", source, txt_fileid)
++
++        # Now patch in content filtering and branch from revision 1
++        self.patch_in_content_filter()
++        self.run_bzr('branch -r1 source target')
++        target = WorkingTree.open('target')
++        self.assertFileEqual("fOO tXT", 'target/file1.txt')
++        self.assert_basis_content("Foo Txt", target, txt_fileid)
++
++        # Merge the later change and check the target tree is updated
++        self.run_bzr('merge -d target source')
++        self.assertFileEqual("fOO rocks!", 'target/file1.txt')
++
++        # Commit the merge and check the right content is stored
++        target.commit("merge file1.txt changes from source")
++        self.assert_basis_content("Foo ROCKS!", target, txt_fileid)
++
++    def test_content_filtering_applied_on_switch(self):
++        # Create a source branch with two revisions
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=None, txt_writer=None, dir='branch-a', two_revisions=True)
++        if not source.supports_content_filtering():
++            return
++
++        # Now patch in content filtering and branch from revision 1
++        self.patch_in_content_filter()
++        self.run_bzr('branch -r1 branch-a branch-b')
++
++        # Now create a lightweight checkout referring to branch-b
++        self.run_bzr('checkout --lightweight branch-b checkout')
++        self.assertFileEqual("fOO tXT", 'checkout/file1.txt')
++
++        # Switch it to branch-a and check the tree is updated
++        checkout_control_dir = BzrDir.open_containing('checkout')[0]
++        switch(checkout_control_dir, source.branch)
++        self.assertFileEqual("fOO rocks!", 'checkout/file1.txt')
++
++    def test_content_filtering_applied_on_revert(self):
++        # Create a source branch with content filtering
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=_uppercase, txt_writer=_lowercase, dir='source')
++        if not source.supports_content_filtering():
++            return
++        self.assertFileEqual("Foo Txt", 'source/file1.txt')
++        self.assert_basis_content("FOO TXT", source, txt_fileid)
++
++        # Now delete the file, revert it and check the content
++        os.unlink('source/file1.txt')
++        self.assertFalse(os.path.exists('source/file1.txt'))
++        source.revert(['file1.txt'])
++        self.assertTrue(os.path.exists('source/file1.txt'))
++        # Note: we don't get back exactly what was in the tree
++        # previously because lower(upper(text)) is a lossy transformation
++        self.assertFileEqual("foo txt", 'source/file1.txt')
 === modified file 'bzrlib/transform.py'
 --- bzrlib/transform.py	2009-08-26 05:38:16 +0000
 +++ bzrlib/transform.py	2009-09-01 06:35:12 +0000
@@ -2402,8 +2402,14 @@
          tt.create_directory(trans_id)
--def create_from_tree(tt, trans_id, tree, file_id, bytes=None):
--    """Create new file contents according to tree contents."""
++def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
++    filter_tree_path=None):
++    """Create new file contents according to tree contents.
++
++    :param filter_tree_path: the tree path to use to lookup
++      content filters to apply to the bytes output in the working tree.
++      This only applies if the working tree supports content filtering.
++    """
      kind = tree.kind(file_id)
      if kind == 'directory':
          tt.create_directory(trans_id)
@@ -2414,6 +2420,11 @@
                  bytes = tree_file.readlines()
              finally:
                  tree_file.close()
++        wt = tt._tree
++        if wt.supports_content_filtering() and filter_tree_path is not None:
++            filters = wt._content_filter_stack(filter_tree_path)
++            bytes = filtered_output_bytes(bytes, filters,
++                ContentFilterContext(filter_tree_path, tree))
          tt.create_file(bytes, trans_id)
      elif kind == "symlink":
          tt.create_symlink(tree.get_symlink_target(file_id), trans_id)
@@ -2610,9 +2621,19 @@
                  tt.adjust_path(name[1], parent_trans, trans_id)
              if executable[0] != executable[1] and kind[1] == "file":
                  tt.set_executability(executable[1], trans_id)
--        for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
--            deferred_files):
--            tt.create_file(bytes, trans_id, mode_id)
++        if working_tree.supports_content_filtering():
++            for index, ((trans_id, mode_id), bytes) in enumerate(
++                target_tree.iter_files_bytes(deferred_files)):
++                file_id = deferred_files[index][0]
++                filter_tree_path = target_tree.id2path(file_id)
++                filters = working_tree._content_filter_stack(filter_tree_path)
++                bytes = filtered_output_bytes(bytes, filters,
++                    ContentFilterContext(filter_tree_path, working_tree))
++                tt.create_file(bytes, trans_id, mode_id)
++        else:
++            for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
++                deferred_files):
++                tt.create_file(bytes, trans_id, mode_id)
      finally:
          if basis_tree is not None:
              basis_tree.unlock()
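The revert test's note that lower(upper(text)) is lossy, in contrast to the symmetric swapcase filter used by the pull, merge and switch tests, can be checked with plain string methods. `round_trip` is a hypothetical helper written for this sketch; it is not part of the patch.

```python
def round_trip(text, reader, writer):
    """Store text via the reader, then write it back out via the writer."""
    return writer(reader(text))


lossy = round_trip("Foo Txt", str.upper, str.lower)
symmetric = round_trip("Foo Txt", str.swapcase, str.swapcase)
print(lossy)      # foo txt
print(symmetric)  # Foo Txt
```

This is why the revert test expects "foo txt" rather than the original "Foo Txt" after the deleted file is restored.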