Merge lp:~ian-clatworthy/bzr/eol-update-bug into lp:bzr/2.0

Proposed by Ian Clatworthy
Status: Merged
Merged at revision: not available
Proposed branch: lp:~ian-clatworthy/bzr/eol-update-bug
Merge into: lp:bzr/2.0
Diff against target: 325 lines
To merge this branch: bzr merge lp:~ian-clatworthy/bzr/eol-update-bug
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Review via email: mp+10959@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

This patch fixes content filtering so that it works after most operations, not just after the initial branch/checkout. There were 2 code paths in transform.py that needed enhancing to check for content filters: one used by 'merging' and one used by 'reverting'. The merging code path is used by lots of operations including merge, pull, update and switch.

I've added several 'blackbox-style' tests as part of this patch. It probably wouldn't hurt to add some more for other operations, e.g. shelve/unshelve, though I'm optimistic that the 2 code paths above will cover 99% of them.

In any case, I'm submitting this for review now because:

1. Without these code changes, content filtering is effectively broken for most users.

2. The 2.0 code freeze is really close and I want to maximise the time available for review of what's fixed/tested already.

3. Those tests could come in a latter patch (after there's agreement on whether blackbox-style tests are acceptable here or not).

BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit. I'm going to work on that issue now, either by adding a post-commit hook to the keywords plugin or by changing commit. The latter approach implies always paying a performance cost whether it's required or not so I'll try the post-commit approach first. Either way, I feel it's best to keep that issue out of scope of this patch.

Revision history for this message
Robert Collins (lifeless) wrote :

>> BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit.

Commit does not write to the users tree : a post commit hook isn't appropriate [for starters, other post commit hooks could interfere/see convenience forms themselves etc]. I suggest looking closely for the cause/some possible confusion is going on here.

Revision history for this message
Robert Collins (lifeless) wrote :

I'm trying to channel Aaron here, and I don't know the transform code *all that well*. However, what you've done looks wrong to me.

Specifically, you look up a path in one tree, and use that to select filters in a different tree (this is at the very top of your patch).

This has the problem that when a containing directory is renamed in this tree, it will be matching on a different path than actually will be used.

Secondly, as filters are defined as being path based, *every* change to a transform that can cause a change to the path of something that can be filtered needs to recalculate the filter stack and possibly reapply it, for all those somethings. E.g. renaming a directory foo/baz to bar/baz, and foo was filtered and bar isn't.

I think the change to the API to pass in a path that is able to be wrong reduces clarity and shouldn't be done. We can probably get by (with a critical bug to be fixed asap after the release) with out updating filters appropriately.

review: Needs Fixing
Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Robert Collins wrote:
> Review: Needs Fixing

Thanks for the feedback. poolie is taking over this patch from here so I
can focus on the Windows installer. I'll let him look deeper but some
quick comments on your first point ...

> Specifically, you look up a path in one tree, and use that to select filters in a different tree (this is at the very top of your patch).

The patch mightn't be quite right. I can say however that this cross
tree stuff is tricky. Keep in mind that:

* in the case of reverting a removed file, its path doesn't exist
  in the working tree so the only place to find that is the rev-tree

* rules aren't stored historically so, IIRC, revtrees don't know
  what they are - only the WT knows that.

So looking up a path in the revtree and using the filters from the WT
may indeed be correct, in the case of revert at least.

Merging has similar challenges. A new path may only exist in the OTHER
tree say but the filters applied need to come from the working tree
being merged into.

Ian C.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Robert Collins wrote:
>>> BTW, the bug report also mentions that after committing a tree with keywords expanded, the keywords aren't re-expanded after commit.
>
> Commit does not write to the users tree : a post commit hook isn't appropriate [for starters, other post commit hooks could interfere/see convenience forms themselves etc]. I suggest looking closely for the cause/some possible confusion is going on here.

Plugins like keywords do need to update the working tree, if any, after
a commit so that files just committed get keywords values updated. As
the keywords plugin is currently designed, the only files needing
potential update are the newly committed ones.

I agree a post commit hook isn't the answer. I've put a separate patch
up for a finish-commit hook on mutable trees instead.

Ian C.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-09-03 at 00:57 +0000, Ian Clatworthy wrote:
>
> > Commit does not write to the users tree : a post commit hook isn't
> appropriate [for starters, other post commit hooks could interfere/see
> convenience forms themselves etc]. I suggest looking closely for the
> cause/some possible confusion is going on here.
>
> Plugins like keywords do need to update the working tree, if any,
> after
> a commit so that files just committed get keywords values updated. As
> the keywords plugin is currently designed, the only files needing
> potential update are the newly committed ones.
>
> I agree a post commit hook isn't the answer. I've put a separate patch
> up for a finish-commit hook on mutable trees instead.

Oh, that use case wasn't clear - I didn't get what was needed.

Branch.post_commit definitely isn't the right thing as its on branch.

There may be room for MutableTree.post_commit too - the difference being
whether or not <activity> should occur before the basis revision id is
changed.

So, the question to ask is 'should keywords update the files *after* the
basis changes, or *before*. I think the answer is *after*, but you've
spent more time staring at this problem: its up to you.

I do suggest that the hook name try to reflect where in the process its
happening, and if its after commit has done its stuff, its really just a
post commit hook on a different object, so lets use the namespaces and
call it MT.post_commit

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2009-09-03 at 00:54 +0000, Ian Clatworthy wrote:
>
> The patch mightn't be quite right. I can say however that this cross
> tree stuff is tricky. Keep in mind that:
>
> * in the case of reverting a removed file, its path doesn't exist
> in the working tree so the only place to find that is the rev-tree

It has a path that it will end up with in the output tree; thats the one
that should be used, right? Paths in other trees are a poor proxy for
that.

> So looking up a path in the revtree and using the filters from the WT
> may indeed be correct, in the case of revert at least.

I argue that it would only be accurate in the most trivial of cases;
solving the primary problem - that a transform may need to reapply
filters some N times - will solve the other cases as well.

> Merging has similar challenges. A new path may only exist in the OTHER
> tree say but the filters applied need to come from the working tree
> being merged into.

Right, its exactly what I'm driving at.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

I don't think this is safe for 2.0 but it might be ok for 2.1b1

280 +def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
281 + filter_tree_path=None):
282 + """Create new file contents according to tree contents.
283 +
284 + :param filter_tree_path: the tree path to use to lookup
285 + content filters to apply to the bytes output in the working tree.
286 + This only applies if the working tree supports content filtering.
287 + """

I don't understand how that new parameter works:

Is it a single file? How can that work with tt working across a whole tree?

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

> I don't think this is safe for 2.0 but it might be ok for 2.1b1
>
> 280 +def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
> 281 + filter_tree_path=None):
> 282 + """Create new file contents according to tree contents.
> 283 +
> 284 + :param filter_tree_path: the tree path to use to lookup
> 285 + content filters to apply to the bytes output in the working tree.
> 286 + This only applies if the working tree supports content filtering.
> 287 + """
>
> I don't understand how that new parameter works:
>
> Is it a single file? How can that work with tt working across a whole tree?

This API is called per file-id, it's not called per tree.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-08-31 00:25:33 +0000
3+++ NEWS 2009-09-01 06:35:12 +0000
4@@ -20,6 +20,9 @@
5 revisions that are in the fallback repository. (Regressed in 2.0rc1).
6 (John Arbash Meinel, #419241)
7
8+* Content filters are now applied correctly after pull, merge, switch
9+ and revert. (Ian Clatworthy, #385879)
10+
11 * Fix a segmentation fault when computing the ``merge_sort`` of a graph
12 that has a ghost in the mainline ancestry.
13 (John Arbash Meinel, #419241)
14
15=== modified file 'bzrlib/merge.py'
16--- bzrlib/merge.py 2009-08-26 03:20:32 +0000
17+++ bzrlib/merge.py 2009-09-01 06:35:12 +0000
18@@ -1160,8 +1160,15 @@
19 self.tt.delete_contents(trans_id)
20 if file_id in self.other_tree:
21 # OTHER changed the file
22+ wt = self.this_tree
23+ if wt.supports_content_filtering():
24+ filter_tree_path = self.other_tree.id2path(file_id)
25+ else:
26+ # Skip the id2path lookup for older formats
27+ filter_tree_path = None
28 create_from_tree(self.tt, trans_id,
29- self.other_tree, file_id)
30+ self.other_tree, file_id,
31+ filter_tree_path=filter_tree_path)
32 if not file_in_this:
33 self.tt.version_file(file_id, trans_id)
34 return "modified"
35@@ -1254,12 +1261,26 @@
36 ('THIS', self.this_tree, this_lines)]
37 if not no_base:
38 data.append(('BASE', self.base_tree, base_lines))
39+
40+ # We need to use the actual path in the working tree of the file here,
41+ # ignoring the conflict suffixes
42+ wt = self.this_tree
43+ if wt.supports_content_filtering():
44+ try:
45+ filter_tree_path = wt.id2path(file_id)
46+ except errors.NoSuchId:
47+ # file has been deleted
48+ filter_tree_path = None
49+ else:
50+ # Skip the id2path lookup for older formats
51+ filter_tree_path = None
52+
53 versioned = False
54 file_group = []
55 for suffix, tree, lines in data:
56 if file_id in tree:
57 trans_id = self._conflict_file(name, parent_id, tree, file_id,
58- suffix, lines)
59+ suffix, lines, filter_tree_path)
60 file_group.append(trans_id)
61 if set_version and not versioned:
62 self.tt.version_file(file_id, trans_id)
63@@ -1267,11 +1288,12 @@
64 return file_group
65
66 def _conflict_file(self, name, parent_id, tree, file_id, suffix,
67- lines=None):
68+ lines=None, filter_tree_path=None):
69 """Emit a single conflict file."""
70 name = name + '.' + suffix
71 trans_id = self.tt.create_path(name, parent_id)
72- create_from_tree(self.tt, trans_id, tree, file_id, lines)
73+ create_from_tree(self.tt, trans_id, tree, file_id, lines,
74+ filter_tree_path)
75 return trans_id
76
77 def merge_executable(self, file_id, file_status):
78
79=== modified file 'bzrlib/tests/per_workingtree/test_content_filters.py'
80--- bzrlib/tests/per_workingtree/test_content_filters.py 2009-08-25 04:43:21 +0000
81+++ bzrlib/tests/per_workingtree/test_content_filters.py 2009-09-01 06:35:12 +0000
82@@ -16,7 +16,11 @@
83
84 """Tests for content filtering conformance"""
85
86+import os
87+
88+from bzrlib.bzrdir import BzrDir
89 from bzrlib.filters import ContentFilter
90+from bzrlib.switch import switch
91 from bzrlib.workingtree import WorkingTree
92 from bzrlib.tests.per_workingtree import TestCaseWithWorkingTree
93
94@@ -64,7 +68,8 @@
95
96 class TestWorkingTreeWithContentFilters(TestCaseWithWorkingTree):
97
98- def create_cf_tree(self, txt_reader, txt_writer, dir='.'):
99+ def create_cf_tree(self, txt_reader, txt_writer, dir='.',
100+ two_revisions=False):
101 tree = self.make_branch_and_tree(dir)
102 def _content_filter_stack(path=None, file_id=None):
103 if path.endswith('.txt'):
104@@ -77,10 +82,37 @@
105 (dir + '/file2.bin', 'Foo Bin')])
106 tree.add(['file1.txt', 'file2.bin'])
107 tree.commit('commit raw content')
108+ # Commit another revision with changed text, if requested
109+ if two_revisions:
110+ self.build_tree_contents([(dir + '/file1.txt', 'Foo ROCKS!')])
111+ tree.commit("changed file1.txt")
112 txt_fileid = tree.path2id('file1.txt')
113 bin_fileid = tree.path2id('file2.bin')
114 return tree, txt_fileid, bin_fileid
115
116+ def patch_in_content_filter(self):
117+ # Patch in a custom, symmetric content filter stack
118+ self.real_content_filter_stack = WorkingTree._content_filter_stack
119+ def restore_real_content_filter_stack():
120+ WorkingTree._content_filter_stack = self.real_content_filter_stack
121+ self.addCleanup(restore_real_content_filter_stack)
122+ def _content_filter_stack(tree, path=None, file_id=None):
123+ if path.endswith('.txt'):
124+ return [ContentFilter(_swapcase, _swapcase)]
125+ else:
126+ return []
127+ WorkingTree._content_filter_stack = _content_filter_stack
128+
129+ def assert_basis_content(self, expected_content, branch, file_id):
130+ # Note: We need to use try/finally here instead of addCleanup()
131+ # as the latter leaves the read lock in place too long
132+ basis = branch.basis_tree()
133+ basis.lock_read()
134+ try:
135+ self.assertEqual(expected_content, basis.get_file_text(file_id))
136+ finally:
137+ basis.unlock()
138+
139 def test_symmetric_content_filtering(self):
140 # test handling when read then write gives back the initial content
141 tree, txt_fileid, bin_fileid = self.create_cf_tree(
142@@ -132,10 +164,7 @@
143 if not source.supports_content_filtering():
144 return
145 self.assertFileEqual("Foo Txt", 'source/file1.txt')
146- basis = source.basis_tree()
147- basis.lock_read()
148- self.addCleanup(basis.unlock)
149- self.assertEqual("FOO TXT", basis.get_file_text(txt_fileid))
150+ self.assert_basis_content("FOO TXT", source, txt_fileid)
151
152 # Now branch it
153 self.run_bzr('branch source target')
154@@ -153,24 +182,10 @@
155 if not source.supports_content_filtering():
156 return
157 self.assertFileEqual("Foo Txt", 'source/file1.txt')
158- basis = source.basis_tree()
159- basis.lock_read()
160- self.addCleanup(basis.unlock)
161- self.assertEqual("Foo Txt", basis.get_file_text(txt_fileid))
162-
163- # Patch in a custom, symmetric content filter stack
164- self.real_content_filter_stack = WorkingTree._content_filter_stack
165- def restore_real_content_filter_stack():
166- WorkingTree._content_filter_stack = self.real_content_filter_stack
167- self.addCleanup(restore_real_content_filter_stack)
168- def _content_filter_stack(tree, path=None, file_id=None):
169- if path.endswith('.txt'):
170- return [ContentFilter(_swapcase, _swapcase)]
171- else:
172- return []
173- WorkingTree._content_filter_stack = _content_filter_stack
174-
175- # Now branch it
176+ self.assert_basis_content("Foo Txt", source, txt_fileid)
177+
178+ # Now patch in content filtering and branch the source
179+ self.patch_in_content_filter()
180 self.run_bzr('branch source target')
181 target = WorkingTree.open('target')
182 # Even though the content in source and target are different
183@@ -209,3 +224,86 @@
184 # we could give back the length of the canonical form, but in general
185 # that will be expensive to compute, so it's acceptable to just return
186 # None.
187+
188+ def test_content_filtering_applied_on_pull(self):
189+ # Create a source branch with two revisions
190+ source, txt_fileid, bin_fileid = self.create_cf_tree(
191+ txt_reader=None, txt_writer=None, dir='source', two_revisions=True)
192+ if not source.supports_content_filtering():
193+ return
194+ self.assertFileEqual("Foo ROCKS!", 'source/file1.txt')
195+ self.assert_basis_content("Foo ROCKS!", source, txt_fileid)
196+
197+ # Now patch in content filtering and branch from revision 1
198+ self.patch_in_content_filter()
199+ self.run_bzr('branch -r1 source target')
200+ target = WorkingTree.open('target')
201+ self.assertFileEqual("fOO tXT", 'target/file1.txt')
202+ self.assert_basis_content("Foo Txt", target, txt_fileid)
203+
204+ # Pull the latter change and check the target tree is updated
205+ self.run_bzr('pull -d target')
206+ self.assertFileEqual("fOO rocks!", 'target/file1.txt')
207+ self.assert_basis_content("Foo ROCKS!", target, txt_fileid)
208+
209+ def test_content_filtering_applied_on_merge(self):
210+ # Create a source branch with two revisions
211+ source, txt_fileid, bin_fileid = self.create_cf_tree(
212+ txt_reader=None, txt_writer=None, dir='source', two_revisions=True)
213+ if not source.supports_content_filtering():
214+ return
215+ self.assertFileEqual("Foo ROCKS!", 'source/file1.txt')
216+ self.assert_basis_content("Foo ROCKS!", source, txt_fileid)
217+
218+ # Now patch in content filtering and branch from revision 1
219+ self.patch_in_content_filter()
220+ self.run_bzr('branch -r1 source target')
221+ target = WorkingTree.open('target')
222+ self.assertFileEqual("fOO tXT", 'target/file1.txt')
223+ self.assert_basis_content("Foo Txt", target, txt_fileid)
224+
225+ # Merge the latter change and check the target tree is updated
226+ self.run_bzr('merge -d target source')
227+ self.assertFileEqual("fOO rocks!", 'target/file1.txt')
228+
229+ # Commit the merge and check the right content is stored
230+ target.commit("merge file1.txt changes from source")
231+ self.assert_basis_content("Foo ROCKS!", target, txt_fileid)
232+
233+ def test_content_filtering_applied_on_switch(self):
234+ # Create a source branch with two revisions
235+ source, txt_fileid, bin_fileid = self.create_cf_tree(
236+ txt_reader=None, txt_writer=None, dir='branch-a', two_revisions=True)
237+ if not source.supports_content_filtering():
238+ return
239+
240+ # Now patch in content filtering and branch from revision 1
241+ self.patch_in_content_filter()
242+ self.run_bzr('branch -r1 branch-a branch-b')
243+
244+ # Now create a lightweight checkout referring to branch-b
245+ self.run_bzr('checkout --lightweight branch-b checkout')
246+ self.assertFileEqual("fOO tXT", 'checkout/file1.txt')
247+
248+ # Switch it to branch-b and check the tree is updated
249+ checkout_control_dir = BzrDir.open_containing('checkout')[0]
250+ switch(checkout_control_dir, source.branch)
251+ self.assertFileEqual("fOO rocks!", 'checkout/file1.txt')
252+
253+ def test_content_filtering_applied_on_revert(self):
254+ # Create a source branch with content filtering
255+ source, txt_fileid, bin_fileid = self.create_cf_tree(
256+ txt_reader=_uppercase, txt_writer=_lowercase, dir='source')
257+ if not source.supports_content_filtering():
258+ return
259+ self.assertFileEqual("Foo Txt", 'source/file1.txt')
260+ self.assert_basis_content("FOO TXT", source, txt_fileid)
261+
262+ # Now delete the file, revert it and check the content
263+ os.unlink('source/file1.txt')
264+ self.assertFalse(os.path.exists('source/file1.txt'))
265+ source.revert(['file1.txt'])
266+ self.assertTrue(os.path.exists('source/file1.txt'))
267+ # Note: we don't get back exactly what was in the tree
268+ # previously because lower(upper(text)) is a lossy transformation
269+ self.assertFileEqual("foo txt", 'source/file1.txt')
270
271=== modified file 'bzrlib/transform.py'
272--- bzrlib/transform.py 2009-08-26 05:38:16 +0000
273+++ bzrlib/transform.py 2009-09-01 06:35:12 +0000
274@@ -2402,8 +2402,14 @@
275 tt.create_directory(trans_id)
276
277
278-def create_from_tree(tt, trans_id, tree, file_id, bytes=None):
279- """Create new file contents according to tree contents."""
280+def create_from_tree(tt, trans_id, tree, file_id, bytes=None,
281+ filter_tree_path=None):
282+ """Create new file contents according to tree contents.
283+
284+ :param filter_tree_path: the tree path to use to lookup
285+ content filters to apply to the bytes output in the working tree.
286+ This only applies if the working tree supports content filtering.
287+ """
288 kind = tree.kind(file_id)
289 if kind == 'directory':
290 tt.create_directory(trans_id)
291@@ -2414,6 +2420,11 @@
292 bytes = tree_file.readlines()
293 finally:
294 tree_file.close()
295+ wt = tt._tree
296+ if wt.supports_content_filtering() and filter_tree_path is not None:
297+ filters = wt._content_filter_stack(filter_tree_path)
298+ bytes = filtered_output_bytes(bytes, filters,
299+ ContentFilterContext(filter_tree_path, tree))
300 tt.create_file(bytes, trans_id)
301 elif kind == "symlink":
302 tt.create_symlink(tree.get_symlink_target(file_id), trans_id)
303@@ -2610,9 +2621,19 @@
304 tt.adjust_path(name[1], parent_trans, trans_id)
305 if executable[0] != executable[1] and kind[1] == "file":
306 tt.set_executability(executable[1], trans_id)
307- for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
308- deferred_files):
309- tt.create_file(bytes, trans_id, mode_id)
310+ if working_tree.supports_content_filtering():
311+ for index, ((trans_id, mode_id), bytes) in enumerate(
312+ target_tree.iter_files_bytes(deferred_files)):
313+ file_id = deferred_files[index][0]
314+ filter_tree_path = target_tree.id2path(file_id)
315+ filters = working_tree._content_filter_stack(filter_tree_path)
316+ bytes = filtered_output_bytes(bytes, filters,
317+ ContentFilterContext(filter_tree_path, working_tree))
318+ tt.create_file(bytes, trans_id, mode_id)
319+ else:
320+ for (trans_id, mode_id), bytes in target_tree.iter_files_bytes(
321+ deferred_files):
322+ tt.create_file(bytes, trans_id, mode_id)
323 finally:
324 if basis_tree is not None:
325 basis_tree.unlock()

Subscribers

People subscribed via source and target branches