Merge lp:~vila/bzr/880701-resolve-duplicate into lp:bzr/2.4

Proposed by Vincent Ladeuil
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6059
Proposed branch: lp:~vila/bzr/880701-resolve-duplicate
Merge into: lp:bzr/2.4
Diff against target: 280 lines (+107/-35)
5 files modified
bzrlib/errors.py (+1/-1)
bzrlib/merge.py (+73/-32)
bzrlib/tests/test_conflicts.py (+20/-1)
bzrlib/transform.py (+1/-1)
doc/en/release-notes/bzr-2.4.txt (+12/-0)
To merge this branch: bzr merge lp:~vila/bzr/880701-resolve-duplicate
Reviewer Review Type Date Requested Status
Martin Packman (community) Approve
Review via email: mp+80489@code.launchpad.net

Commit message

Generate a 'duplicate' conflict instead of a 'content' conflict when the same path is involved in both branches.

Description of the change

td;lr: When a file is replaced by one with a new file-id and a merge is
attempted from a branch where the original file has been modified (or
vice-versa) a content conflict was created instead of duplicate one.

Hmm, even the short version is long isn't it ?

$ bzr init trunk ; cd trunk
$ echo 'foo' > file
$ bzr add ; bzr commit -m 'add file'
$ bzr rm file ; bzr commit -m 'delete file'
$ echo 'bar' > file
$ bzr add ; bzr commit -m 'add file again'

$ cd ..; bzr branch -r1 trunk feature; cd feature
$ echo 'baz' > file
$ bzr commit -m 'modify file'

$ cd ../trunk
$ bzr merge ../feature

At this point, two conflicts may created:
- 'content conflict' because file has been modified/deleted
- 'duplicate conflict' because file got a new file-id

The 'duplicate' one is more appropriate as it better reflects the resulting
situation. There is little point bothering the user about the changes made
on the other side and deleted on her side, the file has been replaced anyway
so either the user keeps the changes or take the new file which are the
choices provided after a 'duplicate' conflict.

The fix looks like black magic because the 'duplicate' conflicts are
generated later (in bzrlib.transform.conflict_pass).

I'm reasonably confident that this fixes one more 'malformed transform' bug
and get us closer to adressing some 'parallel imports' use cases.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

My description was may be as obscure as the fix itself, so let's try again.

The merge process is roughly:

1 - establish an iter_changes() between THIS and OTHER,
2 - do merge_names/merge_content for each entry and build the corresponding tree transform,
3 - do _compute_transform, itself calling _finish_computing_transform which itself does:
3.1 - transform_resolve_conflicts
3.2 - cook_conflicts

All these steps (except for 1) create and transform the conflicts objects but some conflicts are handled only by some of them.

The 'content' conflicts are created during 2 but the 'duplicate' are created during 3. So basically either we:
- create the content conflicts and later replace them by duplicate conflicts
- avoid creating the content conflicts making sure the duplicate ones will be created later

I went with the later to avoid undoing the work done when creating a content conflict (helpers creation .THIS/.OTHER/.BASE, unversioning the THIS 'file', making 'file.OTHER' versioned, etc).

This way what happens is:

- 'file' is left as is in THIS
- 'file' (with a different filed-id) is merged from OTHER
- two 'file's want to be at the same path (i.e. a 'duplicate' conflict)
- the old 'file' becomes 'file.moved' and the one coming from OTHER keeps the 'file' name

I hope this helps better understand the fix.

Revision history for this message
Martin Packman (gz) wrote :

The basic aim to here sounds like the right thing, content conflicts for files that happen to share the same path after merging is never going to be useful. Some code style nits you can act on or not as you wish, but needs fixing for improving the release notes a little.

+ keep_this = False

An alternative to this flag and the check at the end would be to `return None` early inside these two blocks. This also avoids the need for the `inhibit_content_conflict` flag.

The remainder of the logic added here looks sensible to me, but it's hard to comprehend the exact implications.

+ def _get_filter_tree_path(self, file_id):

Now you've split this to a separate function, may as well return early.

+ filter_tree_path = wt.id2path(file_id)

    return wt.id2path(file_id)

+ except errors.NoSuchId:
+ filter_tree_path = self.other_tree.id2path(file_id)

    except errors.NoSuchId:
        return self.other_tree.id2path(file_id)

+ else:
+ # Skip the id2path lookup for older formats
+ filter_tree_path = None
+ return filter_tree_path

    return None

+ # File created with different file-ids but deleted on one side
+ (dict(_base_actions='create_file_a'),

I like functional programming languages, but the way these test_conflict cases are spelt with scenarios like this is... not very easy to read. Per the prevailing style though, they look about right.

In the release notes:

+* Fixed an infinite loop when creating a repo at the root of the filesystem,

Moving this is right, but:

+* 'Duplicate' conflicts are now created instead of 'Content' ones when a
+ merge tries to create two entries for the same path.
+ (Vincent Ladeuil, #880701)

'D' should come before 'F'?

Also, saying something about the what effect this fix has directly would be good, as this is a user-facing bug. So, mention the exception that will no longer be raised in which circumstances or similar.

General review note, fixing code formatting nits as you go when working on something like this is a good habit, but pulling them out to a separate, prerequisite if needed, branch would make life easier for diff readers.

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (4.2 KiB)

> The basic aim to here sounds like the right thing, content conflicts for files
> that happen to share the same path after merging is never going to be useful.
> Some code style nits you can act on or not as you wish, but needs fixing for
> improving the release notes a little.
>
>
> + keep_this = False
>
> An alternative to this flag and the check at the end would be to `return None`

return result you mean, but I see your point.

... crap !... it really returns None. That's bad :-( Yet another point to
address ...

Anyway, I don't really agree with returning early as this method is already
very hard to read and I'm afraid adding early returns, while locally making
it more understandable, will make it harder to grasp globally.

It's 120 lines long and really need a better refactoring that I don't want
to do now :-/

If you look at annotations, you'll see that postponing the
self.tt.delete_contents(trans_id) was introduced recently and it took me a
while to understand its true meaning and *this* proposal introducing an
exception for it makes me think that we'd better put it back in all relevant
parts instead of delaying it.

As it stands, I think it makes more sense to leave it this way so people
realize that its intent is to get rid of the content of the file *before*
the merge *because* it's obsoleted by the merge producing a different
content (I've clarified the associated comment).

Also, there is:

        if not self.this_tree.has_id(file_id) and result == "modified":
            self.tt.version_file(file_id, trans_id)

that is still needed (or may be not but the fact that I can't tell is
another indication we still have more work to do here).

> early inside these two blocks. This also avoids the need for the
> `inhibit_content_conflict` flag.
>
> The remainder of the logic added here looks sensible to me, but it's hard to
> comprehend the exact implications.

Yeah, I know :-/ The conflicts being handled in various parts under various
circumstances is part of the problem. I don't have a good answer for that
(yet) as it's strongly related to tree transform features.

>
>
> + def _get_filter_tree_path(self, file_id):
>
> Now you've split this to a separate function, may as well return early.

Fixed.

>
> + # File created with different file-ids but deleted on one side
> + (dict(_base_actions='create_file_a'),
>

> I like functional programming languages, but the way these test_conflict
> cases are spelt with scenarios like this is... not very easy to read. Per
> the prevailing style though, they look about right.

Yeah, this has been mentioned in the past.

These tests are... hard to describe.

An alternative would be to generate test scripts instead but I refrained
from doing so in the past as that turned them into blackbox tests which
couldn't be debugged easily. This is not true anymore.

Otherwise, their main interest is that they allow to really run 4 different
tests with a single description:
- merging from one branch into the other (and vice-versa)
- resolving in both directions (--take-this and --take-other)

They also encompass running two bzr commands (merge and resolve) which is
not en...

Read more...

Revision history for this message
Martin Packman (gz) wrote :

> > + keep_this = False
> >
> > An alternative to this flag and the check at the end would be to `return
> None`
>
> return result you mean, but I see your point.
>
> ... crap !... it really returns None. That's bad :-( Yet another point to
> address ...

Should is also be a string of some sort? Is None intended to indicate unhandled?

> Anyway, I don't really agree with returning early as this method is already
> very hard to read and I'm afraid adding early returns, while locally making
> it more understandable, will make it harder to grasp globally.
>
> It's 120 lines long and really need a better refactoring that I don't want
> to do now :-/

It's reasonable to avoid funny control flow in a function this large, the flags are at least quite obvious.

> Also, there is:
>
> if not self.this_tree.has_id(file_id) and result == "modified":
> self.tt.version_file(file_id, trans_id)
>
> that is still needed (or may be not but the fact that I can't tell is
> another indication we still have more work to do here).

That result being None thing for those two cases skips this currently, which is correct as versioning file_id already happens in your new code when it's required.

> > The remainder of the logic added here looks sensible to me, but it's hard to
> > comprehend the exact implications.
>
> Yeah, I know :-/ The conflicts being handled in various parts under various
> circumstances is part of the problem. I don't have a good answer for that
> (yet) as it's strongly related to tree transform features.

Ideally I think we want someone else to look at this too, but we are rather short of people currently.

> > Also, saying something about the what effect this fix has directly would be
> > good, as this is a user-facing bug. So, mention the exception that will no
> > longer be raised in which circumstances or similar.
>
> Done.

You confused me for a bit there by having two entries under bug fixes now, but the new wording is a definite improvement.

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

> > ... crap !... it really returns None. That's bad :-( Yet another point to
> > address ...
>
> Should is also be a string of some sort?

Probably.

> Is None intended to indicate unhandled?

I think it's a hole in the implementation. It doesn't really matter for now
but it's weird that it's not more related to whatever the hooks return.

>
> > Anyway, I don't really agree with returning early as this method is already
> > very hard to read and I'm afraid adding early returns, while locally making
> > it more understandable, will make it harder to grasp globally.
> >
> > It's 120 lines long and really need a better refactoring that I don't want
> > to do now :-/
>
> It's reasonable to avoid funny control flow in a function this large, the
> flags are at least quite obvious.

Good, that was the main intent even if I still think this method should be
split into more understandable parts.

>
> > Also, there is:
> >
> > if not self.this_tree.has_id(file_id) and result == "modified":
> > self.tt.version_file(file_id, trans_id)
> >
> > that is still needed (or may be not but the fact that I can't tell is
> > another indication we still have more work to do here).
>
> That result being None thing for those two cases skips this currently, which
> is correct as versioning file_id already happens in your new code when it's
> required.

Damn, you're right ! More evidence this method is hard to grasp.

> Ideally I think we want someone else to look at this too, but we are rather
> short of people currently.

As mentioned on IRC, I'm confident this introduces a special case for
'content conflicts' *only* and those were leading to a 'marlformed
transform'.

> You confused me for a bit there by having two entries under bug fixes now,
> but the new wording is a definite improvement.

Sorry. I should have put those two entries from the start since they address
the two issues mentioned in the bug report. Thanks for helping clarify them anyway.

I'm now waiting for 2.4.3 to open to land there and then in bzr.dev.

Revision history for this message
Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/errors.py'
2--- bzrlib/errors.py 2011-08-01 06:24:48 +0000
3+++ bzrlib/errors.py 2011-10-27 15:07:26 +0000
4@@ -1964,7 +1964,7 @@
5 self.prefix = prefix
6
7
8-class MalformedTransform(BzrError):
9+class MalformedTransform(InternalBzrError):
10
11 _fmt = "Tree transform is malformed %(conflicts)r"
12
13
14=== modified file 'bzrlib/merge.py'
15--- bzrlib/merge.py 2011-07-06 09:22:00 +0000
16+++ bzrlib/merge.py 2011-10-27 15:07:26 +0000
17@@ -591,11 +591,11 @@
18 else:
19 self.base_rev_id = self.revision_graph.find_unique_lca(
20 *lcas)
21- sorted_lca_keys = self.revision_graph.find_merge_order(
22+ sorted_lca_keys = self.revision_graph.find_merge_order(
23 revisions[0], lcas)
24 if self.base_rev_id == _mod_revision.NULL_REVISION:
25 self.base_rev_id = sorted_lca_keys[0]
26-
27+
28 if self.base_rev_id == _mod_revision.NULL_REVISION:
29 raise errors.UnrelatedBranches()
30 if self._is_criss_cross:
31@@ -604,7 +604,8 @@
32 trace.mutter('Criss-cross lcas: %r' % lcas)
33 if self.base_rev_id in lcas:
34 trace.mutter('Unable to find unique lca. '
35- 'Fallback %r as best option.' % self.base_rev_id)
36+ 'Fallback %r as best option.'
37+ % self.base_rev_id)
38 interesting_revision_ids = set(lcas)
39 interesting_revision_ids.add(self.base_rev_id)
40 interesting_trees = dict((t.get_revision_id(), t)
41@@ -689,7 +690,8 @@
42 continue
43 sub_merge = Merger(sub_tree.branch, this_tree=sub_tree)
44 sub_merge.merge_type = self.merge_type
45- other_branch = self.other_branch.reference_parent(file_id, relpath)
46+ other_branch = self.other_branch.reference_parent(file_id,
47+ relpath)
48 sub_merge.set_other_revision(other_revision, other_branch)
49 base_revision = self.base_tree.get_reference_revision(file_id)
50 sub_merge.base_tree = \
51@@ -1387,18 +1389,54 @@
52 if hook_status != 'not_applicable':
53 # Don't try any more hooks, this one applies.
54 break
55+ # If the merge ends up replacing the content of the file, we get rid of
56+ # it at the end of this method (this variable is used to track the
57+ # exceptions to this rule).
58+ keep_this = False
59 result = "modified"
60 if hook_status == 'not_applicable':
61- # This is a contents conflict, because none of the available
62- # functions could merge it.
63+ # No merge hook was able to resolve the situation. Two cases exist:
64+ # a content conflict or a duplicate one.
65 result = None
66 name = self.tt.final_name(trans_id)
67 parent_id = self.tt.final_parent(trans_id)
68- if self.this_tree.has_id(file_id):
69- self.tt.unversion_file(trans_id)
70- file_group = self._dump_conflicts(name, parent_id, file_id,
71- set_version=True)
72- self._raw_conflicts.append(('contents conflict', file_group))
73+ duplicate = False
74+ inhibit_content_conflict = False
75+ if params.this_kind is None: # file_id is not in THIS
76+ # Is the name used for a different file_id ?
77+ dupe_path = self.other_tree.id2path(file_id)
78+ this_id = self.this_tree.path2id(dupe_path)
79+ if this_id is not None:
80+ # Two entries for the same path
81+ keep_this = True
82+ # versioning the merged file will trigger a duplicate
83+ # conflict
84+ self.tt.version_file(file_id, trans_id)
85+ transform.create_from_tree(
86+ self.tt, trans_id, self.other_tree, file_id,
87+ filter_tree_path=self._get_filter_tree_path(file_id))
88+ inhibit_content_conflict = True
89+ elif params.other_kind is None: # file_id is not in OTHER
90+ # Is the name used for a different file_id ?
91+ dupe_path = self.this_tree.id2path(file_id)
92+ other_id = self.other_tree.path2id(dupe_path)
93+ if other_id is not None:
94+ # Two entries for the same path again, but here, the other
95+ # entry will also be merged. We simply inhibit the
96+ # 'content' conflict creation because we know OTHER will
97+ # create (or has already created depending on ordering) an
98+ # entry at the same path. This will trigger a 'duplicate'
99+ # conflict later.
100+ keep_this = True
101+ inhibit_content_conflict = True
102+ if not inhibit_content_conflict:
103+ if params.this_kind is not None:
104+ self.tt.unversion_file(trans_id)
105+ # This is a contents conflict, because none of the available
106+ # functions could merge it.
107+ file_group = self._dump_conflicts(name, parent_id, file_id,
108+ set_version=True)
109+ self._raw_conflicts.append(('contents conflict', file_group))
110 elif hook_status == 'success':
111 self.tt.create_file(lines, trans_id)
112 elif hook_status == 'conflicted':
113@@ -1420,36 +1458,23 @@
114 raise AssertionError('unknown hook_status: %r' % (hook_status,))
115 if not self.this_tree.has_id(file_id) and result == "modified":
116 self.tt.version_file(file_id, trans_id)
117- # The merge has been performed, so the old contents should not be
118- # retained.
119- self.tt.delete_contents(trans_id)
120+ if not keep_this:
121+ # The merge has been performed and produced a new content, so the
122+ # old contents should not be retained.
123+ self.tt.delete_contents(trans_id)
124 return result
125
126 def _default_other_winner_merge(self, merge_hook_params):
127 """Replace this contents with other."""
128 file_id = merge_hook_params.file_id
129 trans_id = merge_hook_params.trans_id
130- file_in_this = self.this_tree.has_id(file_id)
131 if self.other_tree.has_id(file_id):
132 # OTHER changed the file
133- wt = self.this_tree
134- if wt.supports_content_filtering():
135- # We get the path from the working tree if it exists.
136- # That fails though when OTHER is adding a file, so
137- # we fall back to the other tree to find the path if
138- # it doesn't exist locally.
139- try:
140- filter_tree_path = wt.id2path(file_id)
141- except errors.NoSuchId:
142- filter_tree_path = self.other_tree.id2path(file_id)
143- else:
144- # Skip the id2path lookup for older formats
145- filter_tree_path = None
146- transform.create_from_tree(self.tt, trans_id,
147- self.other_tree, file_id,
148- filter_tree_path=filter_tree_path)
149+ transform.create_from_tree(
150+ self.tt, trans_id, self.other_tree, file_id,
151+ filter_tree_path=self._get_filter_tree_path(file_id))
152 return 'done', None
153- elif file_in_this:
154+ elif self.this_tree.has_id(file_id):
155 # OTHER deleted the file
156 return 'delete', None
157 else:
158@@ -1529,6 +1554,20 @@
159 other_lines)
160 file_group.append(trans_id)
161
162+
163+ def _get_filter_tree_path(self, file_id):
164+ if self.this_tree.supports_content_filtering():
165+ # We get the path from the working tree if it exists.
166+ # That fails though when OTHER is adding a file, so
167+ # we fall back to the other tree to find the path if
168+ # it doesn't exist locally.
169+ try:
170+ return self.this_tree.id2path(file_id)
171+ except errors.NoSuchId:
172+ return self.other_tree.id2path(file_id)
173+ # Skip the id2path lookup for older formats
174+ return None
175+
176 def _dump_conflicts(self, name, parent_id, file_id, this_lines=None,
177 base_lines=None, other_lines=None, set_version=False,
178 no_base=False):
179@@ -1652,10 +1691,12 @@
180 for trans_id in conflict[1]:
181 file_id = self.tt.final_file_id(trans_id)
182 if file_id is not None:
183+ # Ok we found the relevant file-id
184 break
185 path = fp.get_path(trans_id)
186 for suffix in ('.BASE', '.THIS', '.OTHER'):
187 if path.endswith(suffix):
188+ # Here is the raw path
189 path = path[:-len(suffix)]
190 break
191 c = _mod_conflicts.Conflict.factory(conflict_type,
192
193=== modified file 'bzrlib/tests/test_conflicts.py'
194--- bzrlib/tests/test_conflicts.py 2011-07-07 10:20:59 +0000
195+++ bzrlib/tests/test_conflicts.py 2011-10-27 15:07:26 +0000
196@@ -677,6 +677,14 @@
197 ('fileb_created',
198 dict(actions='create_file_b', check='file_content_b',
199 path='file', file_id='file-b-id')),),
200+ # File created with different file-ids but deleted on one side
201+ (dict(_base_actions='create_file_a'),
202+ ('filea_replaced',
203+ dict(actions='replace_file_a_by_b', check='file_content_b',
204+ path='file', file_id='file-b-id')),
205+ ('filea_modified',
206+ dict(actions='modify_file_a', check='file_new_content',
207+ path='file', file_id='file-a-id')),),
208 ])
209
210 def do_nothing(self):
211@@ -694,6 +702,16 @@
212 def check_file_content_b(self):
213 self.assertFileEqual('file b content\n', 'branch/file')
214
215+ def do_replace_file_a_by_b(self):
216+ return [('unversion', 'file-a-id'),
217+ ('add', ('file', 'file-b-id', 'file', 'file b content\n'))]
218+
219+ def do_modify_file_a(self):
220+ return [('modify', ('file-a-id', 'new content\n'))]
221+
222+ def check_file_new_content(self):
223+ self.assertFileEqual('new content\n', 'branch/file')
224+
225 def _get_resolve_path_arg(self, wt, action):
226 return self._this['path']
227
228@@ -1043,7 +1061,8 @@
229 # This is nearly like TestResolveNonDirectoryParent but with branch and
230 # trunk switched. As such it should certainly produce the same
231 # conflict.
232- self.run_script("""
233+ self.assertRaises(errors.MalformedTransform,
234+ self.run_script,"""
235 $ bzr init trunk
236 ...
237 $ cd trunk
238
239=== modified file 'bzrlib/transform.py'
240--- bzrlib/transform.py 2011-07-06 20:52:00 +0000
241+++ bzrlib/transform.py 2011-10-27 15:07:26 +0000
242@@ -3068,7 +3068,7 @@
243 existing_file, new_file = conflict[2], conflict[1]
244 else:
245 existing_file, new_file = conflict[1], conflict[2]
246- new_name = tt.final_name(existing_file)+'.moved'
247+ new_name = tt.final_name(existing_file) + '.moved'
248 tt.adjust_path(new_name, final_parent, existing_file)
249 new_conflicts.add((c_type, 'Moved existing file to',
250 existing_file, new_file))
251
252=== modified file 'doc/en/release-notes/bzr-2.4.txt'
253--- doc/en/release-notes/bzr-2.4.txt 2011-10-27 14:16:10 +0000
254+++ doc/en/release-notes/bzr-2.4.txt 2011-10-27 15:07:26 +0000
255@@ -32,6 +32,17 @@
256 .. Fixes for situations where bzr would previously crash or give incorrect
257 or undesirable results.
258
259+* During merges, when two entries end up using the same path for two
260+ different file-ids (the same file being 'bzr added' in two different
261+ branches) , 'duplicate' conflicts are created instead of 'content'
262+ ones. This was previously leading to a 'Malformed tramsform' exception.
263+ (Vincent Ladeuil, #880701)
264+
265+* 'Malformed transform' exceptions are now recognized as internal errors
266+ instead of user errors and report a traceback. This will reduce user
267+ confusion as there is generally nothing users can do about them.
268+ (Vincent Ladeuil, #880701)
269+
270 Documentation
271 *************
272
273@@ -102,6 +113,7 @@
274 * Return early from create_delta_index_from_delta given tiny inputs. This
275 avoids raising a spurious MemoryError on certain platforms such as AIX.
276 (John Arbash Meinel, #856731)
277+
278
279 Documentation
280 *************

Subscribers

People subscribed via source and target branches