Merge lp:~gz/bzr/conflicts_non_ascii_ui_686161 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 5907
Proposed branch: lp:~gz/bzr/conflicts_non_ascii_ui_686161
Merge into: lp:bzr
Prerequisite: lp:~gz/bzr/trivial_refactor_conflict_stanza_tests
Diff against target: 256 lines (+83/-25)
8 files modified
bzrlib/conflicts.py (+4/-4)
bzrlib/merge.py (+1/-1)
bzrlib/status.py (+1/-1)
bzrlib/tests/blackbox/test_conflicts.py (+58/-14)
bzrlib/tests/test_conflicts.py (+13/-2)
bzrlib/tests/test_transform.py (+1/-1)
bzrlib/transform.py (+2/-2)
doc/en/release-notes/bzr-2.4.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/conflicts_non_ascii_ui_686161
Reviewer Review Type Date Requested Status
John A Meinel Approve
Review via email: mp+61763@code.launchpad.net

Commit message

Fix unicode handling in bzrlib.conflicts ui code

Description of the change

Conflict objects contain paths, which are unicode, but the conflicts ui calls str() on them which only works for ascii paths.

This branch adds tests that cover the Conflict stringifying code, and changes str() calls. Note, despite changing the __str__ method to __unicode__ this won't protect other callers from doing the wrong thing. Anyone currently using str() will continue to work in the ascii case and continue to fail in the non-ascii case due to the joys of Python string logic.

A particularly fun gotcha was the per-conflict tests actually passed regardless, because of this:

>>> class U(object):
... def __str__(self):
... return u"\xa7"
...
>>> unicode(U())
u'\xa7'
>>> str(U())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa7' in position 0: ordinal not in range(128)

To post a comment you must log in.
Revision history for this message
Martin Packman (gz) wrote :

Forgot to mention: TestUnicodePathsOnAsciiTerminal is currently skipped because the ui code doesn't decide what to do in this case yet apparently. That's really another issue so this fix shouldn't depend on that being resolved.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/20/2011 3:26 PM, Martin [gz] wrote:
> Martin [gz] has proposed merging lp:~gz/bzr/conflicts_non_ascii_ui_686161 into lp:bzr with lp:~gz/bzr/trivial_refactor_conflict_stanza_tests as a prerequisite.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #686161 in Bazaar: "conflicts involving unicode paths can't be displayed"
> https://bugs.launchpad.net/bzr/+bug/686161
>
> For more details, see:
> https://code.launchpad.net/~gz/bzr/conflicts_non_ascii_ui_686161/+merge/61763
>
> Conflict objects contain paths, which are unicode, but the conflicts ui calls str() on them which only works for ascii paths.
>
> This branch adds tests that cover the Conflict stringifying code, and changes str() calls. Note, despite changing the __str__ method to __unicode__ this won't protect other callers from doing the wrong thing. Anyone currently using str() will continue to work in the ascii case and continue to fail in the non-ascii case due to the joys of Python string logic.
>
> A particularly fun gotcha was the per-conflict tests actually passed regardless, because of this:
>
>>>> class U(object):
> ... def __str__(self):
> ... return u"\xa7"
> ...
>>>> unicode(U())
> u'\xa7'
>>>> str(U())
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa7' in position 0: ordinal not in range(128)
>

Looks fine to me.

 merge: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3WcnsACgkQJdeBCYSNAAOJ0wCguczgoaf5SdOLfFbLRNFVucBU
RR8AnRQy0plKnZBN0vnex9aune5M+HWH
=BhG6
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

My understanding of Python string method logic was wrong, with no __str__ method the __repr__ is used as a fallback instead. This is a good thing, it means callers can't accidentally hit UnicodeError if they do the wrong thing.

However, it broke a bunch of tests in other commands that print conflicts, which I missed grepping the source. This is because trace.warning and trace.note implicitly str() their first parameter, and it needs unicode() instead. The existing coverage looks pretty good here and as the tests broke with the repr output, I don't think I need to hunt around trying to inject non-ascii text into these various commands that print conflicts.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

Trying to finish this quickly to land on the last day was a bad idea. :)

I've gone back a few revisions and cleaned up some dumb mistakes and will try on PQM again, and repropose if it still runs into problems.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/conflicts.py'
2--- bzrlib/conflicts.py 2011-02-08 13:56:49 +0000
3+++ bzrlib/conflicts.py 2011-05-21 16:48:38 +0000
4@@ -72,7 +72,7 @@
5 continue
6 self.outf.write(conflict.path + '\n')
7 else:
8- self.outf.write(str(conflict) + '\n')
9+ self.outf.write(unicode(conflict) + '\n')
10
11
12 resolve_action_registry = registry.Registry()
13@@ -148,7 +148,7 @@
14 trace.note('%d conflict(s) auto-resolved.', len(resolved))
15 trace.note('Remaining conflicts:')
16 for conflict in un_resolved:
17- trace.note(conflict)
18+ trace.note(unicode(conflict))
19 return 1
20 else:
21 trace.note('All conflicts resolved.')
22@@ -291,7 +291,7 @@
23 def to_strings(self):
24 """Generate strings for the provided conflicts"""
25 for conflict in self:
26- yield str(conflict)
27+ yield unicode(conflict)
28
29 def remove_files(self, tree):
30 """Remove the THIS, BASE and OTHER files for listed conflicts"""
31@@ -390,7 +390,7 @@
32 def __ne__(self, other):
33 return not self.__eq__(other)
34
35- def __str__(self):
36+ def __unicode__(self):
37 return self.format % self.__dict__
38
39 def __repr__(self):
40
41=== modified file 'bzrlib/merge.py'
42--- bzrlib/merge.py 2011-05-18 16:42:48 +0000
43+++ bzrlib/merge.py 2011-05-21 16:48:38 +0000
44@@ -888,7 +888,7 @@
45 self.tt.iter_changes(), self.change_reporter)
46 self.cook_conflicts(fs_conflicts)
47 for conflict in self.cooked_conflicts:
48- trace.warning(conflict)
49+ trace.warning(unicode(conflict))
50
51 def _entries3(self):
52 """Gather data about files modified between three trees.
53
54=== modified file 'bzrlib/status.py'
55--- bzrlib/status.py 2011-03-30 11:50:40 +0000
56+++ bzrlib/status.py 2011-05-21 16:48:38 +0000
57@@ -194,7 +194,7 @@
58 prefix = 'C '
59 else:
60 prefix = ' '
61- to_file.write("%s %s\n" % (prefix, conflict))
62+ to_file.write("%s %s\n" % (prefix, unicode(conflict)))
63 # Show files that were requested but don't exist (and are
64 # not versioned). We don't involve delta in this; these
65 # paths are really the province of just the status
66
67=== modified file 'bzrlib/tests/blackbox/test_conflicts.py'
68--- bzrlib/tests/blackbox/test_conflicts.py 2010-11-07 16:32:51 +0000
69+++ bzrlib/tests/blackbox/test_conflicts.py 2011-05-21 16:48:38 +0000
70@@ -1,4 +1,4 @@
71-# Copyright (C) 2006, 2007, 2009, 2010 Canonical Ltd
72+# Copyright (C) 2006, 2007, 2009, 2010, 2011 Canonical Ltd
73 #
74 # This program is free software; you can redistribute it and/or modify
75 # it under the terms of the GNU General Public License as published by
76@@ -21,29 +21,31 @@
77 )
78 from bzrlib.tests import script
79
80-def make_tree_with_conflicts(test, this_path='this', other_path='other'):
81+
82+def make_tree_with_conflicts(test, this_path='this', other_path='other',
83+ prefix='my'):
84 this_tree = test.make_branch_and_tree(this_path)
85 test.build_tree_contents([
86- ('%s/myfile' % (this_path,), 'this content\n'),
87- ('%s/my_other_file' % (this_path,), 'this content\n'),
88- ('%s/mydir/' % (this_path,),),
89+ ('%s/%sfile' % (this_path, prefix), 'this content\n'),
90+ ('%s/%s_other_file' % (this_path, prefix), 'this content\n'),
91+ ('%s/%sdir/' % (this_path, prefix),),
92 ])
93- this_tree.add('myfile')
94- this_tree.add('my_other_file')
95- this_tree.add('mydir')
96+ this_tree.add(prefix+'file')
97+ this_tree.add(prefix+'_other_file')
98+ this_tree.add(prefix+'dir')
99 this_tree.commit(message="new")
100 other_tree = this_tree.bzrdir.sprout(other_path).open_workingtree()
101 test.build_tree_contents([
102- ('%s/myfile' % (other_path,), 'contentsb\n'),
103- ('%s/my_other_file' % (other_path,), 'contentsb\n'),
104+ ('%s/%sfile' % (other_path, prefix), 'contentsb\n'),
105+ ('%s/%s_other_file' % (other_path, prefix), 'contentsb\n'),
106 ])
107- other_tree.rename_one('mydir', 'mydir2')
108+ other_tree.rename_one(prefix+'dir', prefix+'dir2')
109 other_tree.commit(message="change")
110 test.build_tree_contents([
111- ('%s/myfile' % (this_path,), 'contentsa2\n'),
112- ('%s/my_other_file' % (this_path,), 'contentsa2\n'),
113+ ('%s/%sfile' % (this_path, prefix), 'contentsa2\n'),
114+ ('%s/%s_other_file' % (this_path, prefix), 'contentsa2\n'),
115 ])
116- this_tree.rename_one('mydir', 'mydir3')
117+ this_tree.rename_one(prefix+'dir', prefix+'dir3')
118 this_tree.commit(message='change')
119 this_tree.merge_from_branch(other_tree.branch)
120 return this_tree, other_tree
121@@ -79,3 +81,45 @@
122 Path conflict: mydir3 / mydir2
123 Text conflict in myfile
124 """)
125+
126+
127+class TestUnicodePaths(tests.TestCaseWithTransport):
128+ """Unicode characters in conflicts should be displayed properly"""
129+
130+ encoding = "UTF-8"
131+
132+ def _as_output(self, text):
133+ return text
134+
135+ def test_messages(self):
136+ """Conflict messages involving non-ascii paths are displayed okay"""
137+ make_tree_with_conflicts(self, "branch", prefix=u"\xA7")
138+ out, err = self.run_bzr(["conflicts", "-d", "branch"],
139+ encoding=self.encoding)
140+ self.assertEqual(out.decode(self.encoding),
141+ u"Text conflict in \xA7_other_file\n"
142+ u"Path conflict: \xA7dir3 / \xA7dir2\n"
143+ u"Text conflict in \xA7file\n")
144+ self.assertEqual(err, "")
145+
146+ def test_text_conflict_paths(self):
147+ """Text conflicts on non-ascii paths are displayed okay"""
148+ make_tree_with_conflicts(self, "branch", prefix=u"\xA7")
149+ out, err = self.run_bzr(["conflicts", "-d", "branch", "--text"],
150+ encoding=self.encoding)
151+ self.assertEqual(out.decode(self.encoding),
152+ u"\xA7_other_file\n"
153+ u"\xA7file\n")
154+ self.assertEqual(err, "")
155+
156+
157+class TestUnicodePathsOnAsciiTerminal(TestUnicodePaths):
158+ """Undisplayable unicode characters in conflicts should be escaped"""
159+
160+ encoding = "ascii"
161+
162+ def setUp(self):
163+ self.skip("Need to decide if replacing is the desired behaviour")
164+
165+ def _as_output(self, text):
166+ return text.encode(self.encoding, "replace")
167
168=== modified file 'bzrlib/tests/test_conflicts.py'
169--- bzrlib/tests/test_conflicts.py 2011-05-21 16:48:37 +0000
170+++ bzrlib/tests/test_conflicts.py 2011-05-21 16:48:38 +0000
171@@ -125,10 +125,17 @@
172 self.assertEqual(conflicts.ConflictList([]), tree.conflicts())
173
174
175-class TestConflictStanzas(tests.TestCase):
176+class TestPerConflict(tests.TestCase):
177
178 scenarios = scenarios.multiply_scenarios(vary_by_conflicts())
179
180+ def test_stringification(self):
181+ text = unicode(self.conflict)
182+ self.assertContainsString(text, self.conflict.path)
183+ self.assertContainsString(text.lower(), "conflict")
184+ self.assertContainsString(repr(self.conflict),
185+ self.conflict.__class__.__name__)
186+
187 def test_stanza_roundtrip(self):
188 p = self.conflict
189 o = conflicts.Conflict.factory(**p.as_stanza().as_dict())
190@@ -159,13 +166,17 @@
191 self.assertStartsWith(stanza['conflict_file_id'], u'\xeed')
192
193
194-class TestConflictListStanzas(tests.TestCase):
195+class TestConflictList(tests.TestCase):
196
197 def test_stanzas_roundtrip(self):
198 stanzas_iter = example_conflicts.to_stanzas()
199 processed = conflicts.ConflictList.from_stanzas(stanzas_iter)
200 self.assertEqual(example_conflicts, processed)
201
202+ def test_stringification(self):
203+ for text, o in zip(example_conflicts.to_strings(), example_conflicts):
204+ self.assertEqual(text, unicode(o))
205+
206
207 # FIXME: The shell-like tests should be converted to real whitebox tests... or
208 # moved to a blackbox module -- vila 20100205
209
210=== modified file 'bzrlib/tests/test_transform.py'
211--- bzrlib/tests/test_transform.py 2011-05-13 12:51:05 +0000
212+++ bzrlib/tests/test_transform.py 2011-05-21 16:48:38 +0000
213@@ -811,7 +811,7 @@
214 raw_conflicts = resolve_conflicts(tt)
215 cooked_conflicts = cook_conflicts(raw_conflicts, tt)
216 tt.finalize()
217- conflicts_s = [str(c) for c in cooked_conflicts]
218+ conflicts_s = [unicode(c) for c in cooked_conflicts]
219 self.assertEqual(len(cooked_conflicts), len(conflicts_s))
220 self.assertEqual(conflicts_s[0], 'Conflict adding file dorothy. '
221 'Moved existing file to '
222
223=== modified file 'bzrlib/transform.py'
224--- bzrlib/transform.py 2011-05-13 14:02:14 +0000
225+++ bzrlib/transform.py 2011-05-21 16:48:38 +0000
226@@ -2578,7 +2578,7 @@
227 precomputed_delta = None
228 conflicts = cook_conflicts(raw_conflicts, tt)
229 for conflict in conflicts:
230- trace.warning(conflict)
231+ trace.warning(unicode(conflict))
232 try:
233 wt.add_conflicts(conflicts)
234 except errors.UnsupportedOperation:
235@@ -2820,7 +2820,7 @@
236 unversioned_filter=working_tree.is_ignored)
237 delta.report_changes(tt.iter_changes(), change_reporter)
238 for conflict in conflicts:
239- trace.warning(conflict)
240+ trace.warning(unicode(conflict))
241 pp.next_phase()
242 tt.apply()
243 working_tree.set_merge_modified(merge_modified)
244
245=== modified file 'doc/en/release-notes/bzr-2.4.txt'
246--- doc/en/release-notes/bzr-2.4.txt 2011-05-20 13:28:35 +0000
247+++ doc/en/release-notes/bzr-2.4.txt 2011-05-21 16:48:38 +0000
248@@ -87,6 +87,9 @@
249 submit_location public_location`` never sets ``submit_branch``
250 nor ``public_branch``. (Vincent Ladeuil)
251
252+* Conflicts involving non-ascii filenames are now properly reported rather
253+ than failing with a UnicodeEncodeError. (Martin [GZ], #686161)
254+
255 * Correct parent is now set when using 'switch -b' with bound branches.
256 (A. S. Budden, #513709)
257