Merge lp:~gz/bzr/conflicts_non_ascii_ui_686161 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 5907
Proposed branch: lp:~gz/bzr/conflicts_non_ascii_ui_686161
Merge into: lp:bzr
Prerequisite: lp:~gz/bzr/trivial_refactor_conflict_stanza_tests
Diff against target: 256 lines (+83/-25)
8 files modified
bzrlib/conflicts.py (+4/-4)
bzrlib/merge.py (+1/-1)
bzrlib/status.py (+1/-1)
bzrlib/tests/blackbox/test_conflicts.py (+58/-14)
bzrlib/tests/test_conflicts.py (+13/-2)
bzrlib/tests/test_transform.py (+1/-1)
bzrlib/transform.py (+2/-2)
doc/en/release-notes/bzr-2.4.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/conflicts_non_ascii_ui_686161
Reviewer Review Type Date Requested Status
John A Meinel Approve
Review via email: mp+61763@code.launchpad.net

Commit message

Fix unicode handling in bzrlib.conflicts ui code

Description of the change

Conflict objects contain paths, which are unicode, but the conflicts ui calls str() on them which only works for ascii paths.

This branch adds tests that cover the Conflict stringifying code, and changes str() calls. Note, despite changing the __str__ method to __unicode__ this won't protect other callers from doing the wrong thing. Anyone currently using str() will continue to work in the ascii case and continue to fail in the non-ascii case due to the joys of Python string logic.

A particularly fun gotcha was the per-conflict tests actually passed regardless, because of this:

>>> class U(object):
... def __str__(self):
... return u"\xa7"
...
>>> unicode(U())
u'\xa7'
>>> str(U())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa7' in position 0: ordinal not in range(128)

To post a comment you must log in.
Revision history for this message
Martin Packman (gz) wrote :

Forgot to mention: TestUnicodePathsOnAsciiTerminal is currently skipped because the ui code doesn't decide what to do in this case yet apparently. That's really another issue so this fix shouldn't depend on that being resolved.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/20/2011 3:26 PM, Martin [gz] wrote:
> Martin [gz] has proposed merging lp:~gz/bzr/conflicts_non_ascii_ui_686161 into lp:bzr with lp:~gz/bzr/trivial_refactor_conflict_stanza_tests as a prerequisite.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #686161 in Bazaar: "conflicts involving unicode paths can't be displayed"
> https://bugs.launchpad.net/bzr/+bug/686161
>
> For more details, see:
> https://code.launchpad.net/~gz/bzr/conflicts_non_ascii_ui_686161/+merge/61763
>
> Conflict objects contain paths, which are unicode, but the conflicts ui calls str() on them which only works for ascii paths.
>
> This branch adds tests that cover the Conflict stringifying code, and changes str() calls. Note, despite changing the __str__ method to __unicode__ this won't protect other callers from doing the wrong thing. Anyone currently using str() will continue to work in the ascii case and continue to fail in the non-ascii case due to the joys of Python string logic.
>
> A particularly fun gotcha was the per-conflict tests actually passed regardless, because of this:
>
>>>> class U(object):
> ... def __str__(self):
> ... return u"\xa7"
> ...
>>>> unicode(U())
> u'\xa7'
>>>> str(U())
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa7' in position 0: ordinal not in range(128)
>

Looks fine to me.

 merge: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3WcnsACgkQJdeBCYSNAAOJ0wCguczgoaf5SdOLfFbLRNFVucBU
RR8AnRQy0plKnZBN0vnex9aune5M+HWH
=BhG6
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

My understanding of Python string method logic was wrong, with no __str__ method the __repr__ is used as a fallback instead. This is a good thing, it means callers can't accidentally hit UnicodeError if they do the wrong thing.

However, it broke a bunch of tests in other commands that print conflicts, which I missed grepping the source. This is because trace.warning and trace.note implicitly str() their first parameter, and it needs unicode() instead. The existing coverage looks pretty good here and as the tests broke with the repr output, I don't think I need to hunt around trying to inject non-ascii text into these various commands that print conflicts.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

Trying to finish this quickly to land on the last day was a bad idea. :)

I've gone back a few revisions and cleaned up some dumb mistakes and will try on PQM again, and repropose if it still runs into problems.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/conflicts.py'
--- bzrlib/conflicts.py 2011-02-08 13:56:49 +0000
+++ bzrlib/conflicts.py 2011-05-21 16:48:38 +0000
@@ -72,7 +72,7 @@
72 continue72 continue
73 self.outf.write(conflict.path + '\n')73 self.outf.write(conflict.path + '\n')
74 else:74 else:
75 self.outf.write(str(conflict) + '\n')75 self.outf.write(unicode(conflict) + '\n')
7676
7777
78resolve_action_registry = registry.Registry()78resolve_action_registry = registry.Registry()
@@ -148,7 +148,7 @@
148 trace.note('%d conflict(s) auto-resolved.', len(resolved))148 trace.note('%d conflict(s) auto-resolved.', len(resolved))
149 trace.note('Remaining conflicts:')149 trace.note('Remaining conflicts:')
150 for conflict in un_resolved:150 for conflict in un_resolved:
151 trace.note(conflict)151 trace.note(unicode(conflict))
152 return 1152 return 1
153 else:153 else:
154 trace.note('All conflicts resolved.')154 trace.note('All conflicts resolved.')
@@ -291,7 +291,7 @@
291 def to_strings(self):291 def to_strings(self):
292 """Generate strings for the provided conflicts"""292 """Generate strings for the provided conflicts"""
293 for conflict in self:293 for conflict in self:
294 yield str(conflict)294 yield unicode(conflict)
295295
296 def remove_files(self, tree):296 def remove_files(self, tree):
297 """Remove the THIS, BASE and OTHER files for listed conflicts"""297 """Remove the THIS, BASE and OTHER files for listed conflicts"""
@@ -390,7 +390,7 @@
390 def __ne__(self, other):390 def __ne__(self, other):
391 return not self.__eq__(other)391 return not self.__eq__(other)
392392
393 def __str__(self):393 def __unicode__(self):
394 return self.format % self.__dict__394 return self.format % self.__dict__
395395
396 def __repr__(self):396 def __repr__(self):
397397
=== modified file 'bzrlib/merge.py'
--- bzrlib/merge.py 2011-05-18 16:42:48 +0000
+++ bzrlib/merge.py 2011-05-21 16:48:38 +0000
@@ -888,7 +888,7 @@
888 self.tt.iter_changes(), self.change_reporter)888 self.tt.iter_changes(), self.change_reporter)
889 self.cook_conflicts(fs_conflicts)889 self.cook_conflicts(fs_conflicts)
890 for conflict in self.cooked_conflicts:890 for conflict in self.cooked_conflicts:
891 trace.warning(conflict)891 trace.warning(unicode(conflict))
892892
893 def _entries3(self):893 def _entries3(self):
894 """Gather data about files modified between three trees.894 """Gather data about files modified between three trees.
895895
=== modified file 'bzrlib/status.py'
--- bzrlib/status.py 2011-03-30 11:50:40 +0000
+++ bzrlib/status.py 2011-05-21 16:48:38 +0000
@@ -194,7 +194,7 @@
194 prefix = 'C '194 prefix = 'C '
195 else:195 else:
196 prefix = ' '196 prefix = ' '
197 to_file.write("%s %s\n" % (prefix, conflict))197 to_file.write("%s %s\n" % (prefix, unicode(conflict)))
198 # Show files that were requested but don't exist (and are198 # Show files that were requested but don't exist (and are
199 # not versioned). We don't involve delta in this; these199 # not versioned). We don't involve delta in this; these
200 # paths are really the province of just the status200 # paths are really the province of just the status
201201
=== modified file 'bzrlib/tests/blackbox/test_conflicts.py'
--- bzrlib/tests/blackbox/test_conflicts.py 2010-11-07 16:32:51 +0000
+++ bzrlib/tests/blackbox/test_conflicts.py 2011-05-21 16:48:38 +0000
@@ -1,4 +1,4 @@
1# Copyright (C) 2006, 2007, 2009, 2010 Canonical Ltd1# Copyright (C) 2006, 2007, 2009, 2010, 2011 Canonical Ltd
2#2#
3# This program is free software; you can redistribute it and/or modify3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by4# it under the terms of the GNU General Public License as published by
@@ -21,29 +21,31 @@
21 )21 )
22from bzrlib.tests import script22from bzrlib.tests import script
2323
24def make_tree_with_conflicts(test, this_path='this', other_path='other'):24
25def make_tree_with_conflicts(test, this_path='this', other_path='other',
26 prefix='my'):
25 this_tree = test.make_branch_and_tree(this_path)27 this_tree = test.make_branch_and_tree(this_path)
26 test.build_tree_contents([28 test.build_tree_contents([
27 ('%s/myfile' % (this_path,), 'this content\n'),29 ('%s/%sfile' % (this_path, prefix), 'this content\n'),
28 ('%s/my_other_file' % (this_path,), 'this content\n'),30 ('%s/%s_other_file' % (this_path, prefix), 'this content\n'),
29 ('%s/mydir/' % (this_path,),),31 ('%s/%sdir/' % (this_path, prefix),),
30 ])32 ])
31 this_tree.add('myfile')33 this_tree.add(prefix+'file')
32 this_tree.add('my_other_file')34 this_tree.add(prefix+'_other_file')
33 this_tree.add('mydir')35 this_tree.add(prefix+'dir')
34 this_tree.commit(message="new")36 this_tree.commit(message="new")
35 other_tree = this_tree.bzrdir.sprout(other_path).open_workingtree()37 other_tree = this_tree.bzrdir.sprout(other_path).open_workingtree()
36 test.build_tree_contents([38 test.build_tree_contents([
37 ('%s/myfile' % (other_path,), 'contentsb\n'),39 ('%s/%sfile' % (other_path, prefix), 'contentsb\n'),
38 ('%s/my_other_file' % (other_path,), 'contentsb\n'),40 ('%s/%s_other_file' % (other_path, prefix), 'contentsb\n'),
39 ])41 ])
40 other_tree.rename_one('mydir', 'mydir2')42 other_tree.rename_one(prefix+'dir', prefix+'dir2')
41 other_tree.commit(message="change")43 other_tree.commit(message="change")
42 test.build_tree_contents([44 test.build_tree_contents([
43 ('%s/myfile' % (this_path,), 'contentsa2\n'),45 ('%s/%sfile' % (this_path, prefix), 'contentsa2\n'),
44 ('%s/my_other_file' % (this_path,), 'contentsa2\n'),46 ('%s/%s_other_file' % (this_path, prefix), 'contentsa2\n'),
45 ])47 ])
46 this_tree.rename_one('mydir', 'mydir3')48 this_tree.rename_one(prefix+'dir', prefix+'dir3')
47 this_tree.commit(message='change')49 this_tree.commit(message='change')
48 this_tree.merge_from_branch(other_tree.branch)50 this_tree.merge_from_branch(other_tree.branch)
49 return this_tree, other_tree51 return this_tree, other_tree
@@ -79,3 +81,45 @@
79Path conflict: mydir3 / mydir281Path conflict: mydir3 / mydir2
80Text conflict in myfile82Text conflict in myfile
81""")83""")
84
85
86class TestUnicodePaths(tests.TestCaseWithTransport):
87 """Unicode characters in conflicts should be displayed properly"""
88
89 encoding = "UTF-8"
90
91 def _as_output(self, text):
92 return text
93
94 def test_messages(self):
95 """Conflict messages involving non-ascii paths are displayed okay"""
96 make_tree_with_conflicts(self, "branch", prefix=u"\xA7")
97 out, err = self.run_bzr(["conflicts", "-d", "branch"],
98 encoding=self.encoding)
99 self.assertEqual(out.decode(self.encoding),
100 u"Text conflict in \xA7_other_file\n"
101 u"Path conflict: \xA7dir3 / \xA7dir2\n"
102 u"Text conflict in \xA7file\n")
103 self.assertEqual(err, "")
104
105 def test_text_conflict_paths(self):
106 """Text conflicts on non-ascii paths are displayed okay"""
107 make_tree_with_conflicts(self, "branch", prefix=u"\xA7")
108 out, err = self.run_bzr(["conflicts", "-d", "branch", "--text"],
109 encoding=self.encoding)
110 self.assertEqual(out.decode(self.encoding),
111 u"\xA7_other_file\n"
112 u"\xA7file\n")
113 self.assertEqual(err, "")
114
115
116class TestUnicodePathsOnAsciiTerminal(TestUnicodePaths):
117 """Undisplayable unicode characters in conflicts should be escaped"""
118
119 encoding = "ascii"
120
121 def setUp(self):
122 self.skip("Need to decide if replacing is the desired behaviour")
123
124 def _as_output(self, text):
125 return text.encode(self.encoding, "replace")
82126
=== modified file 'bzrlib/tests/test_conflicts.py'
--- bzrlib/tests/test_conflicts.py 2011-05-21 16:48:37 +0000
+++ bzrlib/tests/test_conflicts.py 2011-05-21 16:48:38 +0000
@@ -125,10 +125,17 @@
125 self.assertEqual(conflicts.ConflictList([]), tree.conflicts())125 self.assertEqual(conflicts.ConflictList([]), tree.conflicts())
126126
127127
128class TestConflictStanzas(tests.TestCase):128class TestPerConflict(tests.TestCase):
129129
130 scenarios = scenarios.multiply_scenarios(vary_by_conflicts())130 scenarios = scenarios.multiply_scenarios(vary_by_conflicts())
131131
132 def test_stringification(self):
133 text = unicode(self.conflict)
134 self.assertContainsString(text, self.conflict.path)
135 self.assertContainsString(text.lower(), "conflict")
136 self.assertContainsString(repr(self.conflict),
137 self.conflict.__class__.__name__)
138
132 def test_stanza_roundtrip(self):139 def test_stanza_roundtrip(self):
133 p = self.conflict140 p = self.conflict
134 o = conflicts.Conflict.factory(**p.as_stanza().as_dict())141 o = conflicts.Conflict.factory(**p.as_stanza().as_dict())
@@ -159,13 +166,17 @@
159 self.assertStartsWith(stanza['conflict_file_id'], u'\xeed')166 self.assertStartsWith(stanza['conflict_file_id'], u'\xeed')
160167
161168
162class TestConflictListStanzas(tests.TestCase):169class TestConflictList(tests.TestCase):
163170
164 def test_stanzas_roundtrip(self):171 def test_stanzas_roundtrip(self):
165 stanzas_iter = example_conflicts.to_stanzas()172 stanzas_iter = example_conflicts.to_stanzas()
166 processed = conflicts.ConflictList.from_stanzas(stanzas_iter)173 processed = conflicts.ConflictList.from_stanzas(stanzas_iter)
167 self.assertEqual(example_conflicts, processed)174 self.assertEqual(example_conflicts, processed)
168175
176 def test_stringification(self):
177 for text, o in zip(example_conflicts.to_strings(), example_conflicts):
178 self.assertEqual(text, unicode(o))
179
169180
170# FIXME: The shell-like tests should be converted to real whitebox tests... or181# FIXME: The shell-like tests should be converted to real whitebox tests... or
171# moved to a blackbox module -- vila 20100205182# moved to a blackbox module -- vila 20100205
172183
=== modified file 'bzrlib/tests/test_transform.py'
--- bzrlib/tests/test_transform.py 2011-05-13 12:51:05 +0000
+++ bzrlib/tests/test_transform.py 2011-05-21 16:48:38 +0000
@@ -811,7 +811,7 @@
811 raw_conflicts = resolve_conflicts(tt)811 raw_conflicts = resolve_conflicts(tt)
812 cooked_conflicts = cook_conflicts(raw_conflicts, tt)812 cooked_conflicts = cook_conflicts(raw_conflicts, tt)
813 tt.finalize()813 tt.finalize()
814 conflicts_s = [str(c) for c in cooked_conflicts]814 conflicts_s = [unicode(c) for c in cooked_conflicts]
815 self.assertEqual(len(cooked_conflicts), len(conflicts_s))815 self.assertEqual(len(cooked_conflicts), len(conflicts_s))
816 self.assertEqual(conflicts_s[0], 'Conflict adding file dorothy. '816 self.assertEqual(conflicts_s[0], 'Conflict adding file dorothy. '
817 'Moved existing file to '817 'Moved existing file to '
818818
=== modified file 'bzrlib/transform.py'
--- bzrlib/transform.py 2011-05-13 14:02:14 +0000
+++ bzrlib/transform.py 2011-05-21 16:48:38 +0000
@@ -2578,7 +2578,7 @@
2578 precomputed_delta = None2578 precomputed_delta = None
2579 conflicts = cook_conflicts(raw_conflicts, tt)2579 conflicts = cook_conflicts(raw_conflicts, tt)
2580 for conflict in conflicts:2580 for conflict in conflicts:
2581 trace.warning(conflict)2581 trace.warning(unicode(conflict))
2582 try:2582 try:
2583 wt.add_conflicts(conflicts)2583 wt.add_conflicts(conflicts)
2584 except errors.UnsupportedOperation:2584 except errors.UnsupportedOperation:
@@ -2820,7 +2820,7 @@
2820 unversioned_filter=working_tree.is_ignored)2820 unversioned_filter=working_tree.is_ignored)
2821 delta.report_changes(tt.iter_changes(), change_reporter)2821 delta.report_changes(tt.iter_changes(), change_reporter)
2822 for conflict in conflicts:2822 for conflict in conflicts:
2823 trace.warning(conflict)2823 trace.warning(unicode(conflict))
2824 pp.next_phase()2824 pp.next_phase()
2825 tt.apply()2825 tt.apply()
2826 working_tree.set_merge_modified(merge_modified)2826 working_tree.set_merge_modified(merge_modified)
28272827
=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- doc/en/release-notes/bzr-2.4.txt 2011-05-20 13:28:35 +0000
+++ doc/en/release-notes/bzr-2.4.txt 2011-05-21 16:48:38 +0000
@@ -87,6 +87,9 @@
87 submit_location public_location`` never sets ``submit_branch``87 submit_location public_location`` never sets ``submit_branch``
88 nor ``public_branch``. (Vincent Ladeuil)88 nor ``public_branch``. (Vincent Ladeuil)
8989
90* Conflicts involving non-ascii filenames are now properly reported rather
91 than failing with a UnicodeEncodeError. (Martin [GZ], #686161)
92
90* Correct parent is now set when using 'switch -b' with bound branches.93* Correct parent is now set when using 'switch -b' with bound branches.
91 (A. S. Budden, #513709)94 (A. S. Budden, #513709)
9295