Merge lp:~gz/bzr/2.5_text_progress_view_unicode_966934 into lp:bzr/2.5

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6497
Proposed branch: lp:~gz/bzr/2.5_text_progress_view_unicode_966934
Merge into: lp:bzr/2.5
Diff against target: 241 lines (+80/-34)
5 files modified
bzrlib/progress.py (+7/-1)
bzrlib/tests/test_progress.py (+40/-22)
bzrlib/tests/test_ui.py (+20/-8)
bzrlib/ui/text.py (+10/-3)
doc/en/release-notes/bzr-2.5.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/2.5_text_progress_view_unicode_966934
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+99790@code.launchpad.net

Commit message

Encode progress task messages when written to a terminal

Description of the change

Some of the progress descriptions in bzrlib.transform were wrapped in gettext() calls, but the underlying TextProgressView has no handling for non-ascii text. So, as these strings get translated, operations that display one of these updates will break trying to write unicode.

This branch changes the TextProgressView class to expect unicode, and accept encoding and errors parameters. The ui stuff in general could do with neater handling here, but this should do for now.

We do not seem to have any tests for the transform progress itself, or a mechanism for blackbox tests to assert on progress updates. For now there's a basic test checking that the progress output is indeed encoded as bytes. I have left worrying about the text width measurement being broken for later, but this may also break progress display for anyone with a fancy UTF-8 terminal.

As I wanted to change from using StringIO to cStringIO to get the failing test, I moved some classes that needed to subclass StringIO into the test module where they are used.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

8 -from StringIO import StringIO
9 +from cStringIO import StringIO

Good, that was a bug in itself.

71 +class _TTYStringIO(StringIO):
78 +class _NonTTYStringIO(StringIO):

I don't think we want the leading '_' here. Those are helper classes (and
there is even "113 + TTYStringIO = _TTYStringIO" which seems even weirder.

142 + self._encoding = getattr(term_file, "encoding", "ascii")

Don't we want to default to utf8 instead ?

159 + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
160 + # text but better than counting codepoints.

Can you just briefly add (in this comment) a description of what you think
should be done (or even better file a bug and just reference it there) ?
Just from reading the comment, I'm not sure if we should blame python for
not providing an easy way to do that or what *we* need to do there.

155 + def _show_line(self, u):

Worth a trivial docstring mentioning 'u' means: 'We expect unicode'

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :
Download full text (3.3 KiB)

> 71 +class _TTYStringIO(StringIO):
> 78 +class _NonTTYStringIO(StringIO):
>
> I don't think we want the leading '_' here. Those are helper classes (and
> there is even "113 + TTYStringIO = _TTYStringIO" which seems even weirder.

Okay, can do, as I'm touching those lines anyway with the module move.

> 142 + self._encoding = getattr(term_file, "encoding", "ascii")
>
> Don't we want to default to utf8 instead ?

Not really, this class is just for printing to a terminal, so we don't have the case of redirecting to a file to be concerned with. If we print bytes that the term doesn't support, at best we get replacement characters, and generally we'll get random junk. Given the way gettext and locales work, users shouldn't ever hit a situation where translations are installed but the terminal doesn't support them.

> 159 + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
> 160 + # text but better than counting codepoints.
>
> Can you just briefly add (in this comment) a description of what you think
> should be done (or even better file a bug and just reference it there) ?
> Just from reading the comment, I'm not sure if we should blame python for
> not providing an easy way to do that or what *we* need to do there.

It's ridiculously complicated.

On my SJIS term things are reasonable simple. Characters that use a single byte, so ascii and the halfwidth katakana, are one sell wide. Double byte characters are two cells wide, including greek and cyrillic which is only appropriate in an ideographic context (unlike native or mixed latin use).

With terminals using utf-8, things a vastly more complex as the display width of a character the number of bytes used to encode it, but instead depends on what and how the terminal implements various unicode features. So, a linux term supports up to 512 glyphs, so as configured here means it can display CP437 characters, and any other (decoded from utf-8) codepoint appears as a diamond. An xterm is much more capable, and will display most things provided relevent fonts are installed.

Posix provides wcwidth and wcswidth functions that aim to resolve the problem of knowing how many columns are needed to display a character or string, but like all things C and locale related, they're not all that great. One of the issues is there's a gap between what a term can actually display, and how in theory a string could be displayed. So, it will give a width of 1 for a decomposed string of character and combining diacritic, but on basic terminals that will still be shown as the width 2 sequence character and missing glyph marker.

What I've done in the past is use the unicode tables plus some knowledge about the platform to get the basic cases correct, then try and fudge so incorrect outcomes are on the less-bad side. So, when trying to blank a whole line, under-measure so that mistakes will result in some junk at the end of the line, rather than wrapping to a new line.

Just found a recent python bug aimed at exposing the posix function that covers a few points as well:

<http://bugs.python.org/issue12568>

> 155 + def _show_line(self, u):
>
> Worth a trivial docstring mentioni...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

Thanks a lot for the explanations, I really appreciate.

I'm fine with the tweaks you propose.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

failure: bzrlib.tests.blackbox.test_debug.TestDebugBytes.test_bytes_reports_activity

Traceback (most recent call last):
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 920, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 1131, in run_bzr
    ret = run(*run_argv)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 695, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/builtins.py", line 1475, in run
    source_branch=br_from)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 366, in sprout
    create_tree_if_local=create_tree_if_local)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 132, in run
    self.cleanups, self.func, self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 416, in _sprout
    result_repo.fetch(source_repository, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 1267, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/decorators.py", line 218, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 2584, in fetch
    find_ghosts=find_ghosts)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 77, in __init__
    self.__fetch()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 98, in __fetch
    pb.update(gettext("Finding revisions"), 0, 2)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/progress.py", line 122, in update
    self.ui_factory._progress_updated(self)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 374, in _progress_updated
    self._progress_view.show_progress(task)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 561, in show_progress
    self._repaint()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 543, in _repaint
    self._show_line(s)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 441, in _show_line
    s = u.encode(self._encoding, self._encoding_errors)
TypeError: encode() argument 1 must be string, not None

Revision history for this message
Martin Packman (gz) wrote :

Pretty simple error in how a fallback encoding is specified, real file objects tend to have an encoding attribute set to None rather than no encoding attribute like cStringIO.StringIO objects. Fixed and added some tests, as we don't want to rely on blackbox stuff in a subprocess to catch such problems.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/progress.py'
2--- bzrlib/progress.py 2011-12-19 13:23:58 +0000
3+++ bzrlib/progress.py 2012-04-30 10:48:17 +0000
4@@ -58,7 +58,9 @@
5 Code updating the task may also set fields as hints about how to display
6 it: show_pct, show_spinner, show_eta, show_count, show_bar. UIs
7 will not necessarily respect all these fields.
8-
9+
10+ The message given when updating a task must be unicode, not bytes.
11+
12 :ivar update_latency: The interval (in seconds) at which the PB should be
13 updated. Setting this to zero suggests every update should be shown
14 synchronously.
15@@ -106,6 +108,10 @@
16 self.msg)
17
18 def update(self, msg, current_cnt=None, total_cnt=None):
19+ """Report updated task message and if relevent progress counters
20+
21+ The message given must be unicode, not a byte string.
22+ """
23 self.msg = msg
24 self.current_cnt = current_cnt
25 if total_cnt:
26
27=== modified file 'bzrlib/tests/test_progress.py'
28--- bzrlib/tests/test_progress.py 2011-01-12 01:01:53 +0000
29+++ bzrlib/tests/test_progress.py 2012-04-30 10:48:17 +0000
30@@ -15,32 +15,20 @@
31 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
32
33
34-from StringIO import StringIO
35+from cStringIO import StringIO
36
37+from bzrlib import (
38+ tests,
39+ )
40 from bzrlib.progress import (
41 ProgressTask,
42 )
43-from bzrlib.tests import TestCase
44 from bzrlib.ui.text import (
45 TextProgressView,
46 )
47
48
49-class _TTYStringIO(StringIO):
50- """A helper class which makes a StringIO look like a terminal"""
51-
52- def isatty(self):
53- return True
54-
55-
56-class _NonTTYStringIO(StringIO):
57- """Helper that implements isatty() but returns False"""
58-
59- def isatty(self):
60- return False
61-
62-
63-class TestTextProgressView(TestCase):
64+class TestTextProgressView(tests.TestCase):
65 """Tests for text display of progress bars.
66
67 These try to exercise the progressview independently of its construction,
68@@ -49,13 +37,16 @@
69 # The ProgressTask now connects directly to the ProgressView, so we can
70 # check them independently of the factory or of the determination of what
71 # view to use.
72-
73+
74+ def make_view_only(self, out, width=79):
75+ view = TextProgressView(out)
76+ view._avail_width = lambda: width
77+ return view
78+
79 def make_view(self):
80 out = StringIO()
81- view = TextProgressView(out)
82- view._avail_width = lambda: 79
83- return out, view
84-
85+ return out, self.make_view_only(out)
86+
87 def make_task(self, parent_task, view, msg, curr, total):
88 # would normally be done by UIFactory; is done here so that we don't
89 # have to have one.
90@@ -168,3 +159,30 @@
91 ' 123kB 100kB/s \\ start_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.. 2000/5000',
92 line)
93 self.assertEqual(len(line), 79)
94+
95+ def test_render_progress_unicode_enc_utf8(self):
96+ out = tests.StringIOWrapper()
97+ out.encoding = "utf-8"
98+ view = self.make_view_only(out, 20)
99+ task = self.make_task(None, view, u"\xa7", 0, 1)
100+ view.show_progress(task)
101+ self.assertEqual('\r/ \xc2\xa7 0/1 \r',
102+ out.getvalue())
103+
104+ def test_render_progress_unicode_enc_missing(self):
105+ out = StringIO()
106+ self.assertRaises(AttributeError, getattr, out, "encoding")
107+ view = self.make_view_only(out, 20)
108+ task = self.make_task(None, view, u"\xa7", 0, 1)
109+ view.show_progress(task)
110+ self.assertEqual('\r/ ? 0/1 \r',
111+ out.getvalue())
112+
113+ def test_render_progress_unicode_enc_none(self):
114+ out = tests.StringIOWrapper()
115+ out.encoding = None
116+ view = self.make_view_only(out, 20)
117+ task = self.make_task(None, view, u"\xa7", 0, 1)
118+ view.show_progress(task)
119+ self.assertEqual('\r/ ? 0/1 \r',
120+ out.getvalue())
121
122=== modified file 'bzrlib/tests/test_ui.py'
123--- bzrlib/tests/test_ui.py 2011-10-08 19:01:59 +0000
124+++ bzrlib/tests/test_ui.py 2012-04-30 10:48:17 +0000
125@@ -31,7 +31,6 @@
126 )
127 from bzrlib.tests import (
128 fixtures,
129- test_progress,
130 )
131 from bzrlib.ui import text as _mod_ui_text
132 from bzrlib.tests.testui import (
133@@ -39,6 +38,20 @@
134 )
135
136
137+class TTYStringIO(StringIO):
138+ """A helper class which makes a StringIO look like a terminal"""
139+
140+ def isatty(self):
141+ return True
142+
143+
144+class NonTTYStringIO(StringIO):
145+ """Helper that implements isatty() but returns False"""
146+
147+ def isatty(self):
148+ return False
149+
150+
151 class TestUIConfiguration(tests.TestCaseWithTransport):
152
153 def test_output_encoding_configuration(self):
154@@ -221,7 +234,7 @@
155
156 def test_text_factory_prompts_and_clears(self):
157 # a get_boolean call should clear the pb before prompting
158- out = test_progress._TTYStringIO()
159+ out = TTYStringIO()
160 self.overrideEnv('TERM', 'xterm')
161 factory = _mod_ui_text.TextUIFactory(
162 stdin=tests.StringIOWrapper("yada\ny\n"),
163@@ -292,8 +305,8 @@
164 def test_quietness(self):
165 self.overrideEnv('BZR_PROGRESS_BAR', 'text')
166 ui_factory = _mod_ui_text.TextUIFactory(None,
167- test_progress._TTYStringIO(),
168- test_progress._TTYStringIO())
169+ TTYStringIO(),
170+ TTYStringIO())
171 self.assertIsInstance(ui_factory._progress_view,
172 _mod_ui_text.TextProgressView)
173 ui_factory.be_quiet(True)
174@@ -358,7 +371,6 @@
175 def test_progress_construction(self):
176 """TextUIFactory constructs the right progress view.
177 """
178- TTYStringIO = test_progress._TTYStringIO
179 FileStringIO = tests.StringIOWrapper
180 for (file_class, term, pb, expected_pb_class) in (
181 # on an xterm, either use them or not as the user requests,
182@@ -391,9 +403,9 @@
183
184 def test_text_ui_non_terminal(self):
185 """Even on non-ttys, make_ui_for_terminal gives a text ui."""
186- stdin = test_progress._NonTTYStringIO('')
187- stderr = test_progress._NonTTYStringIO()
188- stdout = test_progress._NonTTYStringIO()
189+ stdin = NonTTYStringIO('')
190+ stderr = NonTTYStringIO()
191+ stdout = NonTTYStringIO()
192 for term_type in ['dumb', None, 'xterm']:
193 self.overrideEnv('TERM', term_type)
194 uif = _mod_ui.make_ui_for_terminal(stdin, stdout, stderr)
195
196=== modified file 'bzrlib/ui/text.py'
197--- bzrlib/ui/text.py 2011-12-18 12:46:49 +0000
198+++ bzrlib/ui/text.py 2012-04-30 10:48:17 +0000
199@@ -404,8 +404,13 @@
200 this only prints the stack from the nominated current task up to the root.
201 """
202
203- def __init__(self, term_file):
204+ def __init__(self, term_file, encoding=None, errors="replace"):
205 self._term_file = term_file
206+ if encoding is None:
207+ self._encoding = getattr(term_file, "encoding", None) or "ascii"
208+ else:
209+ self._encoding = encoding
210+ self._encoding_errors = errors
211 # true when there's output on the screen we may need to clear
212 self._have_output = False
213 self._last_transport_msg = ''
214@@ -432,10 +437,12 @@
215 else:
216 return w - 1
217
218- def _show_line(self, s):
219- # sys.stderr.write("progress %r\n" % s)
220+ def _show_line(self, u):
221+ s = u.encode(self._encoding, self._encoding_errors)
222 width = self._avail_width()
223 if width is not None:
224+ # GZ 2012-03-28: Counting bytes is wrong for calculating width of
225+ # text but better than counting codepoints.
226 s = '%-*.*s' % (width, width, s)
227 self._term_file.write('\r' + s + '\r')
228
229
230=== modified file 'doc/en/release-notes/bzr-2.5.txt'
231--- doc/en/release-notes/bzr-2.5.txt 2012-03-29 12:53:01 +0000
232+++ doc/en/release-notes/bzr-2.5.txt 2012-04-30 10:48:17 +0000
233@@ -36,6 +36,9 @@
234 destination rather than the proxy when checking certificates.
235 (Martin Packman, #944696)
236
237+* Fix UnicodeEncodeError when translated progress task messages contain
238+ non-ascii text. (Martin Packman, #966934)
239+
240 * Fixed merge tool availability checking and invocation to search the
241 Windows App Path registry in addition to the PATH. (Gordon Tyler, #939605)
242

Subscribers

People subscribed via source and target branches