Merge lp:~gz/bzr/2.5_text_progress_view_unicode_966934 into lp:bzr/2.5

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6497
Proposed branch: lp:~gz/bzr/2.5_text_progress_view_unicode_966934
Merge into: lp:bzr/2.5
Diff against target: 241 lines (+80/-34)
5 files modified
bzrlib/progress.py (+7/-1)
bzrlib/tests/test_progress.py (+40/-22)
bzrlib/tests/test_ui.py (+20/-8)
bzrlib/ui/text.py (+10/-3)
doc/en/release-notes/bzr-2.5.txt (+3/-0)
To merge this branch: bzr merge lp:~gz/bzr/2.5_text_progress_view_unicode_966934
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+99790@code.launchpad.net

Commit message

Encode progress task messages when written to a terminal

Description of the change

Some of the progress descriptions in bzrlib.transform were wrapped in gettext() calls, but the underlying TextProgressView has no handling for non-ascii text. So, as these strings get translated, operations that display one of these updates will break trying to write unicode.

This branch changes the TextProgressView class to expect unicode, and accept encoding and errors parameters. The ui stuff in general could do with neater handling here, but this should do for now.

We do not seem to have any tests for the transform progress itself, or a mechanism for blackbox tests to assert on progress updates. For now there's a basic test checking that the progress output is indeed encoded as bytes. I have left worrying about the text width measurement being broken for later, but this may also break progress display for anyone with a fancy UTF-8 terminal.

As I wanted to change from using StringIO to cStringIO to get the failing test, I moved some classes that needed to subclass StringIO into the test module where they are used.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

8 -from StringIO import StringIO
9 +from cStringIO import StringIO

Good, that was a bug in itself.

71 +class _TTYStringIO(StringIO):
78 +class _NonTTYStringIO(StringIO):

I don't think we want the leading '_' here. Those are helper classes (and
there is even "113 + TTYStringIO = _TTYStringIO" which seems even weirder.

142 + self._encoding = getattr(term_file, "encoding", "ascii")

Don't we want to default to utf8 instead ?

159 + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
160 + # text but better than counting codepoints.

Can you just briefly add (in this comment) a description of what you think
should be done (or even better file a bug and just reference it there) ?
Just from reading the comment, I'm not sure if we should blame python for
not providing an easy way to do that or what *we* need to do there.

155 + def _show_line(self, u):

Worth a trivial docstring mentioning 'u' means: 'We expect unicode'

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :
Download full text (3.3 KiB)

> 71 +class _TTYStringIO(StringIO):
> 78 +class _NonTTYStringIO(StringIO):
>
> I don't think we want the leading '_' here. Those are helper classes (and
> there is even "113 + TTYStringIO = _TTYStringIO" which seems even weirder.

Okay, can do, as I'm touching those lines anyway with the module move.

> 142 + self._encoding = getattr(term_file, "encoding", "ascii")
>
> Don't we want to default to utf8 instead ?

Not really, this class is just for printing to a terminal, so we don't have the case of redirecting to a file to be concerned with. If we print bytes that the term doesn't support, at best we get replacement characters, and generally we'll get random junk. Given the way gettext and locales work, users shouldn't ever hit a situation where translations are installed but the terminal doesn't support them.

> 159 + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
> 160 + # text but better than counting codepoints.
>
> Can you just briefly add (in this comment) a description of what you think
> should be done (or even better file a bug and just reference it there) ?
> Just from reading the comment, I'm not sure if we should blame python for
> not providing an easy way to do that or what *we* need to do there.

It's ridiculously complicated.

On my SJIS term things are reasonable simple. Characters that use a single byte, so ascii and the halfwidth katakana, are one sell wide. Double byte characters are two cells wide, including greek and cyrillic which is only appropriate in an ideographic context (unlike native or mixed latin use).

With terminals using utf-8, things a vastly more complex as the display width of a character the number of bytes used to encode it, but instead depends on what and how the terminal implements various unicode features. So, a linux term supports up to 512 glyphs, so as configured here means it can display CP437 characters, and any other (decoded from utf-8) codepoint appears as a diamond. An xterm is much more capable, and will display most things provided relevent fonts are installed.

Posix provides wcwidth and wcswidth functions that aim to resolve the problem of knowing how many columns are needed to display a character or string, but like all things C and locale related, they're not all that great. One of the issues is there's a gap between what a term can actually display, and how in theory a string could be displayed. So, it will give a width of 1 for a decomposed string of character and combining diacritic, but on basic terminals that will still be shown as the width 2 sequence character and missing glyph marker.

What I've done in the past is use the unicode tables plus some knowledge about the platform to get the basic cases correct, then try and fudge so incorrect outcomes are on the less-bad side. So, when trying to blank a whole line, under-measure so that mistakes will result in some junk at the end of the line, rather than wrapping to a new line.

Just found a recent python bug aimed at exposing the posix function that covers a few points as well:

<http://bugs.python.org/issue12568>

> 155 + def _show_line(self, u):
>
> Worth a trivial docstring mentioni...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

Thanks a lot for the explanations, I really appreciate.

I'm fine with the tweaks you propose.

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

failure: bzrlib.tests.blackbox.test_debug.TestDebugBytes.test_bytes_reports_activity

Traceback (most recent call last):
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 920, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 1131, in run_bzr
    ret = run(*run_argv)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 695, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/builtins.py", line 1475, in run
    source_branch=br_from)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 366, in sprout
    create_tree_if_local=create_tree_if_local)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 132, in run
    self.cleanups, self.func, self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 416, in _sprout
    result_repo.fetch(source_repository, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 1267, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/decorators.py", line 218, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 2584, in fetch
    find_ghosts=find_ghosts)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 77, in __init__
    self.__fetch()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 98, in __fetch
    pb.update(gettext("Finding revisions"), 0, 2)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/progress.py", line 122, in update
    self.ui_factory._progress_updated(self)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 374, in _progress_updated
    self._progress_view.show_progress(task)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 561, in show_progress
    self._repaint()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 543, in _repaint
    self._show_line(s)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 441, in _show_line
    s = u.encode(self._encoding, self._encoding_errors)
TypeError: encode() argument 1 must be string, not None

Revision history for this message
Martin Packman (gz) wrote :

Pretty simple error in how a fallback encoding is specified, real file objects tend to have an encoding attribute set to None rather than no encoding attribute like cStringIO.StringIO objects. Fixed and added some tests, as we don't want to rely on blackbox stuff in a subprocess to catch such problems.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/progress.py'
--- bzrlib/progress.py 2011-12-19 13:23:58 +0000
+++ bzrlib/progress.py 2012-04-30 10:48:17 +0000
@@ -58,7 +58,9 @@
58 Code updating the task may also set fields as hints about how to display58 Code updating the task may also set fields as hints about how to display
59 it: show_pct, show_spinner, show_eta, show_count, show_bar. UIs59 it: show_pct, show_spinner, show_eta, show_count, show_bar. UIs
60 will not necessarily respect all these fields.60 will not necessarily respect all these fields.
61 61
62 The message given when updating a task must be unicode, not bytes.
63
62 :ivar update_latency: The interval (in seconds) at which the PB should be64 :ivar update_latency: The interval (in seconds) at which the PB should be
63 updated. Setting this to zero suggests every update should be shown65 updated. Setting this to zero suggests every update should be shown
64 synchronously.66 synchronously.
@@ -106,6 +108,10 @@
106 self.msg)108 self.msg)
107109
108 def update(self, msg, current_cnt=None, total_cnt=None):110 def update(self, msg, current_cnt=None, total_cnt=None):
111 """Report updated task message and if relevent progress counters
112
113 The message given must be unicode, not a byte string.
114 """
109 self.msg = msg115 self.msg = msg
110 self.current_cnt = current_cnt116 self.current_cnt = current_cnt
111 if total_cnt:117 if total_cnt:
112118
=== modified file 'bzrlib/tests/test_progress.py'
--- bzrlib/tests/test_progress.py 2011-01-12 01:01:53 +0000
+++ bzrlib/tests/test_progress.py 2012-04-30 10:48:17 +0000
@@ -15,32 +15,20 @@
15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA15# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1616
1717
18from StringIO import StringIO18from cStringIO import StringIO
1919
20from bzrlib import (
21 tests,
22 )
20from bzrlib.progress import (23from bzrlib.progress import (
21 ProgressTask,24 ProgressTask,
22 )25 )
23from bzrlib.tests import TestCase
24from bzrlib.ui.text import (26from bzrlib.ui.text import (
25 TextProgressView,27 TextProgressView,
26 )28 )
2729
2830
29class _TTYStringIO(StringIO):31class TestTextProgressView(tests.TestCase):
30 """A helper class which makes a StringIO look like a terminal"""
31
32 def isatty(self):
33 return True
34
35
36class _NonTTYStringIO(StringIO):
37 """Helper that implements isatty() but returns False"""
38
39 def isatty(self):
40 return False
41
42
43class TestTextProgressView(TestCase):
44 """Tests for text display of progress bars.32 """Tests for text display of progress bars.
4533
46 These try to exercise the progressview independently of its construction,34 These try to exercise the progressview independently of its construction,
@@ -49,13 +37,16 @@
49 # The ProgressTask now connects directly to the ProgressView, so we can37 # The ProgressTask now connects directly to the ProgressView, so we can
50 # check them independently of the factory or of the determination of what38 # check them independently of the factory or of the determination of what
51 # view to use.39 # view to use.
52 40
41 def make_view_only(self, out, width=79):
42 view = TextProgressView(out)
43 view._avail_width = lambda: width
44 return view
45
53 def make_view(self):46 def make_view(self):
54 out = StringIO()47 out = StringIO()
55 view = TextProgressView(out)48 return out, self.make_view_only(out)
56 view._avail_width = lambda: 7949
57 return out, view
58
59 def make_task(self, parent_task, view, msg, curr, total):50 def make_task(self, parent_task, view, msg, curr, total):
60 # would normally be done by UIFactory; is done here so that we don't51 # would normally be done by UIFactory; is done here so that we don't
61 # have to have one.52 # have to have one.
@@ -168,3 +159,30 @@
168' 123kB 100kB/s \\ start_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.. 2000/5000',159' 123kB 100kB/s \\ start_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.. 2000/5000',
169 line) 160 line)
170 self.assertEqual(len(line), 79)161 self.assertEqual(len(line), 79)
162
163 def test_render_progress_unicode_enc_utf8(self):
164 out = tests.StringIOWrapper()
165 out.encoding = "utf-8"
166 view = self.make_view_only(out, 20)
167 task = self.make_task(None, view, u"\xa7", 0, 1)
168 view.show_progress(task)
169 self.assertEqual('\r/ \xc2\xa7 0/1 \r',
170 out.getvalue())
171
172 def test_render_progress_unicode_enc_missing(self):
173 out = StringIO()
174 self.assertRaises(AttributeError, getattr, out, "encoding")
175 view = self.make_view_only(out, 20)
176 task = self.make_task(None, view, u"\xa7", 0, 1)
177 view.show_progress(task)
178 self.assertEqual('\r/ ? 0/1 \r',
179 out.getvalue())
180
181 def test_render_progress_unicode_enc_none(self):
182 out = tests.StringIOWrapper()
183 out.encoding = None
184 view = self.make_view_only(out, 20)
185 task = self.make_task(None, view, u"\xa7", 0, 1)
186 view.show_progress(task)
187 self.assertEqual('\r/ ? 0/1 \r',
188 out.getvalue())
171189
=== modified file 'bzrlib/tests/test_ui.py'
--- bzrlib/tests/test_ui.py 2011-10-08 19:01:59 +0000
+++ bzrlib/tests/test_ui.py 2012-04-30 10:48:17 +0000
@@ -31,7 +31,6 @@
31 )31 )
32from bzrlib.tests import (32from bzrlib.tests import (
33 fixtures,33 fixtures,
34 test_progress,
35 )34 )
36from bzrlib.ui import text as _mod_ui_text35from bzrlib.ui import text as _mod_ui_text
37from bzrlib.tests.testui import (36from bzrlib.tests.testui import (
@@ -39,6 +38,20 @@
39 )38 )
4039
4140
41class TTYStringIO(StringIO):
42 """A helper class which makes a StringIO look like a terminal"""
43
44 def isatty(self):
45 return True
46
47
48class NonTTYStringIO(StringIO):
49 """Helper that implements isatty() but returns False"""
50
51 def isatty(self):
52 return False
53
54
42class TestUIConfiguration(tests.TestCaseWithTransport):55class TestUIConfiguration(tests.TestCaseWithTransport):
4356
44 def test_output_encoding_configuration(self):57 def test_output_encoding_configuration(self):
@@ -221,7 +234,7 @@
221234
222 def test_text_factory_prompts_and_clears(self):235 def test_text_factory_prompts_and_clears(self):
223 # a get_boolean call should clear the pb before prompting236 # a get_boolean call should clear the pb before prompting
224 out = test_progress._TTYStringIO()237 out = TTYStringIO()
225 self.overrideEnv('TERM', 'xterm')238 self.overrideEnv('TERM', 'xterm')
226 factory = _mod_ui_text.TextUIFactory(239 factory = _mod_ui_text.TextUIFactory(
227 stdin=tests.StringIOWrapper("yada\ny\n"),240 stdin=tests.StringIOWrapper("yada\ny\n"),
@@ -292,8 +305,8 @@
292 def test_quietness(self):305 def test_quietness(self):
293 self.overrideEnv('BZR_PROGRESS_BAR', 'text')306 self.overrideEnv('BZR_PROGRESS_BAR', 'text')
294 ui_factory = _mod_ui_text.TextUIFactory(None,307 ui_factory = _mod_ui_text.TextUIFactory(None,
295 test_progress._TTYStringIO(),308 TTYStringIO(),
296 test_progress._TTYStringIO())309 TTYStringIO())
297 self.assertIsInstance(ui_factory._progress_view,310 self.assertIsInstance(ui_factory._progress_view,
298 _mod_ui_text.TextProgressView)311 _mod_ui_text.TextProgressView)
299 ui_factory.be_quiet(True)312 ui_factory.be_quiet(True)
@@ -358,7 +371,6 @@
358 def test_progress_construction(self):371 def test_progress_construction(self):
359 """TextUIFactory constructs the right progress view.372 """TextUIFactory constructs the right progress view.
360 """373 """
361 TTYStringIO = test_progress._TTYStringIO
362 FileStringIO = tests.StringIOWrapper374 FileStringIO = tests.StringIOWrapper
363 for (file_class, term, pb, expected_pb_class) in (375 for (file_class, term, pb, expected_pb_class) in (
364 # on an xterm, either use them or not as the user requests,376 # on an xterm, either use them or not as the user requests,
@@ -391,9 +403,9 @@
391403
392 def test_text_ui_non_terminal(self):404 def test_text_ui_non_terminal(self):
393 """Even on non-ttys, make_ui_for_terminal gives a text ui."""405 """Even on non-ttys, make_ui_for_terminal gives a text ui."""
394 stdin = test_progress._NonTTYStringIO('')406 stdin = NonTTYStringIO('')
395 stderr = test_progress._NonTTYStringIO()407 stderr = NonTTYStringIO()
396 stdout = test_progress._NonTTYStringIO()408 stdout = NonTTYStringIO()
397 for term_type in ['dumb', None, 'xterm']:409 for term_type in ['dumb', None, 'xterm']:
398 self.overrideEnv('TERM', term_type)410 self.overrideEnv('TERM', term_type)
399 uif = _mod_ui.make_ui_for_terminal(stdin, stdout, stderr)411 uif = _mod_ui.make_ui_for_terminal(stdin, stdout, stderr)
400412
=== modified file 'bzrlib/ui/text.py'
--- bzrlib/ui/text.py 2011-12-18 12:46:49 +0000
+++ bzrlib/ui/text.py 2012-04-30 10:48:17 +0000
@@ -404,8 +404,13 @@
404 this only prints the stack from the nominated current task up to the root.404 this only prints the stack from the nominated current task up to the root.
405 """405 """
406406
407 def __init__(self, term_file):407 def __init__(self, term_file, encoding=None, errors="replace"):
408 self._term_file = term_file408 self._term_file = term_file
409 if encoding is None:
410 self._encoding = getattr(term_file, "encoding", None) or "ascii"
411 else:
412 self._encoding = encoding
413 self._encoding_errors = errors
409 # true when there's output on the screen we may need to clear414 # true when there's output on the screen we may need to clear
410 self._have_output = False415 self._have_output = False
411 self._last_transport_msg = ''416 self._last_transport_msg = ''
@@ -432,10 +437,12 @@
432 else:437 else:
433 return w - 1438 return w - 1
434439
435 def _show_line(self, s):440 def _show_line(self, u):
436 # sys.stderr.write("progress %r\n" % s)441 s = u.encode(self._encoding, self._encoding_errors)
437 width = self._avail_width()442 width = self._avail_width()
438 if width is not None:443 if width is not None:
444 # GZ 2012-03-28: Counting bytes is wrong for calculating width of
445 # text but better than counting codepoints.
439 s = '%-*.*s' % (width, width, s)446 s = '%-*.*s' % (width, width, s)
440 self._term_file.write('\r' + s + '\r')447 self._term_file.write('\r' + s + '\r')
441448
442449
=== modified file 'doc/en/release-notes/bzr-2.5.txt'
--- doc/en/release-notes/bzr-2.5.txt 2012-03-29 12:53:01 +0000
+++ doc/en/release-notes/bzr-2.5.txt 2012-04-30 10:48:17 +0000
@@ -36,6 +36,9 @@
36 destination rather than the proxy when checking certificates.36 destination rather than the proxy when checking certificates.
37 (Martin Packman, #944696)37 (Martin Packman, #944696)
3838
39* Fix UnicodeEncodeError when translated progress task messages contain
40 non-ascii text. (Martin Packman, #966934)
41
39* Fixed merge tool availability checking and invocation to search the42* Fixed merge tool availability checking and invocation to search the
40 Windows App Path registry in addition to the PATH. (Gordon Tyler, #939605)43 Windows App Path registry in addition to the PATH. (Gordon Tyler, #939605)
4144

Subscribers

People subscribed via source and target branches