Merge into 2.5 : 2.5_text_progress_view_unicode

Reviewer	Review Type	Date Requested	Status
Vincent Ladeuil		2012-03-28	Approve on 2012-03-29
Review via email: mp+99790@code.launchpad.net

Vincent Ladeuil (vila) wrote on 2012-03-29:

8 -from StringIO import StringIO
9 +from cStringIO import StringIO

Good, that was a bug in itself.

71 +class _TTYStringIO(StringIO):
78 +class _NonTTYStringIO(StringIO):

I don't think we want the leading '_' here. Those are helper classes (and
there is even "113 + TTYStringIO = _TTYStringIO", which seems even weirder).

142 + self._encoding = getattr(term_file, "encoding", "ascii")

Don't we want to default to utf8 instead?

159 + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
160 + # text but better than counting codepoints.

Can you just briefly add (in this comment) a description of what you think
should be done (or, even better, file a bug and just reference it there)?
Just from reading the comment, I'm not sure whether we should blame Python for
not providing an easy way to do that, or what *we* need to do there.

155 + def _show_line(self, u):

Worth a trivial docstring mentioning that 'u' means 'we expect unicode'.

review: Needs Fixing

Martin Packman (gz) wrote on 2012-03-29:

> 71      +class _TTYStringIO(StringIO):
> 78      +class _NonTTYStringIO(StringIO):
>
> I don't think we want the leading '_' here. Those are helper classes (and
> there is even "113 + TTYStringIO = _TTYStringIO" which seems even weirder.

Okay, can do, as I'm touching those lines anyway with the module move.

> 142     + self._encoding = getattr(term_file, "encoding", "ascii")
>
> Don't we want to default to utf8 instead ?

Not really. This class is only for printing to a terminal, so we don't have to be concerned with the case of redirecting to a file. If we print bytes that the terminal doesn't support, at best we get replacement characters, and generally we'll get random junk. Given the way gettext and locales work, users should never end up with translations installed but a terminal that doesn't support them.
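A minimal Python 3 sketch (not bzrlib code) of what the errors="replace" default in the diff buys: encoding to a narrow charset such as ASCII degrades unsupported characters to "?" instead of raising the UnicodeEncodeError reported in #966934. The helper name encode_for_terminal is hypothetical.

```python
def encode_for_terminal(message, encoding="ascii", errors="replace"):
    """Encode a unicode progress message for a terminal (hypothetical helper).

    With errors="replace", characters the encoding cannot represent become
    b"?" instead of raising UnicodeEncodeError.
    """
    return message.encode(encoding, errors)


# A non-ASCII message, such as a translated progress string.
msg = "\u00a7 0/1"
print(encode_for_terminal(msg))           # b'? 0/1'
print(encode_for_terminal(msg, "utf-8"))  # b'\xc2\xa7 0/1'

try:
    msg.encode("ascii")  # the default error handler is "strict"
except UnicodeEncodeError as e:
    print("strict encoding fails:", e.reason)
```

The trade-off matches the reasoning above: on a terminal that cannot show the character, a "?" is no worse than the junk the raw bytes would produce, and it cannot crash the command.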

> 159     + # GZ 2012-03-28: Counting bytes is wrong for calculating width of
> 160     + # text but better than counting codepoints.
>
> Can you just briefly add (in this comment) a description of what you think
> should be done (or even better file a bug and just reference it there) ?
> Just from reading the comment, I'm not sure if we should blame python for
> not providing an easy way to do that or what *we* need to do there.

It's ridiculously complicated.

On my SJIS terminal things are reasonably simple. Characters that use a single byte, that is ASCII and the halfwidth katakana, are one cell wide. Double-byte characters are two cells wide, including Greek and Cyrillic, which is only appropriate in an ideographic context (as opposed to native use, or use mixed with Latin text).
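Under the rule just described, the encoded byte length on a Shift JIS terminal is the cell count. A small Python 3 sketch using the standard shift_jis codec; sjis_cells is a hypothetical helper name, and the "bytes equal cells" rule is taken from the description above rather than measured on a terminal.

```python
def sjis_cells(text):
    """Cells used on a Shift JIS terminal (hypothetical helper).

    Single-byte characters (ASCII, halfwidth katakana) take one cell and
    double-byte characters take two, so the encoded length is the width.
    """
    return len(text.encode("shift_jis"))


# ASCII, halfwidth katakana, Greek alpha, Cyrillic zhe, CJK ideograph
for text in ["abc", "\uff71", "\u03b1", "\u0436", "\u6f22"]:
    print(repr(text), sjis_cells(text))
```

This prints widths of 3, 1, 2, 2 and 2: the Greek and Cyrillic letters come out double width, exactly as on the SJIS terminal.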

With terminals using UTF-8, things are vastly more complex, as the display width of a character is not the number of bytes used to encode it, but instead depends on which Unicode features the terminal implements and how. For instance, a Linux console supports up to 512 glyphs, which as configured here means it can display the CP437 characters, and any other codepoint (decoded from UTF-8) appears as a diamond. An xterm is much more capable, and will display most things provided the relevant fonts are installed.
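The "counting bytes is wrong ... but better than counting codepoints" comment in _show_line can be illustrated with a Python 3 sketch comparing code points, UTF-8 bytes and the column count expected on a capable terminal such as xterm. The expected columns are written in by hand here (an assumption, not a measurement). For these samples the byte count never falls below the column count, while the code point count does for the wide ideograph.

```python
# (text, expected display columns on a terminal with suitable fonts;
#  hand-written assumption for a non-CJK locale)
samples = [
    ("abc", 3),     # ASCII: one byte and one column per character
    ("\u00a7", 1),  # SECTION SIGN: two UTF-8 bytes, one column
    ("\u6f22", 2),  # CJK ideograph: three UTF-8 bytes, two columns
]

for text, columns in samples:
    codepoints = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    print(repr(text), codepoints, utf8_bytes, columns)
    # Over-counting (bytes) makes the padded line end short; under-counting
    # (code points, for the ideograph) would overflow the line and wrap.
    assert utf8_bytes >= columns
```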

POSIX provides the wcwidth and wcswidth functions, which aim to solve the problem of knowing how many columns are needed to display a character or string, but like all things C and locale related, they're not all that great. One of the issues is the gap between what a terminal can actually display and how a string could in theory be displayed. For example, wcswidth gives a width of 1 for a decomposed string of a character plus a combining diacritic, but a basic terminal will still show that as a width-2 sequence: the character followed by a missing-glyph marker.
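The decomposed-string case can be reproduced with a table-based estimate in the spirit of wcswidth, built from Python 3's standard unicodedata module. This is an approximation, not the POSIX function, and table_width is a hypothetical name: combining marks count as 0, East Asian Wide and Fullwidth characters as 2, and everything else as 1.

```python
import unicodedata


def table_width(text):
    """Approximate display columns from Unicode tables (not POSIX wcswidth)."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks overlay the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(len(decomposed), table_width(decomposed))  # 2 1
# A basic terminal without combining support may still use 2 cells here:
# the 'e' plus a missing-glyph marker. NFC composes it into one code point.
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True
```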

What I've done in the past is use the Unicode tables plus some knowledge about the platform to get the basic cases right, then fudge things so that incorrect outcomes fall on the less-bad side. So, when trying to blank a whole line, under-measure, so that mistakes result in some junk left at the end of the line rather than wrapping to a new line.
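One way to apply that "less-bad side" rule when padding a progress line to the terminal width, as a Python 3 sketch (fit_line and cautious_width are hypothetical names, not the code in this merge): over-estimate each character's width, counting East Asian Ambiguous characters such as Greek letters as 2 like the SJIS terminal above, so that any error leaves the line short rather than overflowing and wrapping.

```python
import unicodedata


def cautious_width(ch):
    """Per-character width estimate that errs on the wide side."""
    if unicodedata.combining(ch):
        return 0
    # Ambiguous ("A") characters are 1 column in most Western locales but
    # 2 in CJK ones; assume 2 so the line can only come out too short.
    if unicodedata.east_asian_width(ch) in ("W", "F", "A"):
        return 2
    return 1


def fit_line(text, width):
    """Truncate and pad text so its estimated width is exactly `width`."""
    kept, used = [], 0
    for ch in text:
        w = cautious_width(ch)
        if used + w > width:
            break
        kept.append(ch)
        used += w
    return "".join(kept) + " " * (width - used)


print(repr(fit_line("\u6f22\u6f22\u6f22", 5)))  # '漢漢 '
print(repr(fit_line("\u03b1b", 4)))              # 'αb '
```

On a Western-locale terminal the Greek alpha occupies one column, so the second line renders one column short of the full width: at worst a leftover character remains at the end of the line, but nothing wraps.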

I just found a recent Python bug, aimed at exposing the POSIX function, which covers a few of these points as well:

<http://bugs.python.org/issue12568>

> 155     + def _show_line(self, u):
>
> Worth a trivial docstring mentioning 'u' means: 'We expect unicode'

As it's a private helper, I'll probably just document in the public class interfaces that unicode should be passed.

Vincent Ladeuil (vila) wrote on 2012-03-29:

Thanks a lot for the explanations, I really appreciate it.

I'm fine with the tweaks you propose.

review: Approve

Martin Packman (gz) wrote on 2012-04-30:

sent to pqm by email

Martin Packman (gz) wrote on 2012-04-30:

failure: bzrlib.tests.blackbox.test_debug.TestDebugBytes.test_bytes_reports_activity

Traceback (most recent call last):
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 920, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 1131, in run_bzr
    ret = run(*run_argv)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/commands.py", line 695, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/builtins.py", line 1475, in run
    source_branch=br_from)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 366, in sprout
    create_tree_if_local=create_tree_if_local)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 132, in run
    self.cleanups, self.func, self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/bzrdir.py", line 416, in _sprout
    result_repo.fetch(source_repository, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 1267, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/decorators.py", line 218, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/vf_repository.py", line 2584, in fetch
    find_ghosts=find_ghosts)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 77, in __init__
    self.__fetch()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/fetch.py", line 98, in __fetch
    pb.update(gettext("Finding revisions"), 0, 2)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/progress.py", line 122, in update
    self.ui_factory._progress_updated(self)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 374, in _progress_updated
    self._progress_view.show_progress(task)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 561, in show_progress
    self._repaint()
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 543, in _repaint
    self._show_line(s)
  File "/home/pqm/pqm-workdir/srv/2.5/bzrlib/ui/text.py", line 441, in _show_line
    s = u.encode(self._encoding, self._encoding_errors)
TypeError: encode() argument 1 must be string, not None

Martin Packman (gz) wrote on 2012-04-30:

A pretty simple error in how the fallback encoding is specified: real file objects tend to have an encoding attribute set to None, rather than no encoding attribute at all as with cStringIO.StringIO objects. Fixed, and added some tests, as we don't want to rely on blackbox tests in a subprocess to catch such problems.
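The difference between a missing attribute and one set to None can be shown with a Python 3 sketch. The two stand-in classes and both lookup functions are hypothetical; original_lookup mirrors the first revision's getattr(term_file, "encoding", "ascii"), and fixed_lookup mirrors the merged TextProgressView.__init__.

```python
class NoEncodingAttribute(object):
    """Stand-in for cStringIO.StringIO: has no `encoding` attribute."""


class EncodingIsNone(object):
    """Stand-in for a real file object whose `encoding` is None."""
    encoding = None


def original_lookup(term_file):
    # The default only applies when the attribute is missing, not when None.
    return getattr(term_file, "encoding", "ascii")


def fixed_lookup(term_file, encoding=None):
    # Treat an explicit None the same as a missing attribute.
    if encoding is None:
        return getattr(term_file, "encoding", None) or "ascii"
    return encoding


print(original_lookup(NoEncodingAttribute()))  # ascii
print(original_lookup(EncodingIsNone()))       # None, so encode() later fails
print(fixed_lookup(EncodingIsNone()))          # ascii
print("\u00a7".encode(fixed_lookup(EncodingIsNone()), "replace"))  # b'?'
```

Passing None to str.encode raises TypeError, which is the failure in the PQM traceback above.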

Martin Packman (gz) wrote on 2012-04-30:

sent to pqm by email

Merge lp:~gz/bzr/2.5_text_progress_view_unicode_966934 into lp:bzr/2.5

Preview Diff

 === modified file 'bzrlib/progress.py'
 --- bzrlib/progress.py	2011-12-19 13:23:58 +0000
 +++ bzrlib/progress.py	2012-04-30 10:48:17 +0000
@@ -58,7 +58,9 @@
      Code updating the task may also set fields as hints about how to display
      it: show_pct, show_spinner, show_eta, show_count, show_bar.  UIs
      will not necessarily respect all these fields.
--
++
++    The message given when updating a task must be unicode, not bytes.
++
      :ivar update_latency: The interval (in seconds) at which the PB should be
          updated.  Setting this to zero suggests every update should be shown
          synchronously.
@@ -106,6 +108,10 @@
              self.msg)
      def update(self, msg, current_cnt=None, total_cnt=None):
++        """Report updated task message and if relevant progress counters
++
++        The message given must be unicode, not a byte string.
++        """
          self.msg = msg
          self.current_cnt = current_cnt
          if total_cnt:
 === modified file 'bzrlib/tests/test_progress.py'
 --- bzrlib/tests/test_progress.py	2011-01-12 01:01:53 +0000
 +++ bzrlib/tests/test_progress.py	2012-04-30 10:48:17 +0000
@@ -15,32 +15,20 @@
  # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
--from StringIO import StringIO
++from cStringIO import StringIO
++from bzrlib import (
++    tests,
++    )
  from bzrlib.progress import (
      ProgressTask,
+     )
--from bzrlib.tests import TestCase
  from bzrlib.ui.text import (
      TextProgressView,
+     )
--class _TTYStringIO(StringIO):
--    """A helper class which makes a StringIO look like a terminal"""
--
--    def isatty(self):
--        return True
--
--
--class _NonTTYStringIO(StringIO):
--    """Helper that implements isatty() but returns False"""
--
--    def isatty(self):
--        return False
--
--
--class TestTextProgressView(TestCase):
++class TestTextProgressView(tests.TestCase):
      """Tests for text display of progress bars.
      These try to exercise the progressview independently of its construction,
@@ -49,13 +37,16 @@
      # The ProgressTask now connects directly to the ProgressView, so we can
      # check them independently of the factory or of the determination of what
      # view to use.
--
++
++    def make_view_only(self, out, width=79):
++        view = TextProgressView(out)
++        view._avail_width = lambda: width
++        return view
++
      def make_view(self):
          out = StringIO()
--        view = TextProgressView(out)
--        view._avail_width = lambda: 79
--        return out, view
--
++        return out, self.make_view_only(out)
++
      def make_task(self, parent_task, view, msg, curr, total):
          # would normally be done by UIFactory; is done here so that we don't
          # have to have one.
@@ -168,3 +159,30 @@
  '   123kB   100kB/s \\ start_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.. 2000/5000',
             line)
          self.assertEqual(len(line), 79)
++
++    def test_render_progress_unicode_enc_utf8(self):
++        out = tests.StringIOWrapper()
++        out.encoding = "utf-8"
++        view = self.make_view_only(out, 20)
++        task = self.make_task(None, view, u"\xa7", 0, 1)
++        view.show_progress(task)
++        self.assertEqual('\r/ \xc2\xa7 0/1            \r',
++            out.getvalue())
++
++    def test_render_progress_unicode_enc_missing(self):
++        out = StringIO()
++        self.assertRaises(AttributeError, getattr, out, "encoding")
++        view = self.make_view_only(out, 20)
++        task = self.make_task(None, view, u"\xa7", 0, 1)
++        view.show_progress(task)
++        self.assertEqual('\r/ ? 0/1             \r',
++            out.getvalue())
++
++    def test_render_progress_unicode_enc_none(self):
++        out = tests.StringIOWrapper()
++        out.encoding = None
++        view = self.make_view_only(out, 20)
++        task = self.make_task(None, view, u"\xa7", 0, 1)
++        view.show_progress(task)
++        self.assertEqual('\r/ ? 0/1             \r',
++            out.getvalue())
 === modified file 'bzrlib/tests/test_ui.py'
 --- bzrlib/tests/test_ui.py	2011-10-08 19:01:59 +0000
 +++ bzrlib/tests/test_ui.py	2012-04-30 10:48:17 +0000
@@ -31,7 +31,6 @@
+     )
  from bzrlib.tests import (
      fixtures,
--    test_progress,
+     )
  from bzrlib.ui import text as _mod_ui_text
  from bzrlib.tests.testui import (
@@ -39,6 +38,20 @@
+     )
++class TTYStringIO(StringIO):
++    """A helper class which makes a StringIO look like a terminal"""
++
++    def isatty(self):
++        return True
++
++
++class NonTTYStringIO(StringIO):
++    """Helper that implements isatty() but returns False"""
++
++    def isatty(self):
++        return False
++
++
  class TestUIConfiguration(tests.TestCaseWithTransport):
      def test_output_encoding_configuration(self):
@@ -221,7 +234,7 @@
      def test_text_factory_prompts_and_clears(self):
          # a get_boolean call should clear the pb before prompting
--        out = test_progress._TTYStringIO()
++        out = TTYStringIO()
          self.overrideEnv('TERM', 'xterm')
          factory = _mod_ui_text.TextUIFactory(
              stdin=tests.StringIOWrapper("yada\ny\n"),
@@ -292,8 +305,8 @@
      def test_quietness(self):
          self.overrideEnv('BZR_PROGRESS_BAR', 'text')
          ui_factory = _mod_ui_text.TextUIFactory(None,
--            test_progress._TTYStringIO(),
--            test_progress._TTYStringIO())
++            TTYStringIO(),
++            TTYStringIO())
          self.assertIsInstance(ui_factory._progress_view,
              _mod_ui_text.TextProgressView)
          ui_factory.be_quiet(True)
@@ -358,7 +371,6 @@
      def test_progress_construction(self):
          """TextUIFactory constructs the right progress view.
          """
--        TTYStringIO = test_progress._TTYStringIO
          FileStringIO = tests.StringIOWrapper
          for (file_class, term, pb, expected_pb_class) in (
              # on an xterm, either use them or not as the user requests,
@@ -391,9 +403,9 @@
      def test_text_ui_non_terminal(self):
          """Even on non-ttys, make_ui_for_terminal gives a text ui."""
--        stdin = test_progress._NonTTYStringIO('')
--        stderr = test_progress._NonTTYStringIO()
--        stdout = test_progress._NonTTYStringIO()
++        stdin = NonTTYStringIO('')
++        stderr = NonTTYStringIO()
++        stdout = NonTTYStringIO()
          for term_type in ['dumb', None, 'xterm']:
              self.overrideEnv('TERM', term_type)
              uif = _mod_ui.make_ui_for_terminal(stdin, stdout, stderr)
 === modified file 'bzrlib/ui/text.py'
 --- bzrlib/ui/text.py	2011-12-18 12:46:49 +0000
 +++ bzrlib/ui/text.py	2012-04-30 10:48:17 +0000
@@ -404,8 +404,13 @@
      this only prints the stack from the nominated current task up to the root.
      """
--    def __init__(self, term_file):
++    def __init__(self, term_file, encoding=None, errors="replace"):
          self._term_file = term_file
++        if encoding is None:
++            self._encoding = getattr(term_file, "encoding", None) or "ascii"
++        else:
++            self._encoding = encoding
++        self._encoding_errors = errors
          # true when there's output on the screen we may need to clear
          self._have_output = False
          self._last_transport_msg = ''
@@ -432,10 +437,12 @@
          else:
              return w - 1
--    def _show_line(self, s):
--        # sys.stderr.write("progress %r\n" % s)
++    def _show_line(self, u):
++        s = u.encode(self._encoding, self._encoding_errors)
          width = self._avail_width()
          if width is not None:
++            # GZ 2012-03-28: Counting bytes is wrong for calculating width of
++            #                text but better than counting codepoints.
              s = '%-*.*s' % (width, width, s)
          self._term_file.write('\r' + s + '\r')
 === modified file 'doc/en/release-notes/bzr-2.5.txt'
 --- doc/en/release-notes/bzr-2.5.txt	2012-03-29 12:53:01 +0000
 +++ doc/en/release-notes/bzr-2.5.txt	2012-04-30 10:48:17 +0000
@@ -36,6 +36,9 @@
    destination rather than the proxy when checking certificates.
    (Martin Packman, #944696)
++* Fix UnicodeEncodeError when translated progress task messages contain
++  non-ascii text. (Martin Packman, #966934)
++
  * Fixed merge tool availability checking and invocation to search the
    Windows App Path registry in addition to the PATH. (Gordon Tyler, #939605)
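The expected strings in the new test_render_progress_unicode_enc_* tests follow from _show_line encoding first and then padding by bytes with '%-*.*s'. A Python 3 sketch that reproduces that byte padding; pad_encoded is a hypothetical helper, and the task line "/ \u00a7 0/1" is read off the test expectations.

```python
def pad_encoded(line, encoding, width):
    """Encode, then truncate/pad to `width` bytes, like '%-*.*s' on bytes."""
    s = line.encode(encoding, "replace")
    return s[:width].ljust(width)  # bytes.ljust pads with ASCII spaces


line = "/ \u00a7 0/1"
utf8 = pad_encoded(line, "utf-8", 20)
ascii_only = pad_encoded(line, "ascii", 20)
print(utf8)
print(ascii_only)
# Both are 20 bytes, but the UTF-8 line shows only 19 columns on a
# terminal that draws the section sign in one cell.
print(len(utf8), len(utf8.decode("utf-8")))  # 20 19
# Truncating by bytes can also cut a multi-byte sequence in half.
print(pad_encoded("\u00a7" * 3, "utf-8", 3))  # b'\xc2\xa7\xc2'
```

The one-column shortfall is the "less-bad" outcome discussed in the review: the line ends slightly early instead of wrapping.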