Code review comment for lp:~songofacandy/bzr/i18n-utextwrap

Martin Packman (gz) wrote :

Thanks for following up on this!

> Could you tell me what is the problem of `unicode.expandtabs`?
> I haven't heard about that.

It gets the tabulation wrong in this context, eg:

>>> s = u"English\ten\n日本語\tja"
>>> print s
English en
日本語 ja
>>> print s.expandtabs()
English en
日本語 ja

The help text we're dealing with doesn't use tabs for tables, so just noting the `expand_tabs` param may give surprising results is fine.

> > + if isinstance(s, str):
> > + return len(s)
> > + assert isinstance(s, unicode)
...
>
> I've removed `if isinstance(...`.

One thing I forgot to mention is that we can't use `assert` either, try `bzr selftest -s bt.test_source`, in these two cases just removing that statement should be fine.

> This private method is tested by test_textwrap and it passes byte string.
> `unicode(text)` uses default encoding that is 'ascii' on most platform and
> raises UnicodeDecodeError when we passes non-ascii string.

This is fine then.

> I think we can use i18n for it. For example, i18n module may be able to
> provide a
> function like:
>
> def get_ambiguous_width():
> if _installed_language in ('cn', 'ja', 'kr'):
> return 2
> else:
> return 1
>
> For now, I've stopped using utextwrap in bzrlib.help module.

That's the kind of thing I was thinking of, I'm not sure if gettext really makes it easy to do this. Passing the actual language to UTextWrapper would also allow it do vary the wrapping behaviour in other language-specific contexts. This isn't something that needs doing in this merge proposal.

> Splitting between combined character or surrogate pair may cause a problem.
> I'll try to make more clever line breaking if the problem happens.
> But if the problem happen, users can show English help (we shoud provide
> that way) so nothing goes worse than now.

Yes, this was just some extra thoughts and not any changes that needed doing now.

« Back to merge proposal