Merge lp:~parthm/bzr/563646-commit-unicode-message into lp:bzr

Proposed by Parth Malwankar
Status: Merged
Approved by: Robert Collins
Approved revision: no longer in the source branch.
Merged at revision: 5214
Proposed branch: lp:~parthm/bzr/563646-commit-unicode-message
Merge into: lp:bzr
Diff against target: 82 lines (+43/-0)
3 files modified
NEWS (+5/-0)
bzrlib/tests/blackbox/test_commit.py (+35/-0)
bzrlib/ui/text.py (+3/-0)
To merge this branch: bzr merge lp:~parthm/bzr/563646-commit-unicode-message
Reviewer Review Type Date Requested Status
Martin Pool Approve
Martin Packman (community) Abstain
Robert Collins (community) Approve
bzr-core Pending
Review via email: mp+23752@code.launchpad.net

Commit message

better unicode handling when commit message (-m) is same as unicode filename. (parthm, #563646)

Description of the change

=== Fixes Bug #563646 ===

When a file has the same name as the commit message (-m), bzr displays a warning. When the file name is unicode, this was causing ui.text.show_warning to fail with UnicodeEncodeError.
ui.text.show_warning is updated to msg.encode in case the message is unicode. This now works as expected.
Test case is added for the above scenario.

[abc]% ~/src/bzr.dev/563646-commit-unicode-message/bzr --no-plugins ci -m €
bzr: warning: The commit message is a file name: "€".
(use --file "€" to take commit message from that file)
Committing to: /home/parthm/tmp/abc/
added d
Committed revision 7.
[abc]%

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

Great

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

Oh, the vertical white space on line 40 of your diff is a little ugly.

Revision history for this message
Parth Malwankar (parthm) wrote :

> Oh, the vertical white space on line 40 of your diff is a little ugly.

Removed blank line.

Revision history for this message
Martin Packman (gz) wrote :

What should happen if the filename is not in the console output codepage?

Running the test currently I get this error:

Traceback (most recent call last):
  File "C:\Python24\lib\site-packages\testtools\runtest.py", line 128, in _run_user
    return fn(*args)
  File "C:\Python24\lib\site-packages\testtools\testcase.py", line 368, in _run_test_method
    testMethod()
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\tests\blackbox\test_commit.py", line 118, in test_unicode_commit_message
    out, err = self.run_bzr(['commit', '-m', file_name])
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\tests\__init__.py", line 1844, in run_bzr
    working_dir=working_dir,
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\tests\__init__.py", line 1743, in _run_bzr_autosplit
    encoding=encoding, stdin=stdin, working_dir=working_dir,
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\tests\__init__.py", line 1777, in _run_bzr_core
    args)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\tests\__init__.py", line 2082, in apply_redirected
    return a_callable(*args, **kwargs)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\commands.py", line 1219, in run_bzr_catch_user_errors
    return run_bzr(argv)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\commands.py", line 1111, in run_bzr
    ret = run(*run_argv)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\commands.py", line 685, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\commands.py", line 700, in run
    return self._operation.run_simple(*args, **kwargs)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\cleanup.py", line 121, in run_simple
    return _do_with_cleanups(
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\builtins.py", line 3143, in run
    ui.ui_factory.show_warning(warning_msg)
  File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\ui\text.py", line 238, in show_warning
    msg = msg.encode(te)
UnicodeEncodeError: 'cp932' codec can't encode character u'\u20ac' in position 36: illegal multibyte sequence

review: Needs Fixing
Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, Apr 21, 2010 at 2:45 AM, Martin [gz] <email address hidden> wrote:

> Review: Needs Fixing
> What should happen if the filename is not in the console output codepage?
>

So, the filename is unicode in this change, so it will get encoded in the
console codepage.

however, the error you see is more interesting:

> ... File "C:\bzr\bzr\563646-commit-unicode-message\bzrlib\ui\text.py", line
> 238, in show_warning
> msg = msg.encode(te)
> UnicodeEncodeError: 'cp932' codec can't encode character u'\u20ac' in
> position 36: illegal multibyte sequence

My copy of text.py does not do msg.encode(te) - do you have an unmerged
patch you are using ?

Revision history for this message
Parth Malwankar (parthm) wrote :

On Tue, Apr 20, 2010 at 10:15 PM, Martin [gz] <email address hidden> wrote:
> Review: Needs Fixing
> What should happen if the filename is not in the console output codepage?
>
> Running the test currently I get this error:
>

I am able to run the tests on WinXP without any problems.
Is there any specific setting on your windows system that I need to
do to reproduce this?

I do have a msg.encode(te) in my branch.

    def show_warning(self, msg):
        self.clear_term()
        if isinstance(msg, unicode):
            te = osutils.get_terminal_encoding()
            msg = msg.encode(te)
        self.stderr.write("bzr: warning: %s\n" % msg)

Revision history for this message
Martin Packman (gz) wrote :

I'm guessing the euro sign is in your codepage. Try a character that's not in any of them, Alexander likes u"\u1234" for instance.

Revision history for this message
Parth Malwankar (parthm) wrote :

> I'm guessing the euro sign is in your codepage. Try a character that's not in
> any of them, Alexander likes u"\u1234" for instance.

I am able to reproduce this error now. We have a path that can't
be shown in terminal encoding.

My initial though is to use either "msg = msg.encode(te. 'replace')"
in show_warning instead of msg.encode(te).

Or to handle UnicodeEncodeError one level up
and show a general warning w/o the filename e.g.

Instead of:

bzr: warning: The commit message is a file name: "€".
(use --file "€" to take commit message from that file)

we show:

bzr: warning: The commit message is a file name.
(use --file to take commit message from that file)

The latter may be more acceptable rather than using
'replace'. In the case of 'replace' user will end
up with a message "... commit message is file name: "?".
(use --file "?" ...)" which may cause more confusion.

Thoughts? Any other ideas?

Revision history for this message
Parth Malwankar (parthm) wrote :

> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>

Just to be clear, what I mean is that we show
the complete message with the filename in the
usual case but in case of a UnicodeEncodeError
doing this we show the message omitting the
file name. The thing I don't like about this
option is that we would be doing special handling
one level up.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Parth Malwankar пишет:
>> I'm guessing the euro sign is in your codepage. Try a character that's not in
>> any of them, Alexander likes u"\u1234" for instance.
>
> I am able to reproduce this error now. We have a path that can't
> be shown in terminal encoding.
>
> My initial though is to use either "msg = msg.encode(te. 'replace')"
> in show_warning instead of msg.encode(te).
>
> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>
> Instead of:
>
> bzr: warning: The commit message is a file name: "€".
> (use --file "€" to take commit message from that file)
>
> we show:
>
> bzr: warning: The commit message is a file name.
> (use --file to take commit message from that file)

+1

--
All the dude wanted was his rug back

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Parth Malwankar wrote:
>> I'm guessing the euro sign is in your codepage. Try a character that's not in
>> any of them, Alexander likes u"\u1234" for instance.
>
> I am able to reproduce this error now. We have a path that can't
> be shown in terminal encoding.
>
> My initial though is to use either "msg = msg.encode(te. 'replace')"
> in show_warning instead of msg.encode(te).
>
> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>
> Instead of:
>
> bzr: warning: The commit message is a file name: "€".
> (use --file "€" to take commit message from that file)
>
> we show:
>
> bzr: warning: The commit message is a file name.
> (use --file to take commit message from that file)
>
> The latter may be more acceptable rather than using
> 'replace'. In the case of 'replace' user will end
> up with a message "... commit message is file name: "?".
> (use --file "?" ...)" which may cause more confusion.
>
>
> Thoughts? Any other ideas?
>

I prefer using 'replace'. Mostly because it is very rare to have only
unprintable characters. So you are more likely to get:

bzr: warning: The commit message is a file name: "foo?.txt".

Rather than just
bzr: warning: The commit message is a file name: "?".

And having *some* context is better than having no context.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvQWZcACgkQJdeBCYSNAAPAXwCgyw9N9LJADhwoIgTz04JHoLc+
CqIAoNMq2NO3yLf5G2S/ez76JSiSsmkZ
=GD/U
-----END PGP SIGNATURE-----

Revision history for this message
Parth Malwankar (parthm) wrote :

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Parth Malwankar wrote:
> >> I'm guessing the euro sign is in your codepage. Try a character that's not
> in
> >> any of them, Alexander likes u"\u1234" for instance.
> >
> > I am able to reproduce this error now. We have a path that can't
> > be shown in terminal encoding.
> >
> > My initial though is to use either "msg = msg.encode(te. 'replace')"
> > in show_warning instead of msg.encode(te).
> >
> > Or to handle UnicodeEncodeError one level up
> > and show a general warning w/o the filename e.g.
> >
> > Instead of:
> >
> > bzr: warning: The commit message is a file name: "€".
> > (use --file "€" to take commit message from that file)
> >
> > we show:
> >
> > bzr: warning: The commit message is a file name.
> > (use --file to take commit message from that file)
> >
> > The latter may be more acceptable rather than using
> > 'replace'. In the case of 'replace' user will end
> > up with a message "... commit message is file name: "?".
> > (use --file "?" ...)" which may cause more confusion.
> >
> >
> > Thoughts? Any other ideas?
> >
>
> I prefer using 'replace'. Mostly because it is very rare to have only
> unprintable characters. So you are more likely to get:
>
>
> bzr: warning: The commit message is a file name: "foo?.txt".
>
> Rather than just
> bzr: warning: The commit message is a file name: "?".
>
> And having *some* context is better than having no context.
>

Sounds fine to me. This also keeps the code
cleaner.

Only thing is that this is for show_warning
so this is how all warnings would behave.
That should be ok as if the characters are
replaced they can't be displayed in the terminal
encoding anyway. Thats better than show_warning
crashing.

> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkvQWZcACgkQJdeBCYSNAAPAXwCgyw9N9LJADhwoIgTz04JHoLc+
> CqIAoNMq2NO3yLf5G2S/ez76JSiSsmkZ
> =GD/U
> -----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

Test no longer fails here. I don't like sprouting apis that don't know if they should be taking bytes or unicode and try to muddle through inside the function, but I guess this function already has the problem so the change isn't making anything worse.

review: Abstain
Revision history for this message
Parth Malwankar (parthm) wrote :

@Martin Pool: There is a review request by you pending on this mp (2010-04-22). This mp is already in the "ready to land" queue. If this looks ok to you or you want to abstain, I can go ahead and land it.

Revision history for this message
Martin Pool (mbp) wrote :

That looks ok to me. It would be nice to directly test the ui handling of unicode, but you can land without it.

Thanks for all the patches!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-05-05 14:44:53 +0000
3+++ NEWS 2010-05-05 16:03:27 +0000
4@@ -91,6 +91,11 @@
5 * Unicode characters in aliases are now handled correctly and do not cause
6 UnicodeEncodeError exception. (Parth Malwankar, #529930)
7
8+* Unicode commit messages that are the same as a file name no longer cause
9+ UnicodeEncodeError. ``ui.text.show_warning`` now handles unicode
10+ messages.
11+ (Parth Malwankar, #563646)
12+
13 Improvements
14 ************
15
16
17=== modified file 'bzrlib/tests/blackbox/test_commit.py'
18--- bzrlib/tests/blackbox/test_commit.py 2010-04-07 22:22:27 +0000
19+++ bzrlib/tests/blackbox/test_commit.py 2010-05-05 16:03:27 +0000
20@@ -18,6 +18,7 @@
21 """Tests for the commit CLI of bzr."""
22
23 import os
24+import re
25 import sys
26
27 from bzrlib import (
28@@ -107,6 +108,40 @@
29 'modified hello\.txt\n'
30 'Committed revision 2\.\n$')
31
32+ def test_unicode_commit_message_is_filename(self):
33+ """Unicode commit message same as a filename (Bug #563646).
34+ """
35+ file_name = u'\N{euro sign}'
36+ self.run_bzr(['init'])
37+ open(file_name, 'w').write('hello world')
38+ self.run_bzr(['add'])
39+ out, err = self.run_bzr(['commit', '-m', file_name])
40+ reflags = re.MULTILINE|re.DOTALL|re.UNICODE
41+ te = osutils.get_terminal_encoding()
42+ self.assertContainsRe(err.decode(te),
43+ u'The commit message is a file name:',
44+ flags=reflags)
45+
46+ # Run same test with a filename that causes encode
47+ # error for the terminal encoding. We do this
48+ # by forcing terminal encoding of ascii for
49+ # osutils.get_terminal_encoding which is used
50+ # by ui.text.show_warning
51+ default_get_terminal_enc = osutils.get_terminal_encoding
52+ try:
53+ osutils.get_terminal_encoding = lambda: 'ascii'
54+ file_name = u'foo\u1234'
55+ open(file_name, 'w').write('hello world')
56+ self.run_bzr(['add'])
57+ out, err = self.run_bzr(['commit', '-m', file_name])
58+ reflags = re.MULTILINE|re.DOTALL|re.UNICODE
59+ te = osutils.get_terminal_encoding()
60+ self.assertContainsRe(err.decode(te, 'replace'),
61+ u'The commit message is a file name:',
62+ flags=reflags)
63+ finally:
64+ osutils.get_terminal_encoding = default_get_terminal_enc
65+
66 def test_warn_about_forgotten_commit_message(self):
67 """Test that the lack of -m parameter is caught"""
68 wt = self.make_branch_and_tree('.')
69
70=== modified file 'bzrlib/ui/text.py'
71--- bzrlib/ui/text.py 2010-03-25 07:34:15 +0000
72+++ bzrlib/ui/text.py 2010-05-05 16:03:27 +0000
73@@ -233,6 +233,9 @@
74
75 def show_warning(self, msg):
76 self.clear_term()
77+ if isinstance(msg, unicode):
78+ te = osutils.get_terminal_encoding()
79+ msg = msg.encode(te, 'replace')
80 self.stderr.write("bzr: warning: %s\n" % msg)
81
82 def _progress_updated(self, task):