Merge lp:~parthm/bzr/563646-commit-unicode-message into lp:bzr
| Status: | Merged |
|---|---|
| Approved by: | Robert Collins on 2010-05-05 |
| Approved revision: | 5176 |
| Merged at revision: | 5214 |
| Proposed branch: | lp:~parthm/bzr/563646-commit-unicode-message |
| Merge into: | lp:bzr |
| Diff against target: |
82 lines (+43/-0) 3 files modified
NEWS (+5/-0) bzrlib/tests/blackbox/test_commit.py (+35/-0) bzrlib/ui/text.py (+3/-0) |
| To merge this branch: | bzr merge lp:~parthm/bzr/563646-commit-unicode-message |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Martin Pool | 2010-04-22 | Approve on 2010-05-05 | |
| Martin Packman (community) | Abstain on 2010-04-27 | ||
| Robert Collins (community) | 2010-04-20 | Approve on 2010-04-20 | |
| bzr-core | 2010-05-04 | Pending | |
|
Review via email:
|
|||
Commit Message
better unicode handling when commit message (-m) is same as unicode filename. (parthm, #563646)
Description of the Change
=== Fixes Bug #563646 ===
When a file has the same name as the commit message (-m), bzr displays a warning. When the file name is unicode, this was causing ui.text.
ui.text.
Test case is added for the above scenario.
[abc]% ~/src/bzr.
bzr: warning: The commit message is a file name: "€".
(use --file "€" to take commit message from that file)
Committing to: /home/parthm/
added d
Committed revision 7.
[abc]%
| Robert Collins (lifeless) wrote : | # |
Oh, the vertical white space on line 40 of your diff is a little ugly.
| Parth Malwankar (parthm) wrote : | # |
> Oh, the vertical white space on line 40 of your diff is a little ugly.
Removed blank line.
| Martin Packman (gz) wrote : | # |
What should happen if the filename is not in the console output codepage?
Running the test currently I get this error:
Traceback (most recent call last):
File "C:\Python24\
return fn(*args)
File "C:\Python24\
testMethod()
File "C:\bzr\
out, err = self.run_
File "C:\bzr\
working_
File "C:\bzr\
encoding=
File "C:\bzr\
args)
File "C:\bzr\
return a_callable(*args, **kwargs)
File "C:\bzr\
return run_bzr(argv)
File "C:\bzr\
ret = run(*run_argv)
File "C:\bzr\
return self.run(
File "C:\bzr\
return self._operation
File "C:\bzr\
return _do_with_cleanups(
File "C:\bzr\
result = func(*args, **kwargs)
File "C:\bzr\
ui.
File "C:\bzr\
msg = msg.encode(te)
UnicodeEncodeError: 'cp932' codec can't encode character u'\u20ac' in position 36: illegal multibyte sequence
| Robert Collins (lifeless) wrote : | # |
On Wed, Apr 21, 2010 at 2:45 AM, Martin [gz] <email address hidden> wrote:
> Review: Needs Fixing
> What should happen if the filename is not in the console output codepage?
>
So, the filename is unicode in this change, so it will get encoded in the
console codepage.
however, the error you see is more interesting:
> ... File "C:\bzr\
> 238, in show_warning
> msg = msg.encode(te)
> UnicodeEncodeError: 'cp932' codec can't encode character u'\u20ac' in
> position 36: illegal multibyte sequence
My copy of text.py does not do msg.encode(te) - do you have an unmerged
patch you are using ?
| Parth Malwankar (parthm) wrote : | # |
On Tue, Apr 20, 2010 at 10:15 PM, Martin [gz] <email address hidden> wrote:
> Review: Needs Fixing
> What should happen if the filename is not in the console output codepage?
>
> Running the test currently I get this error:
>
I am able to run the tests on WinXP without any problems.
Is there any specific setting on your windows system that I need to
do to reproduce this?
I do have a msg.encode(te) in my branch.
def show_warning(self, msg):
if isinstance(msg, unicode):
te = osutils.
msg = msg.encode(te)
| Martin Packman (gz) wrote : | # |
I'm guessing the euro sign is in your codepage. Try a character that's not in any of them, Alexander likes u"\u1234" for instance.
| Parth Malwankar (parthm) wrote : | # |
> I'm guessing the euro sign is in your codepage. Try a character that's not in
> any of them, Alexander likes u"\u1234" for instance.
I am able to reproduce this error now. We have a path that can't
be shown in terminal encoding.
My initial though is to use either "msg = msg.encode(te. 'replace')"
in show_warning instead of msg.encode(te).
Or to handle UnicodeEncodeError one level up
and show a general warning w/o the filename e.g.
Instead of:
bzr: warning: The commit message is a file name: "€".
(use --file "€" to take commit message from that file)
we show:
bzr: warning: The commit message is a file name.
(use --file to take commit message from that file)
The latter may be more acceptable rather than using
'replace'. In the case of 'replace' user will end
up with a message "... commit message is file name: "?".
(use --file "?" ...)" which may cause more confusion.
Thoughts? Any other ideas?
| Parth Malwankar (parthm) wrote : | # |
> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>
Just to be clear, what I mean is that we show
the complete message with the filename in the
usual case but in case of a UnicodeEncodeError
doing this we show the message omitting the
file name. The thing I don't like about this
option is that we would be doing special handling
one level up.
| Alexander Belchenko (bialix) wrote : | # |
Parth Malwankar пишет:
>> I'm guessing the euro sign is in your codepage. Try a character that's not in
>> any of them, Alexander likes u"\u1234" for instance.
>
> I am able to reproduce this error now. We have a path that can't
> be shown in terminal encoding.
>
> My initial though is to use either "msg = msg.encode(te. 'replace')"
> in show_warning instead of msg.encode(te).
>
> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>
> Instead of:
>
> bzr: warning: The commit message is a file name: "€".
> (use --file "€" to take commit message from that file)
>
> we show:
>
> bzr: warning: The commit message is a file name.
> (use --file to take commit message from that file)
+1
--
All the dude wanted was his rug back
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Parth Malwankar wrote:
>> I'm guessing the euro sign is in your codepage. Try a character that's not in
>> any of them, Alexander likes u"\u1234" for instance.
>
> I am able to reproduce this error now. We have a path that can't
> be shown in terminal encoding.
>
> My initial though is to use either "msg = msg.encode(te. 'replace')"
> in show_warning instead of msg.encode(te).
>
> Or to handle UnicodeEncodeError one level up
> and show a general warning w/o the filename e.g.
>
> Instead of:
>
> bzr: warning: The commit message is a file name: "€".
> (use --file "€" to take commit message from that file)
>
> we show:
>
> bzr: warning: The commit message is a file name.
> (use --file to take commit message from that file)
>
> The latter may be more acceptable rather than using
> 'replace'. In the case of 'replace' user will end
> up with a message "... commit message is file name: "?".
> (use --file "?" ...)" which may cause more confusion.
>
>
> Thoughts? Any other ideas?
>
I prefer using 'replace'. Mostly because it is very rare to have only
unprintable characters. So you are more likely to get:
bzr: warning: The commit message is a file name: "foo?.txt".
Rather than just
bzr: warning: The commit message is a file name: "?".
And having *some* context is better than having no context.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
CqIAoNMq2NO3yLf
=GD/U
-----END PGP SIGNATURE-----
| Parth Malwankar (parthm) wrote : | # |
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Parth Malwankar wrote:
> >> I'm guessing the euro sign is in your codepage. Try a character that's not
> in
> >> any of them, Alexander likes u"\u1234" for instance.
> >
> > I am able to reproduce this error now. We have a path that can't
> > be shown in terminal encoding.
> >
> > My initial though is to use either "msg = msg.encode(te. 'replace')"
> > in show_warning instead of msg.encode(te).
> >
> > Or to handle UnicodeEncodeError one level up
> > and show a general warning w/o the filename e.g.
> >
> > Instead of:
> >
> > bzr: warning: The commit message is a file name: "€".
> > (use --file "€" to take commit message from that file)
> >
> > we show:
> >
> > bzr: warning: The commit message is a file name.
> > (use --file to take commit message from that file)
> >
> > The latter may be more acceptable rather than using
> > 'replace'. In the case of 'replace' user will end
> > up with a message "... commit message is file name: "?".
> > (use --file "?" ...)" which may cause more confusion.
> >
> >
> > Thoughts? Any other ideas?
> >
>
> I prefer using 'replace'. Mostly because it is very rare to have only
> unprintable characters. So you are more likely to get:
>
>
> bzr: warning: The commit message is a file name: "foo?.txt".
>
> Rather than just
> bzr: warning: The commit message is a file name: "?".
>
> And having *some* context is better than having no context.
>
Sounds fine to me. This also keeps the code
cleaner.
Only thing is that this is for show_warning
so this is how all warnings would behave.
That should be ok as if the characters are
replaced they can't be displayed in the terminal
encoding anyway. Thats better than show_warning
crashing.
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://
>
> iEYEARECAAYFAkv
> CqIAoNMq2NO3yLf
> =GD/U
> -----END PGP SIGNATURE-----
- 5174. By Parth Malwankar on 2010-04-23
-
merged in trunk.
- 5175. By Parth Malwankar on 2010-04-23
-
moved NEWS entry to 2.2b3
| Martin Packman (gz) wrote : | # |
Test no longer fails here. I don't like sprouting apis that don't know if they should be taking bytes or unicode and try to muddle through inside the function, but I guess this function already has the problem so the change isn't making anything worse.
- 5176. By Parth Malwankar on 2010-04-29
-
merged in changes from trunk and resolved conflict in NEWS
| Parth Malwankar (parthm) wrote : | # |
@Martin Pool: There is a review request by you pending on this mp (2010-04-22). This mp is already in the "ready to land" queue. If this looks ok to you or you want to abstain, I can go ahead and land it.
| Martin Pool (mbp) wrote : | # |
That looks ok to me. It would be nice to directly test the ui handling of unicode, but you can land without it.
Thanks for all the patches!
- 5177. By Parth Malwankar on 2010-05-05
-
merged in changes from trunk and resolved NEWS conflicts.

Great