Merge lp:~gz/bzr/setlocale_on_posix_only_631350 into lp:bzr/2.2

Proposed by Martin Packman
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 5084
Proposed branch: lp:~gz/bzr/setlocale_on_posix_only_631350
Merge into: lp:bzr/2.2
Diff against target: 54 lines (+15/-11)
2 files modified
NEWS (+5/-0)
bzr (+10/-11)
To merge this branch: bzr merge lp:~gz/bzr/setlocale_on_posix_only_631350
Reviewer Review Type Date Requested Status
Alexander Belchenko Approve
John A Meinel Approve
Review via email: mp+34743@code.launchpad.net

Commit message

Use setlocale only on posix systems which avoids console output breakage on windows with Python 2.6+

Description of the change

This branch fixes a serious and highly confusing issue with non-ascii terminal output Alexander hit using the bzr form the 2.2 windows installer. The immediate cause is the upgrade there to Python 2.6, but the underlying issue is a behaviour change involving setlocale the newer windows C runtime used.

The C locale needs to care about encodings because it determines the behaviour of various functions that deal in bytes, mostly in output of localised text like day names in strftime. One of the categories also deals with character types and case conversion so deals with bytes input, a poor (or old) man's unicodedata.

The problem appears to have arisen from some VS coder thinking it would be a good idea to use this locale setting to decode any bytes written to the (unicode) console, rather than following the previous practice of using the console output encoding to determine what they mean. Piping is a bytes->bytes operation and is not affected.

As there is little benefit to trying to use what little localisation the C language provides outside it's native land, and great scope for breakage, this changes the startup script to only call setlocale on posix systems. The Python folks have been working around locale support for years, see for instance <http://bugs.python.org/issue774665> so there's not actually much left affected by it. However, as LC_CTYPE is the only problem category here, an alternative would be to still set everything else on all platforms. But then we'd really want to record the codepage we get back from LC_TIME so we know how to decode what we get back from strftime and so on.

I've got some ideas on how to test this, the problem is it involves writing to the console, which is generally a shared resource between the test runner and the test framework. It's possible to create a new console to doodle in, but flashing up new windows in testing isn't great either.

To post a comment you must log in.
Revision history for this message
Alexander Belchenko (bialix) wrote :

+1e6

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/7/2010 7:23 AM, Martin [gz] wrote:
> Martin [gz] has proposed merging lp:~gz/bzr/setlocale_on_posix_only_631350 into lp:bzr/2.2.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #631350 Python 2.6 @ win32: print non-ascii characters to console produce borked output
> https://bugs.launchpad.net/bugs/631350
>
>
> This branch fixes a serious and highly confusing issue with non-ascii terminal output Alexander hit using the bzr form the 2.2 windows installer. The immediate cause is the upgrade there to Python 2.6, but the underlying issue is a behaviour change involving setlocale the newer windows C runtime used.
>
> The C locale needs to care about encodings because it determines the behaviour of various functions that deal in bytes, mostly in output of localised text like day names in strftime. One of the categories also deals with character types and case conversion so deals with bytes input, a poor (or old) man's unicodedata.
>
> The problem appears to have arisen from some VS coder thinking it would be a good idea to use this locale setting to decode any bytes written to the (unicode) console, rather than following the previous practice of using the console output encoding to determine what they mean. Piping is a bytes->bytes operation and is not affected.
>
> As there is little benefit to trying to use what little localisation the C language provides outside it's native land, and great scope for breakage, this changes the startup script to only call setlocale on posix systems. The Python folks have been working around locale support for years, see for instance <http://bugs.python.org/issue774665> so there's not actually much left affected by it. However, as LC_CTYPE is the only problem category here, an alternative would be to still set everything else on all platforms. But then we'd really want to record the codepage we get back from LC_TIME so we know how to decode what we get back from strftime and so on.
>
> I've got some ideas on how to test this, the problem is it involves writing to the console, which is generally a shared resource between the test runner and the test framework. It's possible to create a new console to doodle in, but flashing up new windows in testing isn't great either.

This seems fine to me, but:

1) Definitely needs a NEWS entry
2) Why did you move the location of the call?

John
=:->

 merge: approve
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkyGgIAACgkQJdeBCYSNAAOKXgCgyBoJRVl3JeFIe5swO8cLfQO9
BPEAoI65uDWHIlrt5P8saIzuMsP2JxzP
=dnwQ
-----END PGP SIGNATURE-----

review: Approve
Revision history for this message
Gary van der Merwe (garyvdm) wrote :
Download full text (3.3 KiB)

I built a installer based on this branch: http://dl.dropbox.com/u/4494367/bzr-2.2.0-setup.exe

I was not able to go through the test case given in the bug, as I was not able to paste the unicode string into the console. So I created a branch on linux, and pulled it. The results were not good.

C:\Users\Gary\test>bzr pull sftp://garyvdm@192.168.1.152/home/garyvdm/test_unicode
Connected (version 2.0, client OpenSSH_5.3p1)
Authentication (publickey) failed.
SSH garyvdm@192.168.1.152 password:
Authentication (password) successful!
Secsh channel 1 opened.
[chan 1] Opened sftp connection (server version 3)
+N ????/
All changes applied successfully.
Now on revision 1.

C:\Users\Gary\test>bzr log
------------------------------------------------------------
revno: 1
committer: Gary van der Merwe <email address hidden>
branch nick: test_unicode
timestamp: Tue 2010-09-07 21:05:39 +0200
message:
  ????

C:\Users\Gary\test>bzr ls
bzr: ERROR: exceptions.UnicodeEncodeError: 'charmap' codec can't encode characte
rs in position 0-3: character maps to <undefined>

Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 912, in exception_to_return_code
  File "bzrlib\commands.pyo", line 1112, in run_bzr
  File "bzrlib\commands.pyo", line 690, in run_argv_aliases
  File "bzrlib\commands.pyo", line 705, in run
  File "bzrlib\cleanup.pyo", line 135, in run_simple
  File "bzrlib\cleanup.pyo", line 165, in _do_with_cleanups
  File "bzrlib\commands.pyo", line 1127, in ignore_pipe
  File "bzrlib\builtins.pyo", line 2613, in run
  File "bzrlib\ui\text.pyo", line 538, in write
  File "codecs.pyo", line 351, in write
  File "encodings\cp437.pyo", line 12, in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: cha
racter maps to <undefined>

bzr 2.2.0 on python 2.6.4 (Windows-post2008Server-6.1.7600)
arguments: ['bzr', 'ls']
encoding: 'cp1252', fsenc: 'mbcs', lang: None
plugins:
  bzrtools C:\Program Files\Bazaar\plugins\bzrtools [2.2.0]
  colo C:\Program Files\Bazaar\plugins\colo [0.1.0]
  explorer C:\Program Files\Bazaar\plugins\explorer [1.1b1]
  fastimport C:\Program Files\Bazaar\plugins\fastimport [0.9.0dev]
  launchpad C:\Program Files\Bazaar\plugins\launchpad [2.2.0]
  loom C:\Program Files\Bazaar\plugins\loom [2.2.1dev]
  netrc_credential_store C:\Program Files\Bazaar\plugins\netrc_credential_store
[2.2.0]
  news_merge C:\Program Files\Bazaar\plugins\news_merge [2.2.0]
  pipeline C:\Program Files\Bazaar\plugins\pipeline [unknown]
  qbzr C:\Program Files\Bazaar\plugins\qbzr [0.19.0]
  rewrite C:\Program Files\Bazaar\plugins\rewrite [0.6.0]
  svn C:\Program Files\Bazaar\plugins\svn [1.0.3]
  upload C:\Program Files\Bazaar\plugins\upload [1.0.0dev]
  xmloutput C:\Program Files\Bazaar\plugins\xmloutput [0.8.6]

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+file...

Read more...

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gary van der Merwe пишет:
> I built a installer based on this branch: http://dl.dropbox.com/u/4494367/bzr-2.2.0-setup.exe
>
> I was not able to go through the test case given in the bug, as I was not able to paste the unicode string into the console. So I created a branch on linux, and pulled it. The results were not good.
>
> C:\Users\Gary\test>bzr pull sftp://garyvdm@192.168.1.152/home/garyvdm/test_unicode
> Connected (version 2.0, client OpenSSH_5.3p1)
> Authentication (publickey) failed.
> SSH garyvdm@192.168.1.152 password:
> Authentication (password) successful!
> Secsh channel 1 opened.
> [chan 1] Opened sftp connection (server version 3)
> +N ????/
> All changes applied successfully.
> Now on revision 1.
>
> C:\Users\Gary\test>bzr log
> ------------------------------------------------------------
> revno: 1
> committer: Gary van der Merwe <email address hidden>
> branch nick: test_unicode
> timestamp: Tue 2010-09-07 21:05:39 +0200
> message:
> ????

I think these two related to the fact your OEM encoding (cp437 as I
see) is not able to print Russian characters, and because we're using
encode(xxx, replace) those characters became question marks.

> C:\Users\Gary\test>bzr ls
> bzr: ERROR: exceptions.UnicodeEncodeError: 'charmap' codec can't encode characte
> rs in position 0-3: character maps to <undefined>
>
> Traceback (most recent call last):
> File "bzrlib\commands.pyo", line 912, in exception_to_return_code
> File "bzrlib\commands.pyo", line 1112, in run_bzr
> File "bzrlib\commands.pyo", line 690, in run_argv_aliases
> File "bzrlib\commands.pyo", line 705, in run
> File "bzrlib\cleanup.pyo", line 135, in run_simple
> File "bzrlib\cleanup.pyo", line 165, in _do_with_cleanups
> File "bzrlib\commands.pyo", line 1127, in ignore_pipe
> File "bzrlib\builtins.pyo", line 2613, in run
> File "bzrlib\ui\text.pyo", line 538, in write
> File "codecs.pyo", line 351, in write
> File "encodings\cp437.pyo", line 12, in encode
> UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: cha
> racter maps to <undefined>

I think this is separate unicode bug, maybe ls command using
encode(xxx, errors) and therefore crashed with traceback.

> bzr 2.2.0 on python 2.6.4 (Windows-post2008Server-6.1.7600)
> arguments: ['bzr', 'ls']
> encoding: 'cp1252', fsenc: 'mbcs', lang: None

Especially because your console is non-russsian capable one.

> I don't know if I'm ment to configure anything to make this work. I also guess that the ls error is maybe a different bug.

You can try running this command in the console first:

chcp 866

and then try again. Also it worth to change settings of the console to
use Lucida Console font (it better works with unicode).

--
All the dude wanted was his rug back

Revision history for this message
Alexander Belchenko (bialix) wrote :

Alexander Belchenko пишет:
> Also it worth to change settings of the console to
> use Lucida Console font (it better works with unicode).

Although it shouldn't matter to copy/paste results here.

Revision history for this message
Martin Pool (mbp) wrote :

No objection to the diff assuming windows testing is ok. I agree it needs news etc.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gary van der Merwe пишет:
> I built a installer based on this branch: http://dl.dropbox.com/u/4494367/bzr-2.2.0-setup.exe

I've just tested this build, and `bzr ls` shows russian characters
correctly.

Revision history for this message
Martin Packman (gz) wrote :

Sorry about the summary, will try to be more coherent in future.

> This seems fine to me, but:
>
> 1) Definitely needs a NEWS entry

Sure.

> 2) Why did you move the location of the call?

As I was reindenting anyway, moving it up by the actual import (and related OSX hack) makes the script easier to follow.

> > UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3:
> cha
> > racter maps to <undefined>
>
> I think this is separate unicode bug, maybe ls command using
> encode(xxx, errors) and therefore crashed with traceback.

Ugugugu... why is ls different to everything else? Have we got a bug number for this?

> I've just tested this build, and `bzr ls` shows russian characters
> correctly.

Alexander, did you also get a chance to test for the original issue on a pure Russian machine and check it was still present?

Revision history for this message
Alexander Belchenko (bialix) wrote :

Martin [gz] пишет:
> Alexander, did you also get a chance to test for the original issue on a pure Russian machine and check it was still present?

Not yet, sorry. Will do tonight.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Alexander Belchenko пишет:
> Martin [gz] пишет:
>> Alexander, did you also get a chance to test for the original issue on a pure Russian machine and check it was still present?
>
> Not yet, sorry. Will do tonight.

I've just tested it. On native Russian Windows XP I have the same
borked output:

C:\Temp\3>bzr init
Created a standalone tree (format: pack-0.92)

C:\Temp\3>bzr mkdir Тест
added '?бв

C:\Temp\3>bzr st
added:
  '?бв/

Should be: Тест/

--
All the dude wanted was his rug back

Revision history for this message
Alexander Belchenko (bialix) wrote :

You have news: check
The windows testing says: all right
Ready to land: it seems so

Summary: good to go

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

Gra, sorry, forgot I didn't have perms to 2.2, need someone else to merge this.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9/8/2010 4:23 PM, Martin [gz] wrote:
> Gra, sorry, forgot I didn't have perms to 2.2, need someone else to merge this.

I thought we were supposed to have the same permissions everywhere. We
should ping a LOSA about this.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkyIHjYACgkQJdeBCYSNAAOuuACfYO5O48qEVrnjEEwt1mJQIY4m
ZgEAoIq0MWGyT4AY65uVY58Sop96osec
=VNe/
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-08-17 04:55:44 +0000
3+++ NEWS 2010-09-08 13:27:46 +0000
4@@ -32,6 +32,11 @@
5 directory that was a symlink in the previous commit.
6 (Martin Pool, #192859)
7
8+* Only call ``setlocale`` in the bzr startup script on posix systems. This
9+ avoids an issue with the newer windows C runtimes used by Python 2.6 and
10+ later which can mangle bytestrings printed to the console.
11+ (Martin [gz], #631350)
12+
13 Improvements
14 ************
15
16
17=== modified file 'bzr'
18--- bzr 2010-06-22 22:40:29 +0000
19+++ bzr 2010-09-08 13:27:46 +0000
20@@ -75,6 +75,16 @@
21 else:
22 import locale
23
24+if os.name == "posix":
25+ try:
26+ locale.setlocale(locale.LC_ALL, '')
27+ except locale.Error, e:
28+ sys.stderr.write('bzr: warning: %s\n'
29+ ' bzr could not set the application locale.\n'
30+ ' Although this should be no problem for bzr itself, it might\n'
31+ ' cause problems with some plugins. To investigate the issue,\n'
32+ ' look at the output of the locale(1p) tool.\n' % e)
33+
34
35 # The python2.6 release includes some libraries that have deprecation warnings
36 # against the interpreter - see https://bugs.launchpad.net/bzr/+bug/387139
37@@ -86,17 +96,6 @@
38 )
39
40
41-try:
42- locale.setlocale(locale.LC_ALL, '')
43-except locale.Error, e:
44- sys.stderr.write('bzr: warning: %s\n'
45- ' bzr could not set the application locale.\n'
46- ' Although this should be no problem for bzr itself,\n'
47- ' it might cause problems with some plugins.\n'
48- ' To investigate the issue, look at the output\n'
49- ' of the locale(1p) tool available on POSIX systems.\n'
50- % e)
51-
52 # instruct bzrlib/__init__.py to install lazy_regex
53 sys._bzr_lazy_regex = True
54 try:

Subscribers

People subscribed via source and target branches