Merge lp:~lovesyao/bzr/windows-utf8 into lp:~bzr/bzr/trunk-old

Proposed by Nazo on 2008-11-20
Status: Work in progress
Proposed branch: lp:~lovesyao/bzr/windows-utf8
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 95 lines
To merge this branch: bzr merge lp:~lovesyao/bzr/windows-utf8
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing on 2009-06-08
Robert Collins (community) 2008-11-20 Abstain on 2009-05-06
Review via email:
To post a comment you must log in.
Robert Collins (lifeless) wrote :

I'm not in a position to comment on the correctness of this. I would like to note that it really belongs deeper in the library where our existing encoding support is.

review: Abstain
John A Meinel (jameinel) wrote :

Something like this seems potentially very nice. However as Robert mentions, there are some changes that need to happen.

1) All of the logic should be moved out of 'bzr' and into something like 'bzrlib/'. It would be nice to have it as an independent class that perhaps wasn't strictly tied to Windows, so we could write unit tests for it that run everywhere.

For example, if you can just subclass with:

class LoggingConsoleWriter(WindowsConsoleWriter):

  def _write(self, text):

Then you can test that things are getting mapped into UTF-16, etc properly.

+ def writelines(self, text):
+ self.write(text + "\n")

^- I'm 95% sure that this is completely wrong. "writelines(self, lines)" is the api, where 'lines' is an iterable of strings, not a simple text string. (I'm guessing you are thinking of the Java api, writeline(), though my Java is weak, I do remember something that would add '\n' for you.)

Also, having unit tests lets us test stuff like _color() is getting called on bad characters, etc.

I think the reason you are seeing UTF-8 weirdness, is that we have defined the 'note()' functionality as writing out to a UTF-8 stream, which goes to stdout. (Which is probably also why you ended up realizing that you needed to overwrite stderr as well as stdout.)

My guess is that replacing sys.stdout and sys.stderr unilaterally is not the correct fix, and instead we should be just hooking into things like 'note()' and making sure it gets formatted correctly.

This can also get hooked into our infrastructure for "Command.encoding_type" which handles stuff like "the bytes of files should be output exactly as read from disk" but "log should show User names in appropriate encoding."

review: Needs Fixing
Alexander Belchenko (bialix) wrote :

> Something like this seems potentially very nice. However as Robert mentions,
> there are some changes that need to happen.
> 1) All of the logic should be moved out of 'bzr' and into something like
> 'bzrlib/'.

You mean 'bzrlib/' actually.

Martin Pool (mbp) wrote :

beyond jam's comment, it would be nice to be clear on just what bug is causing this.

I guess that it is that sys.stdout.write(u'文字化け') gets mangled, so it's nothing specific to our note() or logging infrastructure. It's conceivable (I'm not sure) that on Windows these streams are irretrievably in an encoding that cannot represent all unicode characters and the best thing really is to use a win32 api that is unicode-clean. In that case perhaps replacing something at a very global level really is best, though I might try to do it more through UIFactory than sys.stdout.

Supporting colors would be interesting but keying off reserved unicode characters to do it is a bit weird.

Parth Malwankar (parthm) wrote :

> Supporting colors would be interesting but keying off reserved unicode
> characters to do it is a bit weird.

It would be good to have bzrlib support color on windows and unix.
As this feature is not present, plugins end up with their own support. E.g. lp:bzrtools[1] and lp:bzr-grep[2] have their own implementation of color (unix only). I know there is a PyPI package (colorama[3] if I remember the code correctly) that supports color on both windows and unix. On Windows, from the developer/user perspective it translates the standard unix escape sequence into the appropriate windows equivalent. Maybe we can just use colorama or use a similar approach. Having bzrlib color support will also allow other commands to log etc. to have color.

Hmm. Maybe this should be a separate wishlist bug.


Unmerged revisions

3844. By Nazo on 2008-11-20

Fix mojibake of utf_8 character for stderr on Windows Command Prompt

3843. By Nazo on 2008-11-20

cleanup code

3842. By Nazo on 2008-11-19

Fix mojibake of utf_8 character on Windows Command Prompt

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzr'
2--- bzr 2009-08-28 05:11:10 +0000
3+++ bzr 2009-08-31 04:38:40 +0000
4@@ -53,6 +53,91 @@
5 os.unsetenv(REINVOKE)
8+if sys.platform == 'win32':
9+ from ctypes import windll, Structure, c_short, c_ushort, byref
10+ import codecs
12+ class COORD(Structure):
13+ _fields_ = [
14+ ("X", c_short),
15+ ("Y", c_short)
16+ ]
18+ class SMALL_RECT(Structure):
19+ _fields_ = [
20+ ("Left", c_short),
21+ ("Top", c_short),
22+ ("Right", c_short),
23+ ("Bottom", c_short)
24+ ]
26+ class CONSOLE_SCREEN_BUFFER_INFO(Structure):
27+ _fields_ = [
28+ ("dwSize", COORD),
29+ ("dwCursorPosition", COORD),
30+ ("wAttributes", c_ushort),
31+ ("srWindow", SMALL_RECT),
32+ ("dwMaximumWindowSize", COORD)
33+ ]
38+ def consolereplace_errors(exception):
39+ if not isinstance(exception, UnicodeDecodeError):
40+ raise TypeError("consolereplace() cannot handle %r" % exception)
41+ l = [u"\\%x" % ord(exception.object[pos]) for pos in xrange(exception.start, exception.end)]
42+ return (u"\ufffe%s\ufeff" % u"".join(l), exception.end)
44+ codecs.register_error("consolereplace", consolereplace_errors)
48+ class WindowsConsoleWriter:
49+ def __init__(self, stdhandle, errors='strict'):
50+ self.errors = errors
51+ self.encoding = 'utf_16'
52+ self.handle = windll.kernel32.GetStdHandle(stdhandle)
55+ windll.kernel32.GetConsoleScreenBufferInfo(self.handle, byref(csbi))
57+ self.defattr = csbi.wAttributes
59+ def writelines(self, text):
60+ self.write(text + "\n")
62+ def _write(self, text):
63+ windll.kernel32.WriteConsoleW(self.handle, text, len(text), 0, 0)
65+ def _color(self, color):
66+ windll.kernel32.SetConsoleTextAttribute(self.handle, color)
68+ def write(self, text):
69+ text = unicode(text, 'utf_8', errors='consolereplace')
70+ i = 0
71+ j = 0
72+ for c in text:
73+ if c == u'\ufffe':
74+ self._write(text[i:j])
75+ i = j+1
76+ self._color(FOREGROUND_RED)
77+ elif c == u'\ufeff':
78+ self._write(text[i:j])
79+ i = j+1
80+ self._color(self.defattr)
81+ j+=1
82+ self._write(text[i:j])
84+ def isatty(self): return True
86+ def flush(self): pass
88+ if sys.stdout.isatty():
89+ sys.stdout = WindowsConsoleWriter(STD_OUTPUT_HANDLE)
90+ if sys.stderr.isatty():
91+ sys.stderr = WindowsConsoleWriter(STD_ERROR_HANDLE)
93 profiling = False
94 if '--profile-imports' in sys.argv:
95 sys.argv.remove('--profile-imports')