Merge lp:~songofacandy/bzr/fix-523746 into lp:bzr/2.0

Proposed by methane
Status: Superseded
Proposed branch: lp:~songofacandy/bzr/fix-523746
Merge into: lp:bzr/2.0
Diff against target: 95 lines (+24/-13)
3 files modified
bzrlib/diff.py (+7/-12)
bzrlib/osutils.py (+10/-1)
bzrlib/tests/test_osutils.py (+7/-0)
To merge this branch: bzr merge lp:~songofacandy/bzr/fix-523746
Reviewer Review Type Date Requested Status
bzr-core Pending
Review via email: mp+19734@code.launchpad.net
To post a comment you must log in.
Revision history for this message
methane (songofacandy) wrote :

Fix UnicodeEncodeError when invoking external diff with non ASCII filenames.

Revision history for this message
Robert Collins (lifeless) wrote :

I recall subprocess being very slow to import; it worries me that we'll
be importing it all the time.

A quick check on my machine took about 10ms to import, so its definitely
noticable.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> I recall subprocess being very slow to import; it worries me that we'll
> be importing it all the time.
>
> A quick check on my machine took about 10ms to import, so its definitely
> noticable.
>
> -Rob
>

It can pretty easily be moved into the lazy_import section.

Also, this doesn't handle stuff like Unicode filenames that are valid on
disk, but not valid in user encoding. (Which can be pretty easy to
trigger on Windows.)

However, it is probably better than what we have today.

Actually, we have a different problem. In that 'get_user_encoding()' !=
'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
to programs.

The real fix is to use CreateProcessW and pass it a unicode string,
rather than trying to get everything round-tripped across an encoding
barrier.

This may be a stepping point in that direction, though.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkt+9AQACgkQJdeBCYSNAANcDACeMCCHcSVntHnO3XkvWLzCj82R
H2IAoMGFEiBOXh+t4Z0W+C64WG59gPWo
=6JUh
-----END PGP SIGNATURE-----

Revision history for this message
methane (songofacandy) wrote :

>> I recall subprocess being very slow to import; it worries me that we'll
>> be importing it all the time.
>>
>> A quick check on my machine took about 10ms to import, so its definitely
>> noticable.
>>
>> -Rob
>>
>
>It can pretty easily be moved into the lazy_import section.

osutils.Popen is a class. So subprocess imported every times
when import osutils.
I'll make osutils.Popen as a factory function instead of class.

> Actually, we have a different problem. In that 'get_user_encoding()' !=
> 'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
> to programs.

Yes. Arguments contains file path and others texts.
I think there are no single right way to encode process arguments.
But I change to use filesystemencoding because file paths appears in
arguments rather than non-ASCII text arguments.
In most case, filesystemencoding == user_encoding(). So I think using
filesystemencoding solves 90% case.

> The real fix is to use CreateProcessW and pass it a unicode string,
> rather than trying to get everything round-tripped across an encoding
> barrier.

On Windows, CreateProcessW is good. But this bug affects not only Windows.
Popen encoding arguments with defaultencoding so popen can't encode utf-8
file path on Linux, too.

To use CreateProcessW, we should decode bytes strings into unicode.
This brings another problem: "What encoding should we use when decoding".
And to use CreateProcessW, we should change subprocess internals or reimplement
subprocess module.

So, just for now, I'd like to fix 90% case.

lp:~songofacandy/bzr/fix-523746 updated
4737. By methane

Make Popen as a factory function.

4738. By methane

Use _fs_enc instead of user_encoding()

4739. By methane

Add test and fix easy miss.

Revision history for this message
methane (songofacandy) wrote :

I resubmitted merge proposal.

lp:~songofacandy/bzr/fix-523746 updated
4740. By methane

Fix import ordering.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

INADA Naoki wrote:
>>> I recall subprocess being very slow to import; it worries me that we'll
>>> be importing it all the time.
>>>
>>> A quick check on my machine took about 10ms to import, so its definitely
>>> noticable.
>>>
>>> -Rob
>>>
>> It can pretty easily be moved into the lazy_import section.
>
> osutils.Popen is a class. So subprocess imported every times
> when import osutils.
> I'll make osutils.Popen as a factory function instead of class.
>
>
>> Actually, we have a different problem. In that 'get_user_encoding()' !=
>> 'sys.get_filesystemencoding'. I'm not 100% sure what needs to be passed
>> to programs.
>
> Yes. Arguments contains file path and others texts.
> I think there are no single right way to encode process arguments.
> But I change to use filesystemencoding because file paths appears in
> arguments rather than non-ASCII text arguments.
> In most case, filesystemencoding == user_encoding(). So I think using
> filesystemencoding solves 90% case.

Only outsidef of Windows. On Windows filesystemencoding == mbcs, but
user_encoding is something like 'cp1252'. (and terminal encoding is
cp437...)

Now, I don't know whether arbitrary programs are going to use filesystem
encoding or user_encoding for decoding their arguments. I *think* that
argument decoding would be an OS sort of thing, and we use
filesystemencoding for ENV variables and thus it would make sense for
arguments. bzr itself used user_encoding before we use the Unicode api.

>
>> The real fix is to use CreateProcessW and pass it a unicode string,
>> rather than trying to get everything round-tripped across an encoding
>> barrier.
>
> On Windows, CreateProcessW is good. But this bug affects not only Windows.
> Popen encoding arguments with defaultencoding so popen can't encode utf-8
> file path on Linux, too.

Sure. And on Linux, it has pretty much standardized that everything is
in utf-8. Which is *soooo* much nicer. (file content, filenames, env
vars, terminal encoding, etc.)

>
> To use CreateProcessW, we should decode bytes strings into unicode.
> This brings another problem: "What encoding should we use when decoding".
> And to use CreateProcessW, we should change subprocess internals or reimplement
> subprocess module.
>
> So, just for now, I'd like to fix 90% case.
>

As long as we don't break other cases, I'm fine with that.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuC3GUACgkQJdeBCYSNAAMUuwCfS28ggzvo1JNRialesoqv8Slw
U2MAn0J3tGr5HGGOPIqEz+w7CCm10hKm
=V9+v
-----END PGP SIGNATURE-----

lp:~songofacandy/bzr/fix-523746 updated
4741. By methane

Don't use Conditional Expressions for Python 2.4 compatibility.

Unmerged revisions

4741. By methane

Don't use Conditional Expressions for Python 2.4 compatibility.

4740. By methane

Fix import ordering.

4739. By methane

Add test and fix easy miss.

4738. By methane

Use _fs_enc instead of user_encoding()

4737. By methane

Make Popen as a factory function.

4736. By methane

Wrap Popen() to support unicode arguments.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/diff.py'
2--- bzrlib/diff.py 2009-07-29 21:35:05 +0000
3+++ bzrlib/diff.py 2010-02-20 05:02:13 +0000
4@@ -131,11 +131,11 @@
5 stderr = None
6
7 try:
8- pipe = subprocess.Popen(diffcmd,
9- stdin=subprocess.PIPE,
10- stdout=subprocess.PIPE,
11- stderr=stderr,
12- env=env)
13+ pipe = osutils.Popen(diffcmd,
14+ stdin=subprocess.PIPE,
15+ stdout=subprocess.PIPE,
16+ stderr=stderr,
17+ env=env)
18 except OSError, e:
19 if e.errno == errno.ENOENT:
20 raise errors.NoDiff(str(e))
21@@ -171,11 +171,6 @@
22
23 if not diff_opts:
24 diff_opts = []
25- if sys.platform == 'win32':
26- # Popen doesn't do the proper encoding for external commands
27- # Since we are dealing with an ANSI api, use mbcs encoding
28- old_filename = old_filename.encode('mbcs')
29- new_filename = new_filename.encode('mbcs')
30 diffcmd = ['diff',
31 '--label', old_filename,
32 old_abspath,
33@@ -687,8 +682,8 @@
34 def _execute(self, old_path, new_path):
35 command = self._get_command(old_path, new_path)
36 try:
37- proc = subprocess.Popen(command, stdout=subprocess.PIPE,
38- cwd=self._root)
39+ proc = osutils.Popen(command, stdout=subprocess.PIPE,
40+ cwd=self._root)
41 except OSError, e:
42 if e.errno == errno.ENOENT:
43 raise errors.ExecutableMissing(command[0])
44
45=== modified file 'bzrlib/osutils.py'
46--- bzrlib/osutils.py 2009-10-08 03:55:30 +0000
47+++ bzrlib/osutils.py 2010-02-20 05:02:13 +0000
48@@ -34,11 +34,11 @@
49 splitdrive as _nt_splitdrive,
50 )
51 import posixpath
52+import subprocess
53 import shutil
54 from shutil import (
55 rmtree,
56 )
57-import subprocess
58 import tempfile
59 from tempfile import (
60 mkdtemp,
61@@ -1907,3 +1907,12 @@
62 if use_cache:
63 _cached_concurrency = concurrency
64 return concurrency
65+
66+def Popen(subprocess_args, *args, **kwargs):
67+ """A wrapper for Popen for unicode arguments."""
68+ if not isinstance(subprocess_args, basestring):
69+ subprocess_args = [s.encode(_fs_enc) if isinstance(s, unicode) else s
70+ for s in subprocess_args]
71+ elif isinstance(subprocess_args, unicode):
72+ subprocess_args = subprocess_args.encode(_fs_enc)
73+ return subprocess.Popen(subprocess_args, *args, **kwargs)
74
75=== modified file 'bzrlib/tests/test_osutils.py'
76--- bzrlib/tests/test_osutils.py 2009-10-08 17:13:24 +0000
77+++ bzrlib/tests/test_osutils.py 2010-02-20 05:02:13 +0000
78@@ -24,6 +24,7 @@
79 import stat
80 import sys
81 import time
82+import subprocess
83
84 from bzrlib import (
85 errors,
86@@ -1841,3 +1842,9 @@
87 def test_local_concurrency(self):
88 concurrency = osutils.local_concurrency()
89 self.assertIsInstance(concurrency, int)
90+
91+class TestPopen(tests.TestCase):
92+
93+ def test_unicode_args(self):
94+ proc = osutils.Popen(["echo", u"a\N{Euro Sign}b"], stdout=subprocess.PIPE)
95+ self.assertEquals(proc.wait(), 0)

Subscribers

People subscribed via source and target branches