Merge lp:~songofacandy/bzr/fix-523746-2 into lp:bzr/2.0
- fix-523746-2
- Merge into 2.0
Status: | Rejected | ||||
---|---|---|---|---|---|
Rejected by: | Vincent Ladeuil | ||||
Proposed branch: | lp:~songofacandy/bzr/fix-523746-2 | ||||
Merge into: | lp:bzr/2.0 | ||||
Diff against target: |
52 lines (+23/-5) 1 file modified
bzrlib/diff.py (+23/-5) |
||||
To merge this branch: | bzr merge lp:~songofacandy/bzr/fix-523746-2 | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Vincent Ladeuil | Needs Information | ||
Martin Packman (community) | Needs Information | ||
Review via email: mp+19938@code.launchpad.net |
Commit message
Description of the change
methane (songofacandy) wrote : | # |
Martin Packman (gz) wrote : | # |
I think the basic idea is sound here, just a couple of questions.
Are the mangled filenames going to be in the output of the external tool? If so there needs to be some unmangling code.
Can relpath be a string like u"docs/
Could use `sys.getfilesys
methane (songofacandy) wrote : | # |
> Are the mangled filenames going to be in the output of the external tool? If
> so there needs to be some unmangling code.
These filenames are just an input of the external diff tool.
> Can relpath be a string like u"docs/
> path components, should use 'replace' for encoding errors and then do
> .replace("?", "_") to make it windows-happy.
You're right! Adding "_" prefix is not enough.
> Could use `sys.getfilesys
> the encoding. Likewise could use `(path) or 'FILE'` rather than `'_'+(path)`
> to avoid the empty string.
I've made it.
Please review again.
Martin Packman (gz) wrote : | # |
Ack, was that big a follow up needed? I only intended to suggest something like:
- full_path = osutils.
+ full_path = osutils.
+ sys.getfilesyst
+ "?", "_"))
The new encoding-
methane (songofacandy) wrote : | # |
> Ack, was that big a follow up needed? I only intended to suggest something
> like:
>
> - full_path = osutils.
> + full_path = osutils.
> + sys.getfilesyst
> + "?", "_"))
>
> The new encoding-
> `_get_command`, what prompted you to add that when the last version was so
> simple?
Multibyte encoding may contains '?' as a trailing char.
So ``full_
multibyte char.
And I hate self._root or prefix is changed through encode().decode() and bzr
attempts to write outside of _root.
So I do:
1) Write temporary file with safe unicode path.
2) Launch external diff with encoded path.
Martin Packman (gz) wrote : | # |
> Multibyte encoding may contains '?' as a trailing char.
> So ``full_
> multibyte char.
Really? Even SJIS avoids that one, and it's pretty good at breaking naïve programs. I get the reasoning now though.
> And I hate self._root or prefix is changed through encode().decode() and bzr
> attempts to write outside of _root.
> So I do:
> 1) Write temporary file with safe unicode path.
> 2) Launch external diff with encoded path.
Hm, I see. Probably overly cautious though, the windows filesystem layer will just be decoding by the system locale internally anyway.
methane (songofacandy) wrote : | # |
> > Multibyte encoding may contains '?' as a trailing char.
> > So ``full_
> > multibyte char.
>
> Really? Even SJIS avoids that one, and it's pretty good at breaking naïve
> programs. I get the reasoning now though.
I've checked SJIS and BIG5 a little while ago, and find '?' (0x3f) is not in trailing char.
So replace('?', '_') may be safe.
But I doesn't like depending character mapping.
I think encoded string is magic bytes and modifying it is dangerous.
Martin Packman (gz) wrote : | # |
> But I doesn't like depending character mapping.
> I think encoded string is magic bytes and modifying it is dangerous.
Yeah, fair enough. Perhaps dealing with this just for tempfiles is too much and sticking to ascii-only for the mangling is the best choice.
methane (songofacandy) wrote : | # |
> Perhaps dealing with this just for tempfiles is too much
> and sticking to ascii-only for the mangling is the best choice.
So I use ascii before.
But I find filename of tempfile is shown on GUI Diff title.
"old/????/????.txt - new/????/????.txt" isn't cool.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin [gz] wrote:
> Ack, was that big a follow up needed? I only intended to suggest something like:
>
> - full_path = osutils.
> + full_path = osutils.
> + sys.getfilesyst
> + "?", "_"))
^- This should probably be 'osutils._fs_enc' where we already handle fs
encoding of None.
>
> The new encoding-
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAku
oqoAn3NAsSjyZ7v
=eWJM
-----END PGP SIGNATURE-----
methane (songofacandy) wrote : | # |
> Martin [gz] wrote:
> > Ack, was that big a follow up needed? I only intended to suggest something
> like:
> >
> > - full_path = osutils.
> > + full_path = osutils.
> > + sys.getfilesyst
> > + "?", "_"))
>
> ^- This should probably be 'osutils._fs_enc' where we already handle fs
> encoding of None.
subprocess.Popen uses os.execv(), and os.execv() use Py_FileSystemDe
osutils._fs_enc fallbacks to 'utf-8'. So using osutils._fs_enc may cause UnicodeEncodeError on Popen.
# I think osutils._fs_enc fallbacks to utf-8 is good because I hope bzr uses
# utf-8 when I set LANG=C.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
INADA Naoki wrote:
>> Martin [gz] wrote:
>>> Ack, was that big a follow up needed? I only intended to suggest something
>> like:
>>> - full_path = osutils.
>>> + full_path = osutils.
>>> + sys.getfilesyst
>>> + "?", "_"))
>> ^- This should probably be 'osutils._fs_enc' where we already handle fs
>> encoding of None.
>
> subprocess.Popen uses os.execv(), and os.execv() use Py_FileSystemDe
>
> osutils._fs_enc fallbacks to 'utf-8'. So using osutils._fs_enc may cause UnicodeEncodeError on Popen.
> # I think osutils._fs_enc fallbacks to utf-8 is good because I hope bzr uses
> # utf-8 when I set LANG=C.
If you set LANG=C then we see fs_enc as ascii, not None.
The change was primarily for BSD, where fs_enc was None, and we wanted
UTF-8 as the most reasonable option.
However, if execv is going to fail if we pass something other than
ascii, then fair enough.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAku
ZnMAoJO+
=W/Bm
-----END PGP SIGNATURE-----
Vincent Ladeuil (vila) wrote : | # |
This sounds far too controversial for landing in 2.0 (especially without tests), can we have a summary from Naoki and reviewers about landing this patch into bzr.dev instead ?
methane (songofacandy) wrote : | # |
This problem depends on OS, locale settings and other environment.
So I agree with you that this patch should land to bzr.dev and be tested on
vary environment while bzr 2.2 is released.
methane (songofacandy) wrote : | # |
I've made new merge proposal to bzr.dev.
https:/
Preview Diff
1 | === modified file 'bzrlib/diff.py' |
2 | --- bzrlib/diff.py 2009-07-29 21:35:05 +0000 |
3 | +++ bzrlib/diff.py 2010-02-26 11:17:14 +0000 |
4 | @@ -682,7 +682,17 @@ |
5 | |
6 | def _get_command(self, old_path, new_path): |
7 | my_map = {'old_path': old_path, 'new_path': new_path} |
8 | - return [t % my_map for t in self.command_template] |
9 | + command = [t % my_map for t in self.command_template] |
10 | + if sys.platform == 'win32': # Popen doesn't accept unicode on win32 |
11 | + command_encoded = [] |
12 | + for c in command: |
13 | + if isinstance(c, unicode): |
14 | + command_encoded.append(c.encode('mbcs')) |
15 | + else: |
16 | + command_encoded.append(c) |
17 | + return command_encoded |
18 | + else: |
19 | + return command |
20 | |
21 | def _execute(self, old_path, new_path): |
22 | command = self._get_command(old_path, new_path) |
23 | @@ -709,7 +719,15 @@ |
24 | return True |
25 | |
26 | def _write_file(self, file_id, tree, prefix, relpath): |
27 | - full_path = osutils.pathjoin(self._root, prefix, relpath) |
28 | + if sys.platform == 'win32': |
29 | + fenc = 'mbcs' |
30 | + else: |
31 | + fenc = sys.getfilesystemencoding() |
32 | + # encoded_str.replace('?', '_') may break multibyte char. |
33 | + # So we should encode, decode, then replace(u'?', u'_') |
34 | + relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc, 'replace') |
35 | + relpath_tmp = relpath_tmp.replace(u'?', u'_') |
36 | + full_path = osutils.pathjoin(self._root, prefix, relpath_tmp) |
37 | if self._try_symlink_root(tree, prefix): |
38 | return full_path |
39 | parent_dir = osutils.dirname(full_path) |
40 | @@ -750,9 +768,9 @@ |
41 | def diff(self, file_id, old_path, new_path, old_kind, new_kind): |
42 | if (old_kind, new_kind) != ('file', 'file'): |
43 | return DiffPath.CANNOT_DIFF |
44 | - self._prepare_files(file_id, old_path, new_path) |
45 | - self._execute(osutils.pathjoin('old', old_path), |
46 | - osutils.pathjoin('new', new_path)) |
47 | + old_disk_path, new_disk_path = \ |
48 | + self._prepare_files(file_id, old_path, new_path) |
49 | + self._execute(old_disk_path, new_disk_path) |
50 | |
51 | |
52 | class DiffTree(object): |
Make sure that temporary file name is ASCII.