Merge lp:~songofacandy/bzr/fix-523746-2 into lp:bzr/2.0
- fix-523746-2
- Merge into 2.0
Status: | Rejected | ||||
---|---|---|---|---|---|
Rejected by: | Vincent Ladeuil | ||||
Proposed branch: | lp:~songofacandy/bzr/fix-523746-2 | ||||
Merge into: | lp:bzr/2.0 | ||||
Diff against target: |
52 lines (+23/-5) 1 file modified
bzrlib/diff.py (+23/-5) |
||||
To merge this branch: | bzr merge lp:~songofacandy/bzr/fix-523746-2 | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Vincent Ladeuil | Needs Information | ||
Martin Packman (community) | Needs Information | ||
Review via email: mp+19938@code.launchpad.net |
Commit message
Description of the change
methane (songofacandy) wrote : | # |
Martin Packman (gz) wrote : | # |
I think the basic idea is sound here, just a couple of questions.
Are the mangled filenames going to be in the output of the external tool? If so there needs to be some unmangling code.
Can relpath be a string like u"docs/
Could use `sys.getfilesys
methane (songofacandy) wrote : | # |
> Are the mangled filenames going to be in the output of the external tool? If
> so there needs to be some unmangling code.
These filenames are just an input of the external diff tool.
> Can relpath be a string like u"docs/
> path components, should use 'replace' for encoding errors and then do
> .replace("?", "_") to make it windows-happy.
You're right! Adding "_" prefix is not enough.
> Could use `sys.getfilesys
> the encoding. Likewise could use `(path) or 'FILE'` rather than `'_'+(path)`
> to avoid the empty string.
I've made it.
Please review again.
Martin Packman (gz) wrote : | # |
Ack, was that big a follow up needed? I only intended to suggest something like:
- full_path = osutils.
+ full_path = osutils.
+ sys.getfilesyst
+ "?", "_"))
The new encoding-
methane (songofacandy) wrote : | # |
> Ack, was that big a follow up needed? I only intended to suggest something
> like:
>
> - full_path = osutils.
> + full_path = osutils.
> + sys.getfilesyst
> + "?", "_"))
>
> The new encoding-
> `_get_command`, what prompted you to add that when the last version was so
> simple?
Multibyte encoding may contains '?' as a trailing char.
So ``full_
multibyte char.
And I hate self._root or prefix is changed through encode().decode() and bzr
attempts to write outside of _root.
So I do:
1) Write temporary file with safe unicode path.
2) Launch external diff with encoded path.
Martin Packman (gz) wrote : | # |
> Multibyte encoding may contains '?' as a trailing char.
> So ``full_
> multibyte char.
Really? Even SJIS avoids that one, and it's pretty good at breaking naïve programs. I get the reasoning now though.
> And I hate self._root or prefix is changed through encode().decode() and bzr
> attempts to write outside of _root.
> So I do:
> 1) Write temporary file with safe unicode path.
> 2) Launch external diff with encoded path.
Hm, I see. Probably overly cautious though, the windows filesystem layer will just be decoding by the system locale internally anyway.
methane (songofacandy) wrote : | # |
> > Multibyte encoding may contains '?' as a trailing char.
> > So ``full_
> > multibyte char.
>
> Really? Even SJIS avoids that one, and it's pretty good at breaking naïve
> programs. I get the reasoning now though.
I've checked SJIS and BIG5 a little while ago, and find '?' (0x3f) is not in trailing char.
So replace('?', '_') may be safe.
But I doesn't like depending character mapping.
I think encoded string is magic bytes and modifying it is dangerous.
Martin Packman (gz) wrote : | # |
> But I doesn't like depending character mapping.
> I think encoded string is magic bytes and modifying it is dangerous.
Yeah, fair enough. Perhaps dealing with this just for tempfiles is too much and sticking to ascii-only for the mangling is the best choice.
methane (songofacandy) wrote : | # |
> Perhaps dealing with this just for tempfiles is too much
> and sticking to ascii-only for the mangling is the best choice.
So I use ascii before.
But I find filename of tempfile is shown on GUI Diff title.
"old/????/????.txt - new/????/????.txt" isn't cool.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin [gz] wrote:
> Ack, was that big a follow up needed? I only intended to suggest something like:
>
> - full_path = osutils.
> + full_path = osutils.
> + sys.getfilesyst
> + "?", "_"))
^- This should probably be 'osutils._fs_enc' where we already handle fs
encoding of None.
>
> The new encoding-
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAku
oqoAn3NAsSjyZ7v
=eWJM
-----END PGP SIGNATURE-----
methane (songofacandy) wrote : | # |
> Martin [gz] wrote:
> > Ack, was that big a follow up needed? I only intended to suggest something
> like:
> >
> > - full_path = osutils.
> > + full_path = osutils.
> > + sys.getfilesyst
> > + "?", "_"))
>
> ^- This should probably be 'osutils._fs_enc' where we already handle fs
> encoding of None.
subprocess.Popen uses os.execv(), and os.execv() use Py_FileSystemDe
osutils._fs_enc fallbacks to 'utf-8'. So using osutils._fs_enc may cause UnicodeEncodeError on Popen.
# I think osutils._fs_enc fallbacks to utf-8 is good because I hope bzr uses
# utf-8 when I set LANG=C.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
INADA Naoki wrote:
>> Martin [gz] wrote:
>>> Ack, was that big a follow up needed? I only intended to suggest something
>> like:
>>> - full_path = osutils.
>>> + full_path = osutils.
>>> + sys.getfilesyst
>>> + "?", "_"))
>> ^- This should probably be 'osutils._fs_enc' where we already handle fs
>> encoding of None.
>
> subprocess.Popen uses os.execv(), and os.execv() use Py_FileSystemDe
>
> osutils._fs_enc fallbacks to 'utf-8'. So using osutils._fs_enc may cause UnicodeEncodeError on Popen.
> # I think osutils._fs_enc fallbacks to utf-8 is good because I hope bzr uses
> # utf-8 when I set LANG=C.
If you set LANG=C then we see fs_enc as ascii, not None.
The change was primarily for BSD, where fs_enc was None, and we wanted
UTF-8 as the most reasonable option.
However, if execv is going to fail if we pass something other than
ascii, then fair enough.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAku
ZnMAoJO+
=W/Bm
-----END PGP SIGNATURE-----
Vincent Ladeuil (vila) wrote : | # |
This sounds far too controversial for landing in 2.0 (especially without tests), can we have a summary from Naoki and reviewers about landing this patch into bzr.dev instead ?
methane (songofacandy) wrote : | # |
This problem depends on OS, locale settings and other environment.
So I agree with you that this patch should land to bzr.dev and be tested on
vary environment while bzr 2.2 is released.
methane (songofacandy) wrote : | # |
I've made new merge proposal to bzr.dev.
https:/
Preview Diff
1 | === modified file 'bzrlib/diff.py' | |||
2 | --- bzrlib/diff.py 2009-07-29 21:35:05 +0000 | |||
3 | +++ bzrlib/diff.py 2010-02-26 11:17:14 +0000 | |||
4 | @@ -682,7 +682,17 @@ | |||
5 | 682 | 682 | ||
6 | 683 | def _get_command(self, old_path, new_path): | 683 | def _get_command(self, old_path, new_path): |
7 | 684 | my_map = {'old_path': old_path, 'new_path': new_path} | 684 | my_map = {'old_path': old_path, 'new_path': new_path} |
9 | 685 | return [t % my_map for t in self.command_template] | 685 | command = [t % my_map for t in self.command_template] |
10 | 686 | if sys.platform == 'win32': # Popen doesn't accept unicode on win32 | ||
11 | 687 | command_encoded = [] | ||
12 | 688 | for c in command: | ||
13 | 689 | if isinstance(c, unicode): | ||
14 | 690 | command_encoded.append(c.encode('mbcs')) | ||
15 | 691 | else: | ||
16 | 692 | command_encoded.append(c) | ||
17 | 693 | return command_encoded | ||
18 | 694 | else: | ||
19 | 695 | return command | ||
20 | 686 | 696 | ||
21 | 687 | def _execute(self, old_path, new_path): | 697 | def _execute(self, old_path, new_path): |
22 | 688 | command = self._get_command(old_path, new_path) | 698 | command = self._get_command(old_path, new_path) |
23 | @@ -709,7 +719,15 @@ | |||
24 | 709 | return True | 719 | return True |
25 | 710 | 720 | ||
26 | 711 | def _write_file(self, file_id, tree, prefix, relpath): | 721 | def _write_file(self, file_id, tree, prefix, relpath): |
28 | 712 | full_path = osutils.pathjoin(self._root, prefix, relpath) | 722 | if sys.platform == 'win32': |
29 | 723 | fenc = 'mbcs' | ||
30 | 724 | else: | ||
31 | 725 | fenc = sys.getfilesystemencoding() | ||
32 | 726 | # encoded_str.replace('?', '_') may break multibyte char. | ||
33 | 727 | # So we should encode, decode, then replace(u'?', u'_') | ||
34 | 728 | relpath_tmp = relpath.encode(fenc, 'replace').decode(fenc, 'replace') | ||
35 | 729 | relpath_tmp = relpath_tmp.replace(u'?', u'_') | ||
36 | 730 | full_path = osutils.pathjoin(self._root, prefix, relpath_tmp) | ||
37 | 713 | if self._try_symlink_root(tree, prefix): | 731 | if self._try_symlink_root(tree, prefix): |
38 | 714 | return full_path | 732 | return full_path |
39 | 715 | parent_dir = osutils.dirname(full_path) | 733 | parent_dir = osutils.dirname(full_path) |
40 | @@ -750,9 +768,9 @@ | |||
41 | 750 | def diff(self, file_id, old_path, new_path, old_kind, new_kind): | 768 | def diff(self, file_id, old_path, new_path, old_kind, new_kind): |
42 | 751 | if (old_kind, new_kind) != ('file', 'file'): | 769 | if (old_kind, new_kind) != ('file', 'file'): |
43 | 752 | return DiffPath.CANNOT_DIFF | 770 | return DiffPath.CANNOT_DIFF |
47 | 753 | self._prepare_files(file_id, old_path, new_path) | 771 | old_disk_path, new_disk_path = \ |
48 | 754 | self._execute(osutils.pathjoin('old', old_path), | 772 | self._prepare_files(file_id, old_path, new_path) |
49 | 755 | osutils.pathjoin('new', new_path)) | 773 | self._execute(old_disk_path, new_disk_path) |
50 | 756 | 774 | ||
51 | 757 | 775 | ||
52 | 758 | class DiffTree(object): | 776 | class DiffTree(object): |
Make sure that temporary file name is ASCII.