Merge lp:~bialix/bzr/diff-header-encoding into lp:bzr
- diff-header-encoding
- Merge into bzr.dev
Status: | Merged |
---|---|
Approved by: | Vincent Ladeuil |
Approved revision: | no longer in the source branch. |
Merged at revision: | 5305 |
Proposed branch: | lp:~bialix/bzr/diff-header-encoding |
Merge into: | lp:bzr |
Diff against target: |
222 lines (+78/-11) 7 files modified
NEWS (+6/-0) bzrlib/builtins.py (+8/-2) bzrlib/diff.py (+8/-5) bzrlib/log.py (+3/-1) bzrlib/osutils.py (+4/-0) bzrlib/shelf_ui.py (+5/-2) bzrlib/tests/test_diff.py (+44/-1) |
To merge this branch: | bzr merge lp:~bialix/bzr/diff-header-encoding |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Martin Pool | Needs Fixing | ||
Martin Packman (community) | Approve | ||
Robert Collins (community) | Needs Information | ||
John A Meinel | Approve | ||
Review via email: mp+26067@code.launchpad.net |
Commit message
Show unicode filenames in diff headers using terminal encoding.
Description of the change
This branch ensure filenames in diff header on windows always will be in mbcs encoding. My next step is to add new command-line option ``--path-encoding`` to some commands to control that encoding.
I would like to get some review on this work to ensure it's heading the right way. Also I would like to know: should I add --path-encoding option in this patch or it could be done as follow-on patch.
Martin Pool (mbp) wrote : | # |
You can't assume paths will always be in utf-8 on unix. You should probably use the output encoding.
So the plan is that in the diff output stream the paths will always be mbcs and the body of the diff may be in some other encoding?
Is 'path_encoding' really the ideal name for this? To me it that name seems confusable with the filesystem path encoding.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
> Review: Needs Fixing
> You can't assume paths will always be in utf-8 on unix. You should probably use the output encoding.
>
Prior to this, we always used utf-8, so I don't feel that this is a blocker.
> So the plan is that in the diff output stream the paths will always be mbcs and the body of the diff may be in some other encoding?
>
> Is 'path_encoding' really the ideal name for this? To me it that name seems confusable with the filesystem path encoding.
Well, for diff, it is the encoding of the path portion, but since we
don't transcode the file content, I don't know that we would want to use
a generic "output" encoding.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
P5EAoMEZrPxtzLJ
=m8IN
-----END PGP SIGNATURE-----
Martin Pool (mbp) wrote : | # |
On 27 May 2010 07:00, John A Meinel <email address hidden> wrote:
>> Is 'path_encoding' really the ideal name for this? To me it that name seems confusable with the filesystem path encoding.
>
> Well, for diff, it is the encoding of the path portion, but since we
> don't transcode the file content, I don't know that we would want to use
> a generic "output" encoding.
Within a diff we will have
- actual file content - not transcoded (but potentially could be in
future if we knew the file encoding and the output encoding)
- diff syntax (like +,-,line numbers) - mostly ascii so it's fairly
easy, but would need to be encoded for eg ucs-2
- filenames - presumably should be in the same encoding as the diff
syntax, but more likely to be non-ascii so more important to encode
Also a diff may occur within the context of a merge directive or log
output, and there too I think you would want all the non-file metadata
in the same encoding, wouldn't you?
--
Martin <http://
Robert Collins (lifeless) wrote : | # |
So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
Isn't the appropriate thing to do to pick 1 encoding - e.g. the most capable of the filesystem encoding and the terminal encoding (or perhaps least capable), and use that for:
- paths
- diff syntax
- bzr chatter
And raw content for the file content itself (but perhaps escaped to not mangle terminals?)
Martin Pool (mbp) wrote : | # |
On 27 May 2010 11:41, Robert Collins <email address hidden> wrote:
> Review: Needs Information
> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>
> Isn't the appropriate thing to do to pick 1 encoding - e.g. the most capable of the filesystem encoding and the terminal encoding (or perhaps least capable), and use that for:
> - paths
> - diff syntax
> - bzr chatter
>
> And raw content for the file content itself (but perhaps escaped to not mangle terminals?)
I was pointing to having one encoding for all of these too, except
that I don't think that's the right heuristic to pick it. afaict if
it's going to a terminal we should use the terminal encoding,
otherwise we should use the output encoding.
--
Martin <http://
Robert Collins (lifeless) wrote : | # |
Alexander, will that work for you?
Alexander Belchenko (bialix) wrote : | # |
Robert Collins пишет:
> Review: Needs Information
> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
What is "other stuff" here? bzr diff produce "binary output", i.e. it
does not try to encode the diff itself, but only (generated by bzr
itself) header information like
=== modified file "foo.txt"
--- foo.txt
+++ foo.txt
If this part is not called header, then please forgive me.
My patch only touches that header.
I don't think we can transcoding file content until we have some sort of
encoding properties for files.
> Isn't the appropriate thing to do to pick 1 encoding - e.g. the most capable of the filesystem encoding and the terminal encoding (or perhaps least capable), and use that for:
> - paths
> - diff syntax
> - bzr chatter
>
> And raw content for the file content itself (but perhaps escaped to not mangle terminals?)
The problem is there is no "only one right choice" on Windows.
If user want to see diff in the terminal then it will be better to use
terminal encoding.
If user want to save diff into file and use it as patch (with GNU patch)
then the best we can do is to use mbcs (or user_encoding).
The problem is bzr can't detect user intention.
Alexander Belchenko (bialix) wrote : | # |
Robert Collins пишет:
> Alexander, will that work for you?
I don't know. I'm happy with qdiff which does the right thing. I'm
working on this patch because Russian people often asking me about this
problem.
So I've sent RFC mail to ru_bzr ML to collect opinions from other
people. Because in main bzr ML there is still no feedback from non-ascii
people :-/
--
All the dude wanted was his rug back
Martin Pool (mbp) wrote : | # |
On 27 May 2010 21:31, Alexander Belchenko <email address hidden> wrote:
> Robert Collins пишет:
>> Review: Needs Information
>> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>
> What is "other stuff" here? bzr diff produce "binary output", i.e. it
> does not try to encode the diff itself, but only (generated by bzr
> itself) header information like
>
> === modified file "foo.txt"
> --- foo.txt
> +++ foo.txt
>
> If this part is not called header, then please forgive me.
>
> My patch only touches that header.
>
> I don't think we can transcoding file content until we have some sort of
> encoding properties for files.
'other stuff' includes the '@@' '===' '---' and the line numbers, as I
discussed above. They're not very interesting to encode but they do
have an encoding. I think they should be the same as the paths.
>
>> Isn't the appropriate thing to do to pick 1 encoding - e.g. the most capable of the filesystem encoding and the terminal encoding (or perhaps least capable), and use that for:
>> - paths
>> - diff syntax
>> - bzr chatter
>>
>> And raw content for the file content itself (but perhaps escaped to not mangle terminals?)
>
> The problem is there is no "only one right choice" on Windows.
>
> If user want to see diff in the terminal then it will be better to use
> terminal encoding.
>
> If user want to save diff into file and use it as patch (with GNU patch)
> then the best we can do is to use mbcs (or user_encoding).
>
> The problem is bzr can't detect user intention.
We can't tell if stdout is redirected to a file? On Unix we can tell
with isatty().
I think these headings should be in output_encoding, which already has
special logic to decide whether to use the terminal encoding or
something else.
--
Martin <http://
Alexander Belchenko (bialix) wrote : | # |
Martin Pool пишет:
> On 27 May 2010 21:31, Alexander Belchenko <email address hidden> wrote:
>> Robert Collins пишет:
>>> Review: Needs Information
>>> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>> What is "other stuff" here? bzr diff produce "binary output", i.e. it
>> does not try to encode the diff itself, but only (generated by bzr
>> itself) header information like
>>
>> === modified file "foo.txt"
>> --- foo.txt
>> +++ foo.txt
>>
>> If this part is not called header, then please forgive me.
>>
>> My patch only touches that header.
>>
>> I don't think we can transcoding file content until we have some sort of
>> encoding properties for files.
>
> 'other stuff' includes the '@@' '===' '---' and the line numbers, as I
> discussed above. They're not very interesting to encode but they do
> have an encoding. I think they should be the same as the paths.
As I can see '@@' is always ascii and they're generated by difflib
itself, right? I'd better do not touch them without very important reason.
>>> Isn't the appropriate thing to do to pick 1 encoding - e.g. the most capable of the filesystem encoding and the terminal encoding (or perhaps least capable), and use that for:
>>> - paths
>>> - diff syntax
>>> - bzr chatter
>>>
>>> And raw content for the file content itself (but perhaps escaped to not mangle terminals?)
>> The problem is there is no "only one right choice" on Windows.
>>
>> If user want to see diff in the terminal then it will be better to use
>> terminal encoding.
>>
>> If user want to save diff into file and use it as patch (with GNU patch)
>> then the best we can do is to use mbcs (or user_encoding).
>>
>> The problem is bzr can't detect user intention.
>
> We can't tell if stdout is redirected to a file? On Unix we can tell
> with isatty().
On Windows it's not reliable as I can see:
C:\Temp\8>python -c "import sys; print >>sys.stderr, sys.stdout.
True
C:\Temp\8>python -c "import sys; print >>sys.stderr,
sys.stdout.
False
C:\Temp\8>python -c "import sys; print >>sys.stderr,
sys.stdout.
True
C:\Temp\8>python -c "import sys; print >>sys.stderr,
sys.stdout.
False
> I think these headings should be in output_encoding, which already has
> special logic to decide whether to use the terminal encoding or
> something else.
Then, I think I should switch to terminal_encoding which is the right
thing for main use case: see the diff on the screen. And provide
--path-encoding for other cases.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alexander Belchenko wrote:
> Martin Pool пишет:
>> On 27 May 2010 21:31, Alexander Belchenko <email address hidden> wrote:
>>> Robert Collins пишет:
>>>> Review: Needs Information
>>>> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>>> What is "other stuff" here? bzr diff produce "binary output", i.e. it
>>> does not try to encode the diff itself, but only (generated by bzr
>>> itself) header information like
>>>
>>> === modified file "foo.txt"
>>> --- foo.txt
>>> +++ foo.txt
>>>
>>> If this part is not called header, then please forgive me.
>>>
>>> My patch only touches that header.
>>>
>>> I don't think we can transcoding file content until we have some sort of
>>> encoding properties for files.
>> 'other stuff' includes the '@@' '===' '---' and the line numbers, as I
>> discussed above. They're not very interesting to encode but they do
>> have an encoding. I think they should be the same as the paths.
>
> As I can see '@@' is always ascii and they're generated by difflib
> itself, right? I'd better do not touch them without very important reason.
>
Martin's point is that if you selected "--encoding=utf-16" then you
should have "\0@\0@\0 \0m\0o\
output, rather than a mix of 8-bit ascii followed by 16-bit unicode chars.
Yes, it turns out that most (all?) 8-bit encodings use ascii as a
subset, but that certainly isn't true for 16-bit encodings.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
fI8AnRMBiSrbj1w
=wzCu
-----END PGP SIGNATURE-----
Alexander Belchenko (bialix) wrote : | # |
John A Meinel пишет:
> Alexander Belchenko wrote:
>> Martin Pool пишет:
>>> On 27 May 2010 21:31, Alexander Belchenko <email address hidden> wrote:
>>>> Robert Collins пишет:
>>>>> Review: Needs Information
>>>>> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>>>> What is "other stuff" here? bzr diff produce "binary output", i.e. it
>>>> does not try to encode the diff itself, but only (generated by bzr
>>>> itself) header information like
>>>>
>>>> === modified file "foo.txt"
>>>> --- foo.txt
>>>> +++ foo.txt
>>>>
>>>> If this part is not called header, then please forgive me.
>>>>
>>>> My patch only touches that header.
>>>>
>>>> I don't think we can transcoding file content until we have some sort of
>>>> encoding properties for files.
>>> 'other stuff' includes the '@@' '===' '---' and the line numbers, as I
>>> discussed above. They're not very interesting to encode but they do
>>> have an encoding. I think they should be the same as the paths.
>> As I can see '@@' is always ascii and they're generated by difflib
>> itself, right? I'd better do not touch them without very important reason.
>>
>
> Martin's point is that if you selected "--encoding=utf-16" then you
> should have "\0@\0@\0 \0m\0o\
> output, rather than a mix of 8-bit ascii followed by 16-bit unicode chars.
>
> Yes, it turns out that most (all?) 8-bit encodings use ascii as a
> subset, but that certainly isn't true for 16-bit encodings.
Hmmm. Good point. But I think it won't fly anyway. What we should do
with file content? It still will be just 8-bit binary stream?
Or this makes sense only if the file content itself is utf-16? Right?
Alexander Belchenko (bialix) wrote : | # |
Alexander Belchenko пишет:
> John A Meinel пишет:
>> Alexander Belchenko wrote:
>>> Martin Pool пишет:
>>>> On 27 May 2010 21:31, Alexander Belchenko <email address hidden> wrote:
>>>>> Robert Collins пишет:
>>>>>> Review: Needs Information
>>>>>> So AIUI the problem has been that we were using utf8 for path metadata, and mcbs on windows for other stuff.
>>>>> What is "other stuff" here? bzr diff produce "binary output", i.e. it
>>>>> does not try to encode the diff itself, but only (generated by bzr
>>>>> itself) header information like
>>>>>
>>>>> === modified file "foo.txt"
>>>>> --- foo.txt
>>>>> +++ foo.txt
>>>>>
>>>>> If this part is not called header, then please forgive me.
>>>>>
>>>>> My patch only touches that header.
>>>>>
>>>>> I don't think we can transcoding file content until we have some sort of
>>>>> encoding properties for files.
>>>> 'other stuff' includes the '@@' '===' '---' and the line numbers, as I
>>>> discussed above. They're not very interesting to encode but they do
>>>> have an encoding. I think they should be the same as the paths.
>>> As I can see '@@' is always ascii and they're generated by difflib
>>> itself, right? I'd better do not touch them without very important reason.
>>>
>> Martin's point is that if you selected "--encoding=utf-16" then you
>> should have "\0@\0@\0 \0m\0o\
>> output, rather than a mix of 8-bit ascii followed by 16-bit unicode chars.
>>
>> Yes, it turns out that most (all?) 8-bit encodings use ascii as a
>> subset, but that certainly isn't true for 16-bit encodings.
>
> Hmmm. Good point. But I think it won't fly anyway. What we should do
> with file content? It still will be just 8-bit binary stream?
>
> Or this makes sense only if the file content itself is utf-16? Right?
It won't work with external diff. I think supporting utf-16 is separate
issue.
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
...
>>> Martin's point is that if you selected "--encoding=utf-16" then you
>>> should have "\0@\0@\0 \0m\0o\
>>> output, rather than a mix of 8-bit ascii followed by 16-bit unicode chars.
>>>
>>> Yes, it turns out that most (all?) 8-bit encodings use ascii as a
>>> subset, but that certainly isn't true for 16-bit encodings.
>> Hmmm. Good point. But I think it won't fly anyway. What we should do
>> with file content? It still will be just 8-bit binary stream?
>>
>> Or this makes sense only if the file content itself is utf-16? Right?
>
> It won't work with external diff. I think supporting utf-16 is separate
> issue.
Conceptually if someone had utf-16 content, they would then want utf-16
diff + utf-16 headers.
Martin's point is basically about this. I don't know what (if any) 8-bit
encodings don't use the ascii subset, but that might also be an issue.
Short term, this is probably better than it was, As long as it gives us
a way to transition to even better.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
Ge8AoI2NvGUb0kX
=s/WC
-----END PGP SIGNATURE-----
Martin Packman (gz) wrote : | # |
Okay, so, there are no perfect answers here, but this is a clear improvement and should be landed.
The problem with diff is that it's half user-readable text format, and half machine-readable patch input. Using the locale byte-encoding on Windows means throwing away information, but the without storing the encodings of the actual files we're already stuffed on correctness anyway (and as Alexander mentions, that means there's no way of generating a sensible UTF-16 diff file, even if such things were taken as valid input anywhere).
The terminal encoding issue is a worthwhile point, and a future enhancement could be to detect where we have a terminal handle rather than a pipe (which might still end up on the terminal, but again, we're stuffed on that), and use the unicode console apis.
+ else:
+ return 'utf8'
As mentioned by Martin Pool, this should probably instead be something like:
+ return locale.
That would actually be fine on win32 too, but we're happy with the lossy downcoding the Python mbcs codec does, right?
Alexander Belchenko (bialix) wrote : | # |
OK, I've changed default encoding from mbcs to terminal encoding, thus optimizing my patch to use case: "show me a diff on the screen" rather than provide compatibility with GNU diff/patch tools. If user need to save diff to file then user need to change terminal encoding with `chcp` command.
Also I've added a NEWS.
vila and mgz persuaded me to land this patch in its current form and add --path-encoding option later.
Alexander Belchenko (bialix) wrote : | # |
PING.
Andrew Bennetts (spiv) wrote : | # |
Thanks for the ping. I've poked Martin (Pool) in person; he's promised to take another look at this.
Alexander Belchenko (bialix) wrote : | # |
Andrew Bennetts пишет:
> Thanks for the ping. I've poked Martin (Pool) in person; he's promised to take another look at this.
Thank you, mr. lotzman!
Vincent Ladeuil (vila) wrote : | # |
@poolie: ping
Martin Pool (mbp) wrote : | # |
Sorry for the lag.
I agree this is probably a step forward except that you really should not just default to utf8 on non-Windows, as Martin[gz] and I said previously in this review. You could just default to output_encoding. But in fact, why not always default to output_encoding, and then on win32 we should arrange for that to be either the terminal encoding or something else, as appropriate.
Alexander Belchenko (bialix) wrote : | # |
What do you mean by "output_encoding"?
Martin Pool (mbp) wrote : | # |
> What do you mean by "output_encoding"?
I was confused by the fact that the value returned by get_terminal_
In short what I'd recommend for now is that you change
+def get_diff_
123 + if sys.platform == 'win32':
124 + return get_terminal_
125 + else:
126 + return 'utf8'
127 +
128 +
to simply return get_terminal_
Vincent Ladeuil (vila) wrote : | # |
@Sacha: I think Martin's request is just a tweak, if you agree with that, please land once it's fixed.
I'll mark this proposal approved, make it wip if needed.
Preview Diff
1 | === modified file 'NEWS' |
2 | --- NEWS 2010-06-17 02:06:20 +0000 |
3 | +++ NEWS 2010-06-17 08:55:41 +0000 |
4 | @@ -66,6 +66,12 @@ |
5 | which previously caused "SyntaxError: No command for line". |
6 | (Martin Pool) |
7 | |
8 | +* Show unicode filenames in diff headers using terminal encoding. |
9 | + (Alexander Belchenko, Bug #382699) |
10 | + NOTE for Windows users: If user need to save diff to file then user need to |
11 | + change encoding of the terminal to ANSI encoding with command ``chcp XXX`` |
12 | + (e.g. ``chcp 1251`` for Russian Windows). |
13 | + |
14 | Improvements |
15 | ************ |
16 | |
17 | |
18 | === modified file 'bzrlib/builtins.py' |
19 | --- bzrlib/builtins.py 2010-06-07 02:38:04 +0000 |
20 | +++ bzrlib/builtins.py 2010-06-17 08:55:41 +0000 |
21 | @@ -1970,11 +1970,15 @@ |
22 | old_branch, new_branch, |
23 | specific_files, extra_trees) = get_trees_and_branches_to_diff_locked( |
24 | file_list, revision, old, new, self.add_cleanup, apply_view=True) |
25 | + # GNU diff on Windows uses ANSI encoding for filenames |
26 | + path_encoding = osutils.get_diff_header_encoding() |
27 | return show_diff_trees(old_tree, new_tree, sys.stdout, |
28 | specific_files=specific_files, |
29 | external_diff_options=diff_options, |
30 | old_label=old_label, new_label=new_label, |
31 | - extra_trees=extra_trees, using=using, |
32 | + extra_trees=extra_trees, |
33 | + path_encoding=path_encoding, |
34 | + using=using, |
35 | format_cls=format) |
36 | |
37 | |
38 | @@ -3892,8 +3896,10 @@ |
39 | def _do_preview(self, merger): |
40 | from bzrlib.diff import show_diff_trees |
41 | result_tree = self._get_preview(merger) |
42 | + path_encoding = osutils.get_diff_header_encoding() |
43 | show_diff_trees(merger.this_tree, result_tree, self.outf, |
44 | - old_label='', new_label='') |
45 | + old_label='', new_label='', |
46 | + path_encoding=path_encoding) |
47 | |
48 | def _do_merge(self, merger, change_reporter, allow_pending, verified): |
49 | merger.change_reporter = change_reporter |
50 | |
51 | === modified file 'bzrlib/diff.py' |
52 | --- bzrlib/diff.py 2010-05-25 17:27:52 +0000 |
53 | +++ bzrlib/diff.py 2010-06-17 08:55:41 +0000 |
54 | @@ -99,8 +99,8 @@ |
55 | if sequence_matcher is None: |
56 | sequence_matcher = patiencediff.PatienceSequenceMatcher |
57 | ud = patiencediff.unified_diff(oldlines, newlines, |
58 | - fromfile=old_filename.encode(path_encoding), |
59 | - tofile=new_filename.encode(path_encoding), |
60 | + fromfile=old_filename.encode(path_encoding, 'replace'), |
61 | + tofile=new_filename.encode(path_encoding, 'replace'), |
62 | sequencematcher=sequence_matcher) |
63 | |
64 | ud = list(ud) |
65 | @@ -713,11 +713,11 @@ |
66 | from_text = _get_text(self.old_tree, from_file_id, from_path) |
67 | to_text = _get_text(self.new_tree, to_file_id, to_path) |
68 | self.text_differ(from_label, from_text, to_label, to_text, |
69 | - self.to_file) |
70 | + self.to_file, path_encoding=self.path_encoding) |
71 | except errors.BinaryFile: |
72 | self.to_file.write( |
73 | ("Binary files %s and %s differ\n" % |
74 | - (from_label, to_label)).encode(self.path_encoding)) |
75 | + (from_label, to_label)).encode(self.path_encoding,'replace')) |
76 | return self.CHANGED |
77 | |
78 | |
79 | @@ -920,7 +920,10 @@ |
80 | extra_factories = [] |
81 | if external_diff_options: |
82 | opts = external_diff_options.split() |
83 | - def diff_file(olab, olines, nlab, nlines, to_file): |
84 | + def diff_file(olab, olines, nlab, nlines, to_file, path_encoding=None): |
85 | + """:param path_encoding: not used but required |
86 | + to match the signature of internal_diff. |
87 | + """ |
88 | external_diff(olab, olines, nlab, nlines, to_file, opts) |
89 | else: |
90 | diff_file = internal_diff |
91 | |
92 | === modified file 'bzrlib/log.py' |
93 | --- bzrlib/log.py 2010-06-08 09:50:27 +0000 |
94 | +++ bzrlib/log.py 2010-06-17 08:55:41 +0000 |
95 | @@ -70,6 +70,7 @@ |
96 | diff, |
97 | errors, |
98 | foreign, |
99 | + osutils, |
100 | repository as _mod_repository, |
101 | revision as _mod_revision, |
102 | revisionspec, |
103 | @@ -432,8 +433,9 @@ |
104 | else: |
105 | specific_files = None |
106 | s = StringIO() |
107 | + path_encoding = osutils.get_diff_header_encoding() |
108 | diff.show_diff_trees(tree_1, tree_2, s, specific_files, old_label='', |
109 | - new_label='') |
110 | + new_label='', path_encoding=path_encoding) |
111 | return s.getvalue() |
112 | |
113 | def _create_log_revision_iterator(self): |
114 | |
115 | === modified file 'bzrlib/osutils.py' |
116 | --- bzrlib/osutils.py 2010-06-07 01:13:05 +0000 |
117 | +++ bzrlib/osutils.py 2010-06-17 08:55:41 +0000 |
118 | @@ -1952,6 +1952,10 @@ |
119 | return user_encoding |
120 | |
121 | |
122 | +def get_diff_header_encoding(): |
123 | + return get_terminal_encoding() |
124 | + |
125 | + |
126 | def get_host_name(): |
127 | """Return the current unicode host name. |
128 | |
129 | |
130 | === modified file 'bzrlib/shelf_ui.py' |
131 | --- bzrlib/shelf_ui.py 2010-01-15 05:33:28 +0000 |
132 | +++ bzrlib/shelf_ui.py 2010-06-17 08:55:41 +0000 |
133 | @@ -241,7 +241,8 @@ |
134 | new_tree = self.work_tree |
135 | old_path = old_tree.id2path(file_id) |
136 | new_path = new_tree.id2path(file_id) |
137 | - text_differ = diff.DiffText(old_tree, new_tree, diff_file) |
138 | + text_differ = diff.DiffText(old_tree, new_tree, diff_file, |
139 | + path_encoding=osutils.get_terminal_encoding()) |
140 | patch = text_differ.diff(file_id, old_path, new_path, 'file', 'file') |
141 | diff_file.seek(0) |
142 | return patches.parse_patch(diff_file) |
143 | @@ -493,7 +494,9 @@ |
144 | new_tree = tt.get_preview_tree() |
145 | if self.write_diff_to is None: |
146 | self.write_diff_to = ui.ui_factory.make_output_stream() |
147 | - diff.show_diff_trees(merger.this_tree, new_tree, self.write_diff_to) |
148 | + path_encoding = osutils.get_diff_header_encoding() |
149 | + diff.show_diff_trees(merger.this_tree, new_tree, self.write_diff_to, |
150 | + path_encoding=path_encoding) |
151 | tt.finalize() |
152 | |
153 | def show_changes(self, merger): |
154 | |
155 | === modified file 'bzrlib/tests/test_diff.py' |
156 | --- bzrlib/tests/test_diff.py 2010-05-20 08:08:20 +0000 |
157 | +++ bzrlib/tests/test_diff.py 2010-06-17 08:55:41 +0000 |
158 | @@ -34,6 +34,7 @@ |
159 | ) |
160 | from bzrlib.symbol_versioning import deprecated_in |
161 | from bzrlib.tests import features |
162 | +from bzrlib.tests.blackbox.test_diff import subst_dates |
163 | |
164 | |
165 | class _AttribFeature(tests.Feature): |
166 | @@ -521,7 +522,6 @@ |
167 | self.assertNotContainsRe(d, r"file 'e'") |
168 | self.assertNotContainsRe(d, r"file 'f'") |
169 | |
170 | - |
171 | def test_binary_unicode_filenames(self): |
172 | """Test that contents of files are *not* encoded in UTF-8 when there |
173 | is a binary file in the diff. |
174 | @@ -580,6 +580,49 @@ |
175 | self.assertContainsRe(d, "=== modified file 'mod_%s'"%autf8) |
176 | self.assertContainsRe(d, "=== removed file 'del_%s'"%autf8) |
177 | |
178 | + def test_unicode_filename_path_encoding(self): |
179 | + """Test for bug #382699: unicode filenames on Windows should be shown |
180 | + in user encoding. |
181 | + """ |
182 | + self.requireFeature(tests.UnicodeFilenameFeature) |
183 | + # The word 'test' in Russian |
184 | + _russian_test = u'\u0422\u0435\u0441\u0442' |
185 | + directory = _russian_test + u'/' |
186 | + test_txt = _russian_test + u'.txt' |
187 | + u1234 = u'\u1234.txt' |
188 | + |
189 | + tree = self.make_branch_and_tree('.') |
190 | + self.build_tree_contents([ |
191 | + (test_txt, 'foo\n'), |
192 | + (u1234, 'foo\n'), |
193 | + (directory, None), |
194 | + ]) |
195 | + tree.add([test_txt, u1234, directory]) |
196 | + |
197 | + sio = StringIO() |
198 | + diff.show_diff_trees(tree.basis_tree(), tree, sio, |
199 | + path_encoding='cp1251') |
200 | + |
201 | + output = subst_dates(sio.getvalue()) |
202 | + shouldbe = ('''\ |
203 | +=== added directory '%(directory)s' |
204 | +=== added file '%(test_txt)s' |
205 | +--- a/%(test_txt)s\tYYYY-MM-DD HH:MM:SS +ZZZZ |
206 | ++++ b/%(test_txt)s\tYYYY-MM-DD HH:MM:SS +ZZZZ |
207 | +@@ -0,0 +1,1 @@ |
208 | ++foo |
209 | + |
210 | +=== added file '?.txt' |
211 | +--- a/?.txt\tYYYY-MM-DD HH:MM:SS +ZZZZ |
212 | ++++ b/?.txt\tYYYY-MM-DD HH:MM:SS +ZZZZ |
213 | +@@ -0,0 +1,1 @@ |
214 | ++foo |
215 | + |
216 | +''' % {'directory': _russian_test.encode('cp1251'), |
217 | + 'test_txt': test_txt.encode('cp1251'), |
218 | + }) |
219 | + self.assertEqualDiff(output, shouldbe) |
220 | + |
221 | |
222 | class DiffWasIs(diff.DiffPath): |
223 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alexander Belchenko wrote: /bugs.launchpad .net/bugs/ 382699
> Alexander Belchenko has proposed merging lp:~bialix/bzr/diff-header-encoding into lp:bzr.
>
> Requested reviews:
> John A Meinel (jameinel)
> bzr-core (bzr-core)
> Related bugs:
> #382699 diff headers should contain non-ascii filenames in user_encoding, not in utf-8
> https:/
>
>
> This branch ensure filenames in diff header on windows always will be in mbcs encoding. My next step is to add new command-line option ``--path-encoding`` to some commands to control that encoding.
>
> I would like to get some review on this work to ensure it's heading the right way. Also I would like to know: should I add --path-encoding option in this patch or it could be done as follow-on patch.
>
Looks good to me.
merge: approve
John
=:->
-----BEGIN PGP SIGNATURE----- enigmail. mozdev. org/
9gWwACgkQJdeBCY SNAANcCgCgreM1B HLlYxRjw4wl/ LikySLA 1tjn9w4KoJCHOFP UW
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
FVwAoL8H4Dq25YI
=P51i
-----END PGP SIGNATURE-----