Merge lp:~songofacandy/bzr/user_encoding_utf8 into lp:bzr/2.5

Proposed by INADA Naoki
Status: Rejected
Rejected by: Martin Packman
Proposed branch: lp:~songofacandy/bzr/user_encoding_utf8
Merge into: lp:bzr/2.5
Diff against target: 136 lines (+39/-9)
3 files modified
bzrlib/builtins.py (+3/-3)
bzrlib/msgeditor.py (+18/-6)
bzrlib/tests/test_msgeditor.py (+18/-0)
To merge this branch: bzr merge lp:~songofacandy/bzr/user_encoding_utf8
Reviewer Review Type Date Requested Status
Martin Packman (community) Disapprove
Vincent Ladeuil Needs Fixing
Review via email: mp+90039@code.launchpad.net

Description of the change

Add ``editor_encoding`` configuration to allow writing non-ascii commit message without locale setting.

To post a comment you must log in.
Revision history for this message
Alexander Belchenko (bialix) wrote :

INADA Naoki пишет:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/user_encoding_utf8 into lp:bzr/2.5.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/user_encoding_utf8/+merge/90039
>
> * osutils.get_user_encoding() fallbacks to utf8 instead of ascii.
> * Add BZR_USER_ENCODING envvar to override user encoding.

The latter sounds great!

I've spotted small typo in your patch though:

+ if old_env is not None:
+ os.envirion['BZR_USER_ENCODING'] = old_env

should be os.environ.

But I think bzrlib.osutils already has special method to set/unset
environment variables, and in the tests you should use it instead of
manual manipulations.

--
All the dude wanted was his rug back

Revision history for this message
INADA Naoki (songofacandy) wrote :

Thank you.
Can I use osutils' method in test for osutils?

Revision history for this message
Alexander Belchenko (bialix) wrote :

INADA Naoki пишет:
> Thank you.
> Can I use osutils' method in test for osutils?

Yes, why not?

--
All the dude wanted was his rug back

Revision history for this message
Martin Pool (mbp) wrote :

That's really nice, thanks.

It would be nice if this could also check a variable in the global configuration, so that people can set it permanently in bazaar.conf. Some people may find that easier than getting an environment variable set, especially on windows or on remote machines. But, we could come back and do that later. Doing it from config may cause some bootstrapping issues, as bzr needs to know the encoding pretty early on, probably before it loads any configuration.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> That's really nice, thanks.
>
> It would be nice if this could also check a variable in the global
> configuration, so that people can set it permanently in bazaar.conf. Some
> people may find that easier than getting an environment variable set,
> especially on windows or on remote machines. But, we could come back and do
> that later.

Even better would be to use only a config variable since env variables are not universal as you point out. In this precise case there (use encoding) I can't think of a use case where an env variable will be more appropriate.

> Doing it from config may cause some bootstrapping issues, as bzr
> needs to know the encoding pretty early on, probably before it loads any
> configuration.

I don't think so, config files are encoded in utf-8 only and cannot be encoded into anything else.

63 + old_env = os.environ.get('BZR_USER_ENCODING')
64 + def cleanup():
65 + osutils.set_or_unset_env('BZR_USER_ENCODING', old_env)
66 + self.addCleanup(cleanup)

As mentioned by bialix, the idiom here should be:

   self.overrideEnv('BZR_USER_ENCODING', <new value>)

But since osutils.get_user cache its result, this won't work reliably you also need:

   self.overrideAttr(osutils, '_cached_user_encoding', None)

So anyway, better switch to config variables.

review: Needs Fixing
Revision history for this message
INADA Naoki (songofacandy) wrote :

OK, I'll change to use global config.
What name is good for the option? "user_encoding" or just "encoding"?

Revision history for this message
Martin Packman (gz) wrote :

Treating ascii from the posix locale as utf-8 may be okay, but adding a mechanism outside locale to override this will break things I'm afraid.

This is essentially the argument python-dev were making against allowing any kind of overriding, which is overly conservative, but get_user_encoding is the way bzr finds out what actual encoding the rest of the system is using. On windows for instance, letting someone override this to a different value from what GetACP returns can corrupt the environment block and crash the process. On posix, you'd get unicode decode errors trying to interpret localised error messages, and pass incompatible bytestrings down into libraries and on to other processes.

For your original problem, of wanting to enter a utf-8 commit message in an editor when LANG=C, I think we just need better logic in that particular area. There's little reason not to just try utf-8 first, catch the error, and then try the locale encoding. I don't like that kind of heuristic much, but it will do the right thing more often than the current code.

review: Disapprove
Revision history for this message
INADA Naoki (songofacandy) wrote :

Is there any example of test using GLobalConfig?
set_user_option() changes my bazaar.conf...

Revision history for this message
INADA Naoki (songofacandy) wrote :

> Treating ascii from the posix locale as utf-8 may be okay, but adding a
> mechanism outside locale to override this will break things I'm afraid.

OKey, I've reverted overriding mechanism. Just use utf-8 when posix locale is ascii.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> Is there any example of test using GLobalConfig?
> set_user_option() changes my bazaar.conf...

Your test(s) inherited from TestCase instead of TestCaseInTempDir ?

Search for GlobalStack instead of GlobalConfig and use the new config implementation (feedback on whether the doc in doc/developers/configuration.txt is highly welcome).

6459. By Patch Queue Manager <email address hidden>

(jelmer) Fix the default target name in 'bzr branch' when there is no
 selected colocated branch. (Jelmer Vernooij)

6460. By Patch Queue Manager <email address hidden>

(jelmer) Add utility code for opening branches with a specific policy for
 URLs to access. (Jelmer Vernooij)

6461. By Patch Queue Manager <email address hidden>

(jelmer) Fix 'bzr send' in a treeless branch. (Jelmer Vernooij)

6462. By Patch Queue Manager <email address hidden>

(jelmer) Force lightweight checkouts for colocated branches. (Jelmer
 Vernooij)

6463. By Patch Queue Manager <email address hidden>

(jelmer) Support forward slashes in colocated branch names. (Jelmer Vernooij)

6464. By Patch Queue Manager <email address hidden>

(gz) Close files more promptly in tests to avoid issues on pypy (Wouter van
 Heyst)

6465. By Patch Queue Manager <email address hidden>

(jelmer) Merge colocated branch support back into the 2a format. (Jelmer
 Vernooij)

6466. By Patch Queue Manager <email address hidden>

(jelmer) Raise NotBranchError from BzrDir.destroy_branch. (Jelmer Vernooij)

6467. By Patch Queue Manager <email address hidden>

(vila) We'll release a sixth beta before 2.5.0 (Vincent Ladeuil)

6468. By Patch Queue Manager <email address hidden>

(vila) Provide an ``ssl.ca_certs`` default value for supported platforms.
 (Vincent Ladeuil)

6469. By Patch Queue Manager <email address hidden>

(gz) Store the path to .bzr.log as unicode so it can be safely printed by
 `bzr version` (Martin Packman)

6470. By Patch Queue Manager <email address hidden>

(gz) Make config.config_dir and related functions consistenly return unicode
 (Martin Packman)

6471. By Patch Queue Manager <email address hidden>

(vila) Release bzr-2.5b6 (Vincent Ladeuil)

6472. By Patch Queue Manager <email address hidden>

(vila) Open 2.5.0 for bugfixes (second attempt) (Vincent Ladeuil)

6473. By INADA Naoki

Add ``editor_encoding`` configuration to specify encoding to decode commit message from file.

6474. By INADA Naoki

Add tests for editor_encoding.

Revision history for this message
INADA Naoki (songofacandy) wrote :

I've rewote my branch to use special configuration variable and
don't change osutils.get_user_encoding().
I think this is most conservative way.

Revision history for this message
Martin Packman (gz) wrote :

I'm still torn on this, we do have a bunch of encoding related bugs in bazaar that have yet to be fixed, but I don't like proliferating global config options for things like file contents where we can get a good default from the OS but may get arbitrary bytes anyway. Commit messages from file are particularly fun because we may put bytes from the files as diffs which again could be whatever.

Unmerged revisions

6474. By INADA Naoki

Add tests for editor_encoding.

6473. By INADA Naoki

Add ``editor_encoding`` configuration to specify encoding to decode commit message from file.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/builtins.py'
2--- bzrlib/builtins.py 2012-01-31 15:43:17 +0000
3+++ bzrlib/builtins.py 2012-02-03 17:53:19 +0000
4@@ -3557,6 +3557,7 @@
5 from bzrlib.msgeditor import (
6 edit_commit_message_encoded,
7 generate_commit_message_template,
8+ get_editor_encoding,
9 make_commit_message_template_encoded,
10 set_commit_message,
11 )
12@@ -3614,7 +3615,7 @@
13 if file:
14 f = open(file)
15 try:
16- my_message = f.read().decode(osutils.get_user_encoding())
17+ my_message = f.read().decode(get_editor_encoding())
18 finally:
19 f.close()
20 elif message is not None:
21@@ -3623,8 +3624,7 @@
22 # No message supplied: make one up.
23 # text is the status of the tree
24 text = make_commit_message_template_encoded(tree,
25- selected_list, diff=show_diff,
26- output_encoding=osutils.get_user_encoding())
27+ selected_list, diff=show_diff)
28 # start_message is the template generated from hooks
29 # XXX: Warning - looks like hooks return unicode,
30 # make_commit_message_template_encoded returns user encoding.
31
32=== modified file 'bzrlib/msgeditor.py'
33--- bzrlib/msgeditor.py 2011-12-19 13:23:58 +0000
34+++ bzrlib/msgeditor.py 2012-02-03 17:53:19 +0000
35@@ -58,6 +58,14 @@
36 yield editor, None
37
38
39+def get_editor_encoding():
40+ """Return a encoding used when editing commit message file with editor."""
41+ e = config.GlobalStack().get('editor_encoding')
42+ if e is not None:
43+ return e
44+ return osutils.get_user_encoding()
45+
46+
47 def _run_editor(filename):
48 """Try to execute an editor to edit the commit message."""
49 for candidate, candidate_source in _get_editor():
50@@ -110,10 +118,11 @@
51
52 :return: commit message or None.
53 """
54-
55- if not start_message is None:
56- start_message = start_message.encode(osutils.get_user_encoding())
57- infotext = infotext.encode(osutils.get_user_encoding(), 'replace')
58+ msgencoding = get_editor_encoding()
59+ if start_message is not None:
60+ start_message = start_message.encode(msgencoding)
61+ infotext = infotext.encode(msgencoding, 'replace')
62+ # TODO: Add info to tell user which encoding is used.
63 return edit_commit_message_encoded(infotext, ignoreline, start_message)
64
65
66@@ -139,6 +148,7 @@
67 :return: commit message or None.
68 """
69 msgfilename = None
70+ msgencoding = get_editor_encoding()
71 try:
72 msgfilename, hasinfo = _create_temp_file_with_commit_template(
73 infotext, ignoreline, start_message)
74@@ -167,7 +177,7 @@
75 f = file(msgfilename, 'rU')
76 try:
77 try:
78- for line in codecs.getreader(osutils.get_user_encoding())(f):
79+ for line in codecs.getreader(msgencoding)(f):
80 stripped_line = line.strip()
81 # strip empty line before the log message starts
82 if not started:
83@@ -264,7 +274,7 @@
84
85
86 def make_commit_message_template_encoded(working_tree, specific_files,
87- diff=None, output_encoding='utf-8'):
88+ diff=None, output_encoding=None):
89 """Prepare a template file for a commit into a branch.
90
91 Returns an encoded string.
92@@ -277,6 +287,8 @@
93 from StringIO import StringIO # must be unicode-safe
94 from bzrlib.diff import show_diff_trees
95
96+ if output_encoding is None:
97+ output_encoding = get_editor_encoding()
98 template = make_commit_message_template(working_tree, specific_files)
99 template = template.encode(output_encoding, "replace")
100
101
102=== modified file 'bzrlib/tests/test_msgeditor.py'
103--- bzrlib/tests/test_msgeditor.py 2011-11-16 17:19:13 +0000
104+++ bzrlib/tests/test_msgeditor.py 2012-02-03 17:53:19 +0000
105@@ -267,6 +267,12 @@
106 self.assertEqual(['/usr/bin/editor', 'vi', 'pico', 'nano', 'joe'],
107 editors[4:])
108
109+ def test_get_editor_encoding(self):
110+ conf = config.GlobalStack()
111+ conf.store._load_from_string('[DEFAULT]\neditor_encoding = dummy\n')
112+ conf.store.save()
113+ encoding = msgeditor.get_editor_encoding()
114+ self.assertEqual('dummy', encoding)
115
116 def test__run_editor_EACCES(self):
117 """If running a configured editor raises EACESS, the user is warned."""
118@@ -345,6 +351,18 @@
119 self.assertRaises(errors.BadCommitMessageEncoding,
120 msgeditor.edit_commit_message, '')
121
122+ def test_configured_encoding_commit_message(self):
123+ self.make_fake_editor(u'\u1234\n'.encode('utf-8'))
124+ self.overrideEnv('LANG', 'C')
125+ conf = config.GlobalStack()
126+ conf.store._load_from_string('[DEFAULT]\neditor_encoding = utf-8')
127+ conf.store.save()
128+
129+ working_tree = self.make_uncommitted_tree()
130+
131+ msg = edit_commit_message_encoded(u'\u1235'.encode("utf-8"))
132+ self.assertEqual(u'\u1234\n', msg)
133+
134 def test_set_commit_message_no_hooks(self):
135 commit_obj = commit.Commit()
136 self.assertIs(None,

Subscribers

People subscribed via source and target branches