Merge lp:~songofacandy/bzr/user_encoding_utf8 into lp:bzr/2.5

Proposed by INADA Naoki
Status: Rejected
Rejected by: Martin Packman
Proposed branch: lp:~songofacandy/bzr/user_encoding_utf8
Merge into: lp:bzr/2.5
Diff against target: 136 lines (+39/-9)
3 files modified
bzrlib/builtins.py (+3/-3)
bzrlib/msgeditor.py (+18/-6)
bzrlib/tests/test_msgeditor.py (+18/-0)
To merge this branch: bzr merge lp:~songofacandy/bzr/user_encoding_utf8
Reviewer Review Type Date Requested Status
Martin Packman (community) Disapprove
Vincent Ladeuil Needs Fixing
Review via email: mp+90039@code.launchpad.net

Description of the change

Add ``editor_encoding`` configuration to allow writing non-ascii commit message without locale setting.

To post a comment you must log in.
Revision history for this message
Alexander Belchenko (bialix) wrote :

INADA Naoki пишет:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/user_encoding_utf8 into lp:bzr/2.5.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/user_encoding_utf8/+merge/90039
>
> * osutils.get_user_encoding() fallbacks to utf8 instead of ascii.
> * Add BZR_USER_ENCODING envvar to override user encoding.

The latter sounds great!

I've spotted small typo in your patch though:

+ if old_env is not None:
+ os.envirion['BZR_USER_ENCODING'] = old_env

should be os.environ.

But I think bzrlib.osutils already has special method to set/unset
environment variables, and in the tests you should use it instead of
manual manipulations.

--
All the dude wanted was his rug back

Revision history for this message
INADA Naoki (songofacandy) wrote :

Thank you.
Can I use osutils' method in test for osutils?

Revision history for this message
Alexander Belchenko (bialix) wrote :

INADA Naoki пишет:
> Thank you.
> Can I use osutils' method in test for osutils?

Yes, why not?

--
All the dude wanted was his rug back

Revision history for this message
Martin Pool (mbp) wrote :

That's really nice, thanks.

It would be nice if this could also check a variable in the global configuration, so that people can set it permanently in bazaar.conf. Some people may find that easier than getting an environment variable set, especially on windows or on remote machines. But, we could come back and do that later. Doing it from config may cause some bootstrapping issues, as bzr needs to know the encoding pretty early on, probably before it loads any configuration.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> That's really nice, thanks.
>
> It would be nice if this could also check a variable in the global
> configuration, so that people can set it permanently in bazaar.conf. Some
> people may find that easier than getting an environment variable set,
> especially on windows or on remote machines. But, we could come back and do
> that later.

Even better would be to use only a config variable since env variables are not universal as you point out. In this precise case there (use encoding) I can't think of a use case where an env variable will be more appropriate.

> Doing it from config may cause some bootstrapping issues, as bzr
> needs to know the encoding pretty early on, probably before it loads any
> configuration.

I don't think so, config files are encoded in utf-8 only and cannot be encoded into anything else.

63 + old_env = os.environ.get('BZR_USER_ENCODING')
64 + def cleanup():
65 + osutils.set_or_unset_env('BZR_USER_ENCODING', old_env)
66 + self.addCleanup(cleanup)

As mentioned by bialix, the idiom here should be:

   self.overrideEnv('BZR_USER_ENCODING', <new value>)

But since osutils.get_user cache its result, this won't work reliably you also need:

   self.overrideAttr(osutils, '_cached_user_encoding', None)

So anyway, better switch to config variables.

review: Needs Fixing
Revision history for this message
INADA Naoki (songofacandy) wrote :

OK, I'll change to use global config.
What name is good for the option? "user_encoding" or just "encoding"?

Revision history for this message
Martin Packman (gz) wrote :

Treating ascii from the posix locale as utf-8 may be okay, but adding a mechanism outside locale to override this will break things I'm afraid.

This is essentially the argument python-dev were making against allowing any kind of overriding, which is overly conservative, but get_user_encoding is the way bzr finds out what actual encoding the rest of the system is using. On windows for instance, letting someone override this to a different value from what GetACP returns can corrupt the environment block and crash the process. On posix, you'd get unicode decode errors trying to interpret localised error messages, and pass incompatible bytestrings down into libraries and on to other processes.

For your original problem, of wanting to enter a utf-8 commit message in an editor when LANG=C, I think we just need better logic in that particular area. There's little reason not to just try utf-8 first, catch the error, and then try the locale encoding. I don't like that kind of heuristic much, but it will do the right thing more often than the current code.

review: Disapprove
Revision history for this message
INADA Naoki (songofacandy) wrote :

Is there any example of test using GLobalConfig?
set_user_option() changes my bazaar.conf...

Revision history for this message
INADA Naoki (songofacandy) wrote :

> Treating ascii from the posix locale as utf-8 may be okay, but adding a
> mechanism outside locale to override this will break things I'm afraid.

OKey, I've reverted overriding mechanism. Just use utf-8 when posix locale is ascii.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> Is there any example of test using GLobalConfig?
> set_user_option() changes my bazaar.conf...

Your test(s) inherited from TestCase instead of TestCaseInTempDir ?

Search for GlobalStack instead of GlobalConfig and use the new config implementation (feedback on whether the doc in doc/developers/configuration.txt is highly welcome).

6459. By Patch Queue Manager <email address hidden>

(jelmer) Fix the default target name in 'bzr branch' when there is no
 selected colocated branch. (Jelmer Vernooij)

6460. By Patch Queue Manager <email address hidden>

(jelmer) Add utility code for opening branches with a specific policy for
 URLs to access. (Jelmer Vernooij)

6461. By Patch Queue Manager <email address hidden>

(jelmer) Fix 'bzr send' in a treeless branch. (Jelmer Vernooij)

6462. By Patch Queue Manager <email address hidden>

(jelmer) Force lightweight checkouts for colocated branches. (Jelmer
 Vernooij)

6463. By Patch Queue Manager <email address hidden>

(jelmer) Support forward slashes in colocated branch names. (Jelmer Vernooij)

6464. By Patch Queue Manager <email address hidden>

(gz) Close files more promptly in tests to avoid issues on pypy (Wouter van
 Heyst)

6465. By Patch Queue Manager <email address hidden>

(jelmer) Merge colocated branch support back into the 2a format. (Jelmer
 Vernooij)

6466. By Patch Queue Manager <email address hidden>

(jelmer) Raise NotBranchError from BzrDir.destroy_branch. (Jelmer Vernooij)

6467. By Patch Queue Manager <email address hidden>

(vila) We'll release a sixth beta before 2.5.0 (Vincent Ladeuil)

6468. By Patch Queue Manager <email address hidden>

(vila) Provide an ``ssl.ca_certs`` default value for supported platforms.
 (Vincent Ladeuil)

6469. By Patch Queue Manager <email address hidden>

(gz) Store the path to .bzr.log as unicode so it can be safely printed by
 `bzr version` (Martin Packman)

6470. By Patch Queue Manager <email address hidden>

(gz) Make config.config_dir and related functions consistenly return unicode
 (Martin Packman)

6471. By Patch Queue Manager <email address hidden>

(vila) Release bzr-2.5b6 (Vincent Ladeuil)

6472. By Patch Queue Manager <email address hidden>

(vila) Open 2.5.0 for bugfixes (second attempt) (Vincent Ladeuil)

6473. By INADA Naoki

Add ``editor_encoding`` configuration to specify encoding to decode commit message from file.

6474. By INADA Naoki

Add tests for editor_encoding.

Revision history for this message
INADA Naoki (songofacandy) wrote :

I've rewote my branch to use special configuration variable and
don't change osutils.get_user_encoding().
I think this is most conservative way.

Revision history for this message
Martin Packman (gz) wrote :

I'm still torn on this, we do have a bunch of encoding related bugs in bazaar that have yet to be fixed, but I don't like proliferating global config options for things like file contents where we can get a good default from the OS but may get arbitrary bytes anyway. Commit messages from file are particularly fun because we may put bytes from the files as diffs which again could be whatever.

Unmerged revisions

6474. By INADA Naoki

Add tests for editor_encoding.

6473. By INADA Naoki

Add ``editor_encoding`` configuration to specify encoding to decode commit message from file.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2012-01-31 15:43:17 +0000
+++ bzrlib/builtins.py 2012-02-03 17:53:19 +0000
@@ -3557,6 +3557,7 @@
3557 from bzrlib.msgeditor import (3557 from bzrlib.msgeditor import (
3558 edit_commit_message_encoded,3558 edit_commit_message_encoded,
3559 generate_commit_message_template,3559 generate_commit_message_template,
3560 get_editor_encoding,
3560 make_commit_message_template_encoded,3561 make_commit_message_template_encoded,
3561 set_commit_message,3562 set_commit_message,
3562 )3563 )
@@ -3614,7 +3615,7 @@
3614 if file:3615 if file:
3615 f = open(file)3616 f = open(file)
3616 try:3617 try:
3617 my_message = f.read().decode(osutils.get_user_encoding())3618 my_message = f.read().decode(get_editor_encoding())
3618 finally:3619 finally:
3619 f.close()3620 f.close()
3620 elif message is not None:3621 elif message is not None:
@@ -3623,8 +3624,7 @@
3623 # No message supplied: make one up.3624 # No message supplied: make one up.
3624 # text is the status of the tree3625 # text is the status of the tree
3625 text = make_commit_message_template_encoded(tree,3626 text = make_commit_message_template_encoded(tree,
3626 selected_list, diff=show_diff,3627 selected_list, diff=show_diff)
3627 output_encoding=osutils.get_user_encoding())
3628 # start_message is the template generated from hooks3628 # start_message is the template generated from hooks
3629 # XXX: Warning - looks like hooks return unicode,3629 # XXX: Warning - looks like hooks return unicode,
3630 # make_commit_message_template_encoded returns user encoding.3630 # make_commit_message_template_encoded returns user encoding.
36313631
=== modified file 'bzrlib/msgeditor.py'
--- bzrlib/msgeditor.py 2011-12-19 13:23:58 +0000
+++ bzrlib/msgeditor.py 2012-02-03 17:53:19 +0000
@@ -58,6 +58,14 @@
58 yield editor, None58 yield editor, None
5959
6060
61def get_editor_encoding():
62 """Return a encoding used when editing commit message file with editor."""
63 e = config.GlobalStack().get('editor_encoding')
64 if e is not None:
65 return e
66 return osutils.get_user_encoding()
67
68
61def _run_editor(filename):69def _run_editor(filename):
62 """Try to execute an editor to edit the commit message."""70 """Try to execute an editor to edit the commit message."""
63 for candidate, candidate_source in _get_editor():71 for candidate, candidate_source in _get_editor():
@@ -110,10 +118,11 @@
110118
111 :return: commit message or None.119 :return: commit message or None.
112 """120 """
113121 msgencoding = get_editor_encoding()
114 if not start_message is None:122 if start_message is not None:
115 start_message = start_message.encode(osutils.get_user_encoding())123 start_message = start_message.encode(msgencoding)
116 infotext = infotext.encode(osutils.get_user_encoding(), 'replace')124 infotext = infotext.encode(msgencoding, 'replace')
125 # TODO: Add info to tell user which encoding is used.
117 return edit_commit_message_encoded(infotext, ignoreline, start_message)126 return edit_commit_message_encoded(infotext, ignoreline, start_message)
118127
119128
@@ -139,6 +148,7 @@
139 :return: commit message or None.148 :return: commit message or None.
140 """149 """
141 msgfilename = None150 msgfilename = None
151 msgencoding = get_editor_encoding()
142 try:152 try:
143 msgfilename, hasinfo = _create_temp_file_with_commit_template(153 msgfilename, hasinfo = _create_temp_file_with_commit_template(
144 infotext, ignoreline, start_message)154 infotext, ignoreline, start_message)
@@ -167,7 +177,7 @@
167 f = file(msgfilename, 'rU')177 f = file(msgfilename, 'rU')
168 try:178 try:
169 try:179 try:
170 for line in codecs.getreader(osutils.get_user_encoding())(f):180 for line in codecs.getreader(msgencoding)(f):
171 stripped_line = line.strip()181 stripped_line = line.strip()
172 # strip empty line before the log message starts182 # strip empty line before the log message starts
173 if not started:183 if not started:
@@ -264,7 +274,7 @@
264274
265275
266def make_commit_message_template_encoded(working_tree, specific_files,276def make_commit_message_template_encoded(working_tree, specific_files,
267 diff=None, output_encoding='utf-8'):277 diff=None, output_encoding=None):
268 """Prepare a template file for a commit into a branch.278 """Prepare a template file for a commit into a branch.
269279
270 Returns an encoded string.280 Returns an encoded string.
@@ -277,6 +287,8 @@
277 from StringIO import StringIO # must be unicode-safe287 from StringIO import StringIO # must be unicode-safe
278 from bzrlib.diff import show_diff_trees288 from bzrlib.diff import show_diff_trees
279289
290 if output_encoding is None:
291 output_encoding = get_editor_encoding()
280 template = make_commit_message_template(working_tree, specific_files)292 template = make_commit_message_template(working_tree, specific_files)
281 template = template.encode(output_encoding, "replace")293 template = template.encode(output_encoding, "replace")
282294
283295
=== modified file 'bzrlib/tests/test_msgeditor.py'
--- bzrlib/tests/test_msgeditor.py 2011-11-16 17:19:13 +0000
+++ bzrlib/tests/test_msgeditor.py 2012-02-03 17:53:19 +0000
@@ -267,6 +267,12 @@
267 self.assertEqual(['/usr/bin/editor', 'vi', 'pico', 'nano', 'joe'],267 self.assertEqual(['/usr/bin/editor', 'vi', 'pico', 'nano', 'joe'],
268 editors[4:])268 editors[4:])
269269
270 def test_get_editor_encoding(self):
271 conf = config.GlobalStack()
272 conf.store._load_from_string('[DEFAULT]\neditor_encoding = dummy\n')
273 conf.store.save()
274 encoding = msgeditor.get_editor_encoding()
275 self.assertEqual('dummy', encoding)
270276
271 def test__run_editor_EACCES(self):277 def test__run_editor_EACCES(self):
272 """If running a configured editor raises EACESS, the user is warned."""278 """If running a configured editor raises EACESS, the user is warned."""
@@ -345,6 +351,18 @@
345 self.assertRaises(errors.BadCommitMessageEncoding,351 self.assertRaises(errors.BadCommitMessageEncoding,
346 msgeditor.edit_commit_message, '')352 msgeditor.edit_commit_message, '')
347353
354 def test_configured_encoding_commit_message(self):
355 self.make_fake_editor(u'\u1234\n'.encode('utf-8'))
356 self.overrideEnv('LANG', 'C')
357 conf = config.GlobalStack()
358 conf.store._load_from_string('[DEFAULT]\neditor_encoding = utf-8')
359 conf.store.save()
360
361 working_tree = self.make_uncommitted_tree()
362
363 msg = edit_commit_message_encoded(u'\u1235'.encode("utf-8"))
364 self.assertEqual(u'\u1234\n', msg)
365
348 def test_set_commit_message_no_hooks(self):366 def test_set_commit_message_no_hooks(self):
349 commit_obj = commit.Commit()367 commit_obj = commit.Commit()
350 self.assertIs(None,368 self.assertIs(None,

Subscribers

People subscribed via source and target branches