Merge lp:~vila/bzr/799212-non-ascii-confs into lp:bzr

Proposed by Vincent Ladeuil
Status: Merged
Approved by: Jelmer Vernooij
Approved revision: no longer in the source branch.
Merged at revision: 5989
Proposed branch: lp:~vila/bzr/799212-non-ascii-confs
Merge into: lp:bzr
Diff against target: 261 lines (+131/-16)
4 files modified
bzrlib/config.py (+21/-11)
bzrlib/errors.py (+9/-0)
bzrlib/tests/test_config.py (+97/-5)
doc/en/release-notes/bzr-2.4.txt (+4/-0)
To merge this branch: bzr merge lp:~vila/bzr/799212-non-ascii-confs
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) Approve
Review via email: mp+65199@code.launchpad.net

Commit message

Clearer exceptions for bad-content config files

Description of the change

This complements the previous submission by making sure all
config files errors related to a non-utf8 or not parseable
content properly mention the appropriate path/url.

The least obvious part is that TransportConfig() objects have
been involved in some bugs and since it uses ConfigObj directly,
it escaped the exception cathcing that was happening in
IniBasedConfig.

To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) :
review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

sent to pqm by email

Revision history for this message
Martin Packman (gz) wrote :

This set of changes is what I had in mind.

As I understand it, bug 502060 is about TransportConfig lacking
try/except, then bug 688677 and bug 797246 are the same non-utf8 issue
in copied code. However I don't understand where bug 799212 comes in
or even what it is really from the report.

One change I'd suggest is making ConfigContentError take the
UnicodeError instance as a second param so the UI has the option of
giving more details like where in the file to look for the bad bytes.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> This set of changes is what I had in mind.

Good.

>
> As I understand it, bug 502060 is about TransportConfig lacking
> try/except,

Yes.

> then bug 688677 and bug 797246 are the same non-utf8 issue
> in copied code.

Not exactly but close enough.

> However I don't understand where bug 799212 comes in
> or even what it is really from the report.

Bah, that's just lp being helpful because I reused the same branch (see lp:bzr, a previous fix has already landed).

>
> One change I'd suggest is making ConfigContentError take the
> UnicodeError instance as a second param so the UI has the option of
> giving more details like where in the file to look for the bad bytes.

I thought about that even tried to point to the right line number.

But whatever trick you may want to do at this point you have to inspect the content of the file again. So you may as well split its content and try to decode each line (which I thought was a bit too much) so adding the UnicodeError doesn't add much (and I didn't want to dive into the various possible errors).

We may revisit that if we get new bug reports from people not getting the clue about the encoding.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/config.py'
--- bzrlib/config.py 2011-06-20 09:31:17 +0000
+++ bzrlib/config.py 2011-06-20 14:07:30 +0000
@@ -692,6 +692,8 @@
692 self._parser = ConfigObj(co_input, encoding='utf-8')692 self._parser = ConfigObj(co_input, encoding='utf-8')
693 except configobj.ConfigObjError, e:693 except configobj.ConfigObjError, e:
694 raise errors.ParseConfigError(e.errors, e.config.filename)694 raise errors.ParseConfigError(e.errors, e.config.filename)
695 except UnicodeDecodeError:
696 raise errors.ConfigContentError(self.file_name)
695 # Make sure self.reload() will use the right file name697 # Make sure self.reload() will use the right file name
696 self._parser.filename = self.file_name698 self._parser.filename = self.file_name
697 for hook in OldConfigHooks['load']:699 for hook in OldConfigHooks['load']:
@@ -1697,6 +1699,8 @@
1697 self._config = ConfigObj(self._input, encoding='utf-8')1699 self._config = ConfigObj(self._input, encoding='utf-8')
1698 except configobj.ConfigObjError, e:1700 except configobj.ConfigObjError, e:
1699 raise errors.ParseConfigError(e.errors, e.config.filename)1701 raise errors.ParseConfigError(e.errors, e.config.filename)
1702 except UnicodeError:
1703 raise errors.ConfigContentError(self._filename)
1700 return self._config1704 return self._config
17011705
1702 def _save(self):1706 def _save(self):
@@ -2178,12 +2182,21 @@
2178 except errors.NoSuchFile:2182 except errors.NoSuchFile:
2179 return StringIO()2183 return StringIO()
21802184
2185 def _external_url(self):
2186 return urlutils.join(self._transport.external_url(), self._filename)
2187
2181 def _get_configobj(self):2188 def _get_configobj(self):
2182 f = self._get_config_file()2189 f = self._get_config_file()
2183 try:2190 try:
2184 return ConfigObj(f, encoding='utf-8')2191 try:
2192 conf = ConfigObj(f, encoding='utf-8')
2193 except configobj.ConfigObjError, e:
2194 raise errors.ParseConfigError(e.errors, self._external_url())
2195 except UnicodeDecodeError:
2196 raise errors.ConfigContentError(self._external_url())
2185 finally:2197 finally:
2186 f.close()2198 f.close()
2199 return conf
21872200
2188 def _set_configobj(self, configobj):2201 def _set_configobj(self, configobj):
2189 out_file = StringIO()2202 out_file = StringIO()
@@ -2285,14 +2298,10 @@
2285 """Loads the Store from persistent storage."""2298 """Loads the Store from persistent storage."""
2286 raise NotImplementedError(self.load)2299 raise NotImplementedError(self.load)
22872300
2288 def _load_from_string(self, str_or_unicode):2301 def _load_from_string(self, bytes):
2289 """Create a store from a string in configobj syntax.2302 """Create a store from a string in configobj syntax.
22902303
2291 :param str_or_unicode: A string representing the file content. This will2304 :param bytes: A string representing the file content.
2292 be encoded to suit store needs internally.
2293
2294 This is for tests and should not be used in production unless a
2295 convincing use case can be demonstrated :)
2296 """2305 """
2297 raise NotImplementedError(self._load_from_string)2306 raise NotImplementedError(self._load_from_string)
22982307
@@ -2370,21 +2379,22 @@
2370 for hook in ConfigHooks['load']:2379 for hook in ConfigHooks['load']:
2371 hook(self)2380 hook(self)
23722381
2373 def _load_from_string(self, str_or_unicode):2382 def _load_from_string(self, bytes):
2374 """Create a config store from a string.2383 """Create a config store from a string.
23752384
2376 :param str_or_unicode: A string representing the file content. This will2385 :param bytes: A string representing the file content.
2377 be utf-8 encoded internally.
2378 """2386 """
2379 if self.is_loaded():2387 if self.is_loaded():
2380 raise AssertionError('Already loaded: %r' % (self._config_obj,))2388 raise AssertionError('Already loaded: %r' % (self._config_obj,))
2381 co_input = StringIO(str_or_unicode)2389 co_input = StringIO(bytes)
2382 try:2390 try:
2383 # The config files are always stored utf8-encoded2391 # The config files are always stored utf8-encoded
2384 self._config_obj = ConfigObj(co_input, encoding='utf-8')2392 self._config_obj = ConfigObj(co_input, encoding='utf-8')
2385 except configobj.ConfigObjError, e:2393 except configobj.ConfigObjError, e:
2386 self._config_obj = None2394 self._config_obj = None
2387 raise errors.ParseConfigError(e.errors, self.external_url())2395 raise errors.ParseConfigError(e.errors, self.external_url())
2396 except UnicodeDecodeError:
2397 raise errors.ConfigContentError(self.external_url())
23882398
2389 def save(self):2399 def save(self):
2390 if not self.is_loaded():2400 if not self.is_loaded():
23912401
=== modified file 'bzrlib/errors.py'
--- bzrlib/errors.py 2011-06-02 16:57:09 +0000
+++ bzrlib/errors.py 2011-06-20 14:07:30 +0000
@@ -1770,6 +1770,15 @@
1770 _fmt = "Working tree has conflicts."1770 _fmt = "Working tree has conflicts."
17711771
17721772
1773class ConfigContentError(BzrError):
1774
1775 _fmt = "Config file %(filename)s is not UTF-8 encoded\n"
1776
1777 def __init__(self, filename):
1778 BzrError.__init__(self)
1779 self.filename = filename
1780
1781
1773class ParseConfigError(BzrError):1782class ParseConfigError(BzrError):
17741783
1775 _fmt = "Error(s) parsing config file %(filename)s:\n%(errors)s"1784 _fmt = "Error(s) parsing config file %(filename)s:\n%(errors)s"
17761785
=== modified file 'bzrlib/tests/test_config.py'
--- bzrlib/tests/test_config.py 2011-06-20 09:31:17 +0000
+++ bzrlib/tests/test_config.py 2011-06-20 14:07:30 +0000
@@ -1871,9 +1871,34 @@
18711871
1872class TestTransportConfig(tests.TestCaseWithTransport):1872class TestTransportConfig(tests.TestCaseWithTransport):
18731873
1874 def test_load_utf8(self):
1875 """Ensure we can load an utf8-encoded file."""
1876 t = self.get_transport()
1877 unicode_user = u'b\N{Euro Sign}ar'
1878 unicode_content = u'user=%s' % (unicode_user,)
1879 utf8_content = unicode_content.encode('utf8')
1880 # Store the raw content in the config file
1881 t.put_bytes('foo.conf', utf8_content)
1882 conf = config.TransportConfig(t, 'foo.conf')
1883 self.assertEquals(unicode_user, conf.get_option('user'))
1884
1885 def test_load_non_ascii(self):
1886 """Ensure we display a proper error on non-ascii, non utf-8 content."""
1887 t = self.get_transport()
1888 t.put_bytes('foo.conf', 'user=foo\n#\xff\n')
1889 conf = config.TransportConfig(t, 'foo.conf')
1890 self.assertRaises(errors.ConfigContentError, conf._get_configobj)
1891
1892 def test_load_erroneous_content(self):
1893 """Ensure we display a proper error on content that can't be parsed."""
1894 t = self.get_transport()
1895 t.put_bytes('foo.conf', '[open_section\n')
1896 conf = config.TransportConfig(t, 'foo.conf')
1897 self.assertRaises(errors.ParseConfigError, conf._get_configobj)
1898
1874 def test_get_value(self):1899 def test_get_value(self):
1875 """Test that retreiving a value from a section is possible"""1900 """Test that retreiving a value from a section is possible"""
1876 bzrdir_config = config.TransportConfig(transport.get_transport('.'),1901 bzrdir_config = config.TransportConfig(self.get_transport('.'),
1877 'control.conf')1902 'control.conf')
1878 bzrdir_config.set_option('value', 'key', 'SECTION')1903 bzrdir_config.set_option('value', 'key', 'SECTION')
1879 bzrdir_config.set_option('value2', 'key2')1904 bzrdir_config.set_option('value2', 'key2')
@@ -2328,12 +2353,22 @@
2328 self.assertRaises(AssertionError, store._load_from_string, 'bar=baz')2353 self.assertRaises(AssertionError, store._load_from_string, 'bar=baz')
23292354
23302355
2331class TestBug799212(TestStore):2356class TestIniFileStoreContent(tests.TestCaseWithTransport):
23322357 """Simulate loading a config store without content of various encodings.
2333 def test_load_utf8(self):2358
2359 All files produced by bzr are in utf8 content.
2360
2361 Users may modify them manually and end up with a file that can't be
2362 loaded. We need to issue proper error messages in this case.
2363 """
2364
2365 invalid_utf8_char = '\xff'
2366
2367 def test_load_utf8(self):
2368 """Ensure we can load an utf8-encoded file."""
2334 t = self.get_transport()2369 t = self.get_transport()
2335 # From http://pad.lv/7992122370 # From http://pad.lv/799212
2336 unicode_user = u'Piotr O\u017carowski'2371 unicode_user = u'b\N{Euro Sign}ar'
2337 unicode_content = u'user=%s' % (unicode_user,)2372 unicode_content = u'user=%s' % (unicode_user,)
2338 utf8_content = unicode_content.encode('utf8')2373 utf8_content = unicode_content.encode('utf8')
2339 # Store the raw content in the config file2374 # Store the raw content in the config file
@@ -2343,6 +2378,58 @@
2343 stack = config.Stack([store.get_sections], store)2378 stack = config.Stack([store.get_sections], store)
2344 self.assertEquals(unicode_user, stack.get('user'))2379 self.assertEquals(unicode_user, stack.get('user'))
23452380
2381 def test_load_non_ascii(self):
2382 """Ensure we display a proper error on non-ascii, non utf-8 content."""
2383 t = self.get_transport()
2384 t.put_bytes('foo.conf', 'user=foo\n#%s\n' % (self.invalid_utf8_char,))
2385 store = config.IniFileStore(t, 'foo.conf')
2386 self.assertRaises(errors.ConfigContentError, store.load)
2387
2388 def test_load_erroneous_content(self):
2389 """Ensure we display a proper error on content that can't be parsed."""
2390 t = self.get_transport()
2391 t.put_bytes('foo.conf', '[open_section\n')
2392 store = config.IniFileStore(t, 'foo.conf')
2393 self.assertRaises(errors.ParseConfigError, store.load)
2394
2395
2396class TestIniConfigContent(tests.TestCaseWithTransport):
2397 """Simulate loading a IniBasedConfig without content of various encodings.
2398
2399 All files produced by bzr are in utf8 content.
2400
2401 Users may modify them manually and end up with a file that can't be
2402 loaded. We need to issue proper error messages in this case.
2403 """
2404
2405 invalid_utf8_char = '\xff'
2406
2407 def test_load_utf8(self):
2408 """Ensure we can load an utf8-encoded file."""
2409 # From http://pad.lv/799212
2410 unicode_user = u'b\N{Euro Sign}ar'
2411 unicode_content = u'user=%s' % (unicode_user,)
2412 utf8_content = unicode_content.encode('utf8')
2413 # Store the raw content in the config file
2414 with open('foo.conf', 'wb') as f:
2415 f.write(utf8_content)
2416 conf = config.IniBasedConfig(file_name='foo.conf')
2417 self.assertEquals(unicode_user, conf.get_user_option('user'))
2418
2419 def test_load_badly_encoded_content(self):
2420 """Ensure we display a proper error on non-ascii, non utf-8 content."""
2421 with open('foo.conf', 'wb') as f:
2422 f.write('user=foo\n#%s\n' % (self.invalid_utf8_char,))
2423 conf = config.IniBasedConfig(file_name='foo.conf')
2424 self.assertRaises(errors.ConfigContentError, conf._get_parser)
2425
2426 def test_load_erroneous_content(self):
2427 """Ensure we display a proper error on content that can't be parsed."""
2428 with open('foo.conf', 'wb') as f:
2429 f.write('[open_section\n')
2430 conf = config.IniBasedConfig(file_name='foo.conf')
2431 self.assertRaises(errors.ParseConfigError, conf._get_parser)
2432
23462433
2347class TestMutableStore(TestStore):2434class TestMutableStore(TestStore):
23482435
@@ -3052,6 +3139,11 @@
3052 self.assertEquals({}, conf._get_config())3139 self.assertEquals({}, conf._get_config())
3053 self._got_user_passwd(None, None, conf, 'http', 'foo.net')3140 self._got_user_passwd(None, None, conf, 'http', 'foo.net')
30543141
3142 def test_non_utf8_config(self):
3143 conf = config.AuthenticationConfig(_file=StringIO(
3144 'foo = bar\xff'))
3145 self.assertRaises(errors.ConfigContentError, conf._get_config)
3146
3055 def test_missing_auth_section_header(self):3147 def test_missing_auth_section_header(self):
3056 conf = config.AuthenticationConfig(_file=StringIO('foo = bar'))3148 conf = config.AuthenticationConfig(_file=StringIO('foo = bar'))
3057 self.assertRaises(ValueError, conf.get_credentials, 'ftp', 'foo.net')3149 self.assertRaises(ValueError, conf.get_credentials, 'ftp', 'foo.net')
30583150
=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- doc/en/release-notes/bzr-2.4.txt 2011-06-20 09:31:17 +0000
+++ doc/en/release-notes/bzr-2.4.txt 2011-06-20 14:07:30 +0000
@@ -29,6 +29,10 @@
29Bug Fixes29Bug Fixes
30*********30*********
3131
32* Display a proper error message when a config file content cannot be
33 decoded as UTF-8 or when it cannot be parsed.
34 (Vincent Ladeuil, #502060, #688677, #792246)
35
32* Properly load utf8-encoded config files. (Vincent Ladeuil, #799212)36* Properly load utf8-encoded config files. (Vincent Ladeuil, #799212)
3337
34.. Fixes for situations where bzr would previously crash or give incorrect38.. Fixes for situations where bzr would previously crash or give incorrect