Proposed by Scott Moser on 2018-02-22
Status: Merged
Merged at revision: 46cb6716c27d4496ce3d2bea7684803f522f277d
Proposed branch: ~smoser/cloud-init:bug/1751051-subp-encode-with-utf8
Merge into: cloud-init:master
Diff against target: 72 lines (+40/-1)
2 files modified
cloudinit/ (+6/-1)
tests/unittests/ (+34/-0)
Reviewer            Review Type             Date Requested  Status
Server Team CI bot  continuous-integration                  Approve on 2018-02-23
Ryan Harper                                 2018-02-22      Approve on 2018-02-23
Commit message

subp: Fix subp usage with non-ascii characters when no system locale.

If python starts up without a locale set, then its default encoding
ends up set as ascii. That is not easily changed with the likes of
setlocale. In order to avoid UnicodeDecodeErrors cloud-init will
encode to bytes a python3 string or python2 basestring so that the
values passed to Popen are already bytes.

LP: #1751051

Description of the change

see commit message
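
A minimal Python 3 sketch of the approach the commit message describes: encode every string argument to UTF-8 bytes before it reaches Popen, so the interpreter's locale-derived default encoding is never consulted. The helper names `encode_args` and `run` are hypothetical and exist only for illustration; the merged change does this inline in `subp` and uses `six.binary_type` for python2 compatibility.

```python
import subprocess


def encode_args(args, encoding="utf-8"):
    """Return args with every str entry encoded to bytes.

    Entries that are already bytes are passed through unchanged.
    (Hypothetical helper, not part of cloud-init.)
    """
    return [a if isinstance(a, bytes) else a.encode(encoding) for a in args]


def run(args):
    """Run a command with pre-encoded arguments and return its raw stdout."""
    proc = subprocess.Popen(encode_args(args), stdout=subprocess.PIPE)
    out, _err = proc.communicate()
    return out
```

Because the arguments are already bytes when Popen receives them, no implicit str-to-bytes conversion takes place, which is the step that can fail under an ascii default encoding.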

FAILED: Continuous integration, rev:927423ed11f3e2695384521a4ffe8c54df153a75
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

review: Needs Fixing (continuous-integration)

FAILED: Continuous integration, rev:529ac9d19a1d73807efb956ae7de1ece9f51c0f9
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

review: Needs Fixing (continuous-integration)
Ryan Harper (raharper) wrote :

Nifty test, and good change. The comment doesn't quite match the code;

'cloud-init will decode'

should be 'cloud-init will encode'


Scott Moser (smoser) wrote :

yeah, good catch. updated commit message and the comment also.

Ryan Harper (raharper) :
review: Approve

PASSED: Continuous integration, rev:6f9d258eda583a4250182bb2cb1ab6af93c83383
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

review: Approve (continuous-integration)

diff --git a/cloudinit/ b/cloudinit/
index 338fb97..5a919cf 100644
--- a/cloudinit/
+++ b/cloudinit/
@@ -1865,8 +1865,13 @@ def subp(args, data=None, rcs=None, env=None, capture=True, shell=False,
         if not isinstance(data, bytes):
             data = data.encode()
 
+    # Popen converts entries in the arguments array from non-bytes to bytes.
+    # When locale is unset it may use ascii for that encoding which can
+    # cause UnicodeDecodeErrors. (LP: #1751051)
+    bytes_args = [x if isinstance(x, six.binary_type) else x.encode("utf-8")
+                  for x in args]
     try:
-        sp = subprocess.Popen(args, stdout=stdout,
+        sp = subprocess.Popen(bytes_args, stdout=stdout,
                               stderr=stderr, stdin=stdin,
                               env=env, shell=shell)
         (out, err) = sp.communicate(data)
diff --git a/tests/unittests/ b/tests/unittests/
index 4a92e74..89ae40f 100644
--- a/tests/unittests/
+++ b/tests/unittests/
@@ -8,7 +8,9 @@ import shutil
 import stat
 import tempfile
 
+import json
 import six
+import sys
 import yaml
 
 from cloudinit import importer, util
@@ -733,6 +735,38 @@ class TestSubp(helpers.CiTestCase):
         self.assertEqual("/target/my/path/",
                          util.target_path("/target/", "///my/path/"))
 
+    def test_c_lang_can_take_utf8_args(self):
+        """Independent of system LC_CTYPE, args can contain utf-8 strings.
+
+        When python starts up, its default encoding gets set based on
+        the value of LC_CTYPE. If no system locale is set, the default
+        encoding for both python2 and python3 in some paths will end up
+        being ascii.
+
+        Attempts to use setlocale or patching (or changing) os.environ
+        in the current environment seem to not be effective.
+
+        This test starts up a python with LC_CTYPE set to C so that
+        the default encoding will be set to ascii. In such an environment
+        Popen(['command', 'non-ascii-arg']) would cause a UnicodeDecodeError.
+        """
+        python_prog = '\n'.join([
+            'import json, sys',
+            'from cloudinit.util import subp',
+            'data =',
+            'cmd = json.loads(data)',
+            'subp(cmd, capture=False)',
+            ''])
+        cmd = [BASH, '-c', 'echo -n "$@"', '--',
+               self.utf8_valid.decode("utf-8")]
+        python_subp = [sys.executable, '-c', python_prog]
+
+        out, _err = util.subp(
+            python_subp, update_env={'LC_CTYPE': 'C'},
+            data=json.dumps(cmd).encode("utf-8"),
+            decode=False)
+        self.assertEqual(self.utf8_valid, out)
+
 
 class TestEncode(helpers.TestCase):
     """Test the encoding functions"""
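
The test technique in the diff above (start a fresh interpreter with LC_CTYPE=C, hand it the command as JSON on stdin, and compare raw output bytes) can be sketched without cloud-init roughly as follows. This is an illustration under stated assumptions, not the project's test: `CHILD_PROG` and `run_under_c_locale` are hypothetical names, the child calls `subprocess.check_call` directly instead of `cloudinit.util.subp`, and dropping `LC_ALL` from the environment is an extra precaution of this sketch.

```python
import json
import os
import subprocess
import sys

# Child program (hypothetical): read a JSON-encoded argv from stdin,
# encode each entry to UTF-8 bytes, then execute the command.
CHILD_PROG = "\n".join([
    "import json, subprocess, sys",
    "cmd = json.loads(sys.stdin.read())",
    "subprocess.check_call([a.encode('utf-8') for a in cmd])",
    "",
])


def run_under_c_locale(cmd):
    """Run cmd via a child python started with LC_CTYPE=C; return raw stdout."""
    env = dict(os.environ, LC_CTYPE="C")
    # Assumption of this sketch: remove LC_ALL, which would override LC_CTYPE.
    env.pop("LC_ALL", None)
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_PROG],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, env=env)
    # json.dumps escapes non-ascii characters by default, so the payload
    # on stdin is plain ascii regardless of the child's locale.
    out, _err = proc.communicate(json.dumps(cmd).encode("utf-8"))
    return out
```

Passing the command through JSON on stdin keeps the non-ascii argument out of the outer process's own argv handling, so only the child's behavior under the C locale is exercised.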


