Merge ~jasonzio/cloud-init:randomSeed into cloud-init:master

Proposed by Jason Zions
Status: Merged
Approved by: Dan Watkins
Approved revision: 335ec2cdc09995ecbb90e5d1aa5622ea6c6a958e
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~jasonzio/cloud-init:randomSeed
Merge into: cloud-init:master
Diff against target: 105 lines (+42/-7)
3 files modified
cloudinit/sources/DataSourceAzure.py (+19/-5)
tests/data/azure/non_unicode_random_string (+1/-0)
tests/unittests/test_datasource/test_azure.py (+22/-2)
Reviewer Review Type Date Requested Status
Dan Watkins Approve
Server Team CI bot continuous-integration Approve
Review via email: mp+365065@code.launchpad.net

Commit message

Azure: Ensure platform random_seed is always serializable as JSON.

The Azure platform surfaces random bytes into /sys via Hyper-V.
Python 2.7 json.dump() raises an exception if asked to convert
a str with non-character content, and python 3.0 json.dump()
won't serialize a "bytes" value. As a result, c-i instance
data is often not written by Azure, making reboots slower (c-i
has to repeat work).

The random data is base64-encoded and then decoded into a string
(str or unicode depending on the version of Python in use). The
base64 string has just as many bits of entropy, so we're not
throwing away useful "information", but we can be certain
json.dump() will correctly serialize the bits.

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:87a6981b6caf9dbe049c6924a13d83597bf73c9a
https://jenkins.ubuntu.com/server/job/cloud-init-ci/650/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/650/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Dan Watkins (oddbloke) wrote :

Hi Jason, thanks for the MP! I've left some inline comments (including pointing out what's causing the CI failure).

review: Needs Fixing
Revision history for this message
Jason Zions (jasonzio) wrote :

Responses in-line. I'll update the MP.

~jasonzio/cloud-init:randomSeed updated
7fbdb3c... by "Jason Zions (MSFT)" <email address hidden>

Always b64-encode. Move poison seed to data file.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:7fbdb3cc10a0a37e856eb85229dcad8130456408
https://jenkins.ubuntu.com/server/job/cloud-init-ci/651/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/651/rebuild

review: Needs Fixing (continuous-integration)
~jasonzio/cloud-init:randomSeed updated
80e0c63... by "Jason Zions (MSFT)" <email address hidden>

Tell load_json() to expect to produce something other than a dict

Revision history for this message
Dan Watkins (oddbloke) :
Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks for the swift response Jason!

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:80e0c63e21b7ac3a64bb4f2977993368032b50ec
https://jenkins.ubuntu.com/server/job/cloud-init-ci/652/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/652/rebuild

review: Needs Fixing (continuous-integration)
~jasonzio/cloud-init:randomSeed updated
02e2aef... by "Jason Zions (MSFT)" <email address hidden>

load_json is happiest with top-level dict

Revision history for this message
Jason Zions (jasonzio) wrote :

This is why I wrapped the test string in a dict. util.load_json() assumes the result is a dict unless told otherwise, and the type that comes back is either unicode in python2 and str in python3 but there's no immediately obvious way to tell load_json() that it's one of those two. If I wrap the string in a dict, util.load_json() is happy, and the code is clear.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:02e2aefa74abd14bde7aeefc5b7d17229a7842e1
https://jenkins.ubuntu.com/server/job/cloud-init-ci/653/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/653/rebuild

review: Needs Fixing (continuous-integration)
~jasonzio/cloud-init:randomSeed updated
1d44d97... by "Jason Zions (MSFT)" <email address hidden>

Handle bytes() object correctly for python3

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:1d44d97034d0fcc65ba481b1fafd5e781e99e18e
https://jenkins.ubuntu.com/server/job/cloud-init-ci/654/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/654/rebuild

review: Needs Fixing (continuous-integration)
~jasonzio/cloud-init:randomSeed updated
335ec2c... by "Jason Zions (MSFT)" <email address hidden>

Remove unneeded import

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:335ec2cdc09995ecbb90e5d1aa5622ea6c6a958e
https://jenkins.ubuntu.com/server/job/cloud-init-ci/655/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/655/rebuild

review: Approve (continuous-integration)
Revision history for this message
Dan Watkins (oddbloke) wrote :

This looks great, thanks for iterating on it, Jason!

(I'm removing the Description because it tickles a bug in our auto-lander; I've just appended it to the commit message.)

review: Approve
Revision history for this message
Scott Moser (smoser) wrote :

There was discussion on this issue/fix in #cloud-init yesterday.
It is really just a specific case of the general python 2 only issue described in bug 1801364

Revision history for this message
Jason Zions (jasonzio) wrote :

You're correct, Scott, this is a specific fix in Azure for the python2 issue. The true heart of the issue is "JSON cannot store a random sequence of octets that do not form a legal UTF-8 string", but that is precisely the desired contract for random_seed.

I suggest cloud-init change its internal contract for random_seed to be "datasource provides a base64-encoded string of random octets". cc_random_seed.py should rely upon that and do a base64.b64decode on that value before writing it to /dev/urandom. Instance data persist/restore would work unchanged, since they'd now be dealing with a JSON-safe string. Only the two datasources that provide random_seed would have to change, and that change would look almost exactly like the one I made to the Azure source.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
2index eccbee5..b4e3f06 100644
3--- a/cloudinit/sources/DataSourceAzure.py
4+++ b/cloudinit/sources/DataSourceAzure.py
5@@ -54,6 +54,7 @@ REPROVISION_MARKER_FILE = "/var/lib/cloud/data/poll_imds"
6 REPORTED_READY_MARKER_FILE = "/var/lib/cloud/data/reported_ready"
7 AGENT_SEED_DIR = '/var/lib/waagent'
8 IMDS_URL = "http://169.254.169.254/metadata/"
9+PLATFORM_ENTROPY_SOURCE = "/sys/firmware/acpi/tables/OEM0"
10
11 # List of static scripts and network config artifacts created by
12 # stock ubuntu suported images.
13@@ -195,6 +196,8 @@ if util.is_FreeBSD():
14 RESOURCE_DISK_PATH = "/dev/" + res_disk
15 else:
16 LOG.debug("resource disk is None")
17+ # TODO Find where platform entropy data is surfaced
18+ PLATFORM_ENTROPY_SOURCE = None
19
20 BUILTIN_DS_CONFIG = {
21 'agent_command': AGENT_START_BUILTIN,
22@@ -1100,16 +1103,27 @@ def _check_freebsd_cdrom(cdrom_dev):
23 return False
24
25
26-def _get_random_seed():
27+def _get_random_seed(source=PLATFORM_ENTROPY_SOURCE):
28 """Return content random seed file if available, otherwise,
29 return None."""
30 # azure / hyper-v provides random data here
31- # TODO. find the seed on FreeBSD platform
32 # now update ds_cfg to reflect contents pass in config
33- if util.is_FreeBSD():
34+ if source is None:
35 return None
36- return util.load_file("/sys/firmware/acpi/tables/OEM0",
37- quiet=True, decode=False)
38+ seed = util.load_file(source, quiet=True, decode=False)
39+
40+ # The seed generally contains non-Unicode characters. load_file puts
41+ # them into a str (in python 2) or bytes (in python 3). In python 2,
42+ # bad octets in a str cause util.json_dumps() to throw an exception. In
43+ # python 3, bytes is a non-serializable type, and the handler load_file
44+ # uses applies b64 encoding *again* to handle it. The simplest solution
45+ # is to just b64encode the data and then decode it to a serializable
46+ # string. Same number of bits of entropy, just with 25% more zeroes.
47+ # There's no need to undo this base64-encoding when the random seed is
48+ # actually used in cc_seed_random.py.
49+ seed = base64.b64encode(seed).decode()
50+
51+ return seed
52
53
54 def list_possible_azure_ds_devs():
55diff --git a/tests/data/azure/non_unicode_random_string b/tests/data/azure/non_unicode_random_string
56new file mode 100644
57index 0000000..b9ecefb
58--- /dev/null
59+++ b/tests/data/azure/non_unicode_random_string
60@@ -0,0 +1 @@
61+OEM0d\x00\x00\x00\x01\x80VRTUALMICROSFT\x02\x17\x00\x06MSFT\x97\x00\x00\x00C\xb4{V\xf4X%\x061x\x90\x1c\xfen\x86\xbf~\xf5\x8c\x94&\x88\xed\x84\xf9B\xbd\xd3\xf1\xdb\xee:\xd9\x0fc\x0e\x83(\xbd\xe3'\xfc\x85,\xdf\xf4\x13\x99N\xc5\xf3Y\x1e\xe3\x0b\xa4H\x08J\xb9\xdcdb$
62\ No newline at end of file
63diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py
64index 6b05b8f..53c56cd 100644
65--- a/tests/unittests/test_datasource/test_azure.py
66+++ b/tests/unittests/test_datasource/test_azure.py
67@@ -7,11 +7,11 @@ from cloudinit.sources import (
68 UNSET, DataSourceAzure as dsaz, InvalidMetaDataException)
69 from cloudinit.util import (b64e, decode_binary, load_file, write_file,
70 find_freebsd_part, get_path_dev_freebsd,
71- MountFailedError)
72+ MountFailedError, json_dumps, load_json)
73 from cloudinit.version import version_string as vs
74 from cloudinit.tests.helpers import (
75 HttprettyTestCase, CiTestCase, populate_dir, mock, wrap_and_call,
76- ExitStack)
77+ ExitStack, resourceLocation)
78
79 import crypt
80 import httpretty
81@@ -1923,4 +1923,24 @@ class TestWBIsPlatformViable(CiTestCase):
82 self.logs.getvalue())
83
84
85+class TestRandomSeed(CiTestCase):
86+ """Test proper handling of random_seed"""
87+
88+ def test_non_ascii_seed_is_serializable(self):
89+ """Pass if a random string from the Azure infrastructure which
90+ contains at least one non-Unicode character can be converted to/from
91+ JSON without alteration and without throwing an exception.
92+ """
93+ path = resourceLocation("azure/non_unicode_random_string")
94+ result = dsaz._get_random_seed(path)
95+
96+ obj = {'seed': result}
97+ try:
98+ serialized = json_dumps(obj)
99+ deserialized = load_json(serialized)
100+ except UnicodeDecodeError:
101+ self.fail("Non-serializable random seed returned")
102+
103+ self.assertEqual(deserialized['seed'], result)
104+
105 # vi: ts=4 expandtab

Subscribers

People subscribed via source and target branches