Merge ~jocha/cloud-init:jocha-reboot-bugfix into cloud-init:master

Proposed by Joshua Chan
Status: Superseded
Proposed branch: ~jocha/cloud-init:jocha-reboot-bugfix
Merge into: cloud-init:master
Diff against target: 170 lines (+86/-8)
2 files modified
cloudinit/sources/DataSourceAzure.py (+14/-6)
tests/unittests/test_datasource/test_azure.py (+72/-2)
Reviewer Review Type Date Requested Status
Chad Smith Needs Fixing
Douglas Jordan Pending
Review via email: mp+343563@code.launchpad.net

This proposal has been superseded by a proposal from 2018-04-24.

Commit message

Add reported ready marker file.

This change is for Azure VM Preprovisioning. A bug was found when after azure VMs report ready the first time, during the time when VM is polling indefinitely for the new ovf-env.xml from Instance Metadata Service (IMDS), if a reboot happens, we send another report ready signal to the fabric, which deletes the reprovisioning data on the node.

This marker file is used to fix this issue so that we will only send a report ready signal to the fabric when no marker file is present. Then, create a marker file so that when a reboot does occur, we check if a marker file has been created and decide whether we would like to send the repot ready signal.

LP: #1765214

To post a comment you must log in.
32327ce... by Joshua Chan

moved deletion of marker file after it is safe from reboot

Revision history for this message
Douglas Jordan (dojordan) :
5670b53... by Scott Moser

Schema: do not warn on duplicate items in commands.

runcmd, bootcmd, snap/commands, ubuntu-advantage/commands would
log warning (and fail if strict) on duplicate values in the commands.
But those should be allowed. Example, it is perfectly valid to do:
   runcmd: ['sleep 1', 'sleep 1']

LP: #1764264

f7ac3e6... by Junjie.Wang

doc: Add documentation for AliYun datasource.

Just add some documentation to readthedocs for AliYun.

222cabe... by Scott Moser

pylint: pay attention to unused variable warnings.

This enables warnings produced by pylint for unused variables (W0612),
and fixes the existing errors.

5114828... by Scott Moser

set_passwords: Add newline to end of sshd config, only restart if updated.

This admittedly does a fairly extensive re-factor to simply add a newline
to the end of sshd_config.

It makes the ssh_config updating portion of set_passwords more testable
and adds tests for that.

The new function is in 'update_ssh_config_lines' which allows you
to update a config with multiple changes even though only a single one
is currently used.

We also only restart the ssh daemon now if a change was made to the
config file. Before it was always restarted if the user specified
a value for ssh_pwauth other than 'unchanged'.

Thanks to Lorens Kockum for initial diagnosis and patch.

LP: #1677205

22d5d58... by Scott Moser

schema: in validation, raise ImportError if strict but no jsonschema.

validate_cloudconfig_schema with strict=True would not actually validate
if there was no jsonschema available. That seems kind of strange.
The change here is to make it raise an exception if strict was passed in.
And then to fix the one test that needed a skipIfJsonSchema wrapper.

fe82866... by Mike Gerdts

DataSourceSmartOS: list() should always return a list

If customer_metadata has no keys, the KEYS request returns an empty
string. Callers of the list() method expect a list to be returned and
will give a stack trace if this expectation is not met.

LP: #1763480

3352633... by Mike Gerdts

DataSourceSmartOS: sdc:hostname is ignored

There are three potential sources of the hostname, one of which is
documented SmartOS's vmadm(1M) via the hostname property. That
property's value is retrieved via the sdc:hostname key. The other
two sources for the hostname are a hostname key in customer_metadata
and the VM's uuid (sdc:uuid). Of these three, the sdc:hostname value
is not used in a meaningful way by DataSourceSmartOS.

This fix changes the fallback mechanism when hostname is not
specified in customer_metadata. The order of precedence for setting
the hostname is now 1) hostname in customer_metadata,
2) sdc:hostname, then 3) sdc:uuid.

LP: #1765085

eb99235... by Mike Gerdts

DataSourceSmartOS: add locking of serial device.

cloud-init and mdata-get each have their own implementation of the
SmartOS metadata protocol. If cloud-init and other services that call
mdata-get are run concurrently, crosstalk on the serial port can cause
them both to become confused.

This change makes it so that cloud-init uses the same cooperative
locking scheme that's used by mdata-get, thus preventing cross-talk
between mdata-get and cloud-init.

For testing, a VM running on a SmartOS host and pyserial are required.
If the tests are run on a platform other than SmartOS, those that use a
real serial port are skipped. pyserial remains commented in
requirements.txt because most testers will not be running atop SmartOS.

LP: #1746605

Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Chad Smith (chad.smith) :
review: Needs Fixing
23002d0... by Joshua Chan

fix string formatting and split up unit tests

Unmerged commits

23002d0... by Joshua Chan

fix string formatting and split up unit tests

eb99235... by Mike Gerdts

DataSourceSmartOS: add locking of serial device.

cloud-init and mdata-get each have their own implementation of the
SmartOS metadata protocol. If cloud-init and other services that call
mdata-get are run concurrently, crosstalk on the serial port can cause
them both to become confused.

This change makes it so that cloud-init uses the same cooperative
locking scheme that's used by mdata-get, thus preventing cross-talk
between mdata-get and cloud-init.

For testing, a VM running on a SmartOS host and pyserial are required.
If the tests are run on a platform other than SmartOS, those that use a
real serial port are skipped. pyserial remains commented in
requirements.txt because most testers will not be running atop SmartOS.

LP: #1746605

3352633... by Mike Gerdts

DataSourceSmartOS: sdc:hostname is ignored

There are three potential sources of the hostname, one of which is
documented SmartOS's vmadm(1M) via the hostname property. That
property's value is retrieved via the sdc:hostname key. The other
two sources for the hostname are a hostname key in customer_metadata
and the VM's uuid (sdc:uuid). Of these three, the sdc:hostname value
is not used in a meaningful way by DataSourceSmartOS.

This fix changes the fallback mechanism when hostname is not
specified in customer_metadata. The order of precedence for setting
the hostname is now 1) hostname in customer_metadata,
2) sdc:hostname, then 3) sdc:uuid.

LP: #1765085

fe82866... by Mike Gerdts

DataSourceSmartOS: list() should always return a list

If customer_metadata has no keys, the KEYS request returns an empty
string. Callers of the list() method expect a list to be returned and
will give a stack trace if this expectation is not met.

LP: #1763480

22d5d58... by Scott Moser

schema: in validation, raise ImportError if strict but no jsonschema.

validate_cloudconfig_schema with strict=True would not actually validate
if there was no jsonschema available. That seems kind of strange.
The change here is to make it raise an exception if strict was passed in.
And then to fix the one test that needed a skipIfJsonSchema wrapper.

5114828... by Scott Moser

set_passwords: Add newline to end of sshd config, only restart if updated.

This admittedly does a fairly extensive re-factor to simply add a newline
to the end of sshd_config.

It makes the ssh_config updating portion of set_passwords more testable
and adds tests for that.

The new function is in 'update_ssh_config_lines' which allows you
to update a config with multiple changes even though only a single one
is currently used.

We also only restart the ssh daemon now if a change was made to the
config file. Before it was always restarted if the user specified
a value for ssh_pwauth other than 'unchanged'.

Thanks to Lorens Kockum for initial diagnosis and patch.

LP: #1677205

222cabe... by Scott Moser

pylint: pay attention to unused variable warnings.

This enables warnings produced by pylint for unused variables (W0612),
and fixes the existing errors.

f7ac3e6... by Junjie.Wang

doc: Add documentation for AliYun datasource.

Just add some documentation to readthedocs for AliYun.

5670b53... by Scott Moser

Schema: do not warn on duplicate items in commands.

runcmd, bootcmd, snap/commands, ubuntu-advantage/commands would
log warning (and fail if strict) on duplicate values in the commands.
But those should be allowed. Example, it is perfectly valid to do:
   runcmd: ['sleep 1', 'sleep 1']

LP: #1764264

3bd8fd3... by Chad Smith

net: Depend on iproute2's ip instead of net-tools ifconfig or route

The net-tools package is deprecated and will eventually be dropped. Use
"ip route", "link" or "address" instead of "ifconfig" or "route" calls.
Cloud-init can now run in an environment that no longer has net-tools.
This affects the network and route printing emitted to
cloud-config-output.log as well as the cc_disable_ec2_metadata module.

Additional changes:
 - separate readResource and resourceLocation into standalone test
   functions
 - Fix ipv4 address rows to report scopes represented by ip addr show
 - Formatted route/address ouput now handles multiple ipv4 and ipv6
   addresses on a single interface

Co-authored-by: James Hogarth <email address hidden>
Co-authored-by: Robert Schweikert <email address hidden>

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
2index a71197a..b47a6af 100644
3--- a/cloudinit/sources/DataSourceAzure.py
4+++ b/cloudinit/sources/DataSourceAzure.py
5@@ -48,6 +48,7 @@ DEFAULT_FS = 'ext4'
6 # DMI chassis-asset-tag is set static for all azure instances
7 AZURE_CHASSIS_ASSET_TAG = '7783-7084-3265-9085-8269-3286-77'
8 REPROVISION_MARKER_FILE = "/var/lib/cloud/data/poll_imds"
9+REPORTED_READY_MARKER_FILE = "/var/lib/cloud/data/reported_ready"
10 IMDS_URL = "http://169.254.169.254/metadata/reprovisiondata"
11
12
13@@ -450,13 +451,16 @@ class DataSourceAzure(sources.DataSource):
14 # call DHCP and setup the ephemeral network to acquire the new IP.
15 return False
16
17- need_report = report_ready
18 while True:
19 try:
20 with EphemeralDHCPv4() as lease:
21- if need_report:
22+ if report_ready:
23 self._report_ready(lease=lease)
24- need_report = False
25+ path = REPORTED_READY_MARKER_FILE
26+ LOG.info("Create flag file, reported ready: %s", path)
27+ util.write_file(path, "{pid}: {time}\n".format(
28+ pid=os.getpid(), time=time()))
29+ report_ready = False
30 return readurl(url, timeout=1, headers=headers,
31 exception_cb=exc_cb, infinite=True).contents
32 except UrlError:
33@@ -490,14 +494,17 @@ class DataSourceAzure(sources.DataSource):
34 if (cfg.get('PreprovisionedVm') is True or
35 os.path.isfile(path)):
36 if not os.path.isfile(path):
37- LOG.info("Creating a marker file to poll imds")
38- util.write_file(path, "%s: %s\n" % (os.getpid(), time()))
39+ LOG.info("Creating a marker file to poll imds: %s",
40+ path)
41+ util.write_file(path, "{pid}: {time}\n".format(
42+ pid=os.getpid(), time=time()))
43 return True
44 return False
45
46 def _reprovision(self):
47 """Initiate the reprovisioning workflow."""
48- contents = self._poll_imds()
49+ report_ready = bool(not os.path.isfile(REPORTED_READY_MARKER_FILE))
50+ contents = self._poll_imds(report_ready)
51 md, ud, cfg = read_azure_ovf(contents)
52 return (md, ud, cfg, {'ovf-env.xml': contents})
53
54@@ -526,6 +533,7 @@ class DataSourceAzure(sources.DataSource):
55 "Error communicating with Azure fabric; You may experience."
56 "connectivity issues.", exc_info=True)
57 return False
58+ util.del_file(REPORTED_READY_MARKER_FILE)
59 util.del_file(REPROVISION_MARKER_FILE)
60 return fabric_data
61
62diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py
63index 88fe76c..4ff2a66 100644
64--- a/tests/unittests/test_datasource/test_azure.py
65+++ b/tests/unittests/test_datasource/test_azure.py
66@@ -1163,12 +1163,13 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
67 cfg = ret[2]
68 self.assertFalse(cfg['PreprovisionedVm'])
69
70+ @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
71 @mock.patch('cloudinit.sources.DataSourceAzure.util.is_FreeBSD')
72 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
73 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
74 @mock.patch('requests.Session.request')
75 def test_poll_imds_returns_ovf_env(self, fake_resp, m_dhcp, m_net,
76- m_is_bsd, *args):
77+ m_is_bsd, write_f, *args):
78 """The _poll_imds method should return the ovf_env.xml."""
79 m_is_bsd.return_value = False
80 m_dhcp.return_value = [{
81@@ -1194,12 +1195,13 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
82 prefix_or_mask='255.255.255.0', router='192.168.2.1')
83 self.assertEqual(m_net.call_count, 1)
84
85+ @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
86 @mock.patch('cloudinit.sources.DataSourceAzure.util.is_FreeBSD')
87 @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
88 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
89 @mock.patch('requests.Session.request')
90 def test__reprovision_calls__poll_imds(self, fake_resp, m_dhcp, m_net,
91- m_is_bsd, *args):
92+ m_is_bsd, write_f, *args):
93 """The _reprovision method should call poll IMDS."""
94 m_is_bsd.return_value = False
95 m_dhcp.return_value = [{
96@@ -1231,6 +1233,74 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
97 prefix_or_mask='255.255.255.0', router='192.168.2.1')
98 self.assertEqual(m_net.call_count, 1)
99
100+ @mock.patch('cloudinit.sources.DataSourceAzure.DataSourceAzure._poll_imds')
101+ @mock.patch('os.path.isfile')
102+ def test_reprovision_report_ready(self, isfile, _poll_imds,
103+ *args):
104+ """_reprovision should call _poll_IMDS(True) (with report ready)
105+ if flag file is not present"""
106+ isfile.return_value = False
107+ hostname = "myhost"
108+ username = "myuser"
109+ odata = {'HostName': hostname, 'UserName': username}
110+ _poll_imds.return_value = construct_valid_ovf_env(data=odata)
111+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
112+ md, ud, cfg, d = dsa._reprovision()
113+ self.assertEqual(_poll_imds.call_args[0], tuple([True]))
114+
115+ @mock.patch('cloudinit.sources.DataSourceAzure.DataSourceAzure._poll_imds')
116+ @mock.patch('os.path.isfile')
117+ def test_reprovision_no_report_ready(self, isfile, _poll_imds,
118+ *args):
119+ """_reprovision should call _poll_IMDS(False) (without report ready)
120+ if flag file is present"""
121+ isfile.return_value = True
122+ hostname = "myhost"
123+ username = "myuser"
124+ odata = {'HostName': hostname, 'UserName': username}
125+ _poll_imds.return_value = construct_valid_ovf_env(data=odata)
126+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
127+ md, ud, cfg, d = dsa._reprovision()
128+ self.assertEqual(_poll_imds.call_args[0], tuple([False]))
129+
130+ @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
131+ @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
132+ @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
133+ @mock.patch('requests.Session.request')
134+ @mock.patch(
135+ 'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready')
136+ def test_poll_imds_report_ready_true(self, report_ready_func,
137+ fake_resp, m_dhcp, m_net,
138+ write_f, *args):
139+ """The poll_imds should call report_ready when flag is true"""
140+ m_dhcp.return_value = [{
141+ 'interface': 'eth9', 'fixed-address': '192.168.2.9',
142+ 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
143+ 'unknown-245': '624c3620'}]
144+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
145+ report_ready_flag = True
146+ dsa._poll_imds(report_ready_flag)
147+ self.assertEqual(report_ready_func.call_count, 1)
148+
149+ @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
150+ @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
151+ @mock.patch('requests.Session.request')
152+ @mock.patch(
153+ 'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready')
154+ def test_poll_imds_report_ready_false(self, report_ready_func,
155+ fake_resp, m_dhcp, m_net,
156+ *args):
157+ """The poll_imds should not call reporting ready
158+ when flag is false"""
159+ m_dhcp.return_value = [{
160+ 'interface': 'eth9', 'fixed-address': '192.168.2.9',
161+ 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
162+ 'unknown-245': '624c3620'}]
163+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
164+ report_ready_flag = False
165+ dsa._poll_imds(report_ready_flag)
166+ self.assertEqual(report_ready_func.call_count, 0)
167+
168 @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
169 @mock.patch('os.path.isfile')
170 def test__should_reprovision_with_true_cfg(self, isfile, write_f, *args):

Subscribers

People subscribed via source and target branches