Merge ~jocha/cloud-init:jocha-reboot-bugfix-2 into cloud-init:master

Proposed by Joshua Chan
Status: Merged
Approved by: Chad Smith
Approved revision: 40962b66cd13f48fbf21b46f9acc25bed1cd0d42
Merge reported by: Chad Smith
Merged at revision: aae494c39f4c6f625e7409ca262e657d085dd5d1
Proposed branch: ~jocha/cloud-init:jocha-reboot-bugfix-2
Merge into: cloud-init:master
Diff against target: 286 lines (+134/-57)
2 files modified
cloudinit/sources/DataSourceAzure.py (+15/-6)
tests/unittests/test_datasource/test_azure.py (+119/-51)
Reviewer Review Type Date Requested Status
Chad Smith Approve
Server Team CI bot continuous-integration Approve
Douglas Jordan Pending
Review via email: mp+344192@code.launchpad.net

This proposal supersedes a proposal from 2018-04-24.

Commit message

azure: Add reported ready marker file.

This change is for Azure VM Preprovisioning. A bug was found when after
azure VMs report ready the first time, during the time when VM is polling
indefinitely for the new ovf-env.xml from Instance Metadata Service
(IMDS), if a reboot happens, we send another report ready signal to the
fabric, which deletes the reprovisioning data on the node.

This marker file is used to fix this issue so that we will only send a
report ready signal to the fabric when no marker file is present. Then,
create a marker file so that when a reboot does occur, we check if a
marker file has been created and decide whether we would like to send the
repot ready signal.

LP: #1765214

To post a comment you must log in.
Revision history for this message
Douglas Jordan (dojordan) : Posted in a previous version of this proposal
Revision history for this message
Chad Smith (chad.smith) : Posted in a previous version of this proposal
Revision history for this message
Chad Smith (chad.smith) : Posted in a previous version of this proposal
review: Needs Fixing
Revision history for this message
Joshua Chan (jocha) wrote :

I think I messed up my previous branch (jocha-reboot-bugfix) when I did "git pull --rebase" as it had all the commits from others listed as unmerged commits on my branch (jocha-reboot-bugfix). So I decided to start a new branch (jocha-reboot-bugfix-2) of the latest master and cherry picked my changes on top of it.

Thanks Chad for the feedback. I agree with all the comments and have resolved it.

Revision history for this message
Ryan Harper (raharper) :
Revision history for this message
Joshua Chan (jocha) wrote :

Thanks Ryan for the feedback! I replied to some comments. New commit to follow after receiving more feedback.

e549839... by Joshua Chan

rephrased a log and moved report ready after writing marker file

Revision history for this message
Chad Smith (chad.smith) :
60bccb9... by Joshua Chan

refactored unit test into seperate focussed test classes

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:60bccb9fa864a616b3781560d1c95fd50b111b71
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1097/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1097/rebuild

review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) wrote :

Last pass I promise. What do you think about this. http://paste.ubuntu.com/p/MsVK5PZZ98/

let's avoid leaking REPORTED_READY_MARKER_FILE outside of poll_imds because poll_imds is the only thing that cares about whether or not this marker exists. It simplifies the poll_imds arg list and avoid unnecessary complexity outside of this method.

Revision history for this message
Chad Smith (chad.smith) wrote :

tested with my latest/last suggested pass on Azure and +1 to landing if you agree

Revision history for this message
Chad Smith (chad.smith) wrote :

Updated the patch to fix flakes/lint http://paste.ubuntu.com/p/DtSmYqbgMV/

40962b6... by Joshua Chan

patching chad's suggestions to limit scope of marker file and flakes

Revision history for this message
Chad Smith (chad.smith) wrote :

Thank you for your merge proposal.

Your branch has been set to 'Work in progress'.
Please set the branch back to 'Needs Review' after resolving the issues below.

Thanks again,
Your friendly neighborhood cloud-init robot.

Please fix the following issues:
------------------------------
Commit message lints:
 - Line #2 has 270 too many characters. Line starts with: "This change is for Azure"... - Line #4 has 231 too many characters. Line starts with: "This marker file is used"...
------------------------------

For more information, see commit message guidelines at
https://cloudinit.readthedocs.io/en/latest/topics/hacking.html#do-these-things-for-each-feature-or-bug

review: Needs Fixing
Revision history for this message
Chad Smith (chad.smith) wrote :

An upstream commit landed for this bug.

To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=aae494c3

Revision history for this message
Chad Smith (chad.smith) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/sources/DataSourceAzure.py b/cloudinit/sources/DataSourceAzure.py
2index a71197a..1b03d46 100644
3--- a/cloudinit/sources/DataSourceAzure.py
4+++ b/cloudinit/sources/DataSourceAzure.py
5@@ -48,6 +48,7 @@ DEFAULT_FS = 'ext4'
6 # DMI chassis-asset-tag is set static for all azure instances
7 AZURE_CHASSIS_ASSET_TAG = '7783-7084-3265-9085-8269-3286-77'
8 REPROVISION_MARKER_FILE = "/var/lib/cloud/data/poll_imds"
9+REPORTED_READY_MARKER_FILE = "/var/lib/cloud/data/reported_ready"
10 IMDS_URL = "http://169.254.169.254/metadata/reprovisiondata"
11
12
13@@ -436,11 +437,12 @@ class DataSourceAzure(sources.DataSource):
14 LOG.debug("negotiating already done for %s",
15 self.get_instance_id())
16
17- def _poll_imds(self, report_ready=True):
18+ def _poll_imds(self):
19 """Poll IMDS for the new provisioning data until we get a valid
20 response. Then return the returned JSON object."""
21 url = IMDS_URL + "?api-version=2017-04-02"
22 headers = {"Metadata": "true"}
23+ report_ready = bool(not os.path.isfile(REPORTED_READY_MARKER_FILE))
24 LOG.debug("Start polling IMDS")
25
26 def exc_cb(msg, exception):
27@@ -450,13 +452,17 @@ class DataSourceAzure(sources.DataSource):
28 # call DHCP and setup the ephemeral network to acquire the new IP.
29 return False
30
31- need_report = report_ready
32 while True:
33 try:
34 with EphemeralDHCPv4() as lease:
35- if need_report:
36+ if report_ready:
37+ path = REPORTED_READY_MARKER_FILE
38+ LOG.info(
39+ "Creating a marker file to report ready: %s", path)
40+ util.write_file(path, "{pid}: {time}\n".format(
41+ pid=os.getpid(), time=time()))
42 self._report_ready(lease=lease)
43- need_report = False
44+ report_ready = False
45 return readurl(url, timeout=1, headers=headers,
46 exception_cb=exc_cb, infinite=True).contents
47 except UrlError:
48@@ -490,8 +496,10 @@ class DataSourceAzure(sources.DataSource):
49 if (cfg.get('PreprovisionedVm') is True or
50 os.path.isfile(path)):
51 if not os.path.isfile(path):
52- LOG.info("Creating a marker file to poll imds")
53- util.write_file(path, "%s: %s\n" % (os.getpid(), time()))
54+ LOG.info("Creating a marker file to poll imds: %s",
55+ path)
56+ util.write_file(path, "{pid}: {time}\n".format(
57+ pid=os.getpid(), time=time()))
58 return True
59 return False
60
61@@ -526,6 +534,7 @@ class DataSourceAzure(sources.DataSource):
62 "Error communicating with Azure fabric; You may experience."
63 "connectivity issues.", exc_info=True)
64 return False
65+ util.del_file(REPORTED_READY_MARKER_FILE)
66 util.del_file(REPROVISION_MARKER_FILE)
67 return fabric_data
68
69diff --git a/tests/unittests/test_datasource/test_azure.py b/tests/unittests/test_datasource/test_azure.py
70index 88fe76c..26e8d7d 100644
71--- a/tests/unittests/test_datasource/test_azure.py
72+++ b/tests/unittests/test_datasource/test_azure.py
73@@ -1125,19 +1125,9 @@ class TestAzureNetExists(CiTestCase):
74 self.assertTrue(hasattr(dsaz, "DataSourceAzureNet"))
75
76
77-@mock.patch('cloudinit.sources.DataSourceAzure.util.subp')
78-@mock.patch.object(dsaz, 'get_hostname')
79-@mock.patch.object(dsaz, 'set_hostname')
80-class TestAzureDataSourcePreprovisioning(CiTestCase):
81-
82- def setUp(self):
83- super(TestAzureDataSourcePreprovisioning, self).setUp()
84- tmp = self.tmp_dir()
85- self.waagent_d = self.tmp_path('/var/lib/waagent', tmp)
86- self.paths = helpers.Paths({'cloud_dir': tmp})
87- dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
88+class TestPreprovisioningReadAzureOvfFlag(CiTestCase):
89
90- def test_read_azure_ovf_with_true_flag(self, *args):
91+ def test_read_azure_ovf_with_true_flag(self):
92 """The read_azure_ovf method should set the PreprovisionedVM
93 cfg flag if the proper setting is present."""
94 content = construct_valid_ovf_env(
95@@ -1146,7 +1136,7 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
96 cfg = ret[2]
97 self.assertTrue(cfg['PreprovisionedVm'])
98
99- def test_read_azure_ovf_with_false_flag(self, *args):
100+ def test_read_azure_ovf_with_false_flag(self):
101 """The read_azure_ovf method should set the PreprovisionedVM
102 cfg flag to false if the proper setting is false."""
103 content = construct_valid_ovf_env(
104@@ -1155,7 +1145,7 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
105 cfg = ret[2]
106 self.assertFalse(cfg['PreprovisionedVm'])
107
108- def test_read_azure_ovf_without_flag(self, *args):
109+ def test_read_azure_ovf_without_flag(self):
110 """The read_azure_ovf method should not set the
111 PreprovisionedVM cfg flag."""
112 content = construct_valid_ovf_env()
113@@ -1163,12 +1153,121 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
114 cfg = ret[2]
115 self.assertFalse(cfg['PreprovisionedVm'])
116
117- @mock.patch('cloudinit.sources.DataSourceAzure.util.is_FreeBSD')
118- @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
119- @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
120- @mock.patch('requests.Session.request')
121+
122+@mock.patch('os.path.isfile')
123+class TestPreprovisioningShouldReprovision(CiTestCase):
124+
125+ def setUp(self):
126+ super(TestPreprovisioningShouldReprovision, self).setUp()
127+ tmp = self.tmp_dir()
128+ self.waagent_d = self.tmp_path('/var/lib/waagent', tmp)
129+ self.paths = helpers.Paths({'cloud_dir': tmp})
130+ dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
131+
132+ @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
133+ def test__should_reprovision_with_true_cfg(self, isfile, write_f):
134+ """The _should_reprovision method should return true with config
135+ flag present."""
136+ isfile.return_value = False
137+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
138+ self.assertTrue(dsa._should_reprovision(
139+ (None, None, {'PreprovisionedVm': True}, None)))
140+
141+ def test__should_reprovision_with_file_existing(self, isfile):
142+ """The _should_reprovision method should return True if the sentinal
143+ exists."""
144+ isfile.return_value = True
145+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
146+ self.assertTrue(dsa._should_reprovision(
147+ (None, None, {'preprovisionedvm': False}, None)))
148+
149+ def test__should_reprovision_returns_false(self, isfile):
150+ """The _should_reprovision method should return False
151+ if config and sentinal are not present."""
152+ isfile.return_value = False
153+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
154+ self.assertFalse(dsa._should_reprovision((None, None, {}, None)))
155+
156+ @mock.patch('cloudinit.sources.DataSourceAzure.DataSourceAzure._poll_imds')
157+ def test_reprovision_calls__poll_imds(self, _poll_imds, isfile):
158+ """_reprovision will poll IMDS."""
159+ isfile.return_value = False
160+ hostname = "myhost"
161+ username = "myuser"
162+ odata = {'HostName': hostname, 'UserName': username}
163+ _poll_imds.return_value = construct_valid_ovf_env(data=odata)
164+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
165+ dsa._reprovision()
166+ _poll_imds.assert_called_with()
167+
168+
169+@mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
170+@mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
171+@mock.patch('requests.Session.request')
172+@mock.patch(
173+ 'cloudinit.sources.DataSourceAzure.DataSourceAzure._report_ready')
174+class TestPreprovisioningPollIMDS(CiTestCase):
175+
176+ def setUp(self):
177+ super(TestPreprovisioningPollIMDS, self).setUp()
178+ self.tmp = self.tmp_dir()
179+ self.waagent_d = self.tmp_path('/var/lib/waagent', self.tmp)
180+ self.paths = helpers.Paths({'cloud_dir': self.tmp})
181+ dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
182+
183+ @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
184+ def test_poll_imds_calls_report_ready(self, write_f, report_ready_func,
185+ fake_resp, m_dhcp, m_net):
186+ """The poll_imds will call report_ready after creating marker file."""
187+ report_marker = self.tmp_path('report_marker', self.tmp)
188+ lease = {
189+ 'interface': 'eth9', 'fixed-address': '192.168.2.9',
190+ 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
191+ 'unknown-245': '624c3620'}
192+ m_dhcp.return_value = [lease]
193+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
194+ mock_path = (
195+ 'cloudinit.sources.DataSourceAzure.REPORTED_READY_MARKER_FILE')
196+ with mock.patch(mock_path, report_marker):
197+ dsa._poll_imds()
198+ self.assertEqual(report_ready_func.call_count, 1)
199+ report_ready_func.assert_called_with(lease=lease)
200+
201+ def test_poll_imds_report_ready_false(self, report_ready_func,
202+ fake_resp, m_dhcp, m_net):
203+ """The poll_imds should not call reporting ready
204+ when flag is false"""
205+ report_marker = self.tmp_path('report_marker', self.tmp)
206+ write_file(report_marker, content='dont run report_ready :)')
207+ m_dhcp.return_value = [{
208+ 'interface': 'eth9', 'fixed-address': '192.168.2.9',
209+ 'routers': '192.168.2.1', 'subnet-mask': '255.255.255.0',
210+ 'unknown-245': '624c3620'}]
211+ dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
212+ mock_path = (
213+ 'cloudinit.sources.DataSourceAzure.REPORTED_READY_MARKER_FILE')
214+ with mock.patch(mock_path, report_marker):
215+ dsa._poll_imds()
216+ self.assertEqual(report_ready_func.call_count, 0)
217+
218+
219+@mock.patch('cloudinit.sources.DataSourceAzure.util.subp')
220+@mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
221+@mock.patch('cloudinit.sources.DataSourceAzure.util.is_FreeBSD')
222+@mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
223+@mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
224+@mock.patch('requests.Session.request')
225+class TestAzureDataSourcePreprovisioning(CiTestCase):
226+
227+ def setUp(self):
228+ super(TestAzureDataSourcePreprovisioning, self).setUp()
229+ tmp = self.tmp_dir()
230+ self.waagent_d = self.tmp_path('/var/lib/waagent', tmp)
231+ self.paths = helpers.Paths({'cloud_dir': tmp})
232+ dsaz.BUILTIN_DS_CONFIG['data_dir'] = self.waagent_d
233+
234 def test_poll_imds_returns_ovf_env(self, fake_resp, m_dhcp, m_net,
235- m_is_bsd, *args):
236+ m_is_bsd, write_f, subp):
237 """The _poll_imds method should return the ovf_env.xml."""
238 m_is_bsd.return_value = False
239 m_dhcp.return_value = [{
240@@ -1194,12 +1293,8 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
241 prefix_or_mask='255.255.255.0', router='192.168.2.1')
242 self.assertEqual(m_net.call_count, 1)
243
244- @mock.patch('cloudinit.sources.DataSourceAzure.util.is_FreeBSD')
245- @mock.patch('cloudinit.net.dhcp.EphemeralIPv4Network')
246- @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
247- @mock.patch('requests.Session.request')
248 def test__reprovision_calls__poll_imds(self, fake_resp, m_dhcp, m_net,
249- m_is_bsd, *args):
250+ m_is_bsd, write_f, subp):
251 """The _reprovision method should call poll IMDS."""
252 m_is_bsd.return_value = False
253 m_dhcp.return_value = [{
254@@ -1231,32 +1326,5 @@ class TestAzureDataSourcePreprovisioning(CiTestCase):
255 prefix_or_mask='255.255.255.0', router='192.168.2.1')
256 self.assertEqual(m_net.call_count, 1)
257
258- @mock.patch('cloudinit.sources.DataSourceAzure.util.write_file')
259- @mock.patch('os.path.isfile')
260- def test__should_reprovision_with_true_cfg(self, isfile, write_f, *args):
261- """The _should_reprovision method should return true with config
262- flag present."""
263- isfile.return_value = False
264- dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
265- self.assertTrue(dsa._should_reprovision(
266- (None, None, {'PreprovisionedVm': True}, None)))
267-
268- @mock.patch('os.path.isfile')
269- def test__should_reprovision_with_file_existing(self, isfile, *args):
270- """The _should_reprovision method should return True if the sentinal
271- exists."""
272- isfile.return_value = True
273- dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
274- self.assertTrue(dsa._should_reprovision(
275- (None, None, {'preprovisionedvm': False}, None)))
276-
277- @mock.patch('os.path.isfile')
278- def test__should_reprovision_returns_false(self, isfile, *args):
279- """The _should_reprovision method should return False
280- if config and sentinal are not present."""
281- isfile.return_value = False
282- dsa = dsaz.DataSourceAzure({}, distro=None, paths=self.paths)
283- self.assertFalse(dsa._should_reprovision((None, None, {}, None)))
284-
285
286 # vi: ts=4 expandtab

Subscribers

People subscribed via source and target branches