Merge ~ma-brothier/cloud-init:cs-multi-dhcp into cloud-init:master

Proposed by Marc-Aurele Brothier
Status: Work in progress
Proposed branch: ~ma-brothier/cloud-init:cs-multi-dhcp
Merge into: cloud-init:master
Diff against target: 275 lines (+63/-60)
2 files modified
cloudinit/sources/DataSourceCloudStack.py (+44/-42)
tests/unittests/test_datasource/test_cloudstack.py (+19/-18)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Needs Fixing
Review via email: mp+336145@code.launchpad.net

Description of the change

CloudStack changes on DHCP leases crawling mechanism for VR address

Currently the DHCP leases are search in reverse chronological order to look for the virtual router IP inside the lease information. In case more than one interface is added to a VM using a DHCP service to configure it, the latest DHCP lease will be linked to this new interface instead of the default one, thus waiting for the fault back method to fetch the default gateway address.

This change looks for all DHCP leases, grab all potential IP addresses (removing potential ducplicate due to multiple lease information in the same file) for the VR and try all of them in alphabetical order of the interface names until a good match is found (correct response), with a last default to the default gateway IP as originally.

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/697/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/697/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/698/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/698/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

Marc,

Thanks for this MP.

I have two thoughts:
a.) we should probably look into using the output of 'cloud-init dhclient-hook' which gets called by tools/hook-dhclient. That writes files in /run/cloud-init/dhclient.hooks/eth0.json which look like this below. I think its just a matter of then reading those files and looking for a dhcp_server_identifier key.

{
 "broadcast_address": "10.75.205.255",
 "dhcp_lease_time": "3600",
 "dhcp_message_type": "5",
 "dhcp_rebinding_time": "3150",
 "dhcp_renewal_time": "1800",
 "dhcp_server_identifier": "10.75.205.1",
 "domain_name": "lxd",
 "domain_name_servers": "10.75.205.1",
 "expiry": "1516216015",
 "host_name": "x1",
 "ip_address": "10.75.205.55",
 "network_number": "10.75.205.0",
 "next_server": "10.75.205.1",
 "routers": "10.75.205.1",
 "subnet_mask": "255.255.255.0"
}

b. I think that cloudstack is probably broken on Ubuntu 17.10 and bionic (18.04) and anywhere else that uses systemd-networkd. systemd-networkd doesnt' write the same leases files as dhclient, nor do the hooks get called. It does though provide info like:
$ cat /run/systemd/netif/leases/41
# This is private data. Do not parse.
ADDRESS=10.75.205.120
NETMASK=255.255.255.0
ROUTER=10.75.205.1
SERVER_ADDRESS=10.75.205.1
NEXT_SERVER=10.75.205.1
BROADCAST=10.75.205.255
T1=1609
T2=2959
LIFETIME=3600
DNS=10.75.205.1
DOMAINNAME=lxd
HOSTNAME=b1
CLIENTID=ffc011a99100020000ab118d8462b1b543ef0c

(Yes, i'm aware of the 'Do not parse' comment in the file, but there is currently no other way to get this information out of the networkd-systemd built in dhcp client).

Revision history for this message
Scott Moser (smoser) wrote :

Oh, also, if we *were* to need to parse dhcp leases files, there is now some code in cloudinit/net/dhcp.py that could be re-used or improved on.

Revision history for this message
Scott Moser (smoser) wrote :

So I was wrong about my 'b' above. that was fixed already. for systemd systems it currently just walks the leases dir in sorted order. these are numbers of the iface... not sorted like 'eth0', 'eth1' or any other mreaningful manner.

then the first entry that has that key it will return the value.

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

Hi Scott,

If cloud-init creates a file for the lease of the interface it's taking
care, it's much better than using the default system location. What if
cloud-init manage multiple interface as part of the cloud-config? I don't
think it's the case now to say that eth1 should be confogured through DHCP?

Does the failed tests are expected? Localy I had some failing on another
part of the code I didn't touch (or at least thought it's not related).

On Thu, Jan 18, 2018 at 7:33 PM, Scott Moser <email address hidden>
wrote:

> So I was wrong about my 'b' above. that was fixed already. for systemd
> systems it currently just walks the leases dir in sorted order. these are
> numbers of the iface... not sorted like 'eth0', 'eth1' or any other
> mreaningful manner.
>
> then the first entry that has that key it will return the value.
>
> --
> https://code.launchpad.net/~ma-brothier/cloud-init/+git/
> cloud-init/+merge/336145
> You are the owner of ~ma-brothier/cloud-init:cs-multi-dhcp.
>

Revision history for this message
Scott Moser (smoser) wrote :

Marc,
cloud-init only currently creates the files in /run/ on azure. We can change that to do it also on CloudStack. It is optimized to only run on azure just to stop cloud-init (and python) in the hook path for dhcp and nic bringup everywhere.

I'm not sure I follow "What if cloud-init mange multiple interface..."

Tests should pass on trunk. nightly runs of tip can be seen:
 https://jenkins.ubuntu.com/server/view/cloud-init/job/cloud-init-ci-nightly/

its possible that there are some bad tests that interact poorly with the environment they're running in. unittests fail for you, please paste those failures somewhere and I'll look at helping out (feel free to ping in #cloud-init also).

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

I've pushed a fixed for the failed test, it's running now locally.

Sorry for the confusion on my previous mail, it was unclear.Since
cloud-init does change the configuration of the first interface to be
configured through a DHCP, it would be great to read the JSON file to grab
the cloudstack password/metadata server address. Then if another interface
is configured through DHCP, cloud-init won't create the related json file,
am I right? Or would it be possible to pass a cloud-init config to state a
list of interface to be configured through a DHCP ?

On Fri, Jan 19, 2018 at 8:03 PM, Scott Moser <email address hidden>
wrote:

> Marc,
> cloud-init only currently creates the files in /run/ on azure. We can
> change that to do it also on CloudStack. It is optimized to only run on
> azure just to stop cloud-init (and python) in the hook path for dhcp and
> nic bringup everywhere.
>
> I'm not sure I follow "What if cloud-init mange multiple interface..."
>
> Tests should pass on trunk. nightly runs of tip can be seen:
> https://jenkins.ubuntu.com/server/view/cloud-init/job/
> cloud-init-ci-nightly/
>
> its possible that there are some bad tests that interact poorly with the
> environment they're running in. unittests fail for you, please paste those
> failures somewhere and I'll look at helping out (feel free to ping in
> #cloud-init also).
>
>
> --
> https://code.launchpad.net/~ma-brothier/cloud-init/+git/
> cloud-init/+merge/336145
> You are the owner of ~ma-brothier/cloud-init:cs-multi-dhcp.
>

Revision history for this message
Marc-Aurele Brothier (ma-brothier) wrote :

I'm force pushing the change to force a rebuild of the CI bot, but it does seem to work. How should I do?

1832cc7... by Marc-Aurele Brothier

CloudStack: go through all potential DHCP lease to find the meta/user-data server

Signed-off-by: Marc-Aurèle Brothier <email address hidden>

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/879/
Executed test runs:
    SUCCESS: Checkout
    FAILED: Unit & Style Tests

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/879/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

@Marc,

Can you rebase and push --force again ? it looks like you have one failing test. possibly it is just notn completely mocked. i didnt' look much further, but below is how it fails on my system.

$ tox-venv py27 - tests/unittests/test_datasource/test_cloudstack.py
envpython=python2.7
inside tox:py27 running: python -m nose tests/unittests/test_datasource/test_cloudstack.py
.........F...
======================================================================
FAIL: If multiple files match, all should be used.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/smoser-public/src/cloud-init/cloud-init/tests/unittests/test_datasource/test_cloudstack.py", line 173, in test_selects_all_matching
    get_leases(lease_d))
AssertionError: Lists differ: ['/tm[29 chars]lient.leases', '/tmp/ci-TestGetLeases.LS7lQh/d[62 chars]ase'] != ['/tm[29 chars]lient-eth3.lease', '/tmp/ci-TestGetLeases.LS7l[62 chars]ase']

First differing element 0:
/tmp/ci-TestGetLeases.LS7lQh/dhclient.leases
/tmp/ci-TestGetLeases.LS7lQh/dhclient-eth3.lease

- ['/tmp/ci-TestGetLeases.LS7lQh/dhclient.leases',
? -

+ ['/tmp/ci-TestGetLeases.LS7lQh/dhclient-eth3.lease',
? +++++

- '/tmp/ci-TestGetLeases.LS7lQh/dhclient.lease',
+ '/tmp/ci-TestGetLeases.LS7lQh/dhclient.leases',
? +

- '/tmp/ci-TestGetLeases.LS7lQh/dhclient-eth3.lease']
? -----

+ '/tmp/ci-TestGetLeases.LS7lQh/dhclient.lease']
-------------------- >> begin captured logging << --------------------
cloudinit.util: DEBUG: Writing to /tmp/ci-TestGetLeases.LS7lQh/dhclient.leases - wb: [644] 15 bytes
cloudinit.util: DEBUG: Writing to /tmp/ci-TestGetLeases.LS7lQh/dhclient.lease - wb: [644] 14 bytes
cloudinit.util: DEBUG: Writing to /tmp/ci-TestGetLeases.LS7lQh/dhclient.leases - wb: [644] 15 bytes
cloudinit.util: DEBUG: Writing to /tmp/ci-TestGetLeases.LS7lQh/dhclient.lease - wb: [644] 14 bytes
cloudinit.util: DEBUG: Writing to /tmp/ci-TestGetLeases.LS7lQh/dhclient-eth3.lease - wb: [644] 19 bytes
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 13 tests in 0.259s

FAILED (failures=1)

Revision history for this message
Scott Moser (smoser) wrote :

Bump.

@Marc,
I had a request above.

Revision history for this message
Ryan Harper (raharper) wrote :

Hi Marc,

I've marked this branch Work in Progress. Scott had left you some review comments that need applied before we can land this branch. Once you've done so (and likely rebase on master) please push and then update this merge proposal back to Needs Review and we'll pick things back up.

Unmerged commits

1832cc7... by Marc-Aurele Brothier

CloudStack: go through all potential DHCP lease to find the meta/user-data server

Signed-off-by: Marc-Aurèle Brothier <email address hidden>

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/sources/DataSourceCloudStack.py b/cloudinit/sources/DataSourceCloudStack.py
2index 0df545f..e3e922b 100644
3--- a/cloudinit/sources/DataSourceCloudStack.py
4+++ b/cloudinit/sources/DataSourceCloudStack.py
5@@ -13,23 +13,22 @@
6 # This file is part of cloud-init. See LICENSE file for license information.
7
8 import os
9+import time
10 from socket import inet_ntoa
11 from struct import pack
12-import time
13
14 from cloudinit import ec2_utils as ec2
15 from cloudinit import log as logging
16-from cloudinit.net import dhcp
17 from cloudinit import sources
18 from cloudinit import url_helper as uhelp
19 from cloudinit import util
20+from cloudinit.net import dhcp
21
22 LOG = logging.getLogger(__name__)
23
24
25 class CloudStackPasswordServerClient(object):
26- """
27- Implements password fetching from the CloudStack password server.
28+ """Implements password fetching from the CloudStack password server.
29
30 http://cloudstack-administration.readthedocs.org/
31 en/latest/templates.html#adding-password-management-to-your-templates
32@@ -74,10 +73,9 @@ class DataSourceCloudStack(sources.DataSource):
33 # Cloudstack has its metadata/userdata URLs located at
34 # http://<virtual-router-ip>/latest/
35 self.api_ver = 'latest'
36- self.vr_addr = get_vr_address()
37- if not self.vr_addr:
38+ self.vr_addresses = get_vr_addresses()
39+ if not self.vr_addresses:
40 raise RuntimeError("No virtual router found!")
41- self.metadata_address = "http://%s/" % (self.vr_addr,)
42 self.cfg = {}
43
44 def _get_url_settings(self):
45@@ -102,14 +100,17 @@ class DataSourceCloudStack(sources.DataSource):
46 def wait_for_metadata_service(self):
47 (max_wait, timeout) = self._get_url_settings()
48
49- urls = [uhelp.combine_url(self.metadata_address,
50- 'latest/meta-data/instance-id')]
51+ urls = [uhelp.combine_url("http://%s/" % vr_potential_ip,
52+ 'latest/meta-data/instance-id')
53+ for vr_potential_ip in self.vr_addresses]
54 start_time = time.time()
55 url = uhelp.wait_for_url(urls=urls, max_wait=max_wait,
56 timeout=timeout, status_cb=LOG.warn)
57
58 if url:
59 LOG.debug("Using metadata source: '%s'", url)
60+ self.vr_addr = url.split('/')[2]
61+ self.metadata_address = "http://%s/" % (self.vr_addr,)
62 else:
63 LOG.critical(("Giving up on waiting for the metadata from %s"
64 " after %s seconds"),
65@@ -168,7 +169,7 @@ class DataSourceCloudStack(sources.DataSource):
66
67
68 def get_default_gateway():
69- # Returns the default gateway ip address in the dotted format.
70+ """Returns the default gateway ip address in the dotted format."""
71 lines = util.load_file("/proc/net/route").splitlines()
72 for line in lines:
73 items = line.split("\t")
74@@ -181,7 +182,7 @@ def get_default_gateway():
75
76
77 def get_dhclient_d():
78- # find lease files directory
79+ """Find lease files directory."""
80 supported_dirs = ["/var/lib/dhclient", "/var/lib/dhcp",
81 "/var/lib/NetworkManager"]
82 for d in supported_dirs:
83@@ -191,15 +192,14 @@ def get_dhclient_d():
84 return None
85
86
87-def get_latest_lease(lease_d=None):
88- # find latest lease file
89+def get_leases(lease_d=None):
90+ """Find all lease files."""
91 if lease_d is None:
92 lease_d = get_dhclient_d()
93 if not lease_d:
94 return None
95 lease_files = os.listdir(lease_d)
96- latest_mtime = -1
97- latest_file = None
98+ valid_lease_files = []
99
100 # lease files are named inconsistently across distros.
101 # We assume that 'dhclient6' indicates ipv6 and ignore it.
102@@ -215,18 +215,17 @@ def get_latest_lease(lease_d=None):
103 continue
104 if not (fname.endswith(".lease") or fname.endswith(".leases")):
105 continue
106-
107- abs_path = os.path.join(lease_d, fname)
108- mtime = os.path.getmtime(abs_path)
109- if mtime > latest_mtime:
110- latest_mtime = mtime
111- latest_file = abs_path
112- return latest_file
113+ valid_lease_files.append(os.path.join(lease_d, fname))
114+ return valid_lease_files
115
116
117-def get_vr_address():
118- # Get the address of the virtual router via dhcp leases
119- # If no virtual router is detected, fallback on default gateway.
120+def get_vr_addresses():
121+ """Get a list of potential addresses for the virtual router via DHCP
122+ leases. This overcomes the situation when multiple interfaces are
123+ configured through DHCP, one of them will come from CloudStack VR and
124+ all addresses should be tested until the meta-data server responds. The
125+ latest entry will always be the default gateway.
126+ """
127 # See http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/4.8/virtual_machines/user-data.html # noqa
128
129 # Try networkd first...
130@@ -234,27 +233,30 @@ def get_vr_address():
131 if latest_address:
132 LOG.debug("Found SERVER_ADDRESS '%s' via networkd_leases",
133 latest_address)
134- return latest_address
135+ return [latest_address]
136
137 # Try dhcp lease files next...
138- lease_file = get_latest_lease()
139- if not lease_file:
140+ lease_files = get_leases()
141+ if not lease_files:
142 LOG.debug("No lease file found, using default gateway")
143- return get_default_gateway()
144-
145- with open(lease_file, "r") as fd:
146- for line in fd:
147- if "dhcp-server-identifier" in line:
148- words = line.strip(" ;\r\n").split(" ")
149- if len(words) > 2:
150- dhcptok = words[2]
151- LOG.debug("Found DHCP identifier %s", dhcptok)
152- latest_address = dhcptok
153- if not latest_address:
154+ return [get_default_gateway()]
155+
156+ potential_addresses = []
157+ for lease_file in lease_files:
158+ with open(lease_file, "r") as fd:
159+ for line in fd:
160+ if "dhcp-server-identifier" in line:
161+ words = line.strip(" ;\r\n").split(" ")
162+ if len(words) > 2:
163+ dhcptok = words[2]
164+ if dhcptok not in potential_addresses:
165+ LOG.debug("Found DHCP identifier %s", dhcptok)
166+ potential_addresses.append(dhcptok)
167+ if not potential_addresses:
168 # No virtual router found, fallback on default gateway
169 LOG.debug("No DHCP found, using default gateway")
170- return get_default_gateway()
171- return latest_address
172+ return [get_default_gateway()]
173+ return potential_addresses
174
175
176 # Used to match classes to dependencies
177@@ -263,8 +265,8 @@ datasources = [
178 ]
179
180
181-# Return a list of data sources that match this set of dependencies
182 def get_datasource_list(depends):
183+ """Return a list of data sources that match this set of dependencies."""
184 return sources.list_from_depends(depends, datasources)
185
186 # vi: ts=4 expandtab
187diff --git a/tests/unittests/test_datasource/test_cloudstack.py b/tests/unittests/test_datasource/test_cloudstack.py
188index d6d2d6b..29de7e4 100644
189--- a/tests/unittests/test_datasource/test_cloudstack.py
190+++ b/tests/unittests/test_datasource/test_cloudstack.py
191@@ -1,15 +1,14 @@
192 # This file is part of cloud-init. See LICENSE file for license information.
193
194+import os
195+import time
196+
197 from cloudinit import helpers
198 from cloudinit import util
199 from cloudinit.sources.DataSourceCloudStack import (
200- DataSourceCloudStack, get_latest_lease)
201-
202+ DataSourceCloudStack, get_leases)
203 from cloudinit.tests.helpers import CiTestCase, ExitStack, mock
204
205-import os
206-import time
207-
208
209 class TestCloudStackPasswordFetching(CiTestCase):
210
211@@ -21,9 +20,9 @@ class TestCloudStackPasswordFetching(CiTestCase):
212 self.patches.enter_context(mock.patch('{0}.ec2'.format(mod_name)))
213 self.patches.enter_context(mock.patch('{0}.uhelp'.format(mod_name)))
214 default_gw = "192.201.20.0"
215- get_latest_lease = mock.MagicMock(return_value=None)
216+ get_leases = mock.MagicMock(return_value=None)
217 self.patches.enter_context(mock.patch(
218- mod_name + '.get_latest_lease', get_latest_lease))
219+ mod_name + '.get_leases', get_leases))
220
221 get_default_gw = mock.MagicMock(return_value=default_gw)
222 self.patches.enter_context(mock.patch(
223@@ -105,7 +104,7 @@ class TestCloudStackPasswordFetching(CiTestCase):
224 self._check_password_not_saved_for('bad_request')
225
226
227-class TestGetLatestLease(CiTestCase):
228+class TestGetLeases(CiTestCase):
229
230 def _populate_dir_list(self, bdir, files):
231 """populate_dir_list([(name, data), (name, data)])
232@@ -122,8 +121,8 @@ class TestGetLatestLease(CiTestCase):
233 def _pop_and_test(self, files, expected):
234 lease_d = self.tmp_dir()
235 self._populate_dir_list(lease_d, files)
236- self.assertEqual(self.tmp_path(expected, lease_d),
237- get_latest_lease(lease_d))
238+ self.assertEqual([self.tmp_path(expected, lease_d)],
239+ get_leases(lease_d))
240
241 def test_skips_dhcpv6_files(self):
242 """files started with dhclient6 should be skipped."""
243@@ -154,22 +153,24 @@ class TestGetLatestLease(CiTestCase):
244 "dhclient.lease-old", "dhclient.leaselease"],
245 "dhclient.lease")
246
247- def test_selects_newest_matching(self):
248- """If multiple files match, the newest written should be used."""
249+ def test_selects_all_matching(self):
250+ """If multiple files match, all should be used."""
251 lease_d = self.tmp_dir()
252 valid_1 = "dhclient.leases"
253 valid_2 = "dhclient.lease"
254 valid_1_path = self.tmp_path(valid_1, lease_d)
255 valid_2_path = self.tmp_path(valid_2, lease_d)
256
257- self._populate_dir_list(lease_d, [valid_1, valid_2])
258- self.assertEqual(valid_2_path, get_latest_lease(lease_d))
259+ self._populate_dir_list(lease_d, [valid_2, valid_1])
260+ self.assertEqual([valid_1_path, valid_2_path], get_leases(lease_d))
261
262- # now update mtime on valid_2 to be older than valid_1 and re-check.
263- mtime = int(os.path.getmtime(valid_1_path)) - 1
264- os.utime(valid_2_path, (mtime, mtime))
265+ # now add another lease file
266+ valid_3 = "dhclient-eth3.lease"
267+ valid_3_path = self.tmp_path(valid_3, lease_d)
268+ self._populate_dir_list(lease_d, [valid_3, valid_2, valid_1])
269
270- self.assertEqual(valid_1_path, get_latest_lease(lease_d))
271+ self.assertEqual([valid_1_path, valid_2_path, valid_3_path],
272+ get_leases(lease_d))
273
274
275 # vi: ts=4 expandtab

Subscribers

People subscribed via source and target branches