Merge ~chad.smith/cloud-init:fix-ec2-fallback-nic into cloud-init:master

Proposed by Chad Smith on 2017-11-18
Status: Merged
Approved by: Scott Moser on 2017-11-20
Approved revision: c8864aca05cdbc6a8a35c944f0f75d6d687e92fb
Merged at revision: 281a82181716183d526e76f4e0415e0a6f680cbe
Proposed branch: ~chad.smith/cloud-init:fix-ec2-fallback-nic
Merge into: cloud-init:master
Diff against target: 144 lines (+67/-13)
3 files modified
cloudinit/net/dhcp.py (+1/-2)
cloudinit/sources/DataSourceEc2.py (+33/-11)
tests/unittests/test_datasource/test_ec2.py (+33/-0)
Reviewer Review Type Date Requested Status
Scott Moser 2017-11-18 Approve on 2017-11-20
Server Team CI bot continuous-integration Approve on 2017-11-20
Review via email: mp+333927@code.launchpad.net

Commit Message

EC2: Fix bug using fallback_nic and metadata when restoring from cache.

If user upgraded to new cloud-init and attempted to run 'cloud-init init'
without rebooting, cloud-init restores the datasource object from pickle.
The older version pickled datasource object had no value for
_network_config or fallback_nic. This caused the Ec2 datasource to attempt
to reconfigure networking with a None fallback_nic. The pickled object
also cached an older version of ec2 metadata which didn't contain network
information.

This branch does two things:
 - Add a fallback_interface property to DatasourceEC2 to support reading the
   old .fallback_nic attribute if it was set. New versions will
   call net.find_fallback_nic() if there has not been one found.
 - Re-crawl metadata if we are on Ec2 and don't have a 'network' key in
   metadata

LP: #1732917

Description of the Change

EC2: Fix bug using fallback_nic and metadata when restoring from cache.

If user upgraded to new cloud-init and attempted to run 'cloud-init init'
without rebooting, cloud-init restores the datasource object from pickle.
The older version pickled datasource object had no value for
_network_config or fallback_nic. This caused the Ec2 datasource to attempt
to reconfigure networking with a None fallback_nic. The pickled object
also cached an older version of ec2 metadata which didn't contain network
information.

This branch does two things:
 - Add a fallback_interface property to DatasourceEC2 to support reading the
   old .fallback_nic attribute if it was set. New versions will
   call net.find_fallback_nic() if there has not been one found.
 - Re-crawl metadata if we are on Ec2 and don't have a 'network' key in
   metadata

LP: #1732917

To post a comment you must log in.
Ryan Harper (raharper) wrote :

I'm some what leery of recrawling merely because someone ran 'cloud-init init' when there is a cached object for the current instance-id. Wouldn't it be better to *not* re-read metadata?

Now, thinking about a reboot; due to Ec2 datasource not providing a check_instance_id() method, we always throw out the previous boot's Datasource cached object. Prior to local datasource Ec2, we always re-read metadata when acquiring an EC2 datasource; every boot we always read metadata.

I think that reasonably convinces (at least myself) that any time we instantiate an Ec2 datasource object (whether from unpickle or reboot); it's going to read the metadata.

Chad Smith (chad.smith) wrote :
Download full text (4.1 KiB)

Just tested on Ec2

ubuntu@ip-172-31-16-165:~$ sudo dpkg -i cloud-init_0.7.9-233-ge586fe35-0ubuntu1~16.04.2_all.deb

sudo rm -rf /var/log/cloud-init* /var/lib/cloud;
sudo cloud-init init

ubuntu@ip-172-31-16-165:~$ sudo dpkg -i cloud-init_17.1-44-gc8864aca-1~bddeb_all.deb
(Reading database ... 51123 files and directories currently installed.)
Preparing to unpack cloud-init_17.1-44-gc8864aca-1~bddeb_all.deb ...
Unpacking cloud-init (17.1-44-gc8864aca-1~bddeb) over (0.7.9-233-ge586fe35-0ubuntu1~16.04.2) ...
Setting up cloud-init (17.1-44-gc8864aca-1~bddeb) ...
Installing new version of config file /etc/cloud/cloud.cfg ...
Installing new version of config file /etc/cloud/templates/hosts.suse.tmpl ...
Installing new version of config file /etc/cloud/templates/ntp.conf.sles.tmpl ...
Installing new version of config file /etc/cloud/templates/sources.list.debian.tmpl ...
ubuntu@ip-172-31-16-165:~$ sudo cloud-init init
Cloud-init v. 17.1 running 'init' at Sat, 18 Nov 2017 03:01:33 +0000. Up 17185.03 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+---------------+-------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+-----------------------------+---------------+-------+-------------------+
ci-info: | eth0 | True | 172.31.16.165 | 255.255.240.0 | . | 06:23:bc:a9:a4:fa |
ci-info: | eth0 | True | fe80::423:bcff:fea9:a4fa/64 | . | link | 06:23:bc:a9:a4:fa |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+-----------------------------+---------------+-------+-------------------+
ci-info: +++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
ci-info: +-------+-------------+-------------+---------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+-------------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 172.31.16.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 172.31.16.0 | 0.0.0.0 | 255.255.240.0 | eth0 | U |
ci-info: +-------+-------------+-------------+---------------+-----------+-------+
ubuntu@ip-172-31-16-165:~$ systemctl status cloud-init.service
● cloud-init.service - Initial cloud-init job (metadata service crawler)
   Loaded: loaded (/lib/systemd/system/cloud-init.service; enabled; vendor preset: enabled)
   Active: active (exited) since Fri 2017-11-17 23:01:14 UTC; 4h 0min ago
 Main PID: 3781 (code=exited, status=0/SUCCESS)

Nov 17 23:01:14 ip-172-31-16-165 cloud-init[3781]: ci-info: | lo | True | ::1/128 | . | host | . |
Nov 17 23:01:14 ip-172-31-16-165 cloud-init[3781]: ci-info: +--------+------+-----------------------------+---------------+-------+-------------------+
Nov 17 23:01:14 ip-172-31-16-165 ...

Read more...

FAILED: Continuous integration, rev:c8864aca05cdbc6a8a35c944f0f75d6d687e92fb
https://jenkins.ubuntu.com/server/job/cloud-init-ci/513/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    FAILED: MAAS Compatability Testing

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/513/rebuild

review: Needs Fixing (continuous-integration)
Scott Moser (smoser) wrote :

one comment inline.

PASSED: Continuous integration, rev:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/516/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/516/rebuild

review: Approve (continuous-integration)
Scott Moser (smoser) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/net/dhcp.py b/cloudinit/net/dhcp.py
2index 0cba703..1a561be 100644
3--- a/cloudinit/net/dhcp.py
4+++ b/cloudinit/net/dhcp.py
5@@ -41,8 +41,7 @@ def maybe_perform_dhcp_discovery(nic=None):
6 if nic is None:
7 nic = find_fallback_nic()
8 if nic is None:
9- LOG.debug(
10- 'Skip dhcp_discovery: Unable to find fallback nic.')
11+ LOG.debug('Skip dhcp_discovery: Unable to find fallback nic.')
12 return {}
13 elif nic not in get_devicelist():
14 LOG.debug(
15diff --git a/cloudinit/sources/DataSourceEc2.py b/cloudinit/sources/DataSourceEc2.py
16index 0ef2217..7bbbfb6 100644
17--- a/cloudinit/sources/DataSourceEc2.py
18+++ b/cloudinit/sources/DataSourceEc2.py
19@@ -65,7 +65,7 @@ class DataSourceEc2(sources.DataSource):
20 get_network_metadata = False
21
22 # Track the discovered fallback nic for use in configuration generation.
23- fallback_nic = None
24+ _fallback_interface = None
25
26 def __init__(self, sys_cfg, distro, paths):
27 sources.DataSource.__init__(self, sys_cfg, distro, paths)
28@@ -92,18 +92,17 @@ class DataSourceEc2(sources.DataSource):
29 elif self.cloud_platform == Platforms.NO_EC2_METADATA:
30 return False
31
32- self.fallback_nic = net.find_fallback_nic()
33 if self.get_network_metadata: # Setup networking in init-local stage.
34 if util.is_FreeBSD():
35 LOG.debug("FreeBSD doesn't support running dhclient with -sf")
36 return False
37- dhcp_leases = dhcp.maybe_perform_dhcp_discovery(self.fallback_nic)
38+ dhcp_leases = dhcp.maybe_perform_dhcp_discovery(
39+ self.fallback_interface)
40 if not dhcp_leases:
41 # DataSourceEc2Local failed in init-local stage. DataSourceEc2
42 # will still run in init-network stage.
43 return False
44 dhcp_opts = dhcp_leases[-1]
45- self.fallback_nic = dhcp_opts.get('interface')
46 net_params = {'interface': dhcp_opts.get('interface'),
47 'ip': dhcp_opts.get('fixed-address'),
48 'prefix_or_mask': dhcp_opts.get('subnet-mask'),
49@@ -301,21 +300,44 @@ class DataSourceEc2(sources.DataSource):
50 return None
51
52 result = None
53- net_md = self.metadata.get('network')
54+ no_network_metadata_on_aws = bool(
55+ 'network' not in self.metadata and
56+ self.cloud_platform == Platforms.AWS)
57+ if no_network_metadata_on_aws:
58+ LOG.debug("Metadata 'network' not present:"
59+ " Refreshing stale metadata from prior to upgrade.")
60+ util.log_time(
61+ logfunc=LOG.debug, msg='Re-crawl of metadata service',
62+ func=self._crawl_metadata)
63+
64 # Limit network configuration to only the primary/fallback nic
65- macs_to_nics = {
66- net.get_interface_mac(self.fallback_nic): self.fallback_nic}
67+ iface = self.fallback_interface
68+ macs_to_nics = {net.get_interface_mac(iface): iface}
69+ net_md = self.metadata.get('network')
70 if isinstance(net_md, dict):
71 result = convert_ec2_metadata_network_config(
72- net_md, macs_to_nics=macs_to_nics,
73- fallback_nic=self.fallback_nic)
74+ net_md, macs_to_nics=macs_to_nics, fallback_nic=iface)
75 else:
76- LOG.warning("unexpected metadata 'network' key not valid: %s",
77- net_md)
78+ LOG.warning("Metadata 'network' key not valid: %s.", net_md)
79 self._network_config = result
80
81 return self._network_config
82
83+ @property
84+ def fallback_interface(self):
85+ if self._fallback_interface is None:
86+ # fallback_nic was used at one point, so restored objects may
87+ # have an attribute there. respect that if found.
88+ _legacy_fbnic = getattr(self, 'fallback_nic', None)
89+ if _legacy_fbnic:
90+ self._fallback_interface = _legacy_fbnic
91+ self.fallback_nic = None
92+ else:
93+ self._fallback_interface = net.find_fallback_nic()
94+ if self._fallback_interface is None:
95+ LOG.warning("Did not find a fallback interface on EC2.")
96+ return self._fallback_interface
97+
98 def _crawl_metadata(self):
99 """Crawl metadata service when available.
100
101diff --git a/tests/unittests/test_datasource/test_ec2.py b/tests/unittests/test_datasource/test_ec2.py
102index 6af699a..ba328ee 100644
103--- a/tests/unittests/test_datasource/test_ec2.py
104+++ b/tests/unittests/test_datasource/test_ec2.py
105@@ -307,6 +307,39 @@ class TestEc2(test_helpers.HttprettyTestCase):
106
107 @httpretty.activate
108 @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
109+ def test_network_config_cached_property_refreshed_on_upgrade(self, m_dhcp):
110+ """Refresh the network_config Ec2 cache if network key is absent.
111+
112+ This catches an upgrade issue where obj.pkl contained stale metadata
113+ which lacked newly required network key.
114+ """
115+ old_metadata = copy.deepcopy(DEFAULT_METADATA)
116+ old_metadata.pop('network')
117+ ds = self._setup_ds(
118+ platform_data=self.valid_platform_data,
119+ sys_cfg={'datasource': {'Ec2': {'strict_id': True}}},
120+ md=old_metadata)
121+ self.assertTrue(ds.get_data())
122+ # Provide new revision of metadata that contains network data
123+ register_mock_metaserver(
124+ 'http://169.254.169.254/2009-04-04/meta-data/', DEFAULT_METADATA)
125+ mac1 = '06:17:04:d7:26:09' # Defined in DEFAULT_METADATA
126+ get_interface_mac_path = (
127+ 'cloudinit.sources.DataSourceEc2.net.get_interface_mac')
128+ ds.fallback_nic = 'eth9'
129+ with mock.patch(get_interface_mac_path) as m_get_interface_mac:
130+ m_get_interface_mac.return_value = mac1
131+ ds.network_config # Will re-crawl network metadata
132+ self.assertIn('Re-crawl of metadata service', self.logs.getvalue())
133+ expected = {'version': 1, 'config': [
134+ {'mac_address': '06:17:04:d7:26:09',
135+ 'name': 'eth9',
136+ 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}],
137+ 'type': 'physical'}]}
138+ self.assertEqual(expected, ds.network_config)
139+
140+ @httpretty.activate
141+ @mock.patch('cloudinit.net.dhcp.maybe_perform_dhcp_discovery')
142 def test_valid_platform_with_strict_true(self, m_dhcp):
143 """Valid platform data should return true with strict_id true."""
144 ds = self._setup_ds(

Subscribers

People subscribed via source and target branches