Merge lp:~chad.smith/vmbuilder/jenkins_kvm_azure_netplan_hotplug into lp:~ubuntu-on-ec2/vmbuilder/jenkins_kvm

Proposed by Chad Smith on 2018-05-31
Status: Merged
Merged at revision: 802
Proposed branch: lp:~chad.smith/vmbuilder/jenkins_kvm_azure_netplan_hotplug
Merge into: lp:~ubuntu-on-ec2/vmbuilder/jenkins_kvm
Diff against target: 79 lines (+40/-5)
1 file modified
templates/img-extra-nets.tmpl (+40/-5)
To merge this branch: bzr merge lp:~chad.smith/vmbuilder/jenkins_kvm_azure_netplan_hotplug
Reviewer Review Type Date Requested Status
Francis Ginther Approve on 2018-07-12
Daniel Axtens (community) 2018-05-31 Approve on 2018-06-01
Dan Watkins 2018-05-31 Approve on 2018-06-01
Ryan Harper 2018-05-31 Pending
VMBuilder 2018-05-31 Pending
Review via email: mp+347212@code.launchpad.net

This proposal supersedes a proposal from 2018-05-31.

Commit message

Azure's nic hotplug: deliver static/etc/netplan/90-hotplug-azure.yaml

Allow netplan/networkd to automatically bringup hotplugged nics
by delivering a flexible static netplan yaml file to catch any new ethernet devices attached to the Azure instance. The hotplug-azure yaml will complement cloud-init's initial/strict eth0 match definition which is matching only to a specific to macaddress for eth0.

Because cloud-init's netwok configuration is is only created at boot,
a detach of the original eth0 nic and attach of a new nic would cause
cloud-init's netplan yaml to fail to match, resulting in no network
config. Given the supplemental 90-hotplug-azure.yaml, the following
"hotpluggedeth0" opaque-id will match on any new eth0 present on the
system after reboot. Systemd will and set that nic the
primary/mandatory interface. eth1 or greater are left as optional
ephemeral devices, meaning that networkd boot will not block on their
link being up.

   hotpluggedeth0:
       dhcp4: true
       match:
          driver: hv_netvsc
          name: 'eth0'

Also avoid appending unnecessary include directives or udev rules in
/etc/network/interfaces on netplan-enabled systems.

Description of the change

Azure images deliver a script /usr/local/sbin/ephemeral_eth.sh which is called from udev add rules on nic hotplug events for nics named eth[1-9]*. This script was created when netplan wasn't a 'thing' and, as such, only cared about /etc/network/interfaces.

In Artful and later, cloud-init writes a fallback interface config in /etc/netplan/50-cloud-init.yaml configuration dhcp on eth0 as a primary/mandatory NIC (optional: false). So boot will wait on that device to come up.

This change adds a flexible static /etc/netplan/90-hotplug-azure.yaml which won't conflict with cloud-init's config

To post a comment you must log in.
Daniel Axtens (daxtens) wrote : Posted in a previous version of this proposal

A few things:

1) netplan is the default on Artful too. I think your detection code is right, but your commit message is potentially wrong?

2) If I understand cloud-init and netplan correctly, couldn't you achieve the same effect by just adding this as /etc/netplan/99-azure-hotplug.yaml? Then you could drop ephemeral_eth.sh entirely on Artful and Bionic.

network:
....version: 2
....ethernets:
........ephemeral:
............dhcp4: true
............match:
................driver: hv_netvsc
................name: "eth*"
............optional: true

3) Looking at the code itself, you should probably use /run/netplan for ephemeral files, rather than /etc/netplan. That also solves your cleanup problem.

4) And it's worth knowing that netplan apply will look for network devices that are 'down' and them from their drivers and rebind them. With your approach, netplan apply will be run for each extra device, so if there are 4 extra devices, the first one configured won't be replugged, the second will be replugged once, the third will be replugged twice and so on. This *probably* isn't problematic, but it makes me nervous, especially doing it in rapid succession.

Chad Smith (chad.smith) wrote :

Daniel thanks for the solid review here.

1. Oops right. My description in the MP was bogus mentioning Bionic, changed to Artful, though the commit message will still only be the following:

Update Azure's nic hotplug script to use netplan if available instead of ENI

Also avoid appending unnecessary include directives in
/etc/network/interfaces on netplan-enabled systems.

2. We can't quite just write out a netplan match: name: "eth*" because 50-cloud-init.yaml also delivers a match here and I believe that results in a collision on eth0 with the original match keys on macaddress. So cloud-init's eth0 netplan yaml, would still not match for a reattached eth0 if the macaddress changed.
I'm testing now on azure whether we can emit something like:
network:
....version: 2
....ethernets:
........ephemeral:
............dhcp4: true
............match:
................driver: hv_netvsc
................name: "!eth0"
............optional: true

Which hopefully will avoid potential conflicts with 50-cloud-init.yaml for eth0.
Cloud-init will have to sort the detatch/re-attach story for eth0 as a later time I expect.

If the name !eth0 works, then we'll just emit that static file /etc/netplan/99-azure-hotplug.yaml and I'll drop the udev rules from this script on netplan-enabled systems.

3. Thanks for the suggestion on /run/netplan instead of /etc/netplan for ephemeral, will make those changes if the static /etc/netplan/99-azure-hotplug,yaml is not workable across non-eth0 nic add.

4. Much thanks on the 'netplan apply'behavior info. If we can emit a static /etc/netplan/99-azure-hotplug.yaml per suggestion 2, then we probably don't have to go this route. I only called netplan apply here due to the udev add firing, then new yaml written and netplan needing to refresh that content to react to it. Hopefully with your other suggestion and netplan matching regex we can avoid this.

Ryan Harper (raharper) :
Mathieu Trudel-Lapierre (cyphermox) wrote :

Comments inline.

Chad Smith (chad.smith) :
800. By Chad Smith on 2018-05-31

Revert changes to ephemeral_eth.sh and emit a netplan 90-hotplug-azure.yaml

cloud-init only sets up a network configuration at initial boot pinned to
the original macaddress. If we are building a netplan enabled image,
emit a static netplan yaml which will complement the orignal /etc/netplan/50-cloud-init.yaml fallback definition. If the original eth0 is no longer attached to vm, cloud-init's netplan yaml will not match by macaddress and system will fall through to match the following hotpluggedeth0 definition:

        hotpluggedeth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: 'eth0'

Chad Smith (chad.smith) wrote :

OK I changed up the approach here after some good testing on Azure (attaching new nics, detatching original eth0 nic, reattaching nics etc).

systemd does let us pass inverted matches per https://www.freedesktop.org/software/systemd/man/systemd.network.html#Name=

Per Daniel's suggestion, we can deliver a static netplan yaml in Azure netplan-enabled cloud-images. But, that netplan needs to complement the original 50-cloud-init.yaml which will not be updated by cloud-init after first boot.

We can then also drop the udev rules and the changes I made to ephemeral_eth.sh script as that would be completely unused in netplan environments.

This way we don't have to run netplan apply on each udev add event.

The static netplan is as follows:
 /etc/netplan/90-hotplug-azure.yaml
# Automatically generated by /usr/local/sbin/ephemeral_eth.sh to setup
# Azure attached nics after initial instance boot
network:
    version: 2
    ethernets:
        ephemeral:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: '!eth0'
            optional: true
        hotpluggedeth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: 'eth0'

801. By Chad Smith on 2018-06-01

Move /etc/network/interfaces include directive back out of config_udev.

Appended include directive in /etc/network/interfaces needs to exist for
both upstart and udev solutions. So, it can't live exclusively within
config_udev_or_netplan function. It needs to be present on all non-netplan
environments (upstart and ENI), but test we are not a netplan enabled
image before manipulating /etc/network/interfaces.

Daniel Axtens (daxtens) wrote :

What happens when you remove the original eth0 and replace it? Does the boot happen promptly, or does it stall on the 50-cloud-init.yaml interface?

Would it be simpler just to suppress the original cloud-init netplan config?

Apart from that I think it looks fine, but I'm not an expert on how these things are built.

Dan Watkins (daniel-thewatkins) wrote :

Thanks for this work, Chad, and the testing/investigation you've done; it's much appreciated!

This looks good to me, and I'm happy to be able to move to just configuration rather than having extra code on the image. netplan FTW!

review: Approve
Chad Smith (chad.smith) wrote :

> What happens when you remove the original eth0 and replace it? Does the boot
> happen promptly, or does it stall on the 50-cloud-init.yaml interface?
>
> Would it be simpler just to suppress the original cloud-init netplan config?
>
> Apart from that I think it looks fine, but I'm not an expert on how these
> things are built.

Thanks for the thoughts here Daniel.

I had tested detaching the original nic (which had cloud-init's matched macaddress), and attaching a new nic (new mac) which showed up as eth0. On boot, the new nic didn't match systemd's match rule emitted by netplan due to 50-cloud-init.yaml. systemd-analyze showed no timeout or wait attempts on 50-cloud-init's declared non-optional device.

I'm seeing `systemd-analyze blame` without any timeouts:
...
           989ms systemd-networkd-wait-online.service
...

This reason that boot online didn't wait is networkd's intended behavior, any unmatched network device doesn't honor any additional config (like RequiredForOnline=yes) because that device isn't currently managed by networkd, so it won't wait around for that additional device that doesn't even exist.

The only time network-wait-online will wait, is for matches devices that are present on the system, and specify RequiredForOnline=yes.

Ryan Harper (raharper) :
Chad Smith (chad.smith) wrote :

type fixed Rewrote my last hidden comment fixing typo

All said, I think this patch handles most use cases, hotpluggedeth0 will match any newly attached eth0 that is present on the system during boot (and mark that device RequiredForOnline=yes if 50-cloud-init.yaml doesn't match because of a mac address mismatch.

We won't clean up cloud-init's yaml for a couple reasons:
 1. Long term cloud-init is growing the ability to handle cold plug and hot plug events like this generically when presented with hardware/network deltas in metatdata so it will be able to regenerate the proper config (and remove 90-hotplug-azure.yaml in the process.

 2. If this script removed or disabled the 50-cloud-init.yaml and the original eth0 were reattached in the future, we would want to resurrect that original 50-cloud-init.yaml. This adds complexity and additional udev rule complexity further coupling this script to cloud-init existing behavior. I'm not certain that adding complexity and risk to this one-off solution is something that we want to add here given that cloud-init is actively working on the hotplug feature that should replace this image template in its entirity.

Daniel Axtens (daxtens) wrote :

OK, I'm sold. Thanks Chad.

review: Approve
Chad Smith (chad.smith) :
Francis Ginther (fginther) wrote :

I'm in the process of testing an image built from this MP.

Francis Ginther (fginther) wrote :

Azure has signed off on this change and it's passed our automated tests. I plan to merge this immediately after the next release image is published to azure (expected to be the week of July 2).

Francis Ginther (fginther) wrote :

Approve

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'templates/img-extra-nets.tmpl'
2--- templates/img-extra-nets.tmpl 2017-08-22 19:29:36 +0000
3+++ templates/img-extra-nets.tmpl 2018-06-01 02:30:12 +0000
4@@ -5,6 +5,7 @@
5 # vi: ts=4 noexpandtab syntax=sh
6
7 int_d="/run/network/interfaces.ephemeral.d"
8+netplan_d="/etc/netplan"
9
10 function config_upstart {
11 # Give an upstart job for defining a secondary interface
12@@ -42,8 +43,39 @@
13 EOF
14 }
15
16-function config_udev {
17- # Give a udev script to define eth1+ interfaces, as they appear
18+function config_udev_or_netplan {
19+ if [ -e ${mp}/usr/sbin/netplan ]; then
20+ # When creating a netplan image, emit no udev rules as netplan can
21+ # configure hotplugged devices if we setup the right yaml
22+.
23+ # We explicitly need to match !eth0 for 'ephemeral' eth devices because
24+ # we don't want boot to wait for them if they are later detached.
25+ cat << EOF > ${mp}/${netplan_d}/90-hotplug-azure.yaml
26+# This netplan yaml is delivered in Azure cloud images to support
27+# attaching and detaching nics after the instance first boot.
28+# Cloud-init otherwise handles initial boot network configuration in
29+# /etc/netplan/50-cloud-init.yaml
30+network:
31+ version: 2
32+ ethernets:
33+ ephemeral:
34+ dhcp4: true
35+ match:
36+ driver: hv_netvsc
37+ name: '!eth0'
38+ optional: true
39+ hotpluggedeth0:
40+ dhcp4: true
41+ match:
42+ driver: hv_netvsc
43+ name: 'eth0'
44+
45+EOF
46+ return
47+ fi
48+
49+ # Non-netplan images need ENI ephemeral nic definitions
50+ # Give a udev script to define eth1+ interfaces, as they are added
51 cat << EOF > ${mp}/usr/local/sbin/ephemeral_eth.sh
52 #!/bin/bash
53 ${CLOUD_IMG_STR}
54@@ -93,12 +125,14 @@
55 config_upstart
56 ;;
57 *)
58- config_udev
59+ config_udev_or_netplan
60 ;;
61 esac
62
63-# Add the network ephemeral mount...
64-cat << EOF >> ${mp}/etc/network/interfaces
65+if [ ! -e ${mp}/usr/sbin/netplan ]; then
66+ # Not netplan-enabled, so we need to include ephemeral ENI definitions
67+ # for both upstart (precise|trusty) and non-netplan udev solutions
68+ cat << EOF >> ${mp}/etc/network/interfaces
69
70 # Read the dynamically created configurations from tmpfs mount. If you want a static
71 # configuration, disable the line below. However, you will have to manually configure
72@@ -107,6 +141,7 @@
73 source ${int_d}/*.cfg
74
75 EOF
76+fi
77
78 # END NICS
79 ##################

Subscribers

People subscribed via source and target branches