Unable to apply netplan configuration in ephemeral environment

Bug #1835275 reported by Lee Trager
This bug affects 1 person
Affects       Status         Importance   Assigned to   Milestone
MAAS          Fix Released   Critical     Lee Trager
2.6           Fix Released   Critical     Lee Trager
maas-images   Fix Released   Critical     Lee Trager
systemd       Incomplete     Undecided    Unassigned

Bug Description

During ephemeral deployments and network validation testing, MAAS boots into an Ubuntu ephemeral environment and applies user-provided network configuration. Because the system boots from the standard Ubuntu cloud image[1], cloud-init first creates a stock DHCP network config, and MAAS then applies the custom network configuration when user_data is processed by cloud-init.

While basic network configuration such as DHCP and setting a static IP address works, bonds[2] and aliases[3] never come up. Additionally, netplan apply returns 0, so this is not detected as a failure (LP:1701434).

netplan apply --debug gives no additional information.

[1] http://cloud-images.ubuntu.com/
[2] http://paste.ubuntu.com/p/RyVXVrwD7z/
[3] http://paste.ubuntu.com/p/YcmZSR2R9K/
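
Since netplan apply exits 0 even when a bond never appears, one way to catch the failure is to check afterwards that the expected interfaces exist. The following is only a minimal sketch: the function name missing_interfaces and its sysfs_net parameter are hypothetical, and it assumes the usual Linux layout where every network interface has an entry under /sys/class/net.

```python
import os


def missing_interfaces(expected, sysfs_net="/sys/class/net"):
    """Return the expected interface names with no entry under sysfs_net.

    Assumption: each existing network interface shows up as an entry in
    /sys/class/net, which is the standard Linux sysfs layout.
    """
    try:
        present = set(os.listdir(sysfs_net))
    except FileNotFoundError:
        # No sysfs directory at all: treat every interface as missing.
        present = set()
    return [name for name in expected if name not in present]
```

For example, calling missing_interfaces(["bond0"]) after netplan apply would return ["bond0"] if a bond with that name was never created, which a caller could turn into a non-zero exit status.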


Lee Trager (ltrager)
summary: - Ephemeral deployment does not apply bonds
+ Unable to apply netplan configuration in ephemeral environment
Lee Trager (ltrager)
description: updated
Lee Trager (ltrager)
description: updated
Lee Trager (ltrager) wrote :

A file exists in /run/netplan which marks the PXE boot interface as critical. Removing this file and running netplan apply does not allow a bond to be created.

Lee Trager (ltrager) wrote :

I re-applied the network configuration with debug mode enabled as follows:

systemctl stop systemd-networkd
SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-networkd

While nothing in particular stood out from the debug output, I did see this in journalctl -u systemd-networkd:

Jul 09 00:37:13 maas-test-2 systemd[1]: Started Network Service.
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: bond0: netdev could not be created: Operation not supported
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: lo: Link is not managed by us
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp1s0: Link is not managed by us
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp2s0: Link is not managed by us
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp3s0: IPv6 successfully enabled
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp2s0: Could not join netdev: No such device
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp2s0: Failed
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp1s0: Could not join netdev: No such device
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp1s0: Failed
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Could not emit changed OperationalState: Transport endpoint is not connected
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp3s0: Gained carrier
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Could not emit changed OperationalState: Transport endpoint is not connected
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: enp3s0: DHCPv4 address 192.168.122.241/24 via 192.168.122.1
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Not connected to system bus, not setting hostname.
Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Could not emit changed OperationalState: Transport endpoint is not connected
Jul 09 00:37:14 maas-test-2 systemd-networkd[1734]: enp3s0: Gained IPv6LL
Jul 09 00:37:14 maas-test-2 systemd-networkd[1734]: enp3s0: Configured
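
Because the exit status does not reveal the problem, the journal lines above are the only signal that the bond failed. A rough sketch of pulling those failures out of the journal text follows. The function name networkd_failures is hypothetical, and the two message fragments are simply copied from the output above; this is not a complete list of systemd-networkd error messages.

```python
import re

# Message fragments copied from the journal output in this report.
# Assumption: this list is illustrative, not exhaustive.
FAILURE_PATTERNS = (
    "netdev could not be created",
    "Could not join netdev",
)

# Matches "systemd-networkd[PID]: <link>: <message>" in a journal line.
LINE_RE = re.compile(r"systemd-networkd\[\d+\]: (?P<link>[^:\s]+): (?P<msg>.+)$")


def networkd_failures(lines):
    """Return (link, message) pairs for lines matching a known failure."""
    failures = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and any(p in match["msg"] for p in FAILURE_PATTERNS):
            failures.append((match["link"], match["msg"].strip()))
    return failures
```

The input could be the lines printed by journalctl -u systemd-networkd. Lines without a link prefix, such as the OperationalState warnings, are skipped by this sketch.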

Lee Trager (ltrager) wrote :

If I remove all files in /run/netplan first, I get similar output; however, enp1s0 (the boot interface) goes down and never comes back up.

Mathieu Trudel-Lapierre (cyphermox) wrote :

This warning:

Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Could not emit changed OperationalState: Transport endpoint is not connected

Along with the "could not find netdev" error, this is a bit worrying. Could it be that the ephemeral environment doesn't have everything mounted the way it should?

Is this reproducible on an installed system, i.e. outside of the ephemeral environment?

Anyway, this doesn't look at all like a netplan bug, more like an issue with systemd-networkd; reassigning to the right package...

affects: netplan → systemd
Changed in systemd:
status: New → Incomplete
Lee Trager (ltrager) wrote :

The config works fine when I use MAAS to deploy the system; I only run into this bug when trying to apply the config in the ephemeral environment. Everything should be mounted properly. Below is what is mounted in the ephemeral environment. Is there a way to get systemd-networkd to tell me exactly what resource is missing?

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=988392k,nr_inodes=247098,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=204104k,mode=755)
/root.tmp.img (deleted) on /media/root-ro type squashfs (ro,relatime)
tmpfs-root on /media/root-rw type tmpfs (rw,relatime)
overlayroot on / type overlay (rw,relatime,lowerdir=/media/root-ro,upperdir=/media/root-rw/overlay,workdir=/media/root-rw/overlay-workdir/_)
copymods on /lib/modules type tmpfs (rw,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=204100k,mode=700,uid=1000,gid=1000)

Mathieu Trudel-Lapierre (cyphermox) wrote :

After looking some more into this, I noticed two more things that are off:

Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: bond0: netdev could not be created: Operation not supported

^ Operation not supported here is quite weird. I can't think of a reason why it could happen. Are you using a special kernel there? Is it missing some module that needs to be modprobed (bonding)? What else might make creating a bond UNSUPP?

Also, the other errors:

Jul 09 00:37:13 maas-test-2 systemd-networkd[1734]: Could not emit changed OperationalState: Transport endpoint is not connected

These indicate that there is no valid connection to the system bus. Why would D-Bus not be running? This isn't a question of mountpoints, but at the very least a question of the makeup of the ephemeral environment, for instance, if not all the services are started correctly there.

If you're in the ephemeral environment, what happens if you run:

systemctl is-system-running

and

sudo systemd-analyze critical-chain

Lee Trager (ltrager) wrote :

It seems the ephemeral environment is missing some kernel modules. I've added lp:maas-images to this bug to fix that. After I installed the linux-modules-$(uname -r) package I was able to bring up the devices! When using a config with static IP addresses I had to remove /run/netplan/* but I believe that is to be expected.

Lee Trager (ltrager) wrote :

The only network kernel module missing is the bonding kernel module; however, the ephemeral environment is missing a few others:

ib_iser - Open-iSCSI, which is in the ephemeral image, wants to load this kernel module. This is causing systemctl is-system-running to show up as degraded. Since we were already discussing adding Infiniband drivers for IBM S390X, I think we should add these.
aes_x86_64 - This autoloads once I install the linux-modules-$(uname -r) package. This kernel module can be used by the RAID driver to get faster performance. Since a deployed Ubuntu environment will have this, I think we should add it as well.
hid_generic, virtio_gpu - Both are accelerated drivers for using user input with KVM. Since most users are using the VM remotely and few if any install a graphical environment, I think we can continue to skip these.

Lee Trager (ltrager) wrote :

Correction: aes-x86_64 is in the initrd. My comparison script thought it was missing because the kernel shows it as aes_x86_64 while the module file name is aes-x86_64.ko.
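
The hyphen/underscore mismatch described above is easy to trip over when comparing module names against module files. The sketch below is not the comparison script mentioned in this comment; it is a hypothetical illustration (the names normalize, module_names_on_disk and missing_modules are made up here) that treats "-" and "_" as equivalent and accepts compressed module files such as .ko.xz.

```python
import os


def normalize(name):
    """Treat '-' and '_' as equivalent, as the kernel does for module names."""
    return name.replace("-", "_")


def module_names_on_disk(modules_root):
    """Collect normalized names of all *.ko / *.ko.* files under modules_root."""
    names = set()
    for _dirpath, _dirs, files in os.walk(modules_root):
        for filename in files:
            base, sep, rest = filename.partition(".ko")
            # Accept "foo.ko" and compressed variants such as "foo.ko.xz".
            if sep and (rest == "" or rest.startswith(".")):
                names.add(normalize(base))
    return names


def missing_modules(wanted, modules_root):
    """Return the wanted module names that have no file under modules_root."""
    on_disk = module_names_on_disk(modules_root)
    return sorted(name for name in wanted if normalize(name) not in on_disk)
```

On Ubuntu, modules_root would typically be /lib/modules/$(uname -r). With this normalization, aes_x86_64 is matched against aes-x86_64.ko, while a module with no file on disk, such as bonding before the fix, would still be reported as missing.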

Changed in maas:
status: Triaged → Invalid
Changed in maas-images:
status: New → Fix Released
importance: Undecided → Critical
assignee: nobody → Lee Trager (ltrager)
Lee Trager (ltrager)
Changed in maas:
status: Invalid → Fix Committed
Changed in maas:
milestone: none → 2.7.0b1
Changed in maas:
status: Fix Committed → Fix Released
Lee Trager (ltrager)
Changed in maas:
assignee: nobody → Lee Trager (ltrager)