netplan dhcp interface with no default route causes systemd-networkd-wait-online to hang

Bug #1804478 reported by Mikko Korkalo
This bug affects 5 people
Affects            Status         Importance   Assigned to     Milestone
systemd (Ubuntu)   Fix Released   Medium       Unassigned
  Bionic           Fix Released   Medium       Dan Streetman

Bug Description

[impact]

systemd-networkd can hang on an interface that is configured to use IPv4 DHCP. There are two known triggers:
a) the interface is configured to ignore the routes offered by the DHCP server
b) the DHCP server provides no route
In either case systemd-networkd waits for the interface's configuration to complete until it times out.

This delays boot as well as any restart of systemd-networkd.

The fix is to backport upstream commit [1].

[1]: https://github.com/systemd/systemd/commit/223932c7

[test case]

There are two ways to test this.
A) Make the system ignore routes (slightly less realistic, but easier to test).
Configure an interface using systemd-networkd:

$ cat /etc/systemd/network/20-ens7.network
[Match]
Name=ens7

[Network]
DHCP=ipv4

[DHCP]
UseRoutes=false

then reboot, and check:

$ systemctl status systemd-networkd-wait-online
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2019-04-08 23:59:26 UTC; 2min 59s ago
[...]
Apr 08 23:57:27 lp1804478 systemd[1]: Starting Wait for Network to be Configured...
Apr 08 23:57:30 lp1804478 systemd-networkd-wait-online[593]: managing: ens3
Apr 08 23:57:30 lp1804478 systemd-networkd-wait-online[593]: ignoring: lo
Apr 08 23:59:26 lp1804478 systemd-networkd-wait-online[593]: Event loop failed: Connection timed out
Apr 08 23:59:26 lp1804478 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Apr 08 23:59:26 lp1804478 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Apr 08 23:59:26 lp1804478 systemd[1]: Failed to start Wait for Network to be Configured.

B) Make the DHCP server provide no routes.
Prepare an Ubuntu Bionic guest under libvirt, e.g. using uvtool:
  $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=bionic
  $ uvt-kvm create --password ubuntu bionic arch=amd64 release=bionic label=daily

Create an isolated network:
$ cat > isolate.xml << EOF
<network>
  <name>isolated</name>
  <!-- <bridge name='virbriso1' stp='on' delay='0'/> -->
  <ip address='192.168.251.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.251.2' end='192.168.251.254'/>
    </dhcp>
  </ip>
</network>
EOF
$ virsh net-define isolate.xml
$ virsh net-start isolated

Edit the guest and add that network:
$ virsh shutdown bionic
$ virsh edit bionic
#add this:
  <interface type='network'>
    <source network='isolated'/>
    <mac address='52:54:00:c1:69:08'/>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </interface>

In the guest make the device use DHCP:
Add this to /etc/netplan/50-cloud-init.yaml
        ens7:
            dhcp4: true
            match:
                macaddress: 52:54:00:c1:69:08
            set-name: ens7
$ sudo netplan apply

After rebooting the guest, the service fails:
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-04-09 07:07:48 UTC; 1min 41s ago
     Docs: man:systemd-networkd-wait-online.service(8)
  Process: 563 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited, status=1/FAILURE)
 Main PID: 563 (code=exited, status=1/FAILURE)

Apr 09 07:05:48 bionic-dgx2 systemd[1]: Starting Wait for Network to be Configured...
Apr 09 07:05:50 bionic-dgx2 systemd-networkd-wait-online[563]: managing: ens3
Apr 09 07:07:48 bionic-dgx2 systemd-networkd-wait-online[563]: Event loop failed: Connection timed out
Apr 09 07:07:48 bionic-dgx2 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Apr 09 07:07:48 bionic-dgx2 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Apr 09 07:07:48 bionic-dgx2 systemd[1]: Failed to start Wait for Network to be Configured.

With the fix applied, the service starts quickly and succeeds in both case A and case B:
$ sudo systemctl status systemd-networkd-wait-online
   Active: active (exited) since Tue 2019-04-09 07:17:16 UTC; 12s ago
[...]
Apr 09 07:17:16 bionic-dgx2 systemd-networkd-wait-online[575]: managing: ens3
[...]
Apr 09 07:17:16 bionic-dgx2 systemd-networkd-wait-online[575]: managing: ens7
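For anyone who wants to check the journal output of either test case in a script, here is a minimal sketch. It only assumes the three message shapes seen in the logs of this report ("managing: ...", "ignoring: ...", "Event loop failed: Connection timed out"); the function name summarize_wait_online is made up for illustration and is not part of systemd.

```python
import re

# Matches journal lines emitted by systemd-networkd-wait-online and captures the message.
LINE_RE = re.compile(r"systemd-networkd-wait-online\[\d+\]: (?P<msg>.*)$")


def summarize_wait_online(journal_text):
    """Collect managed/ignored interfaces and whether the wait timed out."""
    managing, ignoring, timed_out = [], [], False
    for line in journal_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # line from systemd[1] or another unit
        msg = match.group("msg").strip()
        if msg.startswith("managing: "):
            managing.append(msg[len("managing: "):])
        elif msg.startswith("ignoring: "):
            ignoring.append(msg[len("ignoring: "):])
        elif "Connection timed out" in msg:
            timed_out = True
    return {"managing": managing, "ignoring": ignoring, "timed_out": timed_out}
```

In the failing runs above the DHCP interface never shows up as "managing" and timed_out is set; in the working run both ens3 and ens7 are listed.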

[regression potential]

This alters how systemd-networkd behaves when it starts or restarts, specifically how it handles DHCPv4. Regressions would most likely show up when setting up interfaces that use DHCPv4: such interfaces might fail to be configured correctly, or networkd might fail an internal assertion and exit.

The problem is that without a route there will be no netlink event that can be tapped.

However, the change is only four lines: it explicitly sets link->dhcp4_configured = true and calls link_check_ready() in dhcp4_address_handler() when link_set_dhcp_routes() sent no netlink messages (dhcp4_messages is zero).

Therefore the change seems rather small, reviewable and safe.
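
To illustrate the logic, here is a toy model in Python. It is not the systemd source: the Link class, the patched flag and route_reply_handler are simplifications invented for this sketch, and only the names dhcp4_messages, dhcp4_configured, link_check_ready, link_set_dhcp_routes and dhcp4_address_handler come from the description above. It assumes that readiness is normally signalled from the reply to the last route request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """Toy stand-in for networkd's per-interface state (not the real struct)."""
    dhcp4_messages: int = 0         # route requests still waiting for a netlink reply
    dhcp4_configured: bool = False
    ready: bool = False


def link_check_ready(link: Link) -> None:
    # Simplified: the real check looks at more than DHCPv4 state.
    if link.dhcp4_configured:
        link.ready = True


def link_set_dhcp_routes(link: Link, routes: List[str]) -> None:
    # One pending netlink message per route that has to be installed.
    link.dhcp4_messages += len(routes)


def route_reply_handler(link: Link) -> None:
    # Assumed behaviour: the reply to the last route request marks the link done.
    link.dhcp4_messages -= 1
    if link.dhcp4_messages == 0:
        link.dhcp4_configured = True
        link_check_ready(link)


def dhcp4_address_handler(link: Link, routes: List[str], patched: bool) -> None:
    link_set_dhcp_routes(link, routes)
    if patched and link.dhcp4_messages == 0:
        # The backported fix: no route reply will ever arrive, so finish here.
        link.dhcp4_configured = True
        link_check_ready(link)
```

With an empty route list and patched=False, nothing ever calls route_reply_handler(), so the link stays not ready, which is the hang described in [impact]. With patched=True the link becomes ready immediately.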

[other info]

original description:

--

The root cause is a systemd bug that was fixed in commit
223932c786ada7f758a7b7878a6ad2dae0d1e5fb:
https://github.com/systemd/systemd/pull/8728

Environment:
Ubuntu 18.04 LTS amd64
systemd package version 237-3ubuntu10.9

How to trigger:
1. add netplan interface with dhcpv4 client enabled:
        enp0s8:
            addresses: []
            dhcp4: true
2. configure dhcp server to NOT give a default route, or any route for that matter
3. reboot
4. systemd-networkd-wait-online will block until dhcp renew is triggered
root@sensor1:~# /lib/systemd/systemd-networkd-wait-online --timeout=3
Event loop failed: Connection timed out
root@sensor1:~# /lib/systemd/systemd-networkd-wait-online --timeout=3 --ignore=enp0s8
managing: enp0s3
ignoring: lo
ignoring: enp0s8
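
The two manual invocations above can be wrapped in a small helper for repeated checks. This is only a sketch: the function names are made up, the binary path is the one used in this report, and treating exit status 0 as "online" is an assumption based on the output shown here.

```python
import subprocess
from typing import Iterable, List

# Path as used in this report on Ubuntu 18.04.
WAIT_ONLINE = "/lib/systemd/systemd-networkd-wait-online"


def wait_online_cmd(timeout: int, ignore: Iterable[str] = ()) -> List[str]:
    """Build the argument list using the --timeout and --ignore flags shown above."""
    cmd = [WAIT_ONLINE, f"--timeout={timeout}"]
    cmd += [f"--ignore={iface}" for iface in ignore]
    return cmd


def is_online(timeout: int, ignore: Iterable[str] = ()) -> bool:
    """Run the helper; assumes exit status 0 means the wait completed in time."""
    result = subprocess.run(wait_online_cmd(timeout, ignore), check=False)
    return result.returncode == 0
```

For example, is_online(3, ["enp0s8"]) corresponds to the second command above, which returns promptly on an affected system.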

How to fix:
Backport upstream changes from systemd (see related systemd ticket).

Benjamin Drung (bdrung)
summary: - netplan dhcp interface with no default route causes causes systemd-
- networkd-wait-online to hang
+ netplan dhcp interface with no default route causes systemd-networkd-
+ wait-online to hang
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Benjamin Drung (bdrung) wrote :

The systemd bug #3752 was fixed by commit 444b01704648c2f1292531f17c3b8c95bf1a9896 [networkd: configure link even if no routes have been received by dhcp (#6886)], which should be included in Ubuntu 18.04 systemd 237:

$ git tag --contains 444b01704648c2f1292531f17c3b8c95bf1a9896
v236
v237
v238
v239

Benjamin Drung (bdrung) wrote :

The relevant fix is commit 223932c786ada7f758a7b7878a6ad2dae0d1e5fb [networkd: fix dhcp4 link without routes not being considered ready (#8728)]. Only systemd release v239 contains this fix.

no longer affects: systemd
Benjamin Drung (bdrung)
Changed in systemd:
status: New → Fix Released
Benjamin Drung (bdrung)
description: updated
description: updated
Changed in systemd (Ubuntu):
status: Confirmed → Triaged
Benjamin Drung (bdrung) wrote :

I verified that applying commit 223932c786ada7f758a7b7878a6ad2dae0d1e5fb from https://github.com/systemd/systemd/pull/8728 on top of systemd 237-3ubuntu10.9 solves this bug.

Therefore please include the patch in systemd for Ubuntu 18.04.

dann frazier (dannf)
Changed in systemd (Ubuntu Bionic):
status: New → Confirmed
Dan Streetman (ddstreet)
description: updated
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: Confirmed → In Progress
Changed in systemd (Ubuntu):
status: Triaged → Fix Released
Christian Ehrhardt  (paelzer) wrote :

Thanks Dan[FS] for working on this!
I tested dannf's PPA and can also confirm it works fine. I extended the SRU template of this bug to be slightly more readable and SRU-acceptable (more details in the regression considerations).

Furthermore, I added a test case for the situation where an external DHCP server does not push a route.

Overall that seems in good shape and I encourage you to go on with the fix.

description: updated
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Mikko, or anyone else affected,

Accepted systemd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Łukasz Zemczak (sil2100) wrote :

As with any systemd SRU, the more testing we do, the better. Can we get both test cases A and B run as part of the verification?

description: updated
description: updated
Christian Ehrhardt  (paelzer) wrote :

I checked case (B), which I added.
First I created a new setup as described (to make sure no old modifications influence the verification).
That worked; I saw the wait line on boot:
[*** ] A start job is running for Wait for… to be Configured (29s / no limit)

After boot I had the wait time in the failed service:
  $ systemctl status systemd-networkd-wait-online
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-04-16 13:52:04 UTC; 15s ago

Installing the version in proposed ...
$ sudo apt install systemd=237-3ubuntu10.21
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  libnss-systemd libpam-systemd libsystemd0
Suggested packages:
  systemd-container
The following packages will be upgraded:
  libnss-systemd libpam-systemd libsystemd0 systemd
4 upgraded, 0 newly installed, 0 to remove and 42 not upgraded.
Need to get 3318 kB of archives.
After this operation, 11.3 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libnss-systemd amd64 237-3ubuntu10.21 [105 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libpam-systemd amd64 237-3ubuntu10.21 [108 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 systemd amd64 237-3ubuntu10.21 [2901 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 libsystemd0 amd64 237-3ubuntu10.21 [204 kB]
Fetched 3318 kB in 1s (4658 kB/s)
(Reading database ... 60080 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.21_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.21) over (237-3ubuntu10.19) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.21_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.21) over (237-3ubuntu10.19) ...
Preparing to unpack .../systemd_237-3ubuntu10.21_amd64.deb ...
Unpacking systemd (237-3ubuntu10.21) over (237-3ubuntu10.19) ...
Preparing to unpack .../libsystemd0_237-3ubuntu10.21_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.21) over (237-3ubuntu10.19) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.21) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up systemd (237-3ubuntu10.21) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for dbus (1.12.2-1ubuntu1) ...
Setting up libnss-systemd:amd64 (237-3ubuntu10.21) ...
Setting up libpam-systemd:amd64 (237-3ubuntu10.21) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...

Rebooting:
- bootup was fast
- service works fine
  $ systemctl status systemd-networkd-wait-online
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.se...

dann frazier (dannf) wrote :

And here's test A:

ubuntu@anuchin:~$ cat /etc/systemd/network/20-enP2p1s0f2.network
[Match]
Name=enP2p1s0f3

[Network]
DHCP=ipv4

[DHCP]
UseRoutes=false
ubuntu@anuchin:~$ dpkg-query -W systemd
systemd 237-3ubuntu10.19
ubuntu@anuchin:~$ systemctl status systemd-networkd-wait-online
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; ena
   Active: failed (Result: exit-code) since Tue 2019-04-16 17:14:07 UTC; 3min 4s
     Docs: man:systemd-networkd-wait-online.service(8)
  Process: 1345 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited
 Main PID: 1345 (code=exited, status=1/FAILURE)

Apr 16 17:12:08 anuchin systemd[1]: Starting Wait for Network to be Configured..
Apr 16 17:14:07 anuchin systemd-networkd-wait-online[1345]: Event loop failed: C
Apr 16 17:14:07 anuchin systemd[1]: systemd-networkd-wait-online.service: Main p
Apr 16 17:14:07 anuchin systemd[1]: systemd-networkd-wait-online.service: Failed
Apr 16 17:14:07 anuchin systemd[1]: Failed to start Wait for Network to be Confi

After Upgrade/reboot:
ubuntu@anuchin:~$ dpkg-query -W systemd
systemd 237-3ubuntu10.21
ubuntu@anuchin:~$ systemctl status systemd-networkd-wait-online
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; ena
   Active: active (exited) since Tue 2019-04-16 17:20:18 UTC; 4min 14s ago
     Docs: man:systemd-networkd-wait-online.service(8)
  Process: 1403 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited
 Main PID: 1403 (code=exited, status=0/SUCCESS)

Apr 16 17:20:13 anuchin systemd[1]: Starting Wait for Network to be Configured..
Apr 16 17:20:18 anuchin systemd-networkd-wait-online[1403]: managing: enP2p1s0f2
Apr 16 17:20:18 anuchin systemd-networkd-wait-online[1403]: ignoring: lo
Apr 16 17:20:18 anuchin systemd-networkd-wait-online[1403]: managing: enP2p1s0f3
Apr 16 17:20:18 anuchin systemd[1]: Started Wait for Network to be Configured.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Mathew Hodson (mhodson)
Changed in systemd:
importance: Undecided → Unknown
status: Fix Released → Unknown
Mathew Hodson (mhodson)
affects: systemd → ubuntu-translations
Changed in ubuntu-translations:
importance: Unknown → Undecided
status: Unknown → New
no longer affects: ubuntu-translations
Changed in systemd (Ubuntu):
importance: Undecided → Medium
Dan Streetman (ddstreet) wrote :

autopkgtest regressions:

ignorable:

systemd (ppc64el): bug 1821625
Assertion 'wait_for_terminate_and_check("memoryseccomp-mmap", pid, WAIT_LOG) == EXIT_SUCCESS' failed at ../src/test/test-seccomp.c:427, function test_memory_deny_write_execute_mmap(). Aborting.
FAIL: test-seccomp (code: 134)

linux (various): too flaky to investigate; they fail most of the time.

linux-aws-edge: bug 1723223

remaining:

network-manager (various): re-running, but may need to investigate this failure

Dan Streetman (ddstreet) wrote :

autopkgtest regression:

network-manager (various): bug 1825946, can be ignored.

all autopkgtest failures explained and ignorable.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu10.21

---------------
systemd (237-3ubuntu10.21) bionic; urgency=medium

  * d/p/networkd-fix-dhcp4-link-without-routes-not-being-con.patch:
    - fix dhcp4 link without routes not being considered ready
    - (LP: #1804478)

 -- Dan Streetman <email address hidden> Mon, 15 Apr 2019 08:29:50 -0400

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
