nm-online times out, failing autopkgtests

Bug #1936312 reported by Lukas Märdian
This bug affects 1 person
Affects: network-manager (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

The nm-online util does not return successfully as of 1.32.2-0ubuntu1 (currently in impish-proposed).

The systemd autopkgtests run against network-manager (in LXD) fail because the autopkgtest-virt-lxd runner waits for "network ready" before reconnecting after a reboot. It does so by running 'systemctl start network-online.target', which in turn executes 'nm-online -s -q' (via NetworkManager-wait-online.service) in addition to systemd-networkd-wait-online.service (which seems to be working OK).
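
To see which command a wait-online unit actually runs, the unit text can be filtered for its ExecStart= lines. This is a minimal sketch; exec_start_lines is a hypothetical helper name, and the systemctl invocations in the comments assume a systemd host with NetworkManager installed.

```shell
# Print the ExecStart= lines found in unit file text read from stdin.
# exec_start_lines is a hypothetical helper used only for this sketch.
exec_start_lines() {
    # "|| true" keeps the exit status 0 when no ExecStart= line is present.
    grep '^ExecStart=' || true
}

# On an affected system the chain could be inspected like this (not run here):
#   systemctl list-dependencies network-online.target
#   systemctl cat NetworkManager-wait-online.service | exec_start_lines
```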

As of 1.32.2-0ubuntu1, nm-online no longer returns successfully; it times out instead. That is a regression compared to v1.30.

Reproducer:
$ lxc launch ubuntu-daily:impish test
$ lxc exec test bash
# apt install network-manager #(version 1.30.0-1ubuntu4)
# time systemctl start network-online.target

real 0m0.071s
user 0m0.021s
sys 0m0.023s
# time nm-online -s
Connecting............... 30s [started]

real 0m0.043s
user 0m0.027s
sys 0m0.006s
root@test:~# echo $?
0

Everything is OK up to here.

# vim /etc/apt/sources.list #(enable impish-proposed)
# apt update && apt install network-manager #(1.32.2-0ubuntu1)
# time nm-online -s
Connecting............... 0s [startup-pending]

real 0m30.019s
user 0m0.021s
sys 0m0.014s

# time systemctl start network-online.target

real 1m0.104s
user 0m0.013s
sys 0m0.023s
# journalctl -u NetworkManager | grep startup
Jul 15 09:05:06 test NetworkManager[1912]: <info> [1626339906.6507] manager: startup complete
root@test:~# NetworkManager -V
1.32.2
root@test:~# nm-online -s
Connecting............... 0s [startup-pending]
root@test:~# echo $?
1
root@test:~# nm-online
Connecting............... 0s [offline]
root@test:~# echo $?
1

The journal contains the "startup complete" line, so 'nm-online -s' should return immediately. But as we can see, nm-online fails (after a timeout) and thus blocks network-online.target, failing the systemd autopkgtests (in LXD).
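
The mismatch described above can be checked in one step. The following is a sketch, not part of the original reproducer: the function names are hypothetical, and the 5 second timeout passed to nm-online is an arbitrary choice to avoid waiting for the default 30 seconds.

```shell
# Return 0 if the journal text on stdin contains NetworkManager's
# "startup complete" message (the line quoted in the reproducer).
startup_complete_logged() {
    grep -q 'manager: startup complete'
}

# Compare the journal with what 'nm-online -s' reports.
# Hypothetical helper; needs journalctl and nm-online on the system.
report_mismatch() {
    if journalctl -u NetworkManager --no-pager | startup_complete_logged; then
        if nm-online -s -q -t 5; then
            echo "consistent: startup complete and nm-online -s succeeded"
        else
            echo "mismatch: journal shows startup complete but nm-online -s failed"
        fi
    else
        echo "startup not yet complete according to the journal"
    fi
}
```

Calling report_mismatch inside the container after the upgrade to 1.32.2-0ubuntu1 would be expected to take the "mismatch" branch, given the output shown in the reproducer.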

Lukas Märdian (slyon)
tags: added: update-excuse
Lukas Märdian (slyon) wrote :
Sebastien Bacher (seb128) wrote :

Thanks Lukas, I hadn't noticed that the systemd change was reverted. I've added back the NM change for now.

Changed in network-manager (Ubuntu):
status: New → Fix Committed
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 1.32.2-0ubuntu2

---------------
network-manager (1.32.2-0ubuntu2) impish; urgency=medium

  * debian/patches/ubuntu_revert_systemd.patch:
    - restore the workaround for the systemd and lxd issue, it's still
      needed (lp: #1936312)

 -- Sebastien Bacher <email address hidden> Thu, 15 Jul 2021 16:45:08 +0200

Changed in network-manager (Ubuntu):
status: Fix Committed → Fix Released
Lukas Märdian (slyon) wrote :

This should have been fixed upstream by: https://github.com/systemd/systemd/pull/18559

Or https://github.com/systemd/systemd/pull/18684 and https://github.com/systemd/systemd/pull/18717 respectively. I wonder why this fix isn't working for network-manager, while it seems to be good for LXD?

Lukas Märdian (slyon) wrote :

After reading a bit more about this issue, I can see where the conflict happens:

1/ systemd requires any container manager to mount /sys read-only, according to https://systemd.io/CONTAINER_INTERFACE/ in order to make udevd behave properly.

2/ NetworkManager checks whether /sys is read-only to decide whether it should avoid using udev.

3/ lxc has a different understanding (and requirements), so it mounts /sys r/w, leading to confusion in NetworkManager.

=> IMO NetworkManager needs a better check for whether it is running inside a container (in addition to https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/78dc57d8f4af1a230053473f3eb7c18d1eaf0730), as the /sys read-only check is not enough for the LXC environment.
Maybe NM could be extended to check whether something like 'systemd-detect-virt --container' exits with status 0 (container detected) and not use udev in this case?
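
As a rough illustration of that proposal (a shell sketch under stated assumptions, not NetworkManager's actual code): the two function names are hypothetical, and the optional mounts-file argument exists only so the read-only check can be tried on sample data.

```shell
# Return 0 if the mount table lists /sys as read-only.
# $1 is an optional path to a mounts file (default: /proc/self/mounts).
sys_is_readonly() {
    mounts_file="${1:-/proc/self/mounts}"
    # Fields: device mountpoint fstype options dump pass.
    # If /sys appears more than once, the last entry wins.
    awk '$2 == "/sys" { opts = $4 }
         END {
             n = split(opts, o, ",")
             for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
             exit 1
         }' "$mounts_file"
}

# Return 0 if udev should be skipped: either /sys is read-only, or
# systemd-detect-virt reports a container (exit status 0 = detected).
should_skip_udev() {
    if sys_is_readonly "$@"; then
        return 0
    fi
    if command -v systemd-detect-virt >/dev/null 2>&1 &&
       systemd-detect-virt --container --quiet; then
        return 0
    fi
    return 1
}
```

In an LXD container with /sys mounted r/w, sys_is_readonly alone would return non-zero, which is the confusing case described above; the additional systemd-detect-virt check is what would catch it.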
