lxcbr0 disappears on Ubuntu 15.10

Bug #1512749 reported by brian mullan
This bug affects 5 people
Affects                   Status        Importance  Assigned to  Milestone
network-manager (Ubuntu)  Fix Released  Critical    Unassigned
  Wily                    Fix Released  Critical    Unassigned
  Xenial                  Fix Released  Critical    Unassigned

Bug Description

=== SRU ===
Rationale:
 Network Manager in wily triggers on newly created interfaces and resets their network configuration even though it's not supposed to manage them at all.
 This breaks LXC and quite possibly libvirt as those provide bridges which are then completely unconfigured by Network Manager.

Test case:
 1) Update (or not) network-manager
 2) Restart it
 3) Install lxc
 4) Check that lxcbr0 exists and has an IP configured
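
Step 4 of the test case can be scripted. The following is only a sketch: the function name `check_bridge` is made up for illustration, and it assumes the iproute2 `ip` tool is installed.

```shell
# Hypothetical helper (not part of lxc or network-manager).
# Returns 0 if the interface exists and has an IPv4 address,
# 1 if the interface is missing, 2 if it exists without an IPv4 address.
check_bridge() {
    iface="$1"
    if ! ip link show dev "$iface" >/dev/null 2>&1; then
        echo "$iface: missing"
        return 1
    fi
    # -o prints one line per address, -4 restricts output to IPv4
    if ip -o -4 addr show dev "$iface" | grep -q 'inet '; then
        echo "$iface: up with IPv4 address"
        return 0
    fi
    echo "$iface: present but no IPv4 address"
    return 2
}
```

Running `check_bridge lxcbr0` after step 3 should report an IPv4 address on a system that has the fix.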

Regression potential:
 The change to Network Manager is related to udev handling of new devices. Although normal operation on a regular desktop machine was tested, it is possible that this regresses handling of some devices.
 The fix was cherry-picked directly from upstream, so it went through code review and has been publicly available for a while.

 This fixes a significant regression compared to Ubuntu 15.04 and looks less risky than the alternative workaround which was uploaded earlier.

=== Original report ===
After initial upgrade from Ubuntu 15.04 to Ubuntu 15.10 LXC worked for a couple days then failed. I found that the lxcbr0 interface no longer existed.

I reported this on the lxc-user alias and about the same time several others had this happen to them also.

Serge Hallyn requested I open a Launchpad bug and post some info, but before I could gather that info the system returned to normal (i.e. lxcbr0 was back) the next day, after the server was booted up again.

note: at least one other person had this happen to them also (lxcbr0 came back by itself).

Today, I booted this server again and ... again lxcbr0 was missing where it had been working last night.

Below is all of the info Serge Hallyn asked me to post.

$ ifconfig lxcbr0
lxcbr0: error fetching interface information: Device not found

$ journalctl -u lxc-net
-- Logs begin at Tue 2015-11-03 07:25:22 EST, end at Tue 2015-11-03 10:02:08 EST
Nov 03 07:25:48 server3 systemd[1]: Starting LXC network bridge setup...
Nov 03 07:25:50 server3 lxc-net[913]: dnsmasq: failed to create listening socket
Nov 03 07:25:50 server3 lxc-net[913]: Failed to setup lxc-net.
Nov 03 07:25:50 server3 systemd[1]: Started LXC network bridge setup.

root@server3:/home/bmullan# /usr/lib/x86_64-linux-gnu/lxc/lxc-net stop # note - execution just returns to command line
root@server3:/home/bmullan#

>> and lxcbr0 is still missing

# ifconfig lxcbr0
lxcbr0: error fetching interface information: Device not found

root@server3:/home/bmullan# /usr/lib/x86_64-linux-gnu/lxc/lxc-net start

dnsmasq: failed to create listening socket for 10.0.3.1: Cannot assign requested address
Failed to setup lxc-net.
root@server3:/home/bmullan#

$ uname -a
Linux server3 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename: wily

Revision history for this message
Stéphane Graber (stgraber) wrote :

Just managed to reproduce it here, it's caused by Network Manager deciding to mess with our bridge instead of leaving it alone as it should.

Current workarounds include:
 - reboot
 - systemctl stop network-manager && systemctl restart lxc-net && systemctl start network-manager

This only kicks in when the interface is brought up after network-manager, so it doesn't affect boot since lxc-net starts before network-manager and it doesn't affect upgrades where a container is already running (as we don't destroy the bridge in that case).

But it does absolutely affect all new installs and upgrades done when no container is running.
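
Whether a service such as lxc-net last came up after Network Manager (the ordering that triggers the problem) can be checked roughly as below. This is a sketch under assumptions: the function name is invented for illustration, the unit name `network-manager` is the one used in the workaround above, and the unit activation time is only a proxy for when the bridge itself was brought up.

```shell
# Hypothetical helper: succeed if the given unit became active later than
# network-manager. ActiveEnterTimestampMonotonic is reported by systemd in
# microseconds since boot, in the form "Name=value".
started_after_nm() {
    unit="$1"
    nm=$(systemctl show -p ActiveEnterTimestampMonotonic network-manager)
    other=$(systemctl show -p ActiveEnterTimestampMonotonic "$unit")
    # Strip the "ActiveEnterTimestampMonotonic=" prefix
    nm=${nm#*=}
    other=${other#*=}
    [ "$other" -gt "$nm" ]
}
```

For example, `started_after_nm lxc-net && echo "lxc-net came up after NM"` would flag the risky ordering; a value of 0 means the unit has not been active since boot.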

This is a critical regression in Network Manager behavior. NM should NEVER touch non-physical interfaces, and even less should it start flushing existing network configuration.

Chances are this affects libvirt too, unless libvirt's bring-up takes long enough to win the race against NM.

affects: lxc (Ubuntu) → network-manager (Ubuntu)
Changed in network-manager (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in network-manager (Ubuntu Wily):
status: New → Triaged
importance: Undecided → Critical
tags: added: regression-release
Revision history for this message
Stéphane Graber (stgraber) wrote :

Tracking for lxc too as we'll be landing a workaround there for now.

Changed in lxc (Ubuntu Wily):
status: New → Triaged
Changed in lxc (Ubuntu Xenial):
status: New → Triaged
Changed in lxc (Ubuntu Wily):
importance: Undecided → Critical
Changed in lxc (Ubuntu Xenial):
importance: Undecided → Critical
Revision history for this message
Stéphane Graber (stgraber) wrote :

Simplest reproducer for NM: ip link add dev test type bridge && ip link set dev test up && ip addr add 1.2.3.4/24 dev test && sleep 2 && ifconfig test && ip link delete test
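
The one-liner above can be expanded into a check that reports the outcome instead of relying on reading the ifconfig output. This is a sketch only: the function name is invented, it must run as root, and the interface name and address are simply the ones from the one-liner.

```shell
# Hypothetical helper: create a throwaway bridge, assign an address, wait,
# then check whether the address survived. Returns 0 if the address is
# still present, 1 if it was flushed, 2 if the bridge could not be created.
repro_nm_flush() {
    iface="${1:-test}"
    addr="${2:-1.2.3.4/24}"
    ip link add dev "$iface" type bridge || return 2
    ip link set dev "$iface" up
    ip addr add "$addr" dev "$iface"
    sleep 2   # give Network Manager time to react to the new interface
    if ip -o -4 addr show dev "$iface" | grep -qF "inet $addr"; then
        result=0
        echo "address kept: not affected"
    else
        result=1
        echo "address flushed: affected"
    fi
    ip link delete "$iface"   # always clean up the test bridge
    return "$result"
}
```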

Revision history for this message
Stéphane Graber (stgraber) wrote :

A proper network-manager fix is being uploaded now, I'll confirm it and then revert the LXC workaround.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Confirmed that the new network-manager does fix the issue here. I've now uploaded a reverted lxc.

Changed in lxc (Ubuntu Wily):
status: Triaged → Invalid
Changed in lxc (Ubuntu Xenial):
status: Triaged → Invalid
description: updated
Revision history for this message
Stéphane Graber (stgraber) wrote : Please test proposed package

Hello brian, or anyone else affected,

Accepted network-manager into wily-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/network-manager/1.0.4-0ubuntu5.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
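
When reporting the tested version, a check along these lines may help. It is a sketch, not part of the SRU process: the function name is invented, and the default version string is the wily package version named in this message.

```shell
# Hypothetical helper: report whether the installed network-manager is at
# least the given version (default: 1.0.4-0ubuntu5.1).
# Returns 0 if it is, 1 if it is older, 2 if the package is not installed.
nm_has_fix() {
    fixed="${1:-1.0.4-0ubuntu5.1}"
    installed=$(dpkg-query -W -f='${Version}' network-manager 2>/dev/null) || {
        echo "network-manager is not installed"
        return 2
    }
    # dpkg implements Debian version ordering; "ge" means greater or equal
    if dpkg --compare-versions "$installed" ge "$fixed"; then
        echo "network-manager $installed includes the fix"
        return 0
    fi
    echo "network-manager $installed predates the fix ($fixed)"
    return 1
}
```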

Changed in network-manager (Ubuntu Wily):
status: Triaged → Fix Committed
tags: added: verification-needed
Revision history for this message
Stéphane Graber (stgraber) wrote :

As this is currently breaking all lxc installs, and therefore breaking Juju, LXD and anything else that depends on or installs LXC, I am going to fast track this SRU.

Test plan before releasing is:
 - Test on a wired machine without LXC
 - Test on a wireless machine without LXC
 - Test on a wired machine with LXC
 - Test on a wireless machine with LXC

Changed in network-manager (Ubuntu Xenial):
status: Triaged → Fix Committed
Revision history for this message
brian mullan (bmullan) wrote : Re: [Bug 1512749] Please test proposed package

will do...

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 1.0.4-0ubuntu6

---------------
network-manager (1.0.4-0ubuntu6) xenial; urgency=medium

  * debian/patches/git_fix_race_external_down_e29ab543.patch: fix race in
    wrongly managing devices due to receiving udev signals late. (LP: #1512749)

 -- Mathieu Trudel-Lapierre <email address hidden> Tue, 03 Nov 2015 14:39:12 -0600

Changed in network-manager (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Stéphane Graber (stgraber) wrote :

I have confirmed on both wily and xenial that the new network-manager does behave properly.

After updating network-manager and restarting it (or rebooting as the updater recommends), installing lxd which then pulls lxc as a dependency does work reliably.

I've also confirmed that wired connectivity as well as wireless connectivity appears unaffected by this fix. My laptop has been running the xenial version ever since Mathieu uploaded it and it's been working perfectly.

I'm marking this verification-done and given the importance of the fix, will immediately release it to wily-updates.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 1.0.4-0ubuntu5.1

---------------
network-manager (1.0.4-0ubuntu5.1) wily; urgency=medium

  * debian/patches/git_fix_race_external_down_e29ab543.patch: fix race in
    wrongly managing devices due to receiving udev signals late. (LP: #1512749)

 -- Mathieu Trudel-Lapierre <email address hidden> Tue, 03 Nov 2015 14:42:19 -0600

Changed in network-manager (Ubuntu Wily):
status: Fix Committed → Fix Released
Revision history for this message
Stéphane Graber (stgraber) wrote : Update Released

The verification of the Stable Release Update for network-manager has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
modolo (modolo) wrote :

Hi!

After upgrading to network-manager (1.0.4-0ubuntu5.1), everything works again!

modolo@gsat019046:~⟫ sudo systemctl restart lxc-net
modolo@gsat019046:~⟫ sudo systemctl status lxc-net
● lxc-net.service - LXC network bridge setup
   Loaded: loaded (/lib/systemd/system/lxc-net.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2015-11-04 19:26:20 BRST; 1s ago
  Process: 28899 ExecStop=/usr/lib/x86_64-linux-gnu/lxc/lxc-net stop (code=exited, status=0/SUCCESS)
  Process: 28904 ExecStart=/usr/lib/x86_64-linux-gnu/lxc/lxc-net start (code=exited, status=0/SUCCESS)
 Main PID: 28904 (code=exited, status=0/SUCCESS)
   Memory: 380.0K
      CPU: 9ms
   CGroup: /system.slice/lxc-net.service
           └─28952 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --listen-address 10.0.3.1 --dhcp-range 10...

Nov 04 19:26:20 gsat019046 systemd[1]: Starting LXC network bridge setup...
Nov 04 19:26:20 gsat019046 dnsmasq[28952]: started, version 2.75 cachesize 150
Nov 04 19:26:20 gsat019046 dnsmasq[28952]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ip…ct inotify
Nov 04 19:26:20 gsat019046 dnsmasq-dhcp[28952]: DHCP, IP range 10.0.3.2 -- 10.0.3.254, lease time 1h
Nov 04 19:26:20 gsat019046 dnsmasq-dhcp[28952]: DHCP, sockets bound exclusively to interface lxcbr0
Nov 04 19:26:20 gsat019046 dnsmasq[28952]: reading /etc/resolv.conf
Nov 04 19:26:20 gsat019046 dnsmasq[28952]: using nameserver 127.0.1.1#53
Nov 04 19:26:20 gsat019046 dnsmasq[28952]: read /etc/hosts - 7 addresses
Nov 04 19:26:20 gsat019046 systemd[1]: Started LXC network bridge setup.
Hint: Some lines were ellipsized, use -l to show in full.

Mathew Hodson (mhodson)
no longer affects: lxc (Ubuntu Wily)
no longer affects: lxc (Ubuntu)
no longer affects: lxc (Ubuntu Xenial)
tags: added: wily