Network Manager service fails when embedded to the Ubuntu Core 20 Image

Bug #1961442 reported by Bugra Aydogar
This bug affects 1 person
Affects: snappy-hwe-snaps
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

When network-manager and modem-manager are added to the Ubuntu Core image, the network-manager service fails to start. (Please see the attached log for details)

However, if network-manager and modem-manager are left out of the image and installed later, the network-manager service works just fine.

The Hardware Info:

AAEON EPIC-BT07 (X86)
* Built-in network Interface
* USB based GSM module attached (`Quectel EC200T-EU`).

The Software Info:
Ubuntu Core 20
network-manager and modem-manager follow the 20/stable channel

Please let me know if you need more information.

Bugra Aydogar (bugraaydogar) wrote :
description: updated
summary: - Network Manager service fails when embedded to the Ubuntu Core Image
+ Network Manager service fails when embedded to the Ubuntu Core 20 Image
description: updated
Tony Espy (awe) wrote :

Can you please add the output of the following commands to the bug after booting the image which includes the network-manager snap?

 - snap list
 - snap services network-manager
 - systemctl status snap.network-manager.networkmanager.service
 - network-manager.nmcli

Can you also include the output of these commands when you *install* the network-manager snap and it works:

 - network-manager.nmcli c
 - network-manager.nmcli d
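
For convenience, the requested outputs could be gathered into one text report. The following is only a minimal sketch, assuming a Python 3.7+ interpreter is available on the device (which may not be the case on every Ubuntu Core image); the helper names `run_command` and `collect` are made up for this example, and only the command strings come from the lists above.

```python
import shlex
import subprocess

# Commands requested above, grouped by scenario.
PREINSTALLED = [
    "snap list",
    "snap services network-manager",
    "systemctl status snap.network-manager.networkmanager.service",
    "network-manager.nmcli",
]
INSTALLED_LATER = [
    "network-manager.nmcli c",
    "network-manager.nmcli d",
]


def run_command(command):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            shlex.split(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
            check=False,  # a non-zero exit status is still worth reporting
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hang is itself useful information for the bug.
        return f"<failed to run: {exc}>\n"
    return result.stdout


def collect(commands, runner=run_command):
    """Return one report with a '$ command' header before each output."""
    sections = []
    for command in commands:
        sections.append(f"$ {command}\n{runner(command)}")
    return "\n".join(sections)


if __name__ == "__main__":
    # Use INSTALLED_LATER instead when the snap was installed after boot.
    print(collect(PREINSTALLED))
```

Redirecting the script's output to a file gives a single attachment for the bug. Running the same commands by hand works just as well.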

Changed in snappy-hwe-snaps:
status: New → Incomplete
Tony Espy (awe) wrote :

Also, please note that a number of Ubuntu Core 20 projects pre-install the network-manager snap from 20/stable, so my guess is that this problem is related to the system itself.

ahmet (ahozdemir) wrote :

Hi Tony,

I've also attached log files from before and after the 3G modem was removed.

ahmet (ahozdemir) wrote :

I have also attached the dmesg logs.

regards

ahmet (ahozdemir) wrote :

Logs from the manual installation are attached.

description: updated
Changed in snappy-hwe-snaps:
status: Incomplete → New
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

This may be related to debian/patches/Force-online-state-with-unmanaged-devices.patch, which is the only part of the code that mentions /run/network/ifstate and is probably not needed for the UC case.

Bugra Aydogar (bugraaydogar) wrote :

A similar error message also appears in the working case, so I don't expect the mentioned patch to be the direct cause.

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

It would be useful to get the debug trace from NM (https://ubuntu.com/core/docs/networkmanager/snap-configuration/debug). As the problem happens when bundling NM in the image, the option would need to be set in gadget.yaml with something like:

defaults:
  RmBXKl6HO6YOC2DE4G2q1JzWImC04EUy:
    debug:
      enable: true

(RmBXKl6HO6YOC2DE4G2q1JzWImC04EUy is NM's snap ID)

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

FTR, I tried to reproduce this by creating an image with NM+MM and running it on a system with a modem connected, but could not reproduce the failure.

Mehmet TURPÇU (mehmetturpcu) wrote :

Alfonso, we can give you remote access to the device if you want to check the problem.

Bugra Aydogar (bugraaydogar) wrote :

Hi Mehmet,

Please provide as many logs and as much information as possible. Connecting remotely to your environment is not a viable option.

Thanks.

Bugra Aydogar (bugraaydogar) wrote :

Hi all,

I have collected some additional debug logs from network-manager. Please find them in the attachment.

In addition, if I create the system user through cloud-init rather than through USB auto-import, the problem does not appear. This also indicates some kind of race condition that depends on system behavior.

Bugra Aydogar (bugraaydogar) wrote :

Please also see the attached dmesg logs.

Bugra Aydogar (bugraaydogar) wrote :

The core dump is attached, and the backtrace is:
```
Thread 1 "NetworkManager" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 13139.13139]
0x00007f08b4e885f4 in g_str_hash () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
(gdb) bt
#0 0x00007f08b4e885f4 in g_str_hash () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 0x00007f08b4e8773c in g_hash_table_lookup () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007f08b51d34cb in netplan_delete_connection () from target:/lib/x86_64-linux-gnu/libnetplan.so.0.0
#3 0x000055d2d91132f9 in delete_connection (plugin=0x55d2da712400, storage_x=<optimized out>, error=<optimized out>) at src/settings/plugins/keyfile/nms-keyfile-plugin.c:1029
#4 0x000055d2d910fe22 in nm_settings_delete_connection (self=0x55d2da6f4000, sett_conn=sett_conn@entry=0x55d2da6e4540, allow_add_to_no_auto_default=allow_add_to_no_auto_default@entry=0)
    at src/settings/nm-settings-storage.h:57
#5 0x000055d2d928aef1 in nm_settings_connection_delete (self=self@entry=0x55d2da6e4540, allow_add_to_no_auto_default=allow_add_to_no_auto_default@entry=0) at src/settings/nm-settings-connection.c:612
#6 0x000055d2d912d4b0 in _delete_volatile_connection_do (connection=0x55d2da6e4540, self=0x55d2da6f0010) at src/nm-manager.c:837
#7 _delete_volatile_connection_do (self=0x55d2da6f0010, connection=0x55d2da6e4540) at src/nm-manager.c:818
#8 0x000055d2d912d53b in _delete_volatile_connection_all (self=0x55d2da6f0010, do_delete=do_delete@entry=1) at src/nm-manager.c:2198
#9 0x000055d2d912d55c in _delete_volatile_connection_cb (user_data=<optimized out>) at src/nm-manager.c:2209
#10 0x00007f08b4e9a04e in g_main_context_dispatch () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007f08b4e9a400 in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007f08b4e9a6f3 in g_main_loop_run () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x000055d2d90fbd5a in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:456
(gdb)
```

Bugra Aydogar (bugraaydogar) wrote :

In addition, on the initial boot, while network-manager was active, I was not able to connect to the device over SSH. I had to bring up the enp1s0 interface manually by running
`nmcli con add connection.interface-name enp1s0 type ethernet`. This is just a side note in case it is useful.

systemuser@ubuntu:/etc/netplan$ ls
00-default-nm-renderer.yaml 00-snapd-config.yaml 90-NM-3be932e2-b57c-4c2a-aba7-970b0513bf81.yaml
systemuser@ubuntu:/etc/netplan$ cat 00-default-nm-renderer.yaml
network:
  renderer: NetworkManager
systemuser@ubuntu:/etc/netplan$ cat 00-snapd-config.yaml
# This is the initial network config.
# It can be overwritten by cloud-init or console-conf.
network:
    version: 2
    ethernets:
        all-en:
            match:
                name: "en*"
            dhcp4: true
        all-eth:
            match:
                name: "eth*"
            dhcp4: true
systemuser@ubuntu:/etc/netplan$ cat 90-NM-3be932e2-b57c-4c2a-aba7-970b0513bf81.yaml
network:
  version: 2
  ethernets:
    NM-3be932e2-b57c-4c2a-aba7-970b0513bf81:
      renderer: NetworkManager
      match:
        name: "enp1s0"
      dhcp4: true
      dhcp6: true
      ipv6-address-generation: "stable-privacy"
      wakeonlan: true
      networkmanager:
        uuid: "3be932e2-b57c-4c2a-aba7-970b0513bf81"
        name: "ethernet-enp1s0"
        passthrough:
          connection.permissions: ""
          ethernet.mac-address-blacklist: ""
          ipv4.dns-search: ""
          ipv6.dns-search: ""
          proxy._: ""
systemuser@ubuntu:/etc/netplan$

Lukas Märdian (slyon) wrote :

I think I was able to reproduce that backtrace in my local test container (although this is not a full reproducer). This should probably fix the problem: https://git.launchpad.net/~slyon/snappy-hwe-snaps/+git/network-manager/commit/?id=2144bc3d731412694e804679fe35a048d2e2f804

I squeezed that commit into the pending MP at https://code.launchpad.net/~slyon/snappy-hwe-snaps/+git/network-manager/+merge/415913

Most probably NM tries to delete a non-netplan connection (maybe a volatile connection profile?) for some reason. Therefore, libnetplan cannot extract a netplan_id and passes NULL into the hash table, which leads to a segfault. The above-mentioned commit checks for this condition and avoids it.
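
The failure mode and the guard can be modelled in a few lines. This is only an illustrative Python sketch, not the actual C code of NetworkManager or libnetplan and not the content of the commit: the function names, the `netplan-<id>.nmconnection` file naming used for matching, and the dictionary standing in for the hash table are all assumptions made for the example.

```python
import re

# Assumed naming scheme for this sketch: profiles backed by netplan are stored
# as "netplan-<id>.nmconnection"; anything else is not netplan-backed.
_NETPLAN_FILE = re.compile(r"^netplan-(?P<id>.+)\.nmconnection$")


def extract_netplan_id(filename):
    """Return the netplan ID encoded in a keyfile name, or None if absent."""
    match = _NETPLAN_FILE.match(filename)
    return match.group("id") if match else None


def netplan_delete(definitions, netplan_id):
    """Stand-in for the library call, which assumes it is given a valid ID."""
    if netplan_id is None:
        # In C, hashing a NULL string at this point is what would segfault.
        raise RuntimeError("NULL netplan_id passed to hash-table lookup")
    return definitions.pop(netplan_id, None) is not None


def delete_connection(definitions, filename):
    """Delete the netplan definition behind a profile, if there is one."""
    netplan_id = extract_netplan_id(filename)
    if netplan_id is None:
        # The guard: a non-netplan (e.g. volatile) profile has nothing to
        # remove on the netplan side, so the library is never called.
        return False
    return netplan_delete(definitions, netplan_id)
```

With the guard in place, deleting a profile that has no netplan ID simply returns without touching the lookup, while netplan-backed profiles are removed as before.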

Bugra Aydogar (bugraaydogar) wrote :

Hi,

I can confirm that the suggested changes fix the segmentation fault, and network-manager stays up and running across network-manager service restarts and system reboots.

Thanks

Bugra Aydogar (bugraaydogar) wrote :

The bug is fixed in the new network-manager release, version 1.22.10-11.

Thanks

Changed in snappy-hwe-snaps:
status: New → Fix Released