dbus timeout-ed during an upgrade, taking services down including gdm

Bug #1871538 reported by Christian Ehrhardt 
48
This bug affects 6 people
Affects Status Importance Assigned to Milestone
D-Bus
New
Unknown
systemd
New
Unknown
accountsservice (Ubuntu)
Invalid
Undecided
Unassigned
Focal
Invalid
Undecided
Unassigned
Groovy
Invalid
Undecided
Unassigned
Hirsute
Invalid
Undecided
Unassigned
Impish
Invalid
Undecided
Unassigned
Jammy
Invalid
Undecided
Unassigned
dbus (Ubuntu)
Invalid
High
Unassigned
Focal
Invalid
High
Unassigned
Groovy
Invalid
Undecided
Unassigned
Hirsute
Won't Fix
Undecided
Unassigned
Impish
Invalid
High
Unassigned
Jammy
Invalid
High
Unassigned
gnome-shell (Ubuntu)
Invalid
Undecided
Unassigned
Focal
Invalid
Undecided
Unassigned
Groovy
Invalid
Undecided
Unassigned
Hirsute
Invalid
Undecided
Unassigned
Impish
Invalid
Undecided
Unassigned
Jammy
Invalid
Undecided
Unassigned
systemd (Ubuntu)
Fix Released
Undecided
Lukas Märdian
Focal
Fix Released
Undecided
Unassigned
Impish
Fix Released
Undecided
Unassigned
Jammy
Fix Released
Undecided
Lukas Märdian

Bug Description

[Impact]

 * There's currently a deadlock between PID 1 and dbus-daemon: in some
cases dbus-daemon will do NSS lookups (which are blocking) at the same
time PID 1 synchronously blocks on some call to dbus-daemon (e.g. `GetConnectionUnixUser` DBus call). Let's break
that by setting SYSTEMD_NSS_DYNAMIC_BYPASS=1 env var for dbus-daemon,
which will disable synchronously blocking varlink calls from nss-systemd
to PID 1.

 * This can lead to delayed boot times

 * It can also lead to dbus-daemon being killed/re-started, taking down other services with it, like GDM, killing user sessions on the way (e.g. on installing updates)

[Test Plan]

 * This bug is really hard to reproduce, as can be seen from the multi-year long discussion at https://github.com/systemd/systemd/issues/15316

 * Canonical's CPC team has the ability to reproduce this issue (with a relatively high probability) in their Azure test environment, due to the specific setup they are using

 * So our test plan is to ask CPC (@gjolly) for confirmation if the issue is fixed.

[Where problems could occur]

 * This fix touches the communication between systemd and dbus daemon, especially the NSS lookup, so if something is broken the (user-)name resolution could be broken.

 * As a workaround dbus-daemon could be replaced by dbus-broker, which never showed this issue or the behaviour could be changed back by using the `SYSTEMD_NSS_DYNAMIC_BYPASS` env variable, like this:

#/etc/systemd/system/dbus.service.d/override.conf
[Service]
Environment=SYSTEMD_NSS_DYNAMIC_BYPASS=0

[Other Info]

 * Fixed upstream (v251) in https://github.com/systemd/systemd/pull/22552

=== Original Description ===

This morning I found my computer on the login screen.
But not the one of the screen log, no a new one - so something must have crashed.

Logging in again confirmed that all apps were gone and the gnome shell was brought down what seems like triggered by a background update o accountsservice.

As always things are not perfectly clear :-/
The following goes *back* in time through my logs one by one.

Multiple apps crashed at 06:09, but we will find later that this is a follow on issue of the underlying gnome/X/... recycling.
-rw-r----- 1 paelzer whoopsie 52962868 Apr 8 06:09 _usr_bin_konversation.1000.crash
-rw-r----- 1 paelzer whoopsie 986433 Apr 8 06:09 _usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.crash

rdkit was failing fast and giving up (that will be a different bug, it just seems broken on my system):
Apr 08 06:10:13 Keschdeichel systemd[1]: Started RealtimeKit Scheduling Policy Service.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully called chroot.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully dropped privileges.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully limited resources.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: pthread_create failed: Resource temporarily unavailable
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Canary thread running.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Exiting canary thread.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Demoting known real-time threads.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Demoted 0 threads.
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Failed with result 'exit-code'.
Apr 08 06:10:13 Keschdeichel dbus-daemon[1208]: [system] Activating via systemd: service name='org.freedesktop.RealtimeKit1' unit='rtkit-daemon.service' requested by ':1.1176' (uid=121 pid=>
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Start request repeated too quickly.
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Failed with result 'exit-code'.
Apr 08 06:10:13 Keschdeichel systemd[1]: Failed to start RealtimeKit Scheduling Policy Service.
Apr 08 06:10:13 Keschdeichel bluetoothd[1729331]: Bluetooth daemon 5.53

But that already was only triggered by a gnome restart that kicked of earlier:

Apr 08 06:09:27 Keschdeichel systemd[1726656]: Started GNOME Shell on Wayland.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Shell on Wayland.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Session is initialized.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Wayland Session.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Session (session: gnome-login).

X was recycleing before:
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopping GNOME Shell on X11...
...
Apr 08 06:09:22 Keschdeichel /usr/lib/gdm3/gdm-x-session[10710]: (EE) systemd-logind: ReleaseControl failed: Unknown object '/org/freedesktop/login1/session/_32'.
Apr 08 06:09:22 Keschdeichel /usr/lib/gdm3/gdm-x-session[10710]: (II) Server terminated successfully (0). Closing log file.

It seems like some internal service broke and everything that followed was a secondary issue to that:
Apr 08 06:09:19 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: thermald.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel thermald[1256]: [WARN]Terminating ...
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Got SIGTERM, quitting.
Apr 08 06:09:19 Keschdeichel systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel ModemManager[1308]: <info> Caught signal, shutting down...
Apr 08 06:09:19 Keschdeichel systemd[1]: switcheroo-control.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe78:26c2.
Apr 08 06:09:19 Keschdeichel systemd[1]: accounts-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface strswanbr2.IPv4 with address 192.168.2.0.
Apr 08 06:09:19 Keschdeichel systemd[1]: gdm.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface strswanbr1.IPv4 with address 192.168.1.0.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-machined.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lxdbr0.IPv6 with address fd42:fa18:c923:35d5::1.
Apr 08 06:09:19 Keschdeichel systemd-machined[1255]: Failed to drop reference to machine scope, ignoring: Transport endpoint is not connected
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lxdbr0.IPv4 with address 10.253.194.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: switcheroo-control.service: Succeeded.
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface wlp4s0.IPv6 with address fe80::6832:c873:a074:2c7b.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-machined.service: Succeeded.
Apr 08 06:09:19 Keschdeichel bluetoothd[1724347]: Terminating
Apr 08 06:09:19 Keschdeichel systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface wlp4s0.IPv4 with address 10.7.0.42.
Apr 08 06:09:19 Keschdeichel systemd[1]: ModemManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface enp0s31f6.IPv6 with address fe80::80a8:743:6b37:e561.
Apr 08 06:09:19 Keschdeichel systemd[1]: polkit.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface enp0s31f6.IPv4 with address 10.7.0.69.
Apr 08 06:09:19 Keschdeichel systemd[1]: bolt.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
Apr 08 06:09:19 Keschdeichel systemd[1]: rtkit-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: packagekit.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel wpa_supplicant[1265]: nl80211: deinit ifname=p2p-dev-wlp4s0 disabled_11b_rates=0
Apr 08 06:09:19 Keschdeichel systemd[1]: colord.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: upower.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: bluetooth.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: fwupd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel xdg-desktop-por[11011]: xdg-desktop-portal-gtk: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 08 06:09:19 Keschdeichel NetworkManager[1209]: <info> [1586318959.7797] caught SIGTERM, shutting down normally.
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: avahi-daemon 0.7 exiting.
Apr 08 06:09:19 Keschdeichel systemd[1]: rtkit-daemon.service: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[1]: Starting Process error reports when automatic reporting is enabled...
Apr 08 06:09:19 Keschdeichel systemd[1]: colord.service: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped target GNOME X11 Session (session: ubuntu).
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped target Current graphical user session.
Apr 08 06:09:19 Keschdeichel systemd[10683]: unicast-local-avahi.path: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped Path trigger for Avahi .local domain notifications.
Apr 08 06:09:19 Keschdeichel systemd[10683]: update-notifier-release.path: Succeeded.
Apr 08 06:09:19 Keschdeichel gnome-shell[11036]: gnome-shell: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

The last things before was systemd reload - maybe due to a background upgrade?

Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: [system] Reloaded configuration
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: [system] Reloaded configuration
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel systemd[1]: Reloading.

And indeed there was an upgrade going on:

40773 2020-04-08 06:08:46 status installed libc-bin:amd64 2.31-0ubuntu7
40774 2020-04-08 06:08:54 startup archives unpack
40775 2020-04-08 06:08:54 upgrade accountsservice:amd64 0.6.55-0ubuntu10 0.6.55-0ubuntu11
40776 2020-04-08 06:08:54 status half-configured accountsservice:amd64 0.6.55-0ubuntu10
40777 2020-04-08 06:08:54 status unpacked accountsservice:amd64 0.6.55-0ubuntu10
40778 2020-04-08 06:08:54 status half-installed accountsservice:amd64 0.6.55-0ubuntu10
40779 2020-04-08 06:08:54 status triggers-pending dbus:amd64 1.12.16-2ubuntu2
40780 2020-04-08 06:09:19 status unpacked accountsservice:amd64 0.6.55-0ubuntu11
40781 2020-04-08 06:09:20 upgrade libaccountsservice0:amd64 0.6.55-0ubuntu10 0.6.55-0ubuntu11
40782 2020-04-08 06:09:20 status triggers-pending libc-bin:amd64 2.31-0ubuntu7
40783 2020-04-08 06:09:20 status half-configured libaccountsservice0:amd64 0.6.55-0ubuntu10
40784 2020-04-08 06:09:20 status unpacked libaccountsservice0:amd64 0.6.55-0ubuntu10
40785 2020-04-08 06:09:20 status half-installed libaccountsservice0:amd64 0.6.55-0ubuntu10
40786 2020-04-08 06:09:20 status unpacked libaccountsservice0:amd64 0.6.55-0ubuntu11

Nothing suspicious before that time.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: accountsservice 0.6.55-0ubuntu11
ProcVersionSignature: Ubuntu 5.3.0-45.37~18.04.1-generic 5.3.18
Uname: Linux 5.3.0-45-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Apr 8 07:51:03 2020
InstallationDate: Installed on 2018-10-12 (543 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: accountsservice
UpgradeStatus: Upgraded to focal on 2020-04-03 (4 days ago)

Related branches

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
tags: added: champagne
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Most likely this was bug 1843982. So let's use that for now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Another crash just happened:

Apr 08 10:28:06 Keschdeichel systemd[1]: Reloading.
Apr 08 10:28:06 Keschdeichel systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run>
Apr 08 10:28:17 Keschdeichel dbus-daemon[1726352]: [session uid=1000 pid=1726352] Activating via systemd: service name='org.freedesktop.Tracker1' unit='tracker-store.service' requested by '>
Apr 08 10:28:17 Keschdeichel systemd[10683]: Starting Tracker metadata database store and lookup manager...
Apr 08 10:28:17 Keschdeichel dbus-daemon[1726352]: [session uid=1000 pid=1726352] Successfully activated service 'org.freedesktop.Tracker1'
Apr 08 10:28:17 Keschdeichel systemd[10683]: Started Tracker metadata database store and lookup manager.
Apr 08 10:28:32 Keschdeichel systemd[1]: We couldn't coldplug machine-qemu\x2d4\x2df\x2drtkit.scope, proceeding anyway: Connection timed out
Apr 08 10:28:32 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 10:28:32 Keschdeichel avahi-daemon[1726370]: Got SIGTERM, quitting.
Apr 08 10:28:32 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 10:28:32 Keschdeichel ModemManager[1727115]: <info> Caught signal, shutting down...
Apr 08 10:28:32 Keschdeichel systemd[1]: thermald.service: Unexpected error response from GetNameOwner(): Connection terminated

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Daniel,
none of the crashes that I had has the same signature as those that are reported on the dup.
Furthermore as I outlined the crashes seem to be secondary issues after soemthing breaks and recycles gnome-shell.

I'd ask for re-triage as that doesn't seem to be the same thing to me.

Revision history for this message
Sebastien Bacher (seb128) wrote :

@Christian, could you add the journalctl log from that session, that might include some hints. Also do you have any gnome-shell/gdm crash collected in /var/crash?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.

Changed in accountsservice (Ubuntu):
status: New → Incomplete
Changed in dbus (Ubuntu):
status: New → Incomplete
Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> Also do you have any gnome-shell/gdm crash collected in /var/crash?

No, just these:
$ ll /var/crash/*.crash
-rw-r----- 1 paelzer whoopsie 3589735 Apr 6 08:34 /var/crash/_usr_bin_gnome-clocks.1000.crash
-rw-r----- 1 paelzer whoopsie 53170176 Apr 8 10:36 /var/crash/_usr_bin_konversation.1000.crash
-rw-r----- 1 paelzer whoopsie 986433 Apr 8 06:09 /var/crash/_usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.crash

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> could you add the journalctl log from that session, that might include some hints

Sure attached here, you see in the initial report and the later comments the time indexes to look out for.

Also FYI for the rtkit issue that you will see in there => bug 1871543

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This looks strongly like bug 1843982. If you disagree and still want a more thorough analysis then please proceed to step 2 in comment #6.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> 1. Look in /var/crash for crash files and if found run:
> ubuntu-bug YOURFILE.crash
> Then tell us the ID of the newly-created bug.

Again, they all seem to be prior or follow on issues, but they already have IDs in the error tracker.

/var/crash/_usr_bin_gnome-clocks.1000.uploaded
a615fed4-77d0-11ea-a806-fa163ee63de6
/var/crash/_usr_bin_konversation.1000.uploaded
0e56c64e-7976-11ea-ab51-fa163e983629
/var/crash/_usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.uploaded
29cd6100-7976-11ea-a249-fa163e6cac46

They all show up in my personal error-list at:
https://errors.ubuntu.com/user/350ea932144813c645983ba6ebba1ab8a271bbbcc08853010a42e92011f8802f77829fce9a7ad953c6c31a499d176f6312a0148c01ef7facaaa87be74566a786

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Bug 1843982 is specifically about gnome-shell (and gdm) crashing when accountsservice gets restarted, which is what you mentioned at the top of this bug. So even if you are experiencing multiple bugs, I think that's the one that matches the main bug report here.

Revision history for this message
Sebastien Bacher (seb128) wrote :

@Daniel, that doesn't look like this bug, from the journal

Apr 08 06:09:19 Keschdeichel systemd[1]: We couldn't coldplug machine-qemu\x2d2\x2df\x2dnvidia.scope, proceedin
g anyway: Connection timed out
Apr 08 06:09:19 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner():
 Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner():
 Connection terminated

etc for a stack of dbus using system services including gdm

It looks like a problem at the systemd or dbus level, I'm going to mark it invalid for the desktop components

Changed in accountsservice (Ubuntu):
status: Incomplete → Invalid
Changed in gnome-shell (Ubuntu):
status: Incomplete → Invalid
Changed in dbus (Ubuntu):
importance: Undecided → High
Revision history for this message
Sebastien Bacher (seb128) wrote :

@Christian, dbus got reloaded a bunch of time during the upgrade according to the log, that's expected, what isn't is that it stopped responding.
I haven't seen other reports from that so far so it's probably an uncommon bug/issue hit there, I'm unsure how to debug it, especially after facts...

summary: - Gnome Shell crashed on background upgrade of accountsservice
+ dbus timeout-ed during an upgrade, taking services down including gdm
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Sure @seb128 - I'll let you know if it happens again (as I did a cleaning-reboot now).

Revision history for this message
Steve Beattie (sbeattie) wrote :

Hi, I had a similar crash this morning upgrading focal, after trying to get unattended-upgrades to stop spinning on missing focal-security apt list files. In this case, I don't use gnome-shell as my desktop environment, but it still tore down my entire desktop environment and caused gdm3 to respawn.

I've attached the journalctl log from when I started apt upgrade to when I successfully re-logged into my i3 session.

Revision history for this message
Steve Beattie (sbeattie) wrote :

Oh, and I have no crash files in /var/crash/.

tags: added: rls-ff-incoming
tags: added: id-5e988df7fb344884f67bc04f
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (8.8 KiB)

I hit that again today on an upgrade.
Network manager didn't come up again.

Mai 20 11:04:46 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9491] caught SIGTERM, shutting down normally.
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] dhcp4 (enp0s31f6): canceled DHCP transaction
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] dhcp4 (enp0s31f6): state changed extended -> done
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] device (enp0s31f6): DHCPv4: trying to acquire a new lease within 90 seconds
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9733] dhcp4 (wlp4s0): canceled DHCP transaction
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9812] dhcp4 (wlp4s0): state changed bound -> done
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9813] device (wlp4s0): DHCPv4: trying to acquire a new lease within 90 seconds
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9817] device (p2p-dev-wlp4s0): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'managed')
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9825] device (strswanbr2-nic): released from master device strswanbr2
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9829] device (strswanbr1-nic): released from master device strswanbr1
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9831] device (veth47ecccc5): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0047] device (virbr0-nic): released from master device virbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0050] device (vethcc7405c8): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0069] device (vetha09a9db4): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0077] device (veth754add92): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0094] device (vethc060f004): released from master device virbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0101] device (veth3152ecc9): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0109] device (vethb6dd0569): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1008] device (virbr0): bridge port vnet1 was detached
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1025] manager: NetworkManager state is now CONNECTED_SITE
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1030] device (virbr0): bridge port vnet0 was detached
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1629] exiting (success)
Mai 20 11:04:47 Keschdeichel systemd[1]: NetworkManager...

Read more...

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for dbus (Ubuntu Focal) because there has been no activity for 60 days.]

Changed in dbus (Ubuntu Focal):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for dbus (Ubuntu) because there has been no activity for 60 days.]

Changed in dbus (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Steve Beattie (sbeattie) wrote :
Download full text (5.0 KiB)

I also hit this again in focal on 2020-06-25, with an update to systemd 245.4-4ubuntu3.1; I had previously updated dbus to 1.12.16-2ubuntu2.1 on 2020-06-17 without event. It's still an issue at least with updates to systemd in focal.

Similar messages end up in the journal:

Jun 25 13:04:55 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:04:55 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:04:55 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:04:55 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten systemd[1]: Reloading.
Jun 25 13:05:11 kryten systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
Jun 25 13:05:11 kryten systemd[1]: /lib/systemd/system/fancontrol.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/fancontrol.pid → /run/fancontrol.pid; please update the unit file accordingly.
Jun 25 13:05:36 kryten systemd[1]: We couldn't coldplug machine-qemu\x2d1\x2dkeybase\x2dbionic\x2damd64.scope, proceeding anyway: Connection timed out
Jun 25 13:05:36 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=switcheroo-control comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" exe...

Read more...

Changed in dbus (Ubuntu):
status: Expired → Confirmed
Changed in dbus (Ubuntu Focal):
status: Expired → Confirmed
Changed in systemd:
status: Unknown → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The last two updates went smooth for me, not triggering it.
I'm still too afraid to "just update while working" as I did in the past since it could drag things down thou :-/
I was feeling like this could be solved, but reading that Steve is also affected I'm worried that I might just not hit the same race/trigger by accident.

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

I've a similar issue on Groovy. I got it 2 days in a row after my daily package update.
Reverting the machine to a state before the upgrade and re-upgrading the same packages doesn't reproduce the issue. However, when it worked with the same package update the message "Unexpected error response from GetNameOwner()" is not present in the logs.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Hm, so is desktop working on this? Have we figured out at all what is happening? Or just the general symptoms that dbus seems to be non-responsive and then everything goes bad?

Changed in dbus (Ubuntu):
status: Confirmed → Incomplete
Changed in dbus (Ubuntu Focal):
status: Confirmed → Incomplete
Revision history for this message
Sebastien Bacher (seb128) wrote :

@Dimitri desktop isn't working on it at this point, unsure how to figure out where the issue is exactly but it sounds like it could be either dbus or systemd which are owned by foundations

Revision history for this message
Lukas Märdian (slyon) wrote :

By chance (while developing some netplan code), I seem to have found a reproducer for this bug... I don't know what is going on, but calling the 'Trigger' method from the attached 'dbus.c' file will kill the GDM session in a reproducible way.

The invalid `kill(-1, SIGINT);` of line 21, seems to somehow crash the whole DBus/GDM and other services. So there must be some problem inside DBus's error handling, which is somehow triggered during the updates.

 * WARNING: calling the Trigger method via busctl will kill your GDM session
 *
 * compile: gcc -o out dbus.c `pkg-config --cflags --libs libsystemd glib-2.0`
 * execute: sudo ./out
 * trigger: sudo busctl call io.netplan.Netplan /io/netplan/Netplan io.netplan.Netplan Trigger

Revision history for this message
Julian Andres Klode (juliank) wrote :

kill(-1, SIGINT) kills all processes on the system with SIGINT, so this is hardly surprising.

Revision history for this message
Julian Andres Klode (juliank) wrote :

Well, more correctly -1 means to all processes you can send to, and if you're root, well it's everyone.

Revision history for this message
Iain Lane (laney) wrote :

Yeah, sorry, I don't think that's a reproducer for the bug. You killed dbus, gdm and everything else, and they obligingly died on you. :)

My feeling is that this bug is about a deadlock that somehow happens when dbus and/or systemd are reloaded, possibly at the same time. It looks like dbus stops responding to systemd, and so when it calls GetNameOwner() to be able to watch type=dbus units, it gets no reply and then decides to kill all type=dbus units, which is a lot of things. If that's right, it's most likely to be a dbus bug that it can get into this state. But I'm personally at a bit of a loss as to how to approach it without being able to reproduce.

Revision history for this message
Lukas Märdian (slyon) wrote :

Oh.. I wasn't really aware of that (-1 = all) and it looked so similar.

Right, so we still somehow need to find a proper reproducer...

tags: removed: rls-ff-incoming
tags: removed: champagne
Revision history for this message
Gauthier Jolly (gjolly) wrote :

I seems that we are seeing this issue on Azure with Ubuntu >=20.04. The CPC team is running a series of tests before publishing images, those tests spin up ~50 instances in Azure and usually one of them is having this issue.

You can find the system logs of a failing instance here: https://pastebin.ubuntu.com/p/WC4M7JdTGk/ in which you can read the following:

Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: accounts-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: polkit.service: Unexpected error response from GetNameOwner(): Connection terminated

I built a custom hirsute image with dbus-broker installed and I was unable to reproduce the issue with it.

Revision history for this message
Iain Lane (laney) wrote (last edit ):

That's interesting actually. I thought this was an interaction between reloading dbus-daemon and daemon-reloading systems at the same time. But in this cloud case, I don't think dbus is being reloaded, is it?

It's going to be next to impossible for me to look into this in detail without being able to reproduce it or getting a bit more debug information.

Maybe you could crank up the systemd log level and try to reproduce this?

Lukas Märdian (slyon)
tags: added: rls-ff-incoming rls-hh-incoming
tags: added: rls-ii-incoming
Revision history for this message
Gauthier Jolly (gjolly) wrote :

@laney I built a custom image with system logs level set to debug and I was able to reproduce the issue. You can find the logs attached.

Revision history for this message
Gauthier Jolly (gjolly) wrote :

Also, systemd is actually reloaded by cloud-init. In cloud-init logs, I can read:

2021-06-16 08:54:51,608 - subp.py[DEBUG]: Running command ['systemctl', 'daemon-reload'] with allowed return codes [0] (shell=False, capture=True)
2021-06-16 08:54:51,953 - cc_mounts.py[DEBUG]: Activate mounts: PASS:systemctl daemon-reload
2021-06-16 08:54:51,953 - handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts ran successfully

Revision history for this message
Iain Lane (laney) wrote :

Thanks! Here's where I think the problem starts:

Ok, let's authenticate to dbus:

Jun 16 08:54:54 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Bus bus-api-system: changing state AUTHENTICATING → HELLO
...
Jun 16 08:54:54 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Bus bus-api-system: changing state HELLO → RUNNING

Right, systemd thinks it's done it at least. You can see a whole bunch of messages being returned, the connection is working.

After a while (note this first timestamp jump; is that significant? looks to be waiting for udev to settle I think so maybe not; could be snapd related) we decide to do something with a Type=dbus unit:

Jun 16 08:55:25 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=AddMatch cookie=1511 reply_cookie=0 signature=s error-name=n/a error-message=n/a
Jun 16 08:55:25 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=GetNameOwner cookie=1512 reply_cookie=0 signature=s error-name=n/a error-message=n/a

And another timestamp jump, waiting for this timeout:

Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo dbus-daemon[711]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 45129ms)
Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Got message type=method_return sender=org.freedesktop.DBus destination=:1.4 path=n/a interface=n/a member=n/a cookie=42 reply_cookie=1512 signature=s error-name=n/a error-message=n/a

The dbus-daemon didn't think that the connection was authenticated.

The last message AFAICS received on the bus connection was:

Jun 16 08:55:00 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=Reload cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Jun 16 08:55:00 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Reloading.

No smoking gun yet, at least not to me. But interesting that the last thing which happened successfully was reloading. Can we have another go please, with these changes vs. an upstream clean image?

  - LogLevel=debug in /etc/systemd/system.conf
  - Using the dbus 1.12.20-2ubuntu1+ppa1 from ppa:ci-train-ppa-service/4587. This dbus is just the one from impish backported to hirsute with verbose debugging turned on.

And then attach the full `journalctl -b <the boot that failed>` which will probably be quite large. :-)

Maybe we'll get some information out of the dbus side to say what it thinks goes wrong with the connection.

** Please note ** The "connection has not authenticated" string does not appear in the earlier logs from Christian or Steve. It could be that there are two bugs here, with similar symptoms that systemd loses its connection to the system bus.

Revision history for this message
Gauthier Jolly (gjolly) wrote :

Ok I will cook an other custom image and try to reproduce.

Concerning this log line:

Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo dbus-daemon[711]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 45129ms)

Please note that during our first investigation with @slyon, we added this
<busconfig>
  <limit name="auth_timeout">90000</limit>
</busconfig>

in /etc/dbus-1/system-local.conf and this particular error disappeared. However the original issue ("Unexpected error response from GetNameOwner()" and slow critical-chain) was still there.

Revision history for this message
Iain Lane (laney) wrote :

I filed an upstream bug with D-Bus: https://gitlab.freedesktop.org/dbus/dbus/-/issues/337

Hopefully a maintainer there will be able to give a better opinion as to what's going on here

tags: added: fr-1460
tags: removed: rls-ff-incoming rls-hh-incoming rls-ii-incoming
Iain Lane (laney)
Changed in accountsservice (Ubuntu Groovy):
status: New → Invalid
Changed in accountsservice (Ubuntu Hirsute):
status: New → Invalid
Changed in gnome-shell (Ubuntu Groovy):
status: New → Invalid
Changed in gnome-shell (Ubuntu Hirsute):
status: New → Invalid
Changed in dbus (Ubuntu Groovy):
status: New → Incomplete
Changed in dbus (Ubuntu Hirsute):
status: New → Incomplete
Revision history for this message
Lukas Märdian (slyon) wrote :

An interesting observation I made is that all journal logs (in this LP bug and the upstream systemd bug) show the "polkit.service: Unexpected error response from GetNameOwner(): Connection terminated" message, in addition to a bunch of other "Connection terminated" errors from changing services across the different installations (NetworkManager.service, accounts-daemon.service, systemd-logind.service, ...). I've also seen logs (from Gauthier) where only polkit.service is failing.

The polkit.service "Unexpected error response from GetNameOwner()" message is a log from systemd effectively telling us that systemd cannot reach org.freedesktop.PolicyKit1 on DBus. But the root cause for that must be that polkitd is not running (or dbus is somehow stuck in a deadlock, not responding anymore). This gives a new twist to this issue, as we might be searching for a phantom in (lib-)systemd... PolicyKit is using glib's gdbus implementation.

Gauthier Jolly (gjolly)
tags: added: rls-ii-incoming
Revision history for this message
Lukas Märdian (slyon) wrote :

Addressed in FR-1460

tags: removed: rls-ii-incoming
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI triggered again for me due to unattended upgrades.
The time in the journal when things go down matches the unpack/configuee/install phase of
- accountsservice:amd64 0.6.55-0ubuntu12~20.04.5
- libaccountsservice0:amd64 0.6.55-0ubuntu12~20.04.5
- dbus:amd64 1.12.16-2ubuntu2.1

I - again - had the usual suspect entries:

systemd[1]: Starting Daily apt upgrade and clean activities...
dbus-daemon[2266]: [system] Reloaded configuration
dbus-daemon[2266]: Unknown group "power" in message bus configuration file
systemd[1]: Reloading.
systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated

I didn't see anything in the log we didn't have before, but for completeness I'm attaching journal of the issue and a few hours before it

Revision history for this message
Lukas Märdian (slyon) wrote :

I'm still suspecting dbus to be at fault here. We might need to bisect dbus in order to find out more about this, but that is hard without a reliable reproducer.

Revision history for this message
Lukas Märdian (slyon) wrote (last edit ):

There is some news about this issue at systemd:
https://github.com/systemd/systemd/issues/15316#issuecomment-1000842001

Could people try to place this systemd override config for dbus and observe if it avoids the issue?

```
$ cat /etc/systemd/system/dbus.service.d/override.conf
[Service]
Environment=SYSTEMD_NSS_DYNAMIC_BYPASS=1
```

Revision history for this message
Gauthier Jolly (gjolly) wrote :

This is now part of Azure's cloud images (focal to devel) and I can confirm that it fixes the issue.

Revision history for this message
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in dbus (Ubuntu Hirsute):
status: Incomplete → Won't Fix
Lukas Märdian (slyon)
no longer affects: systemd (Ubuntu Groovy)
no longer affects: systemd (Ubuntu Hirsute)
Changed in dbus (Ubuntu Impish):
status: Incomplete → Invalid
Changed in dbus (Ubuntu Jammy):
status: Incomplete → Invalid
Changed in dbus (Ubuntu Groovy):
status: Incomplete → Invalid
Changed in dbus (Ubuntu Focal):
status: Incomplete → Invalid
Lukas Märdian (slyon)
description: updated
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Jammy):
assignee: nobody → Lukas Märdian (slyon)
status: New → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 249.10-0ubuntu2

---------------
systemd (249.10-0ubuntu2) jammy; urgency=medium

  * Fix deadlock between pid1 and dbus-daemon (LP: #1871538)
    File: debian/patches/pid1-set-SYSTEMD_NSS_DYNAMIC_BYPASS-1-env-var-for-dbus-da.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=100a7bfc2d0f82c3afbede04a022c655529cffac
  * Don't override Ubuntu's default sysctl values (LP: #1962038)
    File: debian/patches/debian/UBUNTU-Don-t-override-Ubuntu-s-default-sysctl-values-LP-1962038.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=b10cdb7da3539b3a3c3e619b470a75c83e46ef11

 -- Lukas Märdian <email address hidden> Tue, 08 Mar 2022 14:58:00 +0100

Changed in systemd (Ubuntu Jammy):
status: In Progress → Fix Released
Revision history for this message
Luca Boccassi (bluca) wrote :

Fixed also in systemd v250.4 and v249.11

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Christian, or anyone else affected,

Accepted systemd into impish-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/248.3-1ubuntu8.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-impish to verification-done-impish. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-impish. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Impish):
status: New → Fix Committed
tags: added: verification-needed verification-needed-impish
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Christian, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I can't test this reliably (as stated in the SRU description), but at least I can say I haven't seen it in the last 24h :-) I think this is on @gjolly to try to reproduce it in the mentioned azure test environment.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.16)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.16) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.44.1-1ubuntu1 (arm64, ppc64el, amd64)
linux-aws-5.13/5.13.0-1019.21~20.04.1 (arm64)
snapd/2.54.3+20.04.1ubuntu0.2 (arm64, ppc64el, s390x)
docker.io/20.10.7-0ubuntu5~20.04.2 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/248.3-1ubuntu8.4)

All autopkgtests for the newly accepted systemd (248.3-1ubuntu8.4) for impish have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/248.3-1ubuntu8.4 (arm64, ppc64el)
debspawn/0.5.0-1 (s390x)
diaspora-installer/0.7.15.0+debian1 (s390x, arm64)
swupdate/2020.11-2 (ppc64el)
snapd/2.54.3+21.10.1ubuntu0.2 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/impish/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Nick Rosbrook (enr0n) wrote :

The autopkgtest regressions blocking systemd 245.4-4ubuntu3.16 in focal-proposed have been resolved. The regressions appear to have been related to recent autopkgtest infrastructure issues, and retrying the tests resolved the issues.

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Christian, or anyone else affected,

Accepted systemd into impish-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/248.3-1ubuntu8.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-impish to verification-done-impish. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-impish. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/248.3-1ubuntu8.4)

All autopkgtests for the newly accepted systemd (248.3-1ubuntu8.4) for impish have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/248.3-1ubuntu8.4 (arm64, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/impish/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/248.3-1ubuntu8.5)

All autopkgtests for the newly accepted systemd (248.3-1ubuntu8.5) for impish have finished running.
The following regressions have been reported in tests triggered by the package:

snapd/2.54.3+21.10.1ubuntu0.2 (s390x, ppc64el, arm64, amd64)
casper/1.465 (amd64)
tpm2-abrmd/unknown (ppc64el)
corosync-qdevice/unknown (ppc64el)
nftables/unknown (ppc64el)
udisks2/2.9.4-1 (arm64)
diaspora-installer/0.7.15.0+debian1 (arm64, s390x)
systemd/248.3-1ubuntu8.5 (armhf)
swupdate/2020.11-2 (s390x)
munin/2.0.57-1ubuntu1 (arm64, amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/impish/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Nick Rosbrook (enr0n) wrote :

The autopkgtest regressions blocking systemd 248.3-1ubuntu8.5 in impish-proposed have been resolved. The regressions were caused either by (1) network/infrastructure issues and succeeded on retry, or by (2) unrelated snapd regressions. The systemd 248.3-1ubuntu8.5 upload added an autopkgtest change for systemd to ignore failed snap mount units which are caused by unrelated snapd issues.

Revision history for this message
Gauthier Jolly (gjolly) wrote :

I started 156 instances for both:

 * an impish image with systemd 248.3-1ubuntu8.5
 * a focal image with systemd 245.4-4ubuntu3.16

In both images, I also removed the workaround in /etc/systemd/system/dbus.service.d (Environment=SYSTEMD_NSS_DYNAMIC_BYPASS=1) to make sure I was really testing the fix.

None of the instances shown any provisioning issue (cloud-init was always done withing 5min). This was the main symptom we were observing previously with this issue.

Nick Rosbrook (enr0n)
tags: added: verification-done verification-done-focal verification-done-impish
removed: verification-needed verification-needed-focal verification-needed-impish
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 248.3-1ubuntu8.5

---------------
systemd (248.3-1ubuntu8.5) impish; urgency=medium

  * debian/tests/boot-and-services: Ignore failed snap mount units in test_no_failed (LP: #1967576)

systemd (248.3-1ubuntu8.4) impish; urgency=medium

  [ Lukas Märdian ]
  * Fix deadlock between pid1 and dbus-daemon (LP: #1871538)
    Author: Lukas Märdian
    File: debian/patches/pid1-set-SYSTEMD_NSS_DYNAMIC_BYPASS-1-env-var-for-dbus-da.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f79535077473902bad911dc2652a2fff4066fa30
  * Don't override Ubuntu's default sysctl values (LP: #1962038)
    Author: Lukas Märdian
    File: debian/patches/debian/UBUNTU-Don-t-override-Ubuntu-s-default-sysctl-values-LP-1962038.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=3ba2764d8f77e616461c9722923f685fad79f8c6

systemd (248.3-1ubuntu8.3) impish; urgency=medium

  [ Jeremy Szu ]
  * Add a allowlist to unblock intel-hid on new HP machines (LP: #1955997)
    Author: Jeremy Szu
    File: debian/patches/lp1955997-add-a-allowlist-to-unblock-intel-hid-on-HP-mach.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=554d46e6a6ab80befd424ead7ffa8e6f993b5f66

 -- Nick Rosbrook <email address hidden> Fri, 01 Apr 2022 16:39:25 -0400

Changed in systemd (Ubuntu Impish):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.16

---------------
systemd (245.4-4ubuntu3.16) focal; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1946388-sd-journal-don-t-check-namespaces-if-we-have-no-name.patch:
    Avoid journalctl segfault (LP: #1946388)

  [ Jeremy Szu ]
  * Add a allowlist to unblock intel-hid on new HP machines (LP: #1955997)
    Author: Jeremy Szu
    File: debian/patches/lp1955997-add-a-allowlist-to-unblock-intel-hid-on-HP-mach.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=88a859eaddb6c9a611fcbc44edab441aef4c4355

  [ Nick Rosbrook ]
  * Prevent arguments from being overwritten with defaults at shutdown (LP: #1958284)
    File: debian/patches/lp1958284-core-move-reset_arguments-to-the-end-of-main-s-finish.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e61052bd1f20bcc54e7417542c6d445cf5040f56

  [ Lukas Märdian ]
  * Fix deadlock between pid1 and dbus-daemon (LP: #1871538)
    Author: Lukas Märdian
    File: debian/patches/pid1-set-SYSTEMD_NSS_DYNAMIC_BYPASS-1-env-var-for-dbus-da.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e3aacfa26e3fc6df369e6f28e740389ae0020907

 -- Nick Rosbrook <email address hidden> Wed, 23 Mar 2022 09:29:33 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Mario Limonciello (superm1) wrote :

> https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e3aacfa26e3fc6df369e6f28e740389ae0020907

This appears to have caused a regression in fwupd in Ubuntu 20.04 with details at https://github.com/fwupd/fwupd/issues/3037

fwupd-refresh.service uses DynamicUser and now hits this upstream bug: https://github.com/systemd/systemd/issues/22737

Revision history for this message
Lukas Märdian (slyon) wrote :

Quoting upstream systemd developers (https://github.com/systemd/systemd/issues/22737#issuecomment-1077682307):
"We essentially traded one problem (lockup when starting services) for another (the failure described in this commit).
I actually think that the lockup is worse. Here there is a simple solution: switch from dbus-daemon to dbus-broker. [...]

A proper fix will most likely be to make all dbus calls asynchronous in systemd, but that is a lot of work and it's unclear when/if it'll happen. The regression is unfortunate, but I don't think we can fix it in reasonable time."

So I wonder what's the best path forward in Ubuntu... if we revert, we'll re-introduce the lockup/timeout problem with dbus-daemon. If we keep the current version, fwupd-refresh.service is broken.

The comment at https://github.com/fwupd/fwupd/issues/3037#issuecomment-1100816992 suggests that disabling the DynamicUser= setting makes the service work again. Maybe that's worth a try, in order to get both problems solved? (i.e. shipping an override config for fwupd)

$cat /etc/systemd/system/fwupd-refresh.service.d/override.conf
[Service]
DynamicUser=no

Revision history for this message
Mario Limonciello (superm1) wrote :

> The comment at https://github.com/fwupd/fwupd/issues/3037#issuecomment-1100816992 suggests that disabling the DynamicUser= setting makes the service work again. Maybe that's worth a try, in order to get both problems solved? (i.e. shipping an override config for fwupd)

> $cat /etc/systemd/system/fwupd-refresh.service.d/override.conf
> [Service]
> DynamicUser=no

The whole point of using DynamicUser in this case was so that fwupdmgr didn't run as root when it reached out to the web to get data. Yes, certainly turning off DynamicUser in fwupd-refresh.service will solve the problem, but that is then a different threat vector.

I think if turning off DynamicUser=1, then we probably need to also create a new service account for that systemd service to use.

I'll open a new bug to move this discussion to: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1969976

Changed in dbus:
status: Unknown → New
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.