shutdown hangs at "Waiting for process: ..." for 90s, ignoring DefaultTimeoutStopSec

Bug #1958284 reported by Jean Raby
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| systemd (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Focal | Fix Released | Medium | Unassigned | |

Bug Description

[Impact]

The systemd shutdown sequence does not honor systemd-system.conf settings when waiting for remaining processes. This means that, for example, if a systemd service specifies KillMode=process and a process remaining from that service does not properly handle SIGTERM, then the remaining process will not be killed until after the compiled-in default value of DefaultTimeoutStopSec (90s), even if the user has changed the setting of DefaultTimeoutStopSec. In such cases, this impacts users by significantly increasing the time required for shutdown/reboot.
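To make the difference between the configured value and the compiled-in default concrete, here is a minimal shell sketch that reports which DefaultTimeoutStopSec value a system.conf-style file asks for, falling back to 90s when the setting is missing or commented out. The function name `configured_stop_timeout` is hypothetical, and the parsing is a simplification: it reads only the `[Manager]` section of the single file given, lets the last assignment win, and does not consult any other configuration files.

```shell
#!/bin/bash
# configured_stop_timeout FILE
# Print the DefaultTimeoutStopSec= value set in the [Manager] section of FILE
# (a system.conf-style file), or the compiled-in default of 90s when the
# setting is missing, commented out, or empty. Simplified: only FILE is read.
configured_stop_timeout() {
    local value
    value=$(awk '
        # Track whether the current line is inside the [Manager] section.
        /^[[:space:]]*\[/ { in_manager = ($0 ~ /^[[:space:]]*\[Manager\][[:space:]]*$/) }
        # Uncommented assignment: keep the text after "=", trimmed.
        in_manager && /^[[:space:]]*DefaultTimeoutStopSec[[:space:]]*=/ {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            last = v   # a later assignment overrides an earlier one
        }
        END { print last }
    ' "$1")
    printf '%s\n' "${value:-90s}"
}
```

For example, `configured_stop_timeout /etc/systemd/system.conf` prints `20s` after the edit described in the test plan below; on affected systems, systemd-shutdown still waited the 90s default regardless of that value.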

[Test Plan]

* Create a new script, /usr/local/bin/loop-ignore-sigterm:
  ```
  #!/bin/bash
  loop_forever() {
      while true; do sleep 1; done
  }

  (
  trap 'echo Ignoring SIGTERM...' SIGTERM
  loop_forever
  )

  loop_forever
  ```

  This script will spawn a subshell which will loop forever and ignore
  SIGTERM. This will force systemd to wait for the subprocess at
  reboot/shutdown, and eventually send SIGKILL after TimeoutStopSec
  (DefaultTimeoutStopSec in this case).

* Make the script executable:
  $ chmod +x /usr/local/bin/loop-ignore-sigterm

* Create a systemd service for this script. Add the following to
  /etc/systemd/system/loop-ignore-sigterm.service:
  ```
  [Service]
  KillMode=process
  ExecStart=/usr/local/bin/loop-ignore-sigterm
  ```

* Start the service:
  $ systemctl start loop-ignore-sigterm.service

* Edit /etc/systemd/system.conf, and uncomment the
  'DefaultTimeoutStopSec=90s' line. Modify 90s to something much shorter,
  e.g. 20s.

* Re-exec the daemon so this new default takes effect:
  $ systemctl daemon-reexec

* Reboot, and monitor the logs. Observe that systemd-shutdown will wait
  for the loop-ignore-sigterm process for 90s, instead of the 20s
  configured earlier.

[Where problems could occur]

The patch moves the reset_arguments() call to the end of main, which means reset_arguments() is no longer called before daemon re-execution (if that branch is taken). If anything in that code path relied on reset_arguments() being called before re-executing, those assumptions could be broken. Any such problems would potentially be seen during daemon re-execution, e.g. when calling systemctl daemon-reexec.
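One way to exercise that re-execution path is to re-exec the manager and confirm that it still reports the configured default. The following is a minimal sketch that assumes the test-plan value of 20s; the helper name `manager_stop_timeout_is` is hypothetical. `systemctl show` without a unit prints manager properties, and time-span properties such as `DefaultTimeoutStopUSec` are printed in human-readable form (e.g. `20s`). Note that this only checks the value held by the running manager, not the value used by systemd-shutdown.

```shell
#!/bin/bash
# manager_stop_timeout_is EXPECTED
# Succeed if the running systemd manager reports EXPECTED (e.g. "20s") as its
# DefaultTimeoutStopUSec. Intended to be run after "systemctl daemon-reexec"
# to check that the configured default survived re-execution.
manager_stop_timeout_is() {
    local actual
    actual=$(systemctl show --property=DefaultTimeoutStopUSec --value)
    if [ "$actual" = "$1" ]; then
        return 0
    fi
    echo "expected DefaultTimeoutStopUSec=$1, manager reports $actual" >&2
    return 1
}
```

Usage: `sudo systemctl daemon-reexec && manager_stop_timeout_is 20s && echo "default survived re-exec"`.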

[ Original Description ]

With systemd v245 as shipped with 20.04, the shutdown sequence does not use the value of `DefaultTimeoutStopSec` to wait for remaining processes; it instead uses the compiled-in default of 90s.

This is most visible with services that use `KillMode=process` (docker, k8s, k3s, etc.), especially if the remaining processes do not handle `SIGTERM` or choose to ignore it.

For example:
```
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
[ 243.652848 ] systemd-shutdown[1]: Waiting for process: containerd-shim, containerd-shim, containerd-shim, fluent-bit

--- hangs here for 90s even if DefaultTimeoutStopSec is set to a lower value ---

```

The bug has been fixed upstream here: https://github.com/systemd/systemd/commit/7d9eea2bd3d4f83668c7a78754d201b22

Marc was kind enough to package the patch for 20.04 so I could test it (https://launchpad.net/~mdeslaur/+archive/ubuntu/testing/+sourcepub/13210617/+listing-archive-extra) and with that package, I can confirm that it indeed fixes the issue.

Here are a few GitHub issues I stumbled upon while trying to debug this, along with a short write-up of the workaround I ended up using:

- https://github.com/moby/moby/issues/41831
- https://github.com/k3s-io/k3s/issues/2400
- https://github.com/systemd/systemd/issues/16991
- https://raby.sh/debugging-90s-hangs-during-shutdown-on-ubuntu-2004.html

Of course, it would be much better if all processes properly handled `SIGTERM`, but having a way to enforce a maximum wait time at shutdown is a decent workaround.

Given that the patch is relatively simple, would it be possible to add it to the package for 20.04?

Thanks


Jean Raby (g-jean)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Lukas Märdian (slyon)
tags: added: rls-ff-incoming
tags: added: fr-1987
tags: removed: fr-1987 rls-ff-incoming
Changed in systemd (Ubuntu Focal):
status: New → Confirmed
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Focal):
importance: Undecided → Medium
Marc Deslauriers (mdeslaur) wrote :

Any updates on this?

Lukas Märdian (slyon) wrote :

It has recently been picked up by Foundations, and we should have the capacity to start working on this next week.

Nick Rosbrook (enr0n)
description: updated
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Focal):
status: Confirmed → In Progress
Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jean, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.
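When reporting results, the installed systemd version can be read with `dpkg-query` and compared against the version that carries the fix. This is a sketch; `version_at_least` and `installed_systemd_version` are hypothetical helper names. `dpkg --compare-versions` is the authoritative comparison on Debian/Ubuntu, while the `sort -V` fallback only approximates Debian version ordering (it does not implement rules such as `~`), which is adequate for the version strings in this report.

```shell
#!/bin/bash
# Version in focal-proposed that carries the LP: #1958284 fix.
FIXED_VERSION="245.4-4ubuntu3.16"

# version_at_least VERSION MINIMUM: exit status 0 if VERSION >= MINIMUM.
version_at_least() {
    if command -v dpkg >/dev/null 2>&1; then
        # Authoritative Debian version comparison.
        dpkg --compare-versions "$1" ge "$2"
    else
        # Approximation with GNU sort -V: MINIMUM must sort first (or tie).
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}

# Print the installed version of the systemd package (Debian/Ubuntu only).
installed_systemd_version() {
    dpkg-query -W -f='${Version}' systemd
}
```

For example: `v=$(installed_systemd_version); version_at_least "$v" "$FIXED_VERSION" && echo "systemd $v includes the fix"`.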

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.16)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.16) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.44.1-1ubuntu1 (arm64, ppc64el, amd64)
linux-aws-5.13/5.13.0-1019.21~20.04.1 (arm64)
snapd/2.54.3+20.04.1ubuntu0.2 (arm64, ppc64el, s390x)
docker.io/20.10.7-0ubuntu5~20.04.2 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Nick Rosbrook (enr0n) wrote :

I tested systemd 245.4-4ubuntu3.16 from focal-proposed using the test plan above. I observed that the loop-ignore-sigterm.service processes were killed after ~20s on shutdown, which is what I configured in /etc/systemd/system.conf.

tags: added: verification-done-focal
removed: verification-needed-focal
Nick Rosbrook (enr0n) wrote :

The autopkgtest regressions blocking systemd 245.4-4ubuntu3.16 in focal-proposed have been resolved. The regressions appear to have been related to recent autopkgtest infrastructure issues, and retrying the tests resolved the issues.

Nick Rosbrook (enr0n)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.16

---------------
systemd (245.4-4ubuntu3.16) focal; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1946388-sd-journal-don-t-check-namespaces-if-we-have-no-name.patch:
    Avoid journalctl segfault (LP: #1946388)

  [ Jeremy Szu ]
  * Add an allowlist to unblock intel-hid on new HP machines (LP: #1955997)
    Author: Jeremy Szu
    File: debian/patches/lp1955997-add-a-allowlist-to-unblock-intel-hid-on-HP-mach.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=88a859eaddb6c9a611fcbc44edab441aef4c4355

  [ Nick Rosbrook ]
  * Prevent arguments from being overwritten with defaults at shutdown (LP: #1958284)
    File: debian/patches/lp1958284-core-move-reset_arguments-to-the-end-of-main-s-finish.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e61052bd1f20bcc54e7417542c6d445cf5040f56

  [ Lukas Märdian ]
  * Fix deadlock between pid1 and dbus-daemon (LP: #1871538)
    Author: Lukas Märdian
    File: debian/patches/pid1-set-SYSTEMD_NSS_DYNAMIC_BYPASS-1-env-var-for-dbus-da.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e3aacfa26e3fc6df369e6f28e740389ae0020907

 -- Nick Rosbrook <email address hidden> Wed, 23 Mar 2022 09:29:33 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
