/etc/init.d/sendsigs fails to kill some processes

Bug #665185 reported by danmb
18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
sysvinit (Ubuntu)
Fix Released
Low
Surbhi Palande
Lucid
Fix Released
High
Surbhi Palande
Maverick
Fix Released
Low
Surbhi Palande

Bug Description

Binary package hint: sysvinit

While tracking the cause of unclean shutdowns on my system (/ fails to unmount), I stumbled on the following.

In /etc/init.d/sendsigs, there is an inner loop which appends to a variable:

                for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
                done

This loop gets executed 10 times by an outer loop:

for seq in 1 2 3 4 5 6 7 8 9 10; do
...
done

OMITPIDS grows very long, and at some point the subsequent command

killall5 -18 $OMITPIDS

overflows with "omit pid buffer size 16 exceeded!" (I guess killall5 "benefits" from legacy coding practices...)

Solution: reinitialize OMITPIDS inside the outer loop (before the inner loop):

OMITPIDS0="$OMITPIDS"
for seq in 1 2 3 4 5 6 7 8 9 10; do
  OMITPIDS="$OMITPIDS0"
  # inner loop
  # do the killall -18 test ...
done

Of course this can still fail and killall5 should be fixed.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: initscripts 2.87dsf-4ubuntu17
ProcVersionSignature: Ubuntu 2.6.32-25.45-generic 2.6.32.21+drm33.7
Uname: Linux 2.6.32-25-generic i686
NonfreeKernelModules: nvidia
Architecture: i386
Date: Fri Oct 22 18:56:31 2010
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: sysvinit

Revision history for this message
danmb (danmbox) wrote :
Revision history for this message
danmb (danmbox) wrote :

I wonder why ubuntu-bug filed this report against sysvinit? The problem is in the initscripts package, and that's what I originally requested...

Revision history for this message
danmb (danmbox) wrote :

This bug has not been fixed in Lucid and is still present in Natty (mitigated there by a fixed killall5 (which has no hard limit on the omitpid buffer). Ubuntu Lucid should either patch /etc/init.d/sendscripts as per in my original report (and attached patch), or backport killall5. This is a serious issue that affects proper shutdown.

In any case, the OMITPID code is... puzzling. Some PID numbers are granted immunity because at some point they were Upstart jobs. PIDs that stay alive accumulate multiple times in the OMITPIDS list during 10 iterations. PIDs can wrap around and the immunity could "save" an unrelated, non-upstart process.

Could Steve Langasek (per debian/changelog) or someone else explain the OMITPID code? I understand that the list of upstart jobs is dynamical and re-query is needed, but why hang on to old PIDs from previous iterations?

tags: added: patch
Revision history for this message
Steve Langasek (vorlon) wrote :

Dan,

I think my thought process here was that any PID belonging to an upstart job at the time this was run is astronomically unlikely to get reused for a different process during the shutdown sequence, and I don't think I noticed that we're requerying the list of pids repeatedly in a loop and concatenating the results - that's wrong for obvious reasons.

I haven't looked at your patch, but this looks like a sensible thing to do here. Asking the Foundations team to look at it.

Changed in sysvinit (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
importance: Undecided → Low
status: New → Triaged
Revision history for this message
danmb (danmbox) wrote :

Steve, note that while this issue has been masked in Maverick & Natty by making killall5 accept any number of OMITPIDs, it is still a problem in Lucid for those who have upstart jobs that won't shut down. The symptom is a journal replay (or heaven forbid worse) after every single reboot (because killall5 crashes and doesn't kill what it's supposed to kill). It depends on how many buggy upstart jobs one has installed...

For Maverick and Natty: it may seem that PID reusal is unlikely, but it's not quite so: at boot time, a long-running upstart job gets a low PID (e.g. 425). The system stays up for a day or two. In that interval, PIDs wrap around 32767 several times. At shutdown, it's very possible for a new process to hit PID 425 and get inappropriate immunity. You may have thought that reuse only occurred if PIDs made a full wrap-around in the 10 seconds of so worth of shutdown -- but it's not so.

In any case, as I said, why even run this risk? There's no need to accumulate PIDs that once belonged to now-dead upstart jobs (I think you agreed with my analysis of the code, but I'm not quite sure, given your phrasing.)

Revision history for this message
Surbhi Palande (csurbhi) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu22

---------------
sysvinit (2.87dsf-4ubuntu22) natty; urgency=low

  [ Dan Muresan ]
  * sendsigs: OMITPIDS needs to be reinitalized for every loop iteration
    before concatanating pids of upstart jobs to it. Otherwise it overflows
    and throws an error. (LP: #665185)
 -- Surbhi Palande <email address hidden> Mon, 28 Mar 2011 13:11:49 +0300

Changed in sysvinit (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
danmb (danmbox) wrote :

Great, thanks. Now what do I have to do to get this fix into Lucid, where it actually *is* a problem (killall5 doesn't overflow on Natty, because it's been fixed, but it does on Lucid)?

The LTS release still has years left of support, but I'm really getting the all-around feeling that it is shunned, ignored, and being treated as legacy software.

Revision history for this message
Colin Watson (cjwatson) wrote :

Surbhi, could you propose stable release updates for lucid and maverick for this? Thanks.

Changed in sysvinit (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → Surbhi Palande (csurbhi)
Changed in sysvinit (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → Low
assignee: nobody → Surbhi Palande (csurbhi)
importance: Low → High
milestone: none → ubuntu-10.04.3
Changed in sysvinit (Ubuntu Maverick):
status: New → Triaged
importance: Undecided → Low
assignee: nobody → Surbhi Palande (csurbhi)
Revision history for this message
Surbhi Palande (csurbhi) wrote :
Revision history for this message
Surbhi Palande (csurbhi) wrote :
Revision history for this message
Michael Vogt (mvo) wrote :

I sponsored the lucid and maverick uploads now, many thanks Surbhi for the porting.

Changed in sysvinit (Ubuntu Lucid):
status: Triaged → In Progress
Changed in sysvinit (Ubuntu Maverick):
status: Triaged → In Progress
Revision history for this message
Surbhi Palande (csurbhi) wrote :
Revision history for this message
Surbhi Palande (csurbhi) wrote :
Revision history for this message
Michael Vogt (mvo) wrote :

I sponsored them now but they will probably have to wait in the queue for a little bit longer until the other SRU for sysvinit is verified.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Michael, agreed. the patch looks good, and should enter lucid-proposed as soon as the previous SRU for sysvinit passes verification. Since neither of you were involved with that SRU, you can speed up that process by verifying that the one preceding your upload works, and posting your results in bug #672177

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted sysvinit into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in sysvinit (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in sysvinit (Ubuntu Maverick):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted sysvinit into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Revision history for this message
Lukáš Calda (lukas-calda) wrote :

Martin Pitt> Unfortunately, this doesn't work for me. The problem remains, computer doesn't shut down at all. Acer Extensa 3000, Lucid Lynx.

Revision history for this message
danmb (danmbox) wrote :

There are other bugs in Ubuntu that can cause shutdown problems, for example Bug #665195 and bug #665176.

I fixed my problems by creating a /etc/init.d/sysshutdown as attached (and adding it to the various runlevels). This at least gives you a chance to examine the situation (in Virtual Console 9) and do a Alt + Sysrq + RSEIUB if need be. This, or something similar, should be included in distributions, since improper shutdown is dangerous. Sysrq can be simulated by writing to /proc/sysrq-trigger. Ubuntu should at least force-remount read-only (via sysrq) before doing its customary improper shutdowns.

Revision history for this message
Lukáš Calda (lukas-calda) wrote :

Thanks for your reply, Dan! I managed to create the sysshutdown file but I don't know how to do the various runlevels? I did "sudo update-rc.d syshutdown defaults" Is it ok this way? Cause if I did it right, then it didn't solve the problem. I still have to do hard shutdown.
Please note me being just user who doesn't understand how linux really works. :(

Revision history for this message
danmb (danmbox) wrote :

Then please ignore my previous comment and file a separate bug. This bug is about unclean shutdowns, while yours sounds like the machine hangs on shutdown -- a different symptom.

Revision history for this message
Dave Walker (davewalker) wrote :

Is there any progress on verification of lucid, it's blocking other fixes. Thanks :)

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

There is no proper TEST CASE for this bug, which may explain why it has not been verified.

Dan/Surbhi, can you please add a full TEST CASE as required in https://wiki.ubuntu.com/StableReleaseUpdates ? This bug has sat on the -proposed heap far too long without verification.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

SRU team: I think we should waive the need to pass verification on this bug, and allow it to enter maverick/lucid updates. It is a very small patch, and no doubt does good. It has been in the respective proposed pockets for a long time, so regressions don't seem likely.

Thoughts?

Revision history for this message
taka k. (scar) wrote :

in my Bug #789419, which has been marked a duplicate, i was told to enable lucid-proposed but i don't even have sysvinit installed so i fail to see how it has been fixed and why it is a duplicate.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 665185] Re: /etc/init.d/sendsigs fails to kill some processes

On Wed, Oct 19, 2011 at 01:14:35AM -0000, scar wrote:
> in my Bug #789419, which has been marked a duplicate, i was told to
> enable lucid-proposed but i don't even have sysvinit installed so i fail
> to see how it has been fixed and why it is a duplicate.

You would need to install the initscripts package. sysvinit is the name of
the source package that provides this binary.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
taka k. (scar) wrote :

initscripts is already the latest version 2.87dsf-4ubuntu17.4 from lucid-updates, there is no update in lucid-proposed.

Revision history for this message
Steve Langasek (vorlon) wrote :

Indeed, the latest version of sysvinit in lucid-updates includes this change; apparently the version fixing this bug was later superseded by another fix that didn't mention this change in its changelog. Marking as fixed for lucid.

I think that's evidence that bug #789419 is not a duplicate of this one.

Changed in sysvinit (Ubuntu Lucid):
milestone: ubuntu-10.04.3 → none
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.87dsf-4ubuntu19.2

---------------
sysvinit (2.87dsf-4ubuntu19.2) maverick-proposed; urgency=low

  [ Dan Muresan ]
  * sendsigs: OMITPIDS needs to be reinitalized for every loop iteration
    before concatanating pids of upstart jobs to it. Otherwise it overflows
    and throws an error. (LP: #665185)
 -- Surbhi Palande <email address hidden> Tue, 05 Apr 2011 12:35:31 +0300

Changed in sysvinit (Ubuntu Maverick):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.