Comment 16 for bug 688541

Clint Byrum (clint-fewbar) wrote : Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

On Thu, 2010-12-16 at 15:45 +0000, Michael Biebl wrote:
> 2010/12/16 Clint Byrum <email address hidden>:
> >
> >
> > I'm attaching a debdiff that solves the race as far as I can tell,
> > though I think it needs a good long look, since it could mean shutdowns
> > hang for a long time waiting (I'm especially curious if the pre-stop
> > /post-stop's are subject to kill timeout)
>
> This code is still racy, afaics. What about upstart jobs, which are
> not stopped by "stop on runlevel [016]"? They could receive their stop
> signal at a point when your loop has already been run.
>

Indeed, there is still a race, I think, now that I've dug through
upstart's code a bit. If any of the jobs in a stop/!waiting state have
'stop on stopped' jobs that will be stopped after they stop, the stopped
event isn't emitted until *after* the transition to stop/waiting.

thread A (upstart job foo):

start/running -> stop/pre-stop
sends TERM to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting
emit stopped JOB=foo

thread B (upstart job baz):
start/running -> stop/pre-stop
sends kill to owned process
stop/pre-stop -> stop/killed
process dies
stop/killed -> stop/waiting

thread C (sleep loop)

runs initctl list
greps
sleeps
runs initctl list
greps
sleeps

The list command is handled by doing a "get all jobs" command first, and
then individual status commands for each job, so it's entirely possible
that we ask for the status of baz and it says start/running, then foo
finishes its transition, then we ask for foo's status and it is
stop/waiting, and we think we're done.
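For reference, the sleep loop in thread C amounts to something like the
sketch below. The helper names are mine, and it assumes initctl's usual
"name goal/state[, process N]" output format; the function is defined
but not invoked here, since it only makes sense at shutdown:

```shell
#!/bin/sh
# Count jobs not yet in stop/waiting, given "initctl list"-style
# output on stdin (one "name goal/state[, process N]" line per job).
still_running() {
    grep -c -v 'stop/waiting'
}

# The racy polling loop described above: each "initctl list" queries
# job states one at a time, so a job can change state mid-snapshot.
wait_for_jobs() {
    while [ "$(initctl list | still_running)" -gt 0 ]; do
        sleep 1
    done
}
```

Because the snapshot is assembled job by job, this loop can observe baz
as start/running and foo as stop/waiting even though at no single
instant were all jobs stopped.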

This race would probably be solved by having a "list all jobs with
status" command, as long as the stopped event is guaranteed to be
consumed before any commands, which I believe it is.

One delicate issue is that if an upstart-managed process dies for any
reason other than being stopped, upstart will try to respawn it, so we
can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight
us on those. We actually have to stop everything.
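Concretely, that means asking upstart to stop each job rather than
signaling the pids. A minimal sketch (the helper names are mine,
assuming the same "name goal/state" initctl output; the stop function
is defined but not called):

```shell
#!/bin/sh
# Print names of jobs that have not reached stop/waiting, given
# "initctl list"-style output on stdin.
jobs_not_stopped() {
    awk '$2 !~ /^stop\/waiting/ { print $1 }'
}

# Ask upstart to stop each remaining job. Sending SIGTERM to the
# pids directly is no good: upstart would treat that as a crash
# and respawn them, so the stop has to go through upstart itself.
stop_remaining_jobs() {
    for job in $(initctl list | jobs_not_stopped); do
        initctl stop "$job" || true
    done
}
```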

> If you don't want to change existing jobs, we probably have to pick up
> Ante's suggestion, and do the following in sendsigs:
>
> 1) run a for loop to wait for *all* running upstart jobs to stop.
> upstart jobs which need to keep running past sendsigs (e.g. plymouth)
> need to signal that using a similar mechanism like the killall5
> sendsigs.d omit interface. I'd at least give upstart jobs 60secs time
> to stop, so big databases etc have enough time to cleanly shutdown

IMO, leaving out a valid 'stop on' stanza that gets a job stopped at or
before runlevel [016] is the equivalent of the omit interface. You've
started it, saying exactly when upstart should or should not stop it.
However, if you've wandered into the 'stop on stopped foo' scenario
mentioned above, then we need to handle that.

> 2.) run a for loop and send SIGTERM all remaining processes, but do
> *not* add upstart pids to $OMITPIDS

See above, you'd have to send 'stop' commands to upstart for them,
instead of omitting them.

> 3.) send a final SIGKILL if any processes are left.
>

I'd say "let upstart do that"... but how do we know when we can continue
on to unmounting? I suppose after a lengthy timeout (60s does seem long
enough, though mysql can take longer) this makes sense.
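Put together, Michael's three steps could look roughly like the sketch
below. The 60s timeout, the 5s grace period, and the helper names are
assumptions for illustration, not a tested implementation; the shutdown
function is only defined, never invoked:

```shell
#!/bin/sh
# True (exit 0) when every line of "initctl list"-style input on
# stdin reports stop/waiting.
all_stopped() {
    ! grep -q -v 'stop/waiting'
}

# Sketch of the three proposed sendsigs steps: wait up to 60s for
# upstart jobs to finish stopping, then SIGTERM stragglers (without
# omitting upstart's pids), then SIGKILL anything left.
sendsigs_shutdown() {
    timeout=60
    while [ "$timeout" -gt 0 ]; do
        initctl list | all_stopped && break
        sleep 1
        timeout=$((timeout - 1))
    done
    killall5 -15    # step 2: SIGTERM remaining processes
    sleep 5
    killall5 -9     # step 3: SIGKILL whatever is left
}
```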

>
> Regarding 1.), it would be nice to have a native C implementation in
> upstart, instead of running initctl, grep and sleep manually.
>

I agree, but I'm having trouble envisioning exactly what one would ask
for. "Block until all current goals are reached" might work.