Comment 14 for bug 1124384

Stéphane Graber (stgraber) wrote:

So we spent a good part of the afternoon going through the two cloud-init bugs with James and came to the conclusion that they are actually the same bug.

Both initctl --reload-configuration and a stateful re-exec of upstart cause upstart to reload its configuration, destroy the existing job class entries and create new ones.

As part of destroying and re-creating job class entries, upstart decrements the reference counters of some related objects, including emitted events.
As a result, consider a job whose start condition depends on two events: one has already been emitted and the other has not. If the job that emitted the first event is reloaded, the record of that first event is dropped. The waiting job then fails to start, because only half of its start condition can ever match.
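The failure mode can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not upstart's actual C implementation. `ToyInit`, `JobClass`, `event_refs` and `scenario` are hypothetical names. The model assumes the emitting job class is the only holder of a reference on an event. It also treats a start condition as a plain AND of event names whose records must still exist.

```python
from dataclasses import dataclass, field


@dataclass
class JobClass:
    """Hypothetical stand-in for a job class entry (not upstart's real struct)."""
    name: str
    start_on: frozenset = frozenset()         # ALL of these events must be recorded to start
    held: list = field(default_factory=list)  # events this class emitted and holds a ref on
    running: bool = False


class ToyInit:
    """Toy model of the reload bug; all data structures here are assumptions."""

    def __init__(self):
        self.jobs = {}
        self.event_refs = {}  # event name -> reference count

    def add_job(self, name, start_on=()):
        self.jobs[name] = JobClass(name, frozenset(start_on))

    def emit(self, emitter, event):
        # Record the event; the emitting job class holds a reference to it.
        self.event_refs[event] = self.event_refs.get(event, 0) + 1
        self.jobs[emitter].held.append(event)
        self._try_start()

    def _try_start(self):
        # Start any job whose whole start condition is satisfied by recorded events.
        for job in self.jobs.values():
            if not job.running and job.start_on and all(
                e in self.event_refs for e in job.start_on
            ):
                job.running = True

    def reload_configuration(self):
        # Destroy every job class entry and create a fresh one. Destroying a
        # class drops the references it held; an event whose count reaches
        # zero is forgotten, even though another job is still waiting on it.
        for name, old in list(self.jobs.items()):
            for event in old.held:
                self.event_refs[event] -= 1
                if self.event_refs[event] == 0:
                    del self.event_refs[event]
            self.jobs[name] = JobClass(name, old.start_on, running=old.running)


def scenario(reload_between):
    init = ToyInit()
    init.add_job("producer")
    init.add_job("other")
    init.add_job("waiter", start_on=("a", "b"))
    init.emit("producer", "a")       # first half of waiter's start condition
    if reload_between:
        init.reload_configuration()  # stands in for initctl --reload-configuration
    init.emit("other", "b")          # second half arrives
    return init.jobs["waiter"].running


if __name__ == "__main__":
    print(scenario(reload_between=False))  # True: both events are recorded
    print(scenario(reload_between=True))   # False: the record of "a" was dropped
```

In this model the reload by itself does not start or stop anything. It only releases the producer's reference on "a", and that is enough to leave "waiter" stuck once "b" arrives.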

The code that triggers this issue runs after the re-exec, in the newly exec'd upstart. This means that once we have a fix, we'll be able to SRU it and have upstart re-exec itself into the fixed version, applying the fix in the process.
It also means that we can't SRU any of upstart's dependencies until this issue is resolved, since upgrading them makes upstart re-exec itself and would trigger the bug.

James is currently working on test cases for the various scenarios that we know we need to support, so that we have comprehensive regression tests before we attempt to fix this issue. We currently hope to have a fix by the end of the week.