Services not starting on boot in 10.04.1 LTS

Bug #642555 reported by Nick Davis
This bug affects 5 people
Affects  Status     Importance  Assigned to                 Milestone
Ubuntu   Expired    Medium      Unassigned
Lucid    Won't Fix  Medium      Canonical Foundations Team

Bug Description

I've had an issue for a while now where, on system reboot, certain services don't automatically start. Those affected include nginx and a few custom CGI init scripts. These are all set to automatically start on boot, according to rcconf. Nothing interesting is logged to /var/log/boot{.log}.
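As a cross-check on what rcconf reports, the start links in the runlevel directory can be listed directly. This is a minimal sketch that assumes the affected services are SysV-style init scripts with S?? symlinks under /etc/rc2.d; the function name list_start_links is made up for illustration.

```shell
# List the SysV init scripts enabled (S?? symlinks) in a runlevel directory.
# list_start_links is a hypothetical helper, not part of any package.
list_start_links() {
    dir=$1
    for link in "$dir"/S[0-9][0-9]*; do
        # Skip the unexpanded pattern when the directory has no start links.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        # Strip the leading "S" and the two-digit sequence number.
        printf '%s\n' "${name#S[0-9][0-9]}"
    done
}
```

Running `list_start_links /etc/rc2.d` should include nginx if its link is in place. Services converted to native upstart jobs live in /etc/init instead and will not appear here.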

I'd believed this was an upstart issue and related to bug 554172 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554172)
I tried the workarounds until an official fix was released, but neither the workarounds nor the official patch solved the issue for me.

I've also tried disabling plymouth as described in the upstart bug thread, by first removing 'quiet splash' from the GRUB menu, and then by purging the plymouth package. This likewise had no effect on my issue.

Another possibly related observation is that 'shutdown -r now' doesn't work. Upon running it, I get the console message "System is going down for reboot", but my shell isn't logged out, and it never reboots. The 'reboot' command does work, but it hangs the SSH session, so it doesn't seem like a proper reboot has occurred.

Revision history for this message
ingo (ingo-steiner) wrote :

The bug as reported in Bug #554172 is definitely not fixed!

The workaround to direct output to /dev/null if console is not available is just a fake. Or would you leave users with a black screen in case X-server does not start?

The result is, besides some services not starting reliably, that a lot of boot messages randomly get lost. This is easy to see once plymouth has been purged so that boot messages are displayed. This definitely is a bug in upstart, as upstart claims to start services whenever all prerequisites are met.

So instead of just dumping the output, upstart should wait for the console to be available.

In Bug #554172 it is stated to be a kernel-bug, where can I follow the kernel-bug, or is it rather a known feature of the kernel in case of SMP-systems?

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
Robbie Williamson (robbiew) wrote :

We will begin looking at Lucid bugs targeted for 10.04.2 after the launch of 10.10. Apologies in advance for the delay.

Revision history for this message
Mike Bianchi (mbianchi-foveal) wrote :

> I'd believed this was an upstart issue and related to bug 554172 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554172)
> I tried the workarounds until an official fix was release, but neither the workarounds nor the official patch solved the issue for me.

Please document here precisely which workarounds you have used.

Are you seeing runlevel unknown or runlevel N 2 ?

A crucial component of my investigations was turning on the init tracing
by adding to the "linux" line in the grub file
  text init='/sbin/init --verbose'

"text" turned off the graphical boot.

"init='/sbin/init --verbose'" showed the internal activity of init.
I also _believe_ that --verbose also improved the boot reliability by
slightly serializing some (not all) things that would otherwise be parallel.

I do not have a means to exercise this bug right now, but am still very
interested.

My two cents: I think it would be good to have access to an old-fashioned,
one-step-at-a-time boot sequence (as an option).

Since the upstart /sbin/init runs things in parallel, using events to
control critical sequences, if the writer of an init sequence does not _fully_
understand the relationships among the many services, it is possible for a
particular service to be started before or after a critical prerequisite
(that I am unaware of) has completed, making the boot randomly unreliable.

I would gladly give up boot speed for boot sequence certainty. I do not
see a way to do that with upstart. Boot sequence certainty was easy to
accomplish when everything was started by numbered filenames under
 /etc/init.d .

A variation of that theme _might_ be to record the boot sequence order of a
successful boot and then make that the normal sequence. (I know. This IS a
tricky concept in an SMP situation.)

A suggestion, slightly off topic:

Could upstart init have a command that opens /dev/console
for itself and all its children, and another command that closes it.
By putting, say, "open /dev/console" in the first /etc/init/*.conf file
and "close /dev/console" at the end of the last one, at least all the
boot sequence services would be guaranteed a writable (and presumably not
/dev/null) stdout and stderr.

Revision history for this message
Nick Davis (argoneus) wrote :

runlevel is N 2.
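For readers comparing the two answers Mike asked about: runlevel(8) prints the previous and current runlevel, with "N" meaning there was no previous runlevel since boot, and prints "unknown" when no runlevel record can be found. The helper below is only an illustrative sketch; describe_runlevel is a made-up name.

```shell
# Interpret the output of runlevel(8): "<previous> <current>" or "unknown".
# describe_runlevel is a hypothetical helper for illustration.
describe_runlevel() {
    case "$1" in
        unknown) echo "runlevel not recorded" ;;
        "N "[0-6S]) echo "booted straight into runlevel ${1#N }" ;;
        [0-6S]" "[0-6S]) echo "switched from runlevel ${1% *} to ${1#* }" ;;
        *) echo "unrecognised: $1" ;;
    esac
}
```

It could be called as `describe_runlevel "$(runlevel)"`. An "N 2" result suggests the rc job did reach runlevel 2, so the missing services are not explained by the runlevel never being set.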

I've tried the --verbose boot param in grub, haven't tried the text at the beginning of the line.

I've commented out "console output" in all /etc/init/*.conf files.

I've tried editing /etc/init/rc-sysinit.conf, changing:
"start on filesystem and net-device-up IFACE=lo"
to this:
"start on filesystem and started rsyslog and net-device-up IFACE=lo"

I've also tried:
sudo apt-get install ifupdown --reinstall --purge

(These last two were suggested here http://ubuntuforums.org/showpost.php?p=9480172&postcount=20, which was linked as a suggested workaround in the bug 554172 comments).

As mentioned, I also updated to the version of upstart that supposedly fixed this craziness, but alas, it didn't resolve the issue for me.
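The rc-sysinit.conf edit described above can be previewed without touching the real file. This is a sketch that assumes the "start on" line appears exactly as quoted in this comment; add_rsyslog_condition is a made-up name, and the function only prints the modified text.

```shell
# Print FILE with the rc-sysinit "start on" line extended to wait for rsyslog.
# Works on whatever file is passed in; nothing is modified in place.
# add_rsyslog_condition is a hypothetical helper for illustration.
add_rsyslog_condition() {
    sed 's/^start on filesystem and net-device-up IFACE=lo$/start on filesystem and started rsyslog and net-device-up IFACE=lo/' "$1"
}
```

For example, `add_rsyslog_condition /etc/init/rc-sysinit.conf | diff /etc/init/rc-sysinit.conf -` shows the one-line change. If the line in the file differs from the quoted text, the output is simply identical to the input.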

Revision history for this message
Mike Bianchi (mbianchi-foveal) wrote :

I misspoke in Comment #3.
"text" prevents X from starting.
I believe the removal of "splash" and "quiet" prevents the normal boot graphics.

Revision history for this message
Nick Davis (argoneus) wrote :

>I misspoke in Comment #3.
>"text" prevents X from starting.
>I believe the removal of "slash" and "quiet" prevent the normal boot graphics.

Oh, I've tried removing 'quiet splash' as well, without any change in behavior. In fact, this is still configured in /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="" (old value was ="quite splash").

Revision history for this message
Nick Davis (argoneus) wrote :

typo in previous comment, should be

old value was = "quiet splash"
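Changes to /etc/default/grub only take effect after update-grub has regenerated the boot configuration, so it can help to confirm what the running kernel was actually booted with. A minimal sketch, assuming the parameters of interest are whole space-separated words in /proc/cmdline; cmdline_has is a made-up helper name.

```shell
# Return 0 if WORD appears as a whole parameter in the kernel command line.
# FILE defaults to /proc/cmdline; cmdline_has is a hypothetical helper.
cmdline_has() {
    word=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then look for an exact, fixed-string match.
    tr ' ' '\n' < "$file" | grep -Fxq -- "$word"
}
```

After a reboot, `cmdline_has quiet && echo "quiet is still set"` and `cmdline_has --verbose && echo "verbose is active"` show whether the options tried in this thread reached the kernel command line.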

Revision history for this message
Colin Watson (cjwatson) wrote :

It would help to try to narrow this down somewhat. While using /dev/null as a fallback if opening the real console fails is suboptimal (and I'd like to try to figure out a better fix for that, although it is tricky), it should certainly have resolved any failures to start services that were due to not having a console fd. At the very least we should now be seeing some other problem that was perhaps previously masked. As such, we should narrow that down rather than simply saying that bug 554172 has recurred, which may not quite be the case - all we know is that this bug shows similar *symptoms*, not that it has the same *cause*. I realise that if your system isn't booting properly then the distinction may not matter very much, but it does matter for us trying to figure out a solution.

One situation in which having /dev/null as the console may genuinely impede starting services would be if the service is trying to read from the console, and refusing to proceed until it gets an answer. To those affected by this bug: do you know if your system, when working correctly, would ordinarily prompt for input on the console? For example, since nginx was mentioned, an nginx configuration involving a passworded SSL private key would probably result in such a prompt.
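One way to check for the passworded-key case is to look at the PEM header of the key file that the nginx configuration points at. This sketch assumes a PEM-encoded key and relies on the two usual markers of an encrypted key (a "Proc-Type: 4,ENCRYPTED" header or a "BEGIN ENCRYPTED PRIVATE KEY" line); key_is_encrypted is a made-up name.

```shell
# Return 0 if the PEM private key in FILE is marked as passphrase-protected.
# key_is_encrypted is a hypothetical helper; it inspects headers only.
key_is_encrypted() {
    grep -Eq '^(Proc-Type: 4,ENCRYPTED|-----BEGIN ENCRYPTED PRIVATE KEY-----)' "$1"
}
```

If this returns 0 for the file named in ssl_certificate_key, nginx would be expected to ask for the passphrase at startup, which cannot be answered when the console is /dev/null.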

It is unfortunate that using --verbose tends to perturb the timing such that we can't reproduce this any more. One alternative possibility might be to put 'initctl log-priority info' in /etc/init/rc-sysinit.conf, just above the 'telinit "${DEFAULT_RUNLEVEL}"' command; perhaps effectively using --verbose for a fragment of the boot sequence rather than all of it will make this bug reproducible with debugging information.

If you can reproduce this bug either with --verbose or with 'initctl log-priority info', then please attach /var/log/syslog to this bug report. The point of those options is not to work around the bug, but to generate debugging information so that we can figure out what's going on.
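The placement Colin describes, putting the log-priority command just above the telinit call, can be previewed as follows. This is a sketch that assumes the telinit call sits on its own line inside the script section; add_log_priority is a made-up name, and the function only prints the modified text so the result can be reviewed before editing /etc/init/rc-sysinit.conf by hand.

```shell
# Print FILE with 'initctl log-priority info' inserted just above the first
# line that starts with "telinit ", using the same indentation.
# add_log_priority is a hypothetical helper for illustration.
add_log_priority() {
    awk '
        {
            line = $0
            sub(/^[ \t]+/, "", line)   # copy of the line without leading whitespace
        }
        index(line, "telinit ") == 1 && !done {
            indent = substr($0, 1, length($0) - length(line))
            print indent "initctl log-priority info"
            done = 1
        }
        { print }
    ' "$1"
}
```

Once the real file has been edited this way and the machine rebooted, the extra init messages should appear in /var/log/syslog, which is the file requested above.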

Revision history for this message
ingo (ingo-steiner) wrote :

It's great that you are willing to truly fix this issue!

My problems with services not starting have disappeared since purging plymouth and using a patched mountall and cryptsetup. Maybe the timing is different now. But there are still occasions where the boot messages are not displayed on tty1, just a black screen.

Please allow me one question:
why don't you consider resolving the root cause? Just provide a console or, in case it is not available, wait until it is available?

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 642555] Re: Services not starting on boot in 10.04.1 LTS

The thing is that we don't know what the root cause of this problem
(i.e. the one at the start of this bug report) is. Without data, it's
just speculation. At the moment, we haven't proven that your missing
boot messages actually have anything to do with services not starting -
the link is only circumstantial. If there's conclusive evidence to the
contrary that I've missed, in the form of debug logs, please do point me
at it.

I'm more than happy for us to deal with the root cause once we have
definite evidence to show exactly what it is.

Revision history for this message
Martin Pitt (pitti) wrote :

As this has never been debugged fully and is not even (marked as) fixed in precise, I am dropping the 10.04.4 milestone.

Revision history for this message
Steve Langasek (vorlon) wrote :

Indeed, it doesn't appear that anyone has been able to provide the requested logs that would enable us to reproduce this issue and debug it. Since this has not been broadly reported in 10.04 after the previous fixes were applied, this is somehow specific to the configuration of the machines where it's been seen. Nick, you said you *tried* setting --verbose on the kernel command line, but didn't indicate if that enabled you to reproduce the issue with more verbose logs - from your comment I got the impression that you were looking to this as a workaround for the issue, when as Colin says, it's meant to get us more logging information we can use to debug this.

Revision history for this message
Steve Langasek (vorlon) wrote :

It would also be helpful to know if this bug is reproducible when testing on later Ubuntu releases.

Changed in ubuntu:
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Ubuntu because there has been no activity for 60 days.]

Changed in ubuntu:
status: Incomplete → Expired
Revision history for this message
Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".
