Comment 35 for bug 687535

Scott James Remnant (scott) wrote : Re: [Bug 687535] Re: upstart loses track of ssh daemon after reload ssh

That being said, the current process tracker is very much not ideal.

I'm working on a new one, and will probably post it this week in the
form of a test program to be run and played with.

Scott

On Mon, Feb 7, 2011 at 8:48 PM, Steve Langasek
<email address hidden> wrote:
> On Tue, Feb 08, 2011 at 04:14:59AM -0000, Scott James Remnant wrote:
>> They definitely should be allowed; ssh is actually a canonical example of
>> why.
>
>> Upstart should supervise the sshd daemon, not the login sub-process
>> associated with a particular connection, and certainly not any
>> processes being run inside the login session.
>
>> Otherwise "stop ssh" would kill all user logins; and ssh would not
>> respawn on crash if there was still a user logged in (or running
>> screen!)
>
> Good point, comment withdrawn. :)
>
> --
> Steve Langasek                   Give me a lever long enough and a Free OS
> Debian Developer                   to set it on, and I can move the world.
> Ubuntu Developer                                    http://www.debian.org/
> <email address hidden>                                     <email address hidden>
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart.
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  Invalid
> Status in “openssh” package in Ubuntu:
>  Fix Released
> Status in “openssh” source package in Lucid:
>  Fix Released
> Status in “openssh” source package in Maverick:
>  Fix Released
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and
>  ditches the old. This causes upstart to believe that ssh has crashed,
>  and loses track of it. A second reload (or any other initctl operation
>  on ssh) will thus say:
>
>  reload: Unknown instance:
>
>  There would be 2 ways to fix this:
>  1. Don't have ssh fork on reload, but keep the same pid
>  2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, or it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
>  openssh-server  1:5.3p1-3ubuntu4
>  upstart         0.6.5-7
>
>  ==== Info for Maverick, Lucid SRU ====
>  IMPACT: if sshd gets a HUP signal, it forks a new process; upstart thinks the process died and loses track of it, so the user/admin loses the ability to stop/start/reload the daemon through upstart.
>  The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
>  TEST CASE:
>
>  - install openssh-server
>  - send a HUP signal to sshd
>  - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped
>
>  - after this, upstart won't know about sshd, despite the daemon
>  running just fine:
>
>  root@utest-lls32:~# reload ssh
>  reload: Unknown instance:
>
>  With the fix applied, the correct behavior is:
>
>  - send a HUP signal to sshd
>    ps ax |grep sshd
>    kill -HUP <pid of the master sshd>
>  - the daemon reloads (/var/log/auth.log):
>
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Received SIGHUP; restarting.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on 0.0.0.0 port 22.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on :: port 22.
>
>  - reloading with upstart gives the same result, and NOT an error
>  message.
>
>  REGRESSION POTENTIAL:
>
>  There is a small race condition in sshd between when it forks, and
>  when it listens for incoming connections. The race window is
>  widened by a very small amount by considering sshd started as soon
>  as it has been executed, rather than when it forks. This will only
>  affect jobs that use 'start on started ssh' and immediately connect to
>  it. This is unlikely to cause problems in any real world scenario,
>  given that most of these programs would also have to fork, exec, and
>  open a socket, which is more work than what sshd will be doing in that
>  time.
>
>
>
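The quoted bug description says that sshd, on SIGHUP, forks a new process and ditches the old one, so a supervisor that follows a single pid loses the daemon. The failure mode can be reproduced in miniature. The following is only a toy sketch, not Upstart's tracker and not sshd's code: `run_daemon`, `reload_and_observe` and the `fork_on_hup` switch are hypothetical names invented for the illustration, and the "daemon" is a stand-in that merely reports which pid is alive after the signal. It assumes a POSIX system.

```python
import os
import signal


def run_daemon(fork_on_hup, report_fd):
    """Toy daemon (hypothetical): on SIGHUP, report the pid that keeps serving."""

    def on_hup(signum, frame):
        if fork_on_hup and os.fork() > 0:
            os._exit(0)  # the old process exits; its child carries on
        os.write(report_fd, b"%d\n" % os.getpid())
        if fork_on_hup:
            os._exit(0)  # toy cleanup: the orphaned replacement exits too

    signal.signal(signal.SIGHUP, on_hup)
    os.write(report_fd, b"ready\n")
    while True:
        signal.pause()  # sleep until a signal arrives


def reload_and_observe(fork_on_hup):
    """Toy supervisor (hypothetical): returns (tracked_pid, pid_serving_after_reload)."""
    read_fd, write_fd = os.pipe()
    tracked = os.fork()
    if tracked == 0:
        os.close(read_fd)
        try:
            run_daemon(fork_on_hup, write_fd)
        finally:
            os._exit(1)  # never fall back into the supervisor's code
    os.close(write_fd)
    with os.fdopen(read_fd) as reports:
        reports.readline()  # wait for "ready": the handler is installed
        os.kill(tracked, signal.SIGHUP)
        serving = int(reports.readline())
    if serving == tracked:
        os.kill(tracked, signal.SIGKILL)  # toy cleanup of the surviving daemon
    os.waitpid(tracked, 0)  # reap the process the supervisor was tracking
    return tracked, serving


if __name__ == "__main__":
    for mode in (True, False):
        tracked, serving = reload_and_observe(mode)
        print("fork on HUP:", mode, "| still tracked:", tracked == serving)
```

With `fork_on_hup=True` the pid the supervisor was following exits while a different pid carries on, which is the situation the description blames for "reload: Unknown instance:". With `fork_on_hup=False` the same pid survives the reload, in line with the fixed log above where sshd[742] keeps listening after SIGHUP.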