Comment 33 for bug 687535

Scott James Remnant (scott) wrote : Re: [Bug 687535] Re: upstart loses track of ssh daemon after reload ssh

They definitely should be allowed; ssh is actually a canonical example of why.

Upstart should supervise the sshd daemon, not the login sub-process
associated with a particular connection, and certainly not any
processes being run inside the login session.

Otherwise "stop ssh" would kill all user logins, and ssh would not
respawn on crash if there was still a user logged in (or running
screen!)
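
To make the point above concrete, here is a toy model of that supervision policy. It is only a sketch, not Upstart's implementation: the class name, method names, and PIDs are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SupervisedJob:
    """Hypothetical model of a job whose supervisor tracks one main PID."""

    main_pid: int          # PID of the listening daemon (assumed value)
    goal: str = "start"    # "start" means the job should be kept running

    def pids_to_signal_on_stop(self):
        # Only the daemon itself is signalled; login sessions it forked
        # are deliberately left alone.
        return [self.main_pid]

    def on_process_exit(self, pid):
        # Exits of per-connection children say nothing about the daemon.
        if pid != self.main_pid:
            return "ignore"
        # The daemon itself went away: respawn only if it should be running.
        return "respawn" if self.goal == "start" else "stopped"


job = SupervisedJob(main_pid=742)
print(job.pids_to_signal_on_stop())   # the daemon only
print(job.on_process_exit(1001))      # a login session ending
print(job.on_process_exit(742))       # the daemon crashing
```

Under this model, a user who is still logged in neither blocks a respawn nor gets killed by a stop.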

Scott

On Mon, Feb 7, 2011 at 5:51 PM, Steve Langasek
<email address hidden> wrote:
> I'm not convinced this is true, Clint.  Should upstart jobs be allowed
> to spawn processes that then get orphaned from the POV of the process
> supervisor?  Maybe upstart shouldn't be tracking the new PID, but maybe
> it should instead be reaping any children left behind?
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart.
> https://bugs.launchpad.net/bugs/687535
>
> Title:
>  upstart loses track of ssh daemon after reload ssh
>
> Status in Upstart:
>  Invalid
> Status in “openssh” package in Ubuntu:
>  Fix Released
> Status in “openssh” source package in Lucid:
>  Fix Released
> Status in “openssh” source package in Maverick:
>  Fix Released
>
> Bug description:
>  When sshd gets a signal 1 for reload, it forks a new process and
>  ditches the old. This causes upstart to believe that ssh has crashed,
>  and loses track of it. A second reload (or any other initctl operation
>  on ssh) will thus say:
>
>  reload: Unknown instance:
>
>  There would be 2 ways to fix this:
>  1. Don't have ssh fork on reload, but keep the same pid
>  2. Use a different mechanism in upstart to keep track of ssh. Maybe a pid file? Just tracking children of the exited ssh won't work, as it might accidentally track a particular session rather than the master, if somebody just happens to log in close to reload time.
>
>  openssh-server  1:5.3p1-3ubuntu4
>  upstart         0.6.5-7
>
>  ==== Info for Maverick, Lucid SRU ====
>  IMPACT: if sshd gets a HUP signal, it forks a new process; upstart thinks the process died and loses track of it, so the user/admin loses the ability to stop/start/reload the daemon through upstart.
>  The problem is fixed in Natty 5.6p1-2ubuntu3. See attached patches for Maverick and Lucid.
>
>  TEST CASE:
>
>  - install openssh-server
>  - send a HUP signal to sshd
>  - the daemon is restarted, but upstart thinks that it crashed (/var/log/daemon.log):
>
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1451) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh main process ended, respawning
>  Dec 28 20:59:57 utest-lls32 init: ssh main process (1455) terminated with status 255
>  Dec 28 20:59:57 utest-lls32 init: ssh respawning too fast, stopped
>
>  - after this, upstart won't know about sshd, despite the daemon
>  running just fine:
>
>  root@utest-lls32:~# reload ssh
>  reload: Unknown instance:
>
>  With the fix applied, the correct behavior is:
>
>  - send a HUP signal to sshd
>    ps ax |grep sshd
>    kill -HUP <sshd PID>
>  - the daemon reloads (/var/log/auth.log):
>
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Received SIGHUP; restarting.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on 0.0.0.0 port 22.
>  Dec 28 21:37:01 utest-lls32 sshd[742]: Server listening on :: port 22.
>
>  - reloading with upstart gives the same result, and NOT an error
>  message.
>
>  REGRESSION POTENTIAL:
>
>  There is a small race condition in sshd between when it forks and
>  when it listens for incoming connections. The race window is
>  lengthened by a very tiny amount by considering sshd started as soon
>  as it has been executed, rather than when it forks. This will only
>  affect jobs that use 'start on started ssh' and immediately connect to
>  it. This is unlikely to cause problems in any real world scenario,
>  given that most of these programs would also have to fork, exec, and
>  open a socket, which is more work than what sshd will be doing in that
>  time.
>
>
>
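
A client that starts on 'started ssh' and connects immediately could sidestep the race described under REGRESSION POTENTIAL by retrying until the port accepts connections. The following is a minimal sketch under that assumption; the function name `wait_for_port` and its parameters are hypothetical and not part of openssh or upstart.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if none succeeds before `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the daemon is already listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the daemon may not be listening yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 22 on localhost is an assumed example target.
    print(wait_for_port("127.0.0.1", 22, timeout=2.0))
```

Polling like this keeps the dependent job correct regardless of exactly when the supervisor considers sshd started.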