Stopping ssh with a logged in user causes init to spin at 100%

Bug #926468 reported by Jason Conti
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
upstart
Fix Released
High
James Hunt
openssh (Ubuntu)
Fix Released
High
Colin Watson
Precise
Fix Released
High
Colin Watson
upstart (Ubuntu)
Fix Released
High
James Hunt
Precise
Fix Released
High
James Hunt

Bug Description

Steps to reproduce:

1) Boot the machine with --log on the kernel command line
2) Log into the machine through ssh
3) Execute: sudo stop ssh;
4) The init process will spin with EAGAIN on one if its file descriptors
5) Exit the remote ssh session and the service will stop

The process appears to be looping in init/log.c:log_read_watch.

Jason Conti (jconti)
summary: - Stopping ssh with a logged in user causes init to spin at %100
+ Stopping ssh with a logged in user causes init to spin at 100%
Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jason Conti (jconti) wrote :

I think the attached source file, along with a simple upstart conf with: exec /path/to/test-daemon may be a minimal example of the behavior.

sshd is spawned by upstart with -D so the initial process doesn't daemonize. The upstart logging mechanism will setup a pty for stdout/stderr. When receiving a connection, sshd forks and daemonizes, but looking at /proc/pid/fd/ it appears to omit redirecting stderr to /dev/null, so the pty is still opened on fd 2.

So when the ssh service is stopped, the inital sshd instance run with -D is terminated, but the child sshd keeps running and still has the pty open. Then when upstart flushes the log and waits for the fd to error out in log_read_watch, it loops forever with EAGAIN until the child exits.

I think that is what is going on anyway, and the attached code exhibits similar behavior.

Revision history for this message
Colin Watson (cjwatson) wrote :

As discussed with James, we should make openssh use 'console none' to avoid leaking the fd (well, there are multiple ways to do this, but this is probably least invasive), but we should also convert log_read_watch to integrate with upstart's main select loop so that it doesn't behave this way with any other jobs that may have similar behaviour.

Changed in openssh (Ubuntu Precise):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:5.9p1-2ubuntu2

---------------
openssh (1:5.9p1-2ubuntu2) precise; urgency=low

  * Make Upstart job 'console none', to avoid pathological behaviour due to
    leaking Upstart's log fd 2 to child processes (LP: #926468).
 -- Colin Watson <email address hidden> Mon, 06 Feb 2012 15:52:58 +0000

Changed in openssh (Ubuntu Precise):
status: Triaged → Fix Released
Revision history for this message
James Hunt (jamesodhunt) wrote :

@Jason: thanks for taking the time to look at this. Yes, I can confirm your understanding of the problem as I had come to exactly the same conclusion! :-) I have raised a bug on openssh for the stderr fd leak: https://bugzilla.mindrot.org/show_bug.cgi?id=1976

James Hunt (jamesodhunt)
Changed in upstart (Ubuntu Precise):
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt)
Changed in upstart:
assignee: nobody → James Hunt (jamesodhunt)
importance: Undecided → High
status: New → Fix Committed
Revision history for this message
James Hunt (jamesodhunt) wrote :

If you would like to help with testing the fix for this bug, I've put an updated version of Upstart (version '1.4-0ubuntu7~jh') in my upstart-job-logging PPA:

  https://launchpad.net/~jamesodhunt/+archive/upstart-job-logging/

Please install and provide feedback.

Changed in upstart (Ubuntu Precise):
status: Triaged → In Progress
Revision history for this message
Jason Conti (jconti) wrote :

Testing 1.4-0ubuntu7~jh I haven't been able to reproduce the issue using the test code above or commenting out 'console none' in ssh.conf, so seems to be fixed here.

Revision history for this message
James Hunt (jamesodhunt) wrote :

@Jason: thanks for the feedback.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.4-0ubuntu7

---------------
upstart (1.4-0ubuntu7) precise; urgency=low

  * debian/manpages/upstart-events.7: Added missing events 'container' and
    'not-container'.
  * Update 'runlevel' event emission time in upstart-events(7) (LP: #921501).
  * Merge of important upstream log fixes to avoid spinning when a job
    which leaks fds (such as 'sshd -D') is stopped (LP: #926468).
  * Merge of lp:~jamesodhunt/upstart/early-job-log-flush to allow jobs that
    _end_ very early in the boot to have their output logged.
  * debian/conf/flush-early-job-log.conf: Upstart job to force flushing of
    early job log output data when disk becomes writeable.
  * Re-enable job logging by default.

  [ Steve Langasek ]
  * init/tests/test_job_process.c: close all fds before running tests that
    check for unexpected open fds, so that files leaked from the environment
    don't cause the test suite to fail gratuitously. LP: #926473.
  * bump debhelper compat to 9 and adjust debian/rules to use the
    dpkg-buildflags interfaces, so that upstart can be cross-compiled and
    also so we pick up any future hardening flags for free.
 -- Steve Langasek <email address hidden> Thu, 16 Feb 2012 07:52:11 +0000

Changed in upstart (Ubuntu Precise):
status: In Progress → Fix Released
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Shouldn't this be Fix Released?

James Hunt (jamesodhunt)
Changed in upstart:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.