sshd never stops, prevents umount of /usr partition

Bug #603363 reported by Rob Donovan
This bug affects 7 people
Affects            Status         Importance   Assigned to    Milestone
openssh (Ubuntu)   Fix Released   Medium       Colin Watson
Lucid              Fix Released   High         Clint Byrum

Bug Description

Under Ubuntu 10.04 Lucid, sshd is an upstart job controlled by /etc/init/ssh.conf. This file provides for start and stop as follows:

start on filesystem
stop on runlevel S

At shutdown or reboot, therefore, sshd is not stopped.

Since sshd is in /usr/sbin/sshd and also accesses lib files in /usr/lib, this means that

/etc/rc0.d/S40umountfs

cannot successfully umount /usr at shutdown when /usr is on its own partition.

This definitely leads to umount reporting errors in the shutdown console messages.

It may also lead to fsck running on reboot and problems with mountall... I can't say for certain yet as I am also having problems umounting /var, possibly due to a power failure, which is what led me to notice and investigate these messages.

My guess is that when sshd was a System V init process, it was killed by the killall5 process in /etc/rc0.d/S20sendsigs. Under Lucid sshd has been made an upstart job and as such is exempt from the killall5 and so needs to be stopped explicitly.

I admit I am by no means an expert on upstart or sshd, but the fix appears to me to be to modify /etc/init/ssh.conf to read

stop on runlevel [!2345]
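
To see why the original stanza never fires at shutdown, here is a minimal shell sketch. It assumes that Upstart compares the runlevel value against the stanza's argument as a shell-style glob, and that halt and reboot correspond to runlevels 0 and 6. The function name job_action is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "stop" if runlevel $2 matches glob pattern $1,
# otherwise "keep". This mimics glob matching only; it is not Upstart itself.
job_action() {
    pattern=$1
    runlevel=$2
    # The pattern is deliberately unquoted so that it is treated as a glob.
    case "$runlevel" in
        $pattern) echo stop ;;
        *)        echo keep ;;
    esac
}

# Compare the old stanza argument (S) with the proposed one ([!2345]).
for rl in 0 1 2 6 S; do
    echo "runlevel $rl: S -> $(job_action 'S' "$rl"), [!2345] -> $(job_action '[!2345]' "$rl")"
done
```

Under these assumptions, the pattern S never matches 0 or 6, while [!2345] matches every runlevel outside the normal multi-user ones.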

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: openssh-server 1:5.3p1-3ubuntu4
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic x86_64
Architecture: amd64
Date: Thu Jul 8 14:45:50 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: openssh

== SRU Report ==

=== Impact / Justification ===

This bug can lead to filesystem corruption if /usr is on a separate filesystem, or if sshd isn't restarted on a libc6 upgrade (which is almost guaranteed due to bug #531912).

=== Dev fix ===

This was fixed in the dev branch by making sshd stop on runlevel [!2345], which is fairly standard for network services.

=== Patch ===

See attached debdiff

=== TEST CASE: ===

On a lucid system with all updates applied
1. sudo apt-get install openssh-server
2. verify it is started and running with 'sudo status ssh'
3. sudo apt-get install --reinstall libc6
4. If portmap is installed, manually stop the portmap daemon, which has a similar problem (sudo stop portmap)
5. upon reboot, look in dmesg for the word 'Orphaned'
6. upgrade system and repeat steps 3-5, you should not see Orphaned anymore
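
Step 5 can be scripted roughly as follows. This is only a sketch: the function name report_orphans is hypothetical, and it simply searches whatever text it is given for the word 'Orphaned', as the test case describes.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and reports whether
# it contains the word 'Orphaned' (step 5 of the test case).
report_orphans() {
    if grep -q 'Orphaned'; then
        echo "FAIL: 'Orphaned' found in log"
        return 1
    fi
    echo "OK: 'Orphaned' not found in log"
}

# Typical use after a reboot:
#   dmesg | report_orphans
```

The non-zero return value on a match makes it easy to run the check before and after the upgrade in step 6 and compare the results.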

=== REGRESSION POTENTIAL ===

It's possible that people expect sshd to still be reachable until the network is shut down, which is fairly late in the shutdown process.


Revision history for this message
Rob Donovan (hikerman2005-ubuntu) wrote :
Clint Byrum (clint-fewbar) wrote :

Marking this as medium, and investigating.

Changed in openssh (Ubuntu):
importance: Undecided → Medium
Rob Donovan (hikerman2005-ubuntu) wrote :

Jolly good. I'm blogging my progress on this thread

http://ubuntuforums.org/showthread.php?t=1474942

which also describes the boot time messages that _may_ be related to this problem.
I'll post here directly if/when I can confirm that they are/aren't.

Colin Watson (cjwatson) wrote :

At some point sendsigs was changed to omit processes under upstart's control, and apparently I didn't notice. I've made the appropriate change for the next openssh upload. Thanks.
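
For illustration, the kind of change described here can be sketched as below. This is not the actual /etc/init.d/sendsigs code: the function name upstart_omit_args is hypothetical, and the sketch assumes that `initctl list` prints lines such as "ssh start/running, process 1234" and that killall5 accepts "-o PID" to skip a process.

```shell
#!/bin/sh
# Hypothetical helper: reads `initctl list`-style output on stdin and prints
# one "-o PID" argument per process listed, suitable for passing to killall5.
upstart_omit_args() {
    args=""
    # Pull out the number that follows "process " on each line that has one.
    for pid in $(sed -n 's/^.*process \([0-9][0-9]*\).*$/\1/p'); do
        args="${args:+$args }-o $pid"
    done
    echo "$args"
}

# Sketch of use (not run here):
#   killall5 -15 $(initctl list | upstart_omit_args)
```

With an omit list built this way, a running sshd job is skipped by killall5, so it only stops if its own stop condition matches the shutdown runlevel.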

Changed in openssh (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:5.5p1-4ubuntu2

---------------
openssh (1:5.5p1-4ubuntu2) maverick; urgency=low

  * Stop Upstart job on runlevel [!2345] rather than just S, since
    /etc/init.d/sendsigs no longer kills jobs under Upstart's control
    (thanks, Rob Donovan; LP: #603363).
 -- Colin Watson <email address hidden> Fri, 09 Jul 2010 12:21:17 +0100

Changed in openssh (Ubuntu):
status: New → Fix Released
Changed in openssh (Ubuntu):
status: Fix Released → Triaged
Clint Byrum (clint-fewbar) wrote :

I was able to reproduce this by installing with the minimal ISO in a VM with a separate /usr partition, then installing openssh-server, then halting the machine.

This caused errors to be printed out because /usr had files open. The system simply remounted /usr readonly and rebooted, which in effect caused the fs to be clean so the fsck was normal. Still, this seems to be improper behavior and ssh-server should be stopped when the system is rebooting/halting.

I don't quite understand why /etc/init/ssh.conf has 'stop on runlevel S'. Does this runlevel ever occur after normal system bootup?

Colin Watson (cjwatson) wrote :

Clint, thanks for the investigation, but please see earlier messages on this bug - I've already fixed this.

Changed in openssh (Ubuntu):
status: Triaged → Fix Released
Spuerhund (spuerhund) wrote :

Will the fix also be released for Lucid or only for future versions of Ubuntu?

Daniel Néri (dne) wrote :

Why is this bug still not fixed in Lucid?

Paul van Berlo (pvanberlo) wrote :

Good question. I'm experiencing some issues with the root filesystem (the only fs on this system) being busy on reboots. Running lsof from umountroot shows that sshd is still running, which could be a reason for the root fs to be busy. How can this be fixed properly in Lucid?

Oliver Siegmar (osiegmar) wrote :

Sometimes it is not even possible to remount the filesystem readonly. This can cause damage to the filesystem, thus this issue should get high priority.

Changed in openssh (Ubuntu Lucid):
status: New → Incomplete
status: Incomplete → Confirmed
Changed in openssh (Ubuntu Lucid):
importance: Undecided → High
assignee: nobody → Clint Byrum (clint-fewbar)
Oliver Siegmar (osiegmar) wrote :

I think the problem is the upstart configuration for the ssh daemon /etc/init/ssh.conf -

stop on runlevel S

Shouldn't that be:

stop on runlevel [!2345]

?

Clint Byrum (clint-fewbar) wrote :

Yes, that has been fixed as of Ubuntu 10.10. The fix just needs to be backported in an SRU to 10.04.

Oliver Siegmar (osiegmar) wrote :

I hope it will be part of 10.04.2

Clint Byrum (clint-fewbar) wrote :

Attaching the backport to lucid's upstart job file as a debdiff (I was unable to bzr branch lp:ubuntu/lucid-updates/openssh .. likely a package import problem).

Will subscribe sponsors for upload to proposed and add SRU info to the description as well.

description: updated
Colin Watson (cjwatson) wrote : Re: [Bug 603363] Re: sshd never stops, prevents umount of /usr partition

A more useful branch to start from might be
lp:~cjwatson/ubuntu/lucid/openssh/lucid-proposed. But I'll deal with it
on Monday anyway.

Michael Vogt (mvo) wrote :

I have now uploaded the SRU to lucid-proposed, unsubscribed ubuntu-sponsors, and subscribed ubuntu-sru.

Many thanks for this update, Clint!

Changed in openssh (Ubuntu Lucid):
status: Confirmed → In Progress
Martin Pitt (pitti) wrote : Please test proposed package

Accepted openssh into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in openssh (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Oliver Siegmar (osiegmar) wrote :

Test was successful - the /usr partition could be unmounted correctly. One strange thing remains - the /var partition can only be mounted in read-only mode, but that might have nothing to do with this issue.

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:5.3p1-3ubuntu6

---------------
openssh (1:5.3p1-3ubuntu6) lucid-proposed; urgency=low

  * Stop Upstart job on runlevel [!2345] rather than just S, since
    /etc/init.d/sendsigs no longer kills jobs under Upstart's control
    (thanks, Rob Donovan; LP: #603363).
 -- Clint Byrum <email address hidden> Sat, 12 Feb 2011 08:38:43 -0800

Changed in openssh (Ubuntu Lucid):
status: Fix Committed → Fix Released
cgi (vianetworks) wrote :

After applying this patch today and doing a reboot I COULDN'T ACCESS my (virtual) server via ssh (Ubuntu 10.04). The sshd service failed to start and reported these two lines in /var/log/daemon.log:

------------------------------
init: ssh main process terminated with status 255
init: Failed to spawn ssh pre-start process: unable to set oom adjustment: Operation not permitted
------------------------------

I had to replace the new "/etc/init/ssh.conf" with the old config file and reboot again. Fortunately, the update-manager had made a backup of the old file.

best regards,
cgi

Clint Byrum (clint-fewbar) wrote :

cgi, can you post the broken file and the one that works?

I did rather extensive testing, rebooting with the package 1:5.3p1-3ubuntu6 in several ways, and have not seen this sort of failure.

tags: added: testcase