Dependency resolver causes hang on boot

Bug #1223745 reported by Stefan Bader
Affects: mountall (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

After upgrading to version 2.50 my server hung on boot without any useful messages. I could track this down to an NFS mount in fstab whose mountpoint is under /home, which is itself another filesystem mounted from fstab. Like:

/dev/xxx /home ext4 defaults 0 2
xxx:/srv/img /home/img nfs ro,nfsvers=3 0 0

Verified that this works with mountall v2.49 (kind of: the first mount attempt seems to be made before the network has finished setting resolv.conf from the info DHCP returns). But at least the boot finishes, and by the time one can log in the NFS mount is done.

Version 2.50, in contrast, either hangs, or, when the mountpoint is moved somewhere directly under /, boots but never mounts the NFS filesystem.

Since one of the bigger changes from 2.49 is trying to do those dependent mounts in the right order, I suspect this caused the current situation.

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: mountall 2.49
ProcVersionSignature: Ubuntu 3.11.0-7.13-generic 3.11.0
Uname: Linux 3.11.0-7-generic x86_64
.run.mount.utab:

ApportVersion: 2.12.1-0ubuntu3
Architecture: amd64
Date: Wed Sep 11 11:17:22 2013
InstallationDate: Installed on 2013-05-22 (111 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Alpha amd64 (20130521)
MarkForUpload: True
ProcKernelCmdline: BOOT_IMAGE=/boot/vmlinuz-3.11.0-7-generic root=UUID=638ccd94-ce45-46d9-b3f5-2e790862fd18 ro video=640x480 console=tty0 console=ttyS0,57600n8
SourcePackage: mountall
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Stefan Bader (smb) wrote :
Revision history for this message
Steve Langasek (vorlon) wrote :

Hi Stefan,

If you bring up the plymouth splash screen (hitting 'esc' at the video console, or booting with 'splash'), does it let you skip the stalled mount? And, which mount does it say is missing?

Can you boot with --verbose + splash on the kernel commandline, skip any missing mounts, and then attach /var/log/upstart/mountall.log?

Changed in mountall (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Revision history for this message
Steve Langasek (vorlon) wrote :

Ok, I see from the existing attached logfiles that the /home/cloud-images mountpoint is being incorrectly tagged 'local' by mountall instead of 'remote', which causes a deadlock waiting for idmapd and/or gssd to be started. I'm not sure why this is being tagged 'local', but I'll see if it's reproducible here.

Changed in mountall (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Steve Langasek (vorlon) wrote :

Ok, I can't reproduce this in a VM, with /home as a separate partition and an NFS mount on a subdirectory. And mountall --verbose shows the NFS mount correctly tagged as 'remote', not 'local'. The original questions stand.

Changed in mountall (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Stefan Bader (smb) wrote :

So booting with splash, or pressing ESC otherwise, brings up the splash screen, but it neither shows any hint about waiting on something nor allows me to skip anything by blindly typing 'S'. (Not to mention how much I like this whole plymouth mess, as my servers normally have their console on a serial line, but that is another (long) story.)

Luckily this box is multiboot, so I obtained mountall.log from another release after Ctrl-Alt-Deleting the waiting mountall run with --verbose.

Revision history for this message
Stefan Bader (smb) wrote :

Oh, forgot to mention: this still behaves the same way, even with the newer nfs-common that came in this morning.

Revision history for this message
Stefan Bader (smb) wrote :

Hm, so I just upgraded another server with a very similar setup (except that home is on LVM and / is mounted by label). In that case the NFS mount is considered nowait, but it also seems to be counted as local when I look at the summary in mountall.log. Oh, and it does mount the NFS filesystem under /mnt, whereas doing that on the first host caused the mount to never be done.

...
/mnt/cloud-images is nowait
...
local 4/4 remote 0/0 virtual 13/13 swap 1/1
mounting event handled for /mnt/cloud-images
mounting /mnt/cloud-images
mount.nfs: Failed to resolve server nano: Name or service not known
mountall: mount /mnt/cloud-images [1451] terminated with status 32
Filesystem could not be mounted: /mnt/cloud-images
mountall: Disconnected from Plymouth

The NFS mount does succeed eventually, but it seems at least the first attempts are made before the network is correctly up. The main thing, though, seems to be nowait, and I cannot say where that comes from or why.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1223745] Re: Dependency resolver causes hang on boot

On Thu, Sep 12, 2013 at 08:17:12AM -0000, Stefan Bader wrote:
> /mnt/cloud-images is nowait

Expected behavior; the mount point is outside of the "core" filesystem, so
mountall does not block the boot waiting for it by default. You could mark
the filesystem 'bootwait' in /etc/fstab, in which case mountall would tag it
as 'remote' instead (... at least, that's what *should* happen!).
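A sketch of what such an entry might look like (reusing the server name and paths from the log excerpts in this bug, so treat them as illustrative):

```
# illustrative fstab entry: adding 'bootwait' should make mountall tag the
# NFS submount as 'remote' and block the boot until it is mounted
nano:/srv/images/cloudimg  /mnt/cloud-images  nfs  ro,nfsvers=3,bootwait  0  0
```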

> ...
> local 4/4 remote 0/0 virtual 13/13 swap 1/1
> mounting event handled for /mnt/cloud-images
> mounting /mnt/cloud-images
> mount.nfs: Failed to resolve server nano: Name or service not known
> mountall: mount /mnt/cloud-images [1451] terminated with status 32
> Filesystem could not be mounted: /mnt/cloud-images
> mountall: Disconnected from Plymouth

Also expected behavior (and I think there's at least one other open bug
complaining about this). mountall knows that this is a network mount, but
has no way to determine *which* network interface is required in order to
reach it, so it will retry the mount after each network interface comes
up... including lo. So yes, this results in a bit of noise in the logs.
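The retry-per-interface behavior described above can be sketched as follows (Python pseudocode for illustration; mountall itself is written in C, and the function and message names here are invented):

```python
# Sketch of the behavior Steve describes: mountall cannot know which
# interface a network mount needs, so it retries every still-unmounted
# remote filesystem each time any interface comes up -- including lo,
# which produces the early "Name or service not known" noise in the log.

pending = {"/mnt/cloud-images"}   # remote mounts not yet mounted
log = []                          # stand-in for mountall.log

def try_mount(mountpoint, resolver_ready):
    """Pretend to run mount(8); succeeds only once names resolve."""
    if not resolver_ready:
        log.append(f"mount.nfs: failed to resolve server for {mountpoint}")
        return False
    log.append(f"mounted {mountpoint}")
    return True

def on_net_device_up(resolver_ready):
    # fired once per interface event (lo, eth0, ...)
    for mp in list(pending):
        if try_mount(mp, resolver_ready):
            pending.discard(mp)

on_net_device_up(resolver_ready=False)  # lo comes up: retry fails, logs noise
on_net_device_up(resolver_ready=True)   # eth0 up, DHCP done: retry succeeds
```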

So all in all, I don't think there's anything here that explains the problem
on the other machine, which is specific to having the NFS mount under /home.

Revision history for this message
Stefan Bader (smb) wrote :

Just for the fun of it: the machine that had the NFS mount under /mnt locks up the same way as the other one as soon as I move that mount under /home. It, too, then considers the NFS filesystem "local".

Revision history for this message
Stefan Bader (smb) wrote :

And repeating the setup in a VM I get the same lockup, too. Hm, Steve, what does your setup look like? I basically have (beyond the root):

LABEL=home /home ext4 relatime 0 2
<hostname>:/srv/images/cloudimg /home/cloud-images nfs ro,nfsvers=3 0 0

And in the VM case mountall.log again shows cloud-images as "local". For home I added another virtual disk here; in the other cases I have a separate partition or another LV, so it seems not directly related to what kind of storage volume home is on.

Revision history for this message
Stefan Bader (smb) wrote :

Ok, one more additional note: the VM behaves like the first host when the mountpoint is /mnt/cloud-images, meaning it is seen as nowait, but after the initial failure to mount while the network is not ready, it does not seem to retry, and the NFS mount has to be done manually.

Steve Langasek (vorlon)
Changed in mountall (Ubuntu):
status: Incomplete → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 2.51

---------------
mountall (2.51) unstable; urgency=low

  * Fix tagging of filesystems to not have local/remote inheritance
    overridden; otherwise we will mis-tag various mounts and deadlock the
    boot. Also fixes an inconsistency with the inheritance of
    'bootwait'/'nobootwait' flags depending on the order of mounts in
    /etc/fstab: we now always treat the 'nobootwait' flag as applying to
    submounts. LP: #1223745, LP: #1153672.

 -- Steve Langasek <email address hidden> Fri, 13 Sep 2013 22:23:55 -0700
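The inheritance fix described in the changelog can be illustrated with a small sketch (Python, purely illustrative of the tagging logic; mountall itself is C, and the function and variable names here are invented): a submount must keep the local/remote classification implied by its own filesystem type rather than inherit its parent's.

```python
# Sketch of the tagging bug fixed in 2.51: with buggy inheritance, a mount
# point under a local parent (e.g. /home/cloud-images under /home) gets
# mis-tagged 'local' even though its fstab type is a network filesystem,
# which deadlocks the boot waiting for idmapd and/or gssd.

NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs"}

def tag(fstype, parent_tag, inherit_from_parent):
    own = "remote" if fstype in NETWORK_FS else "local"
    if inherit_from_parent and parent_tag is not None:
        return parent_tag   # buggy: parent's tag overrides the child's own
    return own              # fixed: the child's fstype decides

# /home is ext4 ('local'); /home/cloud-images is nfs and must be 'remote'
buggy = tag("nfs", parent_tag="local", inherit_from_parent=True)
fixed = tag("nfs", parent_tag="local", inherit_from_parent=False)
```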

Changed in mountall (Ubuntu):
status: Fix Committed → Fix Released