systemd mount targets fail due to device busy or already mounted
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | High | Ryan Harper |
Bug Description
[Issue]
After rebooting an Ubuntu 16.04 AWS instance (ami-1d4e7a66) with several external disks attached, formatted, and added to /etc/fstab, the systemd mount units fail with:
● media-v.mount - /media/v
Loaded: loaded (/etc/fstab; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2017-09-19 20:12:18 UTC; 1min 54s ago
Where: /media/v
What: /dev/xvdv
Docs: man:fstab(5)
Process: 1196 ExecMount=
Sep 19 20:12:17 ip-172-31-7-167 systemd[1]: Mounting /media/v...
Sep 19 20:12:17 ip-172-31-7-167 mount[1196]: mount: /dev/xvdv is already mounted or /media/v busy
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Mount process exited, code=exited status=32
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: Failed to mount /media/v.
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Unit entered failed state.
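Because the fstab entries in this report use `nofail`, boot continues after such a failure and the affected file systems simply stay unmounted. One way to find them afterwards is to ask systemd for failed mount units. The sketch below assumes a systemd version whose `systemctl list-units` accepts `--type=mount`, `--state=failed` and `--no-legend`; the helper name `failed_mount_units` is hypothetical. It reads the listing from stdin so the parsing can be checked without systemd.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every mount unit found in
# `systemctl list-units` output read from stdin.
failed_mount_units() {
    # Failed units may carry a leading "●" marker, so scan each line for the
    # first whitespace-separated field that ends in ".mount".
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\.mount$/) { print $i; break }
    }'
}

# Example use on an affected instance (as root): list the failed mount
# units and start each one again to retry the mount.
#   systemctl list-units --type=mount --state=failed --no-legend |
#       failed_mount_units |
#       while read -r unit; do systemctl start "$unit"; done
```

Starting a failed unit again only retries the mount; it succeeds if the competing mount has finished by then.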
From the cloud-init logs, it appears that the OVF datasource is trying to mount the same device while probing it for data:
2017-09-19 20:12:17,502 - util.py[DEBUG]: Peeking at /dev/xvdv (max_bytes=512)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Read 2570 bytes from /proc/mounts
...
2017-09-19 20:12:17,506 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid'] with allowed return codes [0] (shell=False, capture=True)
2017-09-19 20:12:17,545 - util.py[DEBUG]: Failed mount of '/dev/xvdv' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid']
Exit code: 32
Reason: -
Stdout: -
Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdv,
In some cases useful info is found in syslog - try
2017-09-19 20:12:17,545 - util.py[DEBUG]: Recursively deleting /tmp/tmpw2tyqqid
2017-09-19 20:12:17,545 - DataSourceOVF.
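The log shows OVF reading /proc/mounts just before its trial mount, while systemd starts mounting /media/v within the same second. A /proc/mounts check like the sketch below (a hypothetical helper, `is_mounted`, not cloud-init's own code) can tell whether a device is already mounted, but the check and any following mount are separate steps, so it cannot rule out a mount that is still in progress, such as the systemd one here. It compares the device path literally, so pass the same name that appears in /proc/mounts (for example /dev/xvdv rather than a /dev/disk/by-* symlink).

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if DEV ($1) is the source of any entry in
# MOUNTS_FILE ($2, default /proc/mounts), non-zero otherwise.
is_mounted() (
    dev=$1
    mounts=${2:-/proc/mounts}
    # Field 1 of each /proc/mounts line is the mount source (the device).
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$mounts"
)

# Example:
#   if is_mounted /dev/xvdv; then echo "/dev/xvdv is mounted"; fi
```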
[Vitals]
Version: 0.7.9-153-
OS: Ubuntu 16.04
Provider: AWS - ami-1d4e7a66
[Recreate]
To recreate this:
1. Launch an AWS instance using AMI ami-1d4e7a66 and attach several disks (I used 25 additional disks)
2. Format and mount all 25:
mkdir /media/{b..z}
for i in {b..z}; do
mkfs -t ext4 /dev/xvd$i
mount /dev/xvd$i /media/$i
echo "/dev/xvd$i /media/$i ext4 defaults,nofail 0 2" >> /etc/fstab
done
3. Reboot the instance.
Since this is a race, multiple reboots may be necessary to trigger the failure. A reproducer script is attached.
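Because the race does not hit on every boot, it helps to check after each reboot which fstab entries did not end up mounted. The sketch below compares the mount points listed in an fstab file with those in /proc/mounts; the helper name `unmounted_fstab_entries` and the default `/media/` prefix (matching the reproducer above) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print fstab mount points under PREFIX that are not
# currently mounted.
#   $1: fstab file  (default /etc/fstab)
#   $2: mounts file (default /proc/mounts)
#   $3: prefix      (default /media/)
unmounted_fstab_entries() (
    fstab=${1:-/etc/fstab}
    mounts=${2:-/proc/mounts}
    prefix=${3:-/media/}
    awk -v prefix="$prefix" '
        # First file (the mounts file): remember every active mount point.
        FILENAME == ARGV[1] { mounted[$2] = 1; next }
        # Second file (fstab): skip comments and lines without a mount point.
        /^[ \t]*#/ || NF < 2 { next }
        index($2, prefix) == 1 && !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
)

# Example: empty output after a reboot means every /media/* entry mounted.
#   unmounted_fstab_entries
```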
Related branches
- Scott Moser: Approve
- Server Team CI bot: Approve (continuous-integration)
-
Diff: 309 lines (+221/-27), 3 files modified:
- cloudinit/sources/DataSourceOVF.py (+47/-27)
- cloudinit/tests/helpers.py (+10/-0)
- tests/unittests/test_datasource/test_ovf.py (+164/-0)
Changed in cloud-init:
- importance: Undecided → High
- assignee: nobody → Ryan Harper (raharper)
Changed in cloud-init:
- status: In Progress → Fix Committed
The current analysis is that any two parallel invocations of mount that specify the same device can result in one of the mounts failing with EBUSY (or another error) due to how the kernel handles opens or mounts of the underlying device. This can be reproduced outside of cloud-init by running the following two loops at the same time against the same disk:
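# rw mount loop: remount DISK read-write and write a timestamp to ${MNT_POINT}/lasttouch
# (set DISK and MNT_POINT first, e.g. DISK=/dev/xvdv MNT_POINT=/media/v)
: "${DISK:?set DISK}" "${MNT_POINT:?set MNT_POINT}"
COUNT=0
while true; do
    umount -f "${MNT_POINT}" || :
    mount -t ext4 -o defaults "${DISK}" "${MNT_POINT}" || {
        echo "$(date +%s.%N): mount failed on COUNT=$COUNT"
        exit 1
    }
    echo "$(date +%s.%N): COUNT=$COUNT" > "${MNT_POINT}/lasttouch"
    COUNT=$((COUNT + 1))
done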
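# ro mount loop: peek at DISK, then attempt an iso9660 mount as OVF does
# (same DISK as the rw loop; MNT_POINT can be a scratch directory, like the temporary directory in the cloud-init log)
: "${DISK:?set DISK}" "${MNT_POINT:?set MNT_POINT}"
COUNT=0
while true; do
    dd if="${DISK}" bs=512 count=1 of=/dev/null
    # we expect this to fail (DISK holds ext4, not iso9660); that's OK
    mount -o ro,sync -t iso9660 "${DISK}" "${MNT_POINT}" || :
    echo "$(date +%s.%N): READ COUNT=$COUNT"
    COUNT=$((COUNT + 1))
done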
There isn't currently a general solution that prevents this race from occurring, so the focus is on reducing the likelihood of the race and on configuration changes that make systems resilient to it.
The linked changes to cloud-init reduce the number of block devices that OVF probes. The OVF datasource looks for block devices with ISO9660 filesystems; with these changes, on an instance that has no ISO9660 filesystems, OVF will not perform any mount operations.
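One way to avoid a trial mount on every block device is to look for the ISO9660 signature first. ISO9660 places its first volume descriptor at byte offset 32768 (sector 16 with 2048-byte sectors), and bytes 1 to 5 of that descriptor hold the standard identifier "CD001". The sketch below checks for that signature with a plain read; it illustrates the idea and is not the code in the linked branch, and the helper names are hypothetical. On a real system, `blkid -t TYPE=iso9660 -o device` performs a similar lookup through libblkid's probing.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if $1 (a block device or image file) has the
# ISO9660 identifier "CD001" at byte offset 32769, i.e. just after the
# 1-byte type field of the volume descriptor at offset 32768.
# A missing or short file yields no output and therefore fails the test.
has_iso9660_signature() {
    [ "$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null)" = "CD001" ]
}

# Hypothetical helper: print only those arguments that look like ISO9660,
# so that a later mount is attempted only on those devices.
iso9660_candidates() (
    for dev in "$@"; do
        if has_iso9660_signature "$dev"; then
            printf '%s\n' "$dev"
        fi
    done
)

# Example:
#   iso9660_candidates /dev/xvd?
```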
When crafting fstab entries, one can set the sixth field (fs_passno) to zero (0), and systemd will then not attempt to fsck the device before mounting it. This does not remove the mount race, but it does reduce the amount of time between when the mount unit starts and when it completes.
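The fstab change can be applied with a short awk filter. The sketch below (hypothetical helper `zero_passno`) prints the file with fs_passno set to 0 for entries whose mount point starts with `/media/`, matching the reproducer; it writes to stdout so the result can be reviewed before replacing /etc/fstab. Entries with fewer than six fields are left alone, since fstab(5) treats a missing sixth field as zero.

```shell
#!/bin/sh
# Hypothetical helper: print FSTAB ($1, default /etc/fstab) with the sixth
# field (fs_passno) set to 0 for every entry whose mount point starts with
# PREFIX ($2, default /media/). Other lines are printed unchanged.
zero_passno() (
    fstab=${1:-/etc/fstab}
    prefix=${2:-/media/}
    awk -v prefix="$prefix" '
        # Leave comments, blank lines and short entries untouched.
        /^[ \t]*#/ || NF < 6 { print; next }
        # Assigning $6 rebuilds the line with single spaces between fields.
        index($2, prefix) == 1 { $6 = 0 }
        { print }
    ' "$fstab"
)

# Example: review the change before installing it.
#   zero_passno /etc/fstab > /tmp/fstab.new && diff -u /etc/fstab /tmp/fstab.new
```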