ROOTDELAY= causes unnecessary delay in boot

Bug #1615751 reported by Scott Moser
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-images
Fix Released
High
Unassigned
initramfs-tools (Ubuntu)
Fix Released
High
Martin Pitt
Yakkety
Fix Released
High
Martin Pitt

Bug Description

I launched 2 different yakkety systems on azure instance types:
  Basic_A0
  Standard_D1_v2
More information what those things are at :
  https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/

systemd-analize says:

## Basic A0
smoser@sm-y-d1v2-1:~$ systemd-analyze
Startup finished in 5min 8.172s (kernel) + 42.670s (userspace) = 5min 50.842s

## Standard D1 v2
$ systemd-analyze
Startup finished in 5min 13.347s (kernel) + 1min 31.532s (userspace) = 6min 44.879s

Systemd thinks that more than 5 minutes passed between the kernel being loaded and /sbin/init being run. I'm pretty sure this isn't just a bad clock, as the system actually is not reachable for something like 6 minutes after boot, so it seems sane. Additionally, I have some scripts that compare launch time and code ran inside the vm via cloud-init and they agree.

For example:
  starting at Mon, 22 Aug 2016 15:37:43 +0000
  uptime: 349.77 seconds
  you launched me at: Mon, 22 Aug 2016 15:30:11 +0000
  it is now : Mon, 22 Aug 2016 15:37:43 +0000
  kernel booted : Mon, 22 Aug 2016 15:31:54 +0000
  launch to kernel boot: 103 seconds
  launch to user-data : 452 seconds

The full dmesg is attached, but interesting bits are
[ 0.815692] Trying to unpack rootfs image as initramfs...
[ 1.289093] Freeing initrd memory: 35660K (ffff88001d47d000 - ffff88001f750000)
...
[ 7.820798] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/input/input3
[ 7.834419] input: AT Translated Set 2 keyboard as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_5/serio2/input/input5
[ 72.464227] random: nonblocking pool is initialized
[ 305.527657] Btrfs loaded
[ 305.628091] blk_update_request: I/O error, dev fd0, sector 0
[ 305.632007] floppy: error -5 while reading block 0
[ 306.027898] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 308.901437] systemd[1]: systemd 231 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[ 308.925746] systemd[1]: Detected virtualization microsoft.
[ 308.933658] systemd[1]: Detected architecture x86-64.
[ 308.999041] systemd[1]: Set hostname to <ubuntu>.
...

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: linux-image-4.4.0-34-generic 4.4.0-34.53
ProcVersionSignature: User Name 4.4.0-34.53-generic 4.4.15
Uname: Linux 4.4.0-34-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Aug 22 15:37 seq
 crw-rw---- 1 root audio 116, 33 Aug 22 15:37 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.3-0ubuntu7
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Aug 22 17:31:18 2016
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Microsoft Corporation Virtual Machine
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 hyperv_fb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-34-generic root=UUID=314ea127-227e-4b64-b5b4-d70ab0fb7eb7 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-34-generic N/A
 linux-backports-modules-4.4.0-34-generic N/A
 linux-firmware 1.160
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/23/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 090006
dmi.board.name: Virtual Machine
dmi.board.vendor: Microsoft Corporation
dmi.board.version: 7.0
dmi.chassis.asset.tag: 7783-7084-3265-9085-8269-3286-77
dmi.chassis.type: 3
dmi.chassis.vendor: Microsoft Corporation
dmi.chassis.version: 7.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr090006:bd05/23/2012:svnMicrosoftCorporation:pnVirtualMachine:pvr7.0:rvnMicrosoftCorporation:rnVirtualMachine:rvr7.0:cvnMicrosoftCorporation:ct3:cvr7.0:
dmi.product.name: Virtual Machine
dmi.product.version: 7.0
dmi.sys.vendor: Microsoft Corporation

Revision history for this message
Scott Moser (smoser) wrote :
description: updated
Scott Moser (smoser)
description: updated
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: kernel boot (not userspace) on azure takes > 5 minutes

Does this issue also happen with Xenial or earlier kernels?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Revision history for this message
Scott Moser (smoser) wrote :

I just launched 1 instance of xenial and of trusty with the same user-data, both of these were 'Standard_D1_v2' type. They do not seem to have such bad performance. trusty does not have systemd-analyze, but i've included 'dmesg | grep sda1.*mounted' for each, which shows quite a difference to the point where sda1 is mounted (and then re-mounted rw).

I've also included the initramfs-tools version as xenial and yakkety are actually running the same kernel version. So something else is at play here.

# Trusty

uptime: 49.59 seconds
you launched me at: Mon, 22 Aug 2016 18:24:45 +0000
it is now : Mon, 22 Aug 2016 18:27:08 +0000
kernel booted : Mon, 22 Aug 2016 18:26:19 +0000
launch to kernel boot: 94 seconds
launch to user-data : 143 seconds

$ dmesg | grep sda1.*mounted
[ 3.860600] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 6.797368] EXT4-fs (sda1): re-mounted. Opts: discard

$ uname -r
4.4.0-34-generic

$ dpkg-query --show initramfs-tools
initramfs-tools 0.103ubuntu4.4

# xenial
uptime: 47.52 seconds
you launched me at: Mon, 22 Aug 2016 18:23:50 +0000
it is now : Mon, 22 Aug 2016 18:26:34 +0000
kernel booted : Mon, 22 Aug 2016 18:25:47 +0000
launch to kernel boot: 117 seconds
launch to user-data : 164 seconds
$ systemd-analyze
Startup finished in 7.597s (kernel) + 40.875s (userspace) = 48.472s

$ dmesg | grep sda1.*mounted
[ 5.795233] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 12.130461] EXT4-fs (sda1): re-mounted. Opts: discard

$ uname -r
4.4.0-34-generic
$ dpkg-query --show initramfs-tools
initramfs-tools 0.122ubuntu8.1

# yakkety
$ dmesg | grep sda1.*mounted[ 305.882955] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 306.921043] EXT4-fs (sda1): re-mounted. Opts: discard

$ uname -r
4.4.0-34-generic
$ dpkg-query --show initramfs-tools
initramfs-tools 0.125ubuntu2

Changed in initramfs-tools (Ubuntu Yakkety):
status: New → Confirmed
importance: Undecided → High
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Yakkety)
Revision history for this message
Scott Moser (smoser) wrote :

I moved this to initramfs-tools. I made the following change, ran 'update-grub' and rebooted. after reboot:
$ dmesg | grep 'sda1.*mou'
[ 5.614279] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 6.452427] EXT4-fs (sda1): re-mounted. Opts: discard

$ diff -u /etc/default/grub.d/50-cloudimg-settings.cfg.orig /etc/default/grub.d/50-cloudimg-settings.cfg
--- /etc/default/grub.d/50-cloudimg-settings.cfg.orig 2016-08-22 18:47:12.878049696 +0000
+++ /etc/default/grub.d/50-cloudimg-settings.cfg 2016-08-22 18:47:18.306335571 +0000
@@ -2,7 +2,7 @@
 # CLOUD_IMG: This file was created/modified by the Cloud Image build process

 # Set the default commandline
-GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300"
+GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 earlyprintk=ttyS0"

 # Set the grub console type
 GRUB_TERMINAL=serial

summary: - kernel boot (not userspace) on azure takes > 5 minutes
+ ROOTDELAY= causes unnecessary delay in boot
Revision history for this message
Scott Moser (smoser) wrote :

it seems that ROOTDELAY is the culprit here, and there were changes brought that mention rootdelay with yakkety merged initramfs-tools with debian.

Revision history for this message
Scott Moser (smoser) wrote :

some discussion https://irclogs.ubuntu.com/2016/08/23/%23ubuntu-devel.html#t11:36 including an un-tested fix from me.

Revision history for this message
Steve Langasek (vorlon) wrote :

> pitti smoser: ah, I remember -- indeed, this rootdelay= thing is FUBAR; TBH I don't see how to salvage it to anything remotely useful
> pitti smoser: IIRC, the problem was to define what "the necessary device" actually is
> pitti i. e. not just waiting for the root device, which we've done with udev's wait-for-root since 2006 or so
> pitti so if someone actually wants to use it, it's just a plain sleep(given_number) in the initramfs ATM, as there is no other definition of what it should do

Martin, there may not be any sort of definition upstream in Debian for what this did, but it has well-established semantics in Ubuntu that this is the *maximum* time that the initramfs should wait for the root device to be available, and if the timeout is hit the initramfs should drop you to a shell; but it should not cause the initramfs to wait longer. So this is certainly a regression. Can you please follow up on this?

Changed in initramfs-tools (Ubuntu Yakkety):
assignee: nobody → Martin Pitt (pitti)
Revision history for this message
Martin Pitt (pitti) wrote :

Right, as I said on IRC I'm more than happy to just entirely drop the static sleep in /usr/share/initramfs-tools/init, as this is a completely broken/useless concept. Using it in the wait-for-root call in /usr/share/initramfs-tools/scripts/local similar to the wait-for-root call in /usr/share/initramfs-tools/scripts/local-premount/fixrtc seems good indeed.

Scott's patch in http://paste.ubuntu.com/23081737/ LGTM, I'll test/apply it ASAP. Thanks!

Changed in initramfs-tools (Ubuntu Yakkety):
status: Confirmed → In Progress
Martin Pitt (pitti)
Changed in initramfs-tools (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in cloud-images:
status: New → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.125ubuntu3

---------------
initramfs-tools (0.125ubuntu3) yakkety; urgency=medium

  * init: Revert Debian commit ab91306. Interpreting "rootdelay" as a static
    sleep is complete nonsense. (LP: #1615751)
  * scripts/local: Let rootdelay determine the wait-for-root timeout instead
    of the shell polling loop that comes after it, as we should never hit the
    latter in Ubuntu (where we require udev everywhere). Thanks to Scott Moser
    for this patch!

 -- Martin Pitt <email address hidden> Fri, 26 Aug 2016 12:54:37 +0200

Changed in initramfs-tools (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

Verified this is fixed. Thank you!

Changed in cloud-images:
status: Fix Committed → Fix Released
importance: Undecided → High
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.