Karmic Boot hangs at "Configuring network interfaces"

Bug #399954 reported by Tim Gardner
This bug affects 2 people
Affects            Status        Importance  Assigned to       Milestone
dhcp3 (Ubuntu)     Fix Released  Medium      Jamie Strandboge
sysvinit (Ubuntu)  Fix Released  Medium      Jamie Strandboge

Bug Description

On a Tylersburg-EP (dual Nehalem CPU), the boot process stops at "Configuring network interfaces". If I set VERBOSE="yes" inside /etc/init.d/networking, then "runlevel: No such file or directory" is printed ad infinitum. I found that adding the '-v' option to /sbin/ifup makes everything work. This behaviour started with 2.6.31-2 and persists in 2.6.31-3. Earlier 2.6.30-based kernels work fine.

Anton Kraus (done) wrote :

Same here on an x86 system (Pentium 4 with Hyperthreading).

My network hardware:

02:05.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
 Subsystem: ASUSTeK Computer Inc. Device 80a8
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 32
 Interrupt: pin A routed to IRQ 20
 Region 0: Memory at dc000000 (32-bit, non-prefetchable) [size=8K]
 Capabilities: [40] Power Management version 2
  Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
  Status: D0 PME-Enable- DSel=0 DScale=2 PME-
 Kernel driver in use: b44
 Kernel modules: b44

Scott James Remnant (Canonical) (canonical-scott) wrote :

I think this is actually the AppArmor if-pre-up.d script going into an infinite loop, though I'm not quite sure why. Perhaps the runlevel output has changed in some way.

To confirm my hypothesis, could you check whether your /etc/network/if-pre-up.d/dhclient3-apparmor script does indeed call runlevel in a while loop?

If it does, could you make a few changes to this script:

Before the while loop, add the following line:
  strace runlevel > /dev/runlevel.log 2>&1

This should tell us what runlevel is doing (and outputting). You'll need to comment out the entire while loop so you can boot and give me that log file :-)

Changed in upstart (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Tim Gardner (timg-tpi) wrote :

See the attached dhclient3-apparmor scripts as well as the runlevel strace output.

Tim Gardner (timg-tpi) wrote :
Scott James Remnant (Canonical) (canonical-scott) wrote :

Thanks Tim.

The bug is in the dhcp3-client script, specifically this bit:

# Wait for apparmor to load
while [ ! -e "$AAPROFILES" ]; do
    # If apparmor is not loaded by the time we leave rcS, we go into S from
    # another runlevel, or are in a non-S runlevel, just exit
    runlevel | grep -E -q '( [0-9]|[0-9] S)' && exit 0
    sleep 1
done

For various reasons, this is _NEVER_ going to work!

Firstly, we call "ifup" from udev for most standard network devices, and this happens very early in the boot sequence. At this point the /var/run/utmp file doesn't exist, so runlevel prints that error and exits because it can't find the file.

Secondly, "runlevel" inherently returns undefined data when running through rcS.d, because you have not yet entered a runlevel. rcS.d is not the "single-user runlevel"; it is the sysinit phase. Until this is completed, you are in neither single-user mode *nor* multi-user mode. You're still bootstrapping the system.

In fact, during rcS.d runlevel will always exit with an error code. If you happen to catch it before /var/run/utmp is created, you'll get that error; if you catch it afterwards, there won't be a runlevel record in there yet, so you'll get "unknown".

I'm not entirely sure why you're grepping for "we go into S from another runlevel". Going into the S runlevel (single-user mode) does not invoke anything in rcS.d; rc1.d is used for that. I could not find any mention of AppArmor in /etc/rc1.d.

This also doesn't cope with another possibility: that you *boot* into single-user mode. In this case runlevel will output (after finishing rcS.d and running sulogin):

   N S

That is, you entered single-user mode (the S runlevel) directly, without a previous runlevel.

Now what happens if you enter rc2? Your runlevel output will be:

  S 2

That is, you entered runlevel 2 from single-user mode.

(If you boot directly into multi-user mode, you get

  N 2

because you went straight into it; you don't go "via" single-user mode to get there.)
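To make these cases concrete, here is a small sketch that runs the sample runlevel outputs from this discussion through the same grep pattern the dhclient3-apparmor loop uses. It does not call the real runlevel binary; the empty string stands in for the case where runlevel prints only an error on stderr, as it does early in rcS.d. The helper name would_exit is made up for illustration.

```shell
#!/bin/sh
# Feed sample runlevel outputs (taken from this thread) through the pattern
# used by the dhclient3-apparmor wait loop. Nothing here runs runlevel itself.

would_exit() {
    # Succeeds when the loop's grep would match, i.e. the script would
    # "exit 0" instead of sleeping and polling again.
    printf '%s\n' "$1" | grep -E -q '( [0-9]|[0-9] S)'
}

# '' models runlevel printing only an error on stderr (nothing on stdout).
for out in 'N S' 'S 2' 'N 2' 'unknown' ''; do
    if would_exit "$out"; then
        printf '%-10s -> exits\n' "'$out'"
    else
        printf '%-10s -> keeps looping\n' "'$out'"
    fi
done
```

With this pattern, 'S 2' and 'N 2' let the script exit, while 'N S', 'unknown' and empty output keep it sleeping and retrying. During rcS.d, then, the only way out of the loop is for $AAPROFILES to appear.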

It also occurs to me that this script is incredibly brittle anyway, because it doesn't account for the fact that apparmor may fail to load any profiles.

If that happens, you may end up looping forever in S40networking waiting for apparmor profiles to appear. I think this is what happened to Tim.
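As an illustration of why the loop can hang, the sketch below waits for the profiles file but gives up after a fixed number of one-second polls. This is a hypothetical variant written for this explanation, not the fix that was eventually shipped (the thread later settles on mounting securityfs early). MAX_WAIT and wait_for_file are invented names, and the default AAPROFILES path is an assumption, since the thread only shows the variable name.

```shell
#!/bin/sh
# Hypothetical bounded version of the wait loop. The original loop has no
# counter, so if AppArmor never creates the profiles file (and runlevel
# never prints a matching line) it polls forever.

wait_for_file() {
    # $1: path to wait for; $2: maximum number of one-second polls
    path=$1
    max=$2
    i=0
    while [ ! -e "$path" ]; do
        if [ "$i" -ge "$max" ]; then
            return 1        # gave up: the file never appeared
        fi
        sleep 1
        i=$((i + 1))
    done
    return 0                # the file exists
}

# Assumed default path; the thread does not show AAPROFILES' value.
AAPROFILES=${AAPROFILES:-/sys/kernel/security/apparmor/profiles}
MAX_WAIT=${MAX_WAIT:-5}     # illustrative limit, not from the real script

if wait_for_file "$AAPROFILES" "$MAX_WAIT"; then
    echo "AppArmor profiles interface present"
else
    echo "gave up after ${MAX_WAIT}s; continuing without waiting further"
fi
```

A cap like this only bounds the damage: dhclient would still start unconfined if the profile never loads, which is why the thread moves on to making securityfs available before ifup runs.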

affects: upstart (Ubuntu) → dhcp3 (Ubuntu)
Changed in dhcp3 (Ubuntu):
status: Incomplete → Triaged
Scott James Remnant (Canonical) (canonical-scott) wrote :

And from an upstream point of view, there's an even more brittle case. Ubuntu happens to have /var/run on a tmpfs, so /var/run/utmp either doesn't exist, is empty, or contains a runlevel (after rcS.d has finished).

But that's fairly specific to Ubuntu; other distros like Debian and Fedora probably don't do this.

So on those systems, /var/run/utmp does exist during rcS.d, and it contains data from when the machine was last booted!

During shutdown we do put a record in there to wipe the runlevel information, but if the machine wasn't cleanly shut down, runlevel might return "N 2" during rcS.d before /var/run/utmp is wiped!

So, in summary: don't call runlevel during rcS.d.

Changed in dhcp3 (Ubuntu):
assignee: nobody → Jamie Strandboge (jdstrand)
Jamie Strandboge (jdstrand) wrote :

> Firstly we call "ifup" from udev for most standard network devices, and this happens very early in the boot sequence. At this point the /var/run/utmp file doesn't exist, so runlevel will output that error and exit because it can't find the file.

Yes, and it is because udev triggers dhclient (via ifup) this early that all these shenanigans are needed: the profile must be loaded before dhclient is started, otherwise dhclient runs unconfined.

> Secondly "runlevel" inherently returns undefined data when running through rcS.d, because you have not yet entered a runlevel. rcS.d is not the "single-user runlevel", it is the sysinit phase; until this is completed, you are neither in single-user mode *or* multi-user mode. You're still bootstrapping the system.

I am aware of that. The point of:

while [ ! -e "$AAPROFILES" ]; do
    # If apparmor is not loaded by the time we leave rcS, we go into S from
    # another runlevel, or are in a non-S runlevel, just exit
    runlevel | grep -E -q '( [0-9]|[0-9] S)' && exit 0
    sleep 1
done

was to quietly wait for AAPROFILES to show up, or to exit if it had not appeared by the time we went into a numbered runlevel or into S (when testing this on Jaunty I observed that going into 'S' was needed, though I don't recall why offhand). Because of the silent waiting, this coped with /var/run being a tmpfs just fine (i.e., it waited). This technique had the added benefit of working even if a big fsck occurred on /.

I thought that entering single-user mode would result in 'S 1', but I'll take your word that it does not.

That said, while this did work on Jaunty just fine in my testing (and the lack of bug reports bears this out), it is clearly too brittle and I'll need to rethink it.

Jamie Strandboge (jdstrand) wrote :

Per Scott, this worked in Jaunty due to bugs in upstart that no longer exist in Karmic. The proper solution is to mount securityfs in mountkernfs.sh, which then allows the dhclient ifupdown script to be greatly simplified.
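Once securityfs is mounted before udev runs ifup, the hook no longer needs to wait or to ask runlevel anything. Below is a minimal sketch of what such a simplified hook might look like. The thread does not show the actual simplified dhcp3 script, so the profile path, the /sbin/dhclient3 name, the "name (mode)" line format of the profiles file, and the function names are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical simplified if-pre-up.d logic, assuming securityfs is already
# mounted by mountkernfs.sh. No loop and no runlevel call: either the
# AppArmor interface is present or it is not.

AAPROFILES=${AAPROFILES:-/sys/kernel/security/apparmor/profiles}  # assumed
DHCLIENT=${DHCLIENT:-/sbin/dhclient3}                             # assumed
PROFILE=${PROFILE:-/etc/apparmor.d/sbin.dhclient3}                # assumed

profile_loaded() {
    # $1: profile name; $2: profiles file. Assumes one loaded profile per
    # line, e.g. "/sbin/dhclient3 (enforce)", so match the name at line start.
    grep -q "^$1 " "$2" 2>/dev/null
}

load_profile_if_needed() {
    # AppArmor interface absent: nothing to load and nothing to wait for.
    [ -e "$AAPROFILES" ] || return 0
    profile_loaded "$DHCLIENT" "$AAPROFILES" && return 0
    if [ -x /sbin/apparmor_parser ] && [ -r "$PROFILE" ]; then
        # -r loads the profile, replacing any same-named one already loaded
        /sbin/apparmor_parser -r "$PROFILE" || true
    fi
}
```

A hook built from this would call load_profile_if_needed once before dhclient starts; when AppArmor isn't active the call returns immediately instead of polling.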

Changed in dhcp3 (Ubuntu):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dhcp3 - 3.1.2-1ubuntu3

---------------
dhcp3 (3.1.2-1ubuntu3) karmic; urgency=low

  * simplify ifupdown logic since we will mount securityfs in mountkern.sh
    instead of trying to wait around for it here. Thanks to Scott James
    Remnant for analysis (LP: #399954)

 -- Jamie Strandboge <email address hidden> Thu, 16 Jul 2009 11:25:40 -0500

Changed in dhcp3 (Ubuntu):
status: In Progress → Fix Released
Jamie Strandboge (jdstrand) wrote :

Adding a sysvinit task, as mountkernfs.sh also needs to be updated.

Changed in sysvinit (Ubuntu):
assignee: nobody → Jamie Strandboge (jdstrand)
importance: Undecided → Medium
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sysvinit - 2.86.ds1-61ubuntu14

---------------
sysvinit (2.86.ds1-61ubuntu14) karmic; urgency=low

  * debian/initscripts/etc/init.d/mountkernfs.sh: mount securityfs if it is
    available. This allows for easier AppArmor confinement of applications
    started early in the boot process. LP: #399954

 -- Jamie Strandboge <email address hidden> Thu, 16 Jul 2009 12:52:49 -0500
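For readers wondering what "mount securityfs if it is available" amounts to, here is a minimal shell sketch. It is not the actual mountkernfs.sh code: it checks /proc/filesystems for kernel support and /proc/mounts for an existing mount, and /sys/kernel/security is assumed as the mount point because that is the conventional location. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: mount securityfs early, but only if the kernel
# supports it and it is not mounted already.

SECURITYFS_DIR=${SECURITYFS_DIR:-/sys/kernel/security}  # assumed mount point

securityfs_supported() {
    # /proc/filesystems lists e.g. "nodev<TAB>securityfs" when available;
    # $1 lets a test point this at another file.
    grep -qw securityfs "${1:-/proc/filesystems}"
}

already_mounted() {
    # /proc/mounts lines look like "securityfs /sys/kernel/security securityfs ..."
    grep -q " $SECURITYFS_DIR securityfs " "${1:-/proc/mounts}"
}

mount_securityfs() {
    securityfs_supported || return 0      # kernel lacks it: nothing to do
    [ -d "$SECURITYFS_DIR" ] || return 0  # no mount point to use
    already_mounted && return 0
    mount -t securityfs securityfs "$SECURITYFS_DIR"
}
```

Calling mount_securityfs requires root. The guards make it a no-op on kernels without securityfs, which matches the "if it is available" wording in the changelog.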

Changed in sysvinit (Ubuntu):
status: In Progress → Fix Released