cloud-init.conf never runs, instance not reachable via ssh

Bug #712026 reported by Scott Moser
This bug affects 2 people
Affects          Status         Importance   Assigned to     Milestone
linux (Ubuntu)   Fix Released   High         Unassigned
  Natty          Fix Released   High         Unassigned
udev (Ubuntu)    Fix Released   High         Ubuntu Server
  Natty          Fix Released   High         Ubuntu Server

Bug Description

Binary package hint: udev

In natty alpha-2 EC2 testing, I found several instances that were unreachable via ssh and were "fixed" by a reboot.

I launched 182 instances across 4 regions. 87 of those were i386 instances. 7 exhibited this behavior.
All 7 that showed the error were i386 and m1.small. So, it's fairly rare.
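
For a rough sense of "fairly rare", the counts above can be turned into failure rates. This is only arithmetic on the numbers in this report; the variable names are made up for illustration, and the report does not say how many of the 87 i386 instances were m1.small, so no per-instance-type rate is computed.

```python
# Counts taken from the report above.
total_instances = 182
i386_instances = 87
failed_instances = 7  # all failures were i386 and m1.small

# Failure rate over everything launched, and over i386 only.
overall_rate = failed_instances / total_instances
i386_rate = failed_instances / i386_instances

print(f"overall: {overall_rate:.1%}, i386 only: {i386_rate:.1%}")
```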

Of the 182 instances, only the 7 that failed had lines like this in their console log:

| udevd[191]: bind failed: Address already in use
| udevd[191]: error binding control socket, seems udevd is already running

(bug 712034 is related, covering the error messages)
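
To pick the affected instances out of a large batch, one option is to scan each console log for the udevd bind failure quoted above. This is a minimal sketch, not the tooling used in the testing; `has_udevd_bind_failure` is a hypothetical helper, and it assumes the console log has already been fetched as a string.

```python
import re

# Matches the "bind failed" line quoted in this report, for any udevd PID.
UDEVD_BIND_FAILED = re.compile(
    r"udevd\[\d+\]: bind failed: Address already in use"
)


def has_udevd_bind_failure(console_log: str) -> bool:
    """Return True if the console log text contains the udevd bind failure."""
    return UDEVD_BIND_FAILED.search(console_log) is not None


# Sample taken from the console lines quoted above.
sample_log = (
    "udevd[191]: bind failed: Address already in use\n"
    "udevd[191]: error binding control socket, seems udevd is already running\n"
)
print(has_udevd_bind_failure(sample_log))
```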

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: udev 165-0ubuntu2
ProcVersionSignature: User Name 2.6.38-1.28-virtual 2.6.38-rc2
Uname: Linux 2.6.38-1-virtual i686
Architecture: i386
CurrentDmesg: [ 13.636015] eth0: no IPv6 routers present
Date: Wed Feb 2 17:40:10 2011
Ec2AMI: ami-c416e6ad
Ec2AMIManifest: ubuntu-images-testing-us/ubuntu-natty-daily-i386-server-20110202.manifest.xml
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-407d9529
Ec2Ramdisk: unavailable
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 LC_MESSAGES=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: root=LABEL=uec-rootfs ro console=hvc0
ProcModules: acpiphp 23425 0 - Live 0xedc10000
SourcePackage: udev

Revision history for this message
Scott Moser (smoser) wrote :

I should have noted that cloud-init.conf starts on:
  start on (mounted MOUNTPOINT=/ and net-device-up IFACE=eth0 and \
      stopped cloud-init-local )

cloud-init-local had already run in all cases. It starts on:
  start on mounted MOUNTPOINT=/
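
The `and` in the cloud-init.conf condition means the job only starts once all three events have been seen. As a rough illustration of why the job might never run, here is a simplified model. It is not how upstart is implemented, the function and variable names are hypothetical, and it assumes the missing event is `net-device-up IFACE=eth0`, which this comment does not confirm.

```python
# Simplified model of: start on (A and B and C).
# The event strings mirror the stanza above; everything else is illustrative.
REQUIRED_EVENTS = frozenset({
    "mounted MOUNTPOINT=/",
    "net-device-up IFACE=eth0",
    "stopped cloud-init-local",
})


def cloud_init_can_start(seen_events):
    """Return True once every required event has been observed."""
    return REQUIRED_EVENTS <= set(seen_events)


healthy_boot = [
    "mounted MOUNTPOINT=/",
    "stopped cloud-init-local",
    "net-device-up IFACE=eth0",
]
# Assumed failure case: eth0 never comes up, so its event is never emitted.
stuck_boot = [
    "mounted MOUNTPOINT=/",
    "stopped cloud-init-local",
]

print(cloud_init_can_start(healthy_boot))
print(cloud_init_can_start(stuck_boot))
```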

tags: added: iso-testing
Changed in udev (Ubuntu Natty):
assignee: nobody → Canonical Server Team (canonical-server)
Dave Walker (davewalker)
tags: added: server-nrs
Revision history for this message
Scott Moser (smoser) wrote :

I hit this 3 times in alpha-3 testing for natty. Again, all i386 instances.

Changed in udev (Ubuntu Natty):
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-1
status: New → Confirmed
Scott Moser (smoser)
description: updated
Martin Pitt (pitti)
Changed in udev (Ubuntu Natty):
milestone: ubuntu-11.04-beta-1 → ubuntu-11.04-beta-2
Revision history for this message
Scott Moser (smoser) wrote :

We're *hoping* this is related to bug 731878.

Revision history for this message
Andy Whitcroft (apw) wrote :

@scott -- as the reference bug is now Fix Released perhaps you could re-test and confirm.

Changed in udev (Ubuntu Natty):
assignee: Canonical Server Team (canonical-server) → Ubuntu Server Team (ubuntu-server)
Revision history for this message
James Page (james-page) wrote :

I ran several iterations of multiple instance testing across three regions over the last couple of days (see [0]); all instances started up first time which would indicate that this issue is resolved.

Beta-2 candidate testing (see [1]) will complete further instance testing so suggest that we review again at the end of today.

[0] http://tinyurl.com/5v44lwh
[1] http://tinyurl.com/5rwh5sw

Revision history for this message
Dave Walker (davewalker) wrote :

Tentatively marking Fix Released based on the previous comment, and on earlier considerations that it may have been an infrastructure issue.

Changed in udev (Ubuntu Natty):
status: Confirmed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

I'm tagging this as 'Affects' linux because that is where the bug/fix actually was. We're very close to certain that this is really just fallout of bug 731878.

Changed in linux (Ubuntu Natty):
importance: Undecided → High
milestone: none → ubuntu-11.04-beta-2
status: New → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

So, I marked this as fix released, and most definitely we're seeing it less.

However, we *did* see it once in today's beta2 testing. I'll get the console log and attach it later.

Revision history for this message
Scott Moser (smoser) wrote :

Attached is the console log of the failed natty beta2 test.

Revision history for this message
Scott Moser (smoser) wrote :

Previously I attached the wrong console log. Here is the correct console log for the natty beta2 failure. Note, we see:

| Begin: Running /scripts/local-bottom ... done.
| done.
| Begin: Running /scripts/init-bottom ... done.
| udevd-work[156]: open /dev/null failed: No such file or directory
| udevd-work[159]: open /dev/null failed: No such file or directory
| udevd-work[158]: open /dev/null failed: No such file or directory
| lxcmount stop/pre-start, process 174
| udevd[220]: bind failed: Address already in use
| udevd[220]: error binding udev control socket
| init: udev main process (220) terminated with status 1
| init: udev main process ended, respawning
| cloud-init start-local running: Thu, 14 Apr 2011 11:19:24 +0000. up 1.32 seconds
| no instance data found in start-local
| init: cloud-init-local main process (243) terminated with status 1
| cloud-init-nonet waiting 60 seconds for a network device.
| cloud-init-nonet gave up waiting for a network device.

Revision history for this message
Scott Moser (smoser) wrote :

I'm attaching a similar failure in oneiric. It was fixed with reboot.
