karmic: iSCSI root: boot hangs on starting iscsid

Bug #457767 reported by Kevin Otte
This bug affects 2 people
Affects                   Status         Importance   Assigned to    Milestone
linux (Ubuntu)            Invalid        Undecided    Unassigned
  Karmic                  Invalid        Undecided    Unassigned
open-iscsi (Ubuntu)       Fix Released   Undecided    Unassigned
  Karmic                  Invalid        Undecided    Unassigned
partman-iscsi (Ubuntu)    Fix Released   High         Colin Watson
  Karmic                  Fix Released   High         Colin Watson

Bug Description

Binary package hint: debian-installer

Filing against d-i since I've been working with cjwatson on the iSCSI installer bits.

I've performed a root on iSCSI install from the network using the latest netboot image. The install completes and I am able to start booting into the installed environment.

After the IP config and iscsistart succeed, I get the following:

 * Setting preliminary keymap...
 * Starting iSCSI initiator service iscsid
[ ###.######] INFO: task kjournald2:347 blocked for more than 120 seconds.
[ ###.######] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[ ###.######] INFO: task sync:606 blocked for more than 120 seconds.
[ ###.######] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Those two tasks bounce back and forth and nothing ever happens from there.

Kevin Otte (nivex) wrote :

Might this affect open-iscsi instead?

What can be done to get this fixed in time for release? Or will the line about root on iSCSI being supported need to be stricken from the Release Notes?

Colin Watson (cjwatson)
Changed in debian-installer (Ubuntu Karmic):
milestone: none → ubuntu-9.10
Colin Watson (cjwatson)
Changed in debian-installer (Ubuntu Karmic):
importance: Undecided → High
Kyle Kienapfel (doctor-whom) wrote :

I think this bug should be filed against the init scripts.

Mathias Gug (mathiaz)
Changed in debian-installer (Ubuntu Karmic):
assignee: nobody → Mathias Gug (mathiaz)
Mathias Gug (mathiaz) wrote :

Creating an iscsi target system with the iscsitarget package:

1. sudo apt-get install iscsitarget

2. Create a 2 GB file (/srv/disk1) to be served via iscsi:
sudo dd if=/dev/zero of=/srv/disk1 bs=1000 count=2000000

3. Enable iscsi target in /etc/default/iscsitarget.

4. Configure the iscsi target to serve /srv/disk1 in /etc/ietd.conf:

Lun 0 Path=/srv/disk1,Type=fileio

5. Start iscsitarget.
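
The target-side steps above can be sketched as a short script. This is only a sketch under stated assumptions: the `Target` line and the `ISCSITARGET_ENABLE` variable name are not given in this comment, the IQN is the one used in the dnsmasq configuration later in this report, and `ietd_stanza` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print an ietd.conf stanza exporting one file-backed LUN.
#   $1 = target IQN (assumed here; the comment above only shows the Lun line)
#   $2 = backing file
ietd_stanza() {
    printf 'Target %s\n' "$1"
    printf '    Lun 0 Path=%s,Type=fileio\n' "$2"
}

# Size check for step 2: bs=1000 times count=2000000, in bytes.
echo $((1000 * 2000000))

# Print the stanza for review before appending it to /etc/ietd.conf.
ietd_stanza iqn.2001-04.com.example:storage.disk1 /srv/disk1

# On the target host, as root (not run here):
#   dd if=/dev/zero of=/srv/disk1 bs=1000 count=2000000
#   # Assumption: the enable switch in /etc/default/iscsitarget is ISCSITARGET_ENABLE.
#   sed -i 's/^ISCSITARGET_ENABLE=false/ISCSITARGET_ENABLE=true/' /etc/default/iscsitarget
#   ietd_stanza iqn.2001-04.com.example:storage.disk1 /srv/disk1 >> /etc/ietd.conf
#   /etc/init.d/iscsitarget start
```

Printing the stanza first makes it easy to check the path and target name before touching the real configuration file.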

Mathias Gug (mathiaz) wrote :

Creating a non-root iscsi initiator system (i.e. client) with open-iscsi:

1. sudo apt-get install open-iscsi
2. Discover the iscsi target:

sudo iscsiadm -m discovery -t sendtargets -p [IP|HOSTNAME]_OF_TARGET

3. Log in to the remote iscsi target:

sudo iscsiadm -m node -T TARGET_NAME -p IP:PORT -l

4. Set automatic login on boot:

sudo iscsiadm -m node -T TARGET_NAME -p IP:PORT --op update -n node.startup -v automatic

This creates a configuration where the iscsi target is treated as a normal local block device. It doesn't create a root-on-iscsi system.

This configuration works correctly: on system boot, /dev/sda (in my environment) is created.
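
As an illustration, the three iscsiadm steps above can be written out for concrete values. This is a sketch only: the helper name `iscsi_setup_cmds` is made up, the target name and IP address are borrowed from the dnsmasq configuration later in this report, and port 3260 is assumed as the usual iSCSI port. The helper prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the iscsiadm commands from the steps above.
#   $1 = target name (IQN)
#   $2 = portal as IP:PORT
iscsi_setup_cmds() {
    target=$1
    portal=$2
    # Step 2: discover targets offered by the portal.
    echo "iscsiadm -m discovery -t sendtargets -p $portal"
    # Step 3: log in to the chosen target.
    echo "iscsiadm -m node -T $target -p $portal -l"
    # Step 4: mark the node for automatic login at boot.
    echo "iscsiadm -m node -T $target -p $portal --op update -n node.startup -v automatic"
}

iscsi_setup_cmds iqn.2001-04.com.example:storage.disk1 192.168.222.2:3260
```

Once the login has succeeded, `iscsiadm -m session` lists the active sessions, which is a quick way to confirm that the block device should be present.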

Kyle Kienapfel (doctor-whom) wrote : Re: [Bug 457767] Re: karmic: iSCSI root: boot hangs on starting iscsid

Clicking the reply button in my email (sorry if this splatters)

I isolated the problem to something in the init system by adding
break=init to the kernel command line and then chrooting into /root:

chroot root
mount / -o remount,rw
chroot root /sbin/getty 38400 tty2

I can log in and even do an apt-get update.

On Sun, Oct 25, 2009 at 3:39 PM, Mathias Gug <email address hidden> wrote:

> Creating a non-root iscsi initiator system (ie client) with open-iscsi:
>
> 1. sudo apt-get install open-iscsi
> 2. Discover the iscsi target:
>
> sudo iscsiadm -m discovery -t sendtargets -p [IP|HOSTNAME]_OF_TARGET
>
> 3. Login into the remote iscsi target:
>
> sudo iscsiadm -m node -T TARGET_NAME -p IP:PORT -l
>
> 4. Set automatic login on boot:
>
> sudo iscsiadm -m node -T TARGET_NAME -p IP:PORT --op update -n
> node.startup -v automatic
>
> This creates a configuration where the iscsi target is considered as a
> normal local block device. It doesn't create a root-on-iscsi system.
>
> This configuration is working correctly (ie on system boot /dev/sda (in
> my environement) is created).

Mathias Gug (mathiaz)
Changed in debian-installer (Ubuntu Karmic):
status: New → Confirmed
Mathias Gug (mathiaz) wrote :

Marking the bug confirmed as I've been able to reproduce it in a kvm+libvirt environment using gPXE to boot off the iscsi target.

To reproduce, you'll need two guests and control over the DHCP server configuration (to send specific DHCP options to the iscsi initiator system).

Using libvirt:

1. Create a network that doesn't have a dhcp server running:

<network>
  <name>net1</name>
  <uuid>44648543-d59d-4e61-94b4-61450fff3474</uuid>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' forwardDelay='0' />
  <ip address='192.168.222.1' netmask='255.255.255.0' />
</network>

2. Install a guest that will act as the iscsi target (see comment 4). Moreover install dnsmasq in the guest: the iscsi target will serve as the boot server for the network (dhcp+tftpboot). Configure dnsmasq to act as the boot server on the network:

mathiaz@itarget:~$ cat /etc/dnsmasq.d/pxe
#DHCP part
domain=example.org
dhcp-range=192.168.222.100,192.168.222.150,255.255.255.0,2h
dhcp-option=option:router,192.168.222.1
dhcp-authoritative

# TFTP part
enable-tftp
tftp-root=/srv/tftproot

dhcp-match=gpxe,175 # tags the request with net:gpxe if the gPXE option was supplied in DHCP request
dhcp-option=175,8:1:1 # turn on the keep-san option to allow installation
dhcp-boot=net:#gpxe,virtio-net.pxe # Here #gpxe means 'not gpxe': that is the tag is not set
dhcp-option=net:gpxe,17,"iscsi:192.168.222.2::::iqn.2001-04.com.example:storage.disk1"

3. Get a virtio-net PXE boot ROM (Unload PXE stack) from http://rom-o-matic.net/ and put it in /srv/tftproot/ as virtio-net.pxe (as outlined in the configuration file above).

4. Define a guest with no block devices:

mathiaz@uec-node:~/images/client1$ cat libvirt.xml
<domain type='kvm'>
  <name>client1</name>
  <uuid>1adaabdc-ea8d-4328-87cb-8aa47f64acd4</uuid>
  <memory>512000</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='network'/>
    <boot dev='cdrom' />
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='cdrom'>
      <source file='/home/mathiaz/isos/mini.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <interface type='network'>
      <mac address='52:54:00:d1:ea:a1'/>
      <source network='net1'/>
      <model type='virtio'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5902' autoport='yes' listen='0.0.0.0'/>
  </devices>
</domain>

The guest will first try to boot from the network, then from the mini.iso file.

5. Start the install on iSCSI: connect to the vnc console of the guest and skip the network boot (Q). The mini.iso prompt should show up. Start the install with "cli iscsi=true".

6. Go through the install, skipping any block device installation. At partitioning time, start by logging in to the iscsi target and then install to the newly created SCSI device.

7. When the installation has finished reboot the system and boot from the network. The guest should chainload the gPXE rom from the network. gPXE should then...
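
The string sent in DHCP option 17 (root path) in step 2 follows the iSCSI root-path layout `iscsi:<server>:<protocol>:<port>:<lun>:<targetname>`, where empty fields mean "use the default". A minimal sketch of how that string breaks down, assuming the RFC 4173 defaults (protocol 6, port 3260, LUN 0); the function name `parse_iscsi_root_path` is made up for illustration:

```shell
#!/bin/sh
# Split an iSCSI root-path string into its fields.
# Empty fields fall back to the RFC 4173 defaults.
parse_iscsi_root_path() {
    # With IFS=":" the last variable receives the rest of the line,
    # so the colon inside the target name (the IQN) is preserved.
    IFS=: read -r scheme server proto port lun target <<EOF
$1
EOF
    [ "$scheme" = "iscsi" ] || return 1
    echo "server=$server"
    echo "protocol=${proto:-6}"
    echo "port=${port:-3260}"
    echo "lun=${lun:-0}"
    echo "target=$target"
}

parse_iscsi_root_path "iscsi:192.168.222.2::::iqn.2001-04.com.example:storage.disk1"
```

For the value used in the dnsmasq configuration this prints server=192.168.222.2, protocol=6, port=3260, lun=0 and target=iqn.2001-04.com.example:storage.disk1, so the four consecutive colons simply select the default protocol, port and LUN.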

Mathias Gug (mathiaz) wrote :

I'm not sure this is related to the init scripts. If I disable the open-iscsi init script (exit 0), the boot process still hangs, just a little later. It seems to be related to a kernel bug.

I'd also add that on my system I see an oops and IRQ 11 is being disabled. I've attached some screen shots of the boot process.

Mathias Gug (mathiaz) wrote :
Changed in debian-installer (Ubuntu Karmic):
assignee: Mathias Gug (mathiaz) → nobody
Kyle Kienapfel (doctor-whom) wrote :

I didn't get any kernel messages like that when running on hardware. Do I need to turn on debugging?

Kyle Kienapfel (doctor-whom) wrote :

Attaching a tar file with a replacement /etc/init that I used to get enough started so I can ssh in. I currently don't have time to step through the init system.

I've only tested modifying rc.conf to respond to run levels 0 and 6, and tty[1-6].conf.

Actually, this might be as simple as revising /etc/network/interfaces. I'll try that tomorrow. :-/

Andy Whitcroft (apw) wrote :

Looking at the iscsitarget kernel component it seems to be based on r214
of the upstream repository. Looking at that repository there are a number
of fixes therein but none seem obviously related to the hangs reported.

Looking at the original report and comparing that to Mathias', it's not 100%
certain they are showing the same problem. In particular, Mathias' case
seems to show issues somewhat earlier and reports an unwanted interrupt
which may well have been targeted at the virtio_pci module and may
indicate a virtio issue rather than an iscsitarget one.

In both cases we would want to know what the apparently hung processes
are waiting for. It may be possible to get more information on why we are
getting hung up here using 'sysrq-w' which should trigger a dump of any
blocked tasks to the console. Getting that output may help us here.

See below for the kernel documentation on how to trigger the sysrq:

* How do I use the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On x86 - You press the key combo 'ALT-SysRq-<command key>'. Note - Some
           keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
           also known as the 'Print Screen' key. Also some keyboards cannot
           handle so many keys being pressed at the same time, so you might
           have better luck with "press Alt", "press SysRq", "release SysRq",
           "press <command key>", release everything.
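
Where the key combination is awkward (a VNC console, for instance), the same request can usually be made by writing the command key to /proc/sysrq-trigger from a root shell, provided a shell is still usable on the affected system. A minimal sketch; `sysrq` is a made-up helper name and `SYSRQ_TRIGGER` is a hypothetical override that exists only so the function can be tried against a scratch file:

```shell
#!/bin/sh
# Request a SysRq action by writing its command key to the trigger file.
# SYSRQ_TRIGGER is a hypothetical override for dry runs; the default is
# the real kernel interface.
sysrq() {
    printf '%s\n' "$1" > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
}

# On the affected machine, as root (not run here):
#   echo 1 > /proc/sys/kernel/sysrq   # allow all functions for the keyboard combo
#   sysrq w                           # dump blocked (uninterruptible) tasks
#   dmesg | tail -n 50                # read the resulting report
```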

Colin Watson (cjwatson) wrote :

I'm not convinced that this is a kernel bug. I think that we have not been careful enough to stop Upstart tearing the network interface down and setting it back up again at boot, and that as a result the root filesystem has gone away.

Alt-Sysrq-w just says "SysRq : Show Blocked State".

The target itself seems to work fine; I can log into it from an ordinary system and fiddle about with its filesystem. Furthermore installation onto the target worked fine. I don't think iscsitarget is a likely source of problems here.

Colin Watson (cjwatson) wrote :

I can demonstrate quite straightforwardly that this is not a kernel bug. All that it takes to make the system boot cleanly is to change 'iface eth0 inet dhcp' to 'iface eth0 inet manual' in /etc/network/interfaces.

I'm looking into a better solution.
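
For anyone needing the manual workaround in the meantime, the one-line change described above can be sketched as follows. The helper name `set_iface_manual` is made up, `sed -i` assumes GNU sed, and the optional file argument is there only so the edit can be tried on a copy first:

```shell
#!/bin/sh
# Switch an interface stanza from dhcp to manual in an interfaces(5) file.
#   $1 = interface name, e.g. eth0
#   $2 = file to edit (defaults to /etc/network/interfaces)
set_iface_manual() {
    file=${2:-/etc/network/interfaces}
    sed -i "s/^iface $1 inet dhcp[[:space:]]*\$/iface $1 inet manual/" "$file"
}

# Example against a scratch copy:
#   cp /etc/network/interfaces /tmp/interfaces.test
#   set_iface_manual eth0 /tmp/interfaces.test
#   grep '^iface eth0' /tmp/interfaces.test
```

With the stanza set to manual, bringing the interface up at boot no longer reconfigures it via DHCP, so the connection that the root filesystem depends on is left alone.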

Changed in linux (Ubuntu Karmic):
status: New → Invalid
affects: debian-installer (Ubuntu Karmic) → open-iscsi (Ubuntu Karmic)
Colin Watson (cjwatson)
affects: open-iscsi (Ubuntu Karmic) → partman-iscsi (Ubuntu Karmic)
Changed in partman-iscsi (Ubuntu Karmic):
assignee: nobody → Colin Watson (cjwatson)
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-iscsi - 7

---------------
partman-iscsi (7) karmic; urgency=low

  * Work around netcfg/choose_interface not always being set, breaking our
    workaround for network interface configuration issues when the root
    filesystem is on iSCSI (LP: #457767).

 -- Colin Watson <email address hidden> Mon, 26 Oct 2009 22:57:26 +0000

Changed in partman-iscsi (Ubuntu Karmic):
status: In Progress → Fix Released
Mathias Gug (mathiaz) wrote :

Workaround working correctly. My test install can now boot from gPXE using an iscsi drive as its root file system.

Colin Watson (cjwatson)
Changed in open-iscsi (Ubuntu Karmic):
status: New → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.871-0ubuntu1

---------------
open-iscsi (2.0.871-0ubuntu1) lucid; urgency=low

  * New upstream release.
  * If the root filesystem is on iSCSI, prevent the network interface used
    for it from being brought up or down automatically (LP: #457767).
  * Backport from upstream:
    - Allow updating of discovery records (Hannes Reinecke).
    - Fix discovery record use, rather than always using iscsid.conf
      settings (Mike Christie).
 -- Colin Watson <email address hidden> Thu, 10 Dec 2009 18:19:20 +0000

Changed in open-iscsi (Ubuntu):
status: New → Fix Released