'modprobe aoe' on Karmic kernel oopses with AOE device from Jaunty

Bug #410198 reported by Dmitrii Zagorodnov
This bug affects 1 person
Affects           Status        Importance  Assigned to     Milestone
linux (Ubuntu)    Fix Released  High        Andy Whitcroft
  Karmic          Fix Released  High        Andy Whitcroft

Bug Description

To replicate the problem with two machines: on machine 'a', which is running Jaunty rather than Karmic, run vblade to export an AOE device on the network. On machine 'b', which is running Karmic, run 'modprobe aoe'. As soon as the /dev/etherd devices are discovered and registered, the kernel oopses and the devices are unusable.

Brian Murray (brian-murray) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. This bug did not have a package associated with it, which is important for ensuring that it gets looked at by the proper developers. You can learn more about finding the right package at https://wiki.ubuntu.com/Bugs/FindRightPackage. I have classified this bug as a bug in linux which provides the kernel.
For future reference you might be interested to know that a lot of applications have bug reporting functionality built in to them. This can be accessed via the Report a Problem option in the Help menu for the application with which you are having an issue. You can learn more about this feature at https://wiki.ubuntu.com/ReportingBugs.

affects: ubuntu → linux (Ubuntu)
Daniel Nurmi (nurmi) wrote :

Some more info on reproducing this bug:

Machine A is running Jaunty, with a number of vblade processes exporting AOE devices:

# uname -a
Linux gibson 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux

# apt-cache policy vblade
vblade:
  Installed: 16-1ubuntu2
  Candidate: 16-1ubuntu2
  Version table:
 *** 16-1ubuntu2 0
        500 http://mirror.eucalyptus jaunty/main Packages
        100 /var/lib/dpkg/status

Machine B is running Karmic. Attached is the dmesg output from before and after running 'modprobe aoe' (the latter contains the oops):

# uname -a
Linux foobar-desktop 2.6.31-5-generic #24-Ubuntu SMP Sat Aug 1 12:47:58 UTC 2009 x86_64 GNU/Linux
(see attached)

Andy Whitcroft (apw)
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: regression-potential
Andy Whitcroft (apw) wrote :

We are panicking when trying to register the block queue:

  [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add an uninitialized object, something is seriously wrong.
  [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
  [ 2645.959107] Call Trace:
  [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
  [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
  [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
  [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
  [...]

This implies that we have not initialised the device queue object (the
gendisk's queue, gd->queue), which comes from the aoedev object:

  aoeblk_gdalloc(void *vp)
  {
   struct aoedev *d = vp;
  [...]
   gd->queue = &d->blkq;
  [...]
  }

It seems we never initialise blkq. In other drivers the queue is
typically allocated with blk_init_queue(), whereas in this driver it is
embedded directly in the aoedev structure and never initialised
appropriately.

Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: New → In Progress
Andy Whitcroft (apw) wrote :

OK, I have attempted to fix the driver to initialise the request_queue structure correctly. I do not have a simple way to test this fix, so I have built fixed kernels and pushed them to the URL below. If you could test these and let me know whether they work better for you, that would be helpful. Kernels are here:

    http://people.canonical.com/~apw/lp410198-karmic/

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Daniel Nurmi (nurmi) wrote :

Greetings,

Thank you for tracking down and working on this issue; I've installed the new kernel:

Linux foobar-desktop 2.6.31-6-generic #25~lp410198apw1 SMP Tue Aug 11 13:35:31 UTC 2009 x86_64 GNU/Linux

With vblade running on another machine on the same broadcast network, I ran 'modprobe aoe' and we are seeing an oops:

Aug 12 09:03:09 foobar-desktop kernel: [ 73.109792] Modules linked in: aoe binfmt_misc lp ppdev i2c_piix4 psmouse virtio_console serio_raw virtio_balloon parport_pc parport e1000 virtio_pci virtio_ring virtio floppy fbcon tileblit font bitblit softcursor i915 drm i2c_algo_bit video output intel_agp
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109819] Pid: 6, comm: events/0 Not tainted 2.6.31-6-generic #25~lp410198apw1
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109822] RIP: 0010:[<ffffffff8125d74c>] [<ffffffff8125d74c>] blk_queue_make_request+0xc/0xa0
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109827] RSP: 0000:ffff88005bf83dc0 EFLAGS: 00010286
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109829] RAX: ffff88004ec15ba0 RBX: 0000000000000000 RCX: ffff880059ad7b00
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109832] RDX: 0000000000000010 RSI: ffffffffa01675d0 RDI: 0000000000000000
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109834] RBP: ffff88005bf83dd0 R08: ffff8800019d34e0 R09: 0000000000000058
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109837] R10: 0000000000000056 R11: 0000000000000000 R12: ffff88004b43d800
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109839] R13: ffffffffa01688c0 R14: ffff8800019d9000 R15: ffff8800019d9008
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109844] FS: 0000000000000000(0000) GS:ffff8800019c1000(0000) knlGS:0000000000000000
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109847] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109850] CR2: 0000000000000078 CR3: 000000004b467000 CR4: 00000000000006b0
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109863] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109866] Process events/0 (pid: 6, threadinfo ffff88005bf82000, task ffff88005bf88000)
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109870] ffff88005bf83dd0 ffff88005a40ba00 ffff88005bf83e00 ffffffffa01672d3
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109877] <0> ffff88005a40ba28 ffff88005a40ba00 ffffffffa01688c0 ffff8800019d9000
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109885] <0> ffff88005bf83e30 ffffffffa0168998 ffff88005a40ba28 ffff8800019d9000
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109897] [<ffffffffa01672d3>] aoeblk_gdalloc+0x73/0x1c0 [aoe]
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109904] [<ffffffffa01688c0>] ? aoecmd_sleepwork+0x0/0xf0 [aoe]
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109909] [<ffffffffa0168998>] aoecmd_sleepwork+0xd8/0xf0 [aoe]
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109914] [<ffffffffa01688c0>] ? aoecmd_sleepwork+0x0/0xf0 [aoe]
Aug 12 09:03:09 foobar-desktop kernel: [ 73.109935] [<ffffffff8106daa...

Thierry Carrez (ttx) wrote :

Setting this to High, as it is a blocker for Eucalyptus 1.6 delivery in Karmic.

Changed in linux (Ubuntu):
importance: Medium → High
status: Incomplete → In Progress
Steve Langasek (vorlon)
Changed in linux (Ubuntu Karmic):
milestone: none → karmic-alpha-5
Andy Whitcroft (apw) wrote :

OK, I've respun the patch; could you test the updated version and report back here? Thanks! Kernels are at the URL below:

     http://people.ubuntu.com/~apw/lp410198-karmic/

Changed in linux (Ubuntu Karmic):
status: In Progress → Incomplete
Daniel Nurmi (nurmi) wrote :

Greetings,

I've tested it ('modprobe aoe' with vblade running on another host), and it looks good. I was able to discover and mount the volume, read and write data on it, and unmount it.

Thank you again, it looks like this latest patch resolved the oops!

Thierry Carrez (ttx)
Changed in linux (Ubuntu Karmic):
status: Incomplete → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.31-6.26

---------------
linux (2.6.31-6.26) karmic; urgency=low

  [ Andy Whitcroft ]

  * [Config] enable CONFIG_AUFS_BR_RAMFS
    - LP: #414738
  * split out debian directory ready for abstraction
  * add printdebian target to find branch target
  * abstracted debian -- debian/files is not abstracted
  * abstracted debian -- packages must be built in debian/<pkg>
  * abstracted debian -- kernel-wedge needs to work in debian/
  * abstracted debian -- ensure we install the copyright file
  * abstracted-debian -- drop the debian directories from headers
  * abstracted-debian -- drop the debian directories from headers part 2
  * SAUCE: ubuntu-insert-changes -- follow abstracted debian
  * [Upstream] aoe: ensure we initialise the request_queue correctly V2
    - LP: #410198

  [ Luke Yelavich ]

  * [Config] Ports: Disable CONFIG_CPU_FREQ_DEBUG on powerpc-smp
  * [Config] Ports: Re-enable windfarm modules on powerpc64-smp
    - LP: #413150
  * [Config] Ports: Build all cpu frequency scaling governors into ports
    kernels
  * [Config] Ports: Build ext2 and ext3 modules into ports kernels
  * [Config] Ports: CONFIG_PACKET=y for all ports kernels
  * [Config] Ports: Enable PS3 network driver

  [ Stefan Bader ]

  * abstracted debian -- call $(DEBIAN)/rules using make

  [ Tim Gardner ]

  * [Config] Abstract the debian directory
  * SAUCE: Improve error reporting in postinst
    - LP: #358564

 -- Tim Gardner <email address hidden> Sun, 16 Aug 2009 20:33:28 -0600

Changed in linux (Ubuntu Karmic):
status: In Progress → Fix Released