[4.8.0-14/ppc64el regression] rmmod scsi_debug keeps causing kernel oops

Bug #1626737 reported by Martin Pitt
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
linux (Ubuntu)     Invalid        Medium       Unassigned
systemd (Ubuntu)   Fix Released   Medium       Martin Pitt

Bug Description

Since upgrading to 4.8.0-14, the "storage" autopkgtest of systemd is broken. The test uses scsi_debug to get a test hard drive, which is reset between tests by unloading and reloading the module. This has worked fine so far (and still works on amd64/i386), but on ppc64el it now regularly triggers a kernel oops:

[ 161.120362] Unable to handle kernel paging request for data at address 0x00000000
[ 161.120468] Faulting instruction address: 0xc000000000538ecc
[ 161.120517] Oops: Kernel access of bad area, sig: 11 [#1]
[ 161.120555] SMP NR_CPUS=2048 NUMA pSeries
[ 161.120595] Modules linked in: dm_crypt dm_mod xts algif_skcipher af_alg sd_mod sg xt_TCPMSS xt_tcpudp iptable_mangle ghash_generic gf128mul vmx_crypto virtio_balloon ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache crc32c_generic btrfs xor raid6_pq ohci_pci ehci_pci ohci_hcd virtio_blk virtio_net ehci_hcd usbcore crc32c_vpmsum usb_common virtio_pci virtio_ring virtio [last unloaded: scsi_debug]
[ 161.121016] CPU: 0 PID: 5473 Comm: rmmod Not tainted 4.8.0-15-generic #16-Ubuntu
[ 161.121067] task: c00000005ae51980 task.stack: c00000005ef58000
[ 161.121110] NIP: c000000000538ecc LR: c000000000538ee0 CTR: c0000000000f7250
[ 161.121162] REGS: c00000005ef5b9f0 TRAP: 0300 Not tainted (4.8.0-15-generic)
[ 161.121213] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28002444 XER: 20000000
[ 161.121390] CFAR: c00000000009a8e0 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000538e98 c00000005ef5bc70 c000000000f67b00 ffffffffffffffff
               GPR04: d000000001302018 0000000000000002 0000000000000000 c0000000010d7b00
               GPR08: c000000000fa7b00 0000000000000063 0000000000000073 0000000000000004
               GPR12: 0000000028002844 c00000000fb80000 0000000000000000 0000000000000000
               GPR16: 0000000000000000 00000100331f11f0 00000000384b3890 00000000384b3848
               GPR20: 00000000384b3830 00000000384b3870 00000000384b38a8 00000000384b3888
               GPR24: 00003fffd23d6e70 c000000000ebdec8 fffffffffffffffe d000000001302018
               GPR28: c000000000ebdeb8 0000000000000000 0000000000000000 0000000000000000
[ 161.122099] NIP [c000000000538ecc] ddebug_remove_module+0x8c/0x160
[ 161.122143] LR [c000000000538ee0] ddebug_remove_module+0xa0/0x160
[ 161.122186] Call Trace:
[ 161.122205] [c00000005ef5bc70] [c000000000538e98] ddebug_remove_module+0x58/0x160 (unreliable)
[ 161.122280] [c00000005ef5bd10] [c00000000018961c] free_module+0x21c/0x3c0
[ 161.122333] [c00000005ef5bd60] [c000000000189a38] SyS_delete_module+0x278/0x2f0
[ 161.122394] [c00000005ef5be30] [c0000000000095e0] system_call+0x38/0x108
[ 161.122445] Instruction dump:
[ 161.122472] 3d42fff5 e92a63b8 7fa9e000 7d3d4b78 ebe90000 419e00bc 7d3e4b78 3b40fffe
[ 161.122561] 48000018 7fbfe000 7ffdfb78 7ffefb78 <ebff0000> 419e0060 e87e0010 7f64db78
[ 161.122651] ---[ end trace 5f19b96c7077a0e0 ]---
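
The key facts in the trace above are the faulting data address (0x00000000, which points to a NULL pointer dereference) and the NIP symbol, ddebug_remove_module+0x8c/0x160, reached from free_module during SyS_delete_module. The following Python sketch pulls those fields out of such a log. The function name `summarize_oops` and the regular expressions are illustrative assumptions tailored to the ppc64el format shown here; they are not part of any kernel tooling.

```python
import re

# Assumed patterns, written against the oops format quoted in this report.
ADDR_RE = re.compile(r"for data at address (0x[0-9a-f]+)")
NIP_RE = re.compile(r"NIP \[([0-9a-f]+)\] (\S+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")
# A call-trace frame looks like: [stack addr] [return addr] symbol+off/size
FRAME_RE = re.compile(r"\[[0-9a-f]+\] \[[0-9a-f]+\] (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log: str) -> dict:
    """Extract fault address, faulting function and call chain from an oops log."""
    addr = ADDR_RE.search(log)
    nip = NIP_RE.search(log)
    return {
        "fault_address": int(addr.group(1), 16) if addr else None,
        "function": nip.group(2) if nip else None,
        "offset": int(nip.group(3), 16) if nip else None,
        # Innermost frame first, in the order the kernel printed them.
        "call_chain": FRAME_RE.findall(log),
    }
```

Fed the log above, this reports ddebug_remove_module as the faulting function, with free_module and SyS_delete_module further up the call chain.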

This isn't reproducible by merely loading and unloading the module; it apparently needs to get some actual exercise first. Tomorrow morning I'll look for a simpler reproducer than running the systemd test.
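
As a starting point for such a reproducer, the load/exercise/unload cycle could be sketched in Python as below. This is not known to trigger the oops, and everything specific in it is an assumption: the device node passed in by the caller, the use of a `dd` write as the "exercise", and the helper names `cycle_commands` and `run_cycles`. It needs root and a disposable test machine.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def cycle_commands(device: str, mib: int = 8) -> List[Command]:
    """Commands for one load / exercise / unload cycle of scsi_debug.

    `device` is the block device scsi_debug creates (an assumption; the
    caller has to find it), `mib` is how much data to write to it.
    """
    return [
        ["modprobe", "scsi_debug"],
        ["udevadm", "settle"],  # wait for udev to finish processing the new device
        ["dd", "if=/dev/zero", f"of={device}", "bs=1M", f"count={mib}", "oflag=direct"],
        ["udevadm", "settle"],
        ["rmmod", "scsi_debug"],  # the step that oopses in this report
    ]


def run_cycles(
    device: str,
    iterations: int,
    runner: Callable[[Command], object] = subprocess.check_call,
) -> int:
    """Run the cycle `iterations` times; return the number of commands executed."""
    executed = 0
    for _ in range(iterations):
        for cmd in cycle_commands(device):
            runner(cmd)
            executed += 1
    return executed


# Hypothetical usage (device name is a placeholder, requires root):
# run_cycles("/dev/sdX", 20)
```

The `runner` parameter exists so the command sequence can be inspected or dry-run without touching the kernel.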

Martin Pitt (pitti)
tags: added: bot-stop-nagging
summary: - [4.8 regression] rmmod scsi_debug keeps causing kernel oops
+ [4.8.0-14/ppc64el regression] rmmod scsi_debug keeps causing kernel oops
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: kernel-4.8
Martin Pitt (pitti) wrote :

I adjusted the test to avoid "rmmod scsi_debug": https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=be77e470d8

So there's still a bug there, but it won't block testing any more at least. And rmmod is always a bit brittle anyway, so let's avoid it.

Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Medium
status: New → Fix Committed
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 231-9

---------------
systemd (231-9) unstable; urgency=medium

  * pid1: process zero-length notification messages again.
    Just remove the assertion, the "n" value was not used anyway. This fixes
    a local DoS due to unprocessed/unclosed fds which got introduced by the
    previous fix. (Closes: #839171) (LP: #1628687)
  * pid1: Robustify manager_dispatch_notify_fd()
  * test/networkd-test.py: Add missing writeConfig() helper function.

 -- Martin Pitt <email address hidden> Thu, 29 Sep 2016 23:39:24 +0200

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

I didn't find a simpler reproducer on the CLI, and the systemd test no longer calls rmmod, so there is no way left to trigger this.

Changed in linux (Ubuntu):
status: Confirmed → Invalid