kernel crash when mounting encrypted (device mapped) ext4 partition

Bug #1089818 reported by Martin Pitt
This bug affects 1 person
Affects          Status         Importance   Assigned to         Milestone
linux (Ubuntu)   Fix Released   High         Joseph Salisbury

Bug Description

The udisks2 tests (https://jenkins.qa.ubuntu.com/view/Raring/view/AutoPkgTest/job/raring-adt-udisks2/) are currently hanging forever. Investigation shows that they cause a kernel oops in the Luks test, which uncovers a regression in the kernel (presumably from 3.5 to 3.7).

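This can be reproduced with the following commands, which operate on a virtual scsi_debug device, so they won't change any data on your system as long as the scsi_debug disk really is /dev/sda. If your machine already has a disk by that name, the scsi_debug disk gets a different name, so check dmesg after the modprobe. But beware: this completely crashes and freezes your computer, so save all open documents first!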

sudo modprobe scsi_debug
sudo luksformat -t ext4 /dev/sda
sudo cryptsetup luksOpen /dev/sda treasure

Works fine up to here. But this causes the crash:

sudo mount /dev/mapper/treasure /mnt

This is not hardware specific. It happens on real iron and in kvm, on i386 and amd64. I tested it by booting the current amd64 desktop image in kvm, switching to VT1 (so that I can see the kernel oops), and running the above commands.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.7.0-5-generic 3.7.0-5.13
ProcVersionSignature: Ubuntu 3.7.0-5.13-generic 3.7.0-rc8
Uname: Linux 3.7.0-5-generic x86_64
ApportVersion: 2.7-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: martin 1879 F.... pulseaudio
CRDA:
 country DE:
  (2400 - 2483 @ 40), (N/A, 20)
  (5150 - 5250 @ 40), (N/A, 20), NO-OUTDOOR
  (5250 - 5350 @ 40), (N/A, 20), NO-OUTDOOR, DFS
  (5470 - 5725 @ 40), (N/A, 26), DFS
Date: Thu Dec 13 10:21:09 2012
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=a831a415-ccf4-4791-a0a0-320570fb4230
InstallationDate: Installed on 2012-07-12 (154 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120627)
MachineType: LENOVO 3323REG
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.7.0-5-generic root=UUID=8f327c01-56d7-401c-8bd1-5442854e3c85 ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.7.0-5-generic N/A
 linux-backports-modules-3.7.0-5-generic N/A
 linux-firmware 1.98
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/07/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET46WW (1.16 )
dmi.board.name: 3323REG
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET46WW(1.16):bd06/07/2010:svnLENOVO:pn3323REG:pvrThinkPadX201:rvnLENOVO:rn3323REG:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 3323REG
dmi.product.version: ThinkPad X201
dmi.sys.vendor: LENOVO

Revision history for this message
Martin Pitt (pitti) wrote :
tags: added: regression-release
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Martin Pitt (pitti) wrote :

Interestingly this does not crash if the cleartext device gets VFAT, i. e. with

  sudo luksformat /dev/sda

(no "-t ext4", it defaults to "-t vfat")

I tried with "-t ext2" and "-t ext3", and both seem to work.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Martin,

There is a newer kernel available, 3.7.0-6.14. Can you test with this latest Raring kernel and see if it also exhibits this bug?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It seems like there is an easy test case, so I'll see if I can reproduce it. If I can, I should be able to perform a bisect to identify the commit that introduced the regression.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I was able to reproduce this issue on Raring with the latest updates. Since I can reproduce, I'll perform a kernel bisect to try and identify the offending commit.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Screenshot of the crash. RIP: kthread_data

tags: added: performing-bisect
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Just some testing updates:

mainline v3.7-rc1 is good.
mainline v3.7-rc4 is good.
mainline v3.7-rc6 is good.
mainline v3.7-rc7 is bad (bug introduced).

I'll now perform a bisect between v3.7-rc6 and v3.7-rc7 to identify the specific commit.
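Narrowing the range to last good v3.7-rc6 / first bad v3.7-rc7 is what makes the bisect cheap: each test kernel halves the remaining candidates, so N commits need only about log2(N) boots. Below is a minimal Python sketch of that search. It assumes a linear list of revisions (which git bisect itself does not require) and a bug that stays once introduced; `first_bad`, `reproduces_oops` and the commit names are hypothetical, and a real kernel bisect would normally be driven by `git bisect`.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is True.

    Assumes revisions are ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and the bug stays once introduced.
    """
    good, bad = 0, len(revisions) - 1  # invariant: good index passes, bad index fails
    while bad - good > 1:
        mid = (good + bad) // 2
        # In a real kernel bisect this probe means: build, boot, run the reproducer.
        if is_bad(revisions[mid]):
            bad = mid
        else:
            good = mid
    return revisions[bad]


if __name__ == "__main__":
    # Hypothetical history: 15 made-up commits between the two tags tested above.
    history = ["v3.7-rc6"] + [f"commit-{i}" for i in range(1, 16)] + ["v3.7-rc7"]
    culprit = history.index("commit-9")  # pretend commit-9 introduced the oops
    probes = []

    def reproduces_oops(rev: str) -> bool:
        probes.append(rev)
        return history.index(rev) >= culprit

    print(first_bad(history, reproduces_oops), "found after", len(probes), "test kernels")
```

Running it prints `commit-9 found after 4 test kernels`: 16 steps between the two endpoints take 4 probes.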

Revision history for this message
Martin Pitt (pitti) wrote :

Thanks Joseph, I appreciate it!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Hi Martin,

I bisected down this bug to commit: 5db44863b6ebbb400c5e61d56ebe8f21ef48b1bd

I built a raring test kernel with this commit reverted. The kernel fixed the bug on my system. Can you also test the kernel to see if it resolves the bug for you?

The kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1089818/

You will need to install both the linux-image and linux-image-extra packages.

Thanks in advance!

Revision history for this message
Martin Pitt (pitti) wrote :

I confirm that with this test kernel the crash does not happen any more and udisks mounting of encrypted ext4 partitions works fine. I ran the original udisks test several times in a row.

Thanks Joseph!

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Martin,

Are you able to capture the complete backtrace on the console with your VM setup? If not, I'll set up a netconsole[0] on my bare metal system.

[0] https://wiki.ubuntu.com/Kernel/Netconsole
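On the receiving machine, netconsole output is just kernel log text sent as UDP datagrams (to port 6666 by default), so any UDP listener can record the full oops even when the crashing machine's screen is too small. A minimal Python sketch of such a listener, assuming the sending side was configured with this machine's IP and port 6666 as its target per the wiki page above; `capture` and `listen` are hypothetical helper names.

```python
import socket
from typing import Callable, List, Optional


def _print_now(text: str) -> None:
    # Flush immediately so nothing is lost if the listener is interrupted.
    print(text, end="", flush=True)


def capture(sock: socket.socket, max_packets: Optional[int] = None,
            emit: Callable[[str], None] = _print_now) -> List[str]:
    """Read netconsole datagrams from an already-bound UDP socket.

    Each datagram carries plain-text kernel log output; it is decoded,
    passed to emit() and collected. With max_packets=None it loops forever.
    """
    messages: List[str] = []
    while max_packets is None or len(messages) < max_packets:
        data, _sender = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace")
        emit(text)
        messages.append(text)
    return messages


def listen(port: int = 6666) -> None:
    """Print everything the crashing machine sends until interrupted (Ctrl+C)."""
    # 6666 is netconsole's default target port; it must match tgt-port on the sender.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        capture(sock)
```

Call `listen()` on the second machine before running the reproducer, and redirect its stdout to a file to keep the complete backtrace.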

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.7.0-7.15

---------------
linux (3.7.0-7.15) raring; urgency=low

  [ Chris J Arges ]

  * SAUCE: add eeprom_bad_csum_allow module parameter
    - LP: #1070182

  [ Leann Ogasawara ]

  * Add ceph to linux-image for virtual instances
    - LP: #1063784

  [ Serge Hallyn ]

  * SAUCE: net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD

  [ Tim Gardner ]

  * [Config] CONFIG_SLUB_DEBUG=y
    - LP: #1090308

  [ Upstream Kernel Changes ]

  * Revert "[SCSI] sd: Implement support for WRITE SAME"
    - LP: #1089818
 -- Leann Ogasawara <email address hidden> Wed, 12 Dec 2012 06:50:20 -0800

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Martin, commit 5db44863b6ebbb400c5e61d56ebe8f21ef48b1bd was reverted in Raring, so this bug is fixed in Ubuntu. However, I still would like to work with upstream to see if this can be fixed without reverting that commit in v3.8.

Changed in linux (Ubuntu):
status: Fix Released → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Martin Pitt (pitti) wrote :

> Are you able to capture the complete backtrace on the console with your VM setup?

Unfortunately not, kvm only gives me an 80x25 terminal. However, when I see the crash on my workstation on bare metal, I do see the complete one.

I don't have my real camera this week, so I could only take a photo with my mobile phone. It's rather crappy, but just about readable. Perhaps it helps to confirm that it's that patch.

Thanks for your efforts!

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux:
assignee: nobody → Joseph Salisbury (jsalisbury)
importance: Undecided → High
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Netconsole capture while reproducing bug.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Martin, one additional question: do you happen to have access to real SCSI hardware? It would be good to know whether this bug happens on real hardware as well as on scsi_debug.

Revision history for this message
Andy Whitcroft (apw) wrote :

Applied the official fix and backed out the revert. Moved this to Fix Committed to indicate that the next upload will carry a real fix. This does not mean that the current uploads are unprotected.

Changed in linux (Ubuntu):
status: Fix Released → Fix Committed
no longer affects: linux
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Martin,

I'd like to mark this bug as "Fix Released". Can you confirm this bug is resolved in the latest Raring release?

Thanks in advance

Revision history for this message
Martin Pitt (pitti) wrote :

Yes, I confirm that this has been fixed in raring. Thank you!

Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
