Karmic 2.6.31-7.27 KSM patchset breaks encrypted swap

Bug #418781 reported by Dustin Kirkland 
This bug affects 10 people
Affects | Status | Importance | Assigned to | Milestone
Linux | Fix Released | High | |
linux (Ubuntu) | Fix Released | High | Tim Gardner |
  Karmic | Fix Released | High | Tim Gardner |

Bug Description

Installed the latest Ubuntu kernel today, 2.6.31-7.27, but I can't boot into it, because it's missing padlock-sha.ko, which is required for encrypted swap.

:-Dustin

ProblemType: Bug
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: CONEXANT Analog [CONEXANT Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: kirkland 4035 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf2620000 irq 17'
   Mixer name : 'Conexant CX20561 (Hermosa)'
   Components : 'HDA:14f15051,17aa20ff,00100000'
   Controls : 15
   Simple ctrls : 8
Date: Tue Aug 25 13:29:45 2009
DistroRelease: Ubuntu 9.10
MachineType: LENOVO 7454CTO
Package: linux-image-2.6.31-6-generic 2.6.31-6.26
PccardctlIdent:

PccardctlStatus:

ProcCmdLine: root=UUID=d45ce184-de1d-48ac-a143-44ab4432a207 ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-6.26-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-6-generic N/A
 linux-firmware 1.16
SourcePackage: linux
Uname: Linux 2.6.31-6-generic x86_64
WpaSupplicantLog:

dmi.bios.date: 04/22/2009
dmi.bios.vendor: LENOVO
dmi.bios.version: 6DET44WW (2.08 )
dmi.board.name: 7454CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6DET44WW(2.08):bd04/22/2009:svnLENOVO:pn7454CTO:pvrThinkPadX200:rvnLENOVO:rn7454CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 7454CTO
dmi.product.version: ThinkPad X200
dmi.sys.vendor: LENOVO

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

1) Upgrade to grubby-7.0.2-1.fc12.x86_64
2) Install grubby-7.0.2-1.fc12.x86_64

title Fedora (2.6.31-0.145.rc5.git3.fc12.x86_64)
 root (hd0,0)
 kernel /vmlinuz-2.6.31-0.145.rc5.git3.fc12.x86_64 ro root=/dev/mapper/vg0-rootfs rhgb quiet usbcore.autosuspend=1 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us rd_plytheme=charge
 initrd /initrd-generic-2.6.31-0.145.rc5.git3.fc12.x86_64.img

This stanza was written by new-kernel-pkg. Note that it is booting the initrd-generic shipped within the kernel RPM.

3) Attempt boot. My LVM vg is encrypted, with rootfs on a lv within that vg. I type my passphrase, it successfully unlocks it, but then sits there forever.

key slot 0 unlocked.
                    Command successful.

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

Oops, step #2 should be: install kernel-2.6.31-0.145.rc5.git3.fc12.x86_64

This is with dracut-0.8-1.fc12.noarch. I tried to create a new initrd image with dracut, but that image exhibits the same problem as initrd-generic.

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

dracut on kernel-2.6.31-0.125.rc5.git2.fc12.x86_64 works. This seems to be a problem with kernel-2.6.31-0.145.rc5.git3.fc12.x86_64. Reassigning.

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

kernel-2.6.31-0.145.2.1.rc5.git3.fc12.x86_64 is broken in the same manner.

Revision history for this message
In , Sven (sven-redhat-bugs) wrote :

Same here. The last kernel that is working for me is kernel-2.6.31-0.139.rc5.git3.fc12.x86_64

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 mkinitrd FAIL
kernel-2.6.31-0.149.rc5.git3.fc12.x86_64 dracut FAIL

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

I built a LiveCD with kernel-2.6.31-0.149.rc5.git3.fc12.x86_6 + dracut. It gets stuck forever without any error messages and just fails to boot. It seems this has nothing to do with encrypted root.

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 mkinitrd
GOOD kernel-2.6.31-0.139.rc5.git3.fc12.x86_64 dracut
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 mkinitrd
FAIL kernel-2.6.31-0.142.rc5.git3.fc12.x86_64 dracut

Confirmed, it broke somewhere between 139 and 142.
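Warren's GOOD/FAIL runs are a manual bisection over the ordered Fedora builds. Below is a minimal C sketch of the same narrowing. It assumes that every build before the regression boots and every build from the regression onward hangs. The names `first_bad_build` and `example_build_is_good` are hypothetical. The predicate does not boot anything: it simply encodes the later finding in this thread that the KSM patches arrived in 0.141.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "boot this build and see whether it unlocks
 * the disk". Per the thread, the KSM patches arrived in 0.141, so builds
 * below 141 are treated as good. */
static bool example_build_is_good(int build)
{
    return build < 141;
}

/* Returns the index of the first build for which is_good() is false.
 * Assumes builds[] is sorted and the good/bad split is monotonic.
 * Returns n if every build is good. */
size_t first_bad_build(const int *builds, size_t n, bool (*is_good)(int))
{
    size_t lo = 0, hi = n;              /* first bad index lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (is_good(builds[mid]))
            lo = mid + 1;               /* regression is after mid */
        else
            hi = mid;                   /* mid is bad: regression at or before mid */
    }
    return lo;
}
```

Each step halves the remaining range, so only a few test boots are needed even across many builds.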

Revision history for this message
In , Dennis (dennis-redhat-bugs) wrote :

kernel-2.6.31-0.149.rc5.git3.fc12.sparc64 works just fine here. I'm using unencrypted LVM.

dracut-0.7-4.fc12 was used in the kernel build

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

I'm confused. The very same livecd of Comment #6 works today, but the kernel installed on my laptop silently gets stuck after unlocking the encrypted disk.

Sven, are you using encryption? An encrypted LVM VG specifically?

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

http://people.redhat.com/wtogami/temp/post139loop.jpg
SysRQ-p after it gets stuck. It appears to be stuck in a loop.

Revision history for this message
In , Sven (sven-redhat-bugs) wrote :

I am using encryption.

I can reproduce those hangs on two machines - both using full vg encryption.

Revision history for this message
In , Sven (sven-redhat-bugs) wrote :

If I'm not mistaken, F12 Alpha is going to ship with a kernel >139, and that would mean full disk encryption is broken for the Alpha. As this is a rather important feature, I think blocking on F12 Alpha is warranted.

Revision history for this message
In , Sven (sven-redhat-bugs) wrote :

This seems to be the "non-boot-side" of the same bug:

https://bugzilla.redhat.com/show_bug.cgi?id=517545

Revision history for this message
In , Tom (tom-redhat-bugs) wrote :

Yeah, I believe I can reproduce a similar issue by plugging in a USB hard drive with a LUKS-encrypted file system:
https://bugzilla.redhat.com/show_bug.cgi?id=517545

As reported there, works for 0.139, fails for later kernels up to and including
kernel-2.6.31-0.156.rc6.fc12.x86_64.

All these kernels boot fine on my unencrypted LVM, but exhibit "cryptsetup
won't die and consumes available cpu cycles".

I've posted SysRQ-p traces there....

Same issue?

Revision history for this message
In , Milan (milan-redhat-bugs) wrote :

It works with Linus' kernel. The patches that introduced the problem in Fedora are:

Kernel Samepage Merging (KSM).
 linux-2.6-ksm.patch
 linux-2.6-ksm-updates.patch

Quite a serious bug: probably all encrypted systems are unbootable now.

Revision history for this message
In , Milan (milan-redhat-bugs) wrote :

*** Bug 517545 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Sven (sven-redhat-bugs) wrote :

Both my rawhide machines are back to working state with kernel -157 (which disables the ksm-patches).

Revision history for this message
In , Milan (milan-redhat-bugs) wrote :

From the included KSM series, the problematic patch is:
Subject: [PATCH 9/12] ksm: fix oom deadlock

(fixes one deadlock...and introduces another one:-)

Revision history for this message
In , Tom (tom-redhat-bugs) wrote :

-157 fixes my "plugging in a USB hard drive with encrypted FS" issue.

FS now mounts and cryptsetup has properly exited.

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

The F12 Alpha kernel is kernel-2.6.31-0.125.4.2.rc5.git2.fc12, so I'm removing this from the Alpha blocker list.

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

Summary:

  - '[PATCH 9/12] ksm: fix oom deadlock' appears to cause deadlock with an
    encrypted root volume

  - This was added in 2.6.31-0.141.rc5.git3 by the addition of this set
    of KSM patches:

  http://cvs.fedoraproject.org/viewvc/rpms/kernel/devel/linux-2.6-ksm-updates.patch?revision=1.1

  - the KSM patches have been disabled since 2.6.31-0.157.rc6, pending
    a fix for this

Revision history for this message
In , Milan (milan-redhat-bugs) wrote :

> - '[PATCH 9/12] ksm: fix oom deadlock' appears to cause deadlock with an
> encrypted root volume

FYI: there is no need to have an encrypted root volume. Any "cryptsetup luksOpen" on x86_64 will cause the deadlock. For the process backtrace, see bug 517545.

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

Andrea suggests checking whether these programs are calling madvise() with bogus flags

Revision history for this message
In , Milan (milan-redhat-bugs) wrote :

(In reply to comment #23)
> Andrea suggests checking whether these programs are calling madvise() with
> bogus flags

Not explicitly, but it probably forgets to unlock memory. Try this code:

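#include <sys/mman.h>

/* Locks all current and future pages, then exits without unlocking.
 * On the affected KSM kernels, the process hangs while exiting. */
int main(void)
{
        mlockall(MCL_CURRENT | MCL_FUTURE);
        /* munlockall(); */  /* unlocking before exit should sidestep the hang */

        return 0;
}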
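On affected builds, the reproducer above hangs the exiting process inside the kernel. It can therefore be handy to run it under a watchdog rather than from an interactive shell. The sketch below assumes a POSIX system. `reproducer_exits_within` is a hypothetical helper, not part of cryptsetup or the kernel. It forks a child that calls mlockall() and exits without munlockall(). The parent then polls waitpid() with WNOHANG until a timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the child was reaped within timeout_ms, 0 if it still had
 * not exited (the symptom on affected kernels), -1 if fork() failed. */
int reproducer_exits_within(long timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: lock all pages and exit without munlockall().
         * mlockall() may fail under a low RLIMIT_MEMLOCK; that is ignored. */
        mlockall(MCL_CURRENT | MCL_FUTURE);
        _exit(0);
    }

    struct timespec tick = { 0, 10 * 1000000L };    /* poll every 10 ms */
    for (long waited = 0; waited <= timeout_ms; waited += 10) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 1;                               /* child exited normally */
        nanosleep(&tick, NULL);
    }
    return 0;                                       /* child still not reaped */
}
```

A return of 0 only means the child had not been reaped in time. On an affected kernel, the stuck child is already exiting, so it probably cannot be cleaned up by the parent. Run this on a test machine.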

Revision history for this message
In , Andrea (andrea-redhat-bugs) wrote :

Investigating why those troublesome checks, which deadlock mlocked programs, were added to the page fault path... At first glance they look unnecessary, so I'm asking just in case...

Date: Tue, 25 Aug 2009 16:58:32 +0200
From: Andrea Arcangeli <email address hidden>
To: Hugh Dickins <email address hidden>
Cc: Izik Eidus <email address hidden>, Rik van Riel <email address hidden>,
        Chris Wright <email address hidden>,
        Nick Piggin <email address hidden>,
        Andrew Morton <email address hidden>,
        <email address hidden>, <email address hidden>
Subject: Re: [PATCH 9/12] ksm: fix oom deadlock

On Mon, Aug 03, 2009 at 01:18:16PM +0100, Hugh Dickins wrote:
> tables which have been freed for reuse; and even do_anonymous_page
> and __do_fault need to check they're not being called by break_ksm
> to reinstate a pte after zap_pte_range has zapped that page table.

This deadlocks exit_mmap in an infinite loop when there's some region
locked. mlock calls gup and pretends to page fault successfully if
there's a vma existing on the region, but it doesn't page fault
anymore because of the mm_count being 0 already, so follow_page fails
and gup retries the page fault forever. And generally I don't like to
add those checks to page fault fast path.

Given we check mm_users == 0 (ksm_test_exit) after taking mmap_sem in
unmerge_and_remove_all_rmap_items, why do we actually need to care
that a page fault happens? We hold mmap_sem so we're guaranteed to see
mm_users == 0 and we won't ever break COW on that mm with mm_users ==
0 so I think those troublesome checks from page fault can be simply
removed.
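Andrea's description can be modelled in a few lines. This is a toy userspace model, not kernel code. `toy_mm`, `toy_follow_page`, `toy_handle_fault` and `toy_gup` are made-up names standing in for the mm, follow_page(), the fault handler and gup's retry loop. With the exit check in the fault path, the "fault" returns without installing anything, so the lookup keeps failing. A `max_tries` cap stands in for "forever".

```c
#include <stdbool.h>

/* Toy stand-in for an mm: whether it is exiting, and whether the one
 * page we care about currently has a pte installed. */
typedef struct {
    bool exiting;
    bool pte_present;
} toy_mm;

/* Models follow_page(): succeeds only if a pte is present. */
static bool toy_follow_page(const toy_mm *mm)
{
    return mm->pte_present;
}

/* Models the fault handler. With ksm_exit_check enabled and the mm
 * exiting, it returns without installing a pte (the troublesome check). */
static void toy_handle_fault(toy_mm *mm, bool ksm_exit_check)
{
    if (ksm_exit_check && mm->exiting)
        return;
    mm->pte_present = true;
}

/* Models gup's loop: look up, fault, retry. Returns the number of fault
 * attempts before the lookup succeeded, or -1 if max_tries was reached. */
int toy_gup(toy_mm *mm, bool ksm_exit_check, int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        if (toy_follow_page(mm))
            return tries;
        toy_handle_fault(mm, ksm_exit_check);
    }
    return -1;
}
```

With the check removed, as Andrea proposes, one fault installs the pte and the loop ends on the next lookup.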

Revision history for this message
In , Andrea (andrea-redhat-bugs) wrote :

Created attachment 358588
attempted fix (last one was wrong diff)

Revision history for this message
Dustin Kirkland  (kirkland) wrote :
Changed in linux (Ubuntu):
importance: Undecided → High
assignee: nobody → Tim Gardner (timg-tpi)
Jonathan Davies (jpds)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Hrm, I couldn't reproduce this in a fresh install :-/ Closing for now.

:-Dustin

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Check that...

I have reproduced the problem. Fresh install from today's iso was just fine until I updated. Then it exploded.

I can boot just fine using 2.6.31-6. But not 2.6.31-7.

:-Dustin

Changed in linux (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
In , Andrea (andrea-redhat-bugs) wrote :

Created attachment 358624
new proposed patch

This actually makes ksm_exit simpler, and it already contains down_write(mmap_sem).

(also this time I checked which workstation I'm running firefox on, before picking a random file from /tmp ;)

Discussion is ongoing on linux-mm with Hugh.

Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: padlock-sha kernel module fails to load

The actual message is:

sudo insmod /lib/modules/2.6.31-7-generic/kernel/drivers/crypto/padlock-sha.ko:
insmod: error inserting '/lib/modules/2.6.31-7-generic/kernel/drivers/crypto/padlock-sha.ko': -1 No such device

dmesg says:

padlock: VIA PadLock Hash Engine not detected.

:-Dustin

summary: - padlock-sha kernel module missing from the build
+ padlock-sha kernel module fails to load
Revision history for this message
Tim Gardner (timg-tpi) wrote :

I'm pretty sure that message is a red herring: the PadLock crypto engines depend on VIA CPUs, which I doubt you have. From the file drivers/crypto/padlock-sha.c:

/* Support for VIA PadLock hardware crypto engine. */

The first thing this module checks is to see if the CPU has cpu_has_phe capability (Padlock Hash Engine), and prints the above error message if not.
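Tim's point can be checked from userspace. The kernel exposes the PadLock Hash Engine as the `phe` token on the `flags` line of /proc/cpuinfo (the flag name is assumed from the x86 cpufeature list). The sketch below is a hypothetical userspace model of the module's first check. It returns -ENODEV when the flag is absent, which is the errno that insmod renders as "No such device". `has_cpu_flag` and `padlock_sha_probe_model` are illustrative names, not kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* True if `flag` is a whole whitespace-separated token in `flags`,
 * e.g. the value of a "flags" line from /proc/cpuinfo. */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t want = strlen(flag);
    if (want == 0)
        return false;
    for (const char *p = flags; *p != '\0';) {
        p += strspn(p, " \t\n");            /* skip separators */
        size_t len = strcspn(p, " \t\n");   /* length of current token */
        if (len == want && strncmp(p, flag, len) == 0)
            return true;
        p += len;
    }
    return false;
}

/* Hypothetical model of padlock-sha's first probe step: without the
 * PadLock Hash Engine flag, loading fails with -ENODEV. */
int padlock_sha_probe_model(const char *flags)
{
    return has_cpu_flag(flags, "phe") ? 0 : -ENODEV;
}
```

Matching whole tokens matters because VIA CPUs also report `phe_en`, which a plain substring search would confuse with `phe`. On a non-VIA machine such as the reporter's ThinkPad, the token is absent, so the load failure is unrelated to the boot problem.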

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Reverted 'UBUNTU: SAUCE: (drop after 2.6.31) Added KSM from mmotm-2009-08-20-19-18' and all started working.

Revision history for this message
In , Andrea (andrea-redhat-bugs) wrote :

Hugh acked my attachment 358624, so please apply it and then we can close this bug. We still have some issues to discuss on OOM handling with KSM on linux-mm, but those aren't critical, and once we solve them, patches will flow into rawhide.

thanks!

Revision history for this message
In , Justin (justin-redhat-bugs) wrote :

Already applied and should be in kernel-2.6.31-0.180.rc7.git4.fc12 today. KSM has been re-enabled.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

The KSM patch set appears to be causing instabilities in a variety of areas, the most notable of which is encrypted swap. Vanilla 2.6.31-rc7 does not have any issues with encrypted swap, so I'm dropping the whole patch set pending more extensive testing.

Changed in linux (Ubuntu Karmic):
milestone: none → karmic-alpha-5
status: Confirmed → Fix Committed
summary: - padlock-sha kernel module fails to load
+ Karmic 2.6.31-7.27 KSM patchset breaks encrypted swap
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.31-8.28

---------------
linux (2.6.31-8.28) karmic; urgency=low

  [ Ike Panhc ]

  * [Config] Let nic-shared-modules depends on crypto-modules
    - LP: #360966

  [ Leann Ogasawara ]

  * [Upstream] (drop after 2.6.31) drm/i915: increase default latency
    constant
    - LP: #412492

  [ Mario Limonciello ]

  * [Upstream]: (drop after 2.6.31) dell-laptop: don't change softblock
    status if HW switch is disabled
    - LP: #418721
  * [Upstream]: (drop after 2.6.31) compal-laptop: Add support for known
    Compal made Dell laptops
  * [Upstream]: (drop after 2.6.31) compal-laptop: Replace sysfs support
    with rfkill support

  [ Tim Gardner ]

  * [Config] Add acpiphp to virtual sub-flavour
    - LP: #364916
  * Drop KSM patch set for now because of instabilities with encrypted swap.
    - LP: #418781

 -- Tim Gardner <email address hidden> Wed, 26 Aug 2009 08:14:26 -0600

Changed in linux (Ubuntu Karmic):
status: Fix Committed → Fix Released
Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 418781] Re: padlock-sha kernel module fails to load

Good, thanks, Tim.

If you want to cook a KSM kernel off to the side somewhere, in a PPA,
perhaps, I can do a bit more testing...

:-Dustin

Revision history for this message
Luka Renko (lure) wrote :

I had the same problem, but I have only /home on dm-crypt+LVM. See bug 418901.

Revision history for this message
Jonathan Davies (jpds) wrote :

For the record, I had the same problem Luka had.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

KSM has been fixed by Fedora. If we sync up with their patch set, we should be able to re-enable KSM for Karmic.

:-Dustin

Changed in linux (Ubuntu Karmic):
status: Fix Released → Triaged
Changed in linux:
status: Unknown → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote :

deferring milestone, per Tim

Changed in linux (Ubuntu Karmic):
milestone: karmic-alpha-5 → karmic-alpha-6
Revision history for this message
Tim Gardner (timg-tpi) wrote :
Revision history for this message
Luka Renko (lure) wrote :

I have tested the above packages (URL version is misleading):
Linux version 2.6.31-10-generic (rtg@lochsa) (gcc version 4.4.1 (Ubuntu 4.4.1-3ubuntu3) ) #30 SMP Tue Sep 1 22:28:12 UTC 2009

It does properly boot on my system (/home on dm-crypt+LVM, root on LVM), where it failed with previous KSM kernel.
I still need the workaround for bug 421876, but that is probably not a kernel issue. Most likely it is an interaction (timing issue) between cryptsetup, LVM and udev.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

@Luka - thanks for the test feedback (sorry about the misleading URL)

Revision history for this message
Tim Gardner (timg-tpi) wrote :

I'm dropping the KSM stuff altogether. It appears to cause some weird issues, random crashes, etc. Dustin (on the server team) says there is no great hue and cry for this feature.

Changed in linux (Ubuntu Karmic):
status: Triaged → In Progress
Revision history for this message
Dustin Kirkland  (kirkland) wrote : Re: [Bug 418781] Re: Karmic 2.6.31-7.27 KSM patchset breaks encrypted swap

Right, just to go on record here... The KSM stuff was never in plan
for Karmic. I asked Tim to grab that patchset for a little testing in
early August. The testing showed that it's not quite ready yet, and I
agree that we should drop it for Karmic at this point. It should be
in the upstream kernel by the time we sync for the 10.04 development
cycle and we'll focus a bit more effort at that time.

Cheers,
:-Dustin

Changed in linux (Ubuntu Karmic):
status: In Progress → Fix Released
Changed in linux:
importance: Unknown → High