kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703; RIP: 0010:[<ffffffffa034b160>] [<ffffffffa034b160>] rmap_remove+0x1a0/0x1c0 [kvm]

Bug #905246 reported by Shyam
This bug affects 8 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

After moving a good number of systems to Oneiric (Linux 3.0 kernel), we are periodically seeing "rcu_sched_state detected stall on CPU" messages. In one case, the stalls were preceded by this KVM BUG(). I am not sure whether the BUG is correlated with the stall, but the stalls started happening after it.

[106643.753653] rmap_remove: ffff88049a9567f8 1->BUG
[106643.763707] ------------[ cut here ]------------
[106643.773585] kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703!
[106643.783261] invalid opcode: 0000 [#2] SMP
[106643.793054] CPU 1
[106643.793198] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp vesafb kvm_intel kvm nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scst_vdisk iscsi_scst scst libcrc32c dm_iostat dcdbas dm_multipath joydev psmouse serio_raw ghes i7core_edac edac_core acpi_power_meter hed lp parport usbhid hid ses enclosure megaraid_sas bnx2
[106643.843476]
[106643.843478] Pid: 29316, comm: kvm Tainted: G D W 3.0.0-13-server #22-Ubuntu Dell Inc. PowerEdge R710/00NH4P
[106643.843483] RIP: 0010:[<ffffffffa034b160>] [<ffffffffa034b160>] rmap_remove+0x1a0/0x1c0 [kvm]
[106643.843506] RSP: 0018:ffff88091f815b38 EFLAGS: 00010296
[106643.843509] RAX: 000000000000003c RBX: ffff88049a9567f8 RCX: 00000000ffffffff
[106643.843511] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
[106643.843514] RBP: ffff88091f815b58 R08: 0000000000000000 R09: 0000000000000000
[106643.843516] R10: 0000000000005000 R11: 0000000000000001 R12: ffff880920448000
[106643.843518] R13: ffff88046a734fa0 R14: ffff88046a734fa0 R15: ffffea0000000000
[106643.843521] FS: 00007f96354817a0(0000) GS:ffff8804bfc00000(0000) knlGS:0000000000000000
[106643.843525] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[106643.843527] CR2: 00007f77e4e21000 CR3: 0000000922982000 CR4: 00000000000026e0
[106643.843532] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106643.843535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[106643.843538] Process kvm (pid: 29316, threadinfo ffff88091f814000, task ffff88091fd3dc80)
[106643.843540] Stack:
[106643.843542] 000000000000ece0 ffff88049a9567f8 ffff880920448000 ffff880920448000
[106643.843545] ffff88091f815b78 ffffffffa034b1b5 ffff88049a9567f8 00000000000000ff
[106643.843549] ffff88091f815bb8 ffffffffa034bfc3 ffff88091f815b88 ffff88046a734fa0
[106643.843552] Call Trace:
[106643.843567] [<ffffffffa034b1b5>] drop_spte+0x35/0x40 [kvm]
[106643.843581] [<ffffffffa034bfc3>] kvm_mmu_page_unlink_children+0x53/0xf0 [kvm]
[106643.843596] [<ffffffffa034c0a9>] kvm_mmu_prepare_zap_page+0x49/0x280 [kvm]
[106643.843602] [<ffffffff810b6719>] ? try_stop_cpus+0x59/0x70
[106643.843615] [<ffffffffa035351a>] kvm_mmu_zap_all+0x4a/0x90 [kvm]
[106643.843629] [<ffffffffa0348696>] kvm_arch_flush_shadow+0x16/0x30 [kvm]
[106643.843640] [<ffffffffa0330393>] __kvm_set_memory_region+0x323/0x950 [kvm]
[106643.843654] [<ffffffffa035d804>] ? create_pit_timer+0xb4/0xd0 [kvm]
[106643.843668] [<ffffffffa035e03b>] ? pit_load_count.isra.10+0xcb/0x110 [kvm]
[106643.843682] [<ffffffffa035e438>] ? kvm_pit_load_count+0x28/0x70 [kvm]
[106643.843693] [<ffffffffa0330a00>] kvm_set_memory_region+0x40/0x60 [kvm]
[106643.843704] [<ffffffffa0331a38>] kvm_vm_ioctl_set_memory_region+0x18/0x20 [kvm]
[106643.843714] [<ffffffffa0331c60>] kvm_vm_ioctl+0x220/0x2e0 [kvm]
[106643.843719] [<ffffffff811792c9>] do_vfs_ioctl+0x89/0x310
[106643.843723] [<ffffffff81097d65>] ? sys_futex+0x105/0x1a0
[106643.843727] [<ffffffff811679ed>] ? vfs_read+0x10d/0x180
[106643.843730] [<ffffffff811795e1>] sys_ioctl+0x91/0xa0
[106643.843736] [<ffffffff81607102>] system_call_fastpath+0x16/0x1b
[106643.843738] Code: 8b 16 48 89 10 eb b2 48 89 de 48 c7 c7 50 75 36 a0 31 c0 e8 ab d6 29 e1 0f 0b 48 89 de 48 c7 c7 cf 85 36 a0 31 c0 e8 98 d6 29 e1 <0f> 0b 48 89 de 48 c7 c7 b4 85 36 a0 31 c0 e8 85 d6 29 e1 0f 0b
[106643.843759] RIP [<ffffffffa034b160>] rmap_remove+0x1a0/0x1c0 [kvm]
[106643.843773] RSP <ffff88091f815b38>
[106643.870629] ---[ end trace 1df1b3861d030b7c ]---
[106711.721069] INFO: rcu_sched_state detected stall on CPU 5 (t=15000 jiffies)
[106711.721081] INFO: rcu_sched_state detected stall on CPU 3 (t=15000 jiffies)
[106711.721094] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies)
[106711.721103] INFO: rcu_sched_state detected stall on CPU 6 (t=15000 jiffies)
[106711.721112] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
[106891.478695] INFO: rcu_sched_state detected stall on CPU 3 (t=60030 jiffies)
[106891.478701] INFO: rcu_sched_state detected stall on CPU 1 (t=60030 jiffies)
[106891.478707] INFO: rcu_sched_state detected stall on CPU 5 (t=60030 jiffies)
[107071.236324] INFO: rcu_sched_state detected stall on CPU 2 (t=105060 jiffies)
[107071.236336] INFO: rcu_sched_state detected stall on CPU 1 (t=105060 jiffies)
[107250.993951] INFO: rcu_sched_state detected stall on CPU 6 (t=150090 jiffies)
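The oops and stall lines above can be read mechanically, which helps when comparing this report with traces from other machines. The sketch relies on a few facts. The byte in angle brackets on the `Code:` line is the instruction at RIP, and `0f 0b` is `ud2`, the instruction x86 `BUG()` emits, which is why the trap is reported as "invalid opcode". `[#2]` is the oops counter, so together with the `D` taint flag it indicates an earlier oops in the same boot that is not part of this paste. Call Trace entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm, so the PIT-timer and futex lines may be stale. The `t=` values in the stall lines are jiffies. The report does not state CONFIG_HZ, so the sketch estimates it from the printk timestamps, assuming both clocks advance together. All function names are hypothetical; this is a minimal Python sketch, not a kernel or Ubuntu tool.

```python
import re

# Hypothetical triage helpers for a pasted dmesg like the one above.
# Only the taint letters seen in this report are decoded.
TAINT_MEANINGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "D": "kernel died recently (an earlier OOPS or BUG)",
    "W": "a kernel warning (WARN) was issued earlier",
}
UD2 = b"\x0f\x0b"  # opcode emitted by x86 BUG(); it traps as "invalid opcode"
COMMON_HZ = (100, 250, 300, 1000)  # usual CONFIG_HZ choices on x86
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\] (\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
STALL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] INFO: rcu_sched_state detected stall on CPU \d+ \(t=(\d+) jiffies\)"
)


def oops_number(text):
    """N from '[#N]' on the oops header, i.e. the N-th oops since boot."""
    match = re.search(r"\[#(\d+)\]", text)
    return int(match.group(1)) if match else None


def decode_taint(text):
    """Map each letter after 'Tainted:' to its meaning."""
    match = re.search(r"Tainted: ((?:[A-Z] +)+)", text)
    if not match:
        return {}
    return {c: TAINT_MEANINGS.get(c, "not decoded here") for c in match.group(1).split()}


def trapping_bytes(text):
    """The bracketed byte on the 'Code:' line (instruction at RIP) plus the next byte."""
    match = re.search(r"Code:.*<([0-9a-f]{2})> ([0-9a-f]{2})", text)
    return bytes(int(b, 16) for b in match.groups()) if match else None


def reliable_frames(text):
    """(symbol, offset, size) for Call Trace entries that are not marked '?'."""
    frames, in_trace = [], False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
        elif "Code:" in line:
            break  # the trace ends where the Code: dump begins
        elif in_trace:
            match = FRAME_RE.search(line)
            if match and match.group(2) is None:
                frames.append((match.group(3), int(match.group(4), 16), int(match.group(5), 16)))
    return frames


def estimate_hz(text):
    """Jiffies per printk second between the first and last distinct t= values,
    returned raw and snapped to the nearest common HZ."""
    first_seen = {}  # t= jiffies -> timestamp of the first line reporting it
    for line in text.splitlines():
        match = STALL_RE.match(line.strip())
        if match:
            first_seen.setdefault(int(match.group(2)), float(match.group(1)))
    if len(first_seen) < 2:
        return None
    (j0, t0), (j1, t1) = min(first_seen.items()), max(first_seen.items())
    raw = (j1 - j0) / (t1 - t0)
    return raw, min(COMMON_HZ, key=lambda hz: abs(hz - raw))
```

Fed the full log above, this should report oops number 2, taint letters G, D and W, the `ud2` bytes, twelve confirmed frames from `drop_spte` down to `system_call_fastpath`, and an HZ estimate of about 250.5, snapped to 250. At 250 Hz, `t=15000` is 60 seconds. If `t=` counts from the start of the stalled grace period, that grace period began shortly after the BUG at 106643.75.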

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-13-server 3.0.0-13.22
ProcVersionSignature: Ubuntu 3.0.0-13.22-server 3.0.6
Uname: Linux 3.0.0-13-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
Date: Fri Dec 16 10:37:00 2011
HibernationDevice: RESUME=UUID=5d83c5ae-1432-4ed0-be62-0b5bb54d8989
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. PowerEdge R710
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.0.0-13-server root=UUID=6adff529-7713-48d3-9174-d7e8f7fe5053 ro splash quiet vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-13-server N/A
 linux-backports-modules-3.0.0-13-server N/A
 linux-firmware 1.60
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to oneiric on 2011-12-09 (6 days ago)
dmi.bios.date: 01/31/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 3.0.0
dmi.board.name: 00NH4P
dmi.board.vendor: Dell Inc.
dmi.board.version: A12
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr3.0.0:bd01/31/2011:svnDellInc.:pnPowerEdgeR710:pvr:rvnDellInc.:rn00NH4P:rvrA12:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R710
dmi.sys.vendor: Dell Inc.

Shyam (shyam-zadarastorage) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Hi Shyam,

As requested in your other bug, would it also be possible for you to test the latest upstream kernel for this bug? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . If possible, please test the latest v3.2-rcN kernel (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag: 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.2-rc1 fixed an issue, the tag would be 'kernel-fixed-upstream-v3.2-rc1'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key needs-upstream-testing
Shyam (shyam-zadarastorage) wrote :

Due to instability in Oneiric, we moved back to Natty. Marking this bug 'kernel-unable-to-test-upstream'.

Arkadiy Kulev (eth-ethaniel) wrote :

I have the same problem on the latest Oneiric. My system stalls every 24-48 hours. I had to add a "kernel.panic" setting to sysctl.conf to keep the system "alive".

Ubuntu 11.10 (GNU/Linux 3.0.0-14-generic x86_64)

Matthew O'Connor (uvirjf2u1144-matt-hknftjnl78lw) wrote :

I have also started encountering this problem since upgrading from 11.04 to 11.10. The current kernel version is:

3.0.0-19-server #33-Ubuntu SMP Thu Apr 19 20:32:48 UTC 2012 x86_64 x86_64 x86_64

I run KVM/QEMU on the afflicted system and attach to an iSCSI target. Nothing is visible in the logs. I also have kernel.panic configured in my sysctl, but it has no effect and the system does not restart. I actually have a pair of systems on identical kernel versions. The surviving system isn't doing much work right now, whereas the failing system is running a couple of KVM/QEMU virtual machines. The failure currently takes about 12-18 hours to occur. Unfortunately, by the time I get to the machine's terminal, so many "rcu_sched_state..." messages have printed that I can't see whether a BUG report or stack trace is available.

I would like to assist with testing - which upstream kernel should I try, and should I only look at the oneiric releases or can I push into the precise releases? Anything else I can do to help?
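Both comments above mention `kernel.panic`. On its own, that sysctl does not make a machine reboot: it only sets how many seconds the kernel waits before rebooting once a panic has already happened (0 means never reboot, a negative value reboots immediately). An oops or BUG such as the `rmap_remove` one in the description is promoted to a panic only when `kernel.panic_on_oops` is 1; otherwise the offending process is killed and the kernel carries on. An RCU stall message is a warning, not a panic. This may be why the setting seems to have no effect on a stalled machine. The minimal Python sketch below reads both values from `/proc/sys/kernel` and reports whether an oops would end in an automatic reboot. The helper names are hypothetical and not part of any Ubuntu tool.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/kernel")  # procfs home of kernel.* sysctls


def read_sysctl(name, root=SYSCTL_DIR):
    """Return an integer kernel sysctl such as 'panic', or None if unreadable."""
    try:
        return int((root / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def oops_reboots(panic_timeout, panic_on_oops):
    """True only if an oops is promoted to a panic AND a panic triggers a reboot.

    panic_timeout follows kernel.panic: 0 never reboots, any other value reboots
    (after that many seconds if positive, immediately if negative).
    """
    return bool(panic_on_oops) and bool(panic_timeout)


if __name__ == "__main__":
    timeout = read_sysctl("panic")
    on_oops = read_sysctl("panic_on_oops")
    print(f"kernel.panic={timeout} kernel.panic_on_oops={on_oops}")
    print("an oops leads to an automatic reboot:", oops_reboots(timeout or 0, on_oops or 0))
```

With the common combination of a non-zero `kernel.panic` and `kernel.panic_on_oops=0`, the sketch reports no reboot, which matches a machine that stays up but stalled.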

Joseph Salisbury (jsalisbury) wrote :

@Arkadiy, @Matthew

Can you test the upstream kernel available at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc6-precise/

Matthew O'Connor (uvirjf2u1144-matt-hknftjnl78lw) wrote : Re: [Bug 905246] Re: kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703

Hi Joseph,

This kernel is causing panics when I try to crank up the open-iscsi
initiator. Would you like the OOPS if I can catch it?

--

Sincerely,
  Matthew O'Connor

-----------------------------------------------------------------
Sr. Software Engineer
PGP/GPG Key: 0x55F981C4
Fingerprint: E5DC A0F8 5A40 E4DA 2CE6 B5A2 014C 2CBF 55F9 81C4

Engineering and Computer Simulations, Inc.
11825 High Tech Ave Suite 250
Orlando, FL 32817

Tel: 407-823-9991 x315
Fax: 407-823-8299
Email: <email address hidden>
Web: www.ecsorl.com
-----------------------------------------------------------------

CONFIDENTIAL NOTICE: The information contained in this electronic
message is legally privileged, confidential and exempt from disclosure
under applicable law. It is intended only for the use of the individual
or entity named above. If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message is strictly prohibited. If you have received
this communication in error, please notify the sender immediately by
return e-mail and delete the original message and any copies of it from
your computer system. Thank you.

Joseph Salisbury (jsalisbury) wrote : Re: kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703

@Matthew

Could you possibly test the latest v3.2 and/or the latest v3.3 kernels?

v3.2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.18-precise/

v3.3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3.7-precise/

Matthew O'Connor (uvirjf2u1144-matt-hknftjnl78lw) wrote : Re: [Bug 905246] Re: kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703

3.3 doesn't exhibit the iSCSI initiator panic, but unfortunately it causes
one of my two nodes (node B) to spontaneously reboot (no oops reported)
when it joins the DLM with node A. I am beginning to think node A has
some underlying issues, perhaps related to the last upgrade I did, and I
am tempted to perform a clean install on it. Node A was the first and
only node (to date) to exhibit the original CPU hang bug that I commented
on. Node A started as 10.04, I believe, and was upgraded to 11.10. Node
B was clean-installed with 11.04 and upgraded to 11.10.

Node B is unfortunately a production node, so I have to roll back to
v3.0 and leave it there. I will try to isolate node A and get it some
(non-critical) company to perform further testing on kernels v3.3 and v3.2.

Revision history for this message
penalvch (penalvch) wrote :

Shyam, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested and remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream-VERSION-NUMBER', where VERSION-NUMBER is the version number of the kernel you tested.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream-VERSION-NUMBER', where VERSION-NUMBER is the version number of the kernel you tested.

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the tag: 'kernel-unable-to-test-upstream'.

Please let us know your results. Thank you for your understanding.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

summary: - kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703
+ kernel BUG at /build/buildd/linux-3.0.0/arch/x86/kvm/mmu.c:703; RIP: 0010:[<ffffffffa034b160>] [<ffffffffa034b160>] rmap_remove+0x1a0/0x1c0 [kvm]
tags: added: regression-release
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Shyam (shyam-zadarastorage) wrote :

Hi,

I added the tag kernel-unable-to-test-upstream. I cannot currently test this on an upstream kernel because my environments are tied up in production activity.

--Shyam

tags: added: kernel-unable-to-test-upstream
penalvch (penalvch) wrote :

Shyam, marking this back to Confirmed. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

Shyam, as per http://www.dell.com/support/troubleshooting/us/en/19/Product/poweredge-r710 an update is available for your BIOS (6.3.0). If you update to this during your maintenance window, does it change anything?

If not, could you please specify what happened and provide the output of the following terminal command:
sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date

Please note your current BIOS is already in the Bug Description, so posting this on the old BIOS would not be helpful.
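For reference, the Apport data in the description records `dmi.bios.version: 3.0.0`, while the comment above points to 6.3.0. The minimal Python sketch below assumes `dmidecode` is installed, the script runs as root, and the BIOS version is a dotted string of integers. It runs the same two `dmidecode -s` queries and compares versions numerically, because a plain string comparison would misorder values such as "10.0.0" and "6.3.0". The helper names are hypothetical.

```python
import subprocess

LATEST_BIOS = "6.3.0"  # version named in the comment above


def version_tuple(version):
    """'6.3.0' -> (6, 3, 0), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.strip().split("."))


def bios_outdated(installed, latest=LATEST_BIOS):
    """True if the installed BIOS version is older than the latest one."""
    return version_tuple(installed) < version_tuple(latest)


def dmidecode_string(keyword):
    """Run 'dmidecode -s KEYWORD' (needs root) and return its trimmed output."""
    result = subprocess.run(
        ["dmidecode", "-s", keyword], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = dmidecode_string("bios-version")
    date = dmidecode_string("bios-release-date")
    status = "outdated" if bios_outdated(version) else "current"
    print(f"BIOS {version} ({date}): {status} relative to {LATEST_BIOS}")
```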

Thank you for your understanding.

tags: added: bios-outdated-6.3.0
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired