Kernel BUG seen while trying to use the madvise system call with MADV_HWPOISON mode

Bug #1370425 reported by bugproxy
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Fix Released  High        Unassigned
Utopic          Fix Released  Undecided   Tim Gardner
Vivid           Fix Released  High        Unassigned

Bug Description

Problem Description
====================
Kernel BUG seen while trying to use the madvise system call with MADV_HWPOISON mode

---uname output---
Linux u10thp 3.16.0-9-generic #14-Ubuntu SMP Fri Aug 15 15:03:36 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Power 8

Steps to Reproduce
====================
1. Install an Ubuntu 14.10 guest on PowerKVM.
2. Set up hugepage backing for the guest VM.
3. Build and run the attached madv_poison.c to test the madvise system call with MADV_HWPOISON:
 gcc -o madv_poison madv_poison.c
 ./madv_poison -C -i 1 (1 - shm_test)

Ubuntu 14.10 LE hits a kernel BUG:
root@u10thp:~# ./madv_poison -C -i 1
vm.memory_failure_early_kill = 0
[pid 2301] start page-poisoning test
[pid 2301] there are 1 shm_child
[pid 2301] have spawned 1 processes
[pid 2301] wait for Pid 2304
[pid 2304] shm dirty poisoning page 0x3fffa7ce0000
[ 7905.009001] Injecting memory failure for page 0xe6a7 at 0x3fffa7ce0000
[ 7905.009359] MCE 0xe6a7: dirty LRU page recovery: Recovered
[pid 2304] writing 2
[ 7905.009901] ------------[ cut here ]------------
[ 7905.010164] kernel BUG at /build/buildd/linux-3.16.0/arch/powerpc/mm/fault.c:180!
[ 7905.010396] Oops: Exception in kernel mode, sig: 5 [#234]
[ 7905.010438] SMP NR_CPUS=2048 NUMA pSeries
[ 7905.010480] Modules linked in: pseries_rng rtc_generic ohci_pci
[ 7905.010614] CPU: 0 PID: 2304 Comm: madv_poison Tainted: G D 3.16.0-9-generic #14-Ubuntu
[ 7905.010686] task: c0000000e0a92a60 ti: c0000000e09e8000 task.ti: c0000000e09e8000
[ 7905.010746] NIP: c0000000009e3314 LR: c0000000009e2e54 CTR: 0000000000000000
[ 7905.010864] REGS: c0000000e09eb990 TRAP: 0700 Tainted: G D (3.16.0-9-generic)
[ 7905.010924] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28002882 XER: 00000000
[ 7905.011125] CFAR: c0000000009e3170 SOFTE: 1
GPR00: c0000000009e2e54 c0000000e09ebc10 c0000000013742e0 0000000000000010
GPR04: c0000000e0b37ff8 00003fffa7ce0000 00000000000000a9 0000000000000000
GPR08: 0000000000000000 0000000000000010 c0000000e0a92a60 0000000000000020
GPR12: 0000000048002884 c00000000fe40000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 00000000000000a9 0000000000000000 c0000000e0597a40 c0000000e022b060
GPR24: 0000000000000010 c0000000e022b000 c000000000009568 00003fffa7ce0000
GPR28: 0000000000000000 0000000000000000 0000000002000000 c0000000e09ebea0
[ 7905.012189] NIP [c0000000009e3314] do_page_fault+0x984/0x990
[ 7905.012241] LR [c0000000009e2e54] do_page_fault+0x4c4/0x990
[ 7905.012281] Call Trace:
[ 7905.012361] [c0000000e09ebc10] [c0000000009e2e54] do_page_fault+0x4c4/0x990 (unreliable)
[ 7905.012434] [c0000000e09ebe30] [c000000000009568] handle_page_fault+0x10/0x30
[ 7905.012494] Instruction dump:
[ 7905.012580] e92d0290 e8690460 38630060 4b7274d9 60000000 e93f0108 3bc00000 792a97e3
[ 7905.012683] 4082f77c 3bc00009 60000000 4bfff774 <0fe00000> 00000000 00000000 3c4c0099
[ 7905.012845] ---[ end trace a48a199a061eed79 ]---
[ 7905.019084]
[pid 2301] Ins 0: Pid 2304: failed - shared memory test
[pid 2301] !!! Page Poisoning Test is FAILED (1 failures found). !!!

[pid 2301] page-poisoning test done!
root@u10thp:~#
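
For readers without the attachment, the core of the shared-memory case can be sketched roughly as below. This is not the attached madv_poison.c: the names `install_sigbus_handler` and `poison_and_write` are hypothetical, and the sketch assumes Linux with glibc. madvise with MADV_HWPOISON needs CAP_SYS_ADMIN, and the kernel takes the poisoned page out of service, so this should only be tried in a disposable VM.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100 /* value from the Linux UAPI headers */
#endif

static sigjmp_buf recover_point;           /* where the handler jumps back to */
static volatile sig_atomic_t last_si_code; /* si_code of the last SIGBUS seen */

static void on_sigbus(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_si_code = si->si_code;
    siglongjmp(recover_point, 1);
}

/* Install the SIGBUS handler; returns 0 on success, -1 on error. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Poison one page and then write to it.
 * Returns -1 if madvise() failed (errno is set), 0 if the write
 * completed without a signal, 1 if the write raised SIGBUS.
 */
int poison_and_write(volatile char *page, size_t page_size)
{
    if (madvise((void *)page, page_size, MADV_HWPOISON) != 0)
        return -1;
    if (sigsetjmp(recover_point, 1) == 0) {
        page[0] = 2; /* touch the poisoned page */
        return 0;
    }
    return 1;
}
```

A caller would map a MAP_SHARED page, dirty it, call `install_sigbus_handler()`, and then call `poison_and_write()`. A return value of 1 with `last_si_code` equal to BUS_MCEERR_AR is the recovered case; on the affected kernel the write instead ends in the BUG shown above.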

== Comment: #1 - Kalpana Shetty <email address hidden> - ==
The test code works fine on an x86 Ubuntu VM. If MADV_HWPOISON is not supported on Power, the call should fail with a "not supported" error, as it does in a RHEL 7 VM on PowerKVM, instead of triggering a kernel BUG.

Intel/Ubuntu 14.04 VM (working fine):
root@u04vm14:~# ./madv_poison -C -i 1 (shm_test case)
vm.memory_failure_early_kill = 0
[pid 7325] start page-poisoning test
[pid 7325] there are 1 shm_child
[pid 7325] have spawned 1 processes
[pid 7325] wait for Pid 7328
[pid 7328] shm dirty poisoning page 0x7f60ca8ea000
[pid 7328] writing 2
[pid 7328] signal 7 code 4 addr 0x7f60ca8ea000
[pid 7328] pass: recovered
[pid 7325] Ins 0: Pid 7328: pass - shared memory test
[pid 7325] !!! Page Poisoning Test got PASS. !!!

[pid 7325] page-poisoning test done!

PowerKVM / RHEL 7 VM:
[root@rhel7-web-VM1 ~]# ./madv_poison -C -i 1
sysctl: cannot stat /proc/sys/vm/memory_failure_early_kill: No such file or directory
[pid 11512] start page-poisoning test
[pid 11512] there are 1 shm_child
[pid 11512] have spawned 1 processes
[pid 11514] shm dirty poisoning page 0x3fff84d60000
[pid 11512] wait for Pid 11514
[pid 11514] failed: Kernel doesn't support poison injection    <== unsupported error
[pid 11512] Ins 0: Pid 11514: failed - shared memory test
[pid 11512] !!! Page Poisoning Test is FAILED (1 failures found). !!!

[pid 11512] page-poisoning test done!
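
The outcomes compared above suggest probing for support before injecting. The following is a rough sketch with hypothetical helper names. It assumes that a missing /proc/sys/vm/memory_failure_early_kill (as in the RHEL 7 output) indicates a kernel built without memory-failure handling, and that a failed madvise call then most likely reports EINVAL, or EPERM when the caller lacks CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read an integer sysctl such as /proc/sys/vm/memory_failure_early_kill.
 * Returns the value, or -1 if the file is missing or unreadable.
 */
int read_int_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Map the errno left by a failed madvise(MADV_HWPOISON) to a hint. */
const char *explain_poison_errno(int err)
{
    switch (err) {
    case EINVAL:
        /* Assumption: unsupported advice; bad arguments also give EINVAL. */
        return "kernel probably does not support poison injection";
    case EPERM:
        return "poison injection requires CAP_SYS_ADMIN";
    default:
        return "poison injection failed for another reason";
    }
}
```

A test harness could call `read_int_sysctl("/proc/sys/vm/memory_failure_early_kill")` first and skip the injection when it returns -1.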

bugproxy (bugproxy) wrote : madv_poison.c

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-115345 severity-high targetmilestone-inin---
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1370425/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Luciano Chavez (lnx1138)
affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1370425

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Confirmed
Anton Blanchard (anton-samba) wrote :
Thierry FAUCK (thierry-j) wrote :

Assigning to the Canonical Taco Screen Team since patches are available:
http://patchwork.ozlabs.org/patch/392712/
http://patchwork.ozlabs.org/patch/392713/
Submitter Anton Blanchard
Date Sept. 24, 2014, 12:27 a.m.
Message ID <email address hidden>
Download mbox | patch
Permalink /patch/392713/
State Accepted
Commit 9d57472f61acd7c3a33ebf5a79361e316d8ffbef

Changed in linux (Ubuntu):
assignee: nobody → Canonical Taco Screen Team (canonical-taco-screeners)
Changed in linux (Ubuntu):
assignee: Canonical Taco Screen Team (canonical-taco-screeners) → Canonical Kernel Team (canonical-kernel-team)
status: Confirmed → Triaged
Tim Gardner (timg-tpi) wrote :

These patches are in linux-next and should get merged soon, which will make them available for SRU.

Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Chris J Arges (arges)
status: Triaged → In Progress
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) wrote :

The following changes since commit 8b38f4d728f119b337fd279af7a44dd176be776b:

  GenWQE: Support blocking when DDCB queue is busy (2014-11-21 10:13:05 -0600)

are available in the git repository at:

  git://kernel.ubuntu.com/rtg/ubuntu-utopic.git

for you to fetch changes up to 4e44362baaa389cd36a1ca8bd000e72c4edbac88:

  powerpc: Fill in si_addr_lsb siginfo field (2014-11-21 11:43:31 -0600)

----------------------------------------------------------------
Anton Blanchard (3):
      powerpc: Simplify do_sigbus
      powerpc: Add VM_FAULT_HWPOISON handling to powerpc page fault handler
      powerpc: Fill in si_addr_lsb siginfo field

 arch/powerpc/mm/fault.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)
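
Going only by the patch titles listed above, the intended behaviour can be modelled in user space as below. This is a sketch, not the patch text: the flag values, `MODEL_PAGE_SHIFT` (64 KiB pages assumed for this guest), and both function names are assumptions chosen for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* hardware memory error, action required */
#endif

/* Illustrative copies of kernel-internal fault flags (assumed values). */
#define VM_FAULT_SIGBUS         0x0002u
#define VM_FAULT_HWPOISON       0x0010u
#define VM_FAULT_HWPOISON_LARGE 0x0020u

#define MODEL_PAGE_SHIFT 16 /* assumes 64 KiB base pages */

struct sigbus_report {
    int si_code;      /* BUS_ADRERR or BUS_MCEERR_AR */
    unsigned int lsb; /* value for si_addr_lsb, 0 if not a poison fault */
};

/*
 * Decide what a SIGBUS for this fault should carry.
 * huge_shift stands in for the huge page size lookup a kernel would do.
 */
struct sigbus_report model_do_sigbus(unsigned int fault, unsigned int huge_shift)
{
    struct sigbus_report r = { BUS_ADRERR, 0 };

    if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
        r.si_code = BUS_MCEERR_AR;
    if (fault & VM_FAULT_HWPOISON_LARGE)
        r.lsb = huge_shift;
    if (fault & VM_FAULT_HWPOISON)
        r.lsb = MODEL_PAGE_SHIFT;
    return r;
}

/* Non-zero if the fault is one the handler should turn into SIGBUS. */
int model_is_sigbus_fault(unsigned int fault)
{
    return (fault & (VM_FAULT_SIGBUS | VM_FAULT_HWPOISON |
                     VM_FAULT_HWPOISON_LARGE)) != 0;
}
```

In this model a poison fault is routed to SIGBUS with BUS_MCEERR_AR instead of falling through to an "unknown fault code" path, which is one plausible reading of why the unpatched handler hit a BUG.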

Changed in linux (Ubuntu Vivid):
assignee: Chris J Arges (arges) → nobody
status: In Progress → Fix Released
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2014-12-02 08:30 EDT-------
Failed to apply the proposed fix for this bug.

I followed this link - https://wiki.ubuntu.com/Testing/EnableProposed
to apply utopic proposed fix.

1. Added utopic-proposed at the end of the repo file /etc/apt/sources.list:
# deb-src http://extras.ubuntu.com/ubuntu utopic main
deb http://ports.ubuntu.com/ubuntu-ports utopic-proposed restricted main multiverse universe

2. Created the file /etc/apt/preferences with the following content:
Package: *
Pin: release a=utopic-proposed
Pin-Priority: 400

3. Ran a simulated upgrade:

apt-get upgrade -s

root@u1410:~# apt-get upgrade -s
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
linux-generic linux-headers-generic linux-image-generic
The following packages will be upgraded:
command-not-found command-not-found-data libcurl3-gnutls libgnutls-deb0-28
libgnutls-openssl27 lshw mountall python3-commandnotfound systemd-shim
tzdata wget
11 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
Inst libgnutls-openssl27 [3.2.16-1ubuntu2] (3.2.16-1ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el]) []
Inst libgnutls-deb0-28 [3.2.16-1ubuntu2] (3.2.16-1ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst mountall [2.54build1] (2.54ubuntu0.14.10.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst libcurl3-gnutls [7.37.1-1ubuntu3] (7.37.1-1ubuntu3.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst tzdata [2014h-2] (2014i-0ubuntu0.14.10 Ubuntu:14.10/utopic-updates [all])
Conf tzdata (2014i-0ubuntu0.14.10 Ubuntu:14.10/utopic-updates [all])
Inst command-not-found-data [0.3ubuntu15] (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst python3-commandnotfound [0.3ubuntu15] (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [all])
Inst command-not-found [0.3ubuntu15] (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [all])
Inst lshw [02.16-2ubuntu2] (02.16-2ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst systemd-shim [8-1] (8-1ubuntu0.1 Ubuntu:14.10/utopic-updates [ppc64el])
Inst wget [1.15-1ubuntu1] (1.15-1ubuntu1.14.10.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf libgnutls-deb0-28 (3.2.16-1ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf libgnutls-openssl27 (3.2.16-1ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf mountall (2.54ubuntu0.14.10.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf libcurl3-gnutls (7.37.1-1ubuntu3.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf command-not-found-data (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf python3-commandnotfound (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [all])
Conf command-not-found (0.3ubuntu15.1 Ubuntu:14.10/utopic-updates [all])
Conf lshw (02.16-2ubuntu2.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf systemd-shim (8-1ubuntu0.1 Ubuntu:14.10/utopic-updates [ppc64el])
Conf wget (1.15-1ubuntu1.14.10.1 Ubuntu:14.10/utopic-updates [ppc64el])

4. Verify
root@u1410:~# aptitude -t utopic-proposed

Actions Undo Package Resolver Search Options Views Help
C-T: Menu ?: Help q: Quit u: Update g: Download/Install/Remove Pkgs

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-12-03 12:06 EDT-------
Finally got the Ubuntu 14.10 guest VM up with the utopic-proposed changes applied.

The fix is working fine.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2014-12-03 12:09 EDT-------
root@u1410:~# ./madv_poison -C -i 1
vm.memory_failure_early_kill = 0
[pid 1043] start page-poisoning test
[pid 1043] there are 1 shm_child
[pid 1043] have spawned 1 processes
[pid 1043] wait for Pid 1046
[pid 1046] shm dirty poisoning page 0x3fff8d1c0000
[ 348.014055] MCE 0x6e4: dirty LRU page recovery: Recovered
[pid 1046] writing 2
[ 348.014210] MCE: Killing madv_poison:1046 due to hardware memory corruption fault at 3fff8d1c0000
[pid 1046] signal 7 code 4 addr 0x3fff8d1c0000
[pid 1046] pass: recovered
[pid 1043] Ins 0: Pid 1046: pass - shared memory test
[pid 1043] !!! Page Poisoning Test got PASS. !!!

[pid 1043] page-poisoning test done!
root@u1410:~#
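
The "signal 7 code 4" line in the passing run corresponds, on Linux for x86 and powerpc, to SIGBUS with si_code BUS_MCEERR_AR. Since one of the patches also fills in si_addr_lsb, a handler can estimate the extent of the corrupted region from it. Below is a small sketch with hypothetical helper names; an lsb of 16 (a 64 KiB page) is an assumption for this guest, not something the log shows.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* hardware memory error, action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* hardware memory error, action optional */
#endif

/* Name the SIGBUS si_code values relevant to memory failure. */
const char *sigbus_code_name(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR: return "BUS_MCEERR_AR (action required)";
    case BUS_MCEERR_AO: return "BUS_MCEERR_AO (action optional)";
    case BUS_ADRERR:    return "BUS_ADRERR";
    default:            return "other";
    }
}

/* Start of the corrupted region: clear the low `lsb` bits (lsb < 64). */
uint64_t poisoned_start(uint64_t addr, unsigned int lsb)
{
    return addr & ~((UINT64_C(1) << lsb) - 1);
}

/* Length of the corrupted region in bytes (lsb < 64). */
uint64_t poisoned_length(unsigned int lsb)
{
    return UINT64_C(1) << lsb;
}
```

Inside a SA_SIGINFO handler, the inputs would come from `si->si_code`, `si->si_addr`, and `si->si_addr_lsb`.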

Luis Henriques (henrix) wrote :

Thanks for testing! As per comment #10, I'm tagging this bug as verified in utopic.

tags: added: verification-done-utopic
removed: verification-needed-utopic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-26.35

---------------
linux (3.16.0-26.35) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1398118

  [ Upstream Kernel Changes ]

  * Revert "drm/nouveau: punt fbcon resume out to a workqueue"
  * Revert "drm/nouveau/kms: take more care when pulling down accelerated
    fbcon"

linux (3.16.0-26.34) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1395892

  [ Chris J Arges ]

  * [Config] CONFIG_SCOM_DEBUGFS=y for powerpc/powerpc64-smp ppc64el/generic
    - LP: #1395855

  [ Tim Gardner ]

  * [Config] CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=1 for powerpc/ppc64el
    - LP: #1392021

  [ Upstream Kernel Changes ]

  * Revert "usb: dwc3: dwc3-omap: Disable/Enable only wrapper interrupts in
    prepare/complete"
    - LP: #1393401
  * Revert "iwlwifi: mvm: treat EAPOLs like mgmt frames wrt rate"
    - LP: #1393401
  * Revert "block: all blk-mq requests are tagged"
    - LP: #1393401
  * ACPI / blacklist: add Win8 OSI quirks for some Dell laptop models
    - LP: #1339456
  * PCI: Remove "no hotplug settings from platform" warning
    - LP: #1390182
  * drm/nouveau/kms: take more care when pulling down accelerated fbcon
    - LP: #1386695
  * drm/nouveau: punt fbcon resume out to a workqueue
    - LP: #1386695
  * drm/tilcdc: Fix the error path in tilcdc_load()
    - LP: #1393401
  * builddeb: put the dbg files into the correct directory
    - LP: #1393401
  * switch iov_iter_get_pages() to passing maximal number of pages
    - LP: #1393401
  * fuse: honour max_read and max_write in direct_io mode
    - LP: #1393401
  * usb: phy: return -ENODEV on failure of try_module_get
    - LP: #1393401
  * PM / clk: Fix crash in clocks management code if !CONFIG_PM_RUNTIME
    - LP: #1393401
  * rt2x00: support Ralink 5362.
    - LP: #1393401
  * wireless: rt2x00: add new rt2800usb devices
    - LP: #1393401
  * NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
    - LP: #1393401
  * nfs: fix duplicate proc entries
    - LP: #1393401
  * ext4: check EA value offset when loading
    - LP: #1393401
  * jbd2: free bh when descriptor block checksum fails
    - LP: #1393401
  * ext4: don't check quota format when there are no quota files
    - LP: #1393401
  * target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
    - LP: #1393401
  * vfs: fix data corruption when blocksize < pagesize for mmaped data
    - LP: #1393401
  * ext4: fix mmap data corruption when blocksize < pagesize
    - LP: #1393401
  * ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
    - LP: #1393401
  * qla_target: don't delete changed nacls
    - LP: #1393401
  * target: Fix APTPL metadata handling for dynamic MappedLUNs
    - LP: #1393401
  * iser-target: Disable TX completion interrupt coalescing
    - LP: #1393401
  * ext4: don't orphan or truncate the boot loader inode
    - LP: #1393401
  * ext4: add ext4_iget_normal() which is to be used for dir tree lookups
    - LP: #1393401
  * ext4: fix reservation overflow in ext4_da_write_begin
    - LP: #1393401
  * ext4: Replace open coded mdata csum feature to helper function
    - LP: #1393401
  * ext4: move error ...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin1504
removed: targetmilestone-inin---