[Ubuntu18.04][Power9][DD2.2]package installation segfaults inside debian chroot env in P9 KVM guest with HTM enabled (kvm)

Bug #1792501 reported by bugproxy
This bug affects 1 person
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  Critical    Canonical Kernel Team
linux (Ubuntu)                    Fix Released  Medium      Joseph Salisbury
  Bionic                          Fix Released  Medium      Joseph Salisbury
  Cosmic                          Fix Released  Medium      Joseph Salisbury

Bug Description

== SRU Justification ==
IBM is requesting this commit in Bionic. It fixes a regression
introduced by upstream commit 4bb3c7a020.

Without this patch, package installation segfaults inside a Debian chroot
environment in a POWER9 KVM guest with HTM enabled.

The fix has already landed in Cosmic master-next.

== Fix ==
f14040bca892 ("KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds")

== Regression Potential ==
Low. This commit fixes an existing regression and is specific to powerpc. It has been cc'd to
upstream stable, so it has had additional upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2018-09-11 04:10:09 ==
---Problem Description---
package installation segfaults inside debian chroot env in P9 KVM guest with HTM enabled

---Additional Hardware Info---
FW with tm-suspend-mode enabled
#cd /sys/firmware/devicetree/base/ibm,opal/fw-features/
#ls -1 tm-suspend-mode
enabled
name
phandle
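
The same firmware check can be scripted. This is a minimal sketch, assuming (as the listing above suggests) that an `enabled` entry under the `tm-suspend-mode` node is what marks the feature as on; the function name and the `base` parameter are hypothetical and only there so the check can be pointed at a test directory.

```python
import os

# Device-tree directory from the listing above; present only on OPAL hosts.
FW_FEATURES = "/sys/firmware/devicetree/base/ibm,opal/fw-features"


def tm_suspend_mode_enabled(base=FW_FEATURES):
    """Return True if <base>/tm-suspend-mode contains an 'enabled' entry."""
    # Assumption: the feature is on when the node holds an 'enabled' property.
    return os.path.exists(os.path.join(base, "tm-suspend-mode", "enabled"))


if __name__ == "__main__":
    print("tm-suspend-mode enabled:", tm_suspend_mode_enabled())
```

On a host without this device-tree path the function simply returns False.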

qemu-kvm 1:2.11+dfsg-1ubuntu7.4

Machine Type = Power9 DD2.2

---Steps to Reproduce---
 1. Boot a P9 KVM guest running Ubuntu 18.04 (with cap-htm=on; it is on by default).
Tried with an upstream kernel as well (same results).

Create a tap device on the host:
# tunctl -t tap1 -u `whoami`;brctl addif virbr0 tap1;ifconfig tap1 up
# qemu-system-ppc64 -enable-kvm -M pseries -m 8192 -smp 4 -drive file=/home/sath/ubuntu-18.04-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0 -device virtio-scsi-pci,id=drive-scsi0 -device scsi-hd,drive=drive-scsi0 -serial mon:stdio -enable-kvm -vga none -nographic -kernel /home/sath/vmlinux_4.19 -append "root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug" -netdev tap,id=mynet1,ifname=tap1,script=no,downscript=no -device virtio-net,netdev=mynet1,mac=52:55:00:d1:55:42

Run dhclient inside the guest.

2. # mkdir -p stretch
# debootstrap stretch /stretch http://httpredir.debian.org/debian
# chroot /stretch
/# apt-get update && apt-get install -y make gcc ruby python

...
[ 32.029474] random: crng init done
[ 32.029477] random: 7 urandom warning(s) missed due to ratelimiting
[ 500.300835] dpkg-deb[8704]: segfault (11) at c0000000000037fa nip 7fffac2d098c lr 7fffac2d08c4 code 1 in libc-2.24.so[7fffac170000+190000]
[ 500.300863] dpkg-deb[8704]: code: 48000028 eb090010 2eb80000 4096006c 419e0074 85270004 394a0001 794a0020
[ 500.300881] dpkg-deb[8704]: code: 71280001 408200a0 1d2a0018 7d2b4a14 <a1090006> 2ea80000 40960010 e9090008
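
To make the segfault line above easier to work with, the fields can be pulled apart and the faulting address converted into an offset inside libc. This is a rough sketch, not part of the original report: the regular expression and the names `SEGV_RE` and `parse_segfault` are illustrative, and the offset is only meaningful if the mapping shown in brackets starts at the library's load address.

```python
import re

# Matches the "segfault" line format seen in the kernel log above.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault \((?P<sig>\d+)\) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) "
    r"code (?P<code>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

EXAMPLE = ("[ 500.300835] dpkg-deb[8704]: segfault (11) at c0000000000037fa "
           "nip 7fffac2d098c lr 7fffac2d08c4 code 1 "
           "in libc-2.24.so[7fffac170000+190000]")


def parse_segfault(line):
    """Return the parsed fields of a segfault line, or None if it does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    nip, lr = int(m["nip"], 16), int(m["lr"], 16)
    base, size = int(m["base"], 16), int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),        # faulting data address
        "obj": m["obj"],
        "nip_offset": nip - base,          # offset of the faulting instruction
        "lr_offset": lr - base,            # offset of the return address
        "nip_in_mapping": base <= nip < base + size,
    }


if __name__ == "__main__":
    info = parse_segfault(EXAMPLE)
    print(info["obj"], hex(info["nip_offset"]), hex(info["addr"]))
```

For the line above this gives an instruction offset of 0x16098c into libc-2.24.so. The faulting data address starts with 0xc, which is the range Linux on ppc64 normally uses for kernel addresses, so a userspace pointer had clearly picked up a bogus value.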

---uname output---
4.15.0-34,4.19.0-rc3

---Debugger---
A debugger is not configured

Contact Information = <email address hidden>

Userspace tool common name:

KVM Guest: (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27,
Chroot inside KVM Guest: (Debian GLIBC 2.24-11+deb9u3) stable release version 2.24

Userspace rpm:

KVM Guest: (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27,
Chroot inside KVM Guest: (Debian GLIBC 2.24-11+deb9u3) stable release version 2.24

The userspace tool has the following bit modes: both

Userspace tool obtained from project website: na

*Additional Instructions for <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.

Latest update, taken from https://github.ibm.com/powercloud/icp-ppc64le/issues/470:

The segfault can also be recreated using the TM test cases in

/linux/tools/testing/selftests/powerpc/tm

# ./tm-vmxcopy
test: tm_vmxcopy
tags: git_version:v4.19-rc3-0-g11da3a7f84f1-dirty
!! child died by signal 11
failure: tm_vmxcopy

When run, this particular test dies with signal 11:

[267132.434651] tm-vmxcopy[641]: unhandled signal 11 at 0000000000000001 nip 0000000104ba122c lr 0000000104ba11e4 code 30001
[267253.708795] tm-vmxcopy[7861]: unhandled signal 11 at 0000000000000001 nip 000000012a31122c lr 000000012a3111e4 code 30001
[267385.064533] tm-vmxcopy[13314]: unhandled signal 11 at 0000000000000001 nip 00000001235f122c lr 00000001235f11e4 code 30001
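
The three tm-vmxcopy crashes above have different absolute addresses, but the distance between nip and lr can be compared across runs. This is a small sketch with illustrative names (`SIG_RE`, `nip_lr_deltas`); it assumes the differing upper address bits are only due to address-space randomisation between runs.

```python
import re

# Matches the "unhandled signal" line format from the log above.
SIG_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: unhandled signal (?P<sig>\d+) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) "
    r"code (?P<code>[0-9a-f]+)"
)

LOG = """\
[267132.434651] tm-vmxcopy[641]: unhandled signal 11 at 0000000000000001 nip 0000000104ba122c lr 0000000104ba11e4 code 30001
[267253.708795] tm-vmxcopy[7861]: unhandled signal 11 at 0000000000000001 nip 000000012a31122c lr 000000012a3111e4 code 30001
[267385.064533] tm-vmxcopy[13314]: unhandled signal 11 at 0000000000000001 nip 00000001235f122c lr 00000001235f11e4 code 30001
"""


def nip_lr_deltas(log):
    """Return nip - lr for every 'unhandled signal' line in the log text."""
    return [int(m["nip"], 16) - int(m["lr"], 16) for m in SIG_RE.finditer(log)]


if __name__ == "__main__":
    print([hex(d) for d in nip_lr_deltas(LOG)])
```

All three runs give the same delta (0x48), which suggests the test faults at the same instruction each time rather than at a random location.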

== Comment: #12 - Michael Neuling <email address hidden> - 2018-09-13 00:34:16 ==
Fixes r11 corruption.

== Comment: #14 - Satheesh Rajendran <email address hidden> - 2018-09-13 03:15:46 ==
Tested with the above patch on the KVM host; the reported issue is fixed.

# git log -1
commit 72664e47565f5de0a1fead1d9111c97b9b537713 (HEAD -> fix)
Author: Michael Neuling <email address hidden>
Date: Thu Sep 13 15:33:47 2018 +1000

    KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds

    When we come into the softpatch handler (0x1500), we use r11 to store
    the HSRR0 for later use by the denorm handler.

    We also use the softpatch handler for the TM workarounds for
    POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out
    to the vcpu assuming it's still what we got from userspace.

    This causes r11 to be corrupted in the VCPU and hence when we restore
    the guest, we get a corrupted r11. We've seen this when running TM
    tests inside guests on P9.

    This fixes the problem by only touching r11 in the denorm case.

    Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
    Cc: <email address hidden> # 4.17+
    Test-by: Suraj Jitindar Singh <email address hidden>
    Reviewed-by: Paul Mackerras <email address hidden>
    Signed-off-by: Michael Neuling <email address hidden>

Regards,
-Satheesh

http://patchwork.ozlabs.org/patch/969256/
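
The r11 explanation in the commit message can be cross-checked against the dpkg-deb `code:` dump in the problem description, where the word in angle brackets is the faulting instruction. The decoder below is a hand-written sketch that covers only the three instruction words of interest, using the standard Power ISA encodings (primary opcode 7 for mulli, 40 for lhz, and opcode 31 with extended opcode 266 for add); the function name `decode` is illustrative and this is not a general disassembler.

```python
def _signed16(value):
    """Interpret a 16-bit field as a signed immediate."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Decode a few 32-bit PowerPC instruction words; return None if unknown."""
    opcode = word >> 26
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    if opcode == 7:                      # mulli RT,RA,SI
        return f"mulli r{rt},r{ra},{_signed16(word & 0xFFFF)}"
    if opcode == 40:                     # lhz RT,D(RA)
        return f"lhz r{rt},{_signed16(word & 0xFFFF)}(r{ra})"
    if opcode == 31:
        rb = (word >> 11) & 0x1F
        xo = (word >> 1) & 0x1FF
        if xo == 266:                    # add RT,RA,RB
            return f"add r{rt},r{ra},r{rb}"
    return None


if __name__ == "__main__":
    # Last three words up to and including the faulting one in the dump.
    for w in (0x1D2A0018, 0x7D2B4A14, 0xA1090006):
        print(f"{w:08x}  {decode(w)}")
```

Decoded this way, the faulting load `lhz r8,6(r9)` uses an address that was computed from r11 two instructions earlier (`add r9,r11,r9`). That is consistent with, though not by itself proof of, the guest r11 corruption the patch addresses.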

bugproxy (bugproxy) wrote : sosreport of debian chroot inside kvm guest

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-171274 severity-critical targetmilestone-inin---
bugproxy (bugproxy) wrote : sosreport of kvm guest

Default Comment by Bridge

bugproxy (bugproxy) wrote : Fix for KVM host

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Frank Heimes (fheimes)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Christian Ehrhardt  (paelzer) wrote :

Thanks for the report!

With the patch not even being in 4.19 yet this clearly affects C+B at least.
Since it is P9 only and P9 is not meant to work on xenial, I think the P9 HTM workarounds are only in Bionic - so this is B+C but no earlier release.

tags: added: p9
Changed in linux (Ubuntu Cosmic):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch posted to this bug. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1792501

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
* If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
* If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
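
The two installation rules above can be written down as a small helper. This is only an illustration of the note, with a hypothetical function name; the package names are the ones listed in the note and the version is compared on its major.minor part only.

```python
def packages_for_test_kernel(version):
    """Return the .deb package names to install for a given kernel version."""
    # Compare only major.minor, e.g. "4.15.0-34" -> (4, 15).
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


if __name__ == "__main__":
    print(packages_for_test_kernel("4.15.0-34"))
```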

Thanks in advance!

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-09-21 11:29 EDT-------
(In reply to comment #20)
> I built a test kernel with the patch posted to this bug. The test kernel
> can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1792501
>
> Can you test this kernel and see if it resolves this bug?
>
> Note about installing test kernels:
> * If the test kernel is prior to 4.15(Bionic) you need to install the
> linux-image and linux-image-extra .deb packages.
> * If the test kernel is 4.15(Bionic) or newer, you need to install the
> linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
>
> Thanks in advance!

Tested and the issue is not seen with this kernel.

Regards,
-Satheesh.

Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-10-30 06:54 EDT-------
Tested with the proposed kernel (4.15.0-39-generic) and found the issue fixed. The fix can be included in the official release and the bug can be closed.

Env: host kernel 4.15.0-39-generic
guest kernel 4.15.0-39-generic
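
A quick way to tell whether a Bionic host or guest is already on a verified kernel is to compare its `uname -r` string against 4.15.0-39. This is a sketch under stated assumptions: it only understands the Bionic 4.15.0 series mentioned in this bug, returns None for anything else (including Cosmic and upstream kernels, whose fixed versions are not given here), and the function name is hypothetical.

```python
import re

# First Bionic kernel ABI reported as fixed in this bug (4.15.0-39).
FIXED_BIONIC_ABI = 39


def bionic_kernel_has_fix(release):
    """Return True/False for 4.15.0-N kernels, None for any other series."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        return None
    major, minor, patch, abi = (int(x) for x in m.groups())
    if (major, minor, patch) != (4, 15, 0):
        return None  # not the Bionic 4.15 series; no data in this report
    return abi >= FIXED_BIONIC_ABI


if __name__ == "__main__":
    for rel in ("4.15.0-34-generic", "4.15.0-39-generic"):
        print(rel, bionic_kernel_has_fix(rel))
```

Because the fix is in the KVM host code, the host kernel is the one that matters most for this check.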

Regards,
-Satheesh.

bugproxy (bugproxy) wrote : testcase

------- Comment on attachment From <email address hidden> 2018-10-30 06:58 EDT-------

alternate testcase
1. boot the guest as mentioned in bug description
2. copy the testcase inside guest
3. compile and run the testcase
cc -g -ggdb -O3 -static -I. -m64 ./htm_demo.c -lpthread
#./a.out
Starting 2 worker threads, 1000 loops
Thread 2 crossed zero
PASS -- shared values correct
Done! Bye!

Test passed.

Regards,
-Satheesh

Andrew Cloke (andrew-cloke) wrote :

Thanks for confirming the verification. I have updated the tags to reflect verification-done.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-39.42

---------------
linux (4.15.0-39.42) bionic; urgency=medium

  * linux: 4.15.0-39.42 -proposed tracker (LP: #1799411)

  * Linux: insufficient shootdown for paging-structure caches (LP: #1798897)
    - mm: move tlb_table_flush to tlb_flush_mmu_free
    - mm/tlb: Remove tlb_remove_table() non-concurrent condition
    - mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
    - [Config] CONFIG_HAVE_RCU_TABLE_INVALIDATE=y

  * Ubuntu18.04: GPU total memory is reduced (LP: #1792102)
    - Revert "powerpc/powernv: Increase memory block size to 1GB on radix"

  * arm64: snapdragon: reduce boot noise (LP: #1797154)
    - [Config] arm64: snapdragon: DRM_MSM=m
    - [Config] arm64: snapdragon: SND*=m
    - [Config] arm64: snapdragon: disable ARM_SDE_INTERFACE
    - [Config] arm64: snapdragon: disable DRM_I2C_ADV7511_CEC
    - [Config] arm64: snapdragon: disable VIDEO_ADV7511, VIDEO_COBALT

  * [Bionic] CPPC bug fixes (LP: #1796949)
    - ACPI / CPPC: Update all pr_(debug/err) messages to log the susbspace id
    - cpufreq: CPPC: Don't set transition_latency
    - ACPI / CPPC: Fix invalid PCC channel status errors

  * regression in 'ip --family bridge neigh' since linux v4.12 (LP: #1796748)
    - rtnetlink: fix rtnl_fdb_dump() for ndmsg header

  * screen displays abnormally on the lenovo M715 with the AMD GPU (Radeon Vega
    8 Mobile, rev ca, 1002:15dd) (LP: #1796786)
    - drm/amd/display: Fix takover from VGA mode
    - drm/amd/display: early return if not in vga mode in disable_vga
    - drm/amd/display: Refine disable VGA

  * arm64: snapdragon: WARNING: CPU: 0 PID: 1 arch/arm64/kernel/setup.c:271
    reserve_memblock_reserved_regions (LP: #1797139)
    - SAUCE: arm64: Fix /proc/iomem for reserved but not memory regions

  * The front MIC can't work on the Lenovo M715 (LP: #1797292)
    - ALSA: hda/realtek - Fix the problem of the front MIC on the Lenovo M715

  * Keyboard backlight sysfs sometimes is missing on Dell laptops (LP: #1797304)
    - platform/x86: dell-smbios: Correct some style warnings
    - platform/x86: dell-smbios: Rename dell-smbios source to dell-smbios-base
    - platform/x86: dell-smbios: Link all dell-smbios-* modules together
    - [Config] CONFIG_DELL_SMBIOS_SMM=y, CONFIG_DELL_SMBIOS_WMI=y

  * rpi3b+: ethernet not working (LP: #1797406)
    - lan78xx: Don't reset the interface on open

  * 87cdf3148b11 was never backported to 4.15 (LP: #1795653)
    - xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto

  * [Ubuntu18.04][Power9][DD2.2]package installation segfaults inside debian
    chroot env in P9 KVM guest with HTM enabled (kvm) (LP: #1792501)
    - KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds

  * Provide mode where all vCPUs on a core must be the same VM (LP: #1792957)
    - KVM: PPC: Book3S HV: Provide mode where all vCPUs on a core must be the same
      VM

  * fscache: bad refcounting in fscache_op_complete leads to OOPS (LP: #1797314)
    - SAUCE: fscache: Fix race in decrementing refcount of op->npages

  * CVE-2018-9363
    - Bluetooth: hidp: buffer overflow in hidp_process_report

  * CVE-20...


Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-02-25 05:38 EDT-------
Thanks. Closing as per the previous comment.

tags: added: targetmilestone-inin18041
removed: targetmilestone-inin---