[Bionic] arm64: topology: Avoid checking numa mask for scheduler MC selection

Bug #1797202 reported by Manoj Iyer
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
Undecided
Unassigned
Bionic
Fix Released
Undecided
Unassigned

Bug Description

[Impact]
The numa mask subset check can often lead to system hang or crash during CPU hotplug and system suspend operation if NUMA is disabled. This is mostly observed on HMP systems where the CPU compute capacities are different and ends up in different scheduler domains. Since cpumask_of_node is returned instead core_sibling, the scheduler is confused with incorrect cpumasks(e.g. one CPU in two different sched domains at the same time) on CPU hotplug.

[Fix]
The following patch fixes this issue:
e156ab71a974 arm64: topology: Avoid checking numa mask for scheduler MC selection

[Test]
-- testcase --

#!/bin/bash

stress-ng -q --vm 100 --vm-bytes 1G --cpu 210 --timeout 2400s --metrics-brief &
pid=$!

for i in {1..210}; do
 num=$((RANDOM%10+$i))
 echo "Offline CPU $num"
 echo 0 > /sys/devices/system/cpu/cpu$num/online || exit 1
 sleep 3s
 echo "Online CPU $num"
 echo 1 > /sys/devices/system/cpu/cpu$num/online || exit 1
done

kill -9 $pid
----------------
Executed the test case with stock Bionic kernel and with Bionic kernel with patches applied. No system hang or crash was observed. On Cavium CN99XX Thunder X2 system.

[Regression Potential]
Regression tested on Cavium Thunder X2 CN99XX system, and none were found. Regression risk is low.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1797202

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Manoj Iyer (manjo)
description: updated
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Manoj Iyer (manjo) wrote :

Executed the test case with -proposed Bionic kernel. No system hang or crash was observed. On Cavium CN99XX Thunder X2 system.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (5.4 KiB)

This bug was fixed in the package linux - 4.15.0-39.42

---------------
linux (4.15.0-39.42) bionic; urgency=medium

  * linux: 4.15.0-39.42 -proposed tracker (LP: #1799411)

  * Linux: insufficient shootdown for paging-structure caches (LP: #1798897)
    - mm: move tlb_table_flush to tlb_flush_mmu_free
    - mm/tlb: Remove tlb_remove_table() non-concurrent condition
    - mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
    - [Config] CONFIG_HAVE_RCU_TABLE_INVALIDATE=y

  * Ubuntu18.04: GPU total memory is reduced (LP: #1792102)
    - Revert "powerpc/powernv: Increase memory block size to 1GB on radix"

  * arm64: snapdragon: reduce boot noise (LP: #1797154)
    - [Config] arm64: snapdragon: DRM_MSM=m
    - [Config] arm64: snapdragon: SND*=m
    - [Config] arm64: snapdragon: disable ARM_SDE_INTERFACE
    - [Config] arm64: snapdragon: disable DRM_I2C_ADV7511_CEC
    - [Config] arm64: snapdragon: disable VIDEO_ADV7511, VIDEO_COBALT

  * [Bionic] CPPC bug fixes (LP: #1796949)
    - ACPI / CPPC: Update all pr_(debug/err) messages to log the susbspace id
    - cpufreq: CPPC: Don't set transition_latency
    - ACPI / CPPC: Fix invalid PCC channel status errors

  * regression in 'ip --family bridge neigh' since linux v4.12 (LP: #1796748)
    - rtnetlink: fix rtnl_fdb_dump() for ndmsg header

  * screen displays abnormally on the lenovo M715 with the AMD GPU (Radeon Vega
    8 Mobile, rev ca, 1002:15dd) (LP: #1796786)
    - drm/amd/display: Fix takover from VGA mode
    - drm/amd/display: early return if not in vga mode in disable_vga
    - drm/amd/display: Refine disable VGA

  * arm64: snapdragon: WARNING: CPU: 0 PID: 1 arch/arm64/kernel/setup.c:271
    reserve_memblock_reserved_regions (LP: #1797139)
    - SAUCE: arm64: Fix /proc/iomem for reserved but not memory regions

  * The front MIC can't work on the Lenovo M715 (LP: #1797292)
    - ALSA: hda/realtek - Fix the problem of the front MIC on the Lenovo M715

  * Keyboard backlight sysfs sometimes is missing on Dell laptops (LP: #1797304)
    - platform/x86: dell-smbios: Correct some style warnings
    - platform/x86: dell-smbios: Rename dell-smbios source to dell-smbios-base
    - platform/x86: dell-smbios: Link all dell-smbios-* modules together
    - [Config] CONFIG_DELL_SMBIOS_SMM=y, CONFIG_DELL_SMBIOS_WMI=y

  * rpi3b+: ethernet not working (LP: #1797406)
    - lan78xx: Don't reset the interface on open

  * 87cdf3148b11 was never backported to 4.15 (LP: #1795653)
    - xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto

  * [Ubuntu18.04][Power9][DD2.2]package installation segfaults inside debian
    chroot env in P9 KVM guest with HTM enabled (kvm) (LP: #1792501)
    - KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds

  * Provide mode where all vCPUs on a core must be the same VM (LP: #1792957)
    - KVM: PPC: Book3S HV: Provide mode where all vCPUs on a core must be the same
      VM

  * fscache: bad refcounting in fscache_op_complete leads to OOPS (LP: #1797314)
    - SAUCE: fscache: Fix race in decrementing refcount of op->npages

  * CVE-2018-9363
    - Bluetooth: hidp: buffer overflow in hidp_process_report

  * CVE-20...

Read more...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.