xenial: virtio-scsi: CPU soft lockup due to loop in virtscsi_target_destroy()

Bug #1798110 reported by Mauricio Faria de Oliveira
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Triaged
Medium
Unassigned
Xenial
Fix Released
Medium
Unassigned

Bug Description

[Impact]

 * Detaching virtio-scsi disk in Xenial guest can cause
   CPU soft lockup in guest (and take 100% CPU in host).

 * It may prevent further progress on other tasks that
   depend on resources locked earlier in the SCSI target
   removal stack, and/or impact other SCSI functionality.

 * The fix resolves a corner case in the requests counter
   in the virtio SCSI target, which impacts a downstream
   (SAUCE) patch in the virtio-scsi target removal handler
   that depends on the requests counter value to be zero.

[Test Case]

 * See LP #1798110 (this bug)'s comment #3 (too long for
   this section -- synthetic case with GDB+QEMU) and
   comment #4 (organic test case in cloud instance).

[Regression Potential]

 * It seem low -- this only affects the SCSI command requeue
   path with regards to the reference counter, which is only
   used with real chance of problems in our downstream patch
   (which is now passing this testcase).

 * The other less serious issue would be decrementing it to
   a negative / < 0 value, which is not possible with this
   driver logic (see commit message), because the reqs counter
   is always incremented before calling virtscsi_queuecommand(),
   where this decrement operation is inserted.

[Original Description]

A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:

1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().

2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.

The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.

Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer.

CVE References

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (16.0 KiB)

Problem Analysis
================

The dmesg log 'crash/201809061748/dmesg.201809061748' shows the CPU soft lockup occurs 25 seconds after the 'sdb' virtio-scsi drive is removed.
This seems to indicate the events are related (there's usually an extra 2s-3s between an event and the report of the 22s or 23s stuck, for some reason).

 [ 3002.697474] sd 0:0:2:0: [sdb] Synchronizing SCSI cache
 [ 3002.697545] sd 0:0:2:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [ 3028.294602] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [id:2887]

CPU 1 is waiting on another CPU (to finish something)
-------------------------------

The locked up 'id' process is in page fault handling stack (which is OK/normal), calling a function on many (SMP) cpus (see smp_call_function_many() in the top of the stack), and is specifically in the call to one CPU (see smp_call_function_single+0xae/0x110() in the RIP register).

 [ 3028.301755] CPU: 1 PID: 2887 Comm: id Not tainted 4.4.0-133-generic #159~14.04.1-Ubuntu
 ...
 [ 3028.301760] RIP: 0010:[<ffffffff81102b1e>] [<ffffffff81102b1e>] smp_call_function_single+0xae/0x110
 ...
 [ 3028.301797] Call Trace:
 [ 3028.301803] [<ffffffff81071f60>] ? do_kernel_range_flush+0x40/0x40
 [ 3028.301805] [<ffffffff81102e9e>] smp_call_function_many+0x22e/0x270
 [ 3028.301808] [<ffffffff810723f8>] native_flush_tlb_others+0x48/0x120
 [ 3028.301810] [<ffffffff8107256d>] flush_tlb_mm_range+0x9d/0x180
 [ 3028.301815] [<ffffffff811cb1e3>] ptep_clear_flush+0x53/0x60
 [ 3028.301819] [<ffffffff811b75ed>] wp_page_copy.isra.58+0x29d/0x530
 [ 3028.301822] [<ffffffff811b955d>] do_wp_page+0x8d/0x590
 [ 3028.301824] [<ffffffff811bb826>] handle_mm_fault+0xd86/0x1ac0
 [ 3028.301829] [<ffffffff81192c53>] ? free_pages+0x13/0x20
 [ 3028.301835] [<ffffffff810aa754>] ? finish_task_switch+0x244/0x2a0
 [ 3028.301840] [<ffffffff8106b0fb>] __do_page_fault+0x19b/0x430
 [ 3028.301843] [<ffffffff8106b3b2>] do_page_fault+0x22/0x30
 [ 3028.301847] [<ffffffff81822928>] page_fault+0x28/0x30

The smp_call_function_many() address line in the stack trace (smp_call_function_many+0x22e / ffffffff81102e9e) reflects it's called smp_call_function_single():

 ffffffff81102c70 <smp_call_function_many>:
 ...
 ffffffff81102e99: e8 d2 fb ff ff callq ffffffff81102a70 <smp_call_function_single>
 ffffffff81102e9e: 48 83 c4 10 add $0x10,%rsp

... which per the address in the RIP register (smp_call_function_single+0xae ffffffff81102b1e)
is in the (inlined) call to csd_lock_wait() after generic_exec_single().

csd_lock_wait() spins in the value of csd->flags with cpu_relax() / 'pause' instruction,
waiting for it to be unlocked (i.e., not have the CSD_FLAG_LOCK flag / 0x1 value in the
flags field / offset 0x18)

 ffffffff81102a70 <smp_call_function_single>:
 ...
 ffffffff81102b0f: e8 0c fe ff ff callq ffffffff81102920 <generic_exec_single>
 ffffffff81102b14: 8b 55 e8 mov -0x18(%rbp),%edx
 ffffffff81102b17: 83 e2 01 and $0x1,%edx
 ffffffff81102b1a: 74 de je ffffffff81102afa <smp_call_function_singl...

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1798110

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (15.7 KiB)

Synthetic Test Case
===================

Create a Xenial KVM guest
---

mfo@rotom:~$ uvt-simplestreams-libvirt sync release=xenial arch=amd64
mfo@rotom:~$ uvt-kvm create --mem 4096 --cpu 2 --disk 16 mfo-sf194614 release=xenial arch=amd64

Modify the virsh XML to add a virtio-scsi controller + disk
---

mfo@rotom:~$ virsh shutdown mfo-sf194614
mfo@rotom:~$ virsh edit mfo-sf194614

Check the virtio-scsi disk
---

mfo@rotom:~$ virsh start mfo-sf194614
mfo@rotom:~$ uvt-kvm ssh mfo-sf194614

ubuntu@mfo-sf194614:~$ ls -ld /sys/block/sda
lrwxrwxrwx 1 root root 0 Oct 13 14:22 /sys/block/sda -> ../devices/pci0000:00/0000:00:07.0/virtio3/host2/target2:0:1/2:0:1:0/block/sda

ubuntu@mfo-sf194614:~$ lspci -s 7.0
00:07.0 SCSI storage controller: Red Hat, Inc Virtio SCSI

Check the SCSI logging for a dd one block transfer + TUR (test-unit-ready)
---

root@mfo-sf194614:~# apt-get install sg3-utils
root@mfo-sf194614:~# scsi_logging_level --set --mlqueue=4 --mlcomplete=2

root@mfo-sf194614:~# dd if=/dev/sda of=/dev/null count=1

[ 557.084898] sd 2:0:1:0: [sda] tag#1 Send: scmd 0xffff880139fe14f0
[ 557.088133] sd 2:0:1:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
[ 557.093154] sd 2:0:1:0: [sda] tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 557.097336] sd 2:0:1:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00

[ 557.100955] sd 2:0:1:0: tag#1 Send: scmd 0xffff880139fe14f0
[ 557.103159] sd 2:0:1:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
[ 557.106005] sd 2:0:1:0: tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 557.109498] sd 2:0:1:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00

Synthetic reproducer (original kernel)
---

root@mfo-sf194614:~# uname -a
Linux mfo-sf194614 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

root@mfo-sf194614:~# cat /sys/module/virtio_scsi/sections/.text
0xffffffffc001c000

Start GDB server in QEMU:

mfo@rotom:~$ virsh qemu-monitor-command mfo-sf194614 --hmp 'gdbserver'
Waiting for gdb connection on device 'tcp::1234'

Get kernel debug symbols:

mfo@rotom:~/ddebs$ wget http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-4.4.0-138-generic-dbgsym_4.4.0-138.164_amd64.ddeb
mfo@rotom:~/ddebs$ dpkg-deb -x linux-image-4.4.0-138-generic-dbgsym_4.4.0-138.164_amd64.ddeb linux-image-4.4.0-138-generic-dbgsym_4.4.0-138.164_amd64
mfo@rotom:~/ddebs$ cd linux-image-4.4.0-138-generic-dbgsym_4.4.0-138.164_amd64/usr/lib/debug

Attach GDB:

$ gdb boot/vmlinux-4.4.0-138-generic \
  -ex 'set confirm off' \
  -ex 'add-symbol-file lib/modules/4.4.0-138-generic/kernel/drivers/scsi/virtio_scsi.ko 0xffffffffc001c000' \
  -ex 'break virtscsi_queuecommand' \
  -ex 'target remote localhost:1234' \
  -ex 'continue'

Run dd:

root@mfo-sf194614:~# dd if=/dev/sda of=/dev/null count=1 2>/dev/null
<freeze>

[ 1271.471568] sd 2:0:1:0: [sda] tag#1 Send: scmd 0xffff880139fe14f0
[ 1271.474633] sd 2:0:1:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
<freeze>

Back to GDB:

Thread 1 hit Breakpoint 1, virtscsi_queuecommand (vscsi=0xffff880036eb07d8, req_vq=0xffff880036eb09e8, sc=0xffff880139fe14f0) at /build/linux-zdslHp/linux-4.4.0/drivers/scs...

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Organic Testcase
================

set -x
instancename=USER-trusty
firstdevice=sdb
seconddevice=sdc
diskname=USER-disk-1
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command '(tar -C /var/issue/first -cf - .) | (tar -C /var/issue/second -xpf -)'
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo umount /var/issue/first"
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo umount /var/issue/second"
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo mount /dev/$seconddevice /var/issue/first"
gcloud compute instances detach-disk $instancename --disk $diskname --zone ZONE
# <----- error should happen here
gcloud compute instances attach-disk $instancename --disk $diskname --zone ZONE
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo umount /var/issue/first"
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo mount /dev/$firstdevice /var/issue/first"
gcloud compute --project "PROJECT" ssh --zone "ZONE" "$instancename" --command "sudo mount /dev/$seconddevice /var/issue/second"

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Organic Testcase:

It usually reproduces the problem in the 1st or 2nd iteration.
With the fix the problem did not reproduce in 35 iterations.

description: updated
description: updated
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Patches posted to kernel-team mailing list [1].

[1] https://lists.ubuntu.com/archives/kernel-team/2018-October/096072.html
    [SRU Xenial][PATCH 0/2] Improve our SAUCE for virtio-scsi reqs counter (fix CPU soft lockup)

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :
Download full text (9.4 KiB)

Verification successful with the kernel in xenial-proposed.
Updating verification tags.

Details
=======

Guest:
---

root@mfo-sf194614:~# apt-cache madison linux-image-4.4.0-139-generic
linux-image-4.4.0-139-generic | 4.4.0-139.165 | http://archive.ubuntu.com/ubuntu/ xenial-proposed/main amd64 Packages

root@mfo-sf194614:~# apt-get -y install linux-image-4.4.0-139-generic
root@mfo-sf194614:~# reboot

root@mfo-sf194614:~# uname -a
Linux mfo-sf194614 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

root@mfo-sf194614:~# ls -ld /sys/block/sda
lrwxrwxrwx 1 root root 0 Oct 25 13:51 /sys/block/sda -> ../devices/pci0000:00/0000:00:07.0/virtio4/host2/target2:0:0/2:0:0:0/block/sda

root@mfo-sf194614:~# lspci -s 7.0
00:07.0 SCSI storage controller: Red Hat, Inc Virtio SCSI

root@mfo-sf194614:~# echo 9 > /proc/sys/kernel/printk
root@mfo-sf194614:~# scsi_logging_level --set --mlqueue=4 --mlcomplete=2

root@mfo-sf194614:~# dd if=/dev/sda of=/dev/null count=1
[ 139.412030] sd 2:0:0:0: [sda] tag#1 Send: scmd 0xffff880036e614f0
[ 139.415541] sd 2:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
[ 139.418561] sd 2:0:0:0: [sda] tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 139.421753] sd 2:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00

[ 139.424561] sd 2:0:0:0: tag#1 Send: scmd 0xffff880036e614f0
[ 139.427081] sd 2:0:0:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
[ 139.428997] sd 2:0:0:0: tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 139.430982] sd 2:0:0:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00

root@mfo-sf194614:~# cat /sys/module/virtio_scsi/sections/.text
0xffffffffc0002000

Host:
---

mfo@rotom:~/ddebs$ vscsi_text='0xffffffffc0002000'
mfo@rotom:~/ddebs$ virsh qemu-monitor-command mfo-sf194614 --hmp 'gdbserver'

mfo@rotom:~/ddebs$ wget http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-4.4.0-139-generic-dbgsym_4.4.0-139.165_amd64.ddeb
mfo@rotom:~/ddebs$ dpkg-deb -x linux-image-4.4.0-139-generic-dbgsym_4.4.0-139.165_amd64.ddeb linux-image-4.4.0-139-generic-dbgsym_4.4.0-139.165_amd64/
mfo@rotom:~/ddebs$ cd linux-image-4.4.0-139-generic-dbgsym_4.4.0-139.165_amd64/usr/lib/debug/

gdb boot/vmlinux-4.4.0-139-generic \
  -ex 'set confirm off' \
  -ex "add-symbol-file lib/modules/4.4.0-139-generic/kernel/drivers/scsi/virtio_scsi.ko $vscsi_text" \
  -ex 'break virtscsi_queuecommand' \
  -ex 'target remote localhost:1234' \
  -ex 'continue'

Guest
---

root@mfo-sf194614:~# dd if=/dev/sda of=/dev/null count=1 2>/dev/null
[ 435.462432] sd 2:0:0:0: [sda] tag#1 Send: scmd 0xffff880036e614f0
[ 435.465679] sd 2:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
<freeze>

Host GDB:
---

Thread 2 hit Breakpoint 1, virtscsi_queuecommand (vscsi=0xffff88013a6cf7d8, req_vq=0xffff88013a6cf9e8, sc=0xffff880036e614f0)
    at /build/linux-PrHwV2/linux-4.4.0/drivers/scsi/virtio_scsi.c:534
534 /build/linux-PrHwV2/linux-4.4.0/drivers/scsi/virtio_scsi.c: No such file or directory.
(gdb) break virtio_ring.c:196
Breakpoint 2 at 0xffffffff814e383f: virtio_ring.c:196. (3 locations)
(gdb) info break
Num Type Disp Enb Addr...

Read more...

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (21.0 KiB)

This bug was fixed in the package linux - 4.4.0-139.165

---------------
linux (4.4.0-139.165) xenial; urgency=medium

  * linux: 4.4.0-139.165 -proposed tracker (LP: #1799401)

  * Kernel panic after the ubuntu_nbd_smoke_test on Xenial kernel (LP: #1793464)
    - nbd: Remove signal usage
    - nbd: Timeouts are not user requested disconnects
    - nbd: Cleanup reset of nbd and bdev after a disconnect
    - nbd: don't shutdown sock with irq's disabled
    - nbd: fix race in ioctl

  * fscache: bad refcounting in fscache_op_complete leads to OOPS (LP: #1797314)
    - SAUCE: fscache: Fix race in decrementing refcount of op->npages

  * xenial: virtio-scsi: CPU soft lockup due to loop in
    virtscsi_target_destroy() (LP: #1798110)
    - SAUCE: (no-up) virtio-scsi: Decrement reqs counter before SCSI command
      requeue

  * Error reported when creating ZFS pool with "-t" option, despite successful
    pool creation (LP: #1769937)
    - SAUCE: (noup) Update zfs to 0.6.5.6-0ubuntu26

  * Xenial update: 4.4.160 upstream stable release (LP: #1798770)
    - crypto: skcipher - Fix -Wstringop-truncation warnings
    - tsl2550: fix lux1_input error in low light
    - vmci: type promotion bug in qp_host_get_user_memory()
    - x86/numa_emulation: Fix emulated-to-physical node mapping
    - staging: rts5208: fix missing error check on call to rtsx_write_register
    - uwb: hwa-rc: fix memory leak at probe
    - power: vexpress: fix corruption in notifier registration
    - Bluetooth: Add a new Realtek 8723DE ID 0bda:b009
    - USB: serial: kobil_sct: fix modem-status error handling
    - 6lowpan: iphc: reset mac_header after decompress to fix panic
    - md-cluster: clear another node's suspend_area after the copy is finished
    - media: exynos4-is: Prevent NULL pointer dereference in __isp_video_try_fmt()
    - powerpc/kdump: Handle crashkernel memory reservation failure
    - media: fsl-viu: fix error handling in viu_of_probe()
    - x86/tsc: Add missing header to tsc_msr.c
    - x86/entry/64: Add two more instruction suffixes
    - scsi: target/iscsi: Make iscsit_ta_authentication() respect the output
      buffer size
    - scsi: klist: Make it safe to use klists in atomic context
    - scsi: ibmvscsi: Improve strings handling
    - usb: wusbcore: security: cast sizeof to int for comparison
    - powerpc/powernv/ioda2: Reduce upper limit for DMA window size
    - alarmtimer: Prevent overflow for relative nanosleep
    - s390/extmem: fix gcc 8 stringop-overflow warning
    - ALSA: snd-aoa: add of_node_put() in error path
    - media: s3c-camif: ignore -ENOIOCTLCMD from v4l2_subdev_call for s_power
    - media: soc_camera: ov772x: correct setting of banding filter
    - media: omap3isp: zero-initialize the isp cam_xclk{a,b} initial data
    - staging: android: ashmem: Fix mmap size validation
    - drivers/tty: add error handling for pcmcia_loop_config
    - media: tm6000: add error handling for dvb_register_adapter
    - ALSA: hda: Add AZX_DCAPS_PM_RUNTIME for AMD Raven Ridge
    - ath10k: protect ath10k_htt_rx_ring_free with rx_ring.lock
    - rndis_wlan: potential buffer overflow in rndis_wlan_auth_indication()
    - wlcore: Add missing PM call fo...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.