multiple kernel oops regarding hung tasks delaying boot

Bug #1989258 reported by Brian Murray
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Auto Package Testing
Fix Released
Undecided
Unassigned
linux (Ubuntu)
Fix Released
Critical
Unassigned

Bug Description

Kinetic amd64 instances in the autopkgtest environment are regularly experiencing a slow boot process due to a hung task timeout which happens multiple times. The oops appears below:

Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.485373] INFO: task systemd-udevd:140 blocked for more than 1208 seconds.
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.487120] Not tainted 5.19.0-15-generic #15-Ubuntu
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.490936] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.492717] task:systemd-udevd state:D stack: 0 pid: 140 ppid: 137 flags:0x00004006
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.494612] Call Trace:
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.495202] <TASK>
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.495727] __schedule+0x23b/0x5c0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.496559] ? sched_clock_cpu+0x12/0xf0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.497482] schedule+0x50/0xc0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.498237] schedule_preempt_disabled+0x15/0x30
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.499306] __mutex_lock.constprop.0+0x4e7/0x760
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.500409] __mutex_lock_slowpath+0x13/0x20
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.501420] mutex_lock+0x36/0x40
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.502211] add_early_randomness+0x1b/0xc0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.503178] hwrng_register+0x13e/0x1c0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.504068] virtrng_scan+0x19/0x30 [virtio_rng]
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.505152] virtio_dev_probe+0x1f9/0x2c0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.506096] really_probe+0x1d1/0x3a0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.506958] __driver_probe_device+0x11b/0x190
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.507993] driver_probe_device+0x24/0xd0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.508955] __driver_attach+0xd8/0x200
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.509859] ? __device_attach_driver+0x120/0x120
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.510952] bus_for_each_dev+0x7c/0xc0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.511846] driver_attach+0x1e/0x30
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.512692] bus_add_driver+0x178/0x220
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.513597] driver_register+0x8f/0xf0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.514474] ? 0xffffffffc0282000
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.515257] register_virtio_driver+0x20/0x40
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.516277] virtio_rng_driver_init+0x15/0x1000 [virtio_rng]
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.517602] do_one_initcall+0x4a/0x210
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.518498] ? kmem_cache_alloc_trace+0x181/0x300
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.519587] ? do_init_module+0x27/0x200
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.520516] do_init_module+0x50/0x200
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.521403] load_module+0xb60/0xcb0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.522243] __do_sys_finit_module+0xbd/0x130
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.523255] ? __do_sys_finit_module+0xbd/0x130
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.524312] __x64_sys_finit_module+0x18/0x20
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.525343] do_syscall_64+0x5b/0x80
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.526183] ? syscall_exit_to_user_mode+0x26/0x50
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.527287] ? __x64_sys_mmap+0x33/0x50
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.528180] ? do_syscall_64+0x67/0x80
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.529064] ? exit_to_user_mode_prepare+0x30/0xa0
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.530182] ? syscall_exit_to_user_mode+0x26/0x50
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.531289] ? __x64_sys_newfstatat+0x1c/0x30
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.532310] ? do_syscall_64+0x67/0x80
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.533185] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.534354] RIP: 0033:0x7f66accbf9fd
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.535191] RSP: 002b:00007ffc83fb0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.536912] RAX: ffffffffffffffda RBX: 0000555b12b044d0 RCX: 00007f66accbf9fd
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.538541] RDX: 0000000000000000 RSI: 00007f66ace41458 RDI: 0000000000000005
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.540161] RBP: 00007f66ace41458 R08: 0000000000000000 R09: 0000000000000000
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.541803] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.543431] R13: 0000555b12b03e60 R14: 0000000000000000 R15: 0000555b12afff80
Sep 9 22:31:32 bdmurray-test-systemd kernel: [ 1330.545063] </TASK>

This first occurred with at around 2022-08-24 with at least kernel version 5.19.0-15-generic. But around 2022-09-07 started becoming much more frequent. I've downgraded the version of systemd to 251.2-2ubuntu2 and that did not cause the oops to go away.

ProblemType: Bug
DistroRelease: Ubuntu 22.10
Package: linux-image-5.19.0-15-generic 5.19.0-15.15
ProcVersionSignature: Ubuntu 5.19.0-15.15-generic 5.19.0
Uname: Linux 5.19.0-15-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 9 22:30 seq
 crw-rw---- 1 root audio 116, 33 Sep 9 22:30 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.23.0-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Sat Sep 10 15:51:33 2022
Ec2AMI: ami-00009dfb
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: autopkgtest
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t: /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
MachineType: OpenStack Foundation OpenStack Nova
PciMultimedia:

ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen.xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.19.0-15-generic root=UUID=63a35cf8-4d22-4a79-a93d-eb178abc8c07 ro console=ttyS0
RelatedPackageVersions:
 linux-restricted-modules-5.19.0-15-generic N/A
 linux-backports-modules-5.19.0-15-generic N/A
 linux-firmware 20220831.gitd3c92280-0ubuntu1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1+esm1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-xenial
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1+esm1:bd04/01/2014:br0.0:svnOpenStackFoundation:pnOpenStackNova:pvr13.1.4:cvnQEMU:ct1:cvrpc-i440fx-xenial:sku:
dmi.product.family: Virtual Machine
dmi.product.name: OpenStack Nova
dmi.product.version: 13.1.4
dmi.sys.vendor: OpenStack Foundation

CVE References

Revision history for this message
Brian Murray (brian-murray) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Andrea Righi (arighi) wrote :

Hi Brian, we have also experienced this issue during our tests, but it's really hard to reproduce it on our side.

It seems to be a regression in virtio-rng and I've prepared a test kernel, but we still haven't found a way to reproduce it systematically, so I couldn't verify if this test kernel is actually fixing the problem or not.

Are you able to reproduce this issue easily? In that case, it'd be great if you could a quick test with this kernel:
https://kernel.ubuntu.com/~arighi/lp-1989258/

(I've uploaded just the debs here, but if you prefer a ppa let me know)

Thanks in advance!

Revision history for this message
Brian Murray (brian-murray) wrote :

I was booting a test system this afternoon and encountered the OOPS regularly so installed the test kernel on the system and rebooted the system multiple times and did not encounter the OOPS once.

Sep 15 00:01:48 bdmurray-test-virtio-rng kernel: [ 0.000000] Linux version 5.19.0-16-generic (ubuntu@ip-10-0-228-190) (x86_64-linux-gnu-gcc (Ubuntu 12.2.0-1ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #16+arighi SMP PREEMPT_DYNAMIC Mon Sep 12 15:18:26 UTC 2022 (Ubuntu 5.19.0-16.16+arighi-generic 5.19.7)
Sep 15 00:01:48 bdmurray-test-virtio-rng kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.19.0-16-generic root=UUID=63a35cf8-4d22-4a79-a93d-eb178abc8c07 ro console=ttyS0

ubuntu@bdmurray-test-virtio-rng:~$ grep "Call Trace" /var/log/syslog
Sep 14 23:09:05 bdmurray-test-virtio-rng kernel: [ 242.903669] Call Trace:
Sep 14 23:09:05 bdmurray-test-virtio-rng kernel: [ 363.735838] Call Trace:
Sep 14 23:09:05 bdmurray-test-virtio-rng kernel: [ 484.568986] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 242.911193] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 363.736184] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 484.567867] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 605.399430] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 726.231456] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 847.063944] Call Trace:
Sep 14 23:42:50 bdmurray-test-virtio-rng kernel: [ 967.895641] Call Trace:

ubuntu@bdmurray-test-virtio-rng:~$ lsmod | grep rng
virtio_rng 16384 0
ubuntu@bdmurray-test-virtio-rng:~$ uname -a
Linux bdmurray-test-virtio-rng 5.19.0-16-generic #16+arighi SMP PREEMPT_DYNAMIC Mon Sep 12 15:18:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linu
x

So I think this is good to go.

Revision history for this message
Brian Murray (brian-murray) wrote :

We've worked around this in the autopackage test environment by blacklisting the virtio_rng module, so I'm setting this task to Fix Released.

Changed in auto-package-testing:
status: New → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (18.6 KiB)

This bug was fixed in the package linux - 5.19.0-18.18

---------------
linux (5.19.0-18.18) kinetic; urgency=medium

  * kinetic/linux: 5.19.0-18.18 -proposed tracker (LP: #1990366)

  * 5.19.0-17.17: kernel NULL pointer dereference, address: 0000000000000084
    (LP: #1990236)
    - Revert "UBUNTU: SAUCE: apparmor: Fix regression in stacking due to label
      flags"
    - Revert "UBUNTU: [Config] disable SECURITY_APPARMOR_RESTRICT_USERNS"
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - add an internal buffer""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't wait on cleanup""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - don't waste entropy""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - always add a pending
      request""
    - Revert "UBUNTU: SAUCE: Revert "hwrng: virtio - unregister device before
      reset""
    - Revert "UBUNTU: SAUCE: Revert "virtio-rng: make device ready before making
      request""
    - Revert "UBUNTU: [Config] update configs after apply new apparmor patch set"
    - Revert "UBUNTU: SAUCE: apparmor: add user namespace creation mediation"
    - Revert "UBUNTU: SAUCE: selinux: Implement userns_create hook"
    - Revert "UBUNTU: SAUCE: bpf-lsm: Make bpf_lsm_userns_create() sleepable"
    - Revert "UBUNTU: SAUCE: security, lsm: Introduce security_create_user_ns()"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: AppArmor: Remove the exclusive
      flag"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add /proc attr entry for full
      LSM context"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Removed scaffolding function
      lsmcontext_init"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: netlabel: Use a struct lsmblob in
      audit data"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple
      object contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: audit: multiple subject lsm values
      for netlabel"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Add record for multiple task
      security contexts"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Allow multiple records in an
      audit_buffer"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Add a function to report
      multiple LSMs"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Create audit_stamp
      structure"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: Audit: Keep multiple LSM data in
      audit_names"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx
      module selection"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: binder: Pass LSM identifier for
      confirmation"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: NET: Store LSM netlabel data in a
      lsmblob"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: security_secid_to_secctx in
      netlink netfilter"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_dentry_init_security"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_inode_getsecctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM: Use lsmcontext in
      security_secid_to_secctx"
    - Revert "UBUNTU: SAUCE: lsm stacking v37: LSM:...

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.