zed process consuming 100% cpu

Bug #1751796 reported by Scott Moser
This bug affects 3 people
Affects               Status         Importance   Assigned to       Milestone
linux (Ubuntu)        Fix Released   High         Colin Ian King
  Bionic              Fix Released   High         Colin Ian King
zfs-linux (Ubuntu)    Fix Released   High         Colin Ian King
  Bionic              Fix Released   High         Colin Ian King

Bug Description

I logged in this morning and found the system kind of sluggish.
'top' showed the 'zed' process spinning at 100% of a CPU.
I didn't feel like worrying about it, so I simply rebooted.

System came back up into the same state.
Running 'strace -p 2002' shows the same failing call repeated:

ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0), 0x7ffccbb054b0) = -1 EBADF (Bad file descriptor)
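
For anyone decoding the strace output above: strace prints the request as `_IOC(dir, type, nr, size)`, the four fields Linux packs into an ioctl request number. The sketch below assumes the generic bit layout used on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, dir in bits 30-31); the function name `ioc` is only illustrative.

```python
# Generic Linux ioctl request encoding as used on x86:
# nr in bits 0-7, type in bits 8-15, size in bits 16-29, dir in bits 30-31.
def ioc(direction, type_, nr, size):
    """Pack the four _IOC fields into a single request number."""
    return (direction << 30) | (size << 16) | (type_ << 8) | nr


# The fields strace printed for the failing call.
request = ioc(0, 0x5A, 0x81, 0)
print(hex(request))  # 0x5a81
print(chr(0x5A))     # Z
```

So the call that keeps failing is request 0x5a81: command number 0x81 with type byte 'Z', issued on file descriptor 5 with no direction or size encoded.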

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: zfs-zed 0.7.5-1ubuntu2
ProcVersionSignature: Ubuntu 4.13.0-32.35-generic 4.13.13
Uname: Linux 4.13.0-32-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon Feb 26 09:34:59 2018
EcryptfsInUse: Yes
InstallationDate: Installed on 2015-07-23 (949 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150722.1)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: zfs-linux
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.zfs.zed.d.zed-functions.sh: [deleted]

Colin Ian King (colin-king) wrote :

This is most probably because you are using ZFS 0.7.x user space tools with a kernel that has ZFS 0.6.5.x drivers. Please install the latest Bionic 4.15 kernel and see if that fixes the issue.

Changed in zfs-linux (Ubuntu):
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Colin Ian King (colin-king)
Scott Moser (smoser) wrote :

Colin, thanks for quick response.
Rebooting into 4.15 did indeed resolve the issue.

I wouldn't have stumbled on it at all, except that at some point I had dpkg-diverted
 /etc/kernel/postinst.d/zz-update-grub
so dist-upgrades didn't fix it.

fwiw, I was booted into vmlinuz-4.13.0-32-generic when I found the problem.

Scott Moser (smoser) wrote :

Sorry for not being clear above. I had dpkg-diverted that update-grub hook, so my dist-upgrade and subsequent reboot put me back into an older kernel.

Changed in zfs-linux (Ubuntu):
status: In Progress → Invalid
Scott Moser (smoser) wrote :

Why did you mark this invalid?

If ZFS 0.7.x relies on a kernel newer than 4.13, then this seems like a valid bug.
zfs-zed does not declare a dependency on a specific Linux version, and even if it did, wouldn't users be guaranteed to hit this issue after they'd upgraded zfs-zed and before they'd upgraded (and rebooted into) the new kernel?

Changed in zfs-linux (Ubuntu):
status: Invalid → New
Changed in zfs-linux (Ubuntu):
status: New → In Progress
summary: - zed process consumming 100% cpu
+ zed process consuming 100% cpu
Steve Langasek (vorlon) wrote :

Confirmed this bug here - same situation, I did an upgrade from 17.10 to 18.04, and prior to reboot, zed is spinning using 100% of a core.

Steve Langasek (vorlon) wrote :

Obviously the user needs to reboot after a dist-upgrade for the new system to be fully usable, but zed should degrade more gracefully than by pegging the CPU.
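
As an illustration of what "degrading gracefully" could mean here, the following is a minimal sketch of an event loop that backs off and eventually gives up when the same call keeps failing, rather than retrying in a tight loop. It is not how zed is implemented; `poll_events`, `next_event`, and the parameters are hypothetical names chosen for the example.

```python
import time


def poll_events(next_event, max_failures=5, base_delay=0.1, sleep=time.sleep):
    """Collect events, backing off on repeated errors instead of spinning.

    next_event is a hypothetical callable: it returns an event, returns
    None when there is nothing more to read, or raises OSError when the
    underlying call fails (for example with EBADF).
    """
    events = []
    failures = 0
    while failures < max_failures:
        try:
            event = next_event()
        except OSError:
            failures += 1
            # Exponential backoff: base_delay, 2x, 4x, ... per consecutive failure.
            sleep(base_delay * 2 ** (failures - 1))
            continue
        failures = 0  # any success resets the failure counter
        if event is None:
            break
        events.append(event)
    return events
```

With a call that always fails, the loop sleeps for increasing intervals and returns after `max_failures` attempts, so the process stays idle most of the time instead of pegging a core.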

Colin Ian King (colin-king) wrote :

The ZFS ioctl() interface is not binary compatible between the newer (0.7.x) userspace and the older (0.6.5.x) kernel drivers. Fortunately this is fixable with an ioctl() remapping, which seems to fix the issue for me. I'll get an update in by EOD.
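
To make the idea of an ioctl() remapping concrete, here is a rough sketch of a translation table applied only when the kernel driver uses the old command layout. The command numbers are invented for illustration and are not the real ZFS ioctl values; `NEW_TO_OLD` and `remap_request` are hypothetical names, and the actual fix may be structured differently.

```python
# Made-up command numbers for illustration only; NOT the real ZFS ioctl values.
NEW_TO_OLD = {
    0x1001: 0x0101,
    0x1002: 0x0102,
}


def remap_request(request, kernel_uses_old_layout):
    """Translate a new-layout request number for an old-layout kernel driver."""
    if not kernel_uses_old_layout:
        return request
    # Commands whose number did not change pass through untouched.
    return NEW_TO_OLD.get(request, request)
```

The point of the sketch is that userspace keeps one set of command numbers internally and translates at the last moment, so the same binaries can talk to either driver generation.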

Changed in zfs-linux (Ubuntu):
importance: Medium → High
Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu8

---------------
zfs-linux (0.7.5-1ubuntu8) bionic; urgency=medium

  * Add ZFS 0.6.x kernel ioctl binary compat shim (LP: #1751796)
    Detect ZFS kernel driver version and copy zfs ioctl command to the
    newer ZFS 0.7.0 ioctl command layout.

 -- Colin Ian King <email address hidden> Thu, 22 Mar 2018 12:00:32 +0000
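
The "detect ZFS kernel driver version" step in the changelog above could look roughly like the sketch below. It assumes the loaded module reports its version in `/sys/module/zfs/version` (the standard sysfs location for a module that declares a version) and that a version below 0.7 implies the old ioctl layout. The function names are hypothetical, and the shipped shim may detect the version differently.

```python
import re
from pathlib import Path


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.6.5.11-1ubuntu3'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def uses_old_ioctl_layout(version):
    """Assumption for this sketch: drivers older than 0.7 use the old layout."""
    return parse_major_minor(version) < (0, 7)


def read_module_version(path="/sys/module/zfs/version"):
    """Return the loaded module's version string, or None if it is not available."""
    version_file = Path(path)
    if not version_file.exists():
        return None
    return version_file.read_text().strip()
```

A caller would read the version once at startup and, if `uses_old_ioctl_layout` is true, route every request through the compatibility path.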

Changed in zfs-linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-15.16

---------------
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine
    type(pseries-bionic) complaining "KVM implementation does not support
    Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update to
      irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module
    failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andy Whitcroft (apw)
tags: added: kernel-fixup-verification-needed-bionic
removed: verification-needed-bionic
Brad Figg (brad-figg)
tags: added: verification-needed-bionic
Andy Whitcroft (apw) wrote :

This bug was erroneously marked for verification in bionic; verification is not required and verification-needed-bionic is being removed.

tags: removed: verification-needed-bionic
tags: added: verification-done-bionic
Brad Figg (brad-figg)
tags: added: cscc