Comment 22 for bug 1626894

Revision history for this message
Stéphane B (ktv) wrote :

Server with two NVMe controllers. The second one disappears (with a kernel trace, below).

> cat /proc/version
Linux version 4.4.0-47-generic (buildd@lcy01-03) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016

After upgrading the kernel, my ZFS pool becomes DEGRADED:
> zpool status
  pool: zp0
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
 invalid. Sufficient replicas exist for the pool to continue
 functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

 NAME                     STATE     READ WRITE CKSUM
 zp0                      DEGRADED     0     0     0
   mirror-0               DEGRADED     0     0     0
     nvme0n1              ONLINE       0     0     0
     9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1
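For reference, the 'zpool replace' recovery that the status output suggests might look like the sketch below once the missing device re-enumerates. The pool name and GUID are taken from the status output above; the commands are only echoed (dry run), not executed, since they would modify a live pool:

```shell
#!/bin/sh
# Sketch of the recovery path suggested by 'zpool status' above.
# Values come from the status output; commands are echoed (dry run).
POOL=zp0
MISSING_GUID=9486952355712335023   # the UNAVAIL member, was /dev/nvme1n1
NEW_DEV=/dev/nvme1n1               # device path once the controller is back

# If the same disk comes back intact, 'zpool online' may resilver it;
# otherwise 'zpool replace' rebuilds the mirror member from scratch.
echo "zpool online $POOL $MISSING_GUID"
echo "zpool replace $POOL $MISSING_GUID $NEW_DEV"
```

(Referencing the missing member by its GUID works even while the /dev path is gone.)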

Only ONE controller is listed!

> nvme list
Node             SN                   Model                                    Version  Namespace  Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     CVMD4391006B800GGN   INTEL SSDPE2ME800G4                      1.0      1          800,17 GB / 800,17 GB      512 B + 0 B      8DV10102

The bug isn't fixed for me.

[ 68.950042] nvme 0000:82:00.0: I/O 0 QID 0 timeout, disable controller
[ 69.054149] nvme 0000:82:00.0: Cancelling I/O 0 QID 0
[ 69.054182] nvme 0000:82:00.0: Identify Controller failed (-4)
[ 69.060132] nvme 0000:82:00.0: Removing after probe failure
[ 69.060284] iounmap: bad address ffffc9000cf34000
[ 69.065020] CPU: 14 PID: 247 Comm: kworker/14:1 Tainted: P OE 4.4.0-47-generic #68-Ubuntu
[ 69.065034] Hardware name: Supermicro SYS-F618R2-RC1+/X10DRFR-N, BIOS 2.0 01/27/2016
[ 69.065040] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[ 69.065050] 0000000000000286 00000000e10d6171 ffff8820340efce0 ffffffff813f5aa3
[ 69.065052] ffff88203454b4f0 ffffc9000cf34000 ffff8820340efd00 ffffffff8106bdff
[ 69.065054] ffff88203454b4f0 ffff88203454b658 ffff8820340efd10 ffffffff8106be3c
[ 69.065056] Call Trace:
[ 69.065068] [<ffffffff813f5aa3>] dump_stack+0x63/0x90
[ 69.065089] [<ffffffff8106bdff>] iounmap.part.1+0x7f/0x90
[ 69.065093] [<ffffffff8106be3c>] iounmap+0x2c/0x30
[ 69.065097] [<ffffffffc01c364a>] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
[ 69.065099] [<ffffffffc01c475e>] nvme_remove+0xce/0xe0 [nvme]
[ 69.065108] [<ffffffff81447009>] pci_device_remove+0x39/0xc0
[ 69.065117] [<ffffffff815585e1>] __device_release_driver+0xa1/0x150
[ 69.065119] [<ffffffff815586b3>] device_release_driver+0x23/0x30
[ 69.065123] [<ffffffff8143fa7a>] pci_stop_bus_device+0x8a/0xa0
[ 69.065125] [<ffffffff8143fbca>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[ 69.065129] [<ffffffffc01c309c>] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
[ 69.065136] [<ffffffff8109a4a5>] process_one_work+0x165/0x480
[ 69.065138] [<ffffffff8109a80b>] worker_thread+0x4b/0x4c0
[ 69.065141] [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[ 69.065143] [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[ 69.065147] [<ffffffff810a09e8>] kthread+0xd8/0xf0
[ 69.065150] [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[ 69.065157] [<ffffffff8183538f>] ret_from_fork+0x3f/0x70
[ 69.065158] [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[ 69.065161] Trying to free nonexistent resource <00000000fbd10000-00000000fbd13fff>
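Since the driver tears the PCI device down after the probe failure ("Removing after probe failure" in the trace), one possible workaround to get the controller back without a full reboot is a PCI rescan. This is only a sketch: the bus address 0000:82:00.0 comes from the trace above, and the commands are echoed rather than executed:

```shell
#!/bin/sh
# Sketch: re-probe the controller the nvme driver removed
# (address 0000:82:00.0 from the trace above). Echoed as a dry run.
DEV=0000:82:00.0

# Remove any stale PCI state for the device, then rescan the bus so
# the kernel re-enumerates it and the nvme driver probes it again.
echo "echo 1 > /sys/bus/pci/devices/$DEV/remove"
echo "echo 1 > /sys/bus/pci/rescan"
```

Whether the controller survives the re-probe or times out again the same way is exactly what this bug is about, so this is a diagnostic aid rather than a fix.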