update-grub fails if a pool fails to import

Bug #1848399 reported by satmandu
This bug affects 4 people
Affects                     Status        Importance  Assigned to              Milestone
grub2 (Ubuntu)              Fix Released  Medium      Jean-Baptiste Lallement
  Eoan                      Won't Fix     Medium      Jean-Baptiste Lallement
  Focal                     Fix Released  Medium      Jean-Baptiste Lallement
grubzfs-testsuite (Ubuntu)  Fix Released  Medium      Jean-Baptiste Lallement
  Eoan                      Won't Fix     Medium      Jean-Baptiste Lallement
  Focal                     Fix Released  Medium      Jean-Baptiste Lallement

Bug Description

[Description]
If a pool fails to import, update-grub fails because the import error message is used as the name of the pool. The error is similar to:

"cannot open 'This': no such pool" when no ZFS pools are available

The import failure can have many causes, such as an unsupported feature or a corrupted device.

The fix catches the error, displays it, ignores the failing pool, and imports the others.
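
The behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch to 10_linux_zfs: the function name and the wording of the message are made up, and the read-only import option is taken from the changelog entry for this bug.

```shell
# Hypothetical sketch: import every visible pool, but never let a failing
# pool abort the caller. The function name and message text are assumptions.
import_pools_tolerant() {
    local err
    # Keep only stderr (2>&1 first, then stdout to /dev/null), so the error
    # text is captured for display and never mistaken for a pool name.
    if ! err="$(zpool import -f -a -o cachefile=none -o readonly=on -N 2>&1 >/dev/null)"; then
        # Report on stderr so nothing leaks into the generated grub.cfg.
        echo "Some pools could not be imported and will be ignored:" >&2
        echo "${err}" >&2
    fi
}
```

Because the failing command sits in the condition of an `if`, a script running under `set -e` does not abort when the import returns a non-zero status.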

[Test Case]
1. Create a pool on a device and export it
2. Corrupt the device, for example by shuffling random blocks on the device but not the header, so it is still recognized as a ZFS device
3. Run update-grub

Expected result:
The pool is ignored and reported

Actual result:
Generation of the grub menu fails
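
Step 2 of the test case could be approximated with a helper like the one below. It is only a sketch: the helper name and the 4 MiB margin are assumptions, chosen so that the ZFS labels at the start and end of the device stay intact while everything in between is overwritten with random data. It relies on GNU stat and GNU dd.

```shell
# Hypothetical helper: overwrite the middle of a pool image or device with
# random data, leaving the first and last 4 MiB untouched so that it is
# still recognized as a ZFS device.
corrupt_middle() {
    local image="$1" keep_mib=4 size_mib
    size_mib=$(( $(stat -c %s "${image}") / 1048576 ))
    # Refuse images too small to have a middle section.
    [ "${size_mib}" -gt $(( keep_mib * 2 )) ] || return 1
    dd if=/dev/urandom of="${image}" bs=1M iflag=fullblock \
       seek="${keep_mib}" count=$(( size_mib - keep_mib * 2 )) \
       conv=notrunc status=none
}
```

One possible way to use it, assuming a scratch image attached to a loop device so that the import scan can find it: create the pool on the loop device, run zpool export on it, call corrupt_middle on the backing image, then run update-grub. Whether a given pool actually becomes unimportable this way may vary.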

[Regression potential]
Low, since the script currently aborts as soon as a pool fails to import. In the worst case nothing is imported and there are no ZFS entries in the grub menu.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: ubiquity 19.10.20
ProcVersionSignature: Ubuntu 5.3.0-18.19-generic 5.3.1
Uname: Linux 5.3.0-18-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
CasperVersion: 1.425
Date: Wed Oct 16 20:03:30 2019
InstallCmdLine: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed quiet splash ---
LiveMediaBuild: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191014)
SourcePackage: ubiquity
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
satmandu (satadru-umich) wrote :

Did a "sudo apt update ; sudo apt dist-upgrade -y" before running ubiquity.

I did a chroot to /target, and then here's the error I get when running "update-grub", which is the command that failed:

    cannot open 'This': no such pool

zfs properties:

https://paste.ubuntu.com/p/86jzxpgZsG/

relevant dmesg maybe:
[ 360.582647] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[ 368.223234] Adding 2097148k swap on /target/swapfile. Priority:-2 extents:6 across:2260988k SSFS
[ 369.732752] nvme0n1: p1
[ 369.772285] nvme0n1: p1
[ 370.076746] nvme0n1: p1 p2 p3 p4 p5
[ 371.323787] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null)
[ 371.371780] Adding 2097148k swap on /dev/nvme0n1p3. Priority:-2 extents:1 across:2097148k SSFS
[ 511.385581] device-mapper: table: 253:0: linear: Device lookup failed
[ 511.385584] device-mapper: ioctl: error adding target to table

satmandu (satadru-umich)
summary: - eoan zfs install fails w/ grub error
+ experimental zfs install fails w/ grub error
Revision history for this message
satmandu (satadru-umich) wrote : Re: experimental zfs install fails w/ grub error

/boot/grub/grub.cfg on BOOT doesn't seem to get populated, but there was a grub.cfg.new which I was able to copy over.

Upon boot, the initramfs and kernel load from BOOT, but rpool fails to mount and the initramfs drops to a shell.

Doing this at the initramfs prompt lets me log in:
    zpool import -R /root -N rpool
    zfs mount -a
    exit
Still get this error though:
    sudo update-grub
    Sourcing file `/etc/default/grub'
    Sourcing file `/etc/default/grub.d/init-select.cfg'
    Generating grub configuration file ...
    cannot open 'This': no such pool

Revision history for this message
satmandu (satadru-umich) wrote :

I'm only getting a grub.cfg.new with useful information when I move /etc/grub.d/10_linux_zfs to /etc/grub.d/10_linux_zfs.old

Default grub.cfg generated:
https://paste.ubuntu.com/p/snQKzDh8Kg/

With 10_linux_zfs moved aside:
https://paste.ubuntu.com/p/m5w4mTrPVQ/

Looks like /etc/grub.d/10_linux_zfs is the culprit here.

Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

Thanks! Can you set -x on 10_linux_zfs and redirect its stderr output to a log file that you attach here, so that we can see where 10_linux_zfs dies when running update-grub?
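
A minimal sketch of how such a trace could be collected. The helper name and the log path used below are only examples, and the script may expect environment variables that grub-mkconfig normally exports, so running it standalone is an approximation.

```shell
# Hypothetical helper: run a script with shell tracing enabled and save the
# trace (which goes to stderr) to a log file.
trace_script() {
    local script="$1" log="$2"
    # -x makes the shell print each command to stderr before executing it.
    sh -x "${script}" >/dev/null 2>"${log}"
}
```

For example, `trace_script /etc/grub.d/10_linux_zfs /tmp/10_linux_zfs.log` as root; the last lines of the log show the command that was running when the script stopped.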

Revision history for this message
satmandu (satadru-umich) wrote :

For what it is worth I also have other zfs pools (from other non-boot drives) on my system.

Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

I think this is what is creating this issue. Please attach the logs I asked for above, along with the output of zfs list and zpool list :)
Thanks!

Revision history for this message
satmandu (satadru-umich) wrote :

Looks like this is an edge case due to some of my pools having features which are slated for support after the zfs-0.8.x series.

Running 10_linux_zfs with -x:
https://paste.ubuntu.com/p/x9cCwS3Y7S/

Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

(The issue is only on grub, marking the other tasks as invalid and the grub one as incomplete until we get feedback)

Changed in grub (Ubuntu):
status: New → Incomplete
Changed in zsys (Ubuntu):
status: New → Invalid
Changed in ubiquity (Ubuntu):
status: New → Invalid
Revision history for this message
satmandu (satadru-umich) wrote :

"zpool import -f -a -o cachefile=none -N" is throwing an error when a zpool isn't importable.

Changing this line:
    zpool import -f -a -o cachefile=none -N 2>/dev/null
to
    zpool import -f -a -o cachefile=none -N 2>/dev/null || true
allows update-grub to succeed.

There's still an error, but it doesn't appear to be fatal.

Worth noting that the function import_pools has the comment "We have to ignore zpool import output, as potentially multiple / will be available, and we need to autodetect all zpools this way with their real mountpoints."

So to actually do that one needs to ignore the error from zpool import. (I don't see any zpool import flags which would avoid throwing that error.)

This will happen if any zpools are on the system with zfs feature flags not supported by the zfs version in the installer image.

summary: - experimental zfs install fails w/ grub error
+ update-grub fails if zpools with unsupported feature_flags exist
Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote : Re: update-grub fails if zpools with unsupported feature_flags exist

Wow, do you have features enabled on some pools that are not supported in our ZFS 0.8.1 version? Would be interesting to know.

|| true is a little bit too much without filtering what was imported. We can maybe force at least importing a bpool and rpool (whatever it is) to ensure that grub will install a grub.cfg with at least one bootable system, but what about systems with ZFS installed but no pools at all?

That sounds like it needs deeper investigation, and it (fortunately) only affects advanced users who can debug for now :)
Let's add that to the list for 20.04 and figure out what the right actions are.

Steve Langasek (vorlon)
affects: grub (Ubuntu) → grub2 (Ubuntu)
Revision history for this message
satmandu (satadru-umich) wrote :

I have log_spacemap enabled.

zpool import -f -a -o cachefile=none -N
"This pool uses the following feature(s) not supported by this system:
 com.delphix:log_spacemap (Log metaslab changes on a single spacemap and flush them periodically.)
All unsupported features are only required for writing to the pool.
The pool can be imported using '-o readonly=on'."

Accepted for post 0.8.1: https://github.com/zfsonlinux/zfs/pull/8442#issuecomment-511918033

(But this could happen with the import of ANY zpool from another system. For instance from a non zfsonlinux system with different feature flags set.)

Revision history for this message
satmandu (satadru-umich) wrote :

FYI:

One can replace the "zpool import -f -a -o cachefile=none -N" line with this:
    local pipe="/tmp/zpool_pipe"
    no_import_pools=$(mkfifo ${pipe}; zpool import -f -a -o cachefile=none -N 2> ${pipe} | cut -d \' -f 2 ${pipe}; rm ${pipe})

If one wants a list of the pools which can not be imported.
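
The same list can probably be obtained without a named pipe in /tmp. The sketch below is an untested alternative, not what 10_linux_zfs does: the function name is made up, and it assumes that each failed import ends with a line of the form "cannot import 'NAME': ...". Filtering on that prefix also skips explanatory lines such as "This pool uses the following feature(s) ...".

```shell
# Hypothetical alternative: print the names of pools that failed to import,
# one per line, without a temporary fifo.
unimportable_pools() {
    # 2>&1 >/dev/null sends stderr down the pipe and drops stdout.
    zpool import -f -a -o cachefile=none -N 2>&1 >/dev/null |
        sed -n "s/^cannot import '\([^']*\)'.*/\1/p"
}
```

Since the exit status of a pipeline is that of its last command (sed here), a failing import does not abort a script running under `set -e`, so `no_import_pools=$(unimportable_pools)` is safe to use there.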

To just discard the error though and keep update-grub from failing this works fine too:

    local discard_pool_import_err=""
    discard_pool_import_err=$(zpool import -f -a -o cachefile=none -N 2>/dev/null || true)

Maybe that workaround could be added so update-grub at least doesn't fail if a zpool which can't be imported is seen?

Changed in grub2 (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → High
assignee: nobody → Jean-Baptiste Lallement (jibel)
no longer affects: ubiquity (Ubuntu)
no longer affects: zsys (Ubuntu)
Changed in grub2 (Ubuntu Focal):
importance: High → Medium
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Fix in progress. For the testsuite, we cannot reproduce the exact same error; however, by corrupting the disk we can reproduce a similar case (the error message is captured and used as the name of the pool).

Changed in grub2 (Ubuntu Focal):
status: Triaged → In Progress
Changed in grub2 (Ubuntu Eoan):
status: New → In Progress
Changed in grubzfs-testsuite (Ubuntu Eoan):
status: New → In Progress
status: In Progress → Triaged
Changed in grubzfs-testsuite (Ubuntu Focal):
status: New → Triaged
Changed in grub2 (Ubuntu Eoan):
importance: Undecided → Medium
Changed in grubzfs-testsuite (Ubuntu Eoan):
importance: Undecided → Medium
Changed in grubzfs-testsuite (Ubuntu Focal):
importance: Undecided → Medium
Changed in grub2 (Ubuntu Eoan):
assignee: nobody → Jean-Baptiste Lallement (jibel)
Changed in grubzfs-testsuite (Ubuntu Eoan):
assignee: nobody → Jean-Baptiste Lallement (jibel)
Changed in grubzfs-testsuite (Ubuntu Focal):
assignee: nobody → Jean-Baptiste Lallement (jibel)
description: updated
summary: - update-grub fails if zpools with unsupported feature_flags exist
+ update-grub fails if a pool fails to import
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grubzfs-testsuite - 0.4.6

---------------
grubzfs-testsuite (0.4.6) focal; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Test cases for:
    - Handle the case where grub-probe returns several devices for a single
      pool (LP: #1848856).
    - Do not crash on invalid fstab and report the invalid entry.
      (LP: #1849347)
    - When a pool fails to import, catch and display the error message and
      continue with other pools. Import all the pools in readonly mode so we
      can import other pools with unsupported features (LP: #1848399)

 -- Jean-Baptiste Lallement <email address hidden> Mon, 18 Nov 2019 11:38:20 +0100

Changed in grubzfs-testsuite (Ubuntu Focal):
status: Triaged → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu14

---------------
grub2 (2.04-1ubuntu14) focal; urgency=medium

  * debian/patches/ubuntu-zfs-enhance-support.patch:
    - Handle the case where grub-probe returns several devices for a single
      pool (LP: #1848856). Thanks jpb for the report and the proposed patch.
    - Add savedefault to non-recovery entries (LP: #1850202). Thanks Deltik
      for the patch.
    - Do not crash on invalid fstab and report the invalid entry.
      (LP: #1849347) Thanks Deltik for the patch.
    - When a pool fails to import, catch and display the error message and
      continue with other pools. Import all the pools in readonly mode so we
      can import other pools with unsupported features (LP: #1848399) Thanks
      satmandu for the investigation and the proposed patch

 -- Jean-Baptiste Lallement <email address hidden> Mon, 18 Nov 2019 11:22:43 +0100

Changed in grub2 (Ubuntu Focal):
status: In Progress → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in grub2 (Ubuntu Eoan):
status: In Progress → Won't Fix
Changed in grubzfs-testsuite (Ubuntu Eoan):
status: Triaged → Won't Fix