zfs-import-cache.service fails on startup

Bug #1741081 reported by Richard Laager
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
zfs-linux (Ubuntu)
Fix Released
High
Colin Ian King
Artful
Fix Released
High
Colin Ian King
Bionic
Fix Released
High
Colin Ian King

Bug Description

== SRU Request, Artful ==

Enable ZFS module to be loaded without the broken ubuntu-load-zfs-unconditionally.patch.

== Fix ==

Add a new zfs-load-module.service script that modprobes the ZFS module and remove any hard coded module loading from zfs-import-cache.service & zfs-import-scan.service and make these latter scripts require the new zfs-load-module.service script. Also remove the now defunct ubuntu-load-zfs-unconditionally.patch as this will no longer be required.

== Testcase ==

On a clean VM, install with the fixed package, zfs should load automatically.

== Regression potential ==

ZFS module may not load if the changes are broken. However, testing proves this not to be the case.

--------------------

I just noticed on my test VM of artful that zfs-import-cache.service does not have a ConditionPathExists=/etc/zfs/zpool.cache. Because of that, it fails on startup, since the cache file does not exist.

This line is being deleted by debian/patches/ubuntu-load-zfs-unconditionally.patch. This patch seems to exist per:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1672749

This patch still exists in bionic, so I assume it will be similarly broken.

If the goal of the patch is to load the module (and only that), I think it should create a third unit instead:

zfs-load-module.service
 ^^ runs modprobe zfs

zfs-import-cache.service & zfs-import-scan.service
  ^^ per upstream minus modprobe plus Requires=zfs-load-module.service

I have tested this manually and it works. I can submit a package patch if this is the desired solution.

Interestingly, before this change, zfs-import-scan.service wasn't starting. If started manually, it worked. I had to give it a `systemctl enable zfs-import-scan.service` to create the Wants symlinks. Looking at the zfsutils-linux.postinst, I see the correct boilerplate from dh_systemd, so I'm not sure why this wasn't already done. Can anyone confirm or deny whether zfs-import-scan.service is enabled out-of-the-box on their system?

Is the zfs-import-scan.service not starting actually the cause of the original bug? The design is that *either* zfs-import-cache.service or zfs-import-scan.service starts. They both call modprobe zfs.

Richard Laager (rlaager)
description: updated
description: updated
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Scott Emmons (lscotte) wrote :

Confirmed to also affect bionic:

$ sudo journalctl -u zfs-import-cache.service
-- Logs begin at Tue 2018-01-23 17:46:55 UTC, end at Tue 2018-01-23 17:49:18 UTC. --
Jan 23 17:46:58 ubuntu systemd[1]: Starting Import ZFS pools by cache file...
Jan 23 17:46:59 ubuntu zpool[640]: failed to open cache file: No such file or directory
Jan 23 17:46:59 ubuntu systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Jan 23 17:46:59 ubuntu systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Jan 23 17:46:59 ubuntu systemd[1]: Failed to start Import ZFS pools by cache file.

This is on a host with no ZFS filesystems, so expected behavior is a no-op.

tags: added: artful bionic
Revision history for this message
Scott Emmons (lscotte) wrote :

Let me know what I can do to help move this forward. The ubuntu-load-zfs-unconditionally patch was not a great solution as it results in systems that install zfsutils-linux to come up in a failed state when there are no ZFS pools present:

$ systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
* zfs-import-cache.service loaded failed failed Import ZFS pools by cache file

We use a common system image across a large fleet, some with ZFS pools, most without - and would prefer not having to do a local override.conf in our image.

Changed in zfs-linux (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

@Richard,

thanks for reporting this issue. Re: "I have tested this manually and it works. I can submit a package patch if this is the desired solution."

..please attach a patch and I'll add it to the package. Just to clarify, which releases does this affect?

Revision history for this message
Scott Emmons (lscotte) wrote :

Note that for bionic as of this week the behavior is now different. This particular problem doesn't surface for zfs-import-cache.service (because the ConditionPathExists expression is back in the unit).

It's not all good news, however, as that one failed unit has been replaced with:

$ systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● zfs-mount.service loaded failed failed Mount ZFS filesystems
● zfs-share.service loaded failed failed ZFS file system shares
● zfs-zed.service loaded failed failed ZFS Event Daemon (zed)

All because the zfs kernel module is now no longer being loaded. So, it's related to this bug in that the zfs-import-cache.service was loading the zfs kernel module because it was running unconditionally. Now that this unit is no longer running do to ConditionPathExists, other zfs units fail now.

I'm fairly certain we ran into this same issue back in xenial, I'll see if I can track down the launchpad # for it. Thanks!

Revision history for this message
Scott Emmons (lscotte) wrote :

I found the xenial issue I was thinking of [1] but I'd be surprised if that particular case regressed (it had to do with use of /etc/mtab). For completeness, I'll mention it here anyway.

[1] https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1607920

Changed in zfs-linux (Ubuntu Artful):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu3

---------------
zfs-linux (0.7.5-1ubuntu3) bionic; urgency=medium

  * Add new module loading systemd service (LP: #1741081)
    - loads zfs module unconditionally, and make the
      zfs-import-cache.service and zfs-import-scan.service services
      require this service.

 -- Colin Ian King <email address hidden> Fri, 2 Mar 2018 11:47:12 +0000

Changed in zfs-linux (Ubuntu Bionic):
status: In Progress → Fix Released
description: updated
Revision history for this message
Colin Ian King (colin-king) wrote :

Note: I've put a copy of the Artful fix in the following PPA for test purposes, it can be installed using:

sudo add-apt-repository ppa:colin-king/zfs-artful-sru-1741081
sudo apt-get update
sudo apt-get dist-upgrade

Revision history for this message
Scott Emmons (lscotte) wrote :

Great, thanks Colin! We will test this on bionic very soon and will follow up to confirm.

Revision history for this message
Scott Emmons (lscotte) wrote :

OK, I retested with 0.7.5-1ubuntu3 and it's almost there but zfs-mount.service still runs before the zfs kernel module is loaded:

$ systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● zfs-mount.service loaded failed failed Mount ZFS filesystems

$ sudo journalctl -u zfs-mount.service
-- Logs begin at Fri 2018-03-02 19:08:55 UTC, end at Fri 2018-03-02 19:31:42 UTC. --
Mar 02 19:09:00 ubuntu systemd[1]: Starting Mount ZFS filesystems...
Mar 02 19:09:00 ubuntu zfs[557]: The ZFS modules are not loaded.
Mar 02 19:09:00 ubuntu zfs[557]: Try running '/sbin/modprobe zfs' as root to load them.
Mar 02 19:09:00 ubuntu systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Mar 02 19:09:00 ubuntu systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
Mar 02 19:09:00 ubuntu systemd[1]: Failed to start Mount ZFS filesystems.

But, by the time the system is fully up, something else has loaded the zfs kernel module along the way:

$ lsmod|grep zfs
zfs 3407872 3

I do see that zfs-load-module.service has "WantedBy=zfs-mount.service", but maybe zfs-mount.service needs "After=zfs-load-module.service" to insure the dependency order? Looking at the dependency tree, zfs-mount.service and zfs-import.target happen in parallel with the start of zfs-load-module.service resulting in a race condition.

Revision history for this message
Richard Laager (rlaager) wrote :

zfs-load-module.service seems to have a Requires on itself? That has to be wrong.

Also, zfs-import-cache.service and zfs-import-scan.service need an After=zfs-load-module.service. They're not getting one automatically because of DefaultDependencies=no (which seems appropriate here, so leave that alone). Scott, can you try this? I think it will fix your issue, since zfs-mount.service already has After=zfs-load-cache.service and After=zfs-load-scan.service.

The proposed package, with the changes above should fix the issue I'm reporting.

However, LP #1672749 says, "Since zfsutils-linux/0.6.5.9-4, zfs module is not automatically loaded on systems that no zpool exists, this avoids tainting everyone's kernel who has the package installed but is not using zfs." Is this still an important goal? If so, wasn't that lost with ubuntu-load-zfs-unconditionally.patch? Or perhaps something is keeping zfs-import-scan.service from being enabled by default?

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

IMHO this is completely backwards, see https://github.com/zfsonlinux/zfs/pull/7259 for an upstream discussion similar to this.

systemd service units are not the proper place to load modules - there is {/lib,/etc}/modules-load.d/ which gets parsed by systemd-load-modules.service early in the boot process, and ZoL even ships a (disabled by default) snippet for /lib that supports. just flip that one to enabled (admin can override via /etc) and you don't need to introduce bogus services.

Revision history for this message
Colin Ian King (colin-king) wrote :

I've pushed a new release for Bionic to pick up the changes as suggested by Richard in comment #11. If that is a reasonable working solution I'll pick these changes up for an Artful SRU.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Should I reject the artful SRU in that case? The .service file indeed looks like it might need some changes, I would prefer the latest bionic version to be SRUed.

Revision history for this message
Colin Ian King (colin-king) wrote :

Yes, please reject the artful SRU

Revision history for this message
Scott Emmons (lscotte) wrote :

I can confirm that with the latest bionic packages (zfsutils-linux 0.7.5-1ubuntu4) all units start successfully for the case where no ZFS pools are present. This is exactly as I would expect.

I can't really speak to the discussion about tainted kernel. If I didn't want ZFS, I wouldn't install zfsutils-linux, so I think it's a reasonable expectation as-is.

Revision history for this message
Colin Ian King (colin-king) wrote :

FYI, I've uploaded the same fix to Artful for the SRU

Changed in zfs-linux (Ubuntu Artful):
status: In Progress → Fix Committed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Richard, or anyone else affected,

Accepted zfs-linux into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.11-1ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Richard Laager (rlaager) wrote :

I updated to the version from -proposed and rebooted. I verified that no units failed on startup.

tags: added: verification-done-artful
Revision history for this message
Colin Ian King (colin-king) wrote :

Richard, thanks for testing.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.6.5.11-1ubuntu3.2

---------------
zfs-linux (0.6.5.11-1ubuntu3.2) artful; urgency=medium

  * Add new module loading systemd service (LP: #1741081)
    - loads zfs module unconditionally, and make the
      zfs-import-cache.service and zfs-import-scan.service services
      require this service.

 -- Colin Ian King <email address hidden> Fri, 2 Mar 2018 11:47:12 +0000

Changed in zfs-linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.