zfs, zpool commands hangs for 10 seconds without a /dev/zfs

Bug #1760173 reported by Ryan Harper
This bug affects 2 people
Affects               Status          Importance   Assigned to       Milestone
lxd (Ubuntu)          Invalid         Undecided    Unassigned
zfs-linux (Ubuntu)    Fix Released    High         Colin Ian King
  Xenial              Fix Released    High         Colin Ian King
  Artful              Fix Released    High         Colin Ian King
  Bionic              Fix Released    High         Colin Ian King
  Cosmic              Fix Released    High         Colin Ian King

Bug Description

== SRU Justification, Xenial, Artful, Bionic ==

When outside a lxd container with zfs storage, zfs list or zpool status either returns or reports what's going on.

When inside a lxd container with zfs storage, zfs list or zpool status appears to hang, with no output for 10 seconds.

== Fix ==

Inside a container we don't need the 10-second timeout, so check for this scenario and set the timeout to default to 0 seconds.

== Regression Potential ==

Minimal. This caters for a corner case inside a containerized environment; the fix will not alter the behaviour in other cases.

-----

1. # lsb_release -rd
Description: Ubuntu 16.04.4 LTS
Release: 16.04

2. # apt-cache policy zfsutils-linux
zfsutils-linux:
  Installed: 0.6.5.6-0ubuntu19
  Candidate: 0.6.5.6-0ubuntu19
  Version table:
 *** 0.6.5.6-0ubuntu19 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status

3. Expected: when inside a lxd container with zfs storage, zfs list or zpool status should either return or report what's going on.

4. Actual: when inside a lxd container with zfs storage, zfs list or zpool status appears to hang, with no output for 10 seconds.

strace reveals that without a /dev/zfs the tools wait 10 seconds for it to appear, and they provide no command line switch to disable the wait or make it more verbose.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: zfsutils-linux 0.6.5.6-0ubuntu19
ProcVersionSignature: Ubuntu 4.13.0-36.40~16.04.1-generic 4.13.13
Uname: Linux 4.13.0-36-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Fri Mar 30 18:09:29 2018
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
SourcePackage: zfs-linux
UpgradeStatus: No upgrade log present (probably fresh install)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Colin Ian King (colin-king) wrote :

Would an immediate return with some error/warning message be more appropriate that a 10 second delay?

Colin Ian King (colin-king) wrote :

as above, "that" -> "than"

Changed in zfs-linux (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Ryan Harper (raharper) wrote : Re: [Bug 1760173] Re: zfs, zpool commands hangs for 10 seconds without a /dev/zfs

On Tue, Apr 10, 2018 at 2:44 AM, Colin Ian King
<email address hidden> wrote:
> Would an immediate return with some error/warning message be more
> appropriate that a 10 second delay?

Yes. I would think that the amount of time to wait could be an option.
I've read that in some scenarios users have wanted more than 10 seconds;
in our use case, I would run with a zero timeout.

Changed in zfs-linux (Ubuntu):
importance: Medium → High
Colin Ian King (colin-king) wrote :

Accessing /dev/zfs depends on loading the zfs module, and it may take some time for udev to do its work to create /dev/zfs. However, ZFS does allow this to be tweaked with the ZFS_MODULE_TIMEOUT environment variable:

From lib/libzfs/libzfs_util.c, libzfs_load_module:

/*
 * Device creation by udev is asynchronous and waiting may be
 * required. Busy wait for 10ms and then fall back to polling every
 * 10ms for the allowed timeout (default 10s, max 10m). This is
 * done to optimize for the common case where the device is
 * immediately available and to avoid penalizing the possible
 * case where udev is slow or unable to create the device.
 */

timeout_str = getenv("ZFS_MODULE_TIMEOUT");
...

so export ZFS_MODULE_TIMEOUT=0 may be a useful workaround for the moment. I wonder if that could be set automagically to zero in the lxc environment rather than hacking around zfs to detect if it is in a lxc container. The former does seem the best way forward.
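
As a rough illustration of the limits quoted in that comment (default 10s, max 10m), here is a minimal sketch of how a ZFS_MODULE_TIMEOUT-style value could be turned into a bounded timeout. The helper name module_timeout_ms and the millisecond unit are assumptions made for this example; this is not the code from libzfs_util.c.

```c
#include <stdlib.h>

/* Limits taken from the comment quoted above: default 10s, max 10m. */
#define DEFAULT_TIMEOUT_MS (10L * 1000L)
#define MAX_TIMEOUT_MS     (10L * 60L * 1000L)

/*
 * Hypothetical helper: convert a timeout string given in seconds into
 * milliseconds. NULL means the variable was not set, so the default
 * applies. Negative values clamp to 0, large values clamp to the maximum.
 */
long module_timeout_ms(const char *timeout_str)
{
    long seconds;

    if (timeout_str == NULL)
        return DEFAULT_TIMEOUT_MS;

    seconds = strtol(timeout_str, NULL, 0);
    if (seconds < 0)
        return 0;
    if (seconds > MAX_TIMEOUT_MS / 1000L)
        return MAX_TIMEOUT_MS;
    return seconds * 1000L;
}
```

A caller would pass getenv("ZFS_MODULE_TIMEOUT") directly, so an unset variable yields the 10 second default and a value of 0 disables the wait.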

Ryan Harper (raharper) wrote :

If they're going to modprobe and are waiting on udev, then I would like them to do something like

udevadm settle --exit-if-exists=/dev/zfs

That means they can exit early without paying the default 10s wait.

Colin Ian King (colin-king) wrote :

The code actually polls /dev/zfs until it appears. The issue here is that it does not appear after 10 seconds, and then it gives up.
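
The polling behaviour described here can be sketched roughly as follows. This is a simplified illustration, not the libzfs implementation: it omits the initial busy-wait phase, and the function name wait_for_path and the millisecond timeout parameter are assumptions for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

#define POLL_INTERVAL_MS 10L

/* Milliseconds elapsed on the monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
        (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: poll every 10ms until path exists or timeout_ms
 * has elapsed. Returns 0 if the path appeared, -1 if the wait gave up.
 * With timeout_ms == 0 the path is checked once and never slept on.
 */
int wait_for_path(const char *path, long timeout_ms)
{
    struct timespec start;
    const struct timespec pause = { 0, POLL_INTERVAL_MS * 1000000L };

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (access(path, F_OK) == 0)
            return 0;
        if (elapsed_ms(&start) >= timeout_ms)
            return -1;
        nanosleep(&pause, NULL);
    }
}
```

When the device node can never appear, as inside a container, a call like wait_for_path("/dev/zfs", 10000) spends the full 10 seconds before returning -1, which matches the hang reported in this bug.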

Christian Brauner (cbrauner) wrote :

/*
 * Device creation by udev is asynchronous and waiting may be
 * required. Busy wait for 10ms and then fall back to polling every
 * 10ms for the allowed timeout (default 10s, max 10m). This is
 * done to optimize for the common case where the device is
 * immediately available and to avoid penalizing the possible
 * case where udev is slow or unable to create the device.
 */

That comment is not really correct though as udevd doesn't create
any device nodes itself in any recent versions.
The only thing it does is to create persistent names and symlinks.
The device node /dev/zfs is created by devtmpfs itself. Afaict,
udev only sets the permissions. Is the modprobe taking a long time?
Or is zfs somehow trying to go through a symlink that is generated by
udevd?

Ryan Harper (raharper) wrote :

On Mon, Apr 30, 2018 at 12:14 PM, Colin Ian King
<email address hidden> wrote:
> The code actually polls /dev/zfs until it appears. The issue here is
> that it does not appear after 10 seconds, and then it gives up.

Yes, you're right.

Would it make sense to have zfsutils know it's running in a container?

Christian Brauner (cbrauner) wrote :

If you're running zfs tools in a container, setting the timeout to 0 will
likely be helpful. The device node will never appear in the container's /dev
since a) it's a tmpfs and b) even if it were a devtmpfs it wouldn't help
since devtmpfs isn't namespaced. (In fact udevd will even ignore any device
event inside an unprivileged container since it's sent with
INVALID_{G,U}ID. I have kernel patches to handle this and a bunch of other
quirks that are up for merging.) For unpriv containers any device that you
somehow create by talking to /dev/zfs (e.g. by having LXD create a device
node for the container) will never show up in the container. It will always
show up in the host's devtmpfs.

On the host setting it to 0 should also not have a negative effect since
the modprobe should a) cause devtmpfs to create the device node *before*
sending a uevent and b) zfs userspace will likely wait for udevd to settle,
i.e. set up symlinks and persistent device naming. The timeout logic only
makes sense for old udevd implementations that do create device nodes and
might do so in a delayed manner. But better to test.

Christian Brauner (cbrauner) wrote :

On Mon, Apr 30, 2018, 12:41 Ryan Harper <email address hidden> wrote:

> On Mon, Apr 30, 2018 at 12:14 PM, Colin Ian King
> <email address hidden> wrote:
> > The code actually polls /dev/zfs until it appears. The issue here is
> > that it does not appear after 10 seconds, and then it gives up.
>
> Yes, you're right.
>
> Would it make sense to have zfsutils know it's running in a container?
>

I think "running in a container" is too strong but making it smarter by
dropping the timeout and simply reporting an error could work if it doesn't
regress on the host. Which I don't expect it to. I haven't given up playing
with devtmpfs and we are on our way to getting quite close to properly
delegated devices by user namespace.

Colin Ian King (colin-king) wrote :

Hrm. I'm inclined to have the container environment inform zfs that the timeout should be zero rather than adding more logic to zfs that tries to work out which environment it is in to determine the timeout. The container environment just needs to set the one environment variable that ZFS already checks, which should be trivial to add to the container environment. I'm concerned that container detection added to ZFS may need to change over time and become a burden to support. It just seems more straightforward to me for the container environment to tell ZFS not to wait for 10 seconds (the mechanism is already provided in ZFS via the environment variable) than for ZFS to intuit whether it's in a special lxc container.

Stéphane Graber (stgraber) wrote :

Actually, LXC/LXD can't set environment variables in that way as systemd strips all inherited environment.

Looking at the backlog it sounds like it'd be safe for us to just turn off that timeout entirely in Ubuntu given that we can assume we'll always have devtmpfs where it matters and so there's no potential race between module loading and device node appearing.

Changed in lxd (Ubuntu):
status: New → Invalid
Colin Ian King (colin-king) wrote :

I'd rather not change the default behaviour for non-container environments as I want to be conservative here, so instead, I'm going to check if $container exists and set the timeout to zero for this specific case.

no longer affects: lxd (Ubuntu Xenial)
no longer affects: lxd (Ubuntu Artful)
no longer affects: lxd (Ubuntu Bionic)
no longer affects: lxd (Ubuntu Cosmic)
description: updated
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Ryan, or anyone else affected,

Accepted zfs-linux into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.7.5-1ubuntu16.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in zfs-linux (Ubuntu Bionic):
status: New → Fix Committed
Changed in zfs-linux (Ubuntu Artful):
status: New → Fix Committed
Andy Whitcroft (apw) wrote :

Hello Ryan, or anyone else affected,

Accepted zfs-linux into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.11-1ubuntu3.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in zfs-linux (Ubuntu Xenial):
status: New → Fix Committed
Andy Whitcroft (apw) wrote :

Hello Ryan, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu22 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Stéphane Graber (stgraber) wrote :

I'm confused, how is this change going to work when the "container" environment variable is only present in PID1's environment but not in any of its descendants?

Colin Ian King (colin-king) wrote :

I'm confused too then: how is it that when I start a process inside the container, such as from a shell, it can access this environment variable and my fix works?

Stéphane Graber (stgraber) wrote :

That's because an attached process ("lxc-attach" or "lxc exec") isn't a child of init, it's spawned directly by liblxc and so does have our env variable set.

Any process which is a direct or indirect child of PID1 in the container will be inheriting its environment through that path and as init systems strip any inherited environment, the container env variable will not be set for those.

So for example, sshing into your container will not have the env variable set, same goes for any systemd unit.
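
The pitfall described above follows from how an environment-based check works. A minimal sketch (the function name container_env_present is hypothetical and not taken from the zfs-linux patch): getenv only sees the calling process's own environment, so the result depends on how the process was started, not on whether it actually runs in a container.

```c
#include <stdlib.h>

/*
 * Hypothetical check: returns 1 if the "container" variable is set to a
 * non-empty value in this process's environment, 0 otherwise.
 * A process started via "lxc exec" inherits the variable from liblxc;
 * a process started by the container's init (ssh session, systemd unit)
 * does not, so this returns 0 there even though it is in a container.
 */
int container_env_present(void)
{
    const char *value = getenv("container");

    return value != NULL && value[0] != '\0';
}
```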

Colin Ian King (colin-king) wrote :

OK, I hadn't realized that, so much for my trivial testing. So, is there any simple way for any process inside a container to detect if one is inside a containerized environment without any special root privileges?

Stéphane Graber (stgraber) wrote :

Not really, no. You can use systemd-detect-virt which is systemd specific but should work as a regular user, otherwise you can try to add some specialized checks like looking if /dev in the mount table is devtmpfs or not.
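
The mount table check mentioned here could look roughly like the sketch below. It assumes a Linux system with glibc's getmntent interface; the function name dev_is_devtmpfs is hypothetical, and the mount table path is a parameter (normally /proc/mounts) so that the logic can be exercised against a test file.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a mount table and report whether the
 * filesystem mounted at /dev is devtmpfs.
 * Returns 1 if it is, 0 if it is not (or /dev is not a mount point),
 * and -1 if the mount table cannot be opened.
 */
int dev_is_devtmpfs(const char *mounts_path)
{
    FILE *fp = setmntent(mounts_path, "r");
    struct mntent *ent;
    int result = 0;

    if (fp == NULL)
        return -1;

    /* Later entries override earlier ones: the last mount on /dev wins. */
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, "/dev") == 0)
            result = (strcmp(ent->mnt_type, "devtmpfs") == 0);
    }
    endmntent(fp);
    return result;
}
```

A result of 0 from dev_is_devtmpfs("/proc/mounts") would be a hint, not proof, that the process is in a container whose /dev is a plain tmpfs.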

Colin Ian King (colin-king) wrote :

So, I am pretty confident that systemd will create "/run/systemd/container" if we're inside a container, so I'll check for that.

Colin Ian King (colin-king) wrote :

..plus it can be easily checked w/o any special privilege.
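
A minimal sketch of that check, assuming the marker file is tested with access(2) and the result is used to pick the default timeout. The function names and the millisecond unit are hypothetical; the actual patch shipped in zfs-linux is not shown in this report and may differ.

```c
#include <unistd.h>

/* Path named in the comment above; created by systemd inside containers. */
#define CONTAINER_MARKER "/run/systemd/container"

/* Returns 1 if the marker path exists, 0 otherwise. No privileges needed. */
int marker_present(const char *marker_path)
{
    return access(marker_path, F_OK) == 0;
}

/*
 * Hypothetical helper: default timeout in milliseconds. Zero when the
 * container marker exists, otherwise the usual 10 seconds.
 */
long default_timeout_ms(const char *marker_path)
{
    return marker_present(marker_path) ? 0L : 10L * 1000L;
}
```

Calling default_timeout_ms(CONTAINER_MARKER) leaves host behaviour unchanged, since the marker is absent there, which is the conservative outcome asked for earlier in this thread.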

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu19

---------------
zfs-linux (0.7.5-1ubuntu19) cosmic; urgency=medium

  * Revert previous fix, re-work by checking for
    a container using /run/systemd/container and
    set timeout to zero for zfs list or zpool status
    when running inside a container (LP: #1760173)

 -- Colin Ian King <email address hidden> Thu, 7 Jun 2018 17:25:12 +0100

Changed in zfs-linux (Ubuntu Cosmic):
status: Triaged → Fix Released
Changed in zfs-linux (Ubuntu Bionic):
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu Xenial):
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu Artful):
assignee: nobody → Colin Ian King (colin-king)
Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted zfs-linux into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.7.5-1ubuntu16.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted zfs-linux into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.11-1ubuntu3.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu23 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Colin Ian King (colin-king) wrote :

Tested with 0.7.5-1ubuntu16.2 inside a bionic lxc container and the 10 second delay is now fixed.

tags: added: verification-done-bionic
Colin Ian King (colin-king) wrote :

Tested with 0.6.5.11-1ubuntu3.6 inside an artful lxc container and the 10 second delay is now fixed.

tags: added: verification-done-artful
Colin Ian King (colin-king) wrote :

Tested with 0.6.5.6-0ubuntu23 inside a xenial lxc container and the 10 second delay is now fixed.

Changed in zfs-linux (Ubuntu Bionic):
importance: Undecided → High
Changed in zfs-linux (Ubuntu Artful):
importance: Undecided → High
Changed in zfs-linux (Ubuntu Xenial):
importance: Undecided → High
tags: added: verification-done-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu16.2

---------------
zfs-linux (0.7.5-1ubuntu16.2) bionic; urgency=medium

  * Revert previous fix, re-work by checking for
    a container using /run/systemd/container and
    set timeout to zero for zfs list or zpool status
    when running inside a container (LP: #1760173)

 -- Colin Ian King <email address hidden> Thu, 7 Jun 2018 17:25:12 +0100

Changed in zfs-linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.6.5.11-1ubuntu3.6

---------------
zfs-linux (0.6.5.11-1ubuntu3.6) artful; urgency=medium

  * Revert previous fix, re-work by checking for
    a container using /run/systemd/container and
    set timeout to zero for zfs list or zpool status
    when running inside a container (LP: #1760173)

 -- Colin Ian King <email address hidden> Thu, 7 Jun 2018 17:25:12 +0100

Changed in zfs-linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu23

---------------
zfs-linux (0.6.5.6-0ubuntu23) xenial; urgency=medium

  * Revert previous fix, re-work by checking for
    a container using /run/systemd/container and
    set timeout to zero for zfs list or zpool status
    when running inside a container (LP: #1760173)

 -- Colin Ian King <email address hidden> Thu, 7 Jun 2018 17:25:12 +0100

Changed in zfs-linux (Ubuntu Xenial):
status: Fix Committed → Fix Released