Process accounting deadlock with idmapd callout when writing to NFSv4 mount

Bug #1509120 reported by bugproxy
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
nfs-utils (Ubuntu)    Fix Released  Medium      Dave Chiluk
  Trusty              Fix Released  Medium      Dave Chiluk

Bug Description

[Impact]

 * Programs accessing NFSv4 mounts hang in the request_key interface when the mount uses sec=sys and the server is an older NFSv4 host. The kernel is waiting on the /sbin/request-key user-mode helper, which is provided by the keyutils package.

 * INFO: task ls:2101 blocked for more than 120 seconds.
      Not tainted 3.13.0-66-generic #108-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D ffff88007fd13180 0 2101 1215 0x00000004
 ffff88007b14d630 0000000000000086 ffff8800374e6000 ffff88007b14dfd8
 0000000000013180 0000000000013180 ffff8800374e6000 ffff88007b14d6b0
 ffff88007ffd1460 0000000000000002 ffffffff812d0ce0 ffff88007b14d6a0
Call Trace:
 [<ffffffff812d0ce0>] ? umh_keys_init+0x20/0x20
 [<ffffffff81728499>] schedule+0x29/0x70
 [<ffffffff812d0cee>] key_wait_bit+0xe/0x20
 [<ffffffff81728c42>] __wait_on_bit+0x62/0x90
 [<ffffffff812d0ce0>] ? umh_keys_init+0x20/0x20
 [<ffffffff81728ce7>] out_of_line_wait_on_bit+0x77/0x90
 [<ffffffff810ab3d0>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff812d10be>] wait_for_key_construction+0x6e/0x80
 [<ffffffff812d160c>] request_key+0x5c/0xa0
 [<ffffffffa027858f>] nfs_idmap_get_key+0xaf/0x1c0 [nfsv4]
 [<ffffffffa0278f8f>] nfs_map_name_to_uid+0xef/0x150 [nfsv4]
 [<ffffffffa0270117>] decode_getfattr_attrs+0xe47/0x14b0 [nfsv4]
 [<ffffffff8101bc79>] ? sched_clock+0x9/0x10
 [<ffffffffa027080c>] decode_getfattr_generic.constprop.102+0x8c/0xf0 [nfsv4]
 [<ffffffffa0270ef0>] ? nfs4_xdr_dec_access+0xa0/0xa0 [nfsv4]
 [<ffffffffa0270f60>] nfs4_xdr_dec_getattr+0x70/0x80 [nfsv4]
 [<ffffffffa013e316>] rpcauth_unwrap_resp+0x86/0xd0 [sunrpc]
 [<ffffffffa0270ef0>] ? nfs4_xdr_dec_access+0xa0/0xa0 [nfsv4]
 [<ffffffffa0130f6f>] call_decode+0x1df/0x870 [sunrpc]
 [<ffffffffa0130d90>] ? call_refreshresult+0x170/0x170 [sunrpc]
 [<ffffffffa0130d90>] ? call_refreshresult+0x170/0x170 [sunrpc]
 [<ffffffffa013bd84>] __rpc_execute+0x84/0x400 [sunrpc]
 [<ffffffffa013ccfe>] rpc_execute+0x5e/0xa0 [sunrpc]
 [<ffffffffa01331d0>] rpc_run_task+0x70/0x90 [sunrpc]
 [<ffffffffa0259646>] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
 [<ffffffffa0259f2e>] _nfs4_proc_getattr+0xbe/0xd0 [nfsv4]
 [<ffffffffa02604ea>] nfs4_proc_getattr+0x5a/0xd0 [nfsv4]
 [<ffffffffa01e19df>] __nfs_revalidate_inode+0xbf/0x310 [nfs]
 [<ffffffffa01d9af3>] nfs_opendir+0xe3/0x100 [nfs]
 [<ffffffff811bb883>] do_dentry_open+0x233/0x2e0
 [<ffffffffa01d9a10>] ? nfs_readdir_clear_array+0x70/0x70 [nfs]
 [<ffffffff811bbbb9>] vfs_open+0x49/0x50
 [<ffffffff811ccf64>] do_last+0x564/0x1240
 [<ffffffff811cac06>] ? link_path_walk+0x256/0x880
 [<ffffffff8131615b>] ? apparmor_file_alloc_security+0x5b/0x180
 [<ffffffff812d8786>] ? security_file_alloc+0x16/0x20
 [<ffffffff811cdcfb>] path_openat+0xbb/0x650
 [<ffffffff811cf0fa>] do_filp_open+0x3a/0x90
 [<ffffffff8118199e>] ? do_mmap_pgoff+0x34e/0x3d0
 [<ffffffff811dbf77>] ? __alloc_fd+0xa7/0x130
 [<ffffffff811bd6d9>] do_sys_open+0x129/0x280
 [<ffffffff817305ba>] ? do_page_fault+0x1a/0x70
 [<ffffffff811bd864>] SyS_openat+0x14/0x20
 [<ffffffff81734c5d>] system_call_fastpath+0x1a/0x1f

[Test Case]

1) Install an older NFSv4 server, such as CentOS 6, that returns owner names rather than numeric UIDs/GIDs on sec=sys mounts.
2) echo '/export *(rw,sync,no_root_squash,no_subtree_check,fsid=299)' | sudo tee /etc/exports
3) sudo exportfs -a && sudo service nfs-kernel-server restart (on CentOS 6 the service is named nfs)

Client:
1) sudo apt-get install nfs-common acct (keyutils must not be installed; its absence is what triggers the hang)
2) sudo mkdir /account
3) sudo mount -t nfs4 -o rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=600,retrans=2,sec=sys,minorversion=0,local_lock=none 10.245.80.76:/export /account
4) sudo touch /account/pacct
5) sudo accton /account/pacct
6) cd /account/
7) Run ls and other operations (dpkg operations seemed to trigger it for me).
8) Wait.
9) The terminal becomes unresponsive within a few minutes, and new logins are not possible.

[Regression Potential]

 * There are a total of 28 other references to request_key in the kernel. Calls to request_key that previously failed may now succeed, which could cause alternate code paths to be taken. The kernel subsystems that do not already have an explicit, obvious dependency on keyutils are:
   * drivers/staging/lustre
   * fs/afs
   * fs/fscache
   * fs/nfs
   * lib/digsig.c
   * net/ceph
   * net/dns_resolver
   * net/rxrpc
   * security/integrity
   * security/keys
 * Regression potential is minimal: this change was already applied to vivid and wily via bug 1449074.

[Other Info]

 * The solution is to add keyutils to the Depends: of nfs-common.

Original Description
__________________________________________________________________________

---Problem Description---
System hang when process accounting to a NFSv4 mount.

---uname output---
Linux ppc001 3.19.0-30-generic #34~14.04.1-Ubuntu SMP Fri Oct 2 22:21:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

---Problem Details---

We have a customer who is experiencing intermittent system hangs. After some debugging, it was discovered that the trigger was turning on process accounting and writing the accounting file to an NFSv4 mount. During testing, several vmcores were captured, and the fingerprint indicates a mutex deadlock involving process accounting. In the most recent vmcore, the scenario appears to be something like the following:

1. PID: 4898 COMMAND: "ls" triggers a write to the process accounting file.
2. The resulting NFS write needs idmapd information and calls out to idmapd.
3. The idmapd usermodehelper process triggers another process accounting update that blocks on the mutex being held by PID 4898.

PID: 4898 TASK: c000001fd26d7580 CPU: 7 COMMAND: "ls"
 #0 [c000001fd274a950] __switch_to at c000000000015934
 #1 [c000001fd274ab20] __switch_to at c000000000015934
 #2 [c000001fd274ab80] __schedule at c000000000a11de8
 #3 [c000001fd274ada0] schedule_timeout at c000000000a16284
 #4 [c000001fd274ae90] wait_for_common at c000000000a1360c
 #5 [c000001fd274af10] call_usermodehelper_exec at c0000000000ccd38
 #6 [c000001fd274af70] call_sbin_request_key at c000000000429258
 #7 [c000001fd274b100] request_key_and_link at c00000000042983c
 #8 [c000001fd274b200] request_key at c000000000429978
 #9 [c000001fd274b240] nfs_idmap_get_key at d00000002ca8b0bc [nfsv4]
#10 [c000001fd274b2b0] nfs_map_name_to_uid at d00000002ca8bbd0 [nfsv4]
#11 [c000001fd274b320] decode_getfattr_attrs at d00000002ca7f59c [nfsv4]
#12 [c000001fd274b420] decode_getfattr_generic.constprop.96 at d00000002ca7fd78 [nfsv4]
#13 [c000001fd274b4d0] nfs4_xdr_dec_getattr at d00000002ca80738 [nfsv4]
#14 [c000001fd274b530] rpcauth_unwrap_resp at d00000001fe67180 [sunrpc]
#15 [c000001fd274b600] call_decode at d00000001fe527c8 [sunrpc]
#16 [c000001fd274b6b0] __rpc_execute at d00000001fe64260 [sunrpc]
#17 [c000001fd274b790] rpc_run_task at d00000001fe54a78 [sunrpc]
#18 [c000001fd274b7c0] nfs4_call_sync_sequence at d00000002ca60960 [nfsv4]
#19 [c000001fd274b860] _nfs4_proc_getattr at d00000002ca6217c [nfsv4]
#20 [c000001fd274b930] nfs4_proc_getattr at d00000002ca6f494 [nfsv4]
#21 [c000001fd274b9a0] __nfs_revalidate_inode at d0000000202cf614 [nfs]
#22 [c000001fd274ba30] nfs_revalidate_file_size at d0000000202c9618 [nfs]
#23 [c000001fd274ba70] nfs_file_write at d0000000202cabdc [nfs]
#24 [c000001fd274bb00] new_sync_write at c0000000002b3d9c
#25 [c000001fd274bbd0] __kernel_write at c0000000002b3fec
#26 [c000001fd274bc20] do_acct_process at c000000000166b78
#27 [c000001fd274bcc0] acct_process at c00000000016748c
#28 [c000001fd274bcf0] do_exit at c0000000000b3660
#29 [c000001fd274bdc0] do_group_exit at c0000000000b3b14
#30 [c000001fd274be00] sys_exit_group at c0000000000b3bdc
#31 [c000001fd274be30] system_call at c000000000009258

PID: 4900 TASK: c000003c9946c180 CPU: 16 COMMAND: "kworker/u320:2"
 #0 [c000003c994fb790] __switch_to at c000000000015934
 #1 [c000003c994fb960] __switch_to at c000000000015934
 #2 [c000003c994fb9c0] __schedule at c000000000a11de8
 #3 [c000003c994fbbe0] schedule_preempt_disabled at c000000000a12980
 #4 [c000003c994fbc00] __mutex_lock_slowpath at c000000000a14aec
 #5 [c000003c994fbc80] mutex_lock at c000000000a14c4c
 #6 [c000003c994fbcb0] acct_get at c0000000001663ec
 #7 [c000003c994fbcf0] acct_process at c000000000167480
 #8 [c000003c994fbd20] do_exit at c0000000000b3660
 #9 [c000003c994fbdf0] ____call_usermodehelper at c0000000000ccaf4
#10 [c000003c994fbe30] ret_from_kernel_thread at c00000000000956c

Historical bug data:

from customer:

I am uploading a crash dump file from a lockup event that I just had. I was reminded on a status update call this morning that I never tried running process accounting since opening the ticket and updating the kernel, so I tried that this morning. The first time, I turned it on with the default output location and didn't have any problems. Then I tried turning it on with the output going to our shared disk space, which is where I was originally sending it. I ran a couple of commands that returned just fine, but when I ran a CUDA test program it didn't return, and I verified that I was no longer able to log in to the node. I let it sit for a while and eventually the console printed some hung process errors. I waited a little longer, then went ahead and hit the key combo to force a dump. I will do some more testing, but right now it looks like the problem is running Linux process accounting (accton) with the output directed at an NFS mount. This is the /proc/mounts entry for the mount point I was pointing the output to:

172.17.0.1:/gpfs/sb /gpfs/sb nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.17.12.1,local_lock=none,addr=172.17.0.1 0 0

The file system is GPFS 3.5 on the back end, but the client is mounting it via NFS. I will try to reproduce the problem on a disk that is local to the NFS server just to take the GPFS part out of the equation. Our x86 clients are all doing this to directly mounted GPFS volumes so I doubt that is the issue.

Let me know if there is anything else you'd like to see or have me try.

Mike

== Comment: #21 - 2015-10-19 14:35:35 ==
I have repeated the problem and narrowed it down to NFS v4. I used an existing NFS export and mounted it with the default options, which used NFS v3, and I was unable to get the problem to happen. I then set up an NFS v4 export, mounted that, turned on process accounting to write to that mount point, and did an ls. The ls returned data, but the command prompt never returned. After some time I got the standard hung_task error.

Mike

== Comment: #27 - 2015-10-20 15:57:47 ==
Kevin,
1. The server is running CentOS 6.x (Most packages are from 6.6).
The kernel is 2.6.32-358.18.1.el6.x86_64

2. This is the /etc/exports line for this directory:
/var/psacct 172.17.0.0/16(rw,no_root_squash,sync,fsid=299)

3. The base FS is ext4. This is its /proc/mounts entry:
/dev/md124 /var ext4 rw,relatime,barrier=1,data=ordered 0 0

4. The client's /etc/fstab looks like this:
172.17.0.1:/var/psacct/ /mnt nfs rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,minorversion=0,local_lock=none 0 0

bugproxy (bugproxy) wrote : sosreport after latest hang replication test

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-131658 severity-high targetmilestone-inin14044
bugproxy (bugproxy) wrote : dmesg log buffer from latest test dump

Default Comment by Bridge

Kevin W. Rudd (kevinr)
affects: ubuntu → linux (Ubuntu)
penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → High
Luciano Chavez (lnx1138)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
status: New → Triaged
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-10-29 17:13 EDT-------
Please ignore the reference to CentOS at the tail end of the problem
summary. We had simply asked the customer for details about the NFS
export in case it was needed for *client-side* replication purposes.
The OS the remote NFSv4 server is using should be irrelevant.

Dave Chiluk (chiluk) wrote :

Is this testcase reproducible using an nfs server with a recent kernel? Lots of improvements have been made to NFS in the many years since 2.6.32, including some concurrency improvements.

Thanks,
Dave.

Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Chris J Arges (arges)
status: Triaged → In Progress
Dave Chiluk (chiluk) wrote :

My apologies, I just noticed the deadlock on the mutex comment. Is this reproducible with the upstream kernel on the client?

We provide mainline kernel builds for testing of this nature using our mainline build repositories.
http://kernel.ubuntu.com/~kernel-ppa/mainline/

Also keep in mind that we are working on reproducing this in-house as well.

Chris J Arges (arges) wrote :

I'm having an issue reproducing this problem; perhaps I'm missing something so I'll explain my reproduction steps in detail.

Server:
1) sudo apt-get install nfs-kernel-server
2) echo '/export *(rw,sync,no_root_squash,no_subtree_check,fsid=299)' | sudo tee /etc/exports
3) sudo exportfs -a && sudo service nfs-kernel-server restart

Client (power8 machine):
1) sudo apt-get install nfs-common acct
2) sudo mkdir /account
3) sudo mount -t nfs4 -o rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=600,retrans=2,sec=sys,minorversion=0,local_lock=none 10.245.80.76:/export /account
4) sudo touch /account/pacct
5) sudo accton /account/pacct
6) cd /account/
7) ls
8) cat pacct

Any suggestions would be welcome. Thanks

Dave Chiluk (chiluk) wrote :

I also completed the same test as Chris above, without failure. I was using Ubuntu Trusty with 3.13 as the server and Trusty with 3.19 as the guest. At the moment I don't see any reason to assume that there is anything Power-specific in this issue, so I attempted it on purely x86_64 hardware.

Also, can we get the output of the mount command from the client so we can verify that all /etc/fstab options were respected? It's possible that the mount options were passed to the server but were overridden by server capability bits.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-29 23:00 EDT-------
I have asked for confirmation of the mount options and a local test using a different NFS server. Stay tuned...

Dave Chiluk (chiluk) wrote :

So I did some more searching on this, including running tcpdump in my environment, and it appears that idmapd is not making any requests for me, which explains why we aren't hitting the deadlock. Did you do anything specific to get idmapd up and configured?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-30 15:09 EDT-------
I was also not able to replicate this behavior in my initial internal lab-system test, so it is nice to see our results are consistent so far. The customer's idmapd.conf looks to be fairly generic:

root@ppc001:~# cat /etc/idmapd.conf
[General]

#Verbosity = 5
Pipefs-Directory = /run/rpc_pipefs
Domain = localdomain

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup

I'll start digging a little deeper into site/config differences now that we seem to have consistent results in a test environment.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-30 16:51 EDT-------
Case 00090143 was opened to track this issue from the customer support
side. The entire debug bundle (including vmcore) from the customer's
test on the 20th has been uploaded and documented in the support case.

The sosreport is indicating that there are other potential factors
involved. The nsswitch.conf file shows that sssd is in the mix, and
sssd.conf indicates that ldap and kerberos may also be involved in the
callout behavior being seen.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-30 22:06 EDT-------
Sorry about the idmapd misdirection based on my cursory read of the stack. Since they are mounting with sec=sys, idmapping should be disabled. After diving a bit deeper into the code path, it appears that the /sbin/request-key user-mode helper program is what is actually being invoked here. I'm still digging into site config differences to see if I can better understand why we are unable to replicate this behavior in our own testing.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-30 22:58 EDT-------
Ahhh. (In reply to comment #42)
> Sorry about the idmapd misdirection based on my cursory read of the stack.
> Since they are mounting with sec=sys, idmappng should be disabled.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I think this is the trigger we have been missing. This change does not appear to be in place for the kernel version the customer is running on his NFS server. But it does appear to be the default behavior for the servers we have been trying to replicate with. Because of this, mounting sec=sys is causing our test servers to send back uid/gid information instead of name mapping (so our client code doesn't need the callout). I will try to install a down-rev system as a server for validation if someone else doesn't beat me to the punch first.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-31 17:59 EDT-------
It looks like a non-zero return from do_execve() in ____call_usermodehelper() is the primary trigger here.

    if (!retval)
        return 0;
    do_exit(0);

A successful callout bypasses the do_exit() process accounting. While trying to debug the nfsidmap utility, I typoed a path and the resulting callout returned 127. My test system locked up just like the customer's.

Dave Chiluk (chiluk)
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → Dave Chiluk (chiluk)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-11-02 19:57 EDT-------
I asked the customer to validate their request-key config, and found out that they were missing the keyutils package and, as a result, the /sbin/request-key binary. It looks like there needs to be a hard dependency on the keyutils package in the nfs-common package.

Dave Chiluk (chiluk) wrote :

I created a CentOS 6 NFSv4 server and went through the above reproduction procedure with a Trusty guest. I could successfully mount and perform file operations with accton enabled. However, after a few minutes the console hung, and most tasks reported the following stack trace in /var/log/kern.log:

INFO: task ls:2101 blocked for more than 120 seconds.
      Not tainted 3.13.0-66-generic #108-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D ffff88007fd13180 0 2101 1215 0x00000004
 ffff88007b14d630 0000000000000086 ffff8800374e6000 ffff88007b14dfd8
 0000000000013180 0000000000013180 ffff8800374e6000 ffff88007b14d6b0
 ffff88007ffd1460 0000000000000002 ffffffff812d0ce0 ffff88007b14d6a0
Call Trace:
 [<ffffffff812d0ce0>] ? umh_keys_init+0x20/0x20
 [<ffffffff81728499>] schedule+0x29/0x70
 [<ffffffff812d0cee>] key_wait_bit+0xe/0x20
 [<ffffffff81728c42>] __wait_on_bit+0x62/0x90
 [<ffffffff812d0ce0>] ? umh_keys_init+0x20/0x20
 [<ffffffff81728ce7>] out_of_line_wait_on_bit+0x77/0x90
 [<ffffffff810ab3d0>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff812d10be>] wait_for_key_construction+0x6e/0x80
 [<ffffffff812d160c>] request_key+0x5c/0xa0
 [<ffffffffa027858f>] nfs_idmap_get_key+0xaf/0x1c0 [nfsv4]
 [<ffffffffa0278f8f>] nfs_map_name_to_uid+0xef/0x150 [nfsv4]
 [<ffffffffa0270117>] decode_getfattr_attrs+0xe47/0x14b0 [nfsv4]
 [<ffffffff8101bc79>] ? sched_clock+0x9/0x10
 [<ffffffffa027080c>] decode_getfattr_generic.constprop.102+0x8c/0xf0 [nfsv4]
 [<ffffffffa0270ef0>] ? nfs4_xdr_dec_access+0xa0/0xa0 [nfsv4]
 [<ffffffffa0270f60>] nfs4_xdr_dec_getattr+0x70/0x80 [nfsv4]
 [<ffffffffa013e316>] rpcauth_unwrap_resp+0x86/0xd0 [sunrpc]
 [<ffffffffa0270ef0>] ? nfs4_xdr_dec_access+0xa0/0xa0 [nfsv4]
 [<ffffffffa0130f6f>] call_decode+0x1df/0x870 [sunrpc]
 [<ffffffffa0130d90>] ? call_refreshresult+0x170/0x170 [sunrpc]
 [<ffffffffa0130d90>] ? call_refreshresult+0x170/0x170 [sunrpc]
 [<ffffffffa013bd84>] __rpc_execute+0x84/0x400 [sunrpc]
 [<ffffffffa013ccfe>] rpc_execute+0x5e/0xa0 [sunrpc]
 [<ffffffffa01331d0>] rpc_run_task+0x70/0x90 [sunrpc]
 [<ffffffffa0259646>] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
 [<ffffffffa0259f2e>] _nfs4_proc_getattr+0xbe/0xd0 [nfsv4]
 [<ffffffffa02604ea>] nfs4_proc_getattr+0x5a/0xd0 [nfsv4]
 [<ffffffffa01e19df>] __nfs_revalidate_inode+0xbf/0x310 [nfs]
 [<ffffffffa01d9af3>] nfs_opendir+0xe3/0x100 [nfs]
 [<ffffffff811bb883>] do_dentry_open+0x233/0x2e0
 [<ffffffffa01d9a10>] ? nfs_readdir_clear_array+0x70/0x70 [nfs]
 [<ffffffff811bbbb9>] vfs_open+0x49/0x50
 [<ffffffff811ccf64>] do_last+0x564/0x1240
 [<ffffffff811cac06>] ? link_path_walk+0x256/0x880
 [<ffffffff8131615b>] ? apparmor_file_alloc_security+0x5b/0x180
 [<ffffffff812d8786>] ? security_file_alloc+0x16/0x20
 [<ffffffff811cdcfb>] path_openat+0xbb/0x650
 [<ffffffff811cf0fa>] do_filp_open+0x3a/0x90
 [<ffffffff8118199e>] ? do_mmap_pgoff+0x34e/0x3d0
 [<ffffffff811dbf77>] ? __alloc_fd+0xa7/0x130
 [<ffffffff811bd6d9>] do_sys_open+0x129/0x280
 [<ffffffff817305ba>] ? do_page_fault+0x1a/0x70
 [<ffffffff811bd864>] SyS_openat+0x14/0x20
 [<ffffffff81734c5d>] system_call_fastpath+0x1a/0x1f

I then went and installed keyutils and retested, and could not reprodu...


Dave Chiluk (chiluk)
Changed in nfs-utils (Ubuntu):
assignee: nobody → Dave Chiluk (chiluk)
Dave Chiluk (chiluk) wrote :

I verified that keyutils is already a dependency in vivid and later, added as the fix for bug 1449074, so this shouldn't be too much of an issue to fix in trusty as well.

Dave Chiluk (chiluk)
description: updated
Dave Chiluk (chiluk) wrote :

Add keyutils to Depends: for nfs-common

Changed in nfs-utils (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: High → Medium
status: In Progress → Invalid
Changed in nfs-utils (Ubuntu):
status: New → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-11-03 20:56 EDT-------
Thanks for chasing down the missing dependency side of the equation here.

That still leaves the potential of deadlock if the callout exec ever returns non-zero for any other reason. Are the risks considered low enough to simply document this as a potential hazard of writing process accounting information across an NFSv4 mount?

Dave Chiluk (chiluk) wrote :

Yeah, I think so, if you are "creative" enough to put your accounting on your nfs mount, and then have your nfs service fail, you should expect your machine to stop functioning. Otherwise it could be a potential accounting/security hole.

Dave Chiluk (chiluk)
tags: added: sts
Dave Chiluk (chiluk)
description: updated
Steve Langasek (vorlon) wrote :

This dependency is already present in vivid and later.

Changed in nfs-utils (Ubuntu Trusty):
assignee: nobody → Dave Chiluk (chiluk)
importance: Undecided → Medium
status: New → In Progress
Changed in nfs-utils (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Trusty):
status: New → Invalid
Dave Chiluk (chiluk)
description: updated
Chris J Arges (arges) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted nfs-utils into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nfs-utils/1:1.2.8-6ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nfs-utils (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-11-07 00:59 EDT-------
Thanks. The updated 1:1.2.8-6ubuntu1.2 version of nfs-common did indeed pull in the keyutils package. This was tested for both the ppc64el and amd64 architectures.

root@p824l:~# apt-show-versions nfs-common
nfs-common:ppc64el/trusty-updates 1:1.2.8-6ubuntu1.1 uptodate
root@p824l:~# apt-show-versions keyutils
keyutils not installed (available for: ppc64el)
root@p824l:~# ls /sbin/request-key
ls: cannot access /sbin/request-key: No such file or directory

root@p824l:/etc/apt# apt-get install nfs-common/trusty-proposed
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '1:1.2.8-6ubuntu1.2' (Ubuntu:14.04/trusty-proposed [ppc64el]) for 'nfs-common'
The following extra packages will be installed:
keyutils
Suggested packages:
open-iscsi watchdog
Recommended packages:
python
The following NEW packages will be installed:
keyutils
The following packages will be upgraded:
nfs-common
1 upgraded, 1 newly installed, 0 to remove and 6 not upgraded.
...

root@p824l:/etc/apt# apt-show-versions nfs-common
nfs-common:ppc64el/trusty-proposed 1:1.2.8-6ubuntu1.2 uptodate
root@p824l:/etc/apt# apt-show-versions keyutils
keyutils:ppc64el/trusty 1.5.6-1 uptodate
root@p824l:/etc/apt# ls /sbin/request-key
/sbin/request-key

tags: added: verification-done
removed: verification-needed
Dave Chiluk (chiluk) wrote :

Thanks ruddk, it was a pleasure. We will let this bake in -proposed for a few weeks, and then barring any unforeseen issues it will get promoted to -updates.

Mathew Hodson (mhodson)
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Trusty)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.8-6ubuntu1.2

---------------
nfs-utils (1:1.2.8-6ubuntu1.2) trusty; urgency=medium

  * Add a dependency on keyutils to nfs-common, which fixes hangs in
    waiting_for_key_construction. (LP: #1509120).

 -- Dave Chiluk <email address hidden> Tue, 03 Nov 2015 14:21:05 -0600

Changed in nfs-utils (Ubuntu Trusty):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for nfs-utils has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-11-23 15:56 EDT-------
Confirmed with standard updates repo. Closing.

tags: added: targetmilestone-inin14043
removed: targetmilestone-inin14044