Non-thread-safe functions used in multi-threaded rpc.gssd

Bug #1927745 reported by Andreas Hasenack
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
nfs-utils (Debian)
Fix Released
Unknown
nfs-utils (Fedora)
Fix Released
Undecided
nfs-utils (Ubuntu)
Fix Released
High
Andreas Hasenack
Bionic
Fix Released
High
Andreas Hasenack

Bug Description

[Impact]
rpc-gssd can hang or crash when a kerberos nfsv4 mount point is accessed by multiple users simultaneously. The problem happens because the daemon uses the strtok() function which is not thread safe.

The fix from upstream removes strtok() and uses strsep() instead. These patches are already applied in focal and later, via merges from debian.

[Test Plan]
As with all race conditions, this test case may take a while to reproduce the problem.

# Create a bionic VM. It seems to help if it's created with multiple cpus/cores. I had more success with 4 cores and 1GiB of RAM.
# if using lxd to launch a VM, you can run this before: "lxc config set vm-name limits.cpu=4". Just don't forget to undo it, or set to your normal number of CPUs, after the test
# Login and get its ip, and take note of it:
export IP=$(ip route get default 8.8.8.8 | grep ^8 | awk '{print $7}')
echo $IP

# adjust /etc/hosts:
echo "$IP $(hostname).example.com" | sudo tee -a /etc/hosts

# adjust /etc/resolv.conf:
echo "search example.com" | sudo tee -a /etc/resolv.conf

# verify hostname -f returns the fqdn of the vm, i.e., a name with the .example.com domain:
hostname -f

# If you still don't get the correct FQDN, try the below, adjusting for your hostname if $(hostname) isn't working properly:
sudo hostnamectl set-hostname $(hostname).example.com

# Run these commands, and when asked:
# - for realm: EXAMPLE.COM
# - for kdc and admin server: use the vm's IP

sudo apt update && sudo apt install nfs-server krb5-kdc krb5-admin-server krb5-user gcc

# create a kerberos realm. When prompted, use any password you want:
sudo krb5_newrealm

# create an nfs service ticket, and store it in the keytab
sudo kadmin.local -q "addprinc -randkey nfs/$(hostname -f)"
sudo kadmin.local -q "ktadd nfs/$(hostname -f)"

# create test directories
sudo mkdir -p /mnt/test_krb5/ /export
sudo touch /export/foo

# adjust nfs config and restart the nfs server:
sudo sed -r -i "s,^NEED_SVCGSSD=.*,NEED_SVCGSSD=\"yes\"," /etc/default/nfs-kernel-server
sudo sed -r -i "s,^NEED_GSSD=.*,NEED_GSSD=\"yes\"," /etc/default/nfs-common
sudo systemctl restart nfs-server

# configure an nfs export:
echo "/export *(sec=krb5,rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -rva

# confirm it's available
sudo showmount -e localhost

# mount it
sudo mount $(hostname -f):/export /mnt/test_krb5/
sudo ls -la /mnt/test_krb5

# download bug attachments
wget -ct0 https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496166/+files/stat_as.c https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1927745/+attachment/5496167/+files/bz1419280_test_threads
chmod +x bz1419280_test_threads

# build reproducer
gcc stat_as.c -o stat_as

# run test script as root. It may take a few minutes to trigger the bug
sudo ./bz1419280_test_threads

# wait
# Once you get the confirmation:
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 114 iterations

# Check for a "stat_as" D state process:
$ ps axw|grep stat_as
17814 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035

# To restore functionality, restart rpc-gssd:
sudo systemctl restart rpc-gssd.service

With the updated packages, the script will not detect the bug and never stop.

[Where problems could occur]
NFS v4 services are more complex than earlier versions, and are comprised of several services/daemons. It's possible for the restart done after the automatic package upgrade to show up as a regression due to several factors:
- not all needed services were restarted (bug, but not introduced by this change)
- depending on mount options, client mount points may appear as hung and take a while to recover
- configuration errors on the server which were up until now not noticed, and only manifest themselves after a restart
- some sites, due to the lack of configuration options in /etc/default/nfs-*, might have overriden systemd service files and hardcoded other command line options there. If not done properly (i.e., not done in /etc/systemd via overrides), these local changes will be lost after the package upgrade. I know of at least rpc-gssd, which has no command-line options available in /etc/default/nfs-*, and I know of users who have tweaked this service in many different ways to add things like -v or -n to its command line option.

[Other Info]
The upstream patches have been applied since February 2017 and have not been changed or reverted. They are also applied in Debian and Fedora, and ubuntu since focal at least.

There is an additional patch, but part of the fix, which dupes the string for appropriate logging. Its memory is also freed.

It may be hard to reproduce this bug in a test environment. I've gotten to the error in as little as a few seconds, but other times it took hundreds of attempts. YMMV.

[Original Description]

Fixed in focal and later, due to sync from debian

Bionic affected.

I'll add a proper description in a moment.

RH: https://bugzilla.redhat.com/show_bug.cgi?id=1419280
Debian BTS: https://bugs.debian.org/895381
ML: http://www.spinics.net/lists/linux-nfs/msg62111.html
ML: http://www.spinics.net/lists/linux-nfs/msg62099.html

Related branches

Changed in nfs-utils (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Andreas Hasenack (ahasenack)
importance: Undecided → High
Changed in nfs-utils (Ubuntu):
importance: Undecided → High
status: In Progress → Fix Released
Changed in nfs-utils (Fedora):
importance: Unknown → Undecided
status: Unknown → Fix Released
Changed in nfs-utils (Debian):
status: Unknown → Fix Released
description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Reproducer tool from RH bug

description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Reproducer script from RH bug

description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

MP approved, package uploaded to bionic-proposed unapproved.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

SRU team pinged. Package is confirmed in the unapproved queue at https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=nfs-utils

Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Andreas, or anyone else affected,

Accepted nfs-utils into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nfs-utils/1:1.3.4-2.1ubuntu5.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nfs-utils (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Testing was kind of successful, in the sense that the patch fixes the problem. But I found a pre-existing bug in the packaging that manifested itself in this testing.

After upgrading the packages, not all services are restarted. Critically for this bug we are addressing here, rpc.gssd is not restarted:

Before upgrade:
  442 ? Ss 0:00 /usr/sbin/blkmapd
 7146 ? Ss 0:00 /usr/sbin/rpc.gssd
 7399 ? Ss 0:00 /usr/sbin/rpc.idmapd
 7406 ? Ss 0:00 /usr/sbin/rpc.mountd --manage-gids
 7400 ? Ss 0:00 /usr/sbin/rpc.svcgssd

After pkg upgrade:
  442 ? Ss 0:00 /usr/sbin/blkmapd
 7146 ? Ss 0:00 /usr/sbin/rpc.gssd
 8421 ? Ss 0:00 /usr/sbin/rpc.idmapd
 8422 ? Ss 0:00 /usr/sbin/rpc.mountd --manage-gids
 8420 ? Ss 0:00 /usr/sbin/rpc.svcgssd

We can see that blkmapd and rpc.gssd were not restarted, which means that just upgrading the packages doesn't resolve the bug: the user must issue `sudo systemctl restart nfs-utils.service`.

This happens because the nfs-utils.service is not enabled (and cannot be enabled: it's a "fake" service just meant to coordinate the several daemons necessary for NFS).

If nfs-utils.service was ever started or restarted on the machine, then the upgrade will restart it just fine.

I'm currently investigating what are the best options to address this, and will probably file a new bug.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Download full text (5.8 KiB)

DO NOT RELEASE TO UPDATES YET
DO NOT RELEASE TO UPDATES YET

The fix is correct and working, but requires a manual restart of the services until #1928259 is fixed. We will want to fix both at the same time.

With that in mind, here goes my testing report.

Reproducing the problem with the packages from bionic:

nfs-common:
  Installed: 1:1.3.4-2.1ubuntu5.3
  Candidate: 1:1.3.4-2.1ubuntu5.3
  Version table:
 *** 1:1.3.4-2.1ubuntu5.3 500
        500 http://br.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://br.archive.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.3.4-2.1ubuntu5 500
        500 http://br.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

first attempt:
$ sudo ./bz1419280_test_threads
Iter 1
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
Iter 2
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
Iter 3
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
Iter 4
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 4 iterations
ubuntu@bionic-gssd-bug-1927745:~$ ps axw|grep stat_as
 8248 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035
 8317 pts/1 S+ 0:00 grep --color=auto stat_as

second attempt (after restarting rpc-gssd):
$ sudo ./bz1419280_test_threads
Iter 1
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
Iter 2
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
Iter 3
calling stat on '/mnt/test_krb5/foo' with uids 9995 through 10035
reproduced the bug after 3 iterations
ubuntu@bionic-gssd-bug-1927745:~$ ps axw|grep stat_as
 8631 pts/1 D 0:00 ./stat_as /mnt/test_krb5/foo 9995 10035 ...

Read more...

tags: added: block-proposed-bionic verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I uploaded the fix for the blocking bug #1928259, with a version bump, and it includes the fix for this bug here. It's in the SRU queue now.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

I see that the upload with the aforementioned fix is already in -proposed and verified. Removing the block-proposed tag and proceeding with the release of this package.

tags: added: verification-done
removed: block-proposed-bionic verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.3.4-2.1ubuntu5.5

---------------
nfs-utils (1:1.3.4-2.1ubuntu5.5) bionic; urgency=medium

  * d/nfs-common.postinst: always start nfs-utils.service, so the
    restart in the #DEBHELPER# section can do its job if needed
    (LP: #1928259)

 -- Andreas Hasenack <email address hidden> Mon, 24 May 2021 17:38:47 -0300

Changed in nfs-utils (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.