expired kerberos credentials cause significant syslog spam

Bug #293705 reported by Andrew Pollock
24
This bug affects 2 people
Affects Status Importance Assigned to Milestone
nfs-utils (Ubuntu)
Fix Released
Undecided
Unassigned
Hardy
Fix Released
Medium
Unassigned
Intrepid
Won't Fix
Medium
Unassigned
nfs-utils (openSUSE)
Fix Released
Critical

Bug Description

This bug can cause an installation's filesystem to fill up due to excessive logging by rpc.gssd, when the user's Kerberos credentials have expired and they have /home mounted via Kerberised NFS.

An explanation of how the bug has been addressed in the development branch, including the relevant version numbers of packages modified in order to implement the fix.

This bug has been fixed in the development branch, by way of a patch (fixed in 1:1.1.2-4ubuntu2)

The patch is attached to this bug report.

TEST CASE: have a system that mounts /home via Kerberised NFS. kdestroy. Wait ~15 minutes for rpc.gssd's cached credentials to expire. Perform activities that attempt to access files in $HOME. Much logging like what is below will occur.

This is a very simple patch. The potential for regression is slim, as all this patch does is decrease the logging verbosity of a few messages below that of normal operation.

Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
Oct 8 17:05:18 swan rpc.gssd[4747]: WARNING: Failed to create krb5 context for user with uid 85153 for server *REDACTED*
Oct 8 17:05:18 swan rpc.gssd[4747]: ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found

Revision history for this message
Andrew Pollock (apollock) wrote :
Revision history for this message
Andrew Pollock (apollock) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi Andrew, thanks for the debdiff.

I've a question though...

This patch appears to be doing two things. First, it's altering the priority on some debug messages - this part I think is perfectly fine.

The other part is affecting the message printing logic:

+--- nfs-utils-1.1.2.orig/utils/gssd/err_util.c
++++ nfs-utils-1.1.2/utils/gssd/err_util.c
+@@ -49,6 +49,10 @@
+ {
+ va_list args;
+
++ /* Decrease verbosity so we don't log so much */
++ if ((priority - verbosity) > 1)
++ return;
++
+ /* Don't bother formatting a message we're never going to print! */
+ if (priority > verbosity)
+ return;

So this is changing the logic so it avoids printing messages unless the 'priority' is more than 'verbosity+1'. This part I think should not be included. I think it's unnecessary anyway, and just doubles up on your other fix.

Would you mind re-rolling the debdiff with this piece omitted? (Or explain why it's needed if I'm missing something.)

Revision history for this message
Vince Busam (vbusam) wrote :

That patch was created with an earlier version of nfs-utils which did not have the subsequent "if (priority > verbosity)" check in it, so it's redundant now. Here's a debdiff without it.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Interesting, commit d1e7ccccd9153 in 1.1.2 already fixed it here. This should be upstreamed too.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.1.2-4ubuntu2

---------------
nfs-utils (1:1.1.2-4ubuntu2) jaunty; urgency=low

  * Add patch to reduce verbosity (LP: #293705)

 -- Andrew Pollock <email address hidden> Tue, 04 Nov 2008 09:38:48 -0800

Changed in nfs-utils:
status: New → Fix Released
Revision history for this message
Bryce Harrington (bryce) wrote :

Andrew, can you also please follow up with upstream to include the patch there as well? Thanks in advance.

Revision history for this message
Andrew Pollock (apollock) wrote :

I will ensure the patch is submitted upstream. Do I need to do anything further in order to have this bug fixed in Hardy as well?

Revision history for this message
Andrew Pollock (apollock) wrote :
Revision history for this message
Andrew Pollock (apollock) wrote :
description: updated
Revision history for this message
Andrew Pollock (apollock) wrote :

It looks like this was upstreamed:

commit 09c7ad1cd9c5ca2fc46631a0057d47309abc8706
Author: Kevin Coffman <email address hidden>
Date: Mon Jan 5 14:07:05 2009 -0500

    gssd: By default, don't spam syslog when users' credentials expire

    Change the priority of "common" log messages so that syslog doesn't get
    slammed/spammed when users' credentials expire, or there is another
    common
    problem which would cause error messages for all context creation
    requests.

    Note that this will now require that gssd or svcgssd option "-v" is used
    to
    debug these common cases.

    Original patch from Andrew Pollock <email address hidden>.

    Signed-off-by: Kevin Coffman <email address hidden>
    Signed-off-by: Steve Dickson <email address hidden>
    CC: Andrew Pollock <email address hidden>

If I'm reading things right, it was committed before the tag for the 1.2.0 release, so I'd have expected the Ubuntu patch to not apply, but it seems to still be being applied.

Revision history for this message
John Dong (jdong) wrote :

ACK from -sru for intrepid and hardy debdiffs. Wow, apologies for the pathetic delay here! Didn't see this bug on my radar until now :-/

Revision history for this message
Andrew Pollock (apollock) wrote :

Never mind, there's a couple of parts of the patch still outstanding, I'll try to upstream them.

Revision history for this message
Mathias Gug (mathiaz) wrote :

Marking Won't fix for intrepid as EOL is a couple of weeks away.

Changed in nfs-utils (Ubuntu Hardy):
importance: Undecided → Medium
Changed in nfs-utils (Ubuntu Intrepid):
importance: Undecided → Medium
status: New → Won't Fix
Changed in nfs-utils (Ubuntu Hardy):
status: New → In Progress
Revision history for this message
Mathias Gug (mathiaz) wrote :

Upload to hardy-proposed.

Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Accepted nfs-utils into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in nfs-utils (Ubuntu Hardy):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Matthew L. Dailey (matthew-l-dailey) wrote :

I got a chance to test the proposed package today and there is good news and bad news.

The good news is that it installs cleanly and doesn't seem to introduce any regressions or other problems.

The bad news is that it doesn't actually fix the issue identified in the original bug description - specifically reducing the log level for the messages:
ERROR: GSS-API: error in gss_acquire_cred(): Unspecified GSS failure. Minor code may provide more information - No credentials cache found
WARNING: Failed to create krb5 context for user with uid xxxx for server yyyy

Looking at the patch, these messages don't seem to be addressed. The GSS-API error is in display_status_2() in gss_util.c and the krb5 context message is in handle_krb5_upcall() in gssd_proc.c.

I can't make any promises, but if I get some time to put a debdiff together that covers this along with Andrew's stuff, I will. If someone else has time to get to it first, though, I won't feel slighted. :-)

Revision history for this message
Matthew L. Dailey (matthew-l-dailey) wrote :

So here's what I figured out:

It looks like the logging verbosity of most messages was decreased as of nfs-utils 1.1.5 upstream. See commit 09c7ad1cd9c5ca2fc46631a0057d47309abc8706 in these release notes: https://sourceforge.net/project/shownotes.php?release_id=665939

So, I created a patch that basically back-ports these changes to the hardy version. This includes most of Andrew's work and some others, including the krb5 context warnings.

The change to the logging of the GSS-API message seems unnecessary. As far as I can tell, this function (display_status_1) isn't being called by anything. It's the message in display_status_2 that could be changed. However, in the case of expired credentials, the logging verbosity is already lowered in this function. In the case of missing credentials, the gss error is unspecified, so testing for it would mean some additional programming that seems a bit much for a simple patch.

I have tested my patch on our systems, and in the case of expired credentials, the logs are now silent. In the case of missing credentials, the GSS-API error still appears, but the krb5 context warning does not.

Please let me know if I'm missing something, but this seems like a better way to resolve this particular bug.

Revision history for this message
Andrew Pollock (apollock) wrote :

From my casual inspection of the patch, it looks good. Thanks for doing the detective work Matthew!

Revision history for this message
Martin Pitt (pitti) wrote :

Looks good, thanks! Please upload with fixed target ("hardy-proposed") and version number ("1:1.1.2-2ubuntu2.4", on top of the current 1:1.1.2-2ubuntu2.3 in proposed).

Revision history for this message
Matthew L. Dailey (matthew-l-dailey) wrote :

Martin:

Hopefully, this is what you were after. I added my patch to the nfs-utils_1.1.2-2ubuntu2.3 version in proposed and fixed the changelog to reflect the correct version number and target. On our systems, this builds and installs properly and fixes the verbosity issues in this bug.

Thanks for your help - let me know if you have any questions or if I've messed anything up. :-)

Revision history for this message
Martin Pitt (pitti) wrote :

Accepted nfs-utils into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Revision history for this message
Matthew L. Dailey (matthew-l-dailey) wrote :

I have installed and run 1.1.2-2ubuntu2.4 from hardy-proposed on a test system for a couple of days. Everything is stable and this bug is now fixed.

As far as I'm concerned, this is good to go. :-)

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.1.2-2ubuntu2.4

---------------
nfs-utils (1:1.1.2-2ubuntu2.4) hardy-proposed; urgency=low

  * Another patch to reduce verbosity - backported from upstream version 1.1.5
    (LP: #293705)
 -- <email address hidden> (Matthew l. Dailey) Tue, 06 Apr 2010 15:30:21 -0400

Changed in nfs-utils (Ubuntu Hardy):
status: Fix Committed → Fix Released
Revision history for this message
In , Mika Fischer (zoop) wrote :

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6

SSH by default deletes Kerberos credentials when a user logs out.

If the user left a program running (for instance via screen), and if Kerberos credentials are needed to access the home directories (kerberized NFS), rpc.gssd will fail to obtain Kerberos credentials.

The problem is that it generates excessive amounts of warnings in the syslog to this effect (about 1100 wrnings per second), which then quickly fill up the hard drive.

Reproducible: Always

Steps to Reproduce:
1. Log in (via SSH) to host that mounts home directory via kerberized NFS
2. Start screen with some process accessing the home dir inside
3. Detach screen
4. Close SSH session
5. Wait for rpc.gssd credentials cache to expire
Actual Results:
When the process still running on the target host tries to access the home directory, rpc.gssd will try and fail to obtain kerberos credentials for the user. It will then spam the syslog with the following warning
----
<date> <hostname> rpc.gssd[<pid>]: WARNING: Failed to create krb5 context for user with uid <uid> for server <other hostname>
----
This is repeated ad infinitum until the offending process is killed manually. The logfile otherwise quickly fills up the partition.

Expected Results:
Maybe one warning or no warning at all should be emitted (the latter is the case for *expired* credentials). See also https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/293705 for the case of expired credentials.

A fix fo this should probably also be coordinated with upstream.

Revision history for this message
In , Sjayaraman (sjayaraman) wrote :

Thanks for the bug report. This issue has been already fixed in upstream.

I have built an updated package including the fix and made it available here:
   http://www.suse.com/~sjayaraman/test-pkgs/nfs-utils/
  (syncing could take a few hours)

You would need to update your nfs-kernel-server package from the above location. Please report back if this fixes the issue for you.

Revision history for this message
In , Sjayaraman (sjayaraman) wrote :

Created an attachment (id=379436)
Upstream patch

Attaching the upstream patch that fixes the problem for completeness.

Revision history for this message
In , Mika Fischer (zoop) wrote :

After testing the packages I can confirm that they fix the problem for us.

Do you recommend that we deploy them on all our 11.2 hosts or should we wait for an official update?

Also, this probably should be fixed in 11.3. However there we have a similar but slightly different behaviour. The error message does not come from rpc.gssd but from the kernel itself. It is however caused by the same circumstances and also spams the log so quickly that there's a good chance of filling up the /var partition.

The error message in this case is (on the NFS client):
kernel: [1301515.320931] Error: state manager failed on NFSv4 server <NFS server hostname> with error 13

Error 13 probably means NFSERR_ACCES. Which probably means that the process does not have permissions to access the file because the Credentials Cache was removed when the user logged out.

Do you want me to open a separate bug report for this?

Revision history for this message
In , Sjayaraman (sjayaraman) wrote :

(In reply to comment #4)
> After testing the packages I can confirm that they fix the problem for us.

Thanks for confirming.

> Do you recommend that we deploy them on all our 11.2 hosts or should we wait
> for an official update?

You should wait for an official update.

> Also, this probably should be fixed in 11.3. However there we have a similar
> but slightly different behaviour. The error message does not come from rpc.gssd
> but from the kernel itself. It is however caused by the same circumstances and
> also spams the log so quickly that there's a good chance of filling up the /var
> partition.
>
>
> Do you want me to open a separate bug report for this?

Yes, it sounds different from this one. Please open a separate bugzilla for that issue.

Revision history for this message
In , Nfbrown (nfbrown) wrote :

I have submitted an update for 11.2 containing this patch, but I'm not confident that an update will be released in any great hurry.

This bug is fixed in 11.3, It might be appropriate to upgrade to 11.3, or
just get the nfs-utils package from there.

The update request id is 45345

Revision history for this message
In , Joschibrauchle (joschibrauchle) wrote :

Hello everyone,

as Mika Fischer described, this bug exists also in OpenSUSE 11.3/11.4, except that the error message is:
--------
kernel: [<timestamp>] Error: state manager failed on NFSv4 server <IP_of_nfs_server> with error 13
--------

Otherwise, the description is still exactly valid:
- SSH login
- start user job, which accesses kerberized nfs user home
- SSH logout
- Kerberos cache expires
- /var/log/messages is spammed with ~1000 errors PER SECOND!
- /var partition out of space!

So, I could not find the corresponding bug report for 11.3/11.4. Is there one yet?

Changed in nfs-utils (openSUSE):
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
In , Joschibrauchle (joschibrauchle) wrote :

I have reported the above as a separate bug at https://bugzilla.novell.com/show_bug.cgi?id=705470

Changed in nfs-utils (openSUSE):
importance: Medium → Critical
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.