/etc/init/statd.conf: race with portmap startup

Bug #484209 reported by Malcolm Scott
This bug affects 16 people
Affects             Status        Importance  Assigned to     Milestone
nfs-utils (Ubuntu)  Fix Released  Medium      Steve Langasek
Karmic              Won't Fix     Undecided   Unassigned
Lucid               Fix Released  Medium      Steve Langasek

Bug Description

On my system (a Xen virtual machine running karmic and acting as an NFS client), I see the following during boot:

init: statd pre-start process (831) terminated with status 1
mountall: Event failed

Enabling console output in /etc/init/statd.conf shows that the error is caused by statd's pre-start script attempting to start portmap when it is already started. The pre-start script tries to avoid this by checking portmap's status:

        status portmap | grep -q start/running || start portmap

However, adding debug statements showed that portmap's status at this point was "start/spawned" rather than "start/running", i.e. portmap was already in the process of being started but was not yet "running".

Simply changing that line to

        status portmap | grep -q start/ || start portmap

solved the problem for me.
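
The difference between the two patterns can be checked without upstart at all. The sketch below feeds sample status lines to both grep patterns. The sample lines are assumptions modeled on the states named above (the exact output of `status` may differ between upstart versions), and `matches` is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a status line matches a grep pattern.
matches() {
    # $1 = pattern, $2 = sample status line
    if printf '%s\n' "$2" | grep -q "$1"; then
        echo yes
    else
        echo no
    fi
}

# Sample lines are assumptions modeled on the states described above.
spawned='portmap start/spawned, process 812'
running='portmap start/running, process 812'
stopped='portmap stop/waiting'

# "strict" is the original pattern, "loose" is the modified one.
for line in "$spawned" "$running" "$stopped"; do
    printf '%-40s strict=%s loose=%s\n' "$line" \
        "$(matches 'start/running' "$line")" \
        "$(matches 'start/' "$line")"
done
```

Only the loose pattern matches the "start/spawned" line, so only the loose version skips the second `start portmap`. Note that matching "start/" says the job's goal is start; it does not by itself show that portmap has finished starting.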

Revision history for this message
Steve Langasek (vorlon) wrote :

Confirmed, there's a race condition here. The race-free fix would be:

        start portmap || true
        status portmap | grep -q start/running
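
The control flow of this two-line pattern can be tried outside upstart with stand-in functions. In the sketch below, `start` and `status` are shell stubs that only imitate the real commands, `JOB_STATE` and `ensure_started` are hypothetical names, and the stub assumes that `start` does not return until the job is running.

```shell
#!/bin/sh
# Stand-ins for upstart's `start` and `status`, so the control flow can be
# run anywhere. JOB_STATE is a hypothetical variable for this demo only.
JOB_STATE='stop/waiting'

start() {
    # Like the real command, fail when the job is already started.
    # The message text is illustrative.
    case "$JOB_STATE" in
        start/*) echo "start: $1 is already started" >&2; return 1 ;;
    esac
    JOB_STATE='start/running'
}

status() {
    echo "$1 $JOB_STATE"
}

ensure_started() {
    # Ignore the "already started" failure, then verify the end state.
    start "$1" || true
    status "$1" | grep -q start/running
}

ensure_started portmap && echo "first call: ok"
ensure_started portmap && echo "second call: ok"
```

The point of the ordering is that the decision no longer depends on a status snapshot taken before `start`: the start is always attempted, its failure is tolerated, and only the final state is checked.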

Changed in nfs-utils (Ubuntu):
assignee: nobody → Steve Langasek (vorlon)
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:1.2.0-2ubuntu9

---------------
nfs-utils (1:1.2.0-2ubuntu9) lucid; urgency=low

  * debian/nfs-common.statd.upstart: check for a started portmap in a
    non-racy manner. LP: #484209.
 -- Steve Langasek <email address hidden> Tue, 17 Nov 2009 11:27:37 -0600

Changed in nfs-utils (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
C Snover (launchpad-net-zetafleet) wrote :

This bug exists in karmic, unfixed; is there a reason that nfs-utils 1.2.0-2ubuntu9 was packaged and released for lucid only?

Revision history for this message
Lennart Karssen (l.c.karssen) wrote :

I second C Snover's remark. I was bitten by this bug in Karmic. The proposed fix works for me.

Revision history for this message
K-B (debinix) wrote :

I have tried to confirm this fix in karmic (bug #504776) in my setup,
but I have not been able to verify that it works.

Steve Langasek (vorlon)
Changed in nfs-utils (Ubuntu Karmic):
status: New → Triaged
Revision history for this message
bamyasi (iadzhubey) wrote :

Yes, I can confirm this race condition exists in Karmic; I experience it every time I shut down our file server. To add insult to injury, the race also causes random crashes in the reiserfs kernel module here (NULL pointer, kernel BUG), forcing a full filesystem scan on the next boot. That takes about 4 hours on my 6 TB RAID volume, which is pretty annoying. I will try the suggested fix and report back.

Revision history for this message
Steve Langasek (vorlon) wrote :

Any kernel panics you're seeing in the reiserfs kernel module are bugs in that driver and have nothing to do with this bug report.

Revision history for this message
Mikkel Christensen (mbc-baekhoej) wrote :

I can confirm that this issue exists in Karmic! I applied Malcolm's fix to /etc/init/statd.conf, and I was able to start statd with:

sudo start statd

Revision history for this message
Mikkel Christensen (mbc-baekhoej) wrote :

Sorry, my above comment got posted before I was done writing.

Before applying Malcolm's patch, I got:

start: Job failed to start

when I tried to start statd.

This problem was preventing me from using NFS shares mounted by autofs. Needless to say, it was a critical problem for me that users could not access their files! I really think that this fix needs to be applied to Karmic most urgently. I can't be the only one affected.

If you try to reproduce this bug, please keep in mind that it doesn't fail every time. Being a race condition, it succeeds sometimes and fails sometimes.

And finally: Thank you THANK YOU to Malcolm for providing the explanation as well as a solution. I had been tearing my hair out over this one!

It took me hours of work to track down this solution. I will post some of the symptoms I had of the NFS failures here so that other googlers may find this solution more easily:
/var/log/syslog:
automount[1812]: attempting to mount entry /mnt/fileserver/home
automount[5556]: >> mount.nfs: rpc.statd is not running but is required for remote locking.
init: statd pre-start process (6242) terminated with status 1

Revision history for this message
Travis Tabbal (travis-tabbal) wrote :

I am also seeing this in Karmic, and I'm also running under Xen here. Please update the script. It took me quite some time to find this bug report via Google.

Revision history for this message
Atti (atti84it) wrote :

Same problem with Lucid, I actually solved it by adding
"exec rpc.statd -L $STATDOPTS"
before
"status portmap | grep -q start/ || start portmap"

my statd.conf file actually is:

[...]
        [ "x$NEED_STATD" != xno ] || { stop; exit 0; }

        start portmap || true
        exec rpc.statd -L $STATDOPTS
        status portmap | grep -q start/ || start portmap
        exec sm-notify
end script
[...]

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 484209] Re: /etc/init/statd.conf: race with portmap startup

On Thu, May 20, 2010 at 07:00:20PM -0000, Atti wrote:
> Same problem with Lucid, I actually solved it by adding
> "exec rpc.statd -L $STATDOPTS"
> before
> "status portmap | grep -q start/ || start portmap"

Utterly broken. If you do that, 'status statd' will correctly report that
the job is *not* running.

And the problem you're seeing isn't the same as this bug, either. You're
probably seeing bug #525154.
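
One effect of placing `exec` in the middle of a script can be shown independently of upstart: `exec` replaces the running shell, so nothing after it is executed. The sketch below demonstrates only that shell behaviour. It is not a statement about how upstart tracks the statd process, and it may not be the whole reason the quoted script is broken.

```shell
#!/bin/sh
# `exec` replaces the running shell, so the second command is never reached.
out=$(sh -c 'exec echo "replaced by exec"; echo "never printed"')
echo "$out"
```

Applied to the quoted script, this means the lines following `exec rpc.statd -L $STATDOPTS` (the `status portmap` check and `exec sm-notify`) would never run.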

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden>                              <email address hidden>

Revision history for this message
Ricardo Pérez López (ricardo) wrote :

I have the same problem using Ubuntu Server 10.04.1 LTS, fully updated. Sometimes statd starts during system boot and sometimes it doesn't. I resolved it by replacing the "start on ..." line with:

   start on runlevel [2]

Revision history for this message
Ricardo Pérez López (ricardo) wrote :

Needless to say, the bug fix doesn't work for me...

Why does /etc/init/statd.conf say:

   start on (started portmap or mounting TYPE=nfs)

(requiring portmap to be started before starting the statd job) and then try to start portmap manually in the pre-start script? I can't see the point. I am probably wrong, but I think the pre-start script is executed AFTER the "start on" event has fired, right?

Revision history for this message
Bruno Rocha Coutinho (bruno-r-coutinho) wrote :

I also have a workstation where the fix didn't work.
I'm not getting errors on "start portmap" but on "status portmap | grep -q start/running",
because status portmap is returning "start/spawned".

Putting "sleep 1" after "start portmap || true" worked,
probably because it gave portmap time to start up before
"status portmap | grep -q start/running" tests whether portmap is running.

A real solution is probably a loop that waits for portmap to start up, like:

start portmap || true
while status portmap | grep -v -q start/running
do
        # if it isn't running because an error occurred, give up
        ...
        sleep 1
done
exec sm-notify
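
A bounded version of such a wait loop might look like the sketch below. It is a workaround sketch only: `wait_for_running` and its parameters (job name, maximum number of tries, poll interval in seconds) are hypothetical, and it does not implement the elided error check above. It simply gives up after a fixed number of polls instead of looping forever.

```shell
#!/bin/sh
# Hypothetical helper: poll `status JOB` until it reports start/running,
# giving up after a fixed number of tries so boot cannot hang forever.
# Usage: wait_for_running JOB [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_running() {
    job=$1
    max_tries=${2:-10}
    interval=${3:-1}
    tries=0
    until status "$job" | grep -q start/running; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "$job did not reach start/running" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

In a pre-start script this could be called as `wait_for_running portmap` right after `start portmap || true`; a non-zero return would then fail the pre-start step rather than block it indefinitely.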

Revision history for this message
Bruno Rocha Coutinho (bruno-r-coutinho) wrote :

On the problematic workstation I'm using Ubuntu 10.04.1 LTS, fully updated.

Revision history for this message
Steve Langasek (vorlon) wrote :

> I also have a workstation where the fix didn't work.
> I'm not getting errors on "start portmap" but on "status portmap | grep -q start/running",
> because status portmap is returning "start/spawned".

That would indicate a separate bug in upstart itself. The upstart author has repeatedly assured us that 'start $foo' is not supposed to return until service $foo is running. If you are seeing this is not the case, please file a new bug report against upstart.

Revision history for this message
Bruno Rocha Coutinho (bruno-r-coutinho) wrote :

I couldn't reproduce what I said on comment #15.
Maybe it was using an old version of upstart.

Now I'm getting errors on sm-notify.
I'm probably seeing bug #690401 or #525154.

Revision history for this message
Rolf Leggewie (r0lf) wrote :

karmic has seen the end of its life and is no longer receiving any updates. Marking the karmic task for this ticket as "Won't Fix".

Changed in nfs-utils (Ubuntu Karmic):
status: Triaged → Won't Fix