error : qemudWritePidFile:498 : Failed to open pid file '/var/run/libvirtd.pid' : File exists

Bug #510658 reported by Arnd
This bug affects 2 people
Affects            Status         Importance   Assigned to       Milestone
libvirt (Ubuntu)   Fix Released   Medium       Dustin Kirkland
Lucid              Fix Released   Medium       Dustin Kirkland

Bug Description

Libvirt is not robust against failures.

For some reason libvirt segfaulted on me (separate bug):
[12031.895269] libvirtd[1374]: segfault at 10 ip 00007f68357a4224 sp 00007fff49a8aed0 error 4 in libpthread-2.11.1.so[7f683579b000+18000]

Upstart tried to restart it but failed:
Jan 21 14:40:05 rhea init: libvirt-bin main process (1374) killed by SEGV signal
Jan 21 14:40:05 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4886) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4891) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4896) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4901) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4906) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4911) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4916) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4921) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4926) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin main process ended, respawning
Jan 21 14:40:06 rhea init: libvirt-bin main process (4931) terminated with status 1
Jan 21 14:40:06 rhea init: libvirt-bin respawning too fast, stopped

Only after I tried to start libvirtd manually did I find out what the problem was:
root@rhea:/var/log# sudo libvirtd
14:51:53.380: error : qemudWritePidFile:498 : Failed to open pid file '/var/run/libvirtd.pid' : File exists

root@rhea:/var/log# cat /var/run/libvirtd.pid
1374

Obviously libvirtd should have removed the stale pidfile by itself on the subsequent restart, since the process was no longer running.
Furthermore, a warning that the stale pidfile was removed should have been written to some logfile.

Best regards,
Arnd

ProblemType: Bug
Architecture: amd64
Date: Thu Jan 21 14:48:15 2010
DistroRelease: Ubuntu 10.04
Package: libvirt-bin 0.7.2-4ubuntu6
ProcEnviron:
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-11.15-generic
SourcePackage: libvirt
Uname: Linux 2.6.32-11-generic x86_64

Chuck Short (zulcss) wrote :

It looks like the pid file is not being removed in the upstart script after libvirtd crashes. Can you try removing the /var/run/libvirtd.pid file?

chuck

Changed in libvirt (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Arnd (arnd-arndnet) wrote :

@chuck
Well, that was obviously the point I tried to explain.
libvirt should have removed the pid file by itself.

Of course it works if I remove the file manually.

Dustin Kirkland  (kirkland) wrote :

This is likely a problem with Libvirt's new upstart script.

Jamie Strandboge (jdstrand) wrote :

This can happen if libvirtd is stopped unexpectedly (crash, kill -9). I doubt it is the upstart job (though I haven't looked at it).

Changed in libvirt (Ubuntu):
status: Confirmed → Triaged
assignee: nobody → Dustin Kirkland (kirkland)
Dustin Kirkland  (kirkland) wrote :

Ah, well, I'll unassign myself this bug, then. If it turns out to be a problem with the upstart script, let me know and I'll try to fix it there.

Changed in libvirt (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody
Arnd (arnd-arndnet) wrote :

Just wanted to drop a line that the problem is still present in libvirt-bin 0.7.5-5ubuntu4.

How to reproduce:

Simulate a crash of libvirtd:
sudo killall -9 libvirtd

Start libvirtd manually:
10:08:04.344: error : qemudWritePidFile:494 : Failed to open pid file '/var/run/libvirtd.pid' : File exists

Verify that the error is spurious, because libvirtd is in fact not running:
ps $(cat /var/run/libvirtd.pid)

To make it work again:
sudo rm /var/run/libvirtd.pid
sudo start libvirt-bin

Arnd (arnd-arndnet) wrote :

After thinking about this for some time, I'm no longer convinced that libvirtd itself should handle this.

If libvirtd starts and finds a stale pid file, it could check whether the process with the pid in that file is still running. But even if libvirtd crashed, another process could have been assigned exactly the same pid in the meantime.
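
The check described above, and its weakness, can be sketched in shell. This is only an illustration: `pid_alive` and `pidfile_is_stale` are hypothetical helper names, not part of libvirt or upstart, and the sketch assumes it runs as root (as an upstart job would), since `kill -0` also fails when the caller is not allowed to signal the process.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of libvirt or upstart.

# pid_alive PID: succeed if a process with this pid exists.
# kill -0 sends no signal; it only performs the existence/permission check.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# pidfile_is_stale FILE: succeed if FILE exists but names no live process.
# Caveat: if the pid has been reused by an unrelated process, the file
# looks "live" even though the daemon that wrote it is gone.
pidfile_is_stale() {
    [ -f "$1" ] || return 1          # no pid file at all: nothing is stale
    pid=$(cat "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;     # empty or non-numeric: treat as stale
    esac
    ! pid_alive "$pid"
}
```

A caller could then write `pidfile_is_stale /var/run/libvirtd.pid && rm -f /var/run/libvirtd.pid`, but because of pid reuse this can still leave a stale file in place.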

Fortunately, upstart _knows_ that libvirtd just crashed and is no longer running, and can therefore safely remove the pidfile (example below). What do you think about this?

E.g.:

  description "libvirt daemon"
  author "Dustin Kirkland <email address hidden>"

  start on runlevel [2345]
  stop on runlevel [!2345]

  expect daemon
  respawn

  pre-start script
      mkdir -p /var/run/libvirt
      if [ -e /var/run/libvirtd.pid ]; then
          rm /var/run/libvirtd.pid
      fi
  end script

  # if you used to set $libvirtd_opts in /etc/default/libvirt-bin, you can change the
  # 'exec' line here instead
  exec /usr/sbin/libvirtd -d

IAN DELANEY (johneed) wrote :

I would like to add something, though I realise I'm a regular user, not a technical advocate like Arnd and Dustin Kirkland.

On my system,

idella@karmic:~$ sudo /etc/init.d/libvirt-bin restart
 * Restarting libvirt management daemon /usr/sbin/libvirtd [fail]
idella@karmic:~$ sudo libvirtd
15:42:25.008: error : qemudWritePidFile:457 : Failed to open pid file '/var/run/libvirtd.pid' : File exists
idella@karmic:~$ ls /var/run/
                ....................
lktapctrl.pid libvirtd.pid udev-configure-printer
           ...........................
idella@karmic:~$ sudo rm /var/run/libvirtd.pid
idella@karmic:~$ sudo /etc/init.d/libvirt-bin restart
 * Restarting libvirt management daemon /usr/sbin/libvirtd [ OK ]

In an earlier search I found the same problem reported for another distro (Arch Linux, I think), so it is not new with Ubuntu.

Changed in libvirt (Ubuntu):
assignee: nobody → Dustin Kirkland (kirkland)
Changed in libvirt (Ubuntu Lucid):
status: Triaged → In Progress
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.7.5-5ubuntu16

---------------
libvirt (0.7.5-5ubuntu16) lucid; urgency=low

  * debian/libvirt-bin.upstart: libvirt has a nasty habit of leaving
    its pidfile lying around when/if it crashes; add a pre-start
    check that removes the pidfile if it exists but the daemon is
    not actually running, LP: #510658
 -- Dustin Kirkland <email address hidden> Thu, 01 Apr 2010 19:47:04 -0500

Changed in libvirt (Ubuntu Lucid):
status: Fix Committed → Fix Released
Loïc Minier (lool) wrote :

a) shouldn't libvirtd handle this by itself as the other daemons do? I realize upstart is in the proper position to know that the service crashed, but upstart doesn't care about the pid file

b) perhaps this:
[ -f /var/run/libvirtd.pid ] && rm -f /var/run/libvirtd.pid
could simply be:
rm -f /var/run/libvirtd.pid

c) pidof /usr/sbin/libvirtd seems fragile to me; upstart must really have a good reason if it starts the libvirt-bin job; either this tests for an upstart bug, or this might match a locally started libvirtd; in either case, I don't think this test is needed.

Dustin Kirkland  (kirkland) wrote :

Hi Loic-

 a) If it was kill -9'd, I don't think the daemon can trap and handle that.

 b) Agreed. I'll change this in the next upload.

 c) I'm up for other ideas here, but I think pidof() is about the best we can do at this point. I don't think this is about working around upstart bugs as much as it's about cleaning up libvirt's pid file. Agreed, libvirtd should be better about that (and better about not crashing). But the fact is, it does die, it does leave its pidfile around, this does prevent libvirt from starting subsequently, and this does annoy Ubuntu users.

You're welcome to upload another one on top of my next upload that cleans this up a little, if you have better solution(s).

As usual, thanks for the feedback, Loic!

Changed in libvirt (Ubuntu Lucid):
status: Fix Released → In Progress
Jamie Strandboge (jdstrand) wrote :

Dustin's right on 'a'. Can't catch -9 and if libvirtd dies for some reason, its pid file is left around.

However, Loic is right about 'c' because of qemu:///session (I forgot about this when we discussed this initially). Since the upstart job is only for qemu:///system, and only the libvirtd for qemu:///system writes a pid file to /var/run, and upstart is smart about not starting things it's already started (is this true of 'pre-start'?), we can probably just forget the 'pidof' check and simply remove the pid file unconditionally. A comment in the job might be in order though, since the resulting snippet looks really weird:

pre-start script
        mkdir -p /var/run/libvirt
        rm -f /var/run/libvirtd.pid
        ...
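
With the explanatory comment suggested above, the stanza might read roughly as follows. The comment wording is only a suggestion, not taken from the uploaded package, and any further pre-start steps are left out here.

```
pre-start script
        mkdir -p /var/run/libvirt
        # libvirtd leaves its pid file behind when it crashes or is killed,
        # and then refuses to start. Only the qemu:///system libvirtd,
        # which this job starts, writes this file, so remove it
        # unconditionally.
        rm -f /var/run/libvirtd.pid
end script
```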

Dustin Kirkland  (kirkland) wrote :

Cool, thanks, Jamie. I'm uploading that now ...

Changed in libvirt (Ubuntu Lucid):
status: In Progress → Fix Committed
Dustin Kirkland  (kirkland) wrote :

Uploaded libvirt_0.7.5-5ubuntu21_source.changes to the queue.

Loïc Minier (lool) wrote :

On a): what I meant here is not that libvirt should catch all cases of itself disappearing, but rather that the other daemons don't have an rm on the .pid file. I didn't check what the other daemons do; I guess they check for a running process with that pid and whether it's the same program before aborting, and overwrite the pid file otherwise.

It just strikes me that we don't have rm /var/run/<program>.pid in the other upstart jobs, and it's something which should be dealt with by the upstream logic, isn't it?

Of course I'm fine with a simple rm for the lucid release, as I understand there's little time to develop upstream fixes at this point!

Loïc Minier (lool) wrote :

So I checked what two other popular daemons were doing.

If started manually, bind9 just ignores the contents of the .pid file and starts again: if the .pid file is present and mentions a pid which was kill-9-ed, it starts and writes its pid in it instead; if the .pid file is present and mentions a pid which is actually running named (i.e. if you launch it twice), it just starts a second instance and overwrites the .pid file with the new pid.

To test, just install bind9 and run "/usr/sbin/named -u bind" as root.

However in Debian/Ubuntu, we wrap the startup with a sysvinit script (not an upstart script) which launches bind via start-stop-daemon with a pidfile argument -- that probably does the right thing.

If started manually, apache2 will check the contents of a stale .pid file and proceed if the process is gone and abort if it's an apache2 process. To check, install apache2 and run "APACHE_PID_FILE=/var/run/apache2.pid APACHE_RUN_USER=www-data APACHE_RUN_GROUP=www-data /usr/sbin/apache2 -k start" as root.

So if one relaunches apache2 after its death, it will just start and fixup stuff, but if it's launched twice it will bark and fail:
httpd (pid 23200) already running

I would personally expect this behavior from modern daemons.

But again, just like we work around the lack of cleverness in bind9 (by using start-stop-daemon), I'm fine with us working around the lack of cleverness of qemu-kvm by rm-ing the .pid file. :-)
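
The apache2-style behaviour described above (proceed if the recorded process is gone, abort if it is the same program) can be sketched roughly as follows. This is an illustration only, not what apache2 or libvirtd actually implement: `check_pidfile` and its three output words are invented here, and the sketch assumes a Linux `/proc` that provides `/proc/PID/comm`.

```shell
#!/bin/sh
# check_pidfile FILE NAME  (hypothetical helper; assumes Linux /proc)
# Prints one of:
#   start    - no pid file, safe to start
#   replace  - pid file is stale (process gone, or a different program)
#   running  - the pid belongs to a live process called NAME; do not start
check_pidfile() {
    file=$1
    name=$2
    if [ ! -f "$file" ]; then
        echo start
        return
    fi
    pid=$(cat "$file" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) echo replace; return ;;   # unusable contents
    esac
    # /proc/PID/comm holds the command name (truncated to 15 characters).
    # Comparing it guards against the pid having been reused by another program.
    if [ -r "/proc/$pid/comm" ] &&
       [ "$(cat "/proc/$pid/comm" 2>/dev/null)" = "$name" ]; then
        echo running
    else
        echo replace
    fi
}
```

For the case in this bug the call would be `check_pidfile /var/run/libvirtd.pid libvirtd`; a result of `replace` corresponds to the situation where removing or overwriting the pid file is safe.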

Dustin Kirkland (kirkland) wrote :

Thanks for the research, Loic.

The logic is perfectly sound, to me.

I wholeheartedly agree that the upstream daemon (or whatever creates
the bloody pid file) should clean up the damn thing. But as you said,
for Lucid, we probably shouldn't go mucking about in the daemon's
internals.

What we could do is replicate that logic in the upstart job.

if [ -f /var/run/libvirtd.pid ]; then
  if ps $(cat /var/run/libvirtd.pid) | grep -qs /usr/sbin/libvirtd; then
    echo "libvirtd is already running, process " $(cat /var/run/libvirtd.pid)
  else
    echo "libvirtd is not running, but its pidfile exists...removing that now..."
    rm -f /var/run/libvirtd.pid
  fi
fi

Loic- would something like this in the upstart job for Lucid make you happier?

Loïc Minier (lool) wrote :

For lucid, I'm happy enough with the "rm" before starting the process -- upstart should already track life and death, so we don't need to do the ps tricks IMO.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.7.5-5ubuntu20

---------------
libvirt (0.7.5-5ubuntu20) lucid; urgency=low

  * debian/libvirt-bin.upstart:
    - remove unnecessary pid file existence test, LP: #510658
    - revert virbr0 up/down hack added in 0.7.5-5ubuntu17, LP: #345485
 -- Dustin Kirkland <email address hidden> Wed, 07 Apr 2010 15:39:39 -0500

Changed in libvirt (Ubuntu Lucid):
status: Fix Committed → Fix Released
Loïc Minier (lool) wrote :

(Not reopening as we have a workaround for Ubuntu (rm of the .pid file) and this is an upstream bug; please reopen if you want to track this in Ubuntu)
