nagios2 init script has wrong PID file.

Bug #174466 reported by David Ramsden
Affects: nagios2 (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: nagios2-common

The init script for nagios2 (from nagios2-common) references the wrong PID file. This will only be an issue if the PID file can't be obtained from the nagios.cfg file.

Line 118 of /etc/init.d/nagios2:
[ -n "$THEPIDFILE" ] || THEPIDFILE='/var/run/nagios2/nagios.pid'

This should read:
[ -n "$THEPIDFILE" ] || THEPIDFILE='/var/run/nagios2/nagios2.pid'

(note: nagios2.pid and not nagios.pid)
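
For illustration, a minimal sketch of the lookup-with-fallback pattern described above. It assumes the PID path is stored in nagios.cfg as a "lock_file=" line; get_config_value and resolve_pidfile are hypothetical names and are not taken from the packaged init script.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a "key=value" line in a config file.
# Prints nothing if the file is unreadable or the key is absent.
get_config_value() {
    key=$1
    file=$2
    [ -r "$file" ] || return 0
    # Keep only the last matching line and strip the "key=" prefix.
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Resolve the PID file: prefer the value from nagios.cfg (assumed to be the
# lock_file directive), otherwise fall back to the corrected default path.
resolve_pidfile() {
    cfg=$1
    pidfile=$(get_config_value lock_file "$cfg")
    [ -n "$pidfile" ] || pidfile='/var/run/nagios2/nagios2.pid'
    printf '%s\n' "$pidfile"
}
```

Calling resolve_pidfile /etc/nagios2/nagios.cfg would print the configured path when the directive is present and the fallback otherwise, so a wrong fallback only matters when the lookup fails.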

Version information:
Package: nagios2-common
Priority: optional
Section: universe/net
Installed-Size: 404
Maintainer: Ubuntu MOTU Developers <email address hidden>
Original-Maintainer: Debian Nagios Maintainer Group <email address hidden>
Architecture: all
Source: nagios2
Version: 2.9-1

This was noted in Gutsy.

Kevin Lamontagne (kevinlamontagne) wrote :

I can confirm this.
It is a problem when calling "/etc/init.d/nagios2 restart", because the script starts a second nagios2 without stopping the first one. The two instances then answer requests on the Web interface seemingly at random, one with the old configuration and one with the new.

David Ramsden (david-hexstream) wrote :

Exactly. It was after a "/etc/init.d/nagios2 restart" that I noticed we were getting two alerts emailed to us. Process list showed two nagios processes running and after some investigation I traced it back to the PID file problem in the init script.

However, I'm not sure why the restart spawns another nagios process. Surely the script should be able to read the nagios configuration file and determine the PID file location in the first instance, instead of falling back to the hard-coded "THEPIDFILE='/var/run/nagios2/nagios.pid'"? I'll try to find 5 minutes to add a "set -x" to the init script and see exactly what's going on.

David Ramsden (david-hexstream) wrote :

In fact, looking into this a bit more, I think there is a more serious bug in the init script. A reload appears to remove the PID file altogether, causing the issue that Kevin Lamontagne also reported.

root@noc:/var/run/nagios2# /etc/init.d/nagios2 start
 * Starting nagios2 monitoring daemon nagios2 [ OK ]
root@noc:/var/run/nagios2# ls -l
total 4
-rw-r--r-- 1 nagios nagios 6 2007-12-17 12:48 nagios2.pid

root@noc:/var/run/nagios2# /etc/init.d/nagios2 reload
 * Reloading nagios2 monitoring daemon configuration files nagios2 [ OK ]
root@noc:/var/run/nagios2# ls -l nagios2.pid
ls: nagios2.pid: No such file or directory

Note that a reload has removed the nagios2.pid file but the nagios2 process is still running.

If you then try another reload, the init script can't find the nagios2.pid file so assumes it's not running but a ps says otherwise:

root@noc:/var/run/nagios2# /etc/init.d/nagios2 reload
 * Reloading nagios2 monitoring daemon configuration files nagios2 * Not running.
                                                                         [fail]
root@noc:/var/run/nagios2# ls -l nagios2.pid
ls: nagios2.pid: No such file or directory
root@noc:/var/run/nagios2# ps aux | grep nagios
nagios 19207 0.1 0.1 23628 2124 ? SNsl 12:50 0:00 /usr/sbin/nagios2 -d /etc/nagios2/nagios.cfg

Now there are two scenarios. In the first, you don't check with ps, assume the init script is correct (i.e. that nagios2 isn't running when it actually is) and start nagios2, so you end up with two nagios2 processes running. In the second, you first run "/etc/init.d/nagios2 stop", which won't kill the process because there's no PID file (remember, it was removed by "/etc/init.d/nagios2 reload"). You then assume nagios2 has been stopped and start it again, and once again end up with two nagios2 processes running.

root@noc:/var/run/nagios2# ls -l nagios2.pid
ls: nagios2.pid: No such file or directory
root@noc:/var/run/nagios2# /etc/init.d/nagios2 stop
 * Stopping nagios2 monitoring daemon nagios2 [ OK ]
root@noc:/var/run/nagios2# ps aux | grep nagios2
nagios 19207 0.0 0.6 28688 8116 ? SNsl 12:50 0:00 /usr/sbin/nagios2 -d /etc/nagios2/nagios.cfg
root@noc:/var/run/nagios2# /etc/init.d/nagios2 start
 * Starting nagios2 monitoring daemon nagios2 [ OK ]
root@noc:/var/run/nagios2# ps aux | grep nagios
nagios 19207 0.0 0.6 28688 8116 ? SNsl 12:50 0:00 /usr/sbin/nagios2 -d /etc/nagios2/nagios.cfg
nagios 19787 1.0 0.1 22432 1636 ? SNsl 12:57 0:00 /usr/sbin/nagios2 -d /etc/nagios2/nagios.cfg
root@noc:/var/run/nagios2# cat nagios2.pid
19787
root@noc:/var/run/nagios2# /etc/init.d/nagios2 stop
 * Stopping nagios2 monitoring daemon nagios2 [ OK ]
root@noc:/var/run/nagios2# ps aux | grep nagios2
nagios 19207 0.0 0.6 28688 8116 ? SNsl 12:50 0:00 /usr/sbin/nagios2 -d /etc/nagios2/nagios.cfg

I hope this makes sense?
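
As a rough way to spot the state shown above, here is a small sketch that lists any PIDs the PID file does not track. untracked_pids is a hypothetical helper, not part of the init script or lsb-base, and it assumes the PID file holds a single PID.

```shell
#!/bin/sh
# Hypothetical helper: given a PID file and a list of running PIDs,
# print every PID that is NOT the one recorded in the PID file.
untracked_pids() {
    pidfile=$1
    shift
    tracked=''
    # A missing or unreadable PID file means no process is tracked at all.
    [ -r "$pidfile" ] && tracked=$(tr -d ' \n' < "$pidfile")
    for pid in "$@"; do
        [ "$pid" = "$tracked" ] || printf '%s\n' "$pid"
    done
}

# Example call (the pgrep pattern is an assumption about the process name):
#   untracked_pids /var/run/nagios2/nagios2.pid $(pgrep -x nagios2)
```

Any PID it prints is a process that "/etc/init.d/nagios2 stop" cannot find through the PID file.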

David Ramsden (david-hexstream) wrote : Re: nagios2 init script has wrong PID file and incorrectly removes PID file on a reload.

I've had a further look into this bug and it doesn't actually appear to be an issue with the nagios2 init script. I think the problem is actually in the killproc() function that's part of /lib/lsb/init-functions (lsb-base).

killproc() from /lib/lsb/init-functions (lsb-base) does not check to see what type of signal was sent. If signal 1 is sent (HUP), it removes the PID file. The PID file shouldn't be removed on this action (I may be wrong?). I'll raise another bug against init-functions for lsb-base and see what feedback I get.
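
To make the expected behaviour concrete, here is a sketch of a signal-aware helper that only removes the PID file for signals meant to stop the daemon. This is not the lsb-base code; signal_daemon and the list of terminating signals are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: send a signal to the PID recorded in a PID file.
# The PID file is removed only for signals intended to stop the daemon;
# for HUP (reload) and anything else it is left in place.
signal_daemon() {
    pidfile=$1
    sig=${2:-TERM}
    [ -r "$pidfile" ] || return 1        # no PID file: nothing to signal
    pid=$(cat "$pidfile")
    kill -s "$sig" "$pid" || return 1    # signal could not be delivered
    case "$sig" in
        TERM|KILL|INT|QUIT) rm -f "$pidfile" ;;  # daemon is being stopped
        *) : ;;                                  # e.g. HUP: keep the PID file
    esac
}
```

With this shape, a reload (HUP) leaves nagios2.pid in place, so a later stop can still find the running process.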

Anyway, I've attached a patch for the original problem where the nagios2 init script has the wrong PID file. This isn't a major problem; the patch is more for consistency.

David Ramsden (david-hexstream) wrote :

Just for completeness, I've opened a bug against lsb-base. See bug #176934.

Julius Bloch (jbloch)
Changed in nagios2:
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nagios2 - 2.11-1ubuntu1

---------------
nagios2 (2.11-1ubuntu1) hardy; urgency=low

  * debian/nagios2-common.nagios2.init
    - Fix init script pid file. (LP: #174466)
  * Update maintainers as per spec.

 -- Chuck Short <email address hidden> Mon, 07 Apr 2008 14:36:49 -0400

Changed in nagios2:
status: Confirmed → Fix Released