Since charm revision 13, the check_ipmi_sensors script can't write to /var/lib/nagios

Bug #1906991 reported by Nikolay Vinogradov
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
NRPE Charm         Fix Released   Critical     Joe Guo
hw-health-charm    Fix Released   Critical     Joe Guo

Bug Description

Hi,

I've been running hw-health charm rev. 13 on an HP server and am getting alerts from the NRPE 'ipmi' service (see the attached screenshot).

# /etc/cron.d# cat hwhealth_ipmi
0,5,10,15,20,25,30,35,40,45,50,55 * * * * nagios /usr/local/lib/nagios/plugins/cron_ipmi_sensors.py

The script tries to create a temporary file in /var/lib/nagios and then rename it to its final name, but creating the file fails with "Permission denied":

# sudo -u nagios /usr/local/lib/nagios/plugins/cron_ipmi_sensors.py
Cannot write output file /var/lib/nagios/ipmi_sensors.out.tmp, error [Errno 13] Permission denied: '/var/lib/nagios/ipmi_sensors.out.tmp'
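The write-then-rename pattern described above can be sketched as follows. This is a minimal illustration, not the charm's actual code: the name `write_output_file` echoes the traceback later in this report, but its body and signature here are assumptions. The point it shows is that both steps, creating the `.tmp` file and renaming it over the final file, modify the directory, so the user running the script needs write permission on /var/lib/nagios itself.

```python
import os
import tempfile


def write_output_file(output: str, output_file: str) -> None:
    """Hypothetical sketch of a write-then-rename helper.

    Both steps modify the directory containing output_file, so the
    calling user needs write and search (x) permission on that directory.
    """
    tmp_file = output_file + ".tmp"
    # Step 1: create the temporary file; for a non-root user this raises
    # PermissionError (EACCES) if the directory is not writable.
    with open(tmp_file, "w") as f:
        f.write(output)
    # Step 2: move it into place. On POSIX, os.rename replaces an existing
    # destination atomically (same filesystem), so readers never see a
    # half-written file.
    os.rename(tmp_file, output_file)


if __name__ == "__main__":
    # Demo in a throwaway directory rather than /var/lib/nagios.
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "ipmi_sensors.out")
        write_output_file("demo output\n", target)
        with open(target) as f:
            print(f.read(), end="")
```

Writing to a temporary name first means the NRPE check never reads a partially written file, which is presumably why the script does not write the output file directly.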

Indeed, the directory is writable only by 'root', not by the 'nagios' user:

# ls -la /var/lib/ | grep nagios
drwxr-xr-x 3 root root 4096 Dec 6 17:28 nagios
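With mode 755 and owner root:root, the nagios user falls into the "other" class, which has only r-x. A simplified model of the classic POSIX rule for creating or renaming entries in a directory is sketched below; it ignores ACLs, the sticky bit, root's override and security modules, and the UID/GID values 110 and 118 are made up for illustration.

```python
import stat
from typing import Collection


def user_can_modify_dir(mode: int, dir_uid: int, dir_gid: int,
                        uid: int, gids: Collection[int]) -> bool:
    """Simplified check: may a non-root user create/rename entries here?

    Exactly one permission class applies (owner, else group, else other),
    and creating or renaming a directory entry needs both write and
    search (x) permission in that class.
    Ignores ACLs, the sticky bit, root's override, and security modules.
    """
    if uid == dir_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (mode & needed) == needed


if __name__ == "__main__":
    NAGIOS_UID, NAGIOS_GID = 110, 118  # made-up IDs for illustration
    # drwxr-xr-x root root, as in the listing above: nagios is "other"
    print(user_can_modify_dir(stat.S_IFDIR | 0o755, 0, 0,
                              NAGIOS_UID, {NAGIOS_GID}))
    # a directory owned by nagios with rwx for the owner
    print(user_can_modify_dir(stat.S_IFDIR | 0o755, NAGIOS_UID, NAGIOS_GID,
                              NAGIOS_UID, {NAGIOS_GID}))
```

The first call prints `False` and the second `True`: the same 755 mode is enough once nagios owns the directory.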

However, since commit [1] the check no longer runs as 'root'. I'm not sure whether that is the only change that caused this or whether something in other charms contributed, but I'm filing the bug against the hw-health charm because the failing check comes from it.

I'm not seeing this on another deployment running hw-health charm rev. 12, where the cron.d job runs as 'root':

$ cat /etc/cron.d/hwhealth_ipmi
4,9,14,19,24,29,34,39,44,49,54,59 * * * * root /usr/local/lib/nagios/plugins/cron_ipmi_sensors.py -xT entity_presence
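The two cron.d entries differ in the sixth field, which in /etc/cron.d files names the user the command runs as (per-user crontabs have no such field). A small sketch for pulling that field out of a job line; `cron_d_user` is a hypothetical helper, and it handles only job lines, not environment assignments such as `SHELL=/bin/sh`.

```python
def cron_d_user(line: str) -> str:
    """Return the user field of an /etc/cron.d job line (hypothetical helper).

    Format: minute hour day-of-month month day-of-week user command...
    An @-keyword such as "@reboot" replaces the five time fields.
    """
    fields = line.split()
    # The user follows the schedule: after 1 token for @-keywords, else 5.
    idx = 1 if fields and fields[0].startswith("@") else 5
    if len(fields) <= idx + 1:  # need at least a user and a command
        raise ValueError("not a cron.d job line: {!r}".format(line))
    return fields[idx]


if __name__ == "__main__":
    rev13 = ("0,5,10,15,20,25,30,35,40,45,50,55 * * * * nagios "
             "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py")
    rev12 = ("4,9,14,19,24,29,34,39,44,49,54,59 * * * * root "
             "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py -xT entity_presence")
    print(cron_d_user(rev13), cron_d_user(rev12))
```

Applied to the two entries from this report, it yields `nagios` for rev. 13 and `root` for rev. 12.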

[1] https://git.launchpad.net/charm-hw-health/commit/?id=ee9ba763

Nikolay Vinogradov (nikolay.vinogradov) wrote :
Changed in charm-hw-health:
status: New → Confirmed
Changed in charm-hw-health:
importance: Undecided → Critical
Joe Guo (guoqiao)
Changed in charm-hw-health:
assignee: nobody → Joe Guo (guoqiao)
status: Confirmed → In Progress
milestone: none → 20.05
Xav Paice (xavpaice) wrote :

I suspect this is related to LP:#1866382 (though it is not a duplicate). We should consider having the NRPE charm ensure that /var/lib/nagios (the nagios user's home directory) has this ownership and mode:

drwxr-sr-x 3 nagios nagios 4096 Feb 3 02:04 /var/lib/nagios

Note the setgid bit: it ensures that files dropped into that directory by cron jobs running as root get the nagios group, so they remain readable by the nagios user even if the machine has been hardened to remove the r-x permission on the directory (see LP:#1904045).
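A sketch of what "ensure /var/lib/nagios is nagios:nagios with mode drwxr-sr-x" could look like in Python. This is an illustration under assumptions, not the NRPE charm's actual fix: `ensure_nagios_home` and `NAGIOS_HOME_MODE` are hypothetical names, and the function must run as root on a machine where the nagios user and group exist.

```python
import grp
import os
import pwd
import stat

# drwxr-sr-x: rwx for owner, r-x plus setgid for group, r-x for others
NAGIOS_HOME_MODE = stat.S_ISGID | 0o755


def ensure_nagios_home(path: str = "/var/lib/nagios",
                       user: str = "nagios", group: str = "nagios") -> None:
    """Hypothetical helper: make path owned by user:group with mode 2755.

    Must run as root. On Linux, files created inside a setgid directory
    get the directory's group instead of the creating process's primary
    group, so root-run cron jobs leave files with group nagios.
    """
    uid = pwd.getpwnam(user).pw_uid
    gid = grp.getgrnam(group).gr_gid
    os.makedirs(path, exist_ok=True)
    os.chown(path, uid, gid)
    # Set the mode after chown so the final mode is exactly 2755.
    os.chmod(path, NAGIOS_HOME_MODE)


if __name__ == "__main__":
    # Show the mode string this produces, without touching the filesystem.
    print(stat.filemode(stat.S_IFDIR | NAGIOS_HOME_MODE))  # drwxr-sr-x
```

The `stat.filemode` call confirms that mode 2755 renders as the `drwxr-sr-x` shown in the listing above.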

Joe Guo (guoqiao)
Changed in charm-nrpe:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Joe Guo (guoqiao)
milestone: none → 21.04
Joe Guo (guoqiao)
Changed in charm-nrpe:
status: In Progress → Fix Committed
Garrett Neugent (thogarre) wrote :

Moving this bug back to New, as it is still occurring. While adding hw-health to a new cloud this week, I observed the same symptoms: /var/lib/nagios is owned by root, and the script fails with permission-denied errors.

Running the cron job's script as nagios:

$ sudo -u nagios /usr/local/lib/nagios/plugins/cron_ipmi_sensors.py

yields:

Cannot write output file /var/lib/nagios/ipmi_sensors.out.tmp, error [Errno 13] Permission denied: '/var/lib/nagios/ipmi_sensors.out.tmp'

After creating the file and granting permissions on it, re-running the script shows this error:

$ sudo -u nagios /usr/local/lib/nagios/plugins/cron_ipmi_sensors.py
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py", line 56, in gather_metrics
    write_output_file(output)
  File "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py", line 27, in write_output_file
    os.rename(TMP_OUTPUT_FILE, OUTPUT_FILE)
PermissionError: [Errno 13] Permission denied: '/var/lib/nagios/ipmi_sensors.out.tmp' -> '/var/lib/nagios/ipmi_sensors.out'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py", line 68, in <module>
    gather_metrics()
  File "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py", line 61, in gather_metrics
    write_output_file("UNKNOWN: {}".format(error))
  File "/usr/local/lib/nagios/plugins/cron_ipmi_sensors.py", line 27, in write_output_file
    os.rename(TMP_OUTPUT_FILE, OUTPUT_FILE)

The directory itself also needs write permission, so the complete workaround is:

$ sudo touch /var/lib/nagios/ipmi_sensors.out
$ sudo touch /var/lib/nagios/ipmi_sensors.out.tmp
$ sudo chmod 777 /var/lib/nagios/ipmi_sensors*
$ sudo chmod 777 /var/lib/nagios
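The traceback above shows a second failure mode: after the rename fails, the script's exception handler calls `write_output_file` again to record an "UNKNOWN: ..." message, which hits the same permission error, so a chained traceback escapes. A minimal sketch of a more defensive pattern follows, assuming the collector and the writer can be passed in as callables; `report`, `collect` and `write` are hypothetical names, not taken from the charm. This does not fix the permissions; it only turns the failure into a single readable line in the cron output.

```python
import sys
from typing import Callable


def report(collect: Callable[[], str], write: Callable[[str], None]) -> int:
    """Hypothetical sketch of defensive error handling for a cron collector.

    collect() produces the check output and write(text) persists it.
    If either step fails, try once to record an UNKNOWN status; if that
    write also fails (for example, the directory is not writable), log to
    stderr instead of letting a chained traceback escape. Returns the
    process exit status for cron: 0 on success, 1 on failure.
    """
    try:
        write(collect())
        return 0
    except Exception as error:  # broad on purpose: any failure becomes UNKNOWN
        try:
            write("UNKNOWN: {}".format(error))
        except OSError as write_error:
            print("Cannot write output file: {}".format(write_error),
                  file=sys.stderr)
        return 1


if __name__ == "__main__":
    def denied(text: str) -> None:
        # Simulates the failure from this report without touching /var/lib/nagios.
        raise PermissionError(13, "Permission denied",
                              "/var/lib/nagios/ipmi_sensors.out.tmp")

    print("exit status:", report(lambda: "OK", denied))
```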

Changed in charm-hw-health:
status: In Progress → New
Garrett Neugent (thogarre) wrote :

I misunderstood: the fix has been committed but not yet released (which is why I'm still seeing the issue). I've moved the status back to Fix Committed.

Changed in charm-hw-health:
status: New → Fix Committed
Celia Wang (ziyiwang)
Changed in charm-hw-health:
milestone: 20.05 → 21.04
Celia Wang (ziyiwang)
Changed in charm-nrpe:
status: Fix Committed → Fix Released
Changed in charm-hw-health:
status: Fix Committed → Fix Released