Merge lp:~brad-marshall/charms/trusty/glance/add-nrpe-checks into lp:~openstack-charmers-archive/charms/trusty/glance/trunk

Proposed by Brad Marshall
Status: Superseded
Proposed branch: lp:~brad-marshall/charms/trusty/glance/add-nrpe-checks
Merge into: lp:~openstack-charmers-archive/charms/trusty/glance/trunk
Diff against target: 969 lines (+853/-0)
10 files modified
charm-helpers-hooks.yaml (+1/-0)
config.yaml (+11/-0)
files/nrpe-external-master/check_exit_status.pl (+189/-0)
files/nrpe-external-master/check_status_file.py (+60/-0)
files/nrpe-external-master/check_upstart_job (+72/-0)
files/nrpe-external-master/nagios_plugin.py (+78/-0)
hooks/charmhelpers/contrib/charmsupport/nrpe.py (+222/-0)
hooks/charmhelpers/contrib/charmsupport/volumes.py (+156/-0)
hooks/glance_relations.py (+61/-0)
metadata.yaml (+3/-0)
To merge this branch: bzr merge lp:~brad-marshall/charms/trusty/glance/add-nrpe-checks
Reviewer Review Type Date Requested Status
Liam Young (community) Needs Fixing
Review via email: mp+241494@code.launchpad.net

This proposal has been superseded by a proposal from 2014-11-17.

Description of the change

Adds nrpe-external-master interface and adds basic nrpe checks.
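
As a rough illustration of what a basic nrpe check amounts to on disk, the sketch below renders the command definition that Check.write in this branch's nrpe.py writes under /etc/nagios/nrpe.d. The function name render_nrpe_command and the example plugin path are hypothetical; only the "# check ..." and "command[...]=..." layout is taken from the diff.

```python
def render_nrpe_command(shortname, check_cmd):
    """Return the text of an NRPE command definition for one check."""
    # The command name is derived from the shortname, as in Check.__init__.
    command = "check_{}".format(shortname)
    return "# check {}\ncommand[{}]={}\n".format(shortname, command, check_cmd)


if __name__ == "__main__":
    # Hypothetical plugin path, for illustration only.
    print(render_nrpe_command(
        "glance-api",
        "/usr/local/lib/nagios/plugins/check_upstart_job glance-api"))
```

The NRPE daemon reads one such file per check, which is why the helper restarts nagios-nrpe-server after writing them.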

Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
charm_lint_check #989 trusty-glance for brad-marshall mp241494
    LINT FAIL: lint-test failed

LINT Results (max last 5 lines):
  hooks/glance_relations.py:476:18: E251 unexpected spaces around keyword / parameter equals
  hooks/glance_relations.py:476:20: E251 unexpected spaces around keyword / parameter equals
  hooks/glance_relations.py:481:18: E251 unexpected spaces around keyword / parameter equals
  hooks/glance_relations.py:481:20: E251 unexpected spaces around keyword / parameter equals
  make: *** [lint] Error 1

Full lint test output: http://paste.ubuntu.com/8955744/
Build: http://10.98.191.181:8080/job/charm_lint_check/989/
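
E251 is the pep8 check for whitespace around the '=' of a keyword argument. A minimal sketch of the flagged and accepted forms follows; add_check here is a hypothetical stand-in, not the charm's code at lines 476 and 481.

```python
def add_check(shortname, description, check_cmd):
    """Stand-in for a check-registration call; returns its arguments."""
    return (shortname, description, check_cmd)


# Flagged by E251: spaces around the keyword '='.
#   add_check(shortname = "glance-api", ...)

# Accepted: no spaces around '=' when passing keyword arguments.
check = add_check(
    shortname="glance-api",
    description="process check",
    check_cmd="check_upstart_job glance-api",
)
```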

Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
charm_unit_test #824 trusty-glance for brad-marshall mp241494
    UNIT FAIL: unit-test failed

UNIT Results (max last 5 lines):
  hooks/glance_utils 91 8 91% 156, 249-261
  TOTAL 369 39 89%
  Ran 65 tests in 3.551s
  FAILED (errors=3)
  make: *** [unit_test] Error 1

Full unit test output: http://paste.ubuntu.com/8955748/
Build: http://10.98.191.181:8080/job/charm_unit_test/824/

Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
charm_amulet_test #369 trusty-glance for brad-marshall mp241494
    AMULET FAIL: amulet-test failed

AMULET Results (max last 5 lines):
  juju-test.conductor DEBUG : Calling "juju destroy-environment -y osci-sv04"
  WARNING cannot delete security group "juju-osci-sv04-0". Used by another environment?
  juju-test INFO : Results: 1 passed, 2 failed, 0 errored
  ERROR subprocess encountered error code 2
  make: *** [test] Error 2

Full amulet test output: http://paste.ubuntu.com/8955868/
Build: http://10.98.191.181:8080/job/charm_amulet_test/369/

Liam Young (gnuoy) wrote :

Thanks again for adding the nagios checks.

Please could check_upstart_job go into charm-helpers? Also, it'd be nice (but I wouldn't block on it) if glance_utils.services were used to get the list of services to add to the nagios check.

review: Needs Fixing
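
The second suggestion could look roughly like the sketch below. It assumes services() flattens the values of a restart map (config file path to list of services), which is a common pattern in the OpenStack charms but is not shown in this diff. The map contents and the nrpe_checks helper are made up for illustration.

```python
def services(restart_map):
    """Return a sorted, de-duplicated list of services in a restart map."""
    found = set()
    for svcs in restart_map.values():
        found.update(svcs)
    return sorted(found)


# Hypothetical restart map; the real one lives in glance_utils.
EXAMPLE_RESTART_MAP = {
    "/etc/glance/glance-api.conf": ["glance-api"],
    "/etc/glance/glance-registry.conf": ["glance-registry"],
    "/etc/haproxy/haproxy.cfg": ["haproxy"],
    "/etc/apache2/sites-available/openstack_https_frontend": ["apache2"],
}


def nrpe_checks(restart_map):
    """Pair each service with a check command for it."""
    return [(svc, "check_upstart_job {}".format(svc))
            for svc in services(restart_map)]
```

Deriving the list this way means a service added to the restart map would get a check without a second list having to be kept in sync.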
84. By Brad Marshall

[bradm] Fixes from pep8 run, added sysvinit daemon services monitoring, use services() to get services to monitor
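
One way the sysvinit monitoring mentioned in this revision might be structured is to pick the plugin per service according to which init file exists. The sketch below is an assumption about the approach, not the code from revision 84: the function name, the injectable exists predicate, and the decision to use check_exit_status.pl for sysvinit services are illustrative. The -e and -s flags are the ones documented in that plugin's help text in the diff.

```python
import os


def check_cmd_for(service, exists=os.path.exists):
    """Choose a check command based on how the service is managed."""
    upstart_conf = "/etc/init/{}.conf".format(service)
    sysv_script = "/etc/init.d/{}".format(service)
    if exists(upstart_conf):
        # Upstart job: query its state through the upstart plugin.
        return "check_upstart_job {}".format(service)
    if exists(sysv_script):
        # sysvinit script: rely on the exit status of "<script> status".
        return "check_exit_status.pl -e -s {}".format(sysv_script)
    # Neither init file found: no check can be built.
    return None
```

Passing exists as a parameter keeps the function testable without touching the real filesystem.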

uosci-testing-bot (uosci-testing-bot) wrote :

UOSCI bot says:
charm_lint_check #1081 trusty-glance for brad-marshall mp241494
    LINT OK: passed

LINT Results (max last 5 lines):
  I: config.yaml: option ssl_ca has no default value
  I: config.yaml: option ssl_cert has no default value
  I: config.yaml: option os-internal-network has no default value
  I: config.yaml: option os-public-network has no default value
  OK

Full lint test output: http://paste.ubuntu.com/9051422/
Build: http://10.98.191.181:8080/job/charm_lint_check/1081/

uosci-testing-bot (uosci-testing-bot) wrote :

UOSCI bot says:
charm_unit_test #915 trusty-glance for brad-marshall mp241494
    UNIT FAIL: unit-test failed

UNIT Results (max last 5 lines):
  hooks/glance_utils 91 8 91% 156, 249-261
  TOTAL 382 51 87%
  Ran 65 tests in 3.421s
  FAILED (errors=3)
  make: *** [unit_test] Error 1

Full unit test output: http://paste.ubuntu.com/9051423/
Build: http://10.98.191.181:8080/job/charm_unit_test/915/

uosci-testing-bot (uosci-testing-bot) wrote :

UOSCI bot says:
charm_amulet_test #423 trusty-glance for brad-marshall mp241494
    AMULET FAIL: amulet-test failed

AMULET Results (max last 5 lines):
  juju-test.conductor DEBUG : Calling "juju destroy-environment -y osci-sv07"
  WARNING cannot delete security group "juju-osci-sv07-0". Used by another environment?
  juju-test INFO : Results: 1 passed, 2 failed, 0 errored
  ERROR subprocess encountered error code 2
  make: *** [test] Error 2

Full amulet test output: http://paste.ubuntu.com/9051445/
Build: http://10.98.191.181:8080/job/charm_amulet_test/423/

85. By Brad Marshall

[bradm] Removed puppet header from nagios_plugin module

86. By Brad Marshall

[bradm] Removed nagios check files that were moved to nrpe-external-master charm

Unmerged revisions

Preview Diff

1=== modified file 'charm-helpers-hooks.yaml'
2--- charm-helpers-hooks.yaml 2014-10-01 22:14:32 +0000
3+++ charm-helpers-hooks.yaml 2014-11-17 02:32:28 +0000
4@@ -8,3 +8,4 @@
5 - contrib.storage.linux.ceph
6 - payload.execd
7 - contrib.network.ip
8+ - contrib.charmsupport
9
10=== modified file 'config.yaml'
11--- config.yaml 2014-10-08 10:40:04 +0000
12+++ config.yaml 2014-11-17 02:32:28 +0000
13@@ -153,3 +153,14 @@
14 The CPU core multiplier to use when configuring worker processes for
15 Glance. By default, the number of workers for each daemon is set to
16 twice the number of CPU cores a service unit has.
17+ nagios_context:
18+ default: "juju"
19+ type: string
20+ description: |
21+ Used by the nrpe-external-master subordinate charm.
22+ A string that will be prepended to instance name to set the host name
23+ in nagios. So for instance the hostname would be something like:
24+ juju-myservice-0
25+ If you're running multiple environments with the same services in them
26+ this allows you to differentiate between them.
27+
28
29=== added directory 'files'
30=== added directory 'files/nrpe-external-master'
31=== added file 'files/nrpe-external-master/check_exit_status.pl'
32--- files/nrpe-external-master/check_exit_status.pl 1970-01-01 00:00:00 +0000
33+++ files/nrpe-external-master/check_exit_status.pl 2014-11-17 02:32:28 +0000
34@@ -0,0 +1,189 @@
35+#!/usr/bin/perl
36+################################################################################
37+# #
38+# Copyright (C) 2011 Chad Columbus <ccolumbu@hotmail.com> #
39+# #
40+# This program is free software; you can redistribute it and/or modify #
41+# it under the terms of the GNU General Public License as published by #
42+# the Free Software Foundation; either version 2 of the License, or #
43+# (at your option) any later version. #
44+# #
45+# This program is distributed in the hope that it will be useful, #
46+# but WITHOUT ANY WARRANTY; without even the implied warranty of #
47+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
48+# GNU General Public License for more details. #
49+# #
50+# You should have received a copy of the GNU General Public License #
51+# along with this program; if not, write to the Free Software #
52+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA #
53+# #
54+################################################################################
55+
56+use strict;
57+use Getopt::Std;
58+$| = 1;
59+
60+my %opts;
61+getopts('heronp:s:', \%opts);
62+
63+my $VERSION = "Version 1.0";
64+my $AUTHOR = '(c) 2011 Chad Columbus <ccolumbu@hotmail.com>';
65+
66+# Default values:
67+my $script_to_check;
68+my $pattern = 'is running';
69+my $cmd;
70+my $message;
71+my $error;
72+
73+# Exit codes
74+my $STATE_OK = 0;
75+my $STATE_WARNING = 1;
76+my $STATE_CRITICAL = 2;
77+my $STATE_UNKNOWN = 3;
78+
79+# Parse command line options
80+if ($opts{'h'} || scalar(%opts) == 0) {
81+ &print_help();
82+ exit($STATE_OK);
83+}
84+
85+# Make sure script is provided:
86+if ($opts{'s'} eq '') {
87+ # Script to run not provided
88+ print "\nYou must provide a script to run. Example: -s /etc/init.d/httpd\n";
89+ exit($STATE_UNKNOWN);
90+} else {
91+ $script_to_check = $opts{'s'};
92+}
93+
94+# Make sure only a-z, 0-9, /, _, -, and . are used in the script.
95+if ($script_to_check =~ /[^a-z0-9\_\-\/\.]/) {
96+ # Script contains illegal characters, so exit.
97+ print "\nScript to check can only contain Lowercase Letters, Numbers, Periods, Underscores, Hyphens, and/or Slashes\n";
98+ exit($STATE_UNKNOWN);
99+}
100+
101+# See if script is executable
102+if (! -x "$script_to_check") {
103+ print "\nIt appears you can't execute $script_to_check, $!\n";
104+ exit($STATE_UNKNOWN);
105+}
106+
107+# If a pattern is provided use it:
108+if ($opts{'p'} ne '') {
109+ $pattern = $opts{'p'};
110+}
111+
112+# If -r run command via sudo as root:
113+if ($opts{'r'}) {
114+ $cmd = "sudo -n $script_to_check status" . ' 2>&1';
115+} else {
116+ $cmd = "$script_to_check status" . ' 2>&1';
117+}
118+
119+my $cmd_result = `$cmd`;
120+chomp($cmd_result);
121+if ($cmd_result =~ /sudo/i) {
122+ # This means it could not run the sudo command
123+ $message = "$script_to_check UNKNOWN - Could not run: 'sudo -n $script_to_check status'. Result is $cmd_result";
124+ $error = $STATE_UNKNOWN;
125+} else {
126+ # Check exitstatus instead of output:
127+ if ($opts{'e'} == 1) {
128+ if ($? != 0) {
129+ # error
130+ $message = "$script_to_check CRITICAL - Exit code: $?\.";
131+ if ($opts{'o'} == 0) {
132+ $message .= " $cmd_result";
133+ }
134+ $error = $STATE_CRITICAL;
135+ } else {
136+ # success
137+ $message = "$script_to_check OK - Exit code: $?\.";
138+ if ($opts{'o'} == 0) {
139+ $message .= " $cmd_result";
140+ }
141+ $error = $STATE_OK;
142+ }
143+ } else {
144+ my $not_check = 1;
145+ if ($opts{'n'} == 1) {
146+ $not_check = 0;
147+ }
148+ if (($cmd_result =~ /$pattern/i) == $not_check) {
149+ $message = "$script_to_check OK";
150+ if ($opts{'o'} == 0) {
151+ $message .= " - $cmd_result";
152+ }
153+ $error = $STATE_OK;
154+ } else {
155+ $message = "$script_to_check CRITICAL";
156+ if ($opts{'o'} == 0) {
157+ $message .= " - $cmd_result";
158+ }
159+ $error = $STATE_CRITICAL;
160+ }
161+ }
162+}
163+
164+if ($message eq '') {
165+ print "Error: program failed in an unknown way\n";
166+ exit($STATE_UNKNOWN);
167+}
168+
169+if ($error) {
170+ print "$message\n";
171+ exit($error);
172+} else {
173+ # If we get here we are OK
174+ print "$message\n";
175+ exit($STATE_OK);
176+}
177+
178+####################################
179+# Start Subs:
180+####################################
181+sub print_help() {
182+ print << "EOF";
183+Check the output or exit status of a script.
184+$VERSION
185+$AUTHOR
186+
187+Options:
188+-h
189+ Print detailed help screen
190+
191+-s
192+ 'FULL PATH TO SCRIPT' (required)
193+ This is the script to run, the script is designed to run scripts in the
194+ /etc/init.d dir (but can run any script) and will call the script with
195+ a 'status' argument. So if you use another script make sure it will
196+ work with /path/script status, example: /etc/init.d/httpd status
197+
198+-e
199+ This is the "exitstatus" flag, it means check the exit status
200+ code instead of looking for a pattern in the output of the script.
201+
202+-p 'REGEX'
203+ This is a pattern to look for in the output of the script to confirm it
204+ is running, default is 'is running', but not all init.d scripts output
205+ (iptables), so you can specify an arbitrary pattern.
206+ All patterns are case insensitive.
207+
208+-n
209+ This is the "NOT" flag, it means not the -p pattern, so if you want to
210+ make sure the output of the script does NOT contain -p 'REGEX'
211+
212+-r
213+ This is the "ROOT" flag, it means run as root via sudo. You will need a
214+ line in your /etc/sudoers file like:
215+ nagios ALL=(root) NOPASSWD: /etc/init.d/* status
216+
217+-o
218+ This is the "SUPPRESS OUTPUT" flag. Some programs have a long output
219+ (like iptables), this flag suppresses that output so it is not printed
220+ as a part of the nagios message.
221+EOF
222+}
223+
224
225=== added file 'files/nrpe-external-master/check_status_file.py'
226--- files/nrpe-external-master/check_status_file.py 1970-01-01 00:00:00 +0000
227+++ files/nrpe-external-master/check_status_file.py 2014-11-17 02:32:28 +0000
228@@ -0,0 +1,60 @@
229+#!/usr/bin/python
230+
231+# m
232+# mmmm m m mmmm mmmm mmm mm#mm
233+# #" "# # # #" "# #" "# #" # #
234+# # # # # # # # # #"""" #
235+# ##m#" "mm"# ##m#" ##m#" "#mm" "mm
236+# # # #
237+# " " "
238+# This file is managed by puppet. Do not make local changes.
239+
240+#
241+# Copyright 2014 Canonical Ltd.
242+#
243+# Author: Jacek Nykis <jacek.nykis@canonical.com>
244+#
245+
246+import re
247+import nagios_plugin
248+
249+
250+def parse_args():
251+ import argparse
252+
253+ parser = argparse.ArgumentParser(
254+ description='Read file and return nagios status based on its content',
255+ formatter_class=argparse.ArgumentDefaultsHelpFormatter)
256+ parser.add_argument('-f', '--status-file', required=True,
257+ help='Status file path')
258+ parser.add_argument('-c', '--critical-text', default='CRITICAL',
259+ help='String indicating critical status')
260+ parser.add_argument('-w', '--warning-text', default='WARNING',
261+ help='String indicating warning status')
262+ parser.add_argument('-o', '--ok-text', default='OK',
263+ help='String indicating OK status')
264+ parser.add_argument('-u', '--unknown-text', default='UNKNOWN',
265+ help='String indicating unknown status')
266+ return parser.parse_args()
267+
268+
269+def check_status(args):
270+ nagios_plugin.check_file_freshness(args.status_file, 43200)
271+
272+ with open(args.status_file, "r") as f:
273+ content = [l.strip() for l in f.readlines()]
274+
275+ for line in content:
276+ if re.search(args.critical_text, line):
277+ raise nagios_plugin.CriticalError(line)
278+ elif re.search(args.warning_text, line):
279+ raise nagios_plugin.WarnError(line)
280+ elif re.search(args.unknown_text, line):
281+ raise nagios_plugin.UnknownError(line)
282+ else:
283+ print line
284+
285+
286+if __name__ == '__main__':
287+ args = parse_args()
288+ nagios_plugin.try_check(check_status, args)
289
290=== added file 'files/nrpe-external-master/check_upstart_job'
291--- files/nrpe-external-master/check_upstart_job 1970-01-01 00:00:00 +0000
292+++ files/nrpe-external-master/check_upstart_job 2014-11-17 02:32:28 +0000
293@@ -0,0 +1,72 @@
294+#!/usr/bin/python
295+
296+#
297+# Copyright 2012, 2013 Canonical Ltd.
298+#
299+# Author: Paul Collins <paul.collins@canonical.com>
300+#
301+# Based on http://www.eurion.net/python-snippets/snippet/Upstart%20service%20status.html
302+#
303+
304+import sys
305+
306+import dbus
307+
308+
309+class Upstart(object):
310+ def __init__(self):
311+ self._bus = dbus.SystemBus()
312+ self._upstart = self._bus.get_object('com.ubuntu.Upstart',
313+ '/com/ubuntu/Upstart')
314+ def get_job(self, job_name):
315+ path = self._upstart.GetJobByName(job_name,
316+ dbus_interface='com.ubuntu.Upstart0_6')
317+ return self._bus.get_object('com.ubuntu.Upstart', path)
318+
319+ def get_properties(self, job):
320+ path = job.GetInstance([], dbus_interface='com.ubuntu.Upstart0_6.Job')
321+ instance = self._bus.get_object('com.ubuntu.Upstart', path)
322+ return instance.GetAll('com.ubuntu.Upstart0_6.Instance',
323+ dbus_interface=dbus.PROPERTIES_IFACE)
324+
325+ def get_job_instances(self, job_name):
326+ job = self.get_job(job_name)
327+ paths = job.GetAllInstances([], dbus_interface='com.ubuntu.Upstart0_6.Job')
328+ return [self._bus.get_object('com.ubuntu.Upstart', path) for path in paths]
329+
330+ def get_job_instance_properties(self, job):
331+ return job.GetAll('com.ubuntu.Upstart0_6.Instance',
332+ dbus_interface=dbus.PROPERTIES_IFACE)
333+
334+try:
335+ upstart = Upstart()
336+ try:
337+ job = upstart.get_job(sys.argv[1])
338+ props = upstart.get_properties(job)
339+
340+ if props['state'] == 'running':
341+ print 'OK: %s is running' % sys.argv[1]
342+ sys.exit(0)
343+ else:
344+ print 'CRITICAL: %s is not running' % sys.argv[1]
345+ sys.exit(2)
346+
347+ except dbus.DBusException as e:
348+ instances = upstart.get_job_instances(sys.argv[1])
349+ propses = [upstart.get_job_instance_properties(instance) for instance in instances]
350+ states = dict([(props['name'], props['state']) for props in propses])
351+ if len(states) != states.values().count('running'):
352+ not_running = []
353+ for name in states.keys():
354+ if states[name] != 'running':
355+ not_running.append(name)
356+ print 'CRITICAL: %d instances of %s not running: %s' % \
357+ (len(not_running), sys.argv[1], ', '.join(not_running))
358+ sys.exit(2)
359+ else:
360+ print 'OK: %d instances of %s running' % (len(states), sys.argv[1])
361+
362+except dbus.DBusException as e:
363+ print 'CRITICAL: failed to get properties of \'%s\' from upstart' % sys.argv[1]
364+ sys.exit(2)
365+
366
367=== added file 'files/nrpe-external-master/nagios_plugin.py'
368--- files/nrpe-external-master/nagios_plugin.py 1970-01-01 00:00:00 +0000
369+++ files/nrpe-external-master/nagios_plugin.py 2014-11-17 02:32:28 +0000
370@@ -0,0 +1,78 @@
371+#!/usr/bin/env python
372+# m
373+# mmmm m m mmmm mmmm mmm mm#mm
374+# #" "# # # #" "# #" "# #" # #
375+# # # # # # # # # #"""" #
376+# ##m#" "mm"# ##m#" ##m#" "#mm" "mm
377+# # # #
378+# " " "
379+# This file is managed by puppet. Do not make local changes.
380+
381+# Copyright (C) 2005, 2006, 2007, 2012 James Troup <james.troup@canonical.com>
382+
383+import os
384+import stat
385+import time
386+import traceback
387+import sys
388+
389+
390+################################################################################
391+
392+class CriticalError(Exception):
393+ """This indicates a critical error."""
394+ pass
395+
396+
397+class WarnError(Exception):
398+ """This indicates a warning condition."""
399+ pass
400+
401+
402+class UnknownError(Exception):
403+ """This indicates an unknown error was encountered."""
404+ pass
405+
406+
407+def try_check(function, *args, **kwargs):
408+ """Perform a check with error/warn/unknown handling."""
409+ try:
410+ function(*args, **kwargs)
411+ except UnknownError, msg:
412+ print msg
413+ sys.exit(3)
414+ except CriticalError, msg:
415+ print msg
416+ sys.exit(2)
417+ except WarnError, msg:
418+ print msg
419+ sys.exit(1)
420+ except:
421+ print "%s raised unknown exception '%s'" % (function, sys.exc_info()[0])
422+ print '=' * 60
423+ traceback.print_exc(file=sys.stdout)
424+ print '=' * 60
425+ sys.exit(3)
426+
427+
428+################################################################################
429+
430+def check_file_freshness(filename, newer_than=600):
431+ """Check a file exists, is readable and is newer than <n> seconds (where <n> defaults to 600)."""
432+ # First check the file exists and is readable
433+ if not os.path.exists(filename):
434+ raise CriticalError("%s: does not exist." % (filename))
435+ if os.access(filename, os.R_OK) == 0:
436+ raise CriticalError("%s: is not readable." % (filename))
437+
438+ # Then ensure the file is up-to-date enough
439+ mtime = os.stat(filename)[stat.ST_MTIME]
440+ last_modified = time.time() - mtime
441+ if last_modified > newer_than:
442+ raise CriticalError("%s: was last modified on %s and is too old (> %s seconds)."
443+ % (filename, time.ctime(mtime), newer_than))
444+ if last_modified < 0:
445+ raise CriticalError("%s: was last modified on %s which is in the future."
446+ % (filename, time.ctime(mtime)))
447+
448+################################################################################
449
450=== added directory 'hooks/charmhelpers/contrib/charmsupport'
451=== added file 'hooks/charmhelpers/contrib/charmsupport/__init__.py'
452=== added file 'hooks/charmhelpers/contrib/charmsupport/nrpe.py'
453--- hooks/charmhelpers/contrib/charmsupport/nrpe.py 1970-01-01 00:00:00 +0000
454+++ hooks/charmhelpers/contrib/charmsupport/nrpe.py 2014-11-17 02:32:28 +0000
455@@ -0,0 +1,222 @@
456+"""Compatibility with the nrpe-external-master charm"""
457+# Copyright 2012 Canonical Ltd.
458+#
459+# Authors:
460+# Matthew Wedgwood <matthew.wedgwood@canonical.com>
461+
462+import subprocess
463+import pwd
464+import grp
465+import os
466+import re
467+import shlex
468+import yaml
469+
470+from charmhelpers.core.hookenv import (
471+ config,
472+ local_unit,
473+ log,
474+ relation_ids,
475+ relation_set,
476+)
477+
478+from charmhelpers.core.host import service
479+
480+# This module adds compatibility with the nrpe-external-master and plain nrpe
481+# subordinate charms. To use it in your charm:
482+#
483+# 1. Update metadata.yaml
484+#
485+# provides:
486+# (...)
487+# nrpe-external-master:
488+# interface: nrpe-external-master
489+# scope: container
490+#
491+# and/or
492+#
493+# provides:
494+# (...)
495+# local-monitors:
496+# interface: local-monitors
497+# scope: container
498+
499+#
500+# 2. Add the following to config.yaml
501+#
502+# nagios_context:
503+# default: "juju"
504+# type: string
505+# description: |
506+# Used by the nrpe subordinate charms.
507+# A string that will be prepended to instance name to set the host name
508+# in nagios. So for instance the hostname would be something like:
509+# juju-myservice-0
510+# If you're running multiple environments with the same services in them
511+# this allows you to differentiate between them.
512+#
513+# 3. Add custom checks (Nagios plugins) to files/nrpe-external-master
514+#
515+# 4. Update your hooks.py with something like this:
516+#
517+# from charmsupport.nrpe import NRPE
518+# (...)
519+# def update_nrpe_config():
520+# nrpe_compat = NRPE()
521+# nrpe_compat.add_check(
522+# shortname = "myservice",
523+# description = "Check MyService",
524+# check_cmd = "check_http -w 2 -c 10 http://localhost"
525+# )
526+# nrpe_compat.add_check(
527+# "myservice_other",
528+# "Check for widget failures",
529+# check_cmd = "/srv/myapp/scripts/widget_check"
530+# )
531+# nrpe_compat.write()
532+#
533+# def config_changed():
534+# (...)
535+# update_nrpe_config()
536+#
537+# def nrpe_external_master_relation_changed():
538+# update_nrpe_config()
539+#
540+# def local_monitors_relation_changed():
541+# update_nrpe_config()
542+#
543+# 5. ln -s hooks.py nrpe-external-master-relation-changed
544+# ln -s hooks.py local-monitors-relation-changed
545+
546+
547+class CheckException(Exception):
548+ pass
549+
550+
551+class Check(object):
552+ shortname_re = '[A-Za-z0-9-_]+$'
553+ service_template = ("""
554+#---------------------------------------------------
555+# This file is Juju managed
556+#---------------------------------------------------
557+define service {{
558+ use active-service
559+ host_name {nagios_hostname}
560+ service_description {nagios_hostname}[{shortname}] """
561+ """{description}
562+ check_command check_nrpe!{command}
563+ servicegroups {nagios_servicegroup}
564+}}
565+""")
566+
567+ def __init__(self, shortname, description, check_cmd):
568+ super(Check, self).__init__()
569+ # XXX: could be better to calculate this from the service name
570+ if not re.match(self.shortname_re, shortname):
571+ raise CheckException("shortname must match {}".format(
572+ Check.shortname_re))
573+ self.shortname = shortname
574+ self.command = "check_{}".format(shortname)
575+ # Note: a set of invalid characters is defined by the
576+ # Nagios server config
577+ # The default is: illegal_object_name_chars=`~!$%^&*"|'<>?,()=
578+ self.description = description
579+ self.check_cmd = self._locate_cmd(check_cmd)
580+
581+ def _locate_cmd(self, check_cmd):
582+ search_path = (
583+ '/',
584+ os.path.join(os.environ['CHARM_DIR'],
585+ 'files/nrpe-external-master'),
586+ '/usr/lib/nagios/plugins',
587+ '/usr/local/lib/nagios/plugins',
588+ )
589+ parts = shlex.split(check_cmd)
590+ for path in search_path:
591+ if os.path.exists(os.path.join(path, parts[0])):
592+ command = os.path.join(path, parts[0])
593+ if len(parts) > 1:
594+ command += " " + " ".join(parts[1:])
595+ return command
596+ log('Check command not found: {}'.format(parts[0]))
597+ return ''
598+
599+ def write(self, nagios_context, hostname):
600+ nrpe_check_file = '/etc/nagios/nrpe.d/{}.cfg'.format(
601+ self.command)
602+ with open(nrpe_check_file, 'w') as nrpe_check_config:
603+ nrpe_check_config.write("# check {}\n".format(self.shortname))
604+ nrpe_check_config.write("command[{}]={}\n".format(
605+ self.command, self.check_cmd))
606+
607+ if not os.path.exists(NRPE.nagios_exportdir):
608+ log('Not writing service config as {} is not accessible'.format(
609+ NRPE.nagios_exportdir))
610+ else:
611+ self.write_service_config(nagios_context, hostname)
612+
613+ def write_service_config(self, nagios_context, hostname):
614+ for f in os.listdir(NRPE.nagios_exportdir):
615+ if re.search('.*{}.cfg'.format(self.command), f):
616+ os.remove(os.path.join(NRPE.nagios_exportdir, f))
617+
618+ templ_vars = {
619+ 'nagios_hostname': hostname,
620+ 'nagios_servicegroup': nagios_context,
621+ 'description': self.description,
622+ 'shortname': self.shortname,
623+ 'command': self.command,
624+ }
625+ nrpe_service_text = Check.service_template.format(**templ_vars)
626+ nrpe_service_file = '{}/service__{}_{}.cfg'.format(
627+ NRPE.nagios_exportdir, hostname, self.command)
628+ with open(nrpe_service_file, 'w') as nrpe_service_config:
629+ nrpe_service_config.write(str(nrpe_service_text))
630+
631+ def run(self):
632+ subprocess.call(self.check_cmd)
633+
634+
635+class NRPE(object):
636+ nagios_logdir = '/var/log/nagios'
637+ nagios_exportdir = '/var/lib/nagios/export'
638+ nrpe_confdir = '/etc/nagios/nrpe.d'
639+
640+ def __init__(self, hostname=None):
641+ super(NRPE, self).__init__()
642+ self.config = config()
643+ self.nagios_context = self.config['nagios_context']
644+ self.unit_name = local_unit().replace('/', '-')
645+ if hostname:
646+ self.hostname = hostname
647+ else:
648+ self.hostname = "{}-{}".format(self.nagios_context, self.unit_name)
649+ self.checks = []
650+
651+ def add_check(self, *args, **kwargs):
652+ self.checks.append(Check(*args, **kwargs))
653+
654+ def write(self):
655+ try:
656+ nagios_uid = pwd.getpwnam('nagios').pw_uid
657+ nagios_gid = grp.getgrnam('nagios').gr_gid
658+ except:
659+ log("Nagios user not set up, nrpe checks not updated")
660+ return
661+
662+ if not os.path.exists(NRPE.nagios_logdir):
663+ os.mkdir(NRPE.nagios_logdir)
664+ os.chown(NRPE.nagios_logdir, nagios_uid, nagios_gid)
665+
666+ nrpe_monitors = {}
667+ monitors = {"monitors": {"remote": {"nrpe": nrpe_monitors}}}
668+ for nrpecheck in self.checks:
669+ nrpecheck.write(self.nagios_context, self.hostname)
670+ nrpe_monitors[nrpecheck.shortname] = {
671+ "command": nrpecheck.command,
672+ }
673+
674+ service('restart', 'nagios-nrpe-server')
675+
676+ for rid in relation_ids("local-monitors"):
677+ relation_set(relation_id=rid, monitors=yaml.dump(monitors))
678
679=== added file 'hooks/charmhelpers/contrib/charmsupport/volumes.py'
680--- hooks/charmhelpers/contrib/charmsupport/volumes.py 1970-01-01 00:00:00 +0000
681+++ hooks/charmhelpers/contrib/charmsupport/volumes.py 2014-11-17 02:32:28 +0000
682@@ -0,0 +1,156 @@
683+'''
684+Functions for managing volumes in juju units. One volume is supported per unit.
685+Subordinates may have their own storage, provided it is on its own partition.
686+
687+Configuration stanzas:
688+ volume-ephemeral:
689+ type: boolean
690+ default: true
691+ description: >
692+ If false, a volume is mounted as specified in "volume-map"
693+ If true, ephemeral storage will be used, meaning that log data
694+ will only exist as long as the machine. YOU HAVE BEEN WARNED.
695+ volume-map:
696+ type: string
697+ default: {}
698+ description: >
699+ YAML map of units to device names, e.g:
700+ "{ rsyslog/0: /dev/vdb, rsyslog/1: /dev/vdb }"
701+ Service units will raise a configure-error if volume-ephemeral
702+ is 'false' and no volume-map value is set. Use 'juju set' to set a
703+ value and 'juju resolved' to complete configuration.
704+
705+Usage:
706+ from charmsupport.volumes import configure_volume, VolumeConfigurationError
707+ from charmsupport.hookenv import log, ERROR
708+ def pre_mount_hook():
709+ stop_service('myservice')
710+ def post_mount_hook():
711+ start_service('myservice')
712+
713+ if __name__ == '__main__':
714+ try:
715+ configure_volume(before_change=pre_mount_hook,
716+ after_change=post_mount_hook)
717+ except VolumeConfigurationError:
718+ log('Storage could not be configured', ERROR)
719+'''
720+
721+# XXX: Known limitations
722+# - fstab is neither consulted nor updated
723+
724+import os
725+from charmhelpers.core import hookenv
726+from charmhelpers.core import host
727+import yaml
728+
729+
730+MOUNT_BASE = '/srv/juju/volumes'
731+
732+
733+class VolumeConfigurationError(Exception):
734+ '''Volume configuration data is missing or invalid'''
735+ pass
736+
737+
738+def get_config():
739+ '''Gather and sanity-check volume configuration data'''
740+ volume_config = {}
741+ config = hookenv.config()
742+
743+ errors = False
744+
745+ if config.get('volume-ephemeral') in (True, 'True', 'true', 'Yes', 'yes'):
746+ volume_config['ephemeral'] = True
747+ else:
748+ volume_config['ephemeral'] = False
749+
750+ try:
751+ volume_map = yaml.safe_load(config.get('volume-map', '{}'))
752+ except yaml.YAMLError as e:
753+ hookenv.log("Error parsing YAML volume-map: {}".format(e),
754+ hookenv.ERROR)
755+ errors = True
756+ if volume_map is None:
757+ # probably an empty string
758+ volume_map = {}
759+ elif not isinstance(volume_map, dict):
760+ hookenv.log("Volume-map should be a dictionary, not {}".format(
761+ type(volume_map)))
762+ errors = True
763+
764+ volume_config['device'] = volume_map.get(os.environ['JUJU_UNIT_NAME'])
765+ if volume_config['device'] and volume_config['ephemeral']:
766+ # asked for ephemeral storage but also defined a volume ID
767+ hookenv.log('A volume is defined for this unit, but ephemeral '
768+ 'storage was requested', hookenv.ERROR)
769+ errors = True
770+ elif not volume_config['device'] and not volume_config['ephemeral']:
771+ # asked for permanent storage but did not define volume ID
772+ hookenv.log('Persistent storage was requested, but there is no volume '
773+ 'defined for this unit.', hookenv.ERROR)
774+ errors = True
775+
776+ unit_mount_name = hookenv.local_unit().replace('/', '-')
777+ volume_config['mountpoint'] = os.path.join(MOUNT_BASE, unit_mount_name)
778+
779+ if errors:
780+ return None
781+ return volume_config
782+
783+
784+def mount_volume(config):
785+ if os.path.exists(config['mountpoint']):
786+ if not os.path.isdir(config['mountpoint']):
787+ hookenv.log('Not a directory: {}'.format(config['mountpoint']))
788+ raise VolumeConfigurationError()
789+ else:
790+ host.mkdir(config['mountpoint'])
791+ if os.path.ismount(config['mountpoint']):
792+ unmount_volume(config)
793+ if not host.mount(config['device'], config['mountpoint'], persist=True):
794+ raise VolumeConfigurationError()
795+
796+
797+def unmount_volume(config):
798+ if os.path.ismount(config['mountpoint']):
799+ if not host.umount(config['mountpoint'], persist=True):
800+ raise VolumeConfigurationError()
801+
802+
803+def managed_mounts():
804+ '''List of all mounted managed volumes'''
805+ return filter(lambda mount: mount[0].startswith(MOUNT_BASE), host.mounts())
806+
807+
808+def configure_volume(before_change=lambda: None, after_change=lambda: None):
809+ '''Set up storage (or don't) according to the charm's volume configuration.
810+ Returns the mount point or "ephemeral". before_change and after_change
811+ are optional functions to be called if the volume configuration changes.
812+ '''
813+
814+ config = get_config()
815+ if not config:
816+ hookenv.log('Failed to read volume configuration', hookenv.CRITICAL)
817+ raise VolumeConfigurationError()
818+
819+ if config['ephemeral']:
820+ if os.path.ismount(config['mountpoint']):
821+ before_change()
822+ unmount_volume(config)
823+ after_change()
824+ return 'ephemeral'
825+ else:
826+ # persistent storage
827+ if os.path.ismount(config['mountpoint']):
828+ mounts = dict(managed_mounts())
829+ if mounts.get(config['mountpoint']) != config['device']:
830+ before_change()
831+ unmount_volume(config)
832+ mount_volume(config)
833+ after_change()
834+ else:
835+ before_change()
836+ mount_volume(config)
837+ after_change()
838+ return config['mountpoint']
839
840=== modified file 'hooks/glance_relations.py'
841--- hooks/glance_relations.py 2014-10-01 22:14:32 +0000
842+++ hooks/glance_relations.py 2014-11-17 02:32:28 +0000
843@@ -1,5 +1,6 @@
844 #!/usr/bin/python
845 import sys
846+import os
847
848 from glance_utils import (
849 do_openstack_upgrade,
850@@ -7,6 +8,7 @@
851 migrate_database,
852 register_configs,
853 restart_map,
854+ services,
855 CLUSTER_RES,
856 PACKAGES,
857 SERVICES,
858@@ -30,6 +32,7 @@
859 relation_get,
860 relation_set,
861 relation_ids,
862+ relations_of_type,
863 service_name,
864 unit_get,
865 UnregisteredHookError, )
866@@ -73,6 +76,8 @@
867
868 from charmhelpers.contrib.openstack.context import ADDRESS_TYPES
869
870+from charmhelpers.contrib.charmsupport.nrpe import NRPE
871+
872 from subprocess import (
873 check_call,
874 call, )
875@@ -297,6 +302,8 @@
876 open_port(9292)
877 configure_https()
878
879+ update_nrpe_config()
880+
881 # Pick up any changes due to network reference architecture
882 # configuration
883 [keystone_joined(rid) for rid in relation_ids('identity-service')]
884@@ -334,6 +341,7 @@
885 def upgrade_charm():
886 apt_install(filter_installed_packages(PACKAGES), fatal=True)
887 configure_https()
888+ update_nrpe_config()
889 CONFIGS.write_all()
890
891
892@@ -446,6 +454,59 @@
893 return
894 CONFIGS.write(GLANCE_API_CONF)
895
896+
897+@hooks.hook('nrpe-external-master-relation-joined',
898+ 'nrpe-external-master-relation-changed')
899+def update_nrpe_config():
900+ # Find out if nrpe set nagios_hostname
901+ hostname = None
902+ host_context = None
903+ for rel in relations_of_type('nrpe-external-master'):
904+ if 'nagios_hostname' in rel:
905+ hostname = rel['nagios_hostname']
906+ host_context = rel['nagios_host_context']
907+ break
908+ nrpe = NRPE(hostname=hostname)
909+ apt_install('python-dbus')
910+
911+ if host_context:
912+ current_unit = "%s:%s" % (host_context, local_unit())
913+ else:
914+ current_unit = local_unit()
915+
916+ services_to_monitor = services()
917+
918+ for service in services_to_monitor:
919+ upstart_init = '/etc/init/%s.conf' % service
920+ sysv_init = '/etc/init.d/%s' % service
921+
922+ if os.path.exists(upstart_init):
923+ nrpe.add_check(
924+ shortname=service,
925+ description='process check {%s}' % current_unit,
926+ check_cmd='check_upstart_job %s' % service,
927+ )
928+ elif os.path.exists(sysv_init):
929+ cronpath = '/etc/cron.d/nagios-service-check-%s' % service
930+ checkpath = os.path.join(os.environ['CHARM_DIR'],
931+ 'files/nrpe-external-master',
932+ 'check_exit_status.pl')
933+ cron_template = '*/5 * * * * root \
934+%s -s /etc/init.d/%s status > /var/lib/nagios/service-check-%s.txt\n' \
935+ % (checkpath, service, service)
936+ f = open(cronpath, 'w')
937+ f.write(cron_template)
938+ f.close()
939+ nrpe.add_check(
940+ shortname=service,
941+ description='process check {%s}' % current_unit,
942+ check_cmd='check_status_file.py -f \
943+ /var/lib/nagios/service-check-%s.txt' % service,
944+ )
945+
946+ nrpe.write()
947+
948+
949 if __name__ == '__main__':
950 try:
951 hooks.execute(sys.argv)
952
953=== added symlink 'hooks/nrpe-external-master-relation-changed'
954=== target is u'glance_relations.py'
955=== added symlink 'hooks/nrpe-external-master-relation-joined'
956=== target is u'glance_relations.py'
957=== modified file 'metadata.yaml'
958--- metadata.yaml 2014-09-11 07:05:30 +0000
959+++ metadata.yaml 2014-11-17 02:32:28 +0000
960@@ -9,6 +9,9 @@
961 categories:
962 - miscellaneous
963 provides:
964+ nrpe-external-master:
965+ interface: nrpe-external-master
966+ scope: container
967 image-service:
968 interface: glance
969 requires:
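
Note on the SysV branch of update_nrpe_config() in the diff above: the cron entry is assembled from the charm directory and the service name. The following is a minimal sketch of that step pulled out into a standalone helper so it can be exercised without a Juju environment. The helper name build_cron_line and its interval_minutes parameter are hypothetical and not part of the proposed branch; the paths and the cron format are taken from the diff.

```python
import os


def build_cron_line(charm_dir, service, interval_minutes=5):
    """Return a cron.d entry that records a SysV service's status.

    Hypothetical helper: it mirrors the string built inline in
    update_nrpe_config(), but takes the charm directory as an argument
    instead of reading os.environ['CHARM_DIR'].
    """
    # os.path.join() returns a plain string. A trailing comma after the
    # call would wrap it in a one-element tuple instead.
    checkpath = os.path.join(charm_dir,
                             'files/nrpe-external-master',
                             'check_exit_status.pl')
    status_file = '/var/lib/nagios/service-check-%s.txt' % service
    return '*/%d * * * * root %s -s /etc/init.d/%s status > %s\n' % (
        interval_minutes, checkpath, service, status_file)


if __name__ == '__main__':
    # Example invocation with an illustrative charm directory.
    print(build_cron_line('/charm', 'glance-api'), end='')
```

Building the line in one place, without backslash continuations inside the string literal, also avoids accidentally embedding source indentation in the generated command.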
