Merge lp:~jacekn/charms/precise/ceph/n-e-m into lp:~openstack-charmers-archive/charms/trusty/ceph/next

Proposed by James Page
Status: Rejected
Rejected by: James Page
Proposed branch: lp:~jacekn/charms/precise/ceph/n-e-m
Merge into: lp:~openstack-charmers-archive/charms/trusty/ceph/next
Diff against target: 616 lines (+500/-4)
9 files modified
charm-helpers-sync.yaml (+1/-0)
config.yaml (+10/-0)
files/nagios/check_ceph_status.py (+44/-0)
files/nagios/collect_ceph_status.sh (+18/-0)
hooks/charmhelpers/contrib/charmsupport/nrpe.py (+219/-0)
hooks/charmhelpers/contrib/charmsupport/volumes.py (+156/-0)
hooks/hooks.py (+47/-3)
metadata.yaml (+4/-0)
revision (+1/-1)
To merge this branch: bzr merge lp:~jacekn/charms/precise/ceph/n-e-m
Reviewer      Review Type    Date Requested    Status
James Page                                     Needs Fixing
Review via email: mp+221693@code.launchpad.net

This proposal supersedes a proposal from 2014-05-15.

Description of the change

This branch adds nrpe-external-master hook support and a basic Nagios check.
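
As a rough illustration of what the new hook ends up writing, the sketch below renders the cron entry and the NRPE command line from the constants in the hooks.py part of the diff. The helper names `render_cron_entry` and `render_nrpe_command` are hypothetical and not part of the branch; the plugin path assumes the script is found under /usr/local/lib/nagios/plugins.

```python
import posixpath

# Paths copied from the constants added to hooks/hooks.py in this proposal.
NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'
SCRIPTS_DIR = '/usr/local/bin'
STATUS_FILE = '/var/lib/nagios/cat-ceph-status.txt'


def render_cron_entry(schedule='*/5 * * * *'):
    """Return the cron.d line that runs the collector script as root."""
    script = posixpath.join(SCRIPTS_DIR, 'collect_ceph_status.sh')
    return '{} root {}\n'.format(schedule, script)


def render_nrpe_command(shortname='ceph'):
    """Return the nrpe.d 'command[...]' line for the check (hypothetical helper)."""
    plugin = posixpath.join(NAGIOS_PLUGINS, 'check_ceph_status.py')
    return 'command[check_{}]={} -f {}\n'.format(shortname, plugin, STATUS_FILE)


if __name__ == '__main__':
    print(render_cron_entry(), end='')
    print(render_nrpe_command(), end='')
```

The cron job refreshes the status file every five minutes, and the NRPE check only reads that file, so the nagios user never needs Ceph credentials.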

Revision history for this message
James Page (james-page) wrote :

See inline comment re using ceph python API.
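
For context, one way to query status through the Ceph Python bindings instead of shelling out to `ceph status` might look like the sketch below. It is not taken from the inline comment. The `rados` calls (`Rados`, `connect`, `mon_command`, `shutdown`) come from the librados Python bindings; the JSON key layout handled in `summarize_status` is an assumption that differs between Ceph releases and should be checked against the deployed version. Both function names are hypothetical.

```python
import json


def summarize_status(status):
    """Extract (health, total_osds, up_osds) from a decoded status dict.

    ASSUMPTION: the key names below are best-effort for two known
    layouts of the JSON status output; verify on the target release.
    """
    health = status.get('health', {})
    overall = health.get('status') or health.get('overall_status')
    osdmap = status.get('osdmap', {})
    osdmap = osdmap.get('osdmap', osdmap)  # some releases nest the map one level
    return overall, int(osdmap['num_osds']), int(osdmap['num_up_osds'])


def fetch_status(conffile='/etc/ceph/ceph.conf'):
    """Ask the monitors for cluster status via the librados bindings."""
    import rados  # shipped in the python-rados / python3-rados package

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        cmd = json.dumps({'prefix': 'status', 'format': 'json'})
        ret, outbuf, outs = cluster.mon_command(cmd, b'')
        if ret != 0:
            raise RuntimeError('ceph status failed: {}'.format(outs))
        return json.loads(outbuf)
    finally:
        cluster.shutdown()
```

Working from structured JSON avoids parsing the human-readable output, whose layout is not a stable interface.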

review: Needs Fixing
Revision history for this message
Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
This MP triggered a test on the Ubuntu OSCI system. Here is a summary of results.

#449 ceph-next for james-page mp221693
charm_unit_test

This build:
http://10.98.191.181:8080/job/charm_unit_test/449/

MP URL:
https://code.launchpad.net/~jacekn/charms/precise/ceph/n-e-m/+merge/221693

Proposed branch:
lp:~jacekn/charms/precise/ceph/n-e-m

Results summary:
    UNIT FAIL: unit-test missing

UNIT Results not found.

Ubuntu OSCI Jenkins is currently in development on a Canonical private network, but we plan to publish results to a public instance soon. Tests are triggered if the proposed branch rev changes, or if the MP is placed into "Needs review" status after being otherwise for >= 1hr. Human review of results is still recommended.
http://10.98.191.181:8080/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
This MP triggered a test on the Ubuntu OSCI system. Here is a summary of results.

#643 ceph-next for james-page mp221693
charm_lint_check

This build:
http://10.98.191.181:8080/job/charm_lint_check/643/

MP URL:
https://code.launchpad.net/~jacekn/charms/precise/ceph/n-e-m/+merge/221693

Proposed branch:
lp:~jacekn/charms/precise/ceph/n-e-m

Results summary:
    LINT OK: believed to pass, but you should confirm results

LINT Results (max last 25 lines) from
/var/lib/jenkins/workspace/charm_lint_check/make-lint.643:
E: Unknown relation field in relation nrpe-external-master - (gets)
W: config.yaml: option osd-journal does not have the keys: default
W: config.yaml: option ephemeral-unmount does not have the keys: default
W: config.yaml: option monitor-secret does not have the keys: default
W: config.yaml: option osd-reformat does not have the keys: default
W: config.yaml: option key does not have the keys: default
W: config.yaml: option fsid does not have the keys: default

Revision history for this message
Ryan Beisner (1chb1n) wrote :

UOSCI bot says:
This MP triggered a test on the Ubuntu OSCI system. Here is a summary of results.

#222 ceph-next for james-page mp221693
charm_amulet_test

This build:
http://10.98.191.181:8080/job/charm_amulet_test/222/

MP URL:
https://code.launchpad.net/~jacekn/charms/precise/ceph/n-e-m/+merge/221693

Proposed branch:
lp:~jacekn/charms/precise/ceph/n-e-m

Results summary:
    AMULET FAIL: amulet-test missing

AMULET Results not found.

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #4233 ceph-next for james-page mp221693
    LINT FAIL: lint-test failed

LINT Results (max last 2 lines):
make: *** [lint] Error 1
ERROR:root:Make target returned non-zero.

Full lint test output: http://paste.ubuntu.com/10965096/
Build: http://10.245.162.77:8080/job/charm_lint_check/4233/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #3976 ceph-next for james-page mp221693
    UNIT FAIL: unit-test missing

UNIT Results (max last 2 lines):
INFO:root:Search string not found in makefile target commands.
ERROR:root:No make target was executed.

Full unit test output: http://paste.ubuntu.com/10965292/
Build: http://10.245.162.77:8080/job/charm_unit_test/3976/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #4250 ceph-next for james-page mp221693
    LINT FAIL: lint-test failed
    LINT FAIL: charm-proof failed

LINT Results (max last 2 lines):
make: *** [lint] Error 200
ERROR:root:Make target returned non-zero.

Full lint test output: http://paste.ubuntu.com/10965293/
Build: http://10.245.162.77:8080/job/charm_lint_check/4250/

Revision history for this message
James Page (james-page) wrote :

Marking as rejected, as it has not had any activity in a long time.

Unmerged revisions

73. By Jacek Nykis

Added nrpe-external-master hook support and nagios check

Preview Diff

1=== modified file 'charm-helpers-sync.yaml'
2--- charm-helpers-sync.yaml 2014-03-25 17:03:59 +0000
3+++ charm-helpers-sync.yaml 2014-06-02 09:35:59 +0000
4@@ -7,3 +7,4 @@
5 - utils
6 - payload.execd
7 - contrib.openstack.alternatives
8+ - contrib.charmsupport
9
10=== modified file 'config.yaml'
11--- config.yaml 2014-03-25 18:44:22 +0000
12+++ config.yaml 2014-06-02 09:35:59 +0000
13@@ -115,3 +115,13 @@
14 default: False
15 description: |
16 If set to True, supporting services will log to syslog.
17+ nagios_context:
18+ default: "juju"
19+ type: string
20+ description: |
21+ Used by the nrpe-external-master subordinate charm.
22+ A string that will be prepended to instance name to set the host name
23+ in nagios. So for instance the hostname would be something like:
24+ juju-myservice-0
25+ If you're running multiple environments with the same services in them
26+ this allows you to differentiate between them.
27
28=== added directory 'files/nagios'
29=== added file 'files/nagios/check_ceph_status.py'
30--- files/nagios/check_ceph_status.py 1970-01-01 00:00:00 +0000
31+++ files/nagios/check_ceph_status.py 2014-06-02 09:35:59 +0000
32@@ -0,0 +1,44 @@
33+#!/usr/bin/env python
34+
35+# Copyright (C) 2014 Canonical
36+# All Rights Reserved
37+# Author: Jacek Nykis <jacek.nykis@canonical.com>
38+
39+import re
40+import argparse
41+import subprocess
42+import nagios_plugin
43+
44+
45+def check_ceph_status(args):
46+ if args.status_file:
47+ nagios_plugin.check_file_freshness(args.status_file, 3600)
48+ with open(args.status_file, "r") as f:
49+ lines = f.readlines()
50+ status_data = dict(l.strip().split(' ', 1) for l in lines if len(l) > 1)
51+ else:
52+ lines = subprocess.check_output(["ceph", "status"]).split('\n')
53+ status_data = dict(l.strip().split(' ', 1) for l in lines if len(l) > 1)
54+
55+ if ('health' not in status_data
56+ or 'monmap' not in status_data
57+ or 'osdmap' not in status_data):
58+ raise nagios_plugin.UnknownError('UNKNOWN: status data is incomplete')
59+
60+ if status_data['health'] != 'HEALTH_OK':
61+ msg = 'CRITICAL: ceph health status: "{}"'.format(status_data['health'])
62+ raise nagios_plugin.CriticalError(msg)
63+ osds = re.search("^.*: (\d+) osds: (\d+) up, (\d+) in", status_data['osdmap'])
64+ if int(osds.group(1)) > int(osds.group(2)): # not all OSDs are "up"
65+ msg = 'CRITICAL: Some OSDs are not up. Total: {}, up: {}'.format(
66+ osds.group(1), osds.group(2))
67+ raise nagios_plugin.CriticalError(msg)
68+ print "All OK"
69+
70+
71+if __name__ == '__main__':
72+ parser = argparse.ArgumentParser(description='Check ceph status')
73+ parser.add_argument('-f', '--file', dest='status_file',
74+ default=False, help='Optional file with "ceph status" output')
75+ args = parser.parse_args()
76+ nagios_plugin.try_check(check_ceph_status, args)
77
78=== added file 'files/nagios/collect_ceph_status.sh'
79--- files/nagios/collect_ceph_status.sh 1970-01-01 00:00:00 +0000
80+++ files/nagios/collect_ceph_status.sh 2014-06-02 09:35:59 +0000
81@@ -0,0 +1,18 @@
82+#!/bin/bash
83+# Copyright (C) 2014 Canonical
84+# All Rights Reserved
85+# Author: Jacek Nykis <jacek.nykis@canonical.com>
86+
87+LOCK=/var/lock/ceph-status.lock
88+lockfile-create -r2 --lock-name $LOCK > /dev/null 2>&1
89+if [ $? -ne 0 ]; then
90+ exit 1
91+fi
92+trap "rm -f $LOCK > /dev/null 2>&1" exit
93+
94+DATA_DIR="/var/lib/nagios"
95+if [ ! -d $DATA_DIR ]; then
96+ mkdir -p $DATA_DIR
97+fi
98+
99+ceph status >${DATA_DIR}/cat-ceph-status.txt
100
101=== added directory 'hooks/charmhelpers/contrib/charmsupport'
102=== added file 'hooks/charmhelpers/contrib/charmsupport/__init__.py'
103=== added file 'hooks/charmhelpers/contrib/charmsupport/nrpe.py'
104--- hooks/charmhelpers/contrib/charmsupport/nrpe.py 1970-01-01 00:00:00 +0000
105+++ hooks/charmhelpers/contrib/charmsupport/nrpe.py 2014-06-02 09:35:59 +0000
106@@ -0,0 +1,219 @@
107+"""Compatibility with the nrpe-external-master charm"""
108+# Copyright 2012 Canonical Ltd.
109+#
110+# Authors:
111+# Matthew Wedgwood <matthew.wedgwood@canonical.com>
112+
113+import subprocess
114+import pwd
115+import grp
116+import os
117+import re
118+import shlex
119+import yaml
120+
121+from charmhelpers.core.hookenv import (
122+ config,
123+ local_unit,
124+ log,
125+ relation_ids,
126+ relation_set,
127+)
128+
129+from charmhelpers.core.host import service
130+
131+# This module adds compatibility with the nrpe-external-master and plain nrpe
132+# subordinate charms. To use it in your charm:
133+#
134+# 1. Update metadata.yaml
135+#
136+# provides:
137+# (...)
138+# nrpe-external-master:
139+# interface: nrpe-external-master
140+# scope: container
141+#
142+# and/or
143+#
144+# provides:
145+# (...)
146+# local-monitors:
147+# interface: local-monitors
148+# scope: container
149+
150+#
151+# 2. Add the following to config.yaml
152+#
153+# nagios_context:
154+# default: "juju"
155+# type: string
156+# description: |
157+# Used by the nrpe subordinate charms.
158+# A string that will be prepended to instance name to set the host name
159+# in nagios. So for instance the hostname would be something like:
160+# juju-myservice-0
161+# If you're running multiple environments with the same services in them
162+# this allows you to differentiate between them.
163+#
164+# 3. Add custom checks (Nagios plugins) to files/nrpe-external-master
165+#
166+# 4. Update your hooks.py with something like this:
167+#
168+# from charmsupport.nrpe import NRPE
169+# (...)
170+# def update_nrpe_config():
171+# nrpe_compat = NRPE()
172+# nrpe_compat.add_check(
173+# shortname = "myservice",
174+# description = "Check MyService",
175+# check_cmd = "check_http -w 2 -c 10 http://localhost"
176+# )
177+# nrpe_compat.add_check(
178+# "myservice_other",
179+# "Check for widget failures",
180+# check_cmd = "/srv/myapp/scripts/widget_check"
181+# )
182+# nrpe_compat.write()
183+#
184+# def config_changed():
185+# (...)
186+# update_nrpe_config()
187+#
188+# def nrpe_external_master_relation_changed():
189+# update_nrpe_config()
190+#
191+# def local_monitors_relation_changed():
192+# update_nrpe_config()
193+#
194+# 5. ln -s hooks.py nrpe-external-master-relation-changed
195+# ln -s hooks.py local-monitors-relation-changed
196+
197+
198+class CheckException(Exception):
199+ pass
200+
201+
202+class Check(object):
203+ shortname_re = '[A-Za-z0-9-_]+$'
204+ service_template = ("""
205+#---------------------------------------------------
206+# This file is Juju managed
207+#---------------------------------------------------
208+define service {{
209+ use active-service
210+ host_name {nagios_hostname}
211+ service_description {nagios_hostname}[{shortname}] """
212+ """{description}
213+ check_command check_nrpe!{command}
214+ servicegroups {nagios_servicegroup}
215+}}
216+""")
217+
218+ def __init__(self, shortname, description, check_cmd):
219+ super(Check, self).__init__()
220+ # XXX: could be better to calculate this from the service name
221+ if not re.match(self.shortname_re, shortname):
222+ raise CheckException("shortname must match {}".format(
223+ Check.shortname_re))
224+ self.shortname = shortname
225+ self.command = "check_{}".format(shortname)
226+ # Note: a set of invalid characters is defined by the
227+ # Nagios server config
228+ # The default is: illegal_object_name_chars=`~!$%^&*"|'<>?,()=
229+ self.description = description
230+ self.check_cmd = self._locate_cmd(check_cmd)
231+
232+ def _locate_cmd(self, check_cmd):
233+ search_path = (
234+ '/usr/lib/nagios/plugins',
235+ '/usr/local/lib/nagios/plugins',
236+ )
237+ parts = shlex.split(check_cmd)
238+ for path in search_path:
239+ if os.path.exists(os.path.join(path, parts[0])):
240+ command = os.path.join(path, parts[0])
241+ if len(parts) > 1:
242+ command += " " + " ".join(parts[1:])
243+ return command
244+ log('Check command not found: {}'.format(parts[0]))
245+ return ''
246+
247+ def write(self, nagios_context, hostname):
248+ nrpe_check_file = '/etc/nagios/nrpe.d/{}.cfg'.format(
249+ self.command)
250+ with open(nrpe_check_file, 'w') as nrpe_check_config:
251+ nrpe_check_config.write("# check {}\n".format(self.shortname))
252+ nrpe_check_config.write("command[{}]={}\n".format(
253+ self.command, self.check_cmd))
254+
255+ if not os.path.exists(NRPE.nagios_exportdir):
256+ log('Not writing service config as {} is not accessible'.format(
257+ NRPE.nagios_exportdir))
258+ else:
259+ self.write_service_config(nagios_context, hostname)
260+
261+ def write_service_config(self, nagios_context, hostname):
262+ for f in os.listdir(NRPE.nagios_exportdir):
263+ if re.search('.*{}.cfg'.format(self.command), f):
264+ os.remove(os.path.join(NRPE.nagios_exportdir, f))
265+
266+ templ_vars = {
267+ 'nagios_hostname': hostname,
268+ 'nagios_servicegroup': nagios_context,
269+ 'description': self.description,
270+ 'shortname': self.shortname,
271+ 'command': self.command,
272+ }
273+ nrpe_service_text = Check.service_template.format(**templ_vars)
274+ nrpe_service_file = '{}/service__{}_{}.cfg'.format(
275+ NRPE.nagios_exportdir, hostname, self.command)
276+ with open(nrpe_service_file, 'w') as nrpe_service_config:
277+ nrpe_service_config.write(str(nrpe_service_text))
278+
279+ def run(self):
280+ subprocess.call(shlex.split(self.check_cmd))
281+
282+
283+class NRPE(object):
284+ nagios_logdir = '/var/log/nagios'
285+ nagios_exportdir = '/var/lib/nagios/export'
286+ nrpe_confdir = '/etc/nagios/nrpe.d'
287+
288+ def __init__(self, hostname=None):
289+ super(NRPE, self).__init__()
290+ self.config = config()
291+ self.nagios_context = self.config['nagios_context']
292+ self.unit_name = local_unit().replace('/', '-')
293+ if hostname:
294+ self.hostname = hostname
295+ else:
296+ self.hostname = "{}-{}".format(self.nagios_context, self.unit_name)
297+ self.checks = []
298+
299+ def add_check(self, *args, **kwargs):
300+ self.checks.append(Check(*args, **kwargs))
301+
302+ def write(self):
303+ try:
304+ nagios_uid = pwd.getpwnam('nagios').pw_uid
305+ nagios_gid = grp.getgrnam('nagios').gr_gid
306+ except KeyError:
307+ log("Nagios user not set up, nrpe checks not updated")
308+ return
309+
310+ if not os.path.exists(NRPE.nagios_logdir):
311+ os.mkdir(NRPE.nagios_logdir)
312+ os.chown(NRPE.nagios_logdir, nagios_uid, nagios_gid)
313+
314+ nrpe_monitors = {}
315+ monitors = {"monitors": {"remote": {"nrpe": nrpe_monitors}}}
316+ for nrpecheck in self.checks:
317+ nrpecheck.write(self.nagios_context, self.hostname)
318+ nrpe_monitors[nrpecheck.shortname] = {
319+ "command": nrpecheck.command,
320+ }
321+
322+ service('restart', 'nagios-nrpe-server')
323+
324+ for rid in relation_ids("local-monitors"):
325+ relation_set(relation_id=rid, monitors=yaml.dump(monitors))
326
327=== added file 'hooks/charmhelpers/contrib/charmsupport/volumes.py'
328--- hooks/charmhelpers/contrib/charmsupport/volumes.py 1970-01-01 00:00:00 +0000
329+++ hooks/charmhelpers/contrib/charmsupport/volumes.py 2014-06-02 09:35:59 +0000
330@@ -0,0 +1,156 @@
331+'''
332+Functions for managing volumes in juju units. One volume is supported per unit.
333+Subordinates may have their own storage, provided it is on its own partition.
334+
335+Configuration stanzas:
336+ volume-ephemeral:
337+ type: boolean
338+ default: true
339+ description: >
340+ If false, a volume is mounted as specified in "volume-map"
341+ If true, ephemeral storage will be used, meaning that log data
342+ will only exist as long as the machine. YOU HAVE BEEN WARNED.
343+ volume-map:
344+ type: string
345+ default: {}
346+ description: >
347+ YAML map of units to device names, e.g:
348+ "{ rsyslog/0: /dev/vdb, rsyslog/1: /dev/vdb }"
349+ Service units will raise a configure-error if volume-ephemeral
350+ is 'false' and no volume-map value is set. Use 'juju set' to set a
351+ value and 'juju resolved' to complete configuration.
352+
353+Usage:
354+ from charmsupport.volumes import configure_volume, VolumeConfigurationError
355+ from charmsupport.hookenv import log, ERROR
356+ def pre_mount_hook():
357+ stop_service('myservice')
358+ def post_mount_hook():
359+ start_service('myservice')
360+
361+ if __name__ == '__main__':
362+ try:
363+ configure_volume(before_change=pre_mount_hook,
364+ after_change=post_mount_hook)
365+ except VolumeConfigurationError:
366+ log('Storage could not be configured', ERROR)
367+'''
368+
369+# XXX: Known limitations
370+# - fstab is neither consulted nor updated
371+
372+import os
373+from charmhelpers.core import hookenv
374+from charmhelpers.core import host
375+import yaml
376+
377+
378+MOUNT_BASE = '/srv/juju/volumes'
379+
380+
381+class VolumeConfigurationError(Exception):
382+ '''Volume configuration data is missing or invalid'''
383+ pass
384+
385+
386+def get_config():
387+ '''Gather and sanity-check volume configuration data'''
388+ volume_config = {}
389+ config = hookenv.config()
390+
391+ errors = False
392+
393+ if config.get('volume-ephemeral') in (True, 'True', 'true', 'Yes', 'yes'):
394+ volume_config['ephemeral'] = True
395+ else:
396+ volume_config['ephemeral'] = False
397+
398+ try:
399+ volume_map = yaml.safe_load(config.get('volume-map', '{}'))
400+ except yaml.YAMLError as e:
401+ hookenv.log("Error parsing YAML volume-map: {}".format(e),
402+ hookenv.ERROR)
403+ errors = True
404+ if volume_map is None:
405+ # probably an empty string
406+ volume_map = {}
407+ elif not isinstance(volume_map, dict):
408+ hookenv.log("Volume-map should be a dictionary, not {}".format(
409+ type(volume_map)))
410+ errors = True
411+
412+ volume_config['device'] = volume_map.get(os.environ['JUJU_UNIT_NAME'])
413+ if volume_config['device'] and volume_config['ephemeral']:
414+ # asked for ephemeral storage but also defined a volume ID
415+ hookenv.log('A volume is defined for this unit, but ephemeral '
416+ 'storage was requested', hookenv.ERROR)
417+ errors = True
418+ elif not volume_config['device'] and not volume_config['ephemeral']:
419+ # asked for permanent storage but did not define volume ID
420+ hookenv.log('Persistent storage was requested, but there is no volume '
421+ 'defined for this unit.', hookenv.ERROR)
422+ errors = True
423+
424+ unit_mount_name = hookenv.local_unit().replace('/', '-')
425+ volume_config['mountpoint'] = os.path.join(MOUNT_BASE, unit_mount_name)
426+
427+ if errors:
428+ return None
429+ return volume_config
430+
431+
432+def mount_volume(config):
433+ if os.path.exists(config['mountpoint']):
434+ if not os.path.isdir(config['mountpoint']):
435+ hookenv.log('Not a directory: {}'.format(config['mountpoint']))
436+ raise VolumeConfigurationError()
437+ else:
438+ host.mkdir(config['mountpoint'])
439+ if os.path.ismount(config['mountpoint']):
440+ unmount_volume(config)
441+ if not host.mount(config['device'], config['mountpoint'], persist=True):
442+ raise VolumeConfigurationError()
443+
444+
445+def unmount_volume(config):
446+ if os.path.ismount(config['mountpoint']):
447+ if not host.umount(config['mountpoint'], persist=True):
448+ raise VolumeConfigurationError()
449+
450+
451+def managed_mounts():
452+ '''List of all mounted managed volumes'''
453+ return filter(lambda mount: mount[0].startswith(MOUNT_BASE), host.mounts())
454+
455+
456+def configure_volume(before_change=lambda: None, after_change=lambda: None):
457+ '''Set up storage (or don't) according to the charm's volume configuration.
458+ Returns the mount point or "ephemeral". before_change and after_change
459+ are optional functions to be called if the volume configuration changes.
460+ '''
461+
462+ config = get_config()
463+ if not config:
464+ hookenv.log('Failed to read volume configuration', hookenv.CRITICAL)
465+ raise VolumeConfigurationError()
466+
467+ if config['ephemeral']:
468+ if os.path.ismount(config['mountpoint']):
469+ before_change()
470+ unmount_volume(config)
471+ after_change()
472+ return 'ephemeral'
473+ else:
474+ # persistent storage
475+ if os.path.ismount(config['mountpoint']):
476+ mounts = dict(managed_mounts())
477+ if mounts.get(config['mountpoint']) != config['device']:
478+ before_change()
479+ unmount_volume(config)
480+ mount_volume(config)
481+ after_change()
482+ else:
483+ before_change()
484+ mount_volume(config)
485+ after_change()
486+ return config['mountpoint']
487
488=== modified file 'hooks/hooks.py'
489--- hooks/hooks.py 2014-03-25 18:44:22 +0000
490+++ hooks/hooks.py 2014-06-02 09:35:59 +0000
491@@ -1,11 +1,12 @@
492 #!/usr/bin/python
493
494 #
495-# Copyright 2012 Canonical Ltd.
496+# Copyright 2012, 2014 Canonical Ltd.
497 #
498 # Authors:
499 # Paul Collins <paul.collins@canonical.com>
500 # James Page <james.page@ubuntu.com>
501+# Jacek Nykis <jacek.nykis@canonical.com>
502 #
503
504 import glob
505@@ -23,13 +24,16 @@
506 relation_set,
507 remote_unit,
508 Hooks, UnregisteredHookError,
509- service_name
510+ service_name,
511+ relations_of_type
512 )
513
514 from charmhelpers.core.host import (
515 service_restart,
516 umount,
517- mkdir
518+ mkdir,
519+ write_file,
520+ rsync
521 )
522 from charmhelpers.fetch import (
523 apt_install,
524@@ -39,6 +43,7 @@
525 )
526 from charmhelpers.payload.execd import execd_preinstall
527 from charmhelpers.contrib.openstack.alternatives import install_alternative
528+from charmhelpers.contrib.charmsupport.nrpe import NRPE
529
530 from utils import (
531 render_template,
532@@ -47,6 +52,11 @@
533
534 hooks = Hooks()
535
536+NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'
537+SCRIPTS_DIR = '/usr/local/bin'
538+STATUS_FILE = '/var/lib/nagios/cat-ceph-status.txt'
539+STATUS_CRONFILE = '/etc/cron.d/cat-ceph-health'
540+
541
542 def install_upstart_scripts():
543 # Only install upstart configurations for older versions
544@@ -128,6 +138,9 @@
545 reformat_osd())
546 ceph.start_osds(get_devices())
547
548+ if relations_of_type('nrpe-external-master'):
549+ nrpe_relation()
550+
551 log('End config-changed hook.')
552
553
554@@ -300,6 +313,37 @@
555 ceph.start_osds(get_devices())
556
557
558+@hooks.hook('nrpe-external-master-relation-joined')
559+@hooks.hook('nrpe-external-master-relation-changed')
560+def nrpe_relation():
561+ log('Refreshing nagios checks')
562+ if os.path.isdir(NAGIOS_PLUGINS):
563+ rsync(os.path.join(os.getenv('CHARM_DIR'), 'files', 'nagios',
564+ 'check_ceph_status.py'),
565+ os.path.join(NAGIOS_PLUGINS, 'check_ceph_status.py'))
566+
567+ script = os.path.join(SCRIPTS_DIR, 'collect_ceph_status.sh')
568+ rsync(os.path.join(os.getenv('CHARM_DIR'), 'files',
569+ 'nagios', 'collect_ceph_status.sh'),
570+ script)
571+ cronjob = "{} root {}\n".format('*/5 * * * *', script)
572+ write_file(STATUS_CRONFILE, cronjob)
573+
574+ # Find out if nrpe set nagios_hostname
575+ hostname = None
576+ for rel in relations_of_type('nrpe-external-master'):
577+ if 'nagios_hostname' in rel:
578+ hostname = rel['nagios_hostname']
579+ break
580+ nrpe = NRPE(hostname=hostname)
581+ nrpe.add_check(
582+ shortname="ceph",
583+ description='Check Ceph health',
584+ check_cmd='check_ceph_status.py -f {}'.format(STATUS_FILE)
585+ )
586+ nrpe.write()
587+
588+
589 if __name__ == '__main__':
590 try:
591 hooks.execute(sys.argv)
592
593=== added symlink 'hooks/nrpe-external-master-relation-changed'
594=== target is u'hooks.py'
595=== added symlink 'hooks/nrpe-external-master-relation-joined'
596=== target is u'hooks.py'
597=== modified file 'metadata.yaml'
598--- metadata.yaml 2013-07-14 19:46:24 +0000
599+++ metadata.yaml 2014-06-02 09:35:59 +0000
600@@ -16,3 +16,7 @@
601 interface: ceph-osd
602 radosgw:
603 interface: ceph-radosgw
604+ nrpe-external-master:
605+ interface: nrpe-external-master
606+ scope: container
607+ gets: [nagios_hostname, nagios_host_context]
608
609=== modified file 'revision'
610--- revision 2014-03-25 18:44:22 +0000
611+++ revision 2014-06-02 09:35:59 +0000
612@@ -1,1 +1,1 @@
613-105
614\ No newline at end of file
615+113
616\ No newline at end of file
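
The check script in this diff parses the plain-text output of `ceph status`. Regex groups are strings, so OSD counts need converting to integers before they are compared; as strings, '10' > '9' is False. The sketch below shows that parsing step in isolation. It is an illustration rather than the branch's code: the function names are hypothetical and `SAMPLE` is made-up output in the layout the script appears to expect.

```python
import re

OSD_RE = re.compile(r'(\d+) osds: (\d+) up, (\d+) in')

# Made-up sample in the "<key> <rest of line>" layout the script assumes.
SAMPLE = """\
    cluster 00000000-0000-0000-0000-000000000000
     health HEALTH_OK
     monmap e1: 3 mons at {a=10.0.0.1:6789/0}, election epoch 4, quorum 0 a
     osdmap e20: 10 osds: 9 up, 10 in
"""


def parse_status(text):
    """Map the first word of each line to the remainder of that line."""
    data = {}
    for line in text.splitlines():
        parts = line.strip().split(' ', 1)
        if len(parts) == 2:  # skip blank and single-word lines
            data[parts[0]] = parts[1]
    return data


def osd_counts(osdmap_line):
    """Return (total, up, in) as integers, or None if the line is unrecognised."""
    match = OSD_RE.search(osdmap_line)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


if __name__ == '__main__':
    status = parse_status(SAMPLE)
    total, up, _in = osd_counts(status['osdmap'])
    if total > up:
        print('CRITICAL: Some OSDs are not up. Total: {}, up: {}'.format(total, up))
```

Returning None for an unrecognised line lets the caller report UNKNOWN instead of failing with an AttributeError on a missing match.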
