Merge lp:~sinzui/charms/precise/juju-gui/nagios into lp:charms/juju-gui

Proposed by Curtis Hovey
Status: Merged
Merged at revision: 74
Proposed branch: lp:~sinzui/charms/precise/juju-gui/nagios
Merge into: lp:charms/juju-gui
Diff against target: 446 lines (+378/-3)
8 files modified
config.yaml (+11/-1)
files/nrpe-external-master/check-app-access.sh (+18/-0)
metadata.yaml (+3/-0)
revision (+1/-1)
scripts/charmsupport/hookenv.py (+150/-0)
scripts/charmsupport/nrpe.py (+169/-0)
scripts/update-nrpe.py (+14/-0)
tests/20-functional.test (+12/-1)
To merge this branch: bzr merge lp:~sinzui/charms/precise/juju-gui/nagios
Reviewer                  Review Type   Date Requested   Status
Benji York (community)                                   Approve
charmers                                                 Pending
Review via email: mp+177588@code.launchpad.net

Description of the change

Add nagios support.

RULES

    Pre-implementation: sidnei, moon127, wedgwood
    * We want to support the nagios subordinate charm to monitor production.
    * Webops prefer to use the nrpe module from charmsupport.
      * The project merged with charmhelpers (not related to the charm-helpers
        ppa that all PPA configured juju deploys get).
      * Webops write a custom script for each charm they deploy to embed the
        library in the charm; they always deploy a forked charm. The charm
        runs with a modified python path to find the included libs.
      * Webops require the charm to ensure errors installing dependencies
        are ignored since they have ensured the libs are already in the charm.
        * Or have they? Every change to the charm requires review by a webop
          to ensure the building of a fat charm really works.
    * I decided to include just the nrpe module and its dependencies in
      the charm's tree.
      * No action is needed to build out a fat charm.
      * No future coordination with webops to ensure the charm really
        installs its deps.
      * I chose an older version of nrpe because it has fewer deps than
        the current version. I chose the same version used by charmworld.
        I'd like to keep these two charms synced to simplify the parts used
        to deploy charmworld.com
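
A minimal sketch of the bundled-library idea above, assuming the hook is a symlink into the directory that holds both the script and the embedded package. The helper names `bundled_lib_dir` and `add_bundled_libs` are hypothetical and not part of this branch. CPython usually puts the resolved script directory on `sys.path` by itself, so this only makes that step explicit.

```python
import os
import sys


def bundled_lib_dir(script_path):
    """Return the real directory of a script, following symlinks."""
    return os.path.dirname(os.path.realpath(script_path))


def add_bundled_libs(script_path, path=None):
    """Prepend the script's real directory to a module search path.

    `path` defaults to sys.path; passing a list makes the helper easy
    to exercise without touching the interpreter's own search path.
    """
    path = sys.path if path is None else path
    lib_dir = bundled_lib_dir(script_path)
    if lib_dir not in path:
        path.insert(0, lib_dir)
    return lib_dir
```

A hook script could call `add_bundled_libs(__file__)` before importing the embedded package, which keeps the charm self-contained without a separate fat-charm build step.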

QA

    DEPLOY JUJU-GUI AND NAGIOS
    juju bootstrap
    juju deploy --repo=~/Work/juju-charms/ local:precise/juju-gui
    juju deploy nagios
    juju add-relation nagios juju-gui
    juju expose nagios
    Verify you can see juju-gui in nagios with a single ssh check

    DEPLOY NRPE-EXTERNAL-MASTER
    juju deploy nrpe-external-master
    juju set nrpe-external-master nagios_master=http://10.0.3.45/<nagio-public-ip>
    juju add-relation nrpe-external-master:nrpe-external-master juju-gui:nrpe-external-master

    Verify you can see the nagios external cfg files in juju-gui's file system.
    ls /var/lib/nagios/export/
    Verify the unit config "host__juju-juju-gui-0.cfg" has the nagios IP address.
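
The host name in that file name appears to follow from how `NRPE.__init__` in the diff combines `nagios_context` with the unit name. A small sketch of the same string handling, so the expected names can be worked out before running the QA steps. The function names and the `app_access` shortname are hypothetical; only the two format strings come from `scripts/charmsupport/nrpe.py`.

```python
def nagios_hostname(nagios_context, unit_name):
    """Build the Nagios host name the way NRPE.__init__ in the diff does."""
    return "{}-{}".format(nagios_context, unit_name.replace("/", "-"))


def service_config_name(hostname, shortname):
    """File name that Check.write uses for the exported service definition."""
    return "service__{}_check_{}.cfg".format(hostname, shortname)


if __name__ == "__main__":
    # Default nagios_context is "juju"; the first unit is "juju-gui/0".
    host = nagios_hostname("juju", "juju-gui/0")
    print(host)
    # "app_access" is an illustrative shortname, not the one in the branch.
    print(service_config_name(host, "app_access"))
```

With the default context and the first unit this gives `juju-juju-gui-0`, which matches the name in the QA step.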

IMPLEMENTATION

    * I added just the parts of the nrpe library needed.
      scripts/charmsupport/__init__.py
      scripts/charmsupport/hookenv.py
      scripts/charmsupport/nrpe.py

    * I added a hook that calls the update-nrpe.py script that in turn
      registers the check-front-page.sh script. nrpe requires the
      check scripts to be in files/nrpe-external-master/.
      hooks/nrpe-external-master-relation-changed
      scripts/update-nrpe.py
      files/nrpe-external-master/check-front-page.sh

    * I updated the metadata and config to enable the subordinate charm.
      config.yaml
      metadata.yaml
      revision

78. By Curtis Hovey

Merged tip and resolved conflicts.

79. By Curtis Hovey

Merged tip.

Benji York (benji) wrote :

Thanks for wiring this up. Monitoring is essential.

Given that this MP doesn't have Rietveld links, I assume you didn't use
lbox to propose it. Despite its many annoyances, lbox is the
prescribed way of proposing and landing GUI branches. The most
important thing it does is to run "make check" before proposing a
branch. Please manually run that command before landing. Thanks.

> I decided to include just the nrpe module and its dependencies in the
> charm's tree.

It's a shame that our policies push us toward doing things like copying
chunks of code from outdated dependencies, but I can't proffer anything
better.

In check-front-page.sh (line 31 of the diff) shouldn't it be "sites-enabled"
instead of "sites-available"?

Also, I suggest using "https://127.0.0.1:443/juju-ui/version.js" for ADDRESS
and "jujuGuiVersionInfo" for LIFE_SIGN. Being code, that is less likely
to change because of a UI/UX redesign and we can write a test for it
that ensures if it does change a test named "testMonitoringEndpoint"
will fail. Oh, and we need that test in this branch too.

Once the above are addressed, this will be ready to land.

review: Approve
80. By Curtis Hovey

change script to check-app-access.sh because it now checks that the
GUI can be downloaded.
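
The logic of the renamed check can be sketched in Python as follows. This is an illustration only, not the shell script in the diff: the function names are hypothetical, and unlike the script it prints a message on success. The exit codes follow the Nagios plugin convention (0 is OK, 2 is CRITICAL), and certificate verification is disabled to mirror `curl -k`.

```python
import ssl
import urllib.request

# Nagios plugin exit codes used by the check.
OK = 0
CRITICAL = 2

# Values taken from check-app-access.sh in the diff.
ADDRESS = "https://127.0.0.1:443/juju-ui/version.js"
LIFE_SIGN = "jujuGuiVersionInfo"


def evaluate(body, life_sign=LIFE_SIGN):
    """Map a response body to a (status, message) pair."""
    if life_sign in body:
        return OK, "juju-gui is serving {}".format(life_sign)
    return CRITICAL, "juju-gui did not return content indicating it was loading."


def fetch(address=ADDRESS, timeout=10):
    """Download the address without verifying the certificate (like curl -k)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(address, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main():
    """Run the check once and return the Nagios exit code."""
    try:
        body = fetch()
    except OSError as err:  # urllib's URLError is a subclass of OSError
        print("juju-gui is not reachable: {}".format(err))
        return CRITICAL
    status, message = evaluate(body)
    print(message)
    return status
```

A standalone plugin would finish with `sys.exit(main())`. Keeping `evaluate` separate from `fetch` lets the life-sign match be tested without a running Apache.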

81. By Curtis Hovey

Added test to correlate check-app-access.sh to the path it checks.
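
One possible way to tie such a test to the script more tightly, sketched under the assumption that the script keeps a single-quoted `ADDRESS='...'` assignment on its own line: read the path out of the script text instead of repeating it in the test. The helper `address_path` is hypothetical and is not what the branch does; the functional test in the diff hard-codes the path.

```python
import re
from urllib.parse import urlparse

# Matches a line such as: ADDRESS='https://127.0.0.1:443/juju-ui/version.js'
ADDRESS_RE = re.compile(r"^ADDRESS='([^']+)'", re.MULTILINE)


def address_path(script_text):
    """Return the URL path of the ADDRESS variable in a check script."""
    match = ADDRESS_RE.search(script_text)
    if match is None:
        raise ValueError("no ADDRESS assignment found")
    return urlparse(match.group(1)).path
```

A test could then request `address_path(open(script).read())` from the deployed unit, so a change to ADDRESS in the script is exercised automatically.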

Gary Poster (gary) wrote :

Hey Curtis. Thank you! Is it reasonable/easy to merge these revisions into the ~juju-gui charm branch as well, or are there conflicts?

Preview Diff

1=== modified file 'config.yaml'
2--- config.yaml 2013-07-15 20:40:15 +0000
3+++ config.yaml 2013-08-08 17:05:15 +0000
4@@ -159,7 +159,7 @@
5 canvas. This is also known as browse mode.
6 - 'minimized': the charmbrowser will be minimized by default, and hidden.
7 type: string
8- default: sidebar
9+ default: sidebar
10 show-get-juju-button:
11 description: |
12 There are deployment modes for Juju GUI which are not intended as regular
13@@ -167,3 +167,13 @@
14 link to juju.ubuntu.com
15 type: boolean
16 default: false
17+ nagios_context:
18+ default: "juju"
19+ type: string
20+ description: |
21+ Used by the nrpe-external-master subordinate charm.
22+ A string that will be prepended to instance name to set the host name
23+ in nagios. So for instance the hostname would be something like:
24+ juju-myservice-0
25+ If you're running multiple environments with the same services in them
26+ this allows you to differentiate between them.
27
28=== added directory 'files'
29=== added directory 'files/nrpe-external-master'
30=== added file 'files/nrpe-external-master/check-app-access.sh'
31--- files/nrpe-external-master/check-app-access.sh 1970-01-01 00:00:00 +0000
32+++ files/nrpe-external-master/check-app-access.sh 2013-08-08 17:05:15 +0000
33@@ -0,0 +1,18 @@
34+#!/bin/bash
35+SITE_CONF='/etc/apache2/sites-enabled/juju-gui'
36+ADDRESS='https://127.0.0.1:443/juju-ui/version.js'
37+LIFE_SIGN='jujuGuiVersionInfo'
38+
39+if [[ ! -f $SITE_CONF ]]; then
40+ echo Apache is not configured serve juju-gui.
41+ exit 2
42+fi
43+
44+match=$(curl -k $ADDRESS | grep "$LIFE_SIGN")
45+
46+if [[ -n "$match" ]]; then
47+ exit 0
48+else
49+ echo juju-gui did not return content indicating it was loading.
50+ exit 2
51+fi
52
53=== added symlink 'hooks/nrpe-external-master-relation-changed'
54=== target is u'../scripts/update-nrpe.py'
55=== modified file 'metadata.yaml'
56--- metadata.yaml 2013-06-11 14:13:45 +0000
57+++ metadata.yaml 2013-08-08 17:05:15 +0000
58@@ -22,3 +22,6 @@
59 provides:
60 web:
61 interface: http
62+ nrpe-external-master:
63+ interface: nrpe-external-master
64+ scope: container
65
66=== modified file 'revision'
67--- revision 2013-08-02 14:19:45 +0000
68+++ revision 2013-08-08 17:05:15 +0000
69@@ -1,1 +1,1 @@
70-65
71+66
72
73=== added directory 'scripts'
74=== added directory 'scripts/charmsupport'
75=== added file 'scripts/charmsupport/__init__.py'
76=== added file 'scripts/charmsupport/hookenv.py'
77--- scripts/charmsupport/hookenv.py 1970-01-01 00:00:00 +0000
78+++ scripts/charmsupport/hookenv.py 2013-08-08 17:05:15 +0000
79@@ -0,0 +1,150 @@
80+"Interactions with the Juju environment"
81+# source: 27:lp:charmsupport
82+# Copyright 2012 Canonical Ltd.
83+#
84+# Authors:
85+# Matthew Wedgwood <matthew.wedgwood@canonical.com>
86+
87+import os
88+import json
89+import yaml
90+import subprocess
91+
92+CRITICAL = "CRITICAL"
93+ERROR = "ERROR"
94+WARNING = "WARNING"
95+INFO = "INFO"
96+DEBUG = "DEBUG"
97+def log(message, level=DEBUG):
98+ "Write a message to the juju log"
99+ subprocess.call( [ 'juju-log', '-l', level, message ] )
100+
101+class Serializable(object):
102+ "Wrapper, an object that can be serialized to yaml or json"
103+ def __init__(self, obj):
104+ # wrap the object
105+ super(Serializable, self).__init__()
106+ self._wrapped_obj = obj
107+
108+ def __getattr__(self, attr):
109+ # see if this object has attr
110+ if attr in self.__dict__:
111+ return getattr(self, attr)
112+ # proxy to the wrapped object
113+ return self[attr]
114+
115+ def __getitem__(self, key):
116+ return self._wrapped_obj[key]
117+
118+ def json(self):
119+ "Serialize the object to json"
120+ return json.dumps(self._wrapped_obj)
121+
122+ def yaml(self):
123+ "Serialize the object to yaml"
124+ return yaml.dump(self._wrapped_obj)
125+
126+def execution_environment():
127+ """A convenient bundling of the current execution context"""
128+ context = {}
129+ context['conf'] = config()
130+ context['unit'] = local_unit()
131+ context['rel'] = relations_of_type()
132+ context['env'] = os.environ
133+ return context
134+
135+def in_relation_hook():
136+ "Determine whether we're running in a relation hook"
137+ return os.environ.has_key('JUJU_RELATION')
138+
139+def relation_type():
140+ "The scope for the current relation hook"
141+ return os.environ['JUJU_RELATION']
142+def relation_id():
143+ "The relation ID for the current relation hook"
144+ return os.environ['JUJU_RELATION_ID']
145+def local_unit():
146+ "Local unit ID"
147+ return os.environ['JUJU_UNIT_NAME']
148+def remote_unit():
149+ "The remote unit for the current relation hook"
150+ return os.environ['JUJU_REMOTE_UNIT']
151+
152+def config(scope=None):
153+ "Juju charm configuration"
154+ config_cmd_line = ['config-get']
155+ if scope is not None:
156+ config_cmd_line.append(scope)
157+ config_cmd_line.append('--format=json')
158+ try:
159+ config_data = json.loads(subprocess.check_output(config_cmd_line))
160+ except (ValueError, OSError, subprocess.CalledProcessError) as err:
161+ log(str(err), level=ERROR)
162+ raise err
163+ return Serializable(config_data)
164+
165+def relation_ids(reltype=None):
166+ "A list of relation_ids"
167+ reltype = reltype or relation_type()
168+ relids = []
169+ relid_cmd_line = ['relation-ids', '--format=json', reltype]
170+ relids.extend(json.loads(subprocess.check_output(relid_cmd_line)))
171+ return relids
172+
173+def related_units(relid=None):
174+ "A list of related units"
175+ relid = relid or relation_id()
176+ units_cmd_line = ['relation-list', '--format=json', '-r', relid]
177+ units = json.loads(subprocess.check_output(units_cmd_line))
178+ return units
179+
180+def relation_for_unit(unit=None):
181+ "Get the json represenation of a unit's relation"
182+ unit = unit or remote_unit()
183+ relation_cmd_line = ['relation-get', '--format=json', '-', unit]
184+ try:
185+ relation = json.loads(subprocess.check_output(relation_cmd_line))
186+ except (ValueError, OSError, subprocess.CalledProcessError), err:
187+ log(str(err), level=ERROR)
188+ raise err
189+ for key in relation:
190+ if key.endswith('-list'):
191+ relation[key] = relation[key].split()
192+ relation['__unit__'] = unit
193+ return Serializable(relation)
194+
195+def relations_for_id(relid=None):
196+ "Get relations of a specific relation ID"
197+ relation_data = []
198+ relid = relid or relation_ids()
199+ for unit in related_units(relid):
200+ unit_data = relation_for_unit(unit)
201+ unit_data['__relid__'] = relid
202+ relation_data.append(unit_data)
203+ return relation_data
204+
205+def relations_of_type(reltype=None):
206+ "Get relations of a specific type"
207+ relation_data = []
208+ if in_relation_hook():
209+ reltype = reltype or relation_type()
210+ for relid in relation_ids(reltype):
211+ for relation in relations_for_id(relid):
212+ relation['__relid__'] = relid
213+ relation_data.append(relation)
214+ return relation_data
215+
216+class UnregisteredHookError(Exception): pass
217+
218+class Hooks(object):
219+ def __init__(self):
220+ super(Hooks, self).__init__()
221+ self._hooks = {}
222+ def register(self, name, function):
223+ self._hooks[name] = function
224+ def execute(self, args):
225+ hook_name = os.path.basename(args[0])
226+ if hook_name in self._hooks:
227+ self._hooks[hook_name]()
228+ else:
229+ raise UnregisteredHookError(hook_name)
230
231=== added file 'scripts/charmsupport/nrpe.py'
232--- scripts/charmsupport/nrpe.py 1970-01-01 00:00:00 +0000
233+++ scripts/charmsupport/nrpe.py 2013-08-08 17:05:15 +0000
234@@ -0,0 +1,169 @@
235+"""Compatibility with the nrpe-external-master charm"""
236+# source: 27:lp:charmsupport
237+# Copyright 2012 Canonical Ltd.
238+#
239+# Authors:
240+# Matthew Wedgwood <matthew.wedgwood@canonical.com>
241+
242+import subprocess
243+import pwd
244+import grp
245+import os
246+import re
247+import shlex
248+
249+from hookenv import config, local_unit
250+
251+# This module adds compatibility with the nrpe_external_master
252+# subordinate charm. To use it in your charm:
253+#
254+# 1. Update metadata.yaml
255+#
256+# provides:
257+# (...)
258+# nrpe-external-master:
259+# interface: nrpe-external-master
260+# scope: container
261+#
262+# 2. Add the following to config.yaml
263+#
264+# nagios_context:
265+# default: "juju"
266+# type: string
267+# description: |
268+# Used by the nrpe-external-master subordinate charm.
269+# A string that will be prepended to instance name to set the host name
270+# in nagios. So for instance the hostname would be something like:
271+# juju-myservice-0
272+# If you're running multiple environments with the same services in them
273+# this allows you to differentiate between them.
274+#
275+# 3. Add custom checks (Nagios plugins) to files/nrpe-external-master
276+#
277+# 4. Update your hooks.py with something like this:
278+#
279+# import nrpe
280+# (...)
281+# def update_nrpe_config():
282+# nrpe_compat = NRPE("myservice")
283+# nrpe_compat.add_check(
284+# shortname = "myservice",
285+# description = "Check MyService",
286+# check_cmd = "check_http -w 2 -c 10 http://localhost"
287+# )
288+# nrpe_compat.add_check(
289+# "myservice_other",
290+# "Check for widget failures",
291+# check_cmd = "/srv/myapp/scripts/widget_check"
292+# )
293+# nrpe_compat.write()
294+#
295+# def config_changed():
296+# (...)
297+# update_nrpe_config()
298+# def nrpe_external_master_relation_changed():
299+# update_nrpe_config()
300+#
301+# 5. ln -s hooks.py nrpe-external-master-relation-changed
302+
303+class CheckException(Exception): pass
304+class Check(object):
305+ shortname_re = '[A-Za-z0-9-_]*'
306+ service_template = """
307+#---------------------------------------------------
308+# This file is Juju managed
309+#---------------------------------------------------
310+define service {{
311+ use active-service
312+ host_name {nagios_hostname}
313+ service_description {nagios_hostname}[{shortname}] {description}
314+ check_command check_nrpe!check_{shortname}
315+ servicegroups {nagios_servicegroup}
316+}}
317+"""
318+ def __init__(self, shortname, description, check_cmd):
319+ super(Check, self).__init__()
320+ # XXX: could be better to calculate this from the service name
321+ if not re.match(self.shortname_re, shortname):
322+ raise CheckException("shortname must match {}".format(Check.shortname_re))
323+ self.shortname = shortname
324+ # Note: a set of invalid characters is defined by the Nagios server config
325+ # The default is: illegal_object_name_chars=`~!$%^&*"|'<>?,()=
326+ self.description = description
327+ self.check_cmd = self._locate_cmd(check_cmd)
328+
329+ def _locate_cmd(self, check_cmd):
330+ search_path = (
331+ '/',
332+ os.path.join(os.environ['CHARM_DIR'], 'files/nrpe-external-master'),
333+ '/usr/lib/nagios/plugins',
334+ )
335+ command = shlex.split(check_cmd)
336+ for path in search_path:
337+ if os.path.exists(os.path.join(path,command[0])):
338+ return os.path.join(path, command[0]) + " " + " ".join(command[1:])
339+ subprocess.call(['juju-log', 'Check command not found: {}'.format(command[0])])
340+ return ''
341+
342+ def write(self, nagios_context, hostname):
343+ for f in os.listdir(NRPE.nagios_exportdir):
344+ if re.search('.*check_{}.cfg'.format(self.shortname), f):
345+ os.remove(os.path.join(NRPE.nagios_exportdir, f))
346+
347+ templ_vars = {
348+ 'nagios_hostname': hostname,
349+ 'nagios_servicegroup': nagios_context,
350+ 'description': self.description,
351+ 'shortname': self.shortname,
352+ }
353+ nrpe_service_text = Check.service_template.format(**templ_vars)
354+ nrpe_service_file = '{}/service__{}_check_{}.cfg'.format(
355+ NRPE.nagios_exportdir, hostname, self.shortname)
356+ with open(nrpe_service_file, 'w') as nrpe_service_config:
357+ nrpe_service_config.write(str(nrpe_service_text))
358+
359+ nrpe_check_file = '/etc/nagios/nrpe.d/check_{}.cfg'.format(self.shortname)
360+ with open(nrpe_check_file, 'w') as nrpe_check_config:
361+ nrpe_check_config.write("# check {}\n".format(self.shortname))
362+ nrpe_check_config.write("command[check_{}]={}\n".format(
363+ self.shortname, self.check_cmd))
364+
365+ def run(self):
366+ subprocess.call(self.check_cmd)
367+
368+class NRPE(object):
369+ nagios_logdir = '/var/log/nagios'
370+ nagios_exportdir = '/var/lib/nagios/export'
371+ nrpe_confdir = '/etc/nagios/nrpe.d'
372+ def __init__(self):
373+ super(NRPE, self).__init__()
374+ self.config = config()
375+ self.nagios_context = self.config['nagios_context']
376+ self.unit_name = local_unit().replace('/', '-')
377+ self.hostname = "{}-{}".format(self.nagios_context, self.unit_name)
378+ self.checks = []
379+
380+ def add_check(self, *args, **kwargs):
381+ self.checks.append( Check(*args, **kwargs) )
382+
383+ def write(self):
384+ try:
385+ nagios_uid = pwd.getpwnam('nagios').pw_uid
386+ nagios_gid = grp.getgrnam('nagios').gr_gid
387+ except:
388+ subprocess.call(['juju-log', "Nagios user not set up, nrpe checks not updated"])
389+ return
390+
391+ if not os.path.exists(NRPE.nagios_exportdir):
392+ subprocess.call(['juju-log', 'Exiting as {} is not accessible'.format(NRPE.nagios_exportdir)])
393+ return
394+
395+ if not os.path.exists(NRPE.nagios_logdir):
396+ os.mkdir(NRPE.nagios_logdir)
397+ os.chown(NRPE.nagios_logdir, nagios_uid, nagios_gid)
398+
399+ for nrpecheck in self.checks:
400+ nrpecheck.write(self.nagios_context, self.hostname)
401+
402+ if os.path.isfile('/etc/init.d/nagios-nrpe-server'):
403+ subprocess.call(['service', 'nagios-nrpe-server', 'reload'])
404
405=== added file 'scripts/update-nrpe.py'
406--- scripts/update-nrpe.py 1970-01-01 00:00:00 +0000
407+++ scripts/update-nrpe.py 2013-08-08 17:05:15 +0000
408@@ -0,0 +1,14 @@
409+#!/usr/bin/env python
410+from charmsupport import nrpe
411+
412+
413+def update_nrpe_config():
414+ nrpe_compat = nrpe.NRPE()
415+ nrpe_compat.add_check(
416+ 'App is accessible', 'Check the app can be downloaded',
417+ 'check-app-access.sh')
418+ nrpe_compat.write()
419+
420+
421+if __name__ == '__main__':
422+ update_nrpe_config()
423
424=== modified file 'tests/20-functional.test'
425--- tests/20-functional.test 2013-08-01 14:33:00 +0000
426+++ tests/20-functional.test 2013-08-08 17:05:15 +0000
427@@ -202,7 +202,18 @@
428 self.assertIn('max-age=0', cache_directives)
429 self.assertIn('public', cache_directives)
430 self.assertIn('must-revalidate', cache_directives)
431-
432+
433+ def test_nrpe_check_available(self):
434+ # Make sure the check-app-access.sh script's ADDRESS is available.
435+ unit_info = self.juju_deploy(
436+ self.charm, options={'juju-gui-source': JUJU_GUI_TEST_BRANCH})
437+ hostname = unit_info['public-address']
438+ conn = httplib.HTTPSConnection(hostname)
439+ # This request matches the ADDRESS var in the script.
440+ conn.request('GET', '/juju-ui/version.js')
441+ message = 'ADDRESS in check-app-access.sh is not accessible.'
442+ self.assertEqual(200, conn.getresponse().status, message)
443+
444
445 @unittest.skipIf(is_legacy_juju, 'force-machine only works in juju-core')
446 def test_force_machine(self):
