Merge lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring into lp:charms/rabbitmq-server

Proposed by Jacek Nykis
Status: Merged
Merged at revision: 94
Proposed branch: lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring
Merge into: lp:charms/rabbitmq-server
Prerequisite: lp:~jacekn/charms/precise/rabbitmq-server/nrpe-fix
Diff against target: 271 lines (+194/-5)
5 files modified
config.yaml (+14/-0)
hooks/rabbitmq_server_relations.py (+31/-4)
revision (+1/-1)
scripts/check_rabbitmq_queues.py (+99/-0)
scripts/collect_rabbitmq_stats.sh (+49/-0)
To merge this branch: bzr merge lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring
Reviewers and review status:
Stuart Bishop (community): Needs Fixing
Matt Bruzek (community): Needs Fixing
Jacek Nykis (community): Needs Resubmitting
Review via email: mp+218580@code.launchpad.net

Description of the change

This change adds support for queue monitoring by nagios.

58. By Jacek Nykis

Add quotes to the rabbitmq thresholds to allow wildcards

Matt Bruzek (mbruzek) wrote :

Jacek,

Thanks for this work on the rabbitmq-server! I realize monitoring is important for production environments and am looking forward to getting the monitoring code in the charm.

When I run charm proof on the merged code there are two new Warning messages:
$ charm proof
W: config.yaml: option stats_cron_schedule does not have the keys: default
W: config.yaml: option queue_thresholds does not have the keys: default

These new configuration values should have default values in the config.yaml file. You can have a default of “” or (I believe) NoneType is valid.
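As a rough illustration of what these warnings are about, the sketch below flags options that lack a `default` key. It is not the actual `charm proof` implementation: the `CONFIG` dict is a hypothetical stand-in for the parsed config.yaml, and `options_missing_default` is a made-up helper name.

```python
# Hypothetical stand-in for the parsed contents of config.yaml.
# Only the two new options are shown; the first one has no default yet.
CONFIG = {
    "options": {
        "stats_cron_schedule": {
            "type": "string",
            "description": "Cron schedule used to generate rabbitmq stats.",
        },
        "queue_thresholds": {
            "type": "string",
            "default": "",
            "description": "List of RabbitMQ queue size check thresholds.",
        },
    }
}


def options_missing_default(config):
    """Return the sorted names of options that have no 'default' key."""
    return sorted(
        name
        for name, spec in config["options"].items()
        if "default" not in spec
    )


for name in options_missing_default(CONFIG):
    # Message format copied from the warnings quoted above.
    print("W: config.yaml: option %s does not have the keys: default" % name)
```

An empty string counts as a default here, which is why only `stats_cron_schedule` is reported in this example.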

Thanks for chatting with me on IRC. The steps to test this were as follows:

juju deploy --repository=. local:precise/rabbitmq-server
juju deploy nrpe-external-master
juju add-relation rabbitmq-server nrpe-external-master

juju set rabbitmq-server stats_cron_schedule="*/5 * * * *"
juju set rabbitmq-server queue_thresholds="[['\*', '\*', 15, 30]]"

(Wait 5 minutes for the queue data to be generated by cron)
juju ssh rabbitmq-server/0
$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/*_queue_stats.dat

The result of this command was:
ubuntu@mbruzek-local-machine-1:/var/lib/rabbitmq/data$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/mbruzek-local-machine-1_queue_stats.dat
CRITICALS: No Queues Found in No Vhosts Found has None messages

Normally this result would indicate a failure, but you mentioned on IRC that this is to be expected because nothing was connected to rabbitmq-server to generate messages. I was unable to figure out how to generate queued messages in rabbitmq-server for further testing.
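For reference, the queue_thresholds value maps onto the -c arguments of the check roughly as follows. This is a minimal Python 3 sketch that mirrors the loop in hooks/rabbitmq_server_relations.py from this proposal. `build_check_cmd` is a hypothetical helper name, the thresholds are passed as an already-parsed list instead of being loaded with yaml.safe_load, and the data file name is a placeholder.

```python
NAGIOS_PLUGINS = "/usr/local/lib/nagios/plugins"


def build_check_cmd(thresholds, stats_file):
    """Build the NRPE check command from [vhost, queue, warn, crit] entries."""
    cmd = ""
    for item in thresholds:
        # vhost and queue are quoted so that wildcards reach the plugin
        # unexpanded by the shell.
        cmd += ' -c "{}" "{}" {} {}'.format(*item)
    return "{}/check_rabbitmq_queues.py{} {}".format(
        NAGIOS_PLUGINS, cmd, stats_file)


# One wildcard entry: warn at 15 messages, critical at 30.
thresholds = [["*", "*", 15, 30]]
print(build_check_cmd(thresholds, "HOSTNAME_queue_stats.dat"))
```

Each threshold entry becomes one -c group, so several entries simply produce several -c groups on the same command line.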

Please provide default values for the two new configuration options and give me some more details on how to generate some rabbitmq traffic so we can see the command run successfully.

Thank you again for the submission! I am going to set this MP to "Needs Fixing".

Once the problem has been addressed, click on the “Request another review” link on this merge proposal. That way it will be added to the review queue properly.

If you have any questions/comments/concerns about the review contact mbruzek on IRC. You can find the rest of the team in #juju on irc.freenode.net or email the mailing list <email address hidden>

Matt Bruzek (mbruzek) :
review: Needs Fixing
59. By Jacek Nykis

Added default value (empty string) to stats_cron_schedule and queue_thresholds config options as per MP comment

Jacek Nykis (jacekn) wrote :

Hi Matt,

Thank you for reviewing the charm. I added default values (empty strings) and tested everything again.

If you want to verify the queue check, you can use the sample data below. These lines were generated by the same script as the one in the charm. You can also modify the 5th column to simulate queues filling up.

#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time
nagios-rabbitmq-server-0 test_exchange_queue 0 0 0 0 8952 1401271802
openstack ceilometer.collector 0 0 0 1 13984 1401271802
openstack ceilometer.collector.metering 0 0 0 1 14056 1401271802
openstack cert 0 0 0 1 13984 1401271802
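To try the 5th-column change without a live broker, the check logic can be approximated as below. This is a minimal Python 3 sketch, assuming the sample format above. `parse` and `evaluate` are hypothetical names, the code re-implements only the wildcard collation and threshold comparison (it is not the charm's check_rabbitmq_queues.py), and the 20 in the ceilometer.collector row is an edited value used to simulate a queue filling up.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Sample data from above, with the 5th column (Messages) of one row
# changed from 0 to 20 to simulate a queue filling up.
SAMPLE = """\
#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time
nagios-rabbitmq-server-0 test_exchange_queue 0 0 0 0 8952 1401271802
openstack ceilometer.collector 0 0 20 1 13984 1401271802
openstack ceilometer.collector.metering 0 0 0 1 14056 1401271802
openstack cert 0 0 0 1 13984 1401271802
"""


def parse(text):
    """Yield (vhost, queue, messages) for every non-comment line."""
    for line in text.splitlines():
        if line and not line.startswith("#"):
            vhost, queue, _, _, messages, _ = line.split(None, 5)
            yield vhost, queue, int(messages)


def evaluate(stats, limits):
    """Sum messages per matching limit pattern and classify each total."""
    totals = defaultdict(int)
    for vhost, queue, messages in stats:
        for l_vhost, l_queue, _, _ in limits:
            if fnmatchcase(vhost, l_vhost) and fnmatchcase(queue, l_queue):
                totals[l_vhost, l_queue] += messages
                break  # first matching pattern wins
    results = {}
    for l_vhost, l_queue, warn, crit in limits:
        if (l_vhost, l_queue) not in totals:
            continue  # no queue matched this pattern
        total = totals[l_vhost, l_queue]
        if total >= crit:
            status = "CRIT"
        elif total >= warn:
            status = "WARN"
        else:
            status = "OK"
        results[l_vhost, l_queue] = (total, status)
    return results


# Same thresholds as in the test steps: warn at 15, critical at 30.
print(evaluate(parse(SAMPLE), [("*", "*", 15, 30)]))
```

With a single wildcard pattern, all queues are summed into one total, so one busy queue is enough to cross the warning threshold.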

review: Needs Resubmitting
Matt Bruzek (mbruzek) wrote :

Jacek,

Thank you for resubmitting this proposal with those fixes!

I deployed the charms as listed above and added the text (from your last comment) to the juju-hp-mbruzek-machine-1_queue_stats.dat file:
sudo vi juju-hp-mbruzek-machine-1_queue_stats.dat

The check_rabbitmq_queues.py still does not return any output! I am using hp-cloud but the output is essentially the same:
ubuntu@juju-hp-mbruzek-machine-1:~$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/*_queue_stats.dat
CRITICALS: No Queues Found in No Vhosts Found has None messages

Also, there are 2 more keys in config.yaml that need a "default:" entry.
$ charm proof
W: config.yaml: option key does not have the keys: default
W: config.yaml: option source does not have the keys: default

These configuration options need the “default:” keyword in config.yaml. My understanding is that leaving the value blank gives them a None value, which is what I recommend for these options.

The rabbitmq queue has not filled up on any of the tests with the rabbitmq-server and nrpe-external-master charms deployed. Please provide detailed steps on how to set this charm up so the rabbitmq queue starts filling up so we can verify the Merge Proposal!

Putting this proposal in "Needs Fixing". Once you have addressed the issues please click on the “Request another review” link on this merge proposal. That way it will be added to the review queue properly.

If you have any questions/comments/concerns about the review contact mbruzek on IRC. You can find the rest of the team in #juju on irc.freenode.net or email the mailing list <email address hidden>

review: Needs Fixing
Alexander List (alexlist) wrote :

Please consider setting up an openstack trusty/icehouse deployment, which requires rabbitmq-server for collecting stats for ceilometer. Or any other service that uses AMQP.

The two empty ("") default values for "key" and "source" are missing in cs:precise/rabbitmq-server as well, so they should not block this MP.

Alexander List (alexlist) wrote :

We'll try to come up with a scenario that's testable, and update the MP to include these defaults.

Stuart Bishop (stub) wrote :

Per mbruzek's review

review: Needs Fixing
José Antonio Rey (jose) wrote :

Hey Jacek,

Based on the other charmers' comments, I'm marking this as WIP. Make sure to move it to Needs Review and ask for another review from the ~charmers team once it's ready!

Thanks for all your efforts on making this charm better :)

Preview Diff

=== modified file 'config.yaml'
--- config.yaml 2014-03-26 10:23:01 +0000
+++ config.yaml 2014-05-28 10:08:46 +0000
@@ -133,3 +133,17 @@
     description: |
       Key ID to import to the apt keyring to support use with arbitary source
       configuration from outside of Launchpad archives or PPA's.
+  stats_cron_schedule:
+    type: string
+    default: ""
+    description: |
+      Cron schedule used to generate rabbitmq stats. If unset
+      no stats will be generated
+  queue_thresholds:
+    type: string
+    default: ""
+    description: |
+      List of RabbitMQ queue size check thresholds. Interpreted as YAML
+      in format [<vhost>, <queue>, <warn>, <crit>]
+      - ['/', 'queue1', 10, 20]
+      - ['/', 'queue2', 200, 300]

=== modified file 'hooks/rabbitmq_server_relations.py'
--- hooks/rabbitmq_server_relations.py 2014-05-02 14:18:00 +0000
+++ hooks/rabbitmq_server_relations.py 2014-05-28 10:08:46 +0000
@@ -5,6 +5,7 @@
 import sys
 import subprocess
 import glob
+import yaml

 import rabbit_utils as rabbit
 from lib.utils import (
@@ -41,7 +42,7 @@
     UnregisteredHookError
 )
 from charmhelpers.core.host import (
-    rsync, service_stop, service_restart
+    rsync, service_stop, service_restart, write_file
 )
 from charmhelpers.contrib.charmsupport.nrpe import NRPE
 from charmhelpers.contrib.ssl.service import ServiceCA
@@ -60,6 +61,10 @@
 RABBIT_USER = 'rabbitmq'
 RABBIT_GROUP = 'rabbitmq'
 NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'
+SCRIPTS_DIR = '/usr/local/bin'
+STATS_CRONFILE = '/etc/cron.d/rabbitmq-stats'
+STATS_DATAFILE = os.path.join(RABBIT_DIR, 'data',
+                              subprocess.check_output(['hostname', '-s']).strip() + '_queue_stats.dat')


 @hooks.hook('install')
@@ -334,10 +339,10 @@
                     rbd_img=rbd_img, sizemb=sizemb,
                     fstype='ext4', mount_point=RABBIT_DIR,
                     blk_device=blk_device,
-                    system_services=['rabbitmq-server'])#,
+                    system_services=['rabbitmq-server']) # ,
                     #rbd_pool_replicas=rbd_pool_rep_count)
         subprocess.check_call(['chown', '-R', '%s:%s' %
-            (RABBIT_USER,RABBIT_GROUP), RABBIT_DIR])
+                               (RABBIT_USER, RABBIT_GROUP), RABBIT_DIR])
     else:
         log('This is not the peer leader. Not configuring RBD.')
         log('Stopping rabbitmq-server.')
@@ -360,9 +365,20 @@
     rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
                        'check_rabbitmq.py'),
           os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq.py'))
+    rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
+                       'check_rabbitmq_queues.py'),
+          os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_queues.py'))
+    if config('stats_cron_schedule'):
+        script = os.path.join(SCRIPTS_DIR, 'collect_rabbitmq_stats.sh')
+        cronjob = "{} root {}\n".format(config('stats_cron_schedule'), script)
+        rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
+                           'collect_rabbitmq_stats.sh'), script)
+        write_file(STATS_CRONFILE, cronjob)
+    elif os.path.isfile(STATS_CRONFILE):
+        os.remove(STATS_CRONFILE)

     # Find out if nrpe set nagios_hostname
-    hostname=None
+    hostname = None
     for rel in relations_of_type('nrpe-external-master'):
         if 'nagios_hostname' in rel:
             hostname = rel['nagios_hostname']
@@ -384,6 +400,17 @@
         check_cmd='{}/check_rabbitmq.py --user {} --password {} --vhost {}'
                   ''.format(NAGIOS_PLUGINS, user, password, vhost)
     )
+    if config('queue_thresholds'):
+        cmd = ""
+        # If value of queue_thresholds is incorrect we want the hook to fail
+        for item in yaml.safe_load(config('queue_thresholds')):
+            cmd += ' -c "{}" "{}" {} {}'.format(*item)
+        nrpe_compat.add_check(
+            shortname=rabbit.RABBIT_USER + '_queue',
+            description='Check RabbitMQ Queues',
+            check_cmd='{}/check_rabbitmq_queues.py{} {}'.format(
+                NAGIOS_PLUGINS, cmd, STATS_DATAFILE)
+        )
     nrpe_compat.write()



=== modified file 'revision'
--- revision 2014-05-02 14:18:00 +0000
+++ revision 2014-05-28 10:08:46 +0000
@@ -1,1 +1,1 @@
-128
+150

=== added file 'scripts/check_rabbitmq_queues.py'
--- scripts/check_rabbitmq_queues.py 1970-01-01 00:00:00 +0000
+++ scripts/check_rabbitmq_queues.py 2014-05-28 10:08:46 +0000
@@ -0,0 +1,99 @@
+#!/usr/bin/python
+
+# Copyright (C) 2011, 2012, 2014 Canonical
+# All Rights Reserved
+# Author: Liam Young, Jacek Nykis
+
+from collections import defaultdict
+from fnmatch import fnmatchcase
+from itertools import chain
+import argparse
+import sys
+
+def gen_data_lines(filename):
+    with open(filename, "rb") as fin:
+        for line in fin:
+            if not line.startswith("#"):
+                yield line
+
+
+def gen_stats(data_lines):
+    for line in data_lines:
+        try:
+            vhost, queue, _, _, m_all, _ = line.split(None, 5)
+        except ValueError:
+            print "ERROR: problem parsing the stats file"
+            sys.exit(2)
+        assert m_all.isdigit(), "Message count is not a number: %r" % m_all
+        yield vhost, queue, int(m_all)
+
+
+def collate_stats(stats, limits):
+    # Create a dict with stats collated according to the definitions in the
+    # limits file. If none of the definitions in the limits file is matched,
+    # store the stat without collating.
+    collated = defaultdict(lambda: 0)
+    for vhost, queue, m_all in stats:
+        for l_vhost, l_queue, _, _ in limits:
+            if fnmatchcase(vhost, l_vhost) and fnmatchcase(queue, l_queue):
+                collated[l_vhost, l_queue] += m_all
+                break
+        else:
+            collated[vhost, queue] += m_all
+    return collated
+
+
+def check_stats(stats_collated, limits):
+    # Create a limits lookup dict with keys of the form (vhost, queue).
+    limits_lookup = dict(
+        ((l_vhost, l_queue), (int(t_warning), int(t_critical)))
+        for l_vhost, l_queue, t_warning, t_critical in limits)
+    if not (stats_collated):
+        yield 'No Queues Found', 'No Vhosts Found', None, "CRIT"
+    # Go through the stats and compare again limits, if any.
+    for l_vhost, l_queue in sorted(stats_collated):
+        m_all = stats_collated[l_vhost, l_queue]
+        try:
+            t_warning, t_critical = limits_lookup[l_vhost, l_queue]
+        except KeyError:
+            yield l_queue, l_vhost, m_all, "UNKNOWN"
+        else:
+            if m_all >= t_critical:
+                yield l_queue, l_vhost, m_all, "CRIT"
+            elif m_all >= t_warning:
+                yield l_queue, l_vhost, m_all, "WARN"
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='RabbitMQ queue size nagios check.')
+    parser.add_argument('-c', nargs=4, action='append', required=True,
+                        metavar=('vhost', 'queue', 'warn', 'crit'),
+                        help=('Vhost and queue to check. Can be used multiple times'))
+    parser.add_argument('stats_file', nargs='*', type=str, help='file containing queue stats')
+    args = parser.parse_args()
+
+    # Start generating stats from all files given on the command line.
+    stats = gen_stats(
+        chain.from_iterable(
+            gen_data_lines(filename) for filename in args.stats_file))
+    # Collate stats according to limit definitions and check.
+    stats_collated = collate_stats(stats, args.c)
+    stats_checked = check_stats(stats_collated, args.c)
+    criticals, warnings = [], []
+    for queue, vhost, message_no, status in stats_checked:
+        if status == "CRIT":
+            criticals.append(
+                "%s in %s has %s messages" % (queue, vhost, message_no))
+        elif status == "WARN":
+            warnings.append(
+                "%s in %s has %s messages" % (queue, vhost, message_no))
+    if len(criticals) > 0:
+        print "CRITICALS: %s" % ", ".join(criticals)
+        sys.exit(2)
+    # XXX: No warnings if there are criticals?
+    elif len(warnings) > 0:
+        print "WARNINGS: %s" % ", ".join(warnings)
+        sys.exit(1)
+    else:
+        print "OK"
+        sys.exit(0)

=== added file 'scripts/collect_rabbitmq_stats.sh'
--- scripts/collect_rabbitmq_stats.sh 1970-01-01 00:00:00 +0000
+++ scripts/collect_rabbitmq_stats.sh 2014-05-28 10:08:46 +0000
@@ -0,0 +1,49 @@
+#!/bin/bash
+# Copyright (C) 2011, 2014 Canonical
+# All Rights Reserved
+# Author: Liam Young, Jacek Nykis
+
+# Produce a queue data for a given vhost. Useful for graphing and Nagios checks
+LOCK=/var/lock/rabbitmq-gather-metrics.lock
+# Check for a lock file and if not, create one
+lockfile-create -r2 --lock-name $LOCK > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+    exit 1
+fi
+trap "rm -f $LOCK > /dev/null 2>&1" exit
+
+# Required to fix the bug about start-stop-daemon not being found in
+# rabbitmq-server 2.7.1-0ubuntu4.
+# '/usr/sbin/rabbitmqctl: 33: /usr/sbin/rabbitmqctl: start-stop-daemon: not found'
+export PATH=${PATH}:/sbin/
+
+if [ -f /var/lib/rabbitmq/pids ]; then
+    RABBIT_PID=$(grep "{rabbit\@${HOSTNAME}," /var/lib/rabbitmq/pids | sed -e 's!^.*,\([0-9]*\).*!\1!')
+elif [ -f /var/run/rabbitmq/pid ]; then
+    RABBIT_PID=$(cat /var/run/rabbitmq/pid)
+else
+    echo "No PID file found"
+    exit 3
+fi
+DATA_DIR="/var/lib/rabbitmq/data"
+DATA_FILE="${DATA_DIR}/$(hostname -s)_queue_stats.dat"
+LOG_DIR="/var/lib/rabbitmq/logs"
+RABBIT_STATS_DATA_FILE="${DATA_DIR}/$(hostname -s)_general_stats.dat"
+NOW=$(date +'%s')
+HOSTNAME=$(hostname -s)
+MNESIA_DB_SIZE=$(du -sm /var/lib/rabbitmq/mnesia | cut -f1)
+RABBIT_RSS=$(ps -p $RABBIT_PID -o rss=)
+if [ ! -d $DATA_DIR ]; then
+    mkdir -p $DATA_DIR
+fi
+if [ ! -d $LOG_DIR ]; then
+    mkdir -p $LOG_DIR
+fi
+echo "#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time" > $DATA_FILE
+/usr/sbin/rabbitmqctl -q list_vhosts | \
+while read VHOST; do
+    /usr/sbin/rabbitmqctl -q list_queues -p $VHOST name messages_ready messages_unacknowledged messages consumers memory | \
+    awk "{print \"$VHOST \" \$0 \" $(date +'%s') \"}" >> $DATA_FILE 2>${LOG_DIR}/list_queues.log
+done
+echo "mnesia_size: ${MNESIA_DB_SIZE}@$NOW" > $RABBIT_STATS_DATA_FILE
+echo "rss_size: ${RABBIT_RSS}@$NOW" >> $RABBIT_STATS_DATA_FILE
