Juju Charms Collection
rabbitmq-server package

Merge lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring into lp:charms/rabbitmq-server

Proposed by Jacek Nykis on 2014-05-07

Status:	Merged
Merged at revision:	94
Proposed branch:	lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring
Merge into:	lp:charms/rabbitmq-server
Prerequisite:	lp:~jacekn/charms/precise/rabbitmq-server/nrpe-fix
Diff against target:	271 lines (+194/-5), 5 files modified:
  config.yaml (+14/-0)
  hooks/rabbitmq_server_relations.py (+31/-4)
  revision (+1/-1)
  scripts/check_rabbitmq_queues.py (+99/-0)
  scripts/collect_rabbitmq_stats.sh (+49/-0)
To merge this branch:	bzr merge lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring

Reviewer	Date Requested	Status
Stuart Bishop (community)	2014-06-20	Needs Fixing on 2014-08-27
Matt Bruzek (community)	2014-05-07	Needs Fixing on 2014-06-20
Jacek Nykis (community)		Needs Resubmitting on 2014-05-28
Review via email: mp+218580@code.launchpad.net

Description of the change

This change adds support for queue monitoring by Nagios.

lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring updated on 2014-05-19

58. By Jacek Nykis on 2014-05-19: Add quotes to the rabbitmq thresholds to allow wildcards
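Condensed from the `queue_thresholds` handling in `hooks/rabbitmq_server_relations.py` (see the diff below), here is a sketch of how each threshold entry becomes a quoted `-c` argument. `build_check_cmd` is a hypothetical name, the thresholds are assumed to be already parsed (the hook uses `yaml.safe_load`), the data file path is a made-up example, and `shlex.split` only approximates shell word splitting (it does no glob expansion).

```python
import shlex

NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'


def build_check_cmd(thresholds, datafile):
    """Build the NRPE check command from parsed queue_thresholds.

    `thresholds` is a list of [vhost, queue, warn, crit] entries, as the hook
    obtains from yaml.safe_load(config('queue_thresholds')).
    """
    cmd = ""
    for item in thresholds:
        # Quote vhost and queue so wildcard patterns such as '*' reach the
        # check script literally if the command line goes through a shell.
        cmd += ' -c "{}" "{}" {} {}'.format(*item)
    return '{}/check_rabbitmq_queues.py{} {}'.format(NAGIOS_PLUGINS, cmd, datafile)


check = build_check_cmd([['*', '*', 15, 30], ['/', 'queue1', 10, 20]],
                        '/var/lib/rabbitmq/data/host_queue_stats.dat')
print(check)
# Shell-style splitting strips the quotes and leaves '*' as a plain argument.
print(shlex.split(check))
```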

Revision history for this message

Matt Bruzek (mbruzek) wrote on 2014-05-21:

Jacek,

Thanks for this work on the rabbitmq-server! I realize monitoring is important for production environments and am looking forward to getting the monitoring code in the charm.

When I run charm proof on the merged code, there are two new warning messages:
$ charm proof
W: config.yaml: option stats_cron_schedule does not have the keys: default
W: config.yaml: option queue_thresholds does not have the keys: default
…

These new configuration options should have default values in the config.yaml file. You can use a default of "" or (I believe) a None value is also valid.
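A minimal sketch of the rule behind these warnings, assuming config.yaml has already been parsed into a dict. `options_missing_default` is a hypothetical helper, not charm proof's actual implementation, and the `source` description is made up. In this sketch an option written as `default:` with no value (which YAML parses to None) still counts as having the key.

```python
def options_missing_default(options):
    """Return the names of config options that lack a 'default' key.

    `options` mirrors the 'options' mapping of config.yaml after YAML
    parsing: option name -> dict of keys (type, default, description).
    """
    return sorted(name for name, spec in options.items() if 'default' not in spec)


options = {
    'source': {'type': 'string', 'description': 'Repository to add.'},
    'stats_cron_schedule': {'type': 'string', 'default': '',
                            'description': 'Cron schedule used to generate rabbitmq stats.'},
    'queue_thresholds': {'type': 'string',
                         'description': 'List of RabbitMQ queue size check thresholds.'},
}

# Print warnings in the same shape as the charm proof output above.
for name in options_missing_default(options):
    print("W: config.yaml: option %s does not have the keys: default" % name)
```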

Thanks for chatting with me on IRC. The steps to test this were as follows:

juju deploy --repository=. local:precise/rabbitmq-server
juju deploy nrpe-external-master
juju add-relation rabbitmq-server nrpe-external-master

juju set rabbitmq-server stats_cron_schedule="*/5 * * * *"
juju set rabbitmq-server queue_thresholds="[['\*', '\*', 15, 30]]"

(Wait 5 minutes for cron to generate the queue data.)
juju ssh rabbitmq-server/0
$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/*_queue_stats.dat

The result of this command was:
ubuntu@mbruzek-local-machine-1:/var/lib/rabbitmq/data$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/mbruzek-local-machine-1_queue_stats.dat
CRITICALS: No Queues Found in No Vhosts Found has None messages

Normally this result would indicate a failure, but you mentioned on IRC that this is expected because nothing was connected to rabbitmq-server to generate messages. I was unable to figure out how to generate queued messages in rabbitmq-server for further testing.
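To show why an idle broker produces this CRITICAL, here is a condensed Python 3 sketch of the collate-and-check logic in `scripts/check_rabbitmq_queues.py` from the diff below. `check_queues` is a hypothetical name; it returns the Nagios exit code and message instead of printing and calling `sys.exit`.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def check_queues(stats, limits):
    """Condensed version of collate_stats/check_stats in check_rabbitmq_queues.py.

    stats:  iterable of (vhost, queue, message_count)
    limits: list of (vhost_pattern, queue_pattern, warn, crit)
    Returns (exit_code, message) following the Nagios convention.
    """
    collated = defaultdict(int)
    for vhost, queue, count in stats:
        # Sum each queue into the first limit whose patterns match it.
        for l_vhost, l_queue, _, _ in limits:
            if fnmatchcase(vhost, l_vhost) and fnmatchcase(queue, l_queue):
                collated[l_vhost, l_queue] += count
                break
        else:
            collated[vhost, queue] += count
    if not collated:
        # An empty stats file produces this CRITICAL, as in the output above.
        return 2, "CRITICALS: No Queues Found in No Vhosts Found has None messages"
    lookup = {(v, q): (int(w), int(c)) for v, q, w, c in limits}
    criticals, warnings = [], []
    for key in sorted(collated):
        if key not in lookup:
            continue  # the original yields "UNKNOWN", which is never reported
        warn, crit = lookup[key]
        msg = "%s in %s has %s messages" % (key[1], key[0], collated[key])
        if collated[key] >= crit:
            criticals.append(msg)
        elif collated[key] >= warn:
            warnings.append(msg)
    if criticals:
        return 2, "CRITICALS: " + ", ".join(criticals)
    if warnings:
        return 1, "WARNINGS: " + ", ".join(warnings)
    return 0, "OK"


limits = [('*', '*', '15', '30')]
print(check_queues([], limits))
print(check_queues([('openstack', 'cert', 20)], limits))
```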

Please provide default values for the two new configuration options and give me more details on how to generate some RabbitMQ traffic so we can see the command run successfully.

Thank you again for the submission! I am going to set this MP to "Needs Fixing".

Once the problem has been addressed, click on the “Request another review” link on this merge proposal. That way it will be added to the review queue properly.

If you have any questions, comments, or concerns about the review, contact mbruzek on IRC. You can find the rest of the team in #juju on irc.freenode.net or email the mailing list juju@lists.ubuntu.com

Revision history for this message

Matt Bruzek (mbruzek) on 2014-05-21:

review: Needs Fixing

lp:~jacekn/charms/precise/rabbitmq-server/queue-monitoring updated on 2014-05-28

59. By Jacek Nykis on 2014-05-28: Added default value (empty string) to stats_cron_schedule and queue_thresholds config options as per MP comment
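A sketch mirroring the hook's `stats_cron_schedule` handling from the diff below, showing why an empty-string default simply disables stats collection: "" is falsy, so no cron entry is written and any existing one is removed. `sync_stats_cron` is a hypothetical helper; it uses plain `open()` in place of charmhelpers' `write_file`, skips copying the collector script, and writes to a temporary directory instead of /etc/cron.d.

```python
import os
import tempfile


def sync_stats_cron(schedule, cronfile, script='/usr/local/bin/collect_rabbitmq_stats.sh'):
    """Write or remove the stats cron entry, like the hook does.

    A non-empty schedule writes a cron.d line; an empty string (the new
    default) or None is falsy, so any existing cron file is removed instead.
    """
    if schedule:
        with open(cronfile, 'w') as f:
            f.write("{} root {}\n".format(schedule, script))
    elif os.path.isfile(cronfile):
        os.remove(cronfile)


# Stand-in for /etc/cron.d/rabbitmq-stats.
cronfile = os.path.join(tempfile.mkdtemp(), 'rabbitmq-stats')
sync_stats_cron("*/5 * * * *", cronfile)
with open(cronfile) as f:
    content = f.read()
print(content)
sync_stats_cron("", cronfile)  # the default: cron entry is removed
print(os.path.exists(cronfile))
```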

Revision history for this message

Jacek Nykis (jacekn) wrote on 2014-05-28:

Hi Matt,

Thank you for reviewing the charm. I added default values (empty strings) and tested everything again.

If you want to verify the queue check, you can use the sample data below. These lines were generated by the same script as the one in the charm. You can also modify the 5th column (Messages) to simulate queues filling up.

#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time
nagios-rabbitmq-server-0 test_exchange_queue 0 0 0 0 8952 1401271802
openstack ceilometer.collector 0 0 0 1 13984 1401271802
openstack ceilometer.collector.metering 0 0 0 1 14056 1401271802
openstack cert 0 0 0 1 13984 1401271802
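A sketch of that edit, assuming the stats lines are whitespace-separated as above. `set_message_count` is a hypothetical helper. Only the 5th column is read by the check script, so the other counters are left alone.

```python
def set_message_count(line, count):
    """Return a stats line with its 5th column (Messages) replaced by `count`.

    Header lines starting with '#' are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.split()
    fields[4] = str(count)
    return " ".join(fields)


sample = """#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time
openstack ceilometer.collector 0 0 0 1 13984 1401271802
openstack cert 0 0 0 1 13984 1401271802"""

filled = [set_message_count(line, 25) for line in sample.splitlines()]
print("\n".join(filled))
```

Fed to the check with `-c '*' '*' 15 30`, these two queues would be collated under the wildcard entry into a total of 50 messages, above the critical threshold of 30.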

review: Needs Resubmitting

Revision history for this message

Matt Bruzek (mbruzek) wrote on 2014-06-20:

Jacek,

Thank you for resubmitting this proposal with those fixes!

I deployed the charms as listed above and added the text (from your last comment) to the juju-hp-mbruzek-machine-1_queue_stats.dat file:
sudo vi juju-hp-mbruzek-machine-1_queue_stats.dat

The check_rabbitmq_queues.py script still reports no queues! I am using HP Cloud, but the output is essentially the same:
ubuntu@juju-hp-mbruzek-machine-1:~$ /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 15 30 /var/lib/rabbitmq/data/*_queue_stats.dat
CRITICALS: No Queues Found in No Vhosts Found has None messages
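For reference, a Python rendering of what `scripts/collect_rabbitmq_stats.sh` (in the diff below) writes to the `*_queue_stats.dat` file: a header line followed by one line per queue, prefixed with the vhost and suffixed with a Unix timestamp. The script writes the header with `>`, so every cron run replaces the file's previous contents. `format_queue_stats` is a hypothetical name, the rabbitmqctl rows are passed in rather than collected, and whitespace differs slightly from the awk output (the check script splits on whitespace, so this does not matter).

```python
import time


def format_queue_stats(vhost_queues, now=None):
    """Build the data file contents written by collect_rabbitmq_stats.sh.

    vhost_queues maps each vhost to the rows printed by
    `rabbitmqctl -q list_queues -p VHOST name messages_ready
    messages_unacknowledged messages consumers memory`.
    """
    now = int(time.time()) if now is None else now
    lines = ["#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time"]
    for vhost, rows in vhost_queues.items():
        for row in rows:
            # Prefix the vhost and append the collection timestamp, like the awk call.
            lines.append("{} {} {}".format(vhost, row, now))
    return "\n".join(lines) + "\n"


stats = format_queue_stats({'openstack': ['cert\t0\t0\t0\t1\t13984']}, now=1401271802)
print(stats)
```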

Also, there are two more options in config.yaml that need a "default:" key:
$ charm proof
W: config.yaml: option key does not have the keys: default
W: config.yaml: option source does not have the keys: default

These configuration options need the "default:" keyword in config.yaml. My understanding is that leaving the value blank gives it a None value, which is what I recommend for these options.

The RabbitMQ queue has not filled up in any of my tests with the rabbitmq-server and nrpe-external-master charms deployed. Please provide detailed steps on how to set this charm up so that the RabbitMQ queue starts filling up and we can verify the merge proposal!

Putting this proposal into "Needs Fixing". Once you have addressed the issues, please click on the “Request another review” link on this merge proposal. That way it will be added to the review queue properly.

If you have any questions, comments, or concerns about the review, contact mbruzek on IRC. You can find the rest of the team in #juju on irc.freenode.net or email the mailing list juju@lists.ubuntu.com

review: Needs Fixing

Revision history for this message

Alexander List (alexlist) wrote on 2014-08-05:

Please consider setting up an OpenStack trusty/icehouse deployment, which requires rabbitmq-server for collecting Ceilometer stats, or any other service that uses AMQP.

The two empty ("") default values for "key" and "source" are missing in cs:precise/rabbitmq-server as well, so they should not block this MP.

Revision history for this message

Alexander List (alexlist) wrote on 2014-08-05:

We'll try to come up with a scenario that's testable, and update the MP to include these defaults.

Revision history for this message

Stuart Bishop (stub) wrote on 2014-08-27:

Per mbruzek's review

review: Needs Fixing

Revision history for this message

José Antonio Rey (jose) wrote on 2014-09-10:

Hey Jacek,

Based on other charmers' comments, I'm marking this as WIP. Make sure to move it to Needs Review and ask the ~charmers team for another review once it's ready!

Thanks for all your efforts on making this charm better :)

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Csaba TOTH

Jacek Nykis

Landscape

charmers

 === modified file 'config.yaml'
 --- config.yaml	2014-03-26 10:23:01 +0000
 +++ config.yaml	2014-05-28 10:08:46 +0000
@@ -133,3 +133,17 @@
        description: |
          Key ID to import to the apt keyring to support use with arbitary source
          configuration from outside of Launchpad archives or PPA's.
++  stats_cron_schedule:
++      type: string
++      default: ""
++      description: |
++        Cron schedule used to generate rabbitmq stats. If unset
++        no stats will be generated
++  queue_thresholds:
++      type: string
++      default: ""
++      description: |
++        List of RabbitMQ queue size check thresholds. Interpreted as YAML
++        in format [<vhost>, <queue>, <warn>, <crit>]
++        - ['/', 'queue1', 10, 20]
++        - ['/', 'queue2', 200, 300]
 === modified file 'hooks/rabbitmq_server_relations.py'
 --- hooks/rabbitmq_server_relations.py	2014-05-02 14:18:00 +0000
 +++ hooks/rabbitmq_server_relations.py	2014-05-28 10:08:46 +0000
@@ -5,6 +5,7 @@
  import sys
  import subprocess
  import glob
++import yaml
  import rabbit_utils as rabbit
  from lib.utils import (
@@ -41,7 +42,7 @@
      UnregisteredHookError
  )
  from charmhelpers.core.host import (
--    rsync, service_stop, service_restart
++    rsync, service_stop, service_restart, write_file
  )
  from charmhelpers.contrib.charmsupport.nrpe import NRPE
  from charmhelpers.contrib.ssl.service import ServiceCA
@@ -60,6 +61,10 @@
  RABBIT_USER = 'rabbitmq'
  RABBIT_GROUP = 'rabbitmq'
  NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'
++SCRIPTS_DIR = '/usr/local/bin'
++STATS_CRONFILE = '/etc/cron.d/rabbitmq-stats'
++STATS_DATAFILE = os.path.join(RABBIT_DIR, 'data',
++                  subprocess.check_output(['hostname', '-s']).strip() + '_queue_stats.dat')
  @hooks.hook('install')
@@ -334,10 +339,10 @@
                                   rbd_img=rbd_img, sizemb=sizemb,
                                   fstype='ext4', mount_point=RABBIT_DIR,
                                   blk_device=blk_device,
--                                 system_services=['rabbitmq-server'])#,
++                                 system_services=['rabbitmq-server'])  # ,
                                   #rbd_pool_replicas=rbd_pool_rep_count)
          subprocess.check_call(['chown', '-R', '%s:%s' %
--            (RABBIT_USER,RABBIT_GROUP), RABBIT_DIR])
++                              (RABBIT_USER, RABBIT_GROUP), RABBIT_DIR])
      else:
          log('This is not the peer leader. Not configuring RBD.')
          log('Stopping rabbitmq-server.')
@@ -360,9 +365,20 @@
          rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
                             'check_rabbitmq.py'),
                os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq.py'))
++        rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
++                           'check_rabbitmq_queues.py'),
++              os.path.join(NAGIOS_PLUGINS, 'check_rabbitmq_queues.py'))
++    if config('stats_cron_schedule'):
++        script = os.path.join(SCRIPTS_DIR, 'collect_rabbitmq_stats.sh')
++        cronjob = "{} root {}\n".format(config('stats_cron_schedule'), script)
++        rsync(os.path.join(os.getenv('CHARM_DIR'), 'scripts',
++                           'collect_rabbitmq_stats.sh'), script)
++        write_file(STATS_CRONFILE, cronjob)
++    elif os.path.isfile(STATS_CRONFILE):
++        os.remove(STATS_CRONFILE)
      # Find out if nrpe set nagios_hostname
--    hostname=None
++    hostname = None
      for rel in relations_of_type('nrpe-external-master'):
          if 'nagios_hostname' in rel:
              hostname = rel['nagios_hostname']
@@ -384,6 +400,17 @@
          check_cmd='{}/check_rabbitmq.py --user {} --password {} --vhost {}'
                    ''.format(NAGIOS_PLUGINS, user, password, vhost)
      )
++    if config('queue_thresholds'):
++        cmd = ""
++        # If value of queue_thresholds is incorrect we want the hook to fail
++        for item in yaml.safe_load(config('queue_thresholds')):
++            cmd += ' -c "{}" "{}" {} {}'.format(*item)
++        nrpe_compat.add_check(
++            shortname=rabbit.RABBIT_USER + '_queue',
++            description='Check RabbitMQ Queues',
++            check_cmd='{}/check_rabbitmq_queues.py{} {}'.format(
++                        NAGIOS_PLUGINS, cmd, STATS_DATAFILE)
++        )
      nrpe_compat.write()
 === modified file 'revision'
 --- revision	2014-05-02 14:18:00 +0000
 +++ revision	2014-05-28 10:08:46 +0000
@@ -1,1 +1,1 @@
--128
++150
 === added file 'scripts/check_rabbitmq_queues.py'
 --- scripts/check_rabbitmq_queues.py	1970-01-01 00:00:00 +0000
 +++ scripts/check_rabbitmq_queues.py	2014-05-28 10:08:46 +0000
@@ -0,0 +1,99 @@
++#!/usr/bin/python
++
++# Copyright (C) 2011, 2012, 2014 Canonical
++# All Rights Reserved
++# Author: Liam Young, Jacek Nykis
++
++from collections import defaultdict
++from fnmatch import fnmatchcase
++from itertools import chain
++import argparse
++import sys
++
++def gen_data_lines(filename):
++    with open(filename, "rb") as fin:
++        for line in fin:
++            if not line.startswith("#"):
++                yield line
++
++
++def gen_stats(data_lines):
++    for line in data_lines:
++        try:
++            vhost, queue, _, _, m_all, _ = line.split(None, 5)
++        except ValueError:
++            print "ERROR: problem parsing the stats file"
++            sys.exit(2)
++        assert m_all.isdigit(), "Message count is not a number: %r" % m_all
++        yield vhost, queue, int(m_all)
++
++
++def collate_stats(stats, limits):
++    # Create a dict with stats collated according to the definitions in the
++    # limits file. If none of the definitions in the limits file is matched,
++    # store the stat without collating.
++    collated = defaultdict(lambda: 0)
++    for vhost, queue, m_all in stats:
++        for l_vhost, l_queue, _, _ in limits:
++            if fnmatchcase(vhost, l_vhost) and fnmatchcase(queue, l_queue):
++                collated[l_vhost, l_queue] += m_all
++                break
++        else:
++            collated[vhost, queue] += m_all
++    return collated
++
++
++def check_stats(stats_collated, limits):
++    # Create a limits lookup dict with keys of the form (vhost, queue).
++    limits_lookup = dict(
++        ((l_vhost, l_queue), (int(t_warning), int(t_critical)))
++        for l_vhost, l_queue, t_warning, t_critical in limits)
++    if not (stats_collated):
++        yield 'No Queues Found', 'No Vhosts Found', None, "CRIT"
++    # Go through the stats and compare again limits, if any.
++    for l_vhost, l_queue in sorted(stats_collated):
++        m_all = stats_collated[l_vhost, l_queue]
++        try:
++            t_warning, t_critical = limits_lookup[l_vhost, l_queue]
++        except KeyError:
++            yield l_queue, l_vhost, m_all, "UNKNOWN"
++        else:
++            if m_all >= t_critical:
++                yield l_queue, l_vhost, m_all, "CRIT"
++            elif m_all >= t_warning:
++                yield l_queue, l_vhost, m_all, "WARN"
++
++
++if __name__ == "__main__":
++    parser = argparse.ArgumentParser(description='RabbitMQ queue size nagios check.')
++    parser.add_argument('-c', nargs=4, action='append', required=True,
++        metavar=('vhost', 'queue', 'warn', 'crit'),
++        help=('Vhost and queue to check. Can be used multiple times'))
++    parser.add_argument('stats_file', nargs='*', type=str, help='file containing queue stats')
++    args = parser.parse_args()
++
++    # Start generating stats from all files given on the command line.
++    stats = gen_stats(
++        chain.from_iterable(
++            gen_data_lines(filename) for filename in args.stats_file))
++    # Collate stats according to limit definitions and check.
++    stats_collated = collate_stats(stats, args.c)
++    stats_checked = check_stats(stats_collated, args.c)
++    criticals, warnings = [], []
++    for queue, vhost, message_no, status in stats_checked:
++        if status == "CRIT":
++            criticals.append(
++                "%s in %s has %s messages" % (queue, vhost, message_no))
++        elif status == "WARN":
++            warnings.append(
++                "%s in %s has %s messages" % (queue, vhost, message_no))
++    if len(criticals) > 0:
++        print "CRITICALS: %s" % ", ".join(criticals)
++        sys.exit(2)
++        # XXX: No warnings if there are criticals?
++    elif len(warnings) > 0:
++        print "WARNINGS: %s" % ", ".join(warnings)
++        sys.exit(1)
++    else:
++        print "OK"
++        sys.exit(0)
 === added file 'scripts/collect_rabbitmq_stats.sh'
 --- scripts/collect_rabbitmq_stats.sh	1970-01-01 00:00:00 +0000
 +++ scripts/collect_rabbitmq_stats.sh	2014-05-28 10:08:46 +0000
@@ -0,0 +1,49 @@
++#!/bin/bash
++# Copyright (C) 2011, 2014 Canonical
++# All Rights Reserved
++# Author: Liam Young, Jacek Nykis
++
++# Produce a queue data for a given vhost. Useful for graphing and Nagios checks
++LOCK=/var/lock/rabbitmq-gather-metrics.lock
++# Check for a lock file and if not, create one
++lockfile-create -r2 --lock-name $LOCK > /dev/null 2>&1
++if [ $? -ne 0 ]; then
++    exit 1
++fi
++trap "rm -f $LOCK > /dev/null 2>&1" exit
++
++# Required to fix the bug about start-stop-daemon not being found in
++# rabbitmq-server 2.7.1-0ubuntu4.
++# '/usr/sbin/rabbitmqctl: 33: /usr/sbin/rabbitmqctl: start-stop-daemon: not found'
++export PATH=${PATH}:/sbin/
++
++if [ -f /var/lib/rabbitmq/pids ]; then
++    RABBIT_PID=$(grep "{rabbit\@${HOSTNAME}," /var/lib/rabbitmq/pids | sed -e 's!^.*,\([0-9]*\).*!\1!')
++elif [ -f /var/run/rabbitmq/pid ]; then
++    RABBIT_PID=$(cat /var/run/rabbitmq/pid)
++else
++    echo "No PID file found"
++    exit 3
++fi
++DATA_DIR="/var/lib/rabbitmq/data"
++DATA_FILE="${DATA_DIR}/$(hostname -s)_queue_stats.dat"
++LOG_DIR="/var/lib/rabbitmq/logs"
++RABBIT_STATS_DATA_FILE="${DATA_DIR}/$(hostname -s)_general_stats.dat"
++NOW=$(date +'%s')
++HOSTNAME=$(hostname -s)
++MNESIA_DB_SIZE=$(du -sm /var/lib/rabbitmq/mnesia | cut -f1)
++RABBIT_RSS=$(ps -p $RABBIT_PID -o rss=)
++if [ ! -d $DATA_DIR ]; then
++    mkdir -p $DATA_DIR
++fi
++if [ ! -d $LOG_DIR ]; then
++    mkdir -p $LOG_DIR
++fi
++echo "#Vhost Name Messages_ready Messages_unacknowledged Messages Consumers Memory Time" > $DATA_FILE
++/usr/sbin/rabbitmqctl -q list_vhosts | \
++while read VHOST; do
++    /usr/sbin/rabbitmqctl -q list_queues -p $VHOST name messages_ready messages_unacknowledged messages consumers memory | \
++    awk "{print \"$VHOST \" \$0 \" $(date +'%s') \"}" >> $DATA_FILE 2>${LOG_DIR}/list_queues.log
++done
++echo "mnesia_size: ${MNESIA_DB_SIZE}@$NOW" > $RABBIT_STATS_DATA_FILE
++echo "rss_size: ${RABBIT_RSS}@$NOW" >> $RABBIT_STATS_DATA_FILE
