Merge lp:~james-w/charms/precise/postgresql/metrics into lp:charms/postgresql
Proposed by
James Westby
Status: Merged
Merged at revision: 107
Proposed branch: lp:~james-w/charms/precise/postgresql/metrics
Merge into: lp:charms/postgresql
Diff against target: 352 lines (+292/-5), 4 files modified:
  config.yaml (+17/-0)
  files/metrics/postgres_to_statsd.py (+217/-0)
  hooks/hooks.py (+55/-5)
  templates/metrics_cronjob.template (+3/-0)
To merge this branch: bzr merge lp:~james-w/charms/precise/postgresql/metrics
Related bugs:
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Review Queue (community) | automated testing | | Needs Fixing
Stuart Bishop (community) | | | Approve

Review via email: mp+231322@code.launchpad.net
Commit message
Add support for sending metrics to statsd.
Description of the change
Hi,
This adds support for sending postgres metrics to statsd in much the same
way as is done for haproxy and squid in their charms.
Thanks,
James
Revision history for this message
Review Queue (review-queue) wrote:
The results (PASS) are in and available here: http://
review:
Approve
(cbt)
Revision history for this message
Review Queue (review-queue) wrote:
This item has failed automated testing! Results available here http://
review:
Needs Fixing
(automated testing)
Preview Diff
1 | === modified file 'config.yaml' |
2 | --- config.yaml 2014-06-03 09:24:14 +0000 |
3 | +++ config.yaml 2014-08-19 09:18:22 +0000 |
4 | @@ -439,3 +439,20 @@ |
5 | description: | |
6 | The status of service-affecting packages will be set to this value in the dpkg database. |
7 | Useful valid values are "install" and "hold". |
8 | + # statsd-compatible metrics |
9 | + metrics_target: |
10 | + default: "" |
11 | + type: string |
12 | + description: | |
13 | + Destination for statsd-format metrics, format "host:port". If |
14 | + not present and valid, metrics disabled. |
15 | + metrics_prefix: |
16 | + default: "dev.$UNIT.postgresql" |
17 | + type: string |
18 | + description: | |
19 | + Prefix for metrics. Special value $UNIT can be used to include the |
20 | + name of the unit in the prefix. |
21 | + metrics_sample_interval: |
22 | + default: 5 |
23 | + type: int |
24 | + description: Period for metrics cron job to run in minutes |
25 | |
26 | === added directory 'files' |
27 | === added directory 'files/metrics' |
28 | === added file 'files/metrics/postgres_to_statsd.py' |
29 | --- files/metrics/postgres_to_statsd.py 1970-01-01 00:00:00 +0000 |
30 | +++ files/metrics/postgres_to_statsd.py 2014-08-19 09:18:22 +0000 |
31 | @@ -0,0 +1,217 @@ |
32 | +#!/usr/bin/env python |
33 | + |
34 | +from __future__ import print_function |
35 | + |
36 | + |
37 | +from contextlib import contextmanager |
38 | +from distutils.version import StrictVersion |
39 | +import psycopg2 |
40 | +import sys |
41 | + |
42 | + |
43 | +EXCLUDE_DBS = ['postgres', 'template0', 'template1'] |
44 | + |
45 | + |
46 | +STATS = { |
47 | + 'per-server': [ |
48 | + { |
49 | + 'table': 'pg_stat_bgwriter', |
50 | + 'exclude_columns': ['stats_reset'], |
51 | + }, |
52 | + # Some very interesting things, but not easy to create a key for, |
53 | + # perhaps need to build something to sample the information here |
54 | + # and summarize |
55 | + # pg_stat_activity |
56 | + |
57 | + # Not sure what the key would be, as I'm not sure what would be |
58 | + # unique. Maybe it would have to be pid, which isn't great for |
59 | + # graphing |
60 | + # pg_stat_replication |
61 | + ], |
62 | + 'per-db': [ |
63 | + { |
64 | + 'query': 'SELECT * FROM pg_stat_database WHERE datname=%(dbname)s;', |
65 | + 'exclude_columns': ['datid', 'datname', 'stats_reset'], |
66 | + }, |
67 | + { |
68 | + 'query': 'SELECT * FROM pg_stat_database_conflicts WHERE datname=%(dbname)s;', |
69 | + 'exclude_columns': ['datid', 'datname'], |
70 | + }, |
71 | + { |
72 | + 'table': 'pg_stat_user_tables', |
73 | + 'key': ['schemaname', 'relname'], |
74 | + 'alias': 'tables', |
75 | + 'exclude_columns': ['relid', 'last_vacuum', 'last_autovacuum', 'last_analyze', 'last_autoanalyze'], |
76 | + }, |
77 | + { |
78 | + 'table': 'pg_stat_user_indexes', |
79 | + 'key': ['schemaname', 'relname', 'indexrelname'], |
80 | + 'alias': 'indexes', |
81 | + 'exclude_columns': ['relid', 'indexrelid'], |
82 | + }, |
83 | + { |
84 | + 'table': 'pg_statio_user_tables', |
85 | + 'key': ['schemaname', 'relname'], |
86 | + 'alias': 'tables', |
87 | + 'exclude_columns': ['relid'], |
88 | + }, |
89 | + { |
90 | + 'table': 'pg_statio_user_indexes', |
91 | + 'key': ['schemaname', 'relname', 'indexrelname'], |
92 | + 'alias': 'indexes', |
93 | + 'exclude_columns': ['relid', 'indexrelid'], |
94 | + }, |
95 | + { |
96 | + 'table': 'pg_statio_user_sequences', |
97 | + 'key': ['schemaname', 'relname'], |
98 | + 'alias': 'sequences', |
99 | + 'exclude_columns': ['relid'], |
100 | + }, |
101 | + { |
102 | + 'table': 'pg_stat_user_functions', |
103 | + 'key': ['schemaname', 'funcname'], |
104 | + 'alias': 'user_functions', |
105 | + 'exclude_columns': ['funcid'], |
106 | + }, |
107 | + # Do we really care about sending all of the system table information? |
108 | + # pg_stat_sys_tables |
109 | + # pg_stat_sys_indexes |
110 | + # pg_statio_sys_tables |
111 | + # pg_statio_sys_indexes |
112 | + # pg_statio_sys_sequences |
113 | + { |
114 | + 'query': 'select COUNT(*) as connections from pg_stat_activity where datname=%(dbname)s;', |
115 | + }, |
116 | + { |
117 | + 'query': 'select COUNT(*) as connections_waiting from pg_stat_activity where datname=%(dbname)s and waiting=true;', |
118 | + }, |
119 | + { |
120 | + 'query': 'select COUNT(*) as connections_active from pg_stat_activity where datname=%(dbname)s and state="active";', |
121 | + 'version_required': '9.3', |
122 | + }, |
123 | + { |
124 | + 'query': 'select COUNT(*) as connections_idle from pg_stat_activity where datname=%(dbname)s and state="idle";', |
125 | + 'version_required': '9.3', |
126 | + }, |
127 | + { |
128 | + 'query': 'select COUNT(*) as connections_idle_in_transaction from pg_stat_activity where datname=%(dbname)s and state="idle in transaction";', |
129 | + 'version_required': '9.3', |
130 | + }, |
131 | + { |
132 | + 'query': 'select COUNT(*) as connections_idle_in_transaction_aborted from pg_stat_activity where datname=%(dbname)s and state="idle in transaction (aborted)";', |
133 | + 'version_required': '9.3', |
134 | + }, |
135 | + { |
136 | + 'query': 'select COUNT(*) as connections_fastpath_function_call from pg_stat_activity where datname=%(dbname)s and state="fastpath function call";', |
137 | + 'version_required': '9.3', |
138 | + }, |
139 | + { |
140 | + 'query': 'select COUNT(*) as connections_disabled from pg_stat_activity where datname=%(dbname)s and state="disabled";', |
141 | + 'version_required': '9.3', |
142 | + }, |
143 | + ], |
144 | +} |
145 | + |
146 | + |
147 | +def execute_query(description, cur, dbname=None): |
148 | + if 'table' in description: |
149 | + query = "SELECT * from {};".format(description['table']) |
150 | + else: |
151 | + query = description['query'] |
152 | + cur.execute(query, dict(dbname=dbname)) |
153 | + return cur.fetchall(), [desc.name for desc in cur.description] |
154 | + |
155 | + |
156 | +def get_key_indices(description, column_names): |
157 | + key = description['key'] |
158 | + if isinstance(key, list): |
159 | + key_indices = [column_names.index(k) for k in key] |
160 | + else: |
161 | + key_indices = [column_names.index(key)] |
162 | + return key_indices |
163 | + |
164 | + |
165 | +def get_stats(description, conn, dbname=None): |
166 | + if 'version_required' in description: |
167 | + cur = conn.cursor() |
168 | + cur.execute('show SERVER_VERSION;') |
169 | + server_version = StrictVersion(cur.fetchone()[0]) |
170 | + required = StrictVersion(description['version_required']) |
171 | + if server_version < required: |
172 | + return [] |
173 | + cur = conn.cursor() |
174 | + rows, column_names = execute_query(description, cur, dbname=dbname) |
175 | + name_parts = [] |
176 | + if dbname: |
177 | + name_parts.extend(['databases', dbname]) |
178 | + if 'alias' in description: |
179 | + name_parts.append(description['alias']) |
180 | + elif 'table' in description: |
181 | + name_parts.append(description['table']) |
182 | + row_keys = [] |
183 | + key_indices = [] |
184 | + if 'key' in description: |
185 | + key_indices = get_key_indices(description, column_names) |
186 | + for row in rows: |
187 | + row_keys.append(".".join([row[i] for i in key_indices])) |
188 | + stats = [] |
189 | + for i, row in enumerate(rows): |
190 | + row_name_parts = name_parts[:] |
191 | + if row_keys: |
192 | + row_name_parts.append(row_keys[i]) |
193 | + for j, value in enumerate(row): |
194 | + cell_name_parts = row_name_parts[:] |
195 | + if key_indices and j in key_indices: |
196 | + continue |
197 | + column = column_names[j] |
198 | + if column in description.get('exclude_columns', []): |
199 | + continue |
200 | + cell_name_parts.append(column) |
201 | + if value is None: |
202 | + value = 0 |
203 | + stats.append((".".join(cell_name_parts), value)) |
204 | + return stats |
205 | + |
206 | + |
207 | +def list_dbnames(conn): |
208 | + cur = conn.cursor() |
209 | + cur.execute("SELECT datname from pg_stat_database;") |
210 | + return [r[0] for r in cur.fetchall() if r[0] not in EXCLUDE_DBS] |
211 | + |
212 | + |
213 | +@contextmanager |
214 | +def connect_to_db(dbname): |
215 | + conn = psycopg2.connect("dbname={}".format(dbname)) |
216 | + try: |
217 | + yield conn |
218 | + finally: |
219 | + conn.close() |
220 | + |
221 | + |
222 | +def statsdify(stat): |
223 | + return "{}:{}|g'".format(stat[0], stat[1]) |
224 | + |
225 | + |
226 | +def write_stats(stats): |
227 | + map(print, map(statsdify, stats)) |
228 | + |
229 | + |
230 | +def main(args): |
231 | + stats = [] |
232 | + with connect_to_db('postgres') as conn: |
233 | + for description in STATS['per-server']: |
234 | + stats.extend(get_stats(description, conn)) |
235 | + dbnames = list_dbnames(conn) |
236 | + for dbname in dbnames: |
237 | + with connect_to_db(dbname) as conn: |
238 | + for description in STATS['per-db']: |
239 | + stats.extend(get_stats(description, conn, dbname)) |
240 | + write_stats(stats) |
241 | + |
242 | + |
243 | +def run(): |
244 | + sys.exit(main(sys.argv[1:])) |
245 | + |
246 | + |
247 | +if __name__ == '__main__': |
248 | + run() |
249 | |
250 | === modified file 'hooks/hooks.py' |
251 | --- hooks/hooks.py 2014-06-16 09:38:33 +0000 |
252 | +++ hooks/hooks.py 2014-08-19 09:18:22 +0000 |
253 | @@ -90,6 +90,14 @@ |
254 | return host.lsb_release()['DISTRIB_CODENAME'] |
255 | |
256 | |
257 | +def render_template(template_name, vars): |
258 | + # deferred import so install hook can install jinja2 |
259 | + templates_dir = os.path.join(os.environ['CHARM_DIR'], 'templates') |
260 | + template_env = Environment(loader=FileSystemLoader(templates_dir)) |
261 | + template = template_env.get_template(template_name) |
262 | + return template.render(vars) |
263 | + |
264 | + |
265 | class State(dict): |
266 | """Encapsulate state common to the unit for republishing to relations.""" |
267 | def __init__(self, state_file): |
268 | @@ -940,6 +948,8 @@ |
269 | postgresql_data_dir, pg_version(), config_data['cluster_name'])) |
270 | update_service_port() |
271 | update_nrpe_checks() |
272 | + write_metrics_cronjob('/usr/local/bin/postgres_to_statsd.py', |
273 | + '/etc/cron.d/postgres_metrics') |
274 | if force_restart: |
275 | postgresql_restart() |
276 | postgresql_reload_or_restart() |
277 | @@ -2156,6 +2166,50 @@ |
278 | units, lambda a, b: cmp(int(a.split('/')[-1]), int(b.split('/')[-1]))) |
279 | |
280 | |
281 | +def delete_metrics_cronjob(cron_path): |
282 | + try: |
283 | + os.unlink(cron_path) |
284 | + except OSError: |
285 | + pass |
286 | + |
287 | + |
288 | +def write_metrics_cronjob(script_path, cron_path): |
289 | + config_data = hookenv.config() |
290 | + |
291 | + # need the following two configs to be valid |
292 | + metrics_target = config_data['metrics_target'].strip() |
293 | + metrics_sample_interval = config_data['metrics_sample_interval'] |
294 | + if (not metrics_target |
295 | + or ':' not in metrics_target |
296 | + or not metrics_sample_interval): |
297 | + log("Required config not found or invalid " |
298 | + "(metrics_target, metrics_sample_interval), " |
299 | + "disabling metrics") |
300 | + delete_metrics_cronjob(cron_path) |
301 | + return |
302 | + |
303 | + charm_dir = os.environ['CHARM_DIR'] |
304 | + statsd_host, statsd_port = metrics_target.split(':', 1) |
305 | + metrics_prefix = config_data['metrics_prefix'].strip() |
306 | + metrics_prefix = metrics_prefix.replace( |
307 | + "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-')) |
308 | + |
309 | + # ensure script installed |
310 | + shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir, |
311 | + script_path) |
312 | + |
313 | + # write the crontab |
314 | + with open(cron_path, 'w') as cronjob: |
315 | + cronjob.write(render_template("metrics_cronjob.template", { |
316 | + 'interval': config_data['metrics_sample_interval'], |
317 | + 'script': script_path, |
318 | + 'metrics_prefix': metrics_prefix, |
319 | + 'metrics_sample_interval': metrics_sample_interval, |
320 | + 'statsd_host': statsd_host, |
321 | + 'statsd_port': statsd_port, |
322 | + })) |
323 | + |
324 | + |
325 | @hooks.hook('nrpe-external-master-relation-changed') |
326 | def update_nrpe_checks(): |
327 | config_data = hookenv.config() |
328 | @@ -2186,15 +2240,11 @@ |
329 | os.remove(os.path.join('/var/lib/nagios/export/', f)) |
330 | |
331 | # --- exported service configuration file |
332 | - template_env = Environment( |
333 | - loader=FileSystemLoader( |
334 | - os.path.join(os.environ['CHARM_DIR'], 'templates'))) |
335 | templ_vars = { |
336 | 'nagios_hostname': nagios_hostname, |
337 | 'nagios_servicegroup': config_data['nagios_context'], |
338 | } |
339 | - template = \ |
340 | - template_env.get_template('nrpe_service.tmpl').render(templ_vars) |
341 | + template = render_template('nrpe_service.tmpl', templ_vars) |
342 | with open(nrpe_service_file, 'w') as nrpe_service_config: |
343 | nrpe_service_config.write(str(template)) |
344 | |
345 | |
346 | === added file 'templates/metrics_cronjob.template' |
347 | --- templates/metrics_cronjob.template 1970-01-01 00:00:00 +0000 |
348 | +++ templates/metrics_cronjob.template 2014-08-19 09:18:22 +0000 |
349 | @@ -0,0 +1,3 @@ |
350 | +# crontab for pushing postgres metrics to statsd |
351 | +*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))" |
352 | + |
This all looks good, and I can't really fault anything.
There are extensions in psycopg2 that return rows as dictionaries or namedtuples, which would stop you needing to return the list of column names from cur.description and would make the code quite a bit simpler. The way you have it works fine too, though.
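For illustration, here is a minimal sketch of the effect. psycopg2's `extras` module provides cursor factories (passed as `conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)`) whose rows are addressable by column name; a plain `collections.namedtuple` stands in for such rows below, since no database is needed to show the point:

```python
from collections import namedtuple

# Stand-in for rows a NamedTupleCursor would return from, e.g.,
# "SELECT datname, numbackends FROM pg_stat_database;"
Row = namedtuple('Row', ['datname', 'numbackends'])
rows = [Row('mydb', 3), Row('otherdb', 1)]

# Columns are addressed by name, so no parallel column_names list
# derived from cur.description has to be threaded through the code.
stats = [(row.datname, row.numbackends) for row in rows]
print(stats)  # [('mydb', 3), ('otherdb', 1)]
```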
+ log("Required config not found or invalid "
+     "(metrics_target, metrics_sample_interval), "
+     "disabling metrics")
The above log message should be WARNING if the config is invalid (INFO or DEBUG if it is not set).
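The severity split could look like the sketch below. The `log()` here is a stand-in, since juju's hookenv is only available inside a hook context; charm-helpers' `hookenv.log` takes the same `(message, level=...)` shape and defines WARNING/INFO/DEBUG constants. The function name `metrics_config_ok` is hypothetical:

```python
# Stand-in constants and log() mirroring charmhelpers.core.hookenv.
WARNING, INFO, DEBUG = 'WARNING', 'INFO', 'DEBUG'

def log(message, level=INFO):
    print('{}: {}'.format(level, message))

def metrics_config_ok(metrics_target, metrics_sample_interval):
    """Return True if the metrics config is usable, logging why when not."""
    if not metrics_target:
        # Unset is the charm's default state: routine, so DEBUG (or INFO).
        log("metrics_target not set, disabling metrics", level=DEBUG)
        return False
    if ':' not in metrics_target or not metrics_sample_interval:
        # Present but invalid is an operator error, so WARNING.
        log("metrics_target/metrics_sample_interval invalid, "
            "disabling metrics", level=WARNING)
        return False
    return True
```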
+ metrics_prefix = metrics_prefix.replace(
+     "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-'))
You might want to use the existing sanitize() helper here.
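Sketched with a hypothetical `sanitize()` — the helper already in hooks.py may map a different character set — the chained `.replace()` calls collapse into one call:

```python
def sanitize(s):
    # Hypothetical stand-in for the charm's existing sanitize() helper:
    # map characters unsafe in statsd metric names to '-'.
    for ch in './':
        s = s.replace(ch, '-')
    return s

# The $UNIT substitution from the diff then reads:
unit_name = 'postgresql/0'  # what hookenv.local_unit() would return
metrics_prefix = 'dev.$UNIT.postgresql'.replace('$UNIT', sanitize(unit_name))
print(metrics_prefix)  # dev.postgresql-0.postgresql
```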
+ shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir,
+              script_path)
You should use host.write_file() from charm-helpers here, which will also ensure permissions are correct rather than relying on the umask being set the way you expect.
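A sketch of the point, not verified against the charm-helpers version this charm pins. With charm-helpers it would be roughly:

```python
import os
import tempfile

# With charm-helpers (sketch only):
#   from charmhelpers.core import host
#   with open(source_path) as f:
#       host.write_file(script_path, f.read(), perms=0o755)
# The stdlib equivalent below makes the same point: set permissions
# explicitly rather than relying on whatever umask the hook runs under.
def install_script(content, dest_path, perms=0o755):
    with open(dest_path, 'w') as f:
        f.write(content)
    os.chmod(dest_path, perms)  # explicit, umask-independent

dest = os.path.join(tempfile.mkdtemp(), 'postgres_to_statsd.py')
install_script('#!/usr/bin/env python\n', dest)
```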
+# crontab for pushing postgres metrics to statsd
+*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))"
This scares me. I'd rather a separate small script (or use 'nc' from one of the netcat packages if that works), rather than have the python inlined in the crontab. Also, we should use 'run-one' from the 'run-one' package to ensure we don't get a backlog of processes if database connections start hanging. I want to use this for the other cronjobs too, but haven't gotten around to it yet.
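The separate sender script could be as small as the sketch below (the script name and install path are hypothetical; only the statsd gauge-over-UDP framing is taken from the diff): read metric lines on stdin, prefix them, and send each as one UDP datagram.

```python
import socket
import sys

def send_metrics(lines, host, port, prefix, sock=None):
    # One UDP datagram per metric line, as statsd expects.
    if sock is None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sock.sendto('{}.{}'.format(prefix, line).encode(), (host, port))

# Guarded so importing the module (e.g. from tests) sends nothing.
if __name__ == '__main__' and len(sys.argv) == 4:
    statsd_host, statsd_port, metrics_prefix = sys.argv[1:4]
    send_metrics(sys.stdin, statsd_host, int(statsd_port), metrics_prefix)
```

The crontab line then shrinks to something like `*/{{ metrics_sample_interval }} * * * * postgres run-one python {{ script }} | run-one python /usr/local/bin/postgres_metrics_send.py {{ statsd_host }} {{ statsd_port }} {{ metrics_prefix }}` — with run-one, as suggested, guarding against a backlog of hung processes.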
I notice that there are no tests. These would be nice to have, but they don't have to go in this branch.
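For a flavour of what such tests might look like: the two pure helpers need no database, so they unit-test cheaply. They are copied (lightly condensed) below to keep the sketch self-contained; in the charm they would be imported from files/metrics/postgres_to_statsd.py instead. Note the diff's statsdify actually has a stray trailing quote in its format string (`"{}:{}|g'"`), which exactly this kind of test would catch — the copy here assumes the intended `|g` gauge suffix:

```python
import unittest

def get_key_indices(description, column_names):
    # Copied from the diff's pure helper.
    key = description['key']
    if isinstance(key, list):
        return [column_names.index(k) for k in key]
    return [column_names.index(key)]

def statsdify(stat):
    # Intended behaviour; the diff's version emits a stray quote after |g.
    return "{}:{}|g".format(stat[0], stat[1])

class TestMetricsHelpers(unittest.TestCase):
    def test_get_key_indices_with_list_key(self):
        desc = {'key': ['schemaname', 'relname']}
        cols = ['schemaname', 'relname', 'seq_scan']
        self.assertEqual(get_key_indices(desc, cols), [0, 1])

    def test_statsdify_formats_gauge(self):
        self.assertEqual(statsdify(('tables.foo.seq_scan', 7)),
                         'tables.foo.seq_scan:7|g')
```

Run with `python -m unittest` against the test module.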
Approving as this is an improvement as it stands, and none of the above comments are requirements.