Merge lp:~james-w/charms/precise/postgresql/metrics into lp:charms/postgresql

Proposed by James Westby
Status: Merged
Merged at revision: 107
Proposed branch: lp:~james-w/charms/precise/postgresql/metrics
Merge into: lp:charms/postgresql
Diff against target: 352 lines (+292/-5)
4 files modified
config.yaml (+17/-0)
files/metrics/postgres_to_statsd.py (+217/-0)
hooks/hooks.py (+55/-5)
templates/metrics_cronjob.template (+3/-0)
To merge this branch: bzr merge lp:~james-w/charms/precise/postgresql/metrics
Reviewer                    Review Type        Date Requested  Status
Review Queue (community)    automated testing                  Needs Fixing
Stuart Bishop (community)                                      Approve
Review via email: mp+231322@code.launchpad.net

Commit message

Add support for sending metrics to statsd.

Description of the change

Hi,

This adds support for sending postgres metrics to statsd in much the same
way as is done for haproxy and squid in their charms.

Thanks,

James

Revision history for this message
Stuart Bishop (stub) wrote :

This all looks good, and I can't really fault anything.

There are extensions in psycopg2 that return rows as dictionaries or namedtuples, which would save you from needing to return the list of column names from cur.description and would make the code quite a bit simpler. The way you have it works fine too, though.
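
For reference, a minimal sketch of the namedtuple approach (illustrative only, not tested against this branch):

    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect("dbname=postgres")
    # Rows come back as namedtuples, so columns are addressable by name
    # and cur.description no longer needs to be passed around.
    cur = conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)
    cur.execute("SELECT * FROM pg_stat_bgwriter;")
    for row in cur.fetchall():
        print(row.checkpoints_timed)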

+ log("Required config not found or invalid "
+ "(metrics_target, metrics_sample_interval), "
+ "disabling metrics")

The above log message should be WARNING if the config is invalid (INFO or DEBUG if it is not set).
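
Roughly like this (a sketch using the charm-helpers log levels; the invalid_config flag is hypothetical, standing in for whatever check distinguishes the two cases):

    from charmhelpers.core import hookenv

    if invalid_config:
        hookenv.log("metrics_target is set but invalid, disabling metrics",
                    level=hookenv.WARNING)
    else:
        hookenv.log("metrics_target not set, metrics disabled",
                    level=hookenv.DEBUG)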

+ metrics_prefix = metrics_prefix.replace(
+ "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-'))

You might want to use the existing sanitize() helper here.
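
That is, something like the following, assuming the existing sanitize() helper in hooks.py replaces the unsafe characters you are currently handling by hand (its exact behaviour is whatever the charm already defines):

    metrics_prefix = metrics_prefix.replace(
        "$UNIT", sanitize(hookenv.local_unit()))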

+ shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir,
+ script_path)

You should use host.write_file() from charm-helpers here, which will also ensure permissions are correct rather than relying on the umask being set the way you expect.
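
For example (a sketch; write_file lets you pin down owner, group and mode explicitly instead of inheriting them from the copy):

    from charmhelpers.core import host

    source = '%s/files/metrics/postgres_to_statsd.py' % charm_dir
    with open(source) as f:
        host.write_file(script_path, f.read(),
                        owner='root', group='root', perms=0o755)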

+# crontab for pushing postgres metrics to statsd
+*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))"

This scares me. I'd rather have a separate small script (or use 'nc' from one of the netcat packages, if that works) than have the Python inlined in the crontab. Also, we should use 'run-one' from the 'run-one' package to ensure we don't get a backlog of processes if database connections start hanging. I want to use this for the other cronjobs too, but haven't gotten around to it yet.
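
For instance, the crontab entry could shrink to something like this (a sketch; the statsd_send.py name and location are hypothetical):

    # crontab for pushing postgres metrics to statsd
    */{{ metrics_sample_interval }} * * * * postgres run-one python {{ script }} | python /usr/local/bin/statsd_send.py {{ statsd_host }} {{ statsd_port }} {{ metrics_prefix }}

with the UDP plumbing moved into the small script:

    #!/usr/bin/env python
    # statsd_send.py (hypothetical): read "name:value|type" lines on stdin,
    # prefix them, and send each one to statsd over UDP.
    import socket
    import sys


    def main():
        host, port, prefix = sys.argv[1], int(sys.argv[2]), sys.argv[3]
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for line in sys.stdin:
            sock.sendto('%s.%s' % (prefix, line.strip()), (host, port))


    if __name__ == '__main__':
        main()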

I notice that there are no tests. These would be nice to have, but they don't have to go in this branch.
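
For what it's worth, get_stats() is easy to exercise with a stub connection, so a first test could look something like this (a sketch, assuming the script is importable as postgres_to_statsd):

    import unittest
    from collections import namedtuple

    from postgres_to_statsd import get_stats

    # Minimal stand-in for psycopg2's cursor.description entries.
    Column = namedtuple('Column', ['name'])


    class FakeCursor(object):
        def __init__(self, rows, columns):
            self._rows = rows
            self.description = [Column(c) for c in columns]

        def execute(self, query, params=None):
            pass

        def fetchall(self):
            return self._rows


    class FakeConnection(object):
        def __init__(self, cursor):
            self._cursor = cursor

        def cursor(self):
            return self._cursor


    class GetStatsTests(unittest.TestCase):
        def test_builds_names_and_excludes_columns(self):
            conn = FakeConnection(
                FakeCursor([(42, 'reset-time')],
                           ['checkpoints_timed', 'stats_reset']))
            stats = get_stats({'table': 'pg_stat_bgwriter',
                               'exclude_columns': ['stats_reset']}, conn)
            self.assertEqual(
                [('pg_stat_bgwriter.checkpoints_timed', 42)], stats)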

Approving as this is an improvement as it stands, and none of the above comments are requirements.

review: Approve
Revision history for this message
Review Queue (review-queue) wrote :

The results (PASS) are in and available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-958-results

review: Approve (cbt)
Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results are available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-1045-results

review: Needs Fixing (automated testing)

Preview Diff

=== modified file 'config.yaml'
--- config.yaml	2014-06-03 09:24:14 +0000
+++ config.yaml	2014-08-19 09:18:22 +0000
@@ -439,3 +439,20 @@
     description: |
       The status of service-affecting packages will be set to this value in the dpkg database.
       Useful valid values are "install" and "hold".
+  # statsd-compatible metrics
+  metrics_target:
+    default: ""
+    type: string
+    description: |
+      Destination for statsd-format metrics, format "host:port". If
+      not present and valid, metrics disabled.
+  metrics_prefix:
+    default: "dev.$UNIT.postgresql"
+    type: string
+    description: |
+      Prefix for metrics. Special value $UNIT can be used to include the
+      name of the unit in the prefix.
+  metrics_sample_interval:
+    default: 5
+    type: int
+    description: Period for metrics cron job to run in minutes

=== added directory 'files'
=== added directory 'files/metrics'
=== added file 'files/metrics/postgres_to_statsd.py'
--- files/metrics/postgres_to_statsd.py	1970-01-01 00:00:00 +0000
+++ files/metrics/postgres_to_statsd.py	2014-08-19 09:18:22 +0000
@@ -0,0 +1,217 @@
+#!/usr/bin/env python
+
+from __future__ import print_function
+
+
+from contextlib import contextmanager
+from distutils.version import StrictVersion
+import psycopg2
+import sys
+
+
+EXCLUDE_DBS = ['postgres', 'template0', 'template1']
+
+
+STATS = {
+    'per-server': [
+        {
+            'table': 'pg_stat_bgwriter',
+            'exclude_columns': ['stats_reset'],
+        },
+        # Some very interesting things, but not easy to create a key for,
+        # perhaps need to build something to sample the information here
+        # and summarize
+        # pg_stat_activity
+
+        # Not sure what the key would be, as I'm not sure what would be
+        # unique. Maybe it would have to be pid, which isn't great for
+        # graphing
+        # pg_stat_replication
+    ],
+    'per-db': [
+        {
+            'query': 'SELECT * FROM pg_stat_database WHERE datname=%(dbname)s;',
+            'exclude_columns': ['datid', 'datname', 'stats_reset'],
+        },
+        {
+            'query': 'SELECT * FROM pg_stat_database_conflicts WHERE datname=%(dbname)s;',
+            'exclude_columns': ['datid', 'datname'],
+        },
+        {
+            'table': 'pg_stat_user_tables',
+            'key': ['schemaname', 'relname'],
+            'alias': 'tables',
+            'exclude_columns': ['relid', 'last_vacuum', 'last_autovacuum', 'last_analyze', 'last_autoanalyze'],
+        },
+        {
+            'table': 'pg_stat_user_indexes',
+            'key': ['schemaname', 'relname', 'indexrelname'],
+            'alias': 'indexes',
+            'exclude_columns': ['relid', 'indexrelid'],
+        },
+        {
+            'table': 'pg_statio_user_tables',
+            'key': ['schemaname', 'relname'],
+            'alias': 'tables',
+            'exclude_columns': ['relid'],
+        },
+        {
+            'table': 'pg_statio_user_indexes',
+            'key': ['schemaname', 'relname', 'indexrelname'],
+            'alias': 'indexes',
+            'exclude_columns': ['relid', 'indexrelid'],
+        },
+        {
+            'table': 'pg_statio_user_sequences',
+            'key': ['schemaname', 'relname'],
+            'alias': 'sequences',
+            'exclude_columns': ['relid'],
+        },
+        {
+            'table': 'pg_stat_user_functions',
+            'key': ['schemaname', 'funcname'],
+            'alias': 'user_functions',
+            'exclude_columns': ['funcid'],
+        },
+        # Do we really care about sending all of the system table information?
+        # pg_stat_sys_tables
+        # pg_stat_sys_indexes
+        # pg_statio_sys_tables
+        # pg_statio_sys_indexes
+        # pg_statio_sys_sequences
+        {
+            'query': 'select COUNT(*) as connections from pg_stat_activity where datname=%(dbname)s;',
+        },
+        {
+            'query': 'select COUNT(*) as connections_waiting from pg_stat_activity where datname=%(dbname)s and waiting=true;',
+        },
+        {
+            'query': "select COUNT(*) as connections_active from pg_stat_activity where datname=%(dbname)s and state='active';",
+            'version_required': '9.3',
+        },
+        {
+            'query': "select COUNT(*) as connections_idle from pg_stat_activity where datname=%(dbname)s and state='idle';",
+            'version_required': '9.3',
+        },
+        {
+            'query': "select COUNT(*) as connections_idle_in_transaction from pg_stat_activity where datname=%(dbname)s and state='idle in transaction';",
+            'version_required': '9.3',
+        },
+        {
+            'query': "select COUNT(*) as connections_idle_in_transaction_aborted from pg_stat_activity where datname=%(dbname)s and state='idle in transaction (aborted)';",
+            'version_required': '9.3',
+        },
+        {
+            'query': "select COUNT(*) as connections_fastpath_function_call from pg_stat_activity where datname=%(dbname)s and state='fastpath function call';",
+            'version_required': '9.3',
+        },
+        {
+            'query': "select COUNT(*) as connections_disabled from pg_stat_activity where datname=%(dbname)s and state='disabled';",
+            'version_required': '9.3',
+        },
+    ],
+}
+
+
+def execute_query(description, cur, dbname=None):
+    if 'table' in description:
+        query = "SELECT * from {};".format(description['table'])
+    else:
+        query = description['query']
+    cur.execute(query, dict(dbname=dbname))
+    return cur.fetchall(), [desc.name for desc in cur.description]
+
+
+def get_key_indices(description, column_names):
+    key = description['key']
+    if isinstance(key, list):
+        key_indices = [column_names.index(k) for k in key]
+    else:
+        key_indices = [column_names.index(key)]
+    return key_indices
+
+
+def get_stats(description, conn, dbname=None):
+    if 'version_required' in description:
+        cur = conn.cursor()
+        cur.execute('show SERVER_VERSION;')
+        server_version = StrictVersion(cur.fetchone()[0])
+        required = StrictVersion(description['version_required'])
+        if server_version < required:
+            return []
+    cur = conn.cursor()
+    rows, column_names = execute_query(description, cur, dbname=dbname)
+    name_parts = []
+    if dbname:
+        name_parts.extend(['databases', dbname])
+    if 'alias' in description:
+        name_parts.append(description['alias'])
+    elif 'table' in description:
+        name_parts.append(description['table'])
+    row_keys = []
+    key_indices = []
+    if 'key' in description:
+        key_indices = get_key_indices(description, column_names)
+        for row in rows:
+            row_keys.append(".".join([row[i] for i in key_indices]))
+    stats = []
+    for i, row in enumerate(rows):
+        row_name_parts = name_parts[:]
+        if row_keys:
+            row_name_parts.append(row_keys[i])
+        for j, value in enumerate(row):
+            cell_name_parts = row_name_parts[:]
+            if key_indices and j in key_indices:
+                continue
+            column = column_names[j]
+            if column in description.get('exclude_columns', []):
+                continue
+            cell_name_parts.append(column)
+            if value is None:
+                value = 0
+            stats.append((".".join(cell_name_parts), value))
+    return stats
+
+
+def list_dbnames(conn):
+    cur = conn.cursor()
+    cur.execute("SELECT datname from pg_stat_database;")
+    return [r[0] for r in cur.fetchall() if r[0] not in EXCLUDE_DBS]
+
+
+@contextmanager
+def connect_to_db(dbname):
+    conn = psycopg2.connect("dbname={}".format(dbname))
+    try:
+        yield conn
+    finally:
+        conn.close()
+
+
+def statsdify(stat):
+    return "{}:{}|g".format(stat[0], stat[1])
+
+
+def write_stats(stats):
+    map(print, map(statsdify, stats))
+
+
+def main(args):
+    stats = []
+    with connect_to_db('postgres') as conn:
+        for description in STATS['per-server']:
+            stats.extend(get_stats(description, conn))
+        dbnames = list_dbnames(conn)
+    for dbname in dbnames:
+        with connect_to_db(dbname) as conn:
+            for description in STATS['per-db']:
+                stats.extend(get_stats(description, conn, dbname))
+    write_stats(stats)
+
+
+def run():
+    sys.exit(main(sys.argv[1:]))
+
+
+if __name__ == '__main__':
+    run()

=== modified file 'hooks/hooks.py'
--- hooks/hooks.py	2014-06-16 09:38:33 +0000
+++ hooks/hooks.py	2014-08-19 09:18:22 +0000
@@ -90,6 +90,14 @@
     return host.lsb_release()['DISTRIB_CODENAME']
 
 
+def render_template(template_name, vars):
+    # deferred import so install hook can install jinja2
+    templates_dir = os.path.join(os.environ['CHARM_DIR'], 'templates')
+    template_env = Environment(loader=FileSystemLoader(templates_dir))
+    template = template_env.get_template(template_name)
+    return template.render(vars)
+
+
 class State(dict):
     """Encapsulate state common to the unit for republishing to relations."""
     def __init__(self, state_file):
@@ -940,6 +948,8 @@
         postgresql_data_dir, pg_version(), config_data['cluster_name']))
     update_service_port()
     update_nrpe_checks()
+    write_metrics_cronjob('/usr/local/bin/postgres_to_statsd.py',
+                          '/etc/cron.d/postgres_metrics')
     if force_restart:
         postgresql_restart()
     postgresql_reload_or_restart()
@@ -2156,6 +2166,50 @@
         units, lambda a, b: cmp(int(a.split('/')[-1]), int(b.split('/')[-1])))
 
 
+def delete_metrics_cronjob(cron_path):
+    try:
+        os.unlink(cron_path)
+    except OSError:
+        pass
+
+
+def write_metrics_cronjob(script_path, cron_path):
+    config_data = hookenv.config()
+
+    # need the following two configs to be valid
+    metrics_target = config_data['metrics_target'].strip()
+    metrics_sample_interval = config_data['metrics_sample_interval']
+    if (not metrics_target
+            or ':' not in metrics_target
+            or not metrics_sample_interval):
+        log("Required config not found or invalid "
+            "(metrics_target, metrics_sample_interval), "
+            "disabling metrics")
+        delete_metrics_cronjob(cron_path)
+        return
+
+    charm_dir = os.environ['CHARM_DIR']
+    statsd_host, statsd_port = metrics_target.split(':', 1)
+    metrics_prefix = config_data['metrics_prefix'].strip()
+    metrics_prefix = metrics_prefix.replace(
+        "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-'))
+
+    # ensure script installed
+    shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir,
+                 script_path)
+
+    # write the crontab
+    with open(cron_path, 'w') as cronjob:
+        cronjob.write(render_template("metrics_cronjob.template", {
+            'interval': config_data['metrics_sample_interval'],
+            'script': script_path,
+            'metrics_prefix': metrics_prefix,
+            'metrics_sample_interval': metrics_sample_interval,
+            'statsd_host': statsd_host,
+            'statsd_port': statsd_port,
+        }))
+
+
 @hooks.hook('nrpe-external-master-relation-changed')
 def update_nrpe_checks():
     config_data = hookenv.config()
@@ -2186,15 +2240,11 @@
             os.remove(os.path.join('/var/lib/nagios/export/', f))
 
     # --- exported service configuration file
-    template_env = Environment(
-        loader=FileSystemLoader(
-            os.path.join(os.environ['CHARM_DIR'], 'templates')))
     templ_vars = {
         'nagios_hostname': nagios_hostname,
         'nagios_servicegroup': config_data['nagios_context'],
     }
-    template = \
-        template_env.get_template('nrpe_service.tmpl').render(templ_vars)
+    template = render_template('nrpe_service.tmpl', templ_vars)
     with open(nrpe_service_file, 'w') as nrpe_service_config:
         nrpe_service_config.write(str(template))
 
 
=== added file 'templates/metrics_cronjob.template'
--- templates/metrics_cronjob.template	1970-01-01 00:00:00 +0000
+++ templates/metrics_cronjob.template	2014-08-19 09:18:22 +0000
@@ -0,0 +1,3 @@
+# crontab for pushing postgres metrics to statsd
+*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))"
+
