Merge lp:~james-w/charms/precise/postgresql/metrics into lp:charms/postgresql
Proposed by
James Westby
Status: Merged
Merged at revision: 107
Proposed branch: lp:~james-w/charms/precise/postgresql/metrics
Merge into: lp:charms/postgresql
Diff against target: 352 lines (+292/-5), 4 files modified:
  config.yaml (+17/-0)
  files/metrics/postgres_to_statsd.py (+217/-0)
  hooks/hooks.py (+55/-5)
  templates/metrics_cronjob.template (+3/-0)
To merge this branch: bzr merge lp:~james-w/charms/precise/postgresql/metrics
Related bugs:
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Review Queue (community) | automated testing | | Needs Fixing
Stuart Bishop (community) | | | Approve

Review via email: mp+231322@code.launchpad.net
Commit message
Add support for sending metrics to statsd.
Description of the change
Hi,
This adds support for sending postgres metrics to statsd in much the same
way as is done for haproxy and squid in their charms.
Thanks,
James
Revision history for this message
Review Queue (review-queue) wrote:
The results (PASS) are in and available here: http://
review:
Approve
(cbt)
Revision history for this message
Review Queue (review-queue) wrote:
This item has failed automated testing! Results available here http://
review:
Needs Fixing
(automated testing)
Preview Diff
1 | === modified file 'config.yaml' |
2 | --- config.yaml 2014-06-03 09:24:14 +0000 |
3 | +++ config.yaml 2014-08-19 09:18:22 +0000 |
4 | @@ -439,3 +439,20 @@ |
5 | description: | |
6 | The status of service-affecting packages will be set to this value in the dpkg database. |
7 | Useful valid values are "install" and "hold". |
8 | + # statsd-compatible metrics |
9 | + metrics_target: |
10 | + default: "" |
11 | + type: string |
12 | + description: | |
13 | + Destination for statsd-format metrics, format "host:port". If |
14 | + not present and valid, metrics disabled. |
15 | + metrics_prefix: |
16 | + default: "dev.$UNIT.postgresql" |
17 | + type: string |
18 | + description: | |
19 | + Prefix for metrics. Special value $UNIT can be used to include the |
20 | + name of the unit in the prefix. |
21 | + metrics_sample_interval: |
22 | + default: 5 |
23 | + type: int |
24 | + description: Period for metrics cron job to run in minutes |
25 | |
26 | === added directory 'files' |
27 | === added directory 'files/metrics' |
28 | === added file 'files/metrics/postgres_to_statsd.py' |
29 | --- files/metrics/postgres_to_statsd.py 1970-01-01 00:00:00 +0000 |
30 | +++ files/metrics/postgres_to_statsd.py 2014-08-19 09:18:22 +0000 |
31 | @@ -0,0 +1,217 @@ |
32 | +#!/usr/bin/env python |
33 | + |
34 | +from __future__ import print_function |
35 | + |
36 | + |
37 | +from contextlib import contextmanager |
38 | +from distutils.version import StrictVersion |
39 | +import psycopg2 |
40 | +import sys |
41 | + |
42 | + |
43 | +EXCLUDE_DBS = ['postgres', 'template0', 'template1'] |
44 | + |
45 | + |
46 | +STATS = { |
47 | + 'per-server': [ |
48 | + { |
49 | + 'table': 'pg_stat_bgwriter', |
50 | + 'exclude_columns': ['stats_reset'], |
51 | + }, |
52 | + # Some very interesting things, but not easy to create a key for, |
53 | + # perhaps need to build something to sample the information here |
54 | + # and summarize |
55 | + # pg_stat_activity |
56 | + |
57 | + # Not sure what the key would be, as I'm not sure what would be |
58 | + # unique. Maybe it would have to be pid, which isn't great for |
59 | + # graphing |
60 | + # pg_stat_replication |
61 | + ], |
62 | + 'per-db': [ |
63 | + { |
64 | + 'query': 'SELECT * FROM pg_stat_database WHERE datname=%(dbname)s;', |
65 | + 'exclude_columns': ['datid', 'datname', 'stats_reset'], |
66 | + }, |
67 | + { |
68 | + 'query': 'SELECT * FROM pg_stat_database_conflicts WHERE datname=%(dbname)s;', |
69 | + 'exclude_columns': ['datid', 'datname'], |
70 | + }, |
71 | + { |
72 | + 'table': 'pg_stat_user_tables', |
73 | + 'key': ['schemaname', 'relname'], |
74 | + 'alias': 'tables', |
75 | + 'exclude_columns': ['relid', 'last_vacuum', 'last_autovacuum', 'last_analyze', 'last_autoanalyze'], |
76 | + }, |
77 | + { |
78 | + 'table': 'pg_stat_user_indexes', |
79 | + 'key': ['schemaname', 'relname', 'indexrelname'], |
80 | + 'alias': 'indexes', |
81 | + 'exclude_columns': ['relid', 'indexrelid'], |
82 | + }, |
83 | + { |
84 | + 'table': 'pg_statio_user_tables', |
85 | + 'key': ['schemaname', 'relname'], |
86 | + 'alias': 'tables', |
87 | + 'exclude_columns': ['relid'], |
88 | + }, |
89 | + { |
90 | + 'table': 'pg_statio_user_indexes', |
91 | + 'key': ['schemaname', 'relname', 'indexrelname'], |
92 | + 'alias': 'indexes', |
93 | + 'exclude_columns': ['relid', 'indexrelid'], |
94 | + }, |
95 | + { |
96 | + 'table': 'pg_statio_user_sequences', |
97 | + 'key': ['schemaname', 'relname'], |
98 | + 'alias': 'sequences', |
99 | + 'exclude_columns': ['relid'], |
100 | + }, |
101 | + { |
102 | + 'table': 'pg_stat_user_functions', |
103 | + 'key': ['schemaname', 'funcname'], |
104 | + 'alias': 'user_functions', |
105 | + 'exclude_columns': ['funcid'], |
106 | + }, |
107 | + # Do we really care about sending all of the system table information? |
108 | + # pg_stat_sys_tables |
109 | + # pg_stat_sys_indexes |
110 | + # pg_statio_sys_tables |
111 | + # pg_statio_sys_indexes |
112 | + # pg_statio_sys_sequences |
113 | + { |
114 | + 'query': 'select COUNT(*) as connections from pg_stat_activity where datname=%(dbname)s;', |
115 | + }, |
116 | + { |
117 | + 'query': 'select COUNT(*) as connections_waiting from pg_stat_activity where datname=%(dbname)s and waiting=true;', |
118 | + }, |
119 | + { |
120 | + 'query': 'select COUNT(*) as connections_active from pg_stat_activity where datname=%(dbname)s and state="active";', |
121 | + 'version_required': '9.3', |
122 | + }, |
123 | + { |
124 | + 'query': 'select COUNT(*) as connections_idle from pg_stat_activity where datname=%(dbname)s and state="idle";', |
125 | + 'version_required': '9.3', |
126 | + }, |
127 | + { |
128 | + 'query': 'select COUNT(*) as connections_idle_in_transaction from pg_stat_activity where datname=%(dbname)s and state="idle in transaction";', |
129 | + 'version_required': '9.3', |
130 | + }, |
131 | + { |
132 | + 'query': 'select COUNT(*) as connections_idle_in_transaction_aborted from pg_stat_activity where datname=%(dbname)s and state="idle in transaction (aborted)";', |
133 | + 'version_required': '9.3', |
134 | + }, |
135 | + { |
136 | + 'query': 'select COUNT(*) as connections_fastpath_function_call from pg_stat_activity where datname=%(dbname)s and state="fastpath function call";', |
137 | + 'version_required': '9.3', |
138 | + }, |
139 | + { |
140 | + 'query': 'select COUNT(*) as connections_disabled from pg_stat_activity where datname=%(dbname)s and state="disabled";', |
141 | + 'version_required': '9.3', |
142 | + }, |
143 | + ], |
144 | +} |
145 | + |
146 | + |
147 | +def execute_query(description, cur, dbname=None): |
148 | + if 'table' in description: |
149 | + query = "SELECT * from {};".format(description['table']) |
150 | + else: |
151 | + query = description['query'] |
152 | + cur.execute(query, dict(dbname=dbname)) |
153 | + return cur.fetchall(), [desc.name for desc in cur.description] |
154 | + |
155 | + |
156 | +def get_key_indices(description, column_names): |
157 | + key = description['key'] |
158 | + if isinstance(key, list): |
159 | + key_indices = [column_names.index(k) for k in key] |
160 | + else: |
161 | + key_indices = [column_names.index(key)] |
162 | + return key_indices |
163 | + |
164 | + |
165 | +def get_stats(description, conn, dbname=None): |
166 | + if 'version_required' in description: |
167 | + cur = conn.cursor() |
168 | + cur.execute('show SERVER_VERSION;') |
169 | + server_version = StrictVersion(cur.fetchone()[0]) |
170 | + required = StrictVersion(description['version_required']) |
171 | + if server_version < required: |
172 | + return [] |
173 | + cur = conn.cursor() |
174 | + rows, column_names = execute_query(description, cur, dbname=dbname) |
175 | + name_parts = [] |
176 | + if dbname: |
177 | + name_parts.extend(['databases', dbname]) |
178 | + if 'alias' in description: |
179 | + name_parts.append(description['alias']) |
180 | + elif 'table' in description: |
181 | + name_parts.append(description['table']) |
182 | + row_keys = [] |
183 | + key_indices = [] |
184 | + if 'key' in description: |
185 | + key_indices = get_key_indices(description, column_names) |
186 | + for row in rows: |
187 | + row_keys.append(".".join([row[i] for i in key_indices])) |
188 | + stats = [] |
189 | + for i, row in enumerate(rows): |
190 | + row_name_parts = name_parts[:] |
191 | + if row_keys: |
192 | + row_name_parts.append(row_keys[i]) |
193 | + for j, value in enumerate(row): |
194 | + cell_name_parts = row_name_parts[:] |
195 | + if key_indices and j in key_indices: |
196 | + continue |
197 | + column = column_names[j] |
198 | + if column in description.get('exclude_columns', []): |
199 | + continue |
200 | + cell_name_parts.append(column) |
201 | + if value is None: |
202 | + value = 0 |
203 | + stats.append((".".join(cell_name_parts), value)) |
204 | + return stats |
205 | + |
206 | + |
207 | +def list_dbnames(conn): |
208 | + cur = conn.cursor() |
209 | + cur.execute("SELECT datname from pg_stat_database;") |
210 | + return [r[0] for r in cur.fetchall() if r[0] not in EXCLUDE_DBS] |
211 | + |
212 | + |
213 | +@contextmanager |
214 | +def connect_to_db(dbname): |
215 | + conn = psycopg2.connect("dbname={}".format(dbname)) |
216 | + try: |
217 | + yield conn |
218 | + finally: |
219 | + conn.close() |
220 | + |
221 | + |
222 | +def statsdify(stat): |
223 | + return "{}:{}|g'".format(stat[0], stat[1]) |
224 | + |
225 | + |
226 | +def write_stats(stats): |
227 | + map(print, map(statsdify, stats)) |
228 | + |
229 | + |
230 | +def main(args): |
231 | + stats = [] |
232 | + with connect_to_db('postgres') as conn: |
233 | + for description in STATS['per-server']: |
234 | + stats.extend(get_stats(description, conn)) |
235 | + dbnames = list_dbnames(conn) |
236 | + for dbname in dbnames: |
237 | + with connect_to_db(dbname) as conn: |
238 | + for description in STATS['per-db']: |
239 | + stats.extend(get_stats(description, conn, dbname)) |
240 | + write_stats(stats) |
241 | + |
242 | + |
243 | +def run(): |
244 | + sys.exit(main(sys.argv[1:])) |
245 | + |
246 | + |
247 | +if __name__ == '__main__': |
248 | + run() |
249 | |
250 | === modified file 'hooks/hooks.py' |
251 | --- hooks/hooks.py 2014-06-16 09:38:33 +0000 |
252 | +++ hooks/hooks.py 2014-08-19 09:18:22 +0000 |
253 | @@ -90,6 +90,14 @@ |
254 | return host.lsb_release()['DISTRIB_CODENAME'] |
255 | |
256 | |
257 | +def render_template(template_name, vars): |
258 | + # deferred import so install hook can install jinja2 |
259 | + templates_dir = os.path.join(os.environ['CHARM_DIR'], 'templates') |
260 | + template_env = Environment(loader=FileSystemLoader(templates_dir)) |
261 | + template = template_env.get_template(template_name) |
262 | + return template.render(vars) |
263 | + |
264 | + |
265 | class State(dict): |
266 | """Encapsulate state common to the unit for republishing to relations.""" |
267 | def __init__(self, state_file): |
268 | @@ -940,6 +948,8 @@ |
269 | postgresql_data_dir, pg_version(), config_data['cluster_name'])) |
270 | update_service_port() |
271 | update_nrpe_checks() |
272 | + write_metrics_cronjob('/usr/local/bin/postgres_to_statsd.py', |
273 | + '/etc/cron.d/postgres_metrics') |
274 | if force_restart: |
275 | postgresql_restart() |
276 | postgresql_reload_or_restart() |
277 | @@ -2156,6 +2166,50 @@ |
278 | units, lambda a, b: cmp(int(a.split('/')[-1]), int(b.split('/')[-1]))) |
279 | |
280 | |
281 | +def delete_metrics_cronjob(cron_path): |
282 | + try: |
283 | + os.unlink(cron_path) |
284 | + except OSError: |
285 | + pass |
286 | + |
287 | + |
288 | +def write_metrics_cronjob(script_path, cron_path): |
289 | + config_data = hookenv.config() |
290 | + |
291 | + # need the following two configs to be valid |
292 | + metrics_target = config_data['metrics_target'].strip() |
293 | + metrics_sample_interval = config_data['metrics_sample_interval'] |
294 | + if (not metrics_target |
295 | + or ':' not in metrics_target |
296 | + or not metrics_sample_interval): |
297 | + log("Required config not found or invalid " |
298 | + "(metrics_target, metrics_sample_interval), " |
299 | + "disabling metrics") |
300 | + delete_metrics_cronjob(cron_path) |
301 | + return |
302 | + |
303 | + charm_dir = os.environ['CHARM_DIR'] |
304 | + statsd_host, statsd_port = metrics_target.split(':', 1) |
305 | + metrics_prefix = config_data['metrics_prefix'].strip() |
306 | + metrics_prefix = metrics_prefix.replace( |
307 | + "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-')) |
308 | + |
309 | + # ensure script installed |
310 | + shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir, |
311 | + script_path) |
312 | + |
313 | + # write the crontab |
314 | + with open(cron_path, 'w') as cronjob: |
315 | + cronjob.write(render_template("metrics_cronjob.template", { |
316 | + 'interval': config_data['metrics_sample_interval'], |
317 | + 'script': script_path, |
318 | + 'metrics_prefix': metrics_prefix, |
319 | + 'metrics_sample_interval': metrics_sample_interval, |
320 | + 'statsd_host': statsd_host, |
321 | + 'statsd_port': statsd_port, |
322 | + })) |
323 | + |
324 | + |
325 | @hooks.hook('nrpe-external-master-relation-changed') |
326 | def update_nrpe_checks(): |
327 | config_data = hookenv.config() |
328 | @@ -2186,15 +2240,11 @@ |
329 | os.remove(os.path.join('/var/lib/nagios/export/', f)) |
330 | |
331 | # --- exported service configuration file |
332 | - template_env = Environment( |
333 | - loader=FileSystemLoader( |
334 | - os.path.join(os.environ['CHARM_DIR'], 'templates'))) |
335 | templ_vars = { |
336 | 'nagios_hostname': nagios_hostname, |
337 | 'nagios_servicegroup': config_data['nagios_context'], |
338 | } |
339 | - template = \ |
340 | - template_env.get_template('nrpe_service.tmpl').render(templ_vars) |
341 | + template = render_template('nrpe_service.tmpl', templ_vars) |
342 | with open(nrpe_service_file, 'w') as nrpe_service_config: |
343 | nrpe_service_config.write(str(template)) |
344 | |
345 | |
346 | === added file 'templates/metrics_cronjob.template' |
347 | --- templates/metrics_cronjob.template 1970-01-01 00:00:00 +0000 |
348 | +++ templates/metrics_cronjob.template 2014-08-19 09:18:22 +0000 |
349 | @@ -0,0 +1,3 @@ |
350 | +# crontab for pushing postgres metrics to statsd |
351 | +*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))" |
352 | + |
This all looks good, and I can't really fault anything.
There are extensions in psycopg2 that return rows as dictionaries or namedtuples, which would stop you needing to return the list of column names from cur.description and would make the code quite a bit simpler. The way you have it works fine too, though.
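For illustration, here is a minimal sketch of the effect. psycopg2's `extras` module provides cursor factories (passed as `conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)`) whose rows are addressable by column name; a plain `collections.namedtuple` stands in for such rows below, since no database is needed to show the point:

```python
from collections import namedtuple

# Stand-in for rows a NamedTupleCursor would return from, e.g.,
# "SELECT datname, numbackends FROM pg_stat_database;"
Row = namedtuple('Row', ['datname', 'numbackends'])
rows = [Row('mydb', 3), Row('otherdb', 1)]

# Columns are addressed by name, so no parallel column_names list
# derived from cur.description has to be threaded through the code.
stats = [(row.datname, row.numbackends) for row in rows]
print(stats)  # [('mydb', 3), ('otherdb', 1)]
```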
+ log("Required config not found or invalid "
+     "(metrics_target, metrics_sample_interval), "
+     "disabling metrics")
The above log message should be WARNING if the config is invalid (INFO or DEBUG if it is not set).
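The severity split could look like the sketch below. The `log()` here is a stand-in, since juju's hookenv is only available inside a hook context; charm-helpers' `hookenv.log` takes the same `(message, level=...)` shape and defines WARNING/INFO/DEBUG constants. The function name `metrics_config_ok` is hypothetical:

```python
# Stand-in constants and log() mirroring charmhelpers.core.hookenv.
WARNING, INFO, DEBUG = 'WARNING', 'INFO', 'DEBUG'

def log(message, level=INFO):
    print('{}: {}'.format(level, message))

def metrics_config_ok(metrics_target, metrics_sample_interval):
    """Return True if the metrics config is usable, logging why when not."""
    if not metrics_target:
        # Unset is the charm's default state: routine, so DEBUG (or INFO).
        log("metrics_target not set, disabling metrics", level=DEBUG)
        return False
    if ':' not in metrics_target or not metrics_sample_interval:
        # Present but invalid is an operator error, so WARNING.
        log("metrics_target/metrics_sample_interval invalid, "
            "disabling metrics", level=WARNING)
        return False
    return True
```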
+ metrics_prefix = metrics_prefix.replace(
+     "$UNIT", hookenv.local_unit().replace('.', '-').replace('/', '-'))
You might want to use the existing sanitize() helper here.
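Sketched with a hypothetical `sanitize()` — the helper already in hooks.py may map a different character set — the chained `.replace()` calls collapse into one call:

```python
def sanitize(s):
    # Hypothetical stand-in for the charm's existing sanitize() helper:
    # map characters unsafe in statsd metric names to '-'.
    for ch in './':
        s = s.replace(ch, '-')
    return s

# The $UNIT substitution from the diff then reads:
unit_name = 'postgresql/0'  # what hookenv.local_unit() would return
metrics_prefix = 'dev.$UNIT.postgresql'.replace('$UNIT', sanitize(unit_name))
print(metrics_prefix)  # dev.postgresql-0.postgresql
```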
+ shutil.copy2('%s/files/metrics/postgres_to_statsd.py' % charm_dir,
+              script_path)
You should use host.write_file() from charm-helpers here, which will also ensure permissions are correct rather than relying on the umask being set the way you expect.
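A sketch of the point, not verified against the charm-helpers version this charm pins. With charm-helpers it would be roughly:

```python
import os
import tempfile

# With charm-helpers (sketch only):
#   from charmhelpers.core import host
#   with open(source_path) as f:
#       host.write_file(script_path, f.read(), perms=0o755)
# The stdlib equivalent below makes the same point: set permissions
# explicitly rather than relying on whatever umask the hook runs under.
def install_script(content, dest_path, perms=0o755):
    with open(dest_path, 'w') as f:
        f.write(content)
    os.chmod(dest_path, perms)  # explicit, umask-independent

dest = os.path.join(tempfile.mkdtemp(), 'postgres_to_statsd.py')
install_script('#!/usr/bin/env python\n', dest)
```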
+# crontab for pushing postgres metrics to statsd
+*/{{ metrics_sample_interval }} * * * * postgres python {{ script }} | python -c "import socket, sys; sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); map(lambda line: sock.sendto(line, ('{{ statsd_host }}', {{ statsd_port }})), map(lambda line: '{{ metrics_prefix }}' + '.' + line, sys.stdin))"
This scares me. I'd rather a separate small script (or use 'nc' from one of the netcat packages if that works), rather than have the python inlined in the crontab. Also, we should use 'run-one' from the 'run-one' package to ensure we don't get a backlog of processes if database connections start hanging. I want to use this for the other cronjobs too, but haven't gotten around to it yet.
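The separate sender script could be as small as the sketch below (the script name and install path are hypothetical; only the statsd gauge-over-UDP framing is taken from the diff): read metric lines on stdin, prefix them, and send each as one UDP datagram.

```python
import socket
import sys

def send_metrics(lines, host, port, prefix, sock=None):
    # One UDP datagram per metric line, as statsd expects.
    if sock is None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sock.sendto('{}.{}'.format(prefix, line).encode(), (host, port))

# Guarded so importing the module (e.g. from tests) sends nothing.
if __name__ == '__main__' and len(sys.argv) == 4:
    statsd_host, statsd_port, metrics_prefix = sys.argv[1:4]
    send_metrics(sys.stdin, statsd_host, int(statsd_port), metrics_prefix)
```

The crontab line then shrinks to something like `*/{{ metrics_sample_interval }} * * * * postgres run-one python {{ script }} | run-one python /usr/local/bin/postgres_metrics_send.py {{ statsd_host }} {{ statsd_port }} {{ metrics_prefix }}` — with run-one, as suggested, guarding against a backlog of hung processes.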
I notice that there are no tests. These would be nice to have, but they don't have to go in this branch.
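For a flavour of what such tests might look like: the two pure helpers need no database, so they unit-test cheaply. They are copied (lightly condensed) below to keep the sketch self-contained; in the charm they would be imported from files/metrics/postgres_to_statsd.py instead. Note the diff's statsdify actually has a stray trailing quote in its format string (`"{}:{}|g'"`), which exactly this kind of test would catch — the copy here assumes the intended `|g` gauge suffix:

```python
import unittest

def get_key_indices(description, column_names):
    # Copied from the diff's pure helper.
    key = description['key']
    if isinstance(key, list):
        return [column_names.index(k) for k in key]
    return [column_names.index(key)]

def statsdify(stat):
    # Intended behaviour; the diff's version emits a stray quote after |g.
    return "{}:{}|g".format(stat[0], stat[1])

class TestMetricsHelpers(unittest.TestCase):
    def test_get_key_indices_with_list_key(self):
        desc = {'key': ['schemaname', 'relname']}
        cols = ['schemaname', 'relname', 'seq_scan']
        self.assertEqual(get_key_indices(desc, cols), [0, 1])

    def test_statsdify_formats_gauge(self):
        self.assertEqual(statsdify(('tables.foo.seq_scan', 7)),
                         'tables.foo.seq_scan:7|g')
```

Run with `python -m unittest` against the test module.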
Approving as this is an improvement as it stands, and none of the above comments are requirements.