Merge ~jillrouleau/charm-nrpe:xfs_checks into ~nrpe-charmers/charm-nrpe:master

Proposed by Jill Rouleau
Status: Merged
Approved by: Xav Paice
Approved revision: fdc3cda34b22235ba2d7777e048abce3ae71c053
Merged at revision: 5bd86d0b09166224a866c6e83c6c9b0bf0d3f93d
Proposed branch: ~jillrouleau/charm-nrpe:xfs_checks
Merge into: ~nrpe-charmers/charm-nrpe:master
Diff against target: 84 lines (+59/-0)
3 files modified
config.yaml (+7/-0)
files/plugins/check_xfs_errors.py (+46/-0)
hooks/nrpe_helpers.py (+6/-0)
Reviewer: Xav Paice (community), Status: Approve
Review via email: mp+333717@code.launchpad.net
Revision history for this message
Xav Paice (xavpaice) wrote :

LGTM

review: Approve

Preview Diff

diff --git a/config.yaml b/config.yaml
index 3a55f70..b679aae 100644
--- a/config.yaml
+++ b/config.yaml
@@ -143,3 +143,10 @@ options:
     description: |
       A string to be appended onto all the nrpe checks created by this charm
       to avoid potential clashes with existing checks
+  xfs_errors:
+    default: "120"
+    type: string
+    description: |
+      dmesg history length to check for xfs errors, in minutes
+      .
+      Set to '' in order to disable this check.
diff --git a/files/plugins/check_xfs_errors.py b/files/plugins/check_xfs_errors.py
new file mode 100755
index 0000000..c031336
--- /dev/null
+++ b/files/plugins/check_xfs_errors.py
@@ -0,0 +1,46 @@
+#!/usr/bin/env python3
+#
+# Copyright 2017 Canonical Ltd
+#
+# Author: Jill Rouleau <jill.rouleau@canonical.com>
+#
+# Check for xfs errors and alert
+#
+
+import sys
+import re
+import datetime
+import subprocess
+
+# error messages commonly seen in dmesg on xfs errors
+raw_xfs_errors = ['XFS_WANT_CORRUPTED_',
+                  'xfs_error_report',
+                  'corruption detected at xfs_',
+                  'Unmount and run xfs_repair']
+
+xfs_regex = [re.compile(i) for i in raw_xfs_errors]
+
+# nagios can't read from kern.log, so we look at dmesg - this does present
+# a known limitation if a node is rebooted or dmesg is otherwise cleared.
+log_lines = [line for line in subprocess.getoutput(['dmesg -T']).split('\n')]
+
+err_results = [line for line in log_lines for rgx in xfs_regex if
+               re.search(rgx, line)]
+
+# Look for errors within the last N minutes, specified in the check definition
+check_delta = int(sys.argv[1])
+
+# dmesg -T formatted timestamps are inside [], so we need to add them
+datetime_delta = '['+(datetime.datetime.now() -
+                      datetime.timedelta(minutes=check_delta)
+                      ).strftime('%c')+']'
+
+recent_logs = [i for i in err_results if i >= datetime_delta]
+
+if recent_logs:
+    print('CRITCAL: Recent XFS errors in kern.log.'+'\n'+'{}'.format(
+        recent_logs))
+    sys.exit(2)
+else:
+    print('OK')
+    sys.exit(0)
diff --git a/hooks/nrpe_helpers.py b/hooks/nrpe_helpers.py
index 60008f9..cfcb348 100644
--- a/hooks/nrpe_helpers.py
+++ b/hooks/nrpe_helpers.py
@@ -337,6 +337,12 @@ class SubordinateCheckDefinitions(dict):
                 'cmd_exec': local_plugin_dir + 'check_conntrack.sh',
                 'cmd_params': hookenv.config('conntrack'),
             },
+            {
+                'description': 'XFS Errors',
+                'cmd_name': 'check_xfs_errors',
+                'cmd_exec': local_plugin_dir + 'check_xfs_errors.py',
+                'cmd_params': hookenv.config('xfs_errors'),
+            },
         ]
         self['checks'] = []
         sub_postfix = str(hookenv.config("sub_postfix"))
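The merged plugin filters by time with `i >= datetime_delta`, which compares whole log lines to a formatted cutoff as strings. That comparison is lexicographic, so the weekday prefix dominates. For example, `[Sun ...` sorts after `[Mon ...`, which means a Sunday error still counts as "recent" when the check runs on the following Monday. The sketch below is a hypothetical, time-aware variant and is not the merged code. It parses each `dmesg -T` stamp into a `datetime` before comparing. It assumes that `dmesg -T` prints C-locale stamps such as `[Mon Nov 13 10:00:00 2017]`, because `strptime`'s `%a`/`%b` follow the current locale. The names `recent_xfs_errors` and `main` are illustrative only. The sketch keeps the four error signatures from the diff, reports each line only once even if it matches several signatures, uses the standard `CRITICAL` status text, and returns the Nagios UNKNOWN code (3) when the minutes argument is missing or invalid.

```python
import datetime
import re
import subprocess

# Same four signatures as the merged plugin, folded into one alternation so a
# line matching several of them is reported only once.
XFS_ERROR_RE = re.compile(
    r'XFS_WANT_CORRUPTED_|xfs_error_report|'
    r'corruption detected at xfs_|Unmount and run xfs_repair')

# Assumption: dmesg -T prefixes each line with a C-locale stamp such as
# "[Mon Nov 13 10:00:00 2017]"; %a and %b are parsed per the current locale.
STAMP_RE = re.compile(r'^\[([^\]]+)\]')
DMESG_T_FORMAT = '%a %b %d %H:%M:%S %Y'


def recent_xfs_errors(lines, minutes, now=None):
    """Return XFS error lines stamped within the last `minutes` minutes."""
    if now is None:
        now = datetime.datetime.now()
    cutoff = now - datetime.timedelta(minutes=minutes)
    hits = []
    for line in lines:
        if not XFS_ERROR_RE.search(line):
            continue
        match = STAMP_RE.match(line)
        if match is None:
            continue  # no bracketed timestamp to compare against
        try:
            stamp = datetime.datetime.strptime(match.group(1).strip(),
                                               DMESG_T_FORMAT)
        except ValueError:
            continue  # unexpected stamp format; skip rather than guess
        if stamp >= cutoff:  # compare datetimes, not strings
            hits.append(line)
    return hits


def main(argv):
    """Nagios-style entry point: returns 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN)."""
    try:
        minutes = int(argv[1])
    except (IndexError, ValueError):
        print('UNKNOWN: expected a number of minutes as the first argument')
        return 3
    try:
        result = subprocess.run(['dmesg', '-T'], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    except OSError as exc:
        print('UNKNOWN: could not run dmesg: {}'.format(exc))
        return 3
    if result.returncode != 0:
        print('UNKNOWN: dmesg -T failed: {}'.format(result.stderr.strip()))
        return 3
    hits = recent_xfs_errors(result.stdout.splitlines(), minutes)
    if hits:
        print('CRITICAL: {} recent XFS error(s) in dmesg'.format(len(hits)))
        print('\n'.join(hits))
        return 2
    print('OK')
    return 0
```

To install this as the plugin, the module would also need `import sys` and a final `sys.exit(main(sys.argv))`. NRPE would then invoke it with the charm's `xfs_errors` value as the first argument, which the check definition passes as `cmd_params`. Separating `recent_xfs_errors` from the `dmesg` call makes it possible to test the filtering with fixed input lines and a fixed `now`. The known limitation from the original comment still applies: errors cleared from the ring buffer by a reboot are not seen.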
