Merge ~addyess/charm-openstack-service-checks:bugs/lp1887561-check_octavia_filtering into ~llama-charmers/charm-openstack-service-checks:master

Proposed by Adam Dyess on 2020-07-14
Status: Merged
Approved by: Chris Sanders on 2020-07-15
Approved revision: dcf043ac09c1dea8c1fa18b7f85fa0c559911fa4
Merged at revision: b4a24bff8ea1f9d38d70b1297ef64c63b49327e0
Proposed branch: ~addyess/charm-openstack-service-checks:bugs/lp1887561-check_octavia_filtering
Merge into: ~llama-charmers/charm-openstack-service-checks:master
Diff against target: 802 lines (+370/-138)
9 files modified
README.md (+41/-0)
config.yaml (+20/-0)
files/plugins/check_octavia.py (+137/-89)
lib/lib_openstack_service_checks.py (+37/-15)
tests/unit/conftest.py (+4/-11)
tests/unit/test_check_cinder_services.py (+1/-5)
tests/unit/test_check_contrail_analytics_alarms.py (+12/-14)
tests/unit/test_check_nova_services.py (+1/-4)
tests/unit/test_check_octavia.py (+117/-0)
Reviewer Review Type Date Requested Status
Chris Sanders 2020-07-14 Approve on 2020-07-15
Review via email: mp+387394@code.launchpad.net

Commit message

Support for filtered alarming of octavia checks

Adam Dyess (addyess) wrote :

This change adds an ignore list of keywords for each of the four octavia checks: loadbalancers, amphorae, pools, and image.

Each keyword in the ignore list is matched against the alerts produced by check_octavia, and any alert that contains it is suppressed. Suppose you have a test or non-production loadbalancer, with ID=deadbeef-1234-56789012-dead-beef, that you do not want alerts from.

You can use this config

juju config <openstack-service-checks-app> octavia-loadbalancers-ignored='deadbeef-1234-56789012-dead-beef,'

to ignore any checks associated with the loadbalancer such as it being inactive or degraded.

Alternatively, you could silence all degraded loadbalancer alerts with

juju config <openstack-service-checks-app> octavia-loadbalancers-ignored='DEGRADED,'
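
A minimal sketch of how the ignore list behaves, modeled on `nagios_exit` and `filter_checks` from this diff. The helper names `build_ignored_re` and `overall_status`, the sample alarm messages, and the short ID `0000aaaa` are made up for illustration and are not part of the plugin. Keywords are joined into one regular expression without escaping, so regex metacharacters in a keyword are interpreted as such.

```python
import re

# Nagios status codes, as used by the plugin.
OK, WARNING, CRITICAL = 0, 1, 2


def build_ignored_re(ignored):
    """Turn 'a,b,' into r'(?:a)|(?:b)'; empty entries are dropped."""
    unique = sorted(filter(None, set(ignored.split(","))))
    return "|".join("(?:{})".format(keyword) for keyword in unique)


def overall_status(alarms, ignored=""):
    """Drop alarms whose message matches a keyword, then reduce to one status."""
    pattern = build_ignored_re(ignored)
    kept = [
        (lvl, msg) for lvl, msg in alarms
        if not (pattern and re.search(pattern, msg))
    ]
    if any(lvl == CRITICAL for lvl, _ in kept):
        return CRITICAL, kept
    if kept:
        return WARNING, kept
    return OK, kept


# Hypothetical alarms in the shape check_loadbalancers returns.
alarms = [
    (CRITICAL,
     "loadbalancer deadbeef-1234-56789012-dead-beef operating_status is DEGRADED"),
    (WARNING, "loadbalancer 0000aaaa admin_state_up is False"),
]

print(overall_status(alarms)[0])                                       # 2
print(overall_status(alarms, "deadbeef-1234-56789012-dead-beef,")[0])  # 1
print(overall_status(alarms, "DEGRADED,admin_state_up")[0])            # 0
```

Ignoring by ID silences only that loadbalancer, while ignoring by a state keyword such as DEGRADED silences every alert that mentions that state.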

Chris Sanders (chris.sanders) wrote :

The commit message here is a very good example. Can you add something to the README about this as well?

A few in-line comments/questions as well.

review: Needs Information
review: Approve
Chris Sanders (chris.sanders) wrote :

Let's see those tests ;)

review: Needs Information
Chris Sanders (chris.sanders) wrote :

Alright thanks, +1

review: Approve

Preview Diff

diff --git a/README.md b/README.md
index 54dceb0..f589fb9 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,47 @@ If such API endpoints use TLS, new checks will monitor the certificates expirati
 
 Alternatively, instead of the above relation, there is also an action "refresh-endpoint-checks" available. Running this action will update the service checks with the current endpoints.
 
+## Octavia Checks
+
+Knowing when an openstack load-balancer is having an issue is an important
+operational concern which this charm helps manage. There is both coarse-grained
+control over octavia checks, as well as more fine-grained control by
+use of the following config items.
+
+### Coarse Grain
+
+ * `check-octavia`: `true` or `false` can enable or disable checks
+
+### Fine Grain
+
+ * `octavia-loadbalancers-ignored`
+ * `octavia-amphorae-ignored`
+ * `octavia-pools-ignored`
+ * `octavia-image-ignored`
+
+Each of these config items adds an ignore-list of keywords. Each keyword in
+the ignore list will be blocked when it appears in the output of the check.
+
+#### Examples
+
+---------------
+Ignoring a test or non-production loadbalancer with the
+ID=`deadbeef-1234-56789012-dead-beef` which is __INACTIVE__ or __DEGRADED__.
+```bash
+juju config my-openstack-service-checks octavia-loadbalancers-ignored='deadbeef-1234-56789012-dead-beef,'
+```
+
+Ignoring all loadbalancers which happen to be __DEGRADED__.
+```bash
+juju config my-openstack-service-checks octavia-loadbalancers-ignored='DEGRADED,'
+```
+
+Ignoring amphorae that are stuck in __BOOTING__ state.
+```bash
+juju config my-openstack-service-checks octavia-amphorae-ignored='BOOTING,'
+```
+
+
 ## Compute services monitoring
 
 Compute services are monitored via the 'os-services' interface. Several thresholds can
diff --git a/config.yaml b/config.yaml
index 81428e5..d36f4a9 100644
--- a/config.yaml
+++ b/config.yaml
@@ -15,6 +15,26 @@ options:
     type: boolean
     description: |
       Switch to turn on or off check for octavia services.
+  octavia-loadbalancers-ignored:
+    type: string
+    default: ""
+    description: |
+      Comma separated list of octavia load balancer alerts to ignore
+  octavia-amphorae-ignored:
+    type: string
+    default: ""
+    description: |
+      Comma separated list of octavia amphorae alerts to ignore
+  octavia-pools-ignored:
+    type: string
+    default: ""
+    description: |
+      Comma separated list of octavia pool alerts to ignore
+  octavia-image-ignored:
+    type: string
+    default: ""
+    description: |
+      Comma separated list of octavia image alerts to ignore
   octavia-amp-image-tag:
     default: "octavia-amphora"
     type: string
diff --git a/files/plugins/check_octavia.py b/files/plugins/check_octavia.py
index 09fb6b1..bb6debc 100755
--- a/files/plugins/check_octavia.py
+++ b/files/plugins/check_octavia.py
@@ -1,13 +1,19 @@
 #!/usr/bin/env python3
 
-import os
-import sys
-import json
 import argparse
-import subprocess
+import collections
 from datetime import datetime, timedelta
+import json
+import os
+import re
+import subprocess
+import sys
+
 import openstack
 
+
+Alarm = collections.namedtuple('Alarm', 'lvl, desc')
+DEFAULT_IGNORED = r''
 NAGIOS_STATUS_OK = 0
 NAGIOS_STATUS_WARNING = 1
 NAGIOS_STATUS_CRITICAL = 2
@@ -21,12 +27,48 @@ NAGIOS_STATUS = {
 }
 
 
-def nagios_exit(status, message):
+def filter_checks(alarms, ignored=DEFAULT_IGNORED):
+    """
+    Reduce all checks down to an overall check based on the highest level
+    not ignored
+
+    :param List[Tuple] alarms: list of alarms (lvl, message)
+    :param str ignored: regular expression of messages to ignore
+    :return:
+    """
+    search_re = re.compile(ignored)
+    full = [Alarm(lvl, msg) for lvl, msg in alarms]
+    ignoring = list(filter(lambda m: search_re.search(m.desc), full)) if ignored else []
+    important = set(full) - set(ignoring)
+
+    total_crit = len([a for a in full if a.lvl == NAGIOS_STATUS_CRITICAL])
+    important_crit = len([a for a in important if a.lvl == NAGIOS_STATUS_CRITICAL])
+    important_count = len(important)
+    if important_crit > 0:
+        status = NAGIOS_STATUS_CRITICAL
+    elif important_count > 0:
+        status = NAGIOS_STATUS_WARNING
+    else:
+        status = NAGIOS_STATUS_OK
+    msg = (
+        "total_alarms[{}], total_crit[{}], total_ignored[{}], "
+        "ignoring r'{}'\n"
+        .format(len(full), total_crit, len(ignoring), ignored)
+    )
+    msg += '\n'.join(_.desc for _ in sorted(important))
+    return status, msg
+
+
+def nagios_exit(args, results):
+    # parse ignored list
+    unique = sorted(filter(None, set(args.ignored.split(","))))
+    ignored_re = r'|'.join('(?:{})'.format(_) for _ in unique)
+
+    status, message = filter_checks(results, ignored=ignored_re)
     assert status in NAGIOS_STATUS, "Invalid Nagios status code"
     # prefix status name to message
     output = '{}: {}'.format(NAGIOS_STATUS[status], message)
-    print(output)  # nagios requires print to stdout, no stderr
-    sys.exit(status)
+    return status, output
 
 
 def check_loadbalancers(connection):
@@ -39,77 +81,72 @@ def check_loadbalancers(connection):
     lb_enabled = [lb for lb in lb_all if lb.is_admin_state_up]
 
     # check provisioning_status is ACTIVE for each lb
-    bad_lbs = [lb for lb in lb_enabled if lb.provisioning_status != 'ACTIVE']
-    if bad_lbs:
-        parts = ['loadbalancer {} provisioning_status is {}'.format(
-            lb.id, lb.provisioning_status) for lb in bad_lbs]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+    bad_lbs = [(
+        NAGIOS_STATUS_CRITICAL,
+        'loadbalancer {} provisioning_status is {}'.format(
+            lb.id, lb.provisioning_status)
+    ) for lb in lb_enabled if lb.provisioning_status != 'ACTIVE']
 
     # raise WARNING if operating_status is not ONLINE
-    bad_lbs = [lb for lb in lb_enabled if lb.operating_status != 'ONLINE']
-    if bad_lbs:
-        parts = ['loadbalancer {} operating_status is {}'.format(
-            lb.id, lb.operating_status) for lb in bad_lbs]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+    bad_lbs += [(
+        NAGIOS_STATUS_CRITICAL,
+        'loadbalancer {} operating_status is {}'.format(
+            lb.id, lb.operating_status)
+    ) for lb in lb_enabled if lb.operating_status != 'ONLINE']
 
-    net_mgr = connection.network
     # check vip port exists for each lb
-    bad_lbs = []
+    net_mgr = connection.network
+    vip_lbs = []
     for lb in lb_enabled:
         try:
             net_mgr.get_port(lb.vip_port_id)
         except openstack.exceptions.NotFoundException:
-            bad_lbs.append(lb)
-    if bad_lbs:
-        parts = ['vip port {} for loadbalancer {} not found'.format(
-            lb.vip_port_id, lb.id) for lb in bad_lbs]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+            vip_lbs.append(lb)
+    bad_lbs += [(
+        NAGIOS_STATUS_CRITICAL,
+        'vip port {} for loadbalancer {} not found'.format(
+            lb.vip_port_id, lb.id)
+    ) for lb in vip_lbs]
 
     # warn about disabled lbs if no other error found
-    lb_disabled = [lb for lb in lb_all if not lb.is_admin_state_up]
-    if lb_disabled:
-        parts = ['loadbalancer {} admin_state_up is False'.format(lb.id)
-                 for lb in lb_disabled]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_WARNING, message
+    bad_lbs += [(
+        NAGIOS_STATUS_WARNING,
+        'loadbalancer {} admin_state_up is False'.format(lb.id)
+    ) for lb in lb_all if not lb.is_admin_state_up]
 
-    return NAGIOS_STATUS_OK, 'loadbalancers are happy'
+    return bad_lbs
 
 
 def check_pools(connection):
     """check pools status."""
     lb_mgr = connection.load_balancer
     pools_all = lb_mgr.pools()
+
+    # only check enabled pools
     pools_enabled = [pool for pool in pools_all if pool.is_admin_state_up]
 
     # check provisioning_status is ACTIVE for each pool
-    bad_pools = [pool for pool in pools_enabled if pool.provisioning_status != 'ACTIVE']
-    if bad_pools:
-        parts = ['pool {} provisioning_status is {}'.format(
-            pool.id, pool.provisioning_status) for pool in bad_pools]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+    bad_pools = [(
+        NAGIOS_STATUS_CRITICAL,
+        'pool {} provisioning_status is {}'.format(
+            pool.id, pool.provisioning_status)
+    ) for pool in pools_enabled if pool.provisioning_status != 'ACTIVE']
 
     # raise CRITICAL if operating_status is ERROR
-    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'ERROR']
-    if bad_pools:
-        parts = ['pool {} operating_status is {}'.format(
-            pool.id, pool.operating_status) for pool in bad_pools]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+    bad_pools += [(
+        NAGIOS_STATUS_CRITICAL,
+        'pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status)
+    ) for pool in pools_enabled if pool.operating_status == 'ERROR']
 
     # raise WARNING if operating_status is NO_MONITOR
-    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'NO_MONITOR']
-    if bad_pools:
-        parts = ['pool {} operating_status is {}'.format(
-            pool.id, pool.operating_status) for pool in bad_pools]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_WARNING, message
+    bad_pools += [(
+        NAGIOS_STATUS_WARNING,
+        'pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status)
+    ) for pool in pools_enabled if pool.operating_status == 'NO_MONITOR']
 
-    return NAGIOS_STATUS_OK, 'pools are happy'
+    return bad_pools
 
 
 def check_amphorae(connection):
@@ -120,7 +157,7 @@ def check_amphorae(connection):
     resp = lb_mgr.get('/v2/octavia/amphorae')
     # python api is not available yet, use url
     if resp.status_code != 200:
-        return NAGIOS_STATUS_WARNING, 'amphorae api not working'
+        return [(NAGIOS_STATUS_WARNING, 'amphorae api not working')]
 
     data = json.loads(resp.content)
     # output is like {"amphorae": [{...}, {...}, ...]}
@@ -128,26 +165,20 @@ def check_amphorae(connection):
 
     # raise CRITICAL for ERROR status
     bad_status_list = ('ERROR',)
-    bad_items = [item for item in items if item['status'] in bad_status_list]
-    if bad_items:
-        parts = [
-            'amphora {} status is {}'.format(item['id'], item['status'])
-            for item in bad_items]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_CRITICAL, message
+    bad_amp = [(
+        NAGIOS_STATUS_CRITICAL,
+        'amphora {} status is {}'.format(item['id'], item['status'])
+    ) for item in items if item['status'] in bad_status_list]
 
     # raise WARNING for these status
     bad_status_list = (
         'PENDING_CREATE', 'PENDING_UPDATE', 'PENDING_DELETE', 'BOOTING')
-    bad_items = [item for item in items if item['status'] in bad_status_list]
-    if bad_items:
-        parts = [
-            'amphora {} status is {}'.format(item['id'], item['status'])
-            for item in bad_items]
-        message = ', '.join(parts)
-        return NAGIOS_STATUS_WARNING, message
+    bad_amp += [(
+        NAGIOS_STATUS_WARNING,
+        'amphora {} status is {}'.format(item['id'], item['status'])
+    ) for item in items if item['status'] in bad_status_list]
 
-    return NAGIOS_STATUS_OK, 'amphorae are happy'
+    return bad_amp
 
 
 def check_image(connection, tag, days):
@@ -157,28 +188,47 @@ def check_image(connection, tag, days):
     if not images:
         message = ('Octavia requires image with tag {} to create amphora, '
                    'but none exist').format(tag)
-        return NAGIOS_STATUS_CRITICAL, message
+        return [(NAGIOS_STATUS_CRITICAL, message)]
 
     active_images = [image for image in images if image.status == 'active']
     if not active_images:
-        parts = ['{}({})'.format(image.name, image.id) for image in images]
+        details = ['{}({})'.format(image.name, image.id) for image in images]
         message = ('Octavia requires image with tag {} to create amphora, '
-                   'but none is active: {}').format(tag, ', '.join(parts))
-        return NAGIOS_STATUS_CRITICAL, message
+                   'but none are active: {}').format(tag, ', '.join(details))
+        return [(NAGIOS_STATUS_CRITICAL, message)]
 
     # raise WARNING if image is too old
     when = (datetime.now() - timedelta(days=days)).isoformat()
     # updated_at str format: '2019-12-05T18:21:25Z'
     fresh_images = [image for image in active_images if image.updated_at > when]
     if not fresh_images:
+        details = ['{}({})'.format(image.name, image.id) for image in images]
         message = ('Octavia requires image with tag {} to create amphora, '
-                   'but it is older than {} days').format(tag, days)
-        return NAGIOS_STATUS_WARNING, message
+                   'but all images are older than {} day(s): {}'
+                   '').format(tag, days, ', '.join(details))
+        return [(NAGIOS_STATUS_WARNING, message)]
 
-    return NAGIOS_STATUS_OK, 'image is ready'
+    return []
 
 
-if __name__ == '__main__':
+def process_checks(args):
+    # use closure to make all checks have same signature
+    # so we can handle them in same way
+    def _check_image(_connection):
+        return check_image(_connection, args.amp_image_tag, args.amp_image_days)
+
+    checks = {
+        'loadbalancers': check_loadbalancers,
+        'amphorae': check_amphorae,
+        'pools': check_pools,
+        'image': _check_image,
+    }
+
+    connection = openstack.connect(cloud='envvars')
+    return nagios_exit(args, checks[args.check](connection))
+
+
+def main():
     parser = argparse.ArgumentParser(
         description='Check Octavia status',
         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -195,6 +245,11 @@ if __name__ == '__main__':
         help='which check to run')
 
     parser.add_argument(
+        '--ignored', dest="ignored", type=str,
+        default=DEFAULT_IGNORED,
+        help='Comma separated list of alerts to ignore')
+
+    parser.add_argument(
         '--amp-image-tag', dest='amp_image_tag', default='octavia-amphora',
         help='amphora image tag for image check')
 
@@ -211,17 +266,10 @@ if __name__ == '__main__':
         os.environ[key.decode('utf-8')] = value.rstrip().decode('utf-8')
     proc.communicate()
 
-    # use closure to make all checks have same signature
-    # so we can handle them in same way
-    def _check_image(connection):
-        return check_image(connection, args.amp_image_tag, args.amp_image_days)
+    status, message = process_checks(args)
+    print(message)
+    sys.exit(status)
 
-    checks = {
-        'loadbalancers': check_loadbalancers,
-        'amphorae': check_amphorae,
-        'pools': check_pools,
-        'image': _check_image,
-    }
 
-    connection = openstack.connect(cloud='envvars')
-    nagios_exit(*checks[args.check](connection))
+if __name__ == '__main__':
+    main()
diff --git a/lib/lib_openstack_service_checks.py b/lib/lib_openstack_service_checks.py
index 5e82395..c74d61d 100644
--- a/lib/lib_openstack_service_checks.py
+++ b/lib/lib_openstack_service_checks.py
@@ -202,16 +202,8 @@ class OSCHelper():
         charm_plugin_dir = os.path.join(hookenv.charm_dir(), 'files', 'plugins/')
         host.rsync(charm_plugin_dir, self.plugins_dir, options=['--executability'])
 
-    def render_checks(self, creds):
-        render(source='nagios.novarc', target=self.novarc, context=creds,
-               owner='nagios', group='nagios')
-
-        nrpe = NRPE()
-        if not os.path.exists(self.plugins_dir):
-            os.makedirs(self.plugins_dir)
-
-        self.update_plugins()
-        # Nova services health
+    def _render_nova_checks(self, nrpe):
+        """Nova services health."""
         nova_check_command = os.path.join(self.plugins_dir, 'check_nova_services.py')
         check_command = '{} --warn {} --crit {} --skip-aggregates {} {}'.format(
             nova_check_command, self.nova_warn, self.nova_crit, self.nova_skip_aggregates,
@@ -221,7 +213,8 @@ class OSCHelper():
                        check_cmd=check_command,
                        )
 
-        # Neutron agents health
+    def _render_neutron_checks(self, nrpe):
+        """Neutron agents health."""
         if self.is_neutron_agents_check_enabled:
             nrpe.add_check(shortname='neutron_agents',
                            description='Check that enabled Neutron agents are up',
@@ -231,6 +224,7 @@ class OSCHelper():
         else:
             nrpe.remove_check(shortname='neutron_agents')
 
+    def _render_cinder_checks(self, nrpe):
         # Cinder services health
         cinder_check_command = os.path.join(self.plugins_dir, 'check_cinder_services.py')
         check_command = '{} {}'.format(cinder_check_command, self.skip_disabled)
@@ -239,6 +233,7 @@ class OSCHelper():
                        check_cmd=check_command,
                        )
 
+    def _render_octavia_checks(self, nrpe):
         # only care about octavia after 18.04
         if host.lsb_release()['DISTRIB_RELEASE'] >= '18.04':
             if self.is_octavia_check_enabled:
@@ -246,24 +241,34 @@ class OSCHelper():
                 script = os.path.join(self.plugins_dir, 'check_octavia.py')
 
                 for check in ('loadbalancers', 'amphorae', 'pools'):
+                    check_cmd = '{} --check {}'.format(script, check)
+                    ignore = self.charm_config.get('octavia-%s-ignored' % check)
+                    if ignore:
+                        check_cmd += ' --ignored {}'.format(ignore)
                     nrpe.add_check(
                         shortname='octavia_{}'.format(check),
                         description='Check octavia {} status'.format(check),
-                        check_cmd='{} --check {}'.format(script, check),
+                        check_cmd=check_cmd,
                     )
 
                 # image check has extra args, add it separately
                 check = 'image'
+                check_cmd = "{} --check {}".format(script, check)
+                check_cmd += " --amp-image-tag {}".format(self.octavia_amp_image_tag)
+                check_cmd += " --amp-image-days {}".format(self.octavia_amp_image_days)
+                ignore = self.charm_config.get('octavia-%s-ignored' % check)
+                if ignore:
+                    check_cmd += " --ignored {}".format(ignore)
                 nrpe.add_check(
                     shortname='octavia_{}'.format(check),
                     description='Check octavia {} status'.format(check),
-                    check_cmd='{} --check {} --amp-image-tag {} --amp-image-days {}'.format(
-                        script, check, self.octavia_amp_image_tag, self.octavia_amp_image_days),
+                    check_cmd=check_cmd,
                 )
             else:
                 for check in ('loadbalancers', 'amphorae', 'pools', 'image'):
                     nrpe.remove_check(shortname='octavia_{}'.format(check))
 
+    def _render_contrail_checks(self, nrpe):
         if self.contrail_analytics_vip:
             contrail_check_command = '{} --host {}'.format(
                 os.path.join(self.plugins_dir, 'check_contrail_analytics_alarms.py'),
@@ -279,6 +284,7 @@ class OSCHelper():
         else:
             nrpe.remove_check(shortname='contrail_analytics_alarms')
 
+    def _render_dns_checks(self, nrpe):
         if len(self.check_dns):
             nrpe.add_check(shortname='dns_multi',
                            description='Check DNS names are resolvable',
@@ -289,8 +295,24 @@ class OSCHelper():
                            )
         else:
             nrpe.remove_check(shortname='dns_multi')
-        nrpe.write()
 
+    def render_checks(self, creds):
+        render(source='nagios.novarc', target=self.novarc, context=creds,
+               owner='nagios', group='nagios')
+
+        nrpe = NRPE()
+        if not os.path.exists(self.plugins_dir):
+            os.makedirs(self.plugins_dir)
+
+        self.update_plugins()
+        self._render_nova_checks(nrpe)
+        self._render_neutron_checks(nrpe)
+        self._render_cinder_checks(nrpe)
+        self._render_octavia_checks(nrpe)
+        self._render_contrail_checks(nrpe)
+        self._render_dns_checks(nrpe)
+
+        nrpe.write()
         self.create_endpoint_checks(creds)
 
     def _split_url(self, netloc, scheme):
diff --git a/tests/unit/conftest.py b/tests/unit/conftest.py
index 6797e85..639b91a 100644
--- a/tests/unit/conftest.py
+++ b/tests/unit/conftest.py
@@ -4,6 +4,10 @@ import sys
 
 import pytest
 
+TEST_DIR = os.path.dirname(__file__)
+CHECKS_DIR = os.path.join(TEST_DIR, '..', '..', 'files', 'plugins')
+sys.path.append(CHECKS_DIR)
+
 
 # If layer options are used, add this to openstackservicechecks
 # and import layer in lib_openstack_service_checks
@@ -77,14 +81,3 @@ def openstackservicechecks(tmpdir, mock_hookenv_config, mock_charm_dir, monkeypa
     monkeypatch.setattr('lib_openstack_service_checks.OSCHelper', lambda: helper)
 
     return helper
-
-
-@pytest.fixture(scope='module')
-def check_contrail_analytics():
-    pre = sys.path
-    TEST_DIR = os.path.dirname(__file__)
-    tests_dir = os.path.join(TEST_DIR, '..', '..', 'files', 'plugins')
-    sys.path.append(tests_dir)
-    import check_contrail_analytics_alarms as checks  # noqa
-    yield checks
-    sys.path = pre
diff --git a/tests/unit/test_check_cinder_services.py b/tests/unit/test_check_cinder_services.py
index 709b4dc..3428dd6 100644
--- a/tests/unit/test_check_cinder_services.py
+++ b/tests/unit/test_check_cinder_services.py
@@ -1,11 +1,7 @@
 import pytest
 import nagios_plugin3
 
-import sys
-
-sys.path.append("files/plugins")
-
-import check_cinder_services  # noqa: E402
+import check_cinder_services
 
 
 @pytest.mark.parametrize(
diff --git a/tests/unit/test_check_contrail_analytics_alarms.py b/tests/unit/test_check_contrail_analytics_alarms.py
index c7fd6e9..886a396 100644
--- a/tests/unit/test_check_contrail_analytics_alarms.py
+++ b/tests/unit/test_check_contrail_analytics_alarms.py
@@ -1,14 +1,15 @@
 import json
 import os
 
+import check_contrail_analytics_alarms
+
 TEST_DIR = os.path.dirname(__file__)
 
 
-def test_parse_contrail_alarms(check_contrail_analytics):
+def test_parse_contrail_alarms():
     with open(os.path.join(TEST_DIR, 'contrail_alert_data.json')) as f:
         data = json.load(f)
-    assert hasattr(check_contrail_analytics, 'parse_contrail_alarms')
-    parsed = check_contrail_analytics.parse_contrail_alarms(data)
+    parsed = check_contrail_analytics_alarms.parse_contrail_alarms(data)
     assert parsed in """
 CRITICAL: total_alarms[11], unacked_or_sev_gt_0[10], total_ignored[0], ignoring r''
 CRITICAL: vrouter{compute-10.maas, sev=1, ts[2020-06-25 18:29:23.149146]} Vrouter interface(s) down.
@@ -25,12 +26,11 @@ CRITICAL: vrouter{compute-7.maas, sev=1, ts[2020-07-03 18:30:32.481386]} Vrouter
 """  # noqa: ignore=F501
 
 
-def test_parse_contrail_alarms_filter_vrouter_control_9(check_contrail_analytics):
+def test_parse_contrail_alarms_filter_vrouter_control_9():
     with open(os.path.join(TEST_DIR, 'contrail_alert_data.json')) as f:
         data = json.load(f)
-    assert hasattr(check_contrail_analytics, 'parse_contrail_alarms')
     ignored_re = r'(?:vrouter)|(?:control-9)'
-    parsed = check_contrail_analytics.parse_contrail_alarms(data, ignored=ignored_re)
+    parsed = check_contrail_analytics_alarms.parse_contrail_alarms(data, ignored=ignored_re)
     assert parsed in """
 CRITICAL: total_alarms[11], unacked_or_sev_gt_0[10], total_ignored[8], ignoring r'(?:vrouter)|(?:control-9)'
 WARNING: control-node{control-8-contrail-rmq, sev=0, ts[2020-06-25 18:29:23.684803]} Node Failure. NodeStatus UVE not present.
@@ -39,32 +39,30 @@ CRITICAL: control-node{control-7-contrail-rmq, sev=1, ts[2020-06-25 18:29:24.377
 """  # noqa: ignore=F501
 
 
-def test_parse_contrail_alarms_filter_critical(check_contrail_analytics):
+def test_parse_contrail_alarms_filter_critical():
     with open(os.path.join(TEST_DIR, 'contrail_alert_data.json')) as f:
         data = json.load(f)
-    assert hasattr(check_contrail_analytics, 'parse_contrail_alarms')
     ignored_re = r'(?:CRITICAL)'
-    parsed = check_contrail_analytics.parse_contrail_alarms(data, ignored=ignored_re)
+    parsed = check_contrail_analytics_alarms.parse_contrail_alarms(data, ignored=ignored_re)
     assert parsed in """
 WARNING: total_alarms[11], unacked_or_sev_gt_0[10], total_ignored[10], ignoring r'(?:CRITICAL)'
 WARNING: control-node{control-8-contrail-rmq, sev=0, ts[2020-06-25 18:29:23.684803]} Node Failure. NodeStatus UVE not present.
 """  # noqa: ignore=F501
 
 
-def test_parse_contrail_alarms_all_ignored(check_contrail_analytics):
+def test_parse_contrail_alarms_all_ignored():
     with open(os.path.join(TEST_DIR, 'contrail_alert_data.json')) as f:
         data = json.load(f)
-    assert hasattr(check_contrail_analytics, 'parse_contrail_alarms')
     ignored_re = r'(?:CRITICAL)|(?:WARNING)'
-    parsed = check_contrail_analytics.parse_contrail_alarms(data, ignored=ignored_re)
+    parsed = check_contrail_analytics_alarms.parse_contrail_alarms(data, ignored=ignored_re)
     assert parsed in """
 OK: total_alarms[11], unacked_or_sev_gt_0[10], total_ignored[11], ignoring r'(?:CRITICAL)|(?:WARNING)'
 """  # noqa: ignore=F501
 
 
-def test_parse_contrail_alarms_no_alarms(check_contrail_analytics):
+def test_parse_contrail_alarms_no_alarms():
     ignored_re = r''
-    parsed = check_contrail_analytics.parse_contrail_alarms({}, ignored=ignored_re)
+    parsed = check_contrail_analytics_alarms.parse_contrail_alarms({}, ignored=ignored_re)
     assert parsed in """
 OK: total_alarms[0], unacked_or_sev_gt_0[0], total_ignored[0], ignoring r''
 """
diff --git a/tests/unit/test_check_nova_services.py b/tests/unit/test_check_nova_services.py
index 10c13ed..dad32a6 100644
--- a/tests/unit/test_check_nova_services.py
+++ b/tests/unit/test_check_nova_services.py
@@ -1,10 +1,7 @@
 import pytest
 import nagios_plugin3
 
-import sys
-sys.path.append('files/plugins')
-
-import check_nova_services # noqa: E402
+import check_nova_services
 
 
 @pytest.mark.parametrize('is_skip_disabled,num_nodes',
diff --git a/tests/unit/test_check_octavia.py b/tests/unit/test_check_octavia.py
new file mode 100644
index 0000000..08c9357
--- /dev/null
+++ b/tests/unit/test_check_octavia.py
@@ -0,0 +1,117 @@
+from datetime import datetime, timedelta
+import json
+import unittest.mock as mock
+from uuid import uuid4
+
+import check_octavia
+import pytest
+
+
+@mock.patch('check_octavia.openstack.connect')
+@pytest.mark.parametrize('check', [
+    'loadbalancers', 'pools', "amphorae", "image"
+])
+def test_stable_alarms(connect, check):
+    args = mock.MagicMock()
+    args.ignored = r''
+    args.check = check
+    if check == "amphorae":
+        # Present 0 Amphora instances
+        resp = connect().load_balancer.get()
+        resp.status_code = 200
+        resp.content = json.dumps({'amphora': []})
+    elif check == "image":
+        # Present 1 Active Fresh Amphora image
+        args.amp_image_tag = 'octavia'
+        args.amp_image_days = 1
+        amp_image = mock.MagicMock()
+        amp_image.status = 'active'
+        amp_image.updated_at = datetime.now().isoformat()
+        connect().image.images.return_value = [amp_image]
+
+    status, message = check_octavia.process_checks(args)
+    assert message in """
+OK: total_alarms[0], total_crit[0], total_ignored[0], ignoring r''
+"""
+    assert status == check_octavia.NAGIOS_STATUS_OK
+
+
+@mock.patch('check_octavia.openstack.connect')
+def test_no_images_is_ignorable(connect):
+    args = mock.MagicMock()
+    args.ignored = 'none exist'
+    args.check = "image"
+    # Present 1 Active Fresh Amphora image
+    args.amp_image_tag = 'octavia'
+    args.amp_image_days = 1
+    connect().image.images.return_value = []
+
+    status, message = check_octavia.process_checks(args)
+    assert message in """
+OK: total_alarms[1], total_crit[1], total_ignored[1], ignoring r'(?:none exist)'
+"""
+    assert status == check_octavia.NAGIOS_STATUS_OK
+
+
+@mock.patch('check_octavia.openstack.connect')
+def test_no_images(connect):
+    args = mock.MagicMock()
+    args.ignored = r''
+    args.check = "image"
+    # Present 1 Active Fresh Amphora image
+    args.amp_image_tag = 'octavia'
+    args.amp_image_days = 1
+    connect().image.images.return_value = []
+
+    status, message = check_octavia.process_checks(args)
+    assert message in """
+CRITICAL: total_alarms[1], total_crit[1], total_ignored[0], ignoring r''
+Octavia requires image with tag octavia to create amphora, but none exist
+"""
+    assert status == check_octavia.NAGIOS_STATUS_CRITICAL
+
+
+@mock.patch('check_octavia.openstack.connect')
+def test_no_active_images(connect):
+    args = mock.MagicMock()
+    args.ignored = r''
+    args.check = "image"
+    # Present 1 Active Fresh Amphora image
+    args.amp_image_tag = 'octavia'
+    args.amp_image_days = 1
+    amp_image = mock.MagicMock()
+    amp_image.name = "bob-the-image"
+    amp_image.id = str(uuid4())
+    amp_image.status = 'inactive'
+    amp_image.updated_at = datetime.now().isoformat()
+    connect().image.images.return_value = [amp_image]
+
+    status, message = check_octavia.process_checks(args)
+    assert message in """
+CRITICAL: total_alarms[1], total_crit[1], total_ignored[0], ignoring r''
+Octavia requires image with tag octavia to create amphora, but none are active: bob-the-image({})
+""".format(amp_image.id)
+    assert status == check_octavia.NAGIOS_STATUS_CRITICAL
+
+
+@mock.patch('check_octavia.openstack.connect')
+def test_no_fresh_images(connect):
+    args = mock.MagicMock()
+    args.ignored = r''
+    args.check = "image"
+    # Present 1 Active Fresh Amphora image
+    args.amp_image_tag = 'octavia'
+    args.amp_image_days = 1
+    amp_image = mock.MagicMock()
+    amp_image.name = "bob-the-image"
+    amp_image.id = str(uuid4())
+    amp_image.status = 'active'
+    amp_image.updated_at = (datetime.now() - timedelta(days=2)).isoformat()
+    connect().image.images.return_value = [amp_image]
+
+    status, message = check_octavia.process_checks(args)
+    assert message in """
+WARNING: total_alarms[1], total_crit[0], total_ignored[0], ignoring r''
+Octavia requires image with tag octavia to create amphora, but all images are older than 1 day(s): bob-the-image({})
+""".format(amp_image.id)
+    assert status == check_octavia.NAGIOS_STATUS_WARNING
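
For readers following the expected strings in the tests above, here is a minimal sketch of the filtering behaviour they imply. It is not the code from files/plugins/check_octavia.py, which this part of the diff does not show. The names build_ignored_re and summarize are hypothetical, as are the numeric alarm levels and the choice to match the pattern against the message text only. Keywords are treated as regex fragments (not escaped), which is consistent with 'none exist' being reported as r'(?:none exist)'.

```python
import re

# Assumed mapping of Nagios exit codes to labels (0/1/2 is the usual convention).
NAGIOS_STATUS = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL'}


def build_ignored_re(keywords):
    """Turn 'a,b,' into '(?:a)|(?:b)'; an empty string yields an empty pattern."""
    words = [w.strip() for w in keywords.split(',') if w.strip()]
    return '|'.join('(?:{})'.format(w) for w in words)


def summarize(alarms, ignored_re):
    """Summarize alarms given as (level, message) pairs, 1=WARNING, 2=CRITICAL.

    Alarms whose message matches ignored_re are counted but neither
    reported nor allowed to raise the final status.
    """
    pattern = re.compile(ignored_re) if ignored_re else None
    kept = [(lvl, msg) for lvl, msg in alarms
            if pattern is None or not pattern.search(msg)]
    # The final status is the worst level among the alarms that were kept.
    status = max((lvl for lvl, _ in kept), default=0)
    # total_crit counts every critical alarm, ignored or not.
    total_crit = sum(1 for lvl, _ in alarms if lvl == 2)
    header = "{}: total_alarms[{}], total_crit[{}], total_ignored[{}], ignoring r'{}'".format(
        NAGIOS_STATUS[status], len(alarms), total_crit,
        len(alarms) - len(kept), ignored_re)
    return status, '\n'.join([header] + [msg for _, msg in kept])
```

Under these assumptions, a single critical "none exist" alarm with the ignore list 'none exist,' still counts towards total_alarms and total_crit but produces an OK status, which is the shape test_no_images_is_ignorable checks for.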
