Merge ~canonical-bootstack/charm-openstack-service-checks:check-loadbalancers into ~canonical-bootstack/charm-openstack-service-checks:master
Status: | Superseded |
---|---|
Proposed branch: | ~canonical-bootstack/charm-openstack-service-checks:check-loadbalancers |
Merge into: | ~canonical-bootstack/charm-openstack-service-checks:master |
Diff against target: | 515 lines (+386/-9), 9 files modified: actions.yaml (+3/-0), actions/actions.py (+75/-0), actions/refresh-endpoint-checks (+1/-0), config.yaml (+18/-0), files/plugins/check_octavia.py (+227/-0), layer.yaml (+2/-0), lib/lib_openstack_service_checks.py (+55/-8), metadata.yaml (+1/-1), tests/functional/test_deploy.py (+4/-0) |
Related bugs: | |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jose Guedez (community) | Approve | ||
Jeremy Lounder | Pending | ||
James Hebden | Pending | ||
Joe Guo | Pending | ||
Review via email: mp+377016@code.launchpad.net |
This proposal has been superseded by a proposal from 2020-03-18.
Commit message
Add checks for loadbalancers.
Description of the change
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote:
James Hebden (ec0) wrote:
Please see comments inline.
Overall, looks really cool!
One other check I would suggest is verifying, if possible, that the image used to boot new Amphorae exists in Glance. The tag used by Octavia should be available as the amp-image-tag setting on Octavia, so we should either provide a setting on this charm to configure the tag we will monitor (i.e. a string setting called octavia-image-tag with a default of 'amp-image-tag') or get this information from Octavia. We can then check that an image with this tag is present in Glance: if the image is deleted, new Amphorae cannot be booted, which is a real problem.
I'd be happy to approve this with the changes below, even without the Glance check, and we can handle the Glance check as a follow-up activity.
Alvaro Uria (aluria) wrote:
I've added a comment on a minor typo. OTOH, the best practice suggested for charms is to have as many things autoconfigured as possible (i.e. reduce config parameters). So, I would suggest a new interface between charm-octavia and charm-openstack
1) It would make "check-
2) The relation would share the amp-image-tag value automatically (if not set, charm-openstack
3) Possibly do the same for the flavor id, which is a config param in Octavia (however, I don't know how to determine whether the flavor used by the Amphorae exists; by default charm-octavia handles it, so an internal value probably exists)
Thank you for the change. Is there a bug we can link this MP to?
Joe Guo (guoqiao) wrote:
Hi,
Thanks for the review and feedback.
I've fixed everything, and refactored the code to make it more flexible/
Could you please review again when you get the time?
NOTE: the image check currently hardcodes `amp_img_tag` to `octavia-amphora`, and it is disabled by default. I'm still working out a proper, minimal way to set it.
James Hebden (ec0) wrote:
Just a couple more comments, around config.yaml and default checks.
Joe Guo (guoqiao) wrote:
> Just a couple more comments, around config.yaml and default checks.
Hi James,
I've made the changes, please review again when you get the time. Thanks:)
Jose Guedez (jfguedez) wrote:
comments inline...
Joe Guo (guoqiao) wrote:
Hi Jose,
Thanks for the great review; it's all good advice. I will make those changes during my new Engineer rotation. Thanks.
Joe Guo (guoqiao) wrote:
Hi Jose,
I've made the following changes:
1. split the octavia nagios check into 4 (loadbalancers
2. removed the default image tag and days from the check_image function. I still keep them on both the CLI options and config.yaml, since those are different levels.
I didn't import the nagios3_plugin module for the status codes, so that the script stays standalone and can run from any location.
Another review appreciated:)
Jose Guedez (jfguedez) wrote:
Hi Joe, good stuff, much cleaner now. Comments inline. Just one real comment, about the functional test and the new filenames of the checks.
Joe Guo (guoqiao) wrote:
Hi Jose,
Changed and replied; see my inline replies.
BTW: I've verified that the new octavia checks are added correctly in an openstack-on-lxd env: https:/
Unmerged commits
- 18472a4... by Joe Guo
  add octavia checks
  Signed-off-by: Joe Guo <email address hidden>
- bd33e8e... by Xav Paice
  Add ability to remove NRPE endpoint checks
  When a service is removed from the Keystone catalog, this adds a
  mechanism to remove that service from the list of NRPE checks so it
  doesn't alert CRITICAL for services that aren't there any more.
  Closes-Bug: LP: #1836385
- 0bfc801... by Jeremy Lounder
  Updated maintainers in metadata.yaml
Preview Diff
```diff
diff --git a/actions.yaml b/actions.yaml
new file mode 100644
index 0000000..9e5a96a
--- /dev/null
+++ b/actions.yaml
@@ -0,0 +1,3 @@
+refresh-endpoint-checks:
+  description: >-
+    Trigger the Keystone endpoints to be reloaded on next update-status hook.
```
```diff
diff --git a/actions/actions.py b/actions/actions.py
new file mode 100755
index 0000000..1e679b6
--- /dev/null
+++ b/actions/actions.py
@@ -0,0 +1,75 @@
+#!/usr/local/sbin/charm-env python3
+# Copyright 2020 Canonical Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import charmhelpers.core.hookenv as hookenv
+import charmhelpers.core.unitdata as unitdata
+from traceback import format_exc
+
+# Load modules from $CHARM_DIR/lib
+sys.path.append("lib")
+
+import charms.reactive  # NOQA: E402
+from charms.layer import basic  # NOQA: E402
+
+basic.bootstrap_charm_deps()
+basic.init_config_states()
+
+from charms.reactive.flags import clear_flag  # NOQA: E402
+
+
+def refresh_endpoint_checks(*args):
+    """Clear the openstack-service-checks.endpoints.configured flag
+    so that next time update-status runs, the Keystone catalog is re-read
+    and nrpe checks refreshed.
+    """
+    clear_flag("openstack-service-checks.endpoints.configured")
+
+
+# Actions to function mapping, to allow for illegal python action names that
+# can map to a python function.
+ACTIONS = {
+    "refresh-endpoint-checks": refresh_endpoint_checks,
+}
+
+
+def main(args):
+    action_name = os.path.basename(args[0])
+    try:
+        action = ACTIONS[action_name]
+    except KeyError:
+        return "Action %s undefined" % action_name
+    else:
+        try:
+            action(args)
+        except Exception:
+            exc = format_exc()
+            hookenv.log(exc, hookenv.ERROR)
+            hookenv.action_fail(exc.splitlines()[-1])
+        else:
+            # we were successful, so commit changes from the action
+            unitdata.kv().flush()
+            # try running handlers based on new state
+            try:
+                charms.reactive.main()
+            except Exception:
+                exc = format_exc()
+                hookenv.log(exc, hookenv.ERROR)
+                hookenv.action_fail(exc.splitlines()[-1])
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
```
```diff
diff --git a/actions/refresh-endpoint-checks b/actions/refresh-endpoint-checks
new file mode 120000
index 0000000..405a394
--- /dev/null
+++ b/actions/refresh-endpoint-checks
@@ -0,0 +1 @@
+actions.py
\ No newline at end of file
```
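Because `actions/refresh-endpoint-checks` is a symlink (mode 120000) to `actions.py`, the script sees the action name as the basename of `sys.argv[0]` and looks it up in `ACTIONS`. The following is a minimal, dependency-free sketch of that dispatch pattern; the `cleared` list and the `dispatch` helper are hypothetical stand-ins for `clear_flag` and `main`, and none of the charmhelpers/reactive machinery is reproduced.

```python
import os

# Hypothetical stand-in for charms.reactive.flags.clear_flag: records cleared flags.
cleared = []


def refresh_endpoint_checks(*args):
    cleared.append("openstack-service-checks.endpoints.configured")


# Map action names (which may contain '-') to Python callables.
ACTIONS = {
    "refresh-endpoint-checks": refresh_endpoint_checks,
}


def dispatch(argv):
    """Run the action named by argv[0]; return an error string if unknown."""
    action_name = os.path.basename(argv[0])
    action = ACTIONS.get(action_name)
    if action is None:
        return "Action %s undefined" % action_name
    action(argv)
    return None


# Invoking via the symlink puts its path, not actions.py, in argv[0].
print(dispatch(["actions/refresh-endpoint-checks"]))
print(dispatch(["actions/no-such-action"]))
print(cleared)
```

Adding another action under this pattern would only need a new symlink and a new `ACTIONS` entry.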
```diff
diff --git a/config.yaml b/config.yaml
index 073ba9b..edfda51 100644
--- a/config.yaml
+++ b/config.yaml
@@ -10,6 +10,24 @@ options:
     description: |
       Switch to turn on or off neutron agents checks. By default, neutron_agents nrpe check is enabled.
       If a different SDN (ie. Contrail) is in use, you may want to disable this check.
+  check-octavia:
+    default: True
+    type: boolean
+    description: |
+      Switch to turn on or off check for octavia services.
+  octavia-amp-image-tag:
+    default: "octavia-amphora"
+    type: string
+    description: |
+      The glance image tag octavia will use to create amphora.
+  octavia-amp-image-days:
+    default: 365
+    type: int
+    description: |
+      If latest glance image tagged with above octavia-amp-image-tag is updated more than these days ago,
+      a Nagios warning will be raised. The version of octavia agent builtin in amphora image must match
+      version of octavia controller, otherwise octavia will fail to communicate with new amphora,
+      failover will also fail.
   check-rally:
     default: False
     type: boolean
```
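The `octavia-amp-image-days` rule can be sketched as a pure function. This sketch assumes `updated_at` strings in the `'2019-12-05T18:21:25Z'` UTC format noted in the plugin, and it parses them into timezone-aware datetimes instead of comparing ISO strings lexicographically against a local-time `datetime.now()`, as the proposed plugin does. `is_image_fresh` and `any_image_fresh` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def parse_updated_at(value):
    """Parse a UTC timestamp such as '2019-12-05T18:21:25Z' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_image_fresh(updated_at, days, now=None):
    """True if the image was updated within the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_updated_at(updated_at) > now - timedelta(days=days)


def any_image_fresh(updated_at_values, days, now=None):
    """Mirror the plugin's rule: warn only when no active image is fresh."""
    return any(is_image_fresh(v, days, now) for v in updated_at_values)


# A fixed "now" keeps the example deterministic.
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(is_image_fresh("2019-12-05T18:21:25Z", 365, now))  # 27 days earlier
print(is_image_fresh("2018-12-05T18:21:25Z", 365, now))  # over a year earlier
```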
```diff
diff --git a/files/plugins/check_octavia.py b/files/plugins/check_octavia.py
new file mode 100755
index 0000000..09fb6b1
--- /dev/null
+++ b/files/plugins/check_octavia.py
@@ -0,0 +1,227 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import json
+import argparse
+import subprocess
+from datetime import datetime, timedelta
+import openstack
+
+NAGIOS_STATUS_OK = 0
+NAGIOS_STATUS_WARNING = 1
+NAGIOS_STATUS_CRITICAL = 2
+NAGIOS_STATUS_UNKNOWN = 3
+
+NAGIOS_STATUS = {
+    NAGIOS_STATUS_OK: 'OK',
+    NAGIOS_STATUS_WARNING: 'WARNING',
+    NAGIOS_STATUS_CRITICAL: 'CRITICAL',
+    NAGIOS_STATUS_UNKNOWN: 'UNKNOWN',
+}
+
+
+def nagios_exit(status, message):
+    assert status in NAGIOS_STATUS, "Invalid Nagios status code"
+    # prefix status name to message
+    output = '{}: {}'.format(NAGIOS_STATUS[status], message)
+    print(output)  # nagios requires print to stdout, no stderr
+    sys.exit(status)
+
+
+def check_loadbalancers(connection):
+    """check loadbalancers status."""
+
+    lb_mgr = connection.load_balancer
+    lb_all = lb_mgr.load_balancers()
+
+    # only check enabled lbs
+    lb_enabled = [lb for lb in lb_all if lb.is_admin_state_up]
+
+    # check provisioning_status is ACTIVE for each lb
+    bad_lbs = [lb for lb in lb_enabled if lb.provisioning_status != 'ACTIVE']
+    if bad_lbs:
+        parts = ['loadbalancer {} provisioning_status is {}'.format(
+            lb.id, lb.provisioning_status) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if operating_status is not ONLINE
+    bad_lbs = [lb for lb in lb_enabled if lb.operating_status != 'ONLINE']
+    if bad_lbs:
+        parts = ['loadbalancer {} operating_status is {}'.format(
+            lb.id, lb.operating_status) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    net_mgr = connection.network
+    # check vip port exists for each lb
+    bad_lbs = []
+    for lb in lb_enabled:
+        try:
+            net_mgr.get_port(lb.vip_port_id)
+        except openstack.exceptions.NotFoundException:
+            bad_lbs.append(lb)
+    if bad_lbs:
+        parts = ['vip port {} for loadbalancer {} not found'.format(
+            lb.vip_port_id, lb.id) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # warn about disabled lbs if no other error found
+    lb_disabled = [lb for lb in lb_all if not lb.is_admin_state_up]
+    if lb_disabled:
+        parts = ['loadbalancer {} admin_state_up is False'.format(lb.id)
+                 for lb in lb_disabled]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'loadbalancers are happy'
+
+
+def check_pools(connection):
+    """check pools status."""
+    lb_mgr = connection.load_balancer
+    pools_all = lb_mgr.pools()
+    pools_enabled = [pool for pool in pools_all if pool.is_admin_state_up]
+
+    # check provisioning_status is ACTIVE for each pool
+    bad_pools = [pool for pool in pools_enabled if pool.provisioning_status != 'ACTIVE']
+    if bad_pools:
+        parts = ['pool {} provisioning_status is {}'.format(
+            pool.id, pool.provisioning_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise CRITICAL if operating_status is ERROR
+    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'ERROR']
+    if bad_pools:
+        parts = ['pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if operating_status is NO_MONITOR
+    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'NO_MONITOR']
+    if bad_pools:
+        parts = ['pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'pools are happy'
+
+
+def check_amphorae(connection):
+    """check amphorae status."""
+
+    lb_mgr = connection.load_balancer
+
+    resp = lb_mgr.get('/v2/octavia/amphorae')
+    # python api is not available yet, use url
+    if resp.status_code != 200:
+        return NAGIOS_STATUS_WARNING, 'amphorae api not working'
+
+    data = json.loads(resp.content)
+    # ouput is like {"amphorae": [{...}, {...}, ...]}
+    items = data.get('amphorae', [])
+
+    # raise CRITICAL for ERROR status
+    bad_status_list = ('ERROR',)
+    bad_items = [item for item in items if item['status'] in bad_status_list]
+    if bad_items:
+        parts = [
+            'amphora {} status is {}'.format(item['id'], item['status'])
+            for item in bad_items]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING for these status
+    bad_status_list = (
+        'PENDING_CREATE', 'PENDING_UPDATE', 'PENDING_DELETE', 'BOOTING')
+    bad_items = [item for item in items if item['status'] in bad_status_list]
+    if bad_items:
+        parts = [
+            'amphora {} status is {}'.format(item['id'], item['status'])
+            for item in bad_items]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'amphorae are happy'
+
+
+def check_image(connection, tag, days):
+    img_mgr = connection.image
+    images = list(img_mgr.images(tag=tag))
+
+    if not images:
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but none exist').format(tag)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    active_images = [image for image in images if image.status == 'active']
+    if not active_images:
+        parts = ['{}({})'.format(image.name, image.id) for image in images]
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but none is active: {}').format(tag, ', '.join(parts))
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if image is too old
+    when = (datetime.now() - timedelta(days=days)).isoformat()
+    # updated_at str format: '2019-12-05T18:21:25Z'
+    fresh_images = [image for image in active_images if image.updated_at > when]
+    if not fresh_images:
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but it is older than {} days').format(tag, days)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'image is ready'
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description='Check Octavia status',
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument(
+        '--env', dest='env', default='/var/lib/nagios/nagios.novarc',
+        help='Novarc file to use for this check')
+
+    check_choices = ['loadbalancers', 'amphorae', 'pools', 'image']
+    parser.add_argument(
+        '--check', dest='check', metavar='|'.join(check_choices),
+        type=str, choices=check_choices,
+        default=check_choices[0],
+        help='which check to run')
+
+    parser.add_argument(
+        '--amp-image-tag', dest='amp_image_tag', default='octavia-amphora',
+        help='amphora image tag for image check')
+
+    parser.add_argument(
+        '--amp-image-days', dest='amp_image_days', type=int, default=365,
+        help='raise warning if amphora image is older than these days')
+
+    args = parser.parse_args()
+    # source environment vars
+    command = ['/bin/bash', '-c', 'source {} && env'.format(args.env)]
+    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
+    for line in proc.stdout:
+        (key, _, value) = line.partition(b'=')
+        os.environ[key.decode('utf-8')] = value.rstrip().decode('utf-8')
+    proc.communicate()
+
+    # use closure to make all checks have same signature
+    # so we can handle them in same way
+    def _check_image(connection):
+        return check_image(connection, args.amp_image_tag, args.amp_image_days)
+
+    checks = {
+        'loadbalancers': check_loadbalancers,
+        'amphorae': check_amphorae,
+        'pools': check_pools,
+        'image': _check_image,
+    }
+
+    connection = openstack.connect(cloud='envvars')
+    nagios_exit(*checks[args.check](connection))
```
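One detail worth double-checking in `check_loadbalancers`: `lb_all` is iterated twice, once to build `lb_enabled` and again to build `lb_disabled`. If `lb_mgr.load_balancers()` returns a one-shot generator (openstacksdk proxy list calls typically do), the second pass sees nothing and the disabled-loadbalancer WARNING can never fire. The self-contained sketch below shows the pitfall and the usual fix of materialising the result with `list()`; `FakeLB` and `fake_load_balancers` are hypothetical stand-ins, not openstacksdk objects.

```python
from collections import namedtuple

# Hypothetical stand-in for an Octavia load balancer resource.
FakeLB = namedtuple("FakeLB", "id is_admin_state_up")


def fake_load_balancers():
    """Yield resources one at a time, like a paginated SDK list call."""
    yield FakeLB("lb-1", True)
    yield FakeLB("lb-2", False)


def split_by_admin_state(lbs):
    lb_enabled = [lb for lb in lbs if lb.is_admin_state_up]
    lb_disabled = [lb for lb in lbs if not lb.is_admin_state_up]
    return lb_enabled, lb_disabled


# Passing the generator directly: the second comprehension gets nothing.
enabled, disabled = split_by_admin_state(fake_load_balancers())
print(len(enabled), len(disabled))  # 1 0

# Materialising it first lets both passes see every load balancer.
enabled, disabled = split_by_admin_state(list(fake_load_balancers()))
print(len(enabled), len(disabled))  # 1 1
```

In the plugin itself, the equivalent change would be wrapping the call as `list(lb_mgr.load_balancers())`, the same way `check_image` already wraps `img_mgr.images(tag=tag)`.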
```diff
diff --git a/layer.yaml b/layer.yaml
index 4053919..0606d45 100644
--- a/layer.yaml
+++ b/layer.yaml
@@ -20,6 +20,8 @@ options:
       - python3-keystoneclient
       - python3-openstackclient
       - python-openstackclient
+      - python3-octaviaclient
+      - python-octaviaclient
   snap:
     fcbtest:
       channel: stable
```
```diff
diff --git a/lib/lib_openstack_service_checks.py b/lib/lib_openstack_service_checks.py
index 9d123fb..38e5f0f 100644
--- a/lib/lib_openstack_service_checks.py
+++ b/lib/lib_openstack_service_checks.py
@@ -65,6 +65,18 @@ class OSCHelper():
         return self.charm_config['check-neutron-agents']

     @property
+    def is_octavia_check_enabled(self):
+        return self.charm_config['check-octavia']
+
+    @property
+    def octavia_amp_image_tag(self):
+        return self.charm_config['octavia-amp-image-tag']
+
+    @property
+    def octavia_amp_image_days(self):
+        return self.charm_config['octavia-amp-image-days']
+
+    @property
     def skipped_rally_checks(self):
         skipped_os_components = self.charm_config['skip-rally'].strip()
         if not skipped_os_components:
@@ -186,6 +198,28 @@ class OSCHelper():
         else:
             nrpe.remove_check(shortname='neutron_agents')

+        if self.is_octavia_check_enabled:
+            script = os.path.join(self.plugins_dir, 'check_octavia.py')
+
+            for check in ('loadbalancers', 'amphorae', 'pools'):
+                nrpe.add_check(
+                    shortname='octavia_{}'.format(check),
+                    description='Check octavia {} status'.format(check),
+                    check_cmd='{} --check {}'.format(script, check),
+                )
+
+            # image check has extra args, add it separately
+            check = 'image'
+            nrpe.add_check(
+                shortname='octavia_{}'.format(check),
+                description='Check octavia {} status'.format(check),
+                check_cmd='{} --check {} --amp-image-tag {} --amp-image-days {}'.format(
+                    script, check, self.octavia_amp_image_tag, self.octavia_amp_image_days),
+            )
+        else:
+            for check in ('loadbalancers', 'amphorae', 'pools', 'image'):
+                nrpe.remove_check(shortname='octavia_{}'.format(check))
+
         if self.contrail_analytics_vip:
             contrail_check_command = '{} --host {}'.format(
                 os.path.join(self.plugins_dir, 'check_contrail_analytics_alarms.py'),
@@ -272,7 +306,7 @@ class OSCHelper():
         endpoints = self.keystone_endpoints
         services = [svc for svc in self.keystone_services if svc.enabled]
         nrpe = NRPE()
-        skip_service = set()
+        configured_endpoint_checks = dict()
         for endpoint in endpoints:
             endpoint.service_names = [x.name
                                       for x in services
@@ -281,21 +315,18 @@ class OSCHelper():
             endpoint.healthcheck_url = health_check_params.get(service_name, '/')

             # Note(aluria): glance-simplestreams-sync does not provide an API to check
-            if service_name == 'image-stream':
+            # Note(aluria): filter:healthcheck is not configured in Keystone v2
+            # https://docs.openstack.org/keystone/pike/configuration.html#health-check-middleware
+            if service_name == 'image-stream' or service_name == 'keystone':
                 continue

             if not hasattr(endpoint, 'interface'):
-                if service_name == 'keystone':
-                    # Note(aluria): filter:healthcheck is not configured in v2
-                    # https://docs.openstack.org/keystone/pike/configuration.html#health-check-middleware
-                    continue
                 for interface in 'admin internal public'.split():
                     old_interface_name = '{}url'.format(interface)
                     if not hasattr(endpoint, old_interface_name):
                         continue
                     endpoint.interface = interface
                     endpoint.url = getattr(endpoint, old_interface_name)
-                    skip_service.add(service_name)
                     break

             check_url = urlparse(endpoint.url)
@@ -324,10 +355,26 @@ class OSCHelper():
                            check_cmd=' '.join(cmd_params_cert))

             # Add the actual health check for the URL
-            nrpe.add_check(shortname='{}_{}'.format(service_name, endpoint.interface),
+            nrpe_shortname = '{}_{}'.format(service_name, endpoint.interface)
+            nrpe.add_check(shortname=nrpe_shortname,
                            description='Endpoint url check for {} {}'.format(service_name, endpoint.interface),
                            check_cmd=' '.join(cmd_params))
+            configured_endpoint_checks[nrpe_shortname] = True
+        nrpe.write()
+        self._remove_old_nrpe_endpoint_checks(nrpe, configured_endpoint_checks)

+    def _remove_old_nrpe_endpoint_checks(self, nrpe, configured_endpoint_checks):
+        """Loop through the old and new endpoint checks, if there are checks that aren't needed any more,
+        remove them.
+        """
+        kv = unitdata.kv()
+        endpoint_delta = kv.delta(configured_endpoint_checks, 'endpoint_checks')
+        kv.update(configured_endpoint_checks, 'endpoint_checks')
+        for nrpe_shortname in endpoint_delta.items():
+            # generates tuples of the format ('heat_public', Delta(previous=None, current=True))
+            # remove any that are not current
+            if not nrpe_shortname[1].current:
+                nrpe.remove_check(shortname=nrpe_shortname[0])
         nrpe.write()

     def get_keystone_client(self, creds):
```
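`_remove_old_nrpe_endpoint_checks` remembers which endpoint checks were configured on the previous run (in unitdata, under the `endpoint_checks` prefix) and removes any check that was present then but is missing now. The dependency-free sketch below shows the same bookkeeping, assuming a plain dict as a hypothetical stand-in for `unitdata.kv()`. The `stale_checks` helper is hypothetical: it returns the names to remove instead of calling `nrpe.remove_check`, and it replaces the stored set on every run.

```python
def stale_checks(store, current_checks, key="endpoint_checks"):
    """Return check shortnames configured last run but not this run.

    `store` is a plain dict standing in for persistent unit data.
    """
    previous = set(store.get(key, ()))
    current = set(current_checks)
    store[key] = sorted(current)  # remember this run's checks for next time
    return sorted(previous - current)


store = {}
# First run: heat and nova endpoints are in the Keystone catalog.
print(stale_checks(store, ["heat_public", "nova_public"]))  # []
# Later run: heat was removed from the catalog, so its check is stale.
print(stale_checks(store, ["nova_public"]))  # ['heat_public']
```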
```diff
diff --git a/metadata.yaml b/metadata.yaml
index d4b7e87..94a7db8 100644
--- a/metadata.yaml
+++ b/metadata.yaml
@@ -1,7 +1,7 @@
 name: openstack-service-checks
 summary: OpenStack Services NRPE Checks
 description: OpenStack Services NRPE Checks
-maintainer: LMA Charmers <llama-charmers@lists.ubuntu.com>
+maintainer: Llama (LMA) Charmers <llama-charmers@lists.ubuntu.com>
 subordinate: false
 tags:
   - openstack
```
```diff
diff --git a/tests/functional/test_deploy.py b/tests/functional/test_deploy.py
index d0eea26..1716339 100644
--- a/tests/functional/test_deploy.py
+++ b/tests/functional/test_deploy.py
@@ -128,6 +128,10 @@ async def test_openstackservicechecks_verify_default_nrpe_checks(deploy_app, mod
         filenames.extend([
             '/etc/nagios/nrpe.d/check_nova_services.cfg',
             '/etc/nagios/nrpe.d/check_neutron_agents.cfg',
+            '/etc/nagios/nrpe.d/check_octavia_loadbalancers.cfg',
+            '/etc/nagios/nrpe.d/check_octavia_amphorae.cfg',
+            '/etc/nagios/nrpe.d/check_octavia_pools.cfg',
+            '/etc/nagios/nrpe.d/check_octavia_image.cfg',
         ])
     for filename in filenames:
         test_stat = await file_stat(filename, unit)
```
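The functional test expects one `/etc/nagios/nrpe.d/check_<shortname>.cfg` file per Octavia check, matching the `octavia_{}` shortnames registered in `lib_openstack_service_checks.py`. The sketch below derives those paths from a single tuple of check names. It assumes, as the test's expectations imply, that the NRPE helper names each file `check_<shortname>.cfg`; `expected_octavia_cfgs` is a hypothetical helper, not part of the test suite.

```python
import os

NRPE_CONF_DIR = "/etc/nagios/nrpe.d"  # directory used in the test expectations
OCTAVIA_CHECKS = ("loadbalancers", "amphorae", "pools", "image")


def expected_octavia_cfgs(checks=OCTAVIA_CHECKS, confdir=NRPE_CONF_DIR):
    """Build the config file paths the Octavia NRPE checks should produce."""
    return [
        os.path.join(confdir, "check_octavia_{}.cfg".format(check))
        for check in checks
    ]


for path in expected_octavia_cfgs():
    print(path)
```

Sharing one tuple between the charm code and the test would keep the expected filenames from drifting when a check is added or renamed.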
This merge proposal is being monitored by mergebot. Change the status to Approved to merge.