Merge ~canonical-bootstack/charm-openstack-service-checks:check-loadbalancers into ~canonical-bootstack/charm-openstack-service-checks:master
Status: | Superseded |
---|---|
Proposed branch: | ~canonical-bootstack/charm-openstack-service-checks:check-loadbalancers |
Merge into: | ~canonical-bootstack/charm-openstack-service-checks:master |
Diff against target: | 515 lines (+386/-9), 9 files modified: actions.yaml (+3/-0), actions/actions.py (+75/-0), actions/refresh-endpoint-checks (+1/-0), config.yaml (+18/-0), files/plugins/check_octavia.py (+227/-0), layer.yaml (+2/-0), lib/lib_openstack_service_checks.py (+55/-8), metadata.yaml (+1/-1), tests/functional/test_deploy.py (+4/-0) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jose Guedez (community) | Approve | ||
Jeremy Lounder | Pending | ||
James Hebden | Pending | ||
Joe Guo | Pending | ||
Review via email: mp+377016@code.launchpad.net |
This proposal has been superseded by a proposal from 2020-03-18.
Commit message
Add checks for loadbalancers.
Description of the change
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : | # |
James Hebden (ec0) wrote : | # |
Please see comments inline.
Overall, looks really cool!
One other check I would suggest is verifying, if possible, that the image used for booting new Amphorae exists in Glance. The tag used by Octavia should be available as the amp-image-tag setting on Octavia, so we should either provide a setting on this charm to configure the tag we will monitor (i.e. a string setting called octavia-image-tag with a default of 'amp-image-tag') or get this information from Octavia. We can then check that an image with this tag is present in Glance: if the image is deleted, we have a problem, because new amphorae cannot be booted.
I'd be happy to approve this with the below changes, even without the Glance check, and we can handle the Glance check as a follow-up activity.
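The Glance check suggested above could look roughly like the sketch below. It is not the code from this branch: the function name `classify_amphora_images`, the plain-dict image records and the thresholds in the usage lines are assumptions made for illustration, and a real check would fetch the records from Glance rather than take them as an argument.

```python
from datetime import datetime, timedelta, timezone

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def _parse_timestamp(value):
    """Parse a timestamp such as '2019-12-05T18:21:25Z' as UTC."""
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)


def classify_amphora_images(images, tag, max_age_days, now=None):
    """Return (status, message) for the image records carrying `tag`.

    `images` is a list of dicts with 'tags', 'status' and 'updated_at'
    keys; this shape is hypothetical and only loosely follows Glance.
    """
    now = now or datetime.now(timezone.utc)

    tagged = [img for img in images if tag in img.get('tags', [])]
    if not tagged:
        return CRITICAL, 'no image tagged {} found'.format(tag)

    active = [img for img in tagged if img.get('status') == 'active']
    if not active:
        return CRITICAL, 'no active image tagged {}'.format(tag)

    cutoff = now - timedelta(days=max_age_days)
    fresh = [img for img in active
             if _parse_timestamp(img['updated_at']) > cutoff]
    if not fresh:
        return WARNING, 'images tagged {} are older than {} days'.format(
            tag, max_age_days)

    return OK, 'image tagged {} is ready'.format(tag)


# Usage with made-up records and a fixed "now" so the result is stable.
now = datetime(2020, 3, 1, tzinfo=timezone.utc)
images = [{'tags': ['octavia-amphora'], 'status': 'active',
           'updated_at': '2019-12-05T18:21:25Z'}]
print(classify_amphora_images(images, 'octavia-amphora', 365, now))
```

Comparing parsed, timezone-aware datetimes avoids relying on string ordering of timestamps, which is one design choice among several.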
- 18472a4... by Joe Guo
-
add octavia checks
Signed-off-by: Joe Guo <email address hidden>
Alvaro Uria (aluria) wrote : | # |
I've added a comment on a minor typo. OTOH, the best practice suggested for charms is to have as many things autoconfigured as possible (ie. reduce config parameters). So, I would suggest a new interface between charm-octavia and charm-openstack
1) It would make "check-
2) The relation would share the amp-image-tag value automatically (if not set, charm-openstack
3) Possibly do the same for the flavor id, which is a config param in Octavia (however, I don't know the way to guess if a flavor that will be used by the Amphorae exists - the default is that charm-octavia handles it, so an internal value probably exists)
Thank you for the change. Is there a bug we can link this MP to?
Joe Guo (guoqiao) wrote : | # |
Hi,
Thanks for the review and feedback.
I've fixed all, and refactored the code to make it more flexible/
Could you please review again when you get the time?
NOTE: the image check currently hardcodes `amp_img_tag` to `octavia-amphora` and is disabled by default. Finding a proper, minimal way to set it is work in progress.
James Hebden (ec0) wrote : | # |
Just a couple more comments, around config.yaml and default checks.
Joe Guo (guoqiao) wrote : | # |
> Just a couple more comments, around config.yaml and default checks.
Hi James,
I've made the changes, please review again when you get the time. Thanks:)
- 0bfc801... by Jeremy Lounder
-
Updated maintainers in metadata.yaml
- bd33e8e... by Xav Paice
-
Add ability to remove NRPE endpoint checks
When a service is removed from the Keystone catalog, this adds a
mechanism to remove that service from the list of NRPE checks so it
doesn't alert CRITICAL for services that aren't there any more.
Closes-Bug: LP: #1836385
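The mechanism described in this commit message boils down to comparing the checks configured on the previous run with those configured now. A minimal sketch, assuming the shortnames are held in plain sets (the charm itself keeps them in its unit key-value store; `stale_checks` is a hypothetical helper, not a function from this branch):

```python
def stale_checks(previous, current):
    """Return shortnames that were configured before but are not any more."""
    return sorted(set(previous) - set(current))


# Made-up shortnames in the '<service>_<interface>' form used by the charm.
previous = {'heat_public', 'nova_admin', 'glance_internal'}
current = {'nova_admin', 'glance_internal'}

for shortname in stale_checks(previous, current):
    print('would remove NRPE check', shortname)
```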
Jose Guedez (jfguedez) wrote : | # |
comments inline...
Joe Guo (guoqiao) wrote : | # |
Hi Jose,
Thanks for the great review; it is all good advice. I will make those changes in my new Engineer rotation. Thanks.
Joe Guo (guoqiao) wrote : | # |
Hi Jose,
I've made the following changes:
1. split octavia nagios check into 4(loadbalancers
2. removed the image tag and days defaults from the check_image function. I still keep them on both the CLI options and config.yaml, since those are different levels.
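Item 2 describes defaults living at the CLI and config level rather than in the function itself. One way to picture that layering is sketched below; the stand-in `check_image` here takes no connection argument (the one in this branch does), and `build_parser` is a name invented for the example.

```python
import argparse
from functools import partial


def check_image(tag, days):
    # Stand-in for the real check: no defaults of its own, callers must
    # always say which tag and age they mean.
    return 0, 'would check images tagged {} newer than {} days'.format(tag, days)


def build_parser():
    # Defaults live here, at the CLI level.
    parser = argparse.ArgumentParser(description='Check Octavia status')
    parser.add_argument('--check', choices=['image'], default='image')
    parser.add_argument('--amp-image-tag', default='octavia-amphora')
    parser.add_argument('--amp-image-days', type=int, default=365)
    return parser


# Simulate the command line the charm would render from its config values.
args = build_parser().parse_args(['--amp-image-days', '30'])

# Bind the extra arguments so every check can be called the same way.
checks = {'image': partial(check_image, args.amp_image_tag, args.amp_image_days)}
status, message = checks[args.check]()
print(status, message)
```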
I didn't import the nagios3_plugin module for the status codes, so that I can keep this script standalone and run it from any location.
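Keeping the plugin standalone only needs the four Nagios plugin exit codes and a single line of output on stdout. A minimal sketch of that convention (the helper name `format_nagios_result` is made up for this example and does not appear in the branch):

```python
import sys

# Exit codes defined by the Nagios plugin convention.
NAGIOS_STATUS = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}


def format_nagios_result(status, message):
    """Build the single line a Nagios plugin prints on stdout."""
    if status not in NAGIOS_STATUS:
        raise ValueError('invalid Nagios status code: {}'.format(status))
    return '{}: {}'.format(NAGIOS_STATUS[status], message)


def nagios_exit(status, message):
    """Print the result line and exit with the matching status code."""
    print(format_nagios_result(status, message))
    sys.exit(status)
```

Splitting the formatting from the exit makes the output testable without terminating the interpreter.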
Another review appreciated:)
Jose Guedez (jfguedez) wrote : | # |
Hi Joe - Good stuff, much cleaner now. Comments inline. Just one real comment, about the functional test and the new filenames of the checks.
Joe Guo (guoqiao) wrote : | # |
Hi Jose,
Changed and replied, see my inline replies.
BTW: I've verified the new octavia checks are added correctly with openstack-on-lxd env: https:/
Unmerged commits
- 18472a4... by Joe Guo
-
add octavia checks
Signed-off-by: Joe Guo <email address hidden>
- bd33e8e... by Xav Paice
-
Add ability to remove NRPE endpoint checks
When a service is removed from the Keystone catalog, this adds a
mechanism to remove that service from the list of NRPE checks so it
doesn't alert CRITICAL for services that aren't there any more.
Closes-Bug: LP: #1836385
- 0bfc801... by Jeremy Lounder
-
Updated maintainers in metadata.yaml
Preview Diff
diff --git a/actions.yaml b/actions.yaml
new file mode 100644
index 0000000..9e5a96a
--- /dev/null
+++ b/actions.yaml
@@ -0,0 +1,3 @@
+refresh-endpoint-checks:
+  description: >-
+    Trigger the Keystone endpoints to be reloaded on next update-status hook.
diff --git a/actions/actions.py b/actions/actions.py
new file mode 100755
index 0000000..1e679b6
--- /dev/null
+++ b/actions/actions.py
@@ -0,0 +1,75 @@
+#!/usr/local/sbin/charm-env python3
+# Copyright 2020 Canonical Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import charmhelpers.core.hookenv as hookenv
+import charmhelpers.core.unitdata as unitdata
+from traceback import format_exc
+
+# Load modules from $CHARM_DIR/lib
+sys.path.append("lib")
+
+import charms.reactive # NOQA: E402
+from charms.layer import basic # NOQA: E402
+
+basic.bootstrap_charm_deps()
+basic.init_config_states()
+
+from charms.reactive.flags import clear_flag # NOQA: E402
+
+
+def refresh_endpoint_checks(*args):
+    """Clear the openstack-service-checks.endpoints.configured flag
+    so that next time update-status runs, the Keystone catalog is re-read
+    and nrpe checks refreshed.
+    """
+    clear_flag("openstack-service-checks.endpoints.configured")
+
+
+# Actions to function mapping, to allow for illegal python action names that
+# can map to a python function.
+ACTIONS = {
+    "refresh-endpoint-checks": refresh_endpoint_checks,
+}
+
+
+def main(args):
+    action_name = os.path.basename(args[0])
+    try:
+        action = ACTIONS[action_name]
+    except KeyError:
+        return "Action %s undefined" % action_name
+    else:
+        try:
+            action(args)
+        except Exception:
+            exc = format_exc()
+            hookenv.log(exc, hookenv.ERROR)
+            hookenv.action_fail(exc.splitlines()[-1])
+        else:
+            # we were successful, so commit changes from the action
+            unitdata.kv().flush()
+            # try running handlers based on new state
+            try:
+                charms.reactive.main()
+            except Exception:
+                exc = format_exc()
+                hookenv.log(exc, hookenv.ERROR)
+                hookenv.action_fail(exc.splitlines()[-1])
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
diff --git a/actions/refresh-endpoint-checks b/actions/refresh-endpoint-checks
new file mode 120000
index 0000000..405a394
--- /dev/null
+++ b/actions/refresh-endpoint-checks
@@ -0,0 +1 @@
+actions.py
\ No newline at end of file
diff --git a/config.yaml b/config.yaml
index 073ba9b..edfda51 100644
--- a/config.yaml
+++ b/config.yaml
@@ -10,6 +10,24 @@ options:
     description: |
       Switch to turn on or off neutron agents checks. By default, neutron_agents nrpe check is enabled.
       If a different SDN (ie. Contrail) is in use, you may want to disable this check.
+  check-octavia:
+    default: True
+    type: boolean
+    description: |
+      Switch to turn on or off check for octavia services.
+  octavia-amp-image-tag:
+    default: "octavia-amphora"
+    type: string
+    description: |
+      The glance image tag octavia will use to create amphora.
+  octavia-amp-image-days:
+    default: 365
+    type: int
+    description: |
+      If latest glance image tagged with above octavia-amp-image-tag is updated more than these days ago,
+      a Nagios warning will be raised. The version of octavia agent builtin in amphora image must match
+      version of octavia controller, otherwise octavia will fail to communicate with new amphora,
+      failover will also fail.
   check-rally:
     default: False
     type: boolean
diff --git a/files/plugins/check_octavia.py b/files/plugins/check_octavia.py
new file mode 100755
index 0000000..09fb6b1
--- /dev/null
+++ b/files/plugins/check_octavia.py
@@ -0,0 +1,227 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import json
+import argparse
+import subprocess
+from datetime import datetime, timedelta
+import openstack
+
+NAGIOS_STATUS_OK = 0
+NAGIOS_STATUS_WARNING = 1
+NAGIOS_STATUS_CRITICAL = 2
+NAGIOS_STATUS_UNKNOWN = 3
+
+NAGIOS_STATUS = {
+    NAGIOS_STATUS_OK: 'OK',
+    NAGIOS_STATUS_WARNING: 'WARNING',
+    NAGIOS_STATUS_CRITICAL: 'CRITICAL',
+    NAGIOS_STATUS_UNKNOWN: 'UNKNOWN',
+}
+
+
+def nagios_exit(status, message):
+    assert status in NAGIOS_STATUS, "Invalid Nagios status code"
+    # prefix status name to message
+    output = '{}: {}'.format(NAGIOS_STATUS[status], message)
+    print(output) # nagios requires print to stdout, no stderr
+    sys.exit(status)
+
+
+def check_loadbalancers(connection):
+    """check loadbalancers status."""
+
+    lb_mgr = connection.load_balancer
+    lb_all = lb_mgr.load_balancers()
+
+    # only check enabled lbs
+    lb_enabled = [lb for lb in lb_all if lb.is_admin_state_up]
+
+    # check provisioning_status is ACTIVE for each lb
+    bad_lbs = [lb for lb in lb_enabled if lb.provisioning_status != 'ACTIVE']
+    if bad_lbs:
+        parts = ['loadbalancer {} provisioning_status is {}'.format(
+            lb.id, lb.provisioning_status) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if operating_status is not ONLINE
+    bad_lbs = [lb for lb in lb_enabled if lb.operating_status != 'ONLINE']
+    if bad_lbs:
+        parts = ['loadbalancer {} operating_status is {}'.format(
+            lb.id, lb.operating_status) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    net_mgr = connection.network
+    # check vip port exists for each lb
+    bad_lbs = []
+    for lb in lb_enabled:
+        try:
+            net_mgr.get_port(lb.vip_port_id)
+        except openstack.exceptions.NotFoundException:
+            bad_lbs.append(lb)
+    if bad_lbs:
+        parts = ['vip port {} for loadbalancer {} not found'.format(
+            lb.vip_port_id, lb.id) for lb in bad_lbs]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # warn about disabled lbs if no other error found
+    lb_disabled = [lb for lb in lb_all if not lb.is_admin_state_up]
+    if lb_disabled:
+        parts = ['loadbalancer {} admin_state_up is False'.format(lb.id)
+                 for lb in lb_disabled]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'loadbalancers are happy'
+
+
+def check_pools(connection):
+    """check pools status."""
+    lb_mgr = connection.load_balancer
+    pools_all = lb_mgr.pools()
+    pools_enabled = [pool for pool in pools_all if pool.is_admin_state_up]
+
+    # check provisioning_status is ACTIVE for each pool
+    bad_pools = [pool for pool in pools_enabled if pool.provisioning_status != 'ACTIVE']
+    if bad_pools:
+        parts = ['pool {} provisioning_status is {}'.format(
+            pool.id, pool.provisioning_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise CRITICAL if operating_status is ERROR
+    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'ERROR']
+    if bad_pools:
+        parts = ['pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if operating_status is NO_MONITOR
+    bad_pools = [pool for pool in pools_enabled if pool.operating_status == 'NO_MONITOR']
+    if bad_pools:
+        parts = ['pool {} operating_status is {}'.format(
+            pool.id, pool.operating_status) for pool in bad_pools]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'pools are happy'
+
+
+def check_amphorae(connection):
+    """check amphorae status."""
+
+    lb_mgr = connection.load_balancer
+
+    resp = lb_mgr.get('/v2/octavia/amphorae')
+    # python api is not available yet, use url
+    if resp.status_code != 200:
+        return NAGIOS_STATUS_WARNING, 'amphorae api not working'
+
+    data = json.loads(resp.content)
+    # ouput is like {"amphorae": [{...}, {...}, ...]}
+    items = data.get('amphorae', [])
+
+    # raise CRITICAL for ERROR status
+    bad_status_list = ('ERROR',)
+    bad_items = [item for item in items if item['status'] in bad_status_list]
+    if bad_items:
+        parts = [
+            'amphora {} status is {}'.format(item['id'], item['status'])
+            for item in bad_items]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING for these status
+    bad_status_list = (
+        'PENDING_CREATE', 'PENDING_UPDATE', 'PENDING_DELETE', 'BOOTING')
+    bad_items = [item for item in items if item['status'] in bad_status_list]
+    if bad_items:
+        parts = [
+            'amphora {} status is {}'.format(item['id'], item['status'])
+            for item in bad_items]
+        message = ', '.join(parts)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'amphorae are happy'
+
+
+def check_image(connection, tag, days):
+    img_mgr = connection.image
+    images = list(img_mgr.images(tag=tag))
+
+    if not images:
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but none exist').format(tag)
+        return NAGIOS_STATUS_CRITICAL, message
+
+    active_images = [image for image in images if image.status == 'active']
+    if not active_images:
+        parts = ['{}({})'.format(image.name, image.id) for image in images]
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but none is active: {}').format(tag, ', '.join(parts))
+        return NAGIOS_STATUS_CRITICAL, message
+
+    # raise WARNING if image is too old
+    when = (datetime.now() - timedelta(days=days)).isoformat()
+    # updated_at str format: '2019-12-05T18:21:25Z'
+    fresh_images = [image for image in active_images if image.updated_at > when]
+    if not fresh_images:
+        message = ('Octavia requires image with tag {} to create amphora, '
+                   'but it is older than {} days').format(tag, days)
+        return NAGIOS_STATUS_WARNING, message
+
+    return NAGIOS_STATUS_OK, 'image is ready'
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description='Check Octavia status',
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument(
+        '--env', dest='env', default='/var/lib/nagios/nagios.novarc',
+        help='Novarc file to use for this check')
+
+    check_choices = ['loadbalancers', 'amphorae', 'pools', 'image']
+    parser.add_argument(
+        '--check', dest='check', metavar='|'.join(check_choices),
+        type=str, choices=check_choices,
+        default=check_choices[0],
+        help='which check to run')
+
+    parser.add_argument(
+        '--amp-image-tag', dest='amp_image_tag', default='octavia-amphora',
+        help='amphora image tag for image check')
+
+    parser.add_argument(
+        '--amp-image-days', dest='amp_image_days', type=int, default=365,
+        help='raise warning if amphora image is older than these days')
+
+    args = parser.parse_args()
+    # source environment vars
+    command = ['/bin/bash', '-c', 'source {} && env'.format(args.env)]
+    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
+    for line in proc.stdout:
+        (key, _, value) = line.partition(b'=')
+        os.environ[key.decode('utf-8')] = value.rstrip().decode('utf-8')
+    proc.communicate()
+
+    # use closure to make all checks have same signature
+    # so we can handle them in same way
+    def _check_image(connection):
+        return check_image(connection, args.amp_image_tag, args.amp_image_days)
+
+    checks = {
+        'loadbalancers': check_loadbalancers,
+        'amphorae': check_amphorae,
+        'pools': check_pools,
+        'image': _check_image,
+    }
+
+    connection = openstack.connect(cloud='envvars')
+    nagios_exit(*checks[args.check](connection))
diff --git a/layer.yaml b/layer.yaml
index 4053919..0606d45 100644
--- a/layer.yaml
+++ b/layer.yaml
@@ -20,6 +20,8 @@ options:
       - python3-keystoneclient
       - python3-openstackclient
       - python-openstackclient
+      - python3-octaviaclient
+      - python-octaviaclient
   snap:
     fcbtest:
       channel: stable
diff --git a/lib/lib_openstack_service_checks.py b/lib/lib_openstack_service_checks.py
index 9d123fb..38e5f0f 100644
--- a/lib/lib_openstack_service_checks.py
+++ b/lib/lib_openstack_service_checks.py
@@ -65,6 +65,18 @@ class OSCHelper():
         return self.charm_config['check-neutron-agents']
 
     @property
+    def is_octavia_check_enabled(self):
+        return self.charm_config['check-octavia']
+
+    @property
+    def octavia_amp_image_tag(self):
+        return self.charm_config['octavia-amp-image-tag']
+
+    @property
+    def octavia_amp_image_days(self):
+        return self.charm_config['octavia-amp-image-days']
+
+    @property
     def skipped_rally_checks(self):
         skipped_os_components = self.charm_config['skip-rally'].strip()
         if not skipped_os_components:
@@ -186,6 +198,28 @@ class OSCHelper():
         else:
             nrpe.remove_check(shortname='neutron_agents')
 
+        if self.is_octavia_check_enabled:
+            script = os.path.join(self.plugins_dir, 'check_octavia.py')
+
+            for check in ('loadbalancers', 'amphorae', 'pools'):
+                nrpe.add_check(
+                    shortname='octavia_{}'.format(check),
+                    description='Check octavia {} status'.format(check),
+                    check_cmd='{} --check {}'.format(script, check),
+                )
+
+            # image check has extra args, add it separately
+            check = 'image'
+            nrpe.add_check(
+                shortname='octavia_{}'.format(check),
+                description='Check octavia {} status'.format(check),
+                check_cmd='{} --check {} --amp-image-tag {} --amp-image-days {}'.format(
+                    script, check, self.octavia_amp_image_tag, self.octavia_amp_image_days),
+            )
+        else:
+            for check in ('loadbalancers', 'amphorae', 'pools', 'image'):
+                nrpe.remove_check(shortname='octavia_{}'.format(check))
+
         if self.contrail_analytics_vip:
             contrail_check_command = '{} --host {}'.format(
                 os.path.join(self.plugins_dir, 'check_contrail_analytics_alarms.py'),
@@ -272,7 +306,7 @@ class OSCHelper():
         endpoints = self.keystone_endpoints
         services = [svc for svc in self.keystone_services if svc.enabled]
         nrpe = NRPE()
-        skip_service = set()
+        configured_endpoint_checks = dict()
         for endpoint in endpoints:
             endpoint.service_names = [x.name
                                       for x in services
@@ -281,21 +315,18 @@ class OSCHelper():
             endpoint.healthcheck_url = health_check_params.get(service_name, '/')
 
             # Note(aluria): glance-simplestreams-sync does not provide an API to check
-            if service_name == 'image-stream':
+            # Note(aluria): filter:healthcheck is not configured in Keystone v2
+            # https://docs.openstack.org/keystone/pike/configuration.html#health-check-middleware
+            if service_name == 'image-stream' or service_name == 'keystone':
                 continue
 
             if not hasattr(endpoint, 'interface'):
-                if service_name == 'keystone':
-                    # Note(aluria): filter:healthcheck is not configured in v2
-                    # https://docs.openstack.org/keystone/pike/configuration.html#health-check-middleware
-                    continue
                 for interface in 'admin internal public'.split():
                     old_interface_name = '{}url'.format(interface)
                     if not hasattr(endpoint, old_interface_name):
                         continue
                     endpoint.interface = interface
                     endpoint.url = getattr(endpoint, old_interface_name)
-                    skip_service.add(service_name)
                     break
 
             check_url = urlparse(endpoint.url)
@@ -324,10 +355,26 @@ class OSCHelper():
                                check_cmd=' '.join(cmd_params_cert))
 
             # Add the actual health check for the URL
-            nrpe.add_check(shortname='{}_{}'.format(service_name, endpoint.interface),
+            nrpe_shortname = '{}_{}'.format(service_name, endpoint.interface)
+            nrpe.add_check(shortname=nrpe_shortname,
                            description='Endpoint url check for {} {}'.format(service_name, endpoint.interface),
                            check_cmd=' '.join(cmd_params))
+            configured_endpoint_checks[nrpe_shortname] = True
+        nrpe.write()
+        self._remove_old_nrpe_endpoint_checks(nrpe, configured_endpoint_checks)
 
+    def _remove_old_nrpe_endpoint_checks(self, nrpe, configured_endpoint_checks):
+        """Loop through the old and new endpoint checks, if there are checks that aren't needed any more,
+        remove them.
+        """
+        kv = unitdata.kv()
+        endpoint_delta = kv.delta(configured_endpoint_checks, 'endpoint_checks')
+        kv.update(configured_endpoint_checks, 'endpoint_checks')
+        for nrpe_shortname in endpoint_delta.items():
+            # generates tuples of the format ('heat_public', Delta(previous=None, current=True))
+            # remove any that are not current
+            if not nrpe_shortname[1].current:
+                nrpe.remove_check(shortname=nrpe_shortname[0])
         nrpe.write()
 
     def get_keystone_client(self, creds):
diff --git a/metadata.yaml b/metadata.yaml
index d4b7e87..94a7db8 100644
--- a/metadata.yaml
+++ b/metadata.yaml
@@ -1,7 +1,7 @@
 name: openstack-service-checks
 summary: OpenStack Services NRPE Checks
 description: OpenStack Services NRPE Checks
-maintainer: LMA Charmers <llama-charmers@lists.ubuntu.com>
+maintainer: Llama (LMA) Charmers <llama-charmers@lists.ubuntu.com>
 subordinate: false
 tags:
   - openstack
diff --git a/tests/functional/test_deploy.py b/tests/functional/test_deploy.py
index d0eea26..1716339 100644
--- a/tests/functional/test_deploy.py
+++ b/tests/functional/test_deploy.py
@@ -128,6 +128,10 @@ async def test_openstackservicechecks_verify_default_nrpe_checks(deploy_app, mod
     filenames.extend([
         '/etc/nagios/nrpe.d/check_nova_services.cfg',
         '/etc/nagios/nrpe.d/check_neutron_agents.cfg',
+        '/etc/nagios/nrpe.d/check_octavia_loadbalancers.cfg',
+        '/etc/nagios/nrpe.d/check_octavia_amphorae.cfg',
+        '/etc/nagios/nrpe.d/check_octavia_pools.cfg',
+        '/etc/nagios/nrpe.d/check_octavia_image.cfg',
     ])
     for filename in filenames:
         test_stat = await file_stat(filename, unit)
This merge proposal is being monitored by mergebot. Change the status to Approved to merge.