Merge lp:~doanac/ubuntu-ci-services-itself/rabbit-queue-status into lp:ubuntu-ci-services-itself

Proposed by Andy Doan
Status: Merged
Approved by: Andy Doan
Approved revision: 335
Merged at revision: 344
Proposed branch: lp:~doanac/ubuntu-ci-services-itself/rabbit-queue-status
Merge into: lp:ubuntu-ci-services-itself
Diff against target: 270 lines (+147/-3)
9 files modified
branch-source-builder/bsbuilder/resources/v1.py (+3/-1)
branch-source-builder/bsbuilder/tests/test_v1.py (+3/-1)
ci-utils/ci_utils/json_status.py (+35/-0)
ci-utils/ci_utils/tests/test_json_status.py (+96/-0)
image-builder/imagebuilder/resources/v1.py (+3/-1)
juju-deployer/branch-source-builder.yaml.tmpl (+2/-0)
juju-deployer/image-builder.yaml.tmpl (+2/-0)
juju-deployer/test-runner.yaml.tmpl (+2/-0)
test_runner/tstrun/resources/v1.py (+1/-0)
To merge this branch: bzr merge lp:~doanac/ubuntu-ci-services-itself/rabbit-queue-status
Reviewer | Review Type | Date Requested | Status
Andy Doan (community) | | | Approve
Vincent Ladeuil (community) | | | Approve
PS Jenkins bot (community) | continuous-integration | | Approve
Review via email: mp+209833@code.launchpad.net

Commit message

run_worker: add minimal monitoring for workers

We've had some periodic issues where a run_worker script
fails to come online.

This adds a simple check for each of our services that use
rabbitmq workers. It checks queue information via the rabbitmq
web API to see how many consumers are subscribed to the queue. This
will let us know when a run_worker script isn't running.
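
As a rough illustration of the web API call, here is a minimal Python 3 sketch (the branch itself targets Python 2 and uses urllib2 with an HTTPBasicAuthHandler). The helper name build_queues_request is hypothetical. The port, the /api/queues path and the guest/guest credentials are the ones that appear in the diff. The sketch sets the Authorization header up front instead of waiting for a challenge, and it only builds the request without sending it.

```python
import base64
import urllib.request


def build_queues_request(host, user='guest', password='guest', port=55672):
    """Build (but do not send) a request for the management API queue list.

    Hypothetical helper: the defaults mirror the values used in the diff.
    """
    url = 'http://{}:{}/api/queues'.format(host, port)
    credentials = '{}:{}'.format(user, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    request = urllib.request.Request(url)
    # Send HTTP basic auth with the first request.
    request.add_header('Authorization', 'Basic ' + token)
    return request


# 'rabbit.example' is a placeholder host name.
req = build_queues_request('rabbit.example')
print(req.full_url)
```

Passing the result to urllib.request.urlopen would perform the actual call, which needs the management plugin enabled on the rabbit host.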

Description of the change

We've had a few bugs lately where one or more of our rabbit workers weren't online. It's a hard situation to detect, and it requires poking around via "juju-ssh". This adds a simple status check to each of our services that use rabbit workers. Each check reads the "consumer count" of a queue, which essentially indicates whether or not its corresponding runner is online.
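
To make the "consumer count" idea concrete, here is a small Python sketch of the lookup. The function name find_consumers is hypothetical. The 'name' and 'consumers' keys are the ones used by the queue entries in this branch's unit tests, and the sample payload stands in for the body returned by the queue listing.

```python
import json


def find_consumers(queues_json, queue):
    """Return the consumer count for `queue`, or None if it is not listed."""
    for entry in json.loads(queues_json):
        if entry['name'] == queue:
            return entry['consumers']
    return None


# Sample payload shaped like the test data in this branch.
payload = json.dumps([
    {'name': 'bla', 'consumers': 2},
    {'name': 'foo', 'consumers': 0},
])
print(find_consumers(payload, 'foo'))
print(find_consumers(payload, 'missing'))
```

A count of 0 means the queue exists but nothing is subscribed to it, while None means the queue was never declared. The two cases are kept apart because they point at different problems.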

In addition to the unit test cases, I ran some tests in the cloud:

turning rabbit off yields a webui page that looks like:

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: unable to check

turning off one of the workers yields (with "workers-online" highlighted in red):

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: 0

and when everything is online:

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: 1
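
The outcomes above can be summarised with a small Python sketch. The function worker_status and its get_consumers argument are hypothetical. The labels, values and okay/fail statuses follow the entries asserted in this branch's unit tests. The callable is assumed to return a consumer count, return None when the queue is not listed, or raise when rabbit cannot be reached.

```python
def worker_status(get_consumers):
    """Map the result of a consumer lookup to one status entry."""
    label = 'workers-online'
    try:
        consumers = get_consumers()
    except Exception:
        # Rabbit (or its management API) could not be reached.
        return {'label': label, 'status': 'fail', 'value': 'unable to check'}
    if consumers is None:
        # The queue is not in the listing at all.
        return {'label': label, 'status': 'fail', 'value': 'no queue defined'}
    status = 'okay' if consumers > 0 else 'fail'
    return {'label': label, 'status': status, 'value': consumers}


def rabbit_down():
    raise IOError('connection refused')


print(worker_status(rabbit_down))
print(worker_status(lambda: 0))
print(worker_status(lambda: 1))
```

Only a positive count is reported as okay. Every other case is a failure, which is what makes "workers-online" show up in red on the status page.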

PS Jenkins bot (ps-jenkins) wrote :

FAILED: Continuous integration, rev:333
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/rebuild

review: Needs Fixing (continuous-integration)
334. By Andy Doan

fix broken test case

PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:334
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/333/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/333/rebuild

review: Approve (continuous-integration)
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:334
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/334/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/334/rebuild

review: Approve (continuous-integration)
Vincent Ladeuil (vila) wrote :

62 + '''checks if there any workers attached to the queue.'''

'are' missing ?

66 + # we already report rabbit isn't configured, so no sense adding
67 + # another failure for this

Let's have a common way to report that error then, in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

review: Approve
335. By Andy Doan

grammar mistake

Andy Doan (doanac) wrote :

On 03/09/2014 04:43 AM, Vincent Ladeuil wrote:
> 62 + '''checks if there any workers attached to the queue.'''
>
> 'are' missing ?

fixed in revno(335)

> 66 + # we already report rabbit isn't configured, so no sense adding
> 67 + # another failure for this
>
> Let's have a common way to report that error then, in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

We are via the "add_rabbit_configured" in that object. My point was
we'll be reporting a "rabbit isn't configured" message already so
showing another error is just piling on extra stuff that might make it
more confusing to diagnose.

Andy Doan (doanac) wrote :

self-acking since vila already acked.

review: Approve
Vincent Ladeuil (vila) wrote :

>>>>> Andy Doan <email address hidden> writes:

    > On 03/09/2014 04:43 AM, Vincent Ladeuil wrote:
    >> 62 + '''checks if there any workers attached to the queue.'''
    >>
    >> 'are' missing ?

    > fixed in revno(335)

    >> 66 + # we already report rabbit isn't configured, so no sense adding
    >> 67 + # another failure for this
    >>
    >> Let's have a common way to report that error then, in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

    > We are via the "add_rabbit_configured" in that object. My point was
    > we'll be reporting a "rabbit isn't configured" message already so
    > showing another error is just piling on extra stuff that might make it
    > more confusing to diagnose.

Ok, I was mistaken then, I thought the previous error was fatal and that
this one could only trigger if the config existed before and disappeared
for unknown reasons.

Preview Diff

1=== modified file 'branch-source-builder/bsbuilder/resources/v1.py'
2--- branch-source-builder/bsbuilder/resources/v1.py 2014-03-06 19:04:41 +0000
3+++ branch-source-builder/bsbuilder/resources/v1.py 2014-03-10 01:19:47 +0000
4@@ -21,11 +21,13 @@
5 from ci_utils import amqp_utils, json_status, restish_utils
6
7 log = logging.getLogger(__name__)
8+WORKER_QUEUE = 'bsbuilder'
9
10
11 def _status():
12 status = json_status.JSONStatus()
13 status.add_rabbit_configured()
14+ status.add_rabbit_worker_health(WORKER_QUEUE)
15 return restish_utils.json_ok(status.results)
16
17
18@@ -43,7 +45,7 @@
19 log.error('Unable to notify progress trigger, aborting build_source')
20 return http.service_unavailable(body=r)
21
22- r = amqp_utils.send('bsbuilder', json.dumps(params))
23+ r = amqp_utils.send(WORKER_QUEUE, json.dumps(params))
24 if r:
25 # send only returns something if it an error message
26 r = http.service_unavailable(body=r)
27
28=== modified file 'branch-source-builder/bsbuilder/tests/test_v1.py'
29--- branch-source-builder/bsbuilder/tests/test_v1.py 2014-03-06 19:04:41 +0000
30+++ branch-source-builder/bsbuilder/tests/test_v1.py 2014-03-10 01:19:47 +0000
31@@ -41,7 +41,9 @@
32 }]
33 self.assertListEqual(expected, data)
34
35- get_config.return_value = {'foo': 'bar'}
36+ config = mock.Mock()
37+ config.AMQP_HOST = 'bar'
38+ get_config.return_value = config
39 resp = self.app.get('/api/v1/status', status=200)
40 data = json.loads(resp.body)
41 self.assertEqual(True, data[0]['value'])
42
43=== modified file 'ci-utils/ci_utils/json_status.py'
44--- ci-utils/ci_utils/json_status.py 2014-03-06 15:45:17 +0000
45+++ ci-utils/ci_utils/json_status.py 2014-03-10 01:19:47 +0000
46@@ -13,6 +13,9 @@
47 # You should have received a copy of the GNU General Public License along with
48 # this program. If not, see <http://www.gnu.org/licenses/>.
49
50+import json
51+import urllib2
52+
53
54 class JSONStatus(object):
55 def __init__(self):
56@@ -47,3 +50,35 @@
57 self.add_okay('rabbit configured', True)
58 else:
59 self.add_fail('rabbit configured', False)
60+
61+ def add_rabbit_worker_health(self, queue):
62+ '''checks if there are any workers attached to the queue.'''
63+ from ci_utils import amqp_utils
64+ config = amqp_utils.get_config()
65+ if config is None:
66+ # we already report rabbit isn't configured, so no sense adding
67+ # another failure for this
68+ return
69+
70+ mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
71+ base = 'http://{}:55672'.format(config.AMQP_HOST)
72+ # NOTE: juju leaves this guest/guest and doesn't seem to expose
73+ # anything in the relation to override it
74+ mgr.add_password(None, base, 'guest', 'guest')
75+ handler = urllib2.HTTPBasicAuthHandler(mgr)
76+
77+ try:
78+ resp = urllib2.build_opener(handler).open(base + '/api/queues')
79+ content = resp.read()
80+ data = json.loads(content)
81+ consumers = None
82+ for q in data:
83+ if q['name'] == queue:
84+ consumers = q['consumers']
85+ break
86+ if consumers is None:
87+ self.add_fail('workers-online', 'no queue defined')
88+ else:
89+ self.add_true_false('workers-online', consumers, consumers > 0)
90+ except:
91+ self.add_fail('workers-online', 'unable to check')
92
93=== added file 'ci-utils/ci_utils/tests/test_json_status.py'
94--- ci-utils/ci_utils/tests/test_json_status.py 1970-01-01 00:00:00 +0000
95+++ ci-utils/ci_utils/tests/test_json_status.py 2014-03-10 01:19:47 +0000
96@@ -0,0 +1,96 @@
97+# Ubuntu CI Engine
98+# Copyright 2014 Canonical Ltd.
99+
100+# This program is free software: you can redistribute it and/or modify it
101+# under the terms of the GNU Affero General Public License version 3, as
102+# published by the Free Software Foundation.
103+
104+# This program is distributed in the hope that it will be useful, but
105+# WITHOUT ANY WARRANTY; without even the implied warranties of
106+# MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
107+# PURPOSE. See the GNU Affero General Public License for more details.
108+
109+# You should have received a copy of the GNU Affero General Public License
110+# along with this program. If not, see <http://www.gnu.org/licenses/>.
111+
112+import json
113+import unittest
114+
115+import mock
116+
117+from ci_utils.json_status import JSONStatus
118+
119+
120+class TestRabbitStatus(unittest.TestCase):
121+ @mock.patch('ci_utils.amqp_utils.get_config')
122+ def testNoConfig(self, get_config):
123+ get_config.return_value = None
124+ status = JSONStatus()
125+ status.add_rabbit_worker_health('foo')
126+ self.assertEqual([], status.results)
127+
128+ @mock.patch('ci_utils.amqp_utils.get_config')
129+ @mock.patch('urllib2.build_opener')
130+ def testCantReach(self, opener, get_config):
131+ opener.side_effect = RuntimeError('foo')
132+ status = JSONStatus()
133+ status.add_rabbit_worker_health('foo')
134+ expected = {
135+ 'status': 'fail',
136+ 'value': 'unable to check',
137+ 'label': 'workers-online'
138+ }
139+ self.assertEqual([expected], status.results)
140+
141+ @mock.patch('ci_utils.amqp_utils.get_config')
142+ @mock.patch('urllib2.build_opener')
143+ def _testRabbitHealth(self, queue, data, opener, get_config):
144+ fdfake = mock.Mock()
145+ fdfake.read.return_value = json.dumps(data)
146+ con = mock.Mock()
147+ con.open.return_value = fdfake
148+ opener.return_value = con
149+ status = JSONStatus()
150+ status.add_rabbit_worker_health(queue)
151+ return status.results
152+
153+ def testWorkers(self):
154+ data = [
155+ {'name': 'bla', 'consumers': 2},
156+ {'name': 'foo', 'consumers': 1},
157+ {'name': 'bar', 'consumers': 1},
158+ ]
159+ expected = {
160+ 'status': 'okay',
161+ 'value': 1,
162+ 'label': 'workers-online'
163+ }
164+ results = self._testRabbitHealth('foo', data)
165+ self.assertEqual([expected], results)
166+
167+ def testNoWorkers(self):
168+ data = [
169+ {'name': 'bla', 'consumers': 2},
170+ {'name': 'foo', 'consumers': 0},
171+ {'name': 'bar', 'consumers': 1},
172+ ]
173+ expected = {
174+ 'status': 'fail',
175+ 'value': 0,
176+ 'label': 'workers-online'
177+ }
178+ results = self._testRabbitHealth('foo', data)
179+ self.assertEqual([expected], results)
180+
181+ def testNoQueue(self):
182+ data = [
183+ {'name': 'bla', 'consumers': 2},
184+ {'name': 'bar', 'consumers': 1},
185+ ]
186+ expected = {
187+ 'status': 'fail',
188+ 'value': 'no queue defined',
189+ 'label': 'workers-online'
190+ }
191+ results = self._testRabbitHealth('foo', data)
192+ self.assertEqual([expected], results)
193
194=== modified file 'image-builder/imagebuilder/resources/v1.py'
195--- image-builder/imagebuilder/resources/v1.py 2014-03-06 19:04:41 +0000
196+++ image-builder/imagebuilder/resources/v1.py 2014-03-10 01:19:47 +0000
197@@ -21,6 +21,7 @@
198 from ci_utils import amqp_utils, json_status, restish_utils
199
200 log = logging.getLogger(__name__)
201+WORKER_QUEUE = 'imagebuilder'
202
203
204 def _build_image(base_image, ppa_list, package_list, progress_trigger,
205@@ -39,7 +40,7 @@
206 log.error('Unable to notify progress trigger, aborting build_image')
207 return http.service_unavailable(body=r)
208
209- r = amqp_utils.send('imagebuilder', json.dumps(params))
210+ r = amqp_utils.send(WORKER_QUEUE, json.dumps(params))
211 if r:
212 r = http.service_unavailable(body=r)
213 return r
214@@ -48,6 +49,7 @@
215 def _status():
216 status = json_status.JSONStatus()
217 status.add_rabbit_configured()
218+ status.add_rabbit_worker_health(WORKER_QUEUE)
219 return restish_utils.json_ok(status.results)
220
221
222
223=== modified file 'juju-deployer/branch-source-builder.yaml.tmpl'
224--- juju-deployer/branch-source-builder.yaml.tmpl 2014-03-06 22:18:04 +0000
225+++ juju-deployer/branch-source-builder.yaml.tmpl 2014-03-10 01:19:47 +0000
226@@ -37,6 +37,8 @@
227 rabbit:
228 branch: lp:~canonical-ci-engineering/charms/precise/ubuntu-ci-services-itself/rabbitmq-server@46
229 charm: rabbitmq
230+ options:
231+ management_plugin: true
232 relations:
233 - [bsb-restish, bsb-gunicorn]
234 - [bsb-worker, rabbit]
235
236=== modified file 'juju-deployer/image-builder.yaml.tmpl'
237--- juju-deployer/image-builder.yaml.tmpl 2014-03-06 22:18:04 +0000
238+++ juju-deployer/image-builder.yaml.tmpl 2014-03-10 01:19:47 +0000
239@@ -38,6 +38,8 @@
240 rabbit:
241 branch: lp:~canonical-ci-engineering/charms/precise/ubuntu-ci-services-itself/rabbitmq-server@46
242 charm: rabbitmq
243+ options:
244+ management_plugin: true
245 relations:
246 - ["imagebuild-restish:wsgi", "imagebuild-gunicorn:wsgi-file"]
247 - ["imagebuild-worker:amqp", "rabbit:amqp"]
248
249=== modified file 'juju-deployer/test-runner.yaml.tmpl'
250--- juju-deployer/test-runner.yaml.tmpl 2014-03-06 14:13:05 +0000
251+++ juju-deployer/test-runner.yaml.tmpl 2014-03-10 01:19:47 +0000
252@@ -45,6 +45,8 @@
253 rabbit:
254 branch: lp:~canonical-ci-engineering/charms/precise/ubuntu-ci-services-itself/rabbitmq-server@46
255 charm: rabbitmq
256+ options:
257+ management_plugin: true
258 relations:
259 # Relations should be explicit as amulet can't infer them otherwise
260 # even if there is a single one
261
262=== modified file 'test_runner/tstrun/resources/v1.py'
263--- test_runner/tstrun/resources/v1.py 2014-03-04 22:48:57 +0000
264+++ test_runner/tstrun/resources/v1.py 2014-03-10 01:19:47 +0000
265@@ -34,6 +34,7 @@
266 def _status():
267 status = json_status.JSONStatus()
268 status.add_rabbit_configured()
269+ status.add_rabbit_worker_health(tstrun.internal_rabbit_queue)
270 return restish_utils.json_ok(status.results)
271
272
