Merge lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions into lp:~openstack-charmers-archive/charms/trusty/rabbitmq-server/next

Proposed by Chris Glass
Status: Work in progress
Proposed branch: lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions
Merge into: lp:~openstack-charmers-archive/charms/trusty/rabbitmq-server/next
Diff against target: 316 lines (+225/-3)
7 files modified
Makefile (+1/-1)
actions.yaml (+4/-0)
actions/actions.py (+61/-0)
hooks/rabbit_utils.py (+31/-0)
hooks/rabbitmq_server_relations.py (+1/-1)
tests/basic_deployment.py (+125/-0)
unit_tests/test_rabbitmq_server_relations.py (+2/-1)
To merge this branch: bzr merge lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions
Reviewer Review Type Date Requested Status
David Ames (community) Needs Fixing
James Page Needs Fixing
Geoff Teale (community) Approve
Review via email: mp+277418@code.launchpad.net

Description of the change

This branch adds two unit-level actions, "pause" and "resume" to the charm, that users can execute to pause and respectively resume a particular unit of a rabbitmq-server cluster, similar to the way other charms implement the same feature (ceilometer, swift*).

A unit that is "paused" should not have any responsability anymore and should be allowed to be taken offline without further warning. In Rabbitmq's case this is pretty simple, since the application itself allows for this to happen by default. Pausing the unit will simply stop the rabbitmq service on the given unit. Other charms need to make sure no data is lost.

One small non-obvious change here: the @restart_on_change decorator is further wrapped to make it "pause aware" otherwise some service-level configuration changes would restart the rabbitmq services on paused units. Again, this is not critical in rabbitmq's case since the application logic should handle a killed service gracefully, but was added anyway for consistency's sake.

To post a comment you must log in.
Revision history for this message
Geoff Teale (tealeg) wrote :

+1 Approve.

review: Approve
Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12768 rabbitmq-server-next for tribaal mp277418
    UNIT FAIL: unit-test failed

UNIT Results (max last 2 lines):
make: *** [test] Error 1
ERROR:root:Make target returned non-zero.

Full unit test output: http://paste.ubuntu.com/13246423/
Build: http://10.245.162.77:8080/job/charm_unit_test/12768/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13698 rabbitmq-server-next for tribaal mp277418
    LINT FAIL: lint-test failed

LINT Results (max last 2 lines):
make: *** [lint] Error 1
ERROR:root:Make target returned non-zero.

Full lint test output: http://paste.ubuntu.com/13246424/
Build: http://10.245.162.77:8080/job/charm_lint_check/13698/

Revision history for this message
James Page (james-page) wrote :

I think we need to avoid calling status_set directly in the actions, and move to updating the assess_status function in the main relations file to be pause aware; this will ensure that the next 'update-status' hook execution (which happens periodically in 1.25) will set the correct status of 'paused' rather than determining that rabbitmq is not running and setting 'blocked'

review: Needs Fixing
Revision history for this message
Chris Glass (tribaal) wrote :

> I think we need to avoid calling status_set directly in the actions, and move
> to updating the assess_status function in the main relations file to be pause
> aware; this will ensure that the next 'update-status' hook execution (which
> happens periodically in 1.25) will set the correct status of 'paused' rather
> than determining that rabbitmq is not running and setting 'blocked'

Excellent point. Changed the "status_set" to be in assess_status() as you suggested.

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7855 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
make: *** [functional_test] Error 1
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13246587/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7855/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12769 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12769/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13699 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13699/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7856 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
make: *** [functional_test] Error 1
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13246796/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7856/

Revision history for this message
James Page (james-page) wrote :

Hmm - that amulet test look suspicious as the rabbitmq cluster is not clustering correctly

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7859 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-13 18:31:20,197 publish_amqp_message_by_unit DEBUG: Closing connection...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13249544/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7859/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Please also add lint coverage for the new actions dir via the Makefile's lint target. Thank you.

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7860 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
MESSAGE 1@172.17.107.152 [4B29C6B3-6F9A-4DF2-9A09-3675F6DE0EEA-1447444725.03]
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13250328/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7860/

Revision history for this message
Chris Glass (tribaal) wrote :

> Please also add lint coverage for the new actions dir via the Makefile's lint
> target. Thank you.

Good catch! Done.

Thanks!

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12924 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12924/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13863 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13863/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13866 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13866/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12926 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12926/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7900 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-16 22:33:08,393 _test_rmq_amqp_messages_all_units DEBUG: Publish message to: 172.17.105.97 (rabbitmq-server/0 juju-osci-sv05-machine-4)
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13303433/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7900/

Revision history for this message
David Ames (thedac) wrote :

Tests need fixing:
Traceback (most recent call last):
  File "tests/014-basic-precise-icehouse", line 11, in <module>
    deployment.run_tests()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/charmhelpers/contrib/amulet/deployment.py", line 95, in run_tests
    getattr(self, test)()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/basic_deployment.py", line 587, in test_416_pause_resume_actions
    self._test_pause()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/basic_deployment.py", line 551, in _test_pause
    self._assert_services(should_run=True)
AttributeError: 'RmqBasicDeployment' object has no attribute '_assert_services'

review: Needs Fixing
Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12978 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12978/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13923 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13923/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7905 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-17 14:35:37,575 connect_amqp_by_unit DEBUG: Connecting to amqp on 172.17.107.199:5671 (rabbitmq-server/2) as testuser1...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13313632/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7905/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12981 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12981/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13925 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13925/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7908 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-17 16:00:50,154 connect_amqp_by_unit DEBUG: Connecting to amqp on 172.17.106.18:5671 (rabbitmq-server/1) as testuser1...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13314342/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7908/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

hi @tribaal

It's hard to say why the test failed on the pause action. I'd suggest adding a check + debug output when it fails the == check, before returning from wait_on_action() in charmhelpers/contrib/amulet/utils.py.

Unmerged revisions

134. By Chris Glass

This should hopefully fix tests once and for all.

133. By Chris Glass

Added missing method (straight copy from the swift-proxy charm).

132. By Chris Glass

Re-added actions directory to linting after commit revert.

131. By Chris Glass

Reverting last change - unintended charmhelpers sync was pulled in as well.

130. By Chris Glass

Added new actions directory to lint makefile target.

129. By Chris Glass

Fixed missing mock for tests.

128. By Chris Glass

Lint fixes.

127. By Chris Glass

Move the maintenance status set to the assess_status() function. (james's review)

126. By Chris Glass

Forgot the tests

125. By Chris Glass

First pass to make rabbitmq-server maintenance-aware.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'Makefile'
2--- Makefile 2015-10-21 13:36:00 +0000
3+++ Makefile 2015-11-17 15:16:48 +0000
4@@ -17,7 +17,7 @@
5
6 lint: .venv
7 @.venv/bin/flake8 --exclude hooks/charmhelpers,tests/charmhelpers hooks \
8- unit_tests tests
9+ unit_tests tests actions
10 @charm proof
11
12 bin/charm_helpers_sync.py:
13
14=== added directory 'actions'
15=== added file 'actions.yaml'
16--- actions.yaml 1970-01-01 00:00:00 +0000
17+++ actions.yaml 2015-11-17 15:16:48 +0000
18@@ -0,0 +1,4 @@
19+pause:
20+ description: Pause the RabbitMQ unit. This action will stop RabbitMQ services.
21+resume:
22+ descrpition: Resume the RabbitMQ unit. This action will start RabbitMQ services.
23
24=== added file 'actions/actions.py'
25--- actions/actions.py 1970-01-01 00:00:00 +0000
26+++ actions/actions.py 2015-11-17 15:16:48 +0000
27@@ -0,0 +1,61 @@
28+#!/usr/bin/python
29+
30+import os
31+import sys
32+
33+from charmhelpers.core.host import service_pause, service_resume
34+from charmhelpers.core.hookenv import action_fail
35+from charmhelpers.core.unitdata import kv
36+
37+from rabbit_utils import assess_status
38+
39+
40+def pause(args):
41+ """Pause the RabbitMQ services.
42+
43+ @raises Exception should the service fail to stop.
44+ """
45+ if not service_pause("rabbitmq-server"):
46+ raise Exception("Failed to pause rabbitmq-server.")
47+
48+ db = kv()
49+ db.set('unit-paused', True)
50+ db.flush()
51+
52+ assess_status()
53+
54+
55+def resume(args):
56+ """Resume the RabbitMQ services.
57+
58+ @raises Exception should the service fail to start."""
59+ if not service_resume("rabbitmq-server"):
60+ raise Exception("Failed to resume rabbitmq-server")
61+
62+ db = kv()
63+ db.set('unit-paused', False)
64+ db.flush()
65+
66+ assess_status()
67+
68+
69+# A dictionary of all the defined actions to callables (which take
70+# parsed arguments).
71+ACTIONS = {"pause": pause, "resume": resume}
72+
73+
74+def main(args):
75+ action_name = os.path.basename(args[0])
76+ try:
77+ action = ACTIONS[action_name]
78+ except KeyError:
79+ return "Action %s undefined" % action_name
80+ else:
81+ try:
82+ action(args)
83+ except Exception as e:
84+ action_fail(str(e))
85+
86+
87+if __name__ == "__main__":
88+ sys.exit(main(sys.argv))
89
90=== added symlink 'actions/charmhelpers'
91=== target is u'../hooks/charmhelpers'
92=== added symlink 'actions/pause'
93=== target is u'actions.py'
94=== added symlink 'actions/rabbit_utils.py'
95=== target is u'../hooks/rabbit_utils.py'
96=== added symlink 'actions/resume'
97=== target is u'actions.py'
98=== modified file 'hooks/rabbit_utils.py'
99--- hooks/rabbit_utils.py 2015-10-22 02:05:54 +0000
100+++ hooks/rabbit_utils.py 2015-11-17 15:16:48 +0000
101@@ -27,6 +27,7 @@
102 INFO,
103 service_name,
104 status_set,
105+ status_get,
106 cached,
107 unit_get,
108 relation_set,
109@@ -49,6 +50,11 @@
110
111 from collections import OrderedDict
112
113+from charmhelpers.core.unitdata import (
114+ HookData,
115+ kv,
116+)
117+
118 PACKAGES = ['rabbitmq-server', 'python-amqplib', 'lockfile-progs']
119
120 RABBITMQ_CTL = '/usr/sbin/rabbitmqctl'
121@@ -691,6 +697,12 @@
122 # NOTE: ensure rabbitmq is actually installed before doing
123 # any checks
124 if os.path.exists(RABBITMQ_CTL):
125+ # Pause check
126+ if is_paused():
127+ status_set(
128+ 'maintenance',
129+ "Unit paused - use 'resume' action to resume normal service")
130+ return
131 # Clustering Check
132 peer_ids = relation_ids('cluster')
133 if peer_ids and len(related_units(peer_ids[0])):
134@@ -751,3 +763,22 @@
135 wait_app()
136 return wrapped_f
137 return wrap
138+
139+
140+def is_paused(status_get=status_get):
141+ """Is the unit paused?"""
142+ with HookData()():
143+ if kv().get('unit-paused'):
144+ return True
145+ else:
146+ return False
147+
148+
149+def pause_aware_restart_on_change(restart_map):
150+ """Avoids restarting services if config changes when unit is paused."""
151+ def wrapper(f):
152+ if is_paused():
153+ return f
154+ else:
155+ return restart_on_change(restart_map)(f)
156+ return wrapper
157
158=== modified file 'hooks/rabbitmq_server_relations.py'
159--- hooks/rabbitmq_server_relations.py 2015-10-28 11:33:12 +0000
160+++ hooks/rabbitmq_server_relations.py 2015-11-17 15:16:48 +0000
161@@ -631,7 +631,7 @@
162
163
164 @hooks.hook('config-changed')
165-@rabbit.restart_on_change(rabbit.restart_map())
166+@rabbit.pause_aware_restart_on_change(rabbit.restart_map())
167 def config_changed():
168
169 if config('prefer-ipv6'):
170
171=== modified file 'tests/basic_deployment.py'
172--- tests/basic_deployment.py 2015-10-21 04:54:45 +0000
173+++ tests/basic_deployment.py 2015-11-17 15:16:48 +0000
174@@ -545,3 +545,128 @@
175 u.rmq_wait_for_cluster(self)
176
177 u.log.info('OK\n')
178+
179+ def _assert_services(self, should_run):
180+ rabbitmq_services = ['rabbitmq-server']
181+
182+ sentry_units = self._get_rmq_sentry_units()
183+ sentry = sentry_units[0] # We need to run actions on a single unit
184+
185+ u.get_unit_process_ids(
186+ {sentry: rabbitmq_services}, expect_success=should_run)
187+
188+ def _test_pause(self):
189+ u.log.info("Testing pause action")
190+ self._assert_services(should_run=True)
191+ sentry_units = self._get_rmq_sentry_units()
192+ sentry = sentry_units[0] # We need to run actions on a single unit
193+
194+ pause_action_id = u.run_action(sentry, "pause")
195+ assert u.wait_on_action(pause_action_id), "Pause action failed."
196+
197+ self._assert_services(should_run=False)
198+ status, message = u.status_get(sentry)
199+ if status != "maintenance":
200+ msg = ("Pause action failed to move unit to maintenance "
201+ "status (got {} instead)".format(status))
202+ amulet.raise_status(amulet.FAIL, msg=msg)
203+ if message != "Paused. Use 'resume' action to resume normal service.":
204+ msg = ("Pause action failed to set message"
205+ " (got {} instead)".format(message))
206+ amulet.raise_status(amulet.FAIL, msg=msg)
207+
208+ def _test_resume(self):
209+ u.log.info("Testing resume action")
210+ # service is left paused by _test_pause
211+ self._assert_services(should_run=False)
212+ sentry_units = self._get_rmq_sentry_units()
213+ sentry = sentry_units[0] # We need to run actions on a single unit
214+
215+ resume_action_id = u.run_action(sentry, "resume")
216+ assert u.wait_on_action(resume_action_id), "Resume action failed."
217+
218+ self._assert_services(should_run=True)
219+ status, message = u.status_get(sentry)
220+ if status != "active":
221+ msg = ("Resume action failed to move unit to active "
222+ "status (got {} instead)".format(status))
223+ amulet.raise_status(amulet.FAIL, msg=msg)
224+ if message != "Unit is ready":
225+ msg = ("Resume action failed to clear message"
226+ " (got {} instead)".format(message))
227+ amulet.raise_status(amulet.FAIL, msg=msg)
228+
229+ def test_416_pause_resume_actions(self):
230+ """Pause and then resume swift-proxy."""
231+ u.log.debug('Checking pause/resume actions...')
232+ self._test_pause()
233+ self._test_resume()
234+
235+ def test_417_restart_on_config_change(self):
236+ """Verify that the specified services are restarted when the config
237+ is changed."""
238+ u.log.info('Checking that conf files and system services respond '
239+ 'to a charm config change...')
240+
241+ sentry_units = self._get_rmq_sentry_units()
242+ sentry = sentry_units[0] # We need to run actions on a single unit
243+ juju_service = 'rabbitmq-server'
244+
245+ # Process names, corresponding conf files
246+ services = {'rabbitmq-server': '/etc/rabbitmq/rabbitmq-env.conf'}
247+
248+ # Expected default and alternate values
249+ set_default = {'ssl': 'off'}
250+ set_alternate = {'ssl': 'on'}
251+
252+ # Make config change, check for service restarts
253+ u.log.debug('Making config change on {}...'.format(juju_service))
254+ mtime = u.get_sentry_time(sentry)
255+ self.d.configure(juju_service, set_alternate)
256+
257+ sleep_time = 40
258+ for s, conf_file in services.iteritems():
259+ u.log.debug("Checking that service restarted: {}".format(s))
260+ if not u.validate_service_config_changed(sentry, mtime, s,
261+ conf_file,
262+ sleep_time=sleep_time):
263+ self.d.configure(juju_service, set_default)
264+ msg = "service {} didn't restart after config change".format(s)
265+ amulet.raise_status(amulet.FAIL, msg=msg)
266+ sleep_time = 0
267+
268+ self.d.configure(juju_service, set_default)
269+
270+ def test_418_no_restart_on_config_change_when_paused(self):
271+ """Verify that the specified services are not restarted when the config
272+ is changed and the unit is paused."""
273+ u.log.info('Checking that system services do not get restarted '
274+ 'when charm config changes but unit is paused...')
275+ sentry_units = self._get_rmq_sentry_units()
276+ sentry = sentry_units[0] # We need to run actions on a single unit
277+ juju_service = 'rabbitmq-server'
278+
279+ # Expected default and alternate values
280+ set_default = {'ssl': 'off'}
281+ set_alternate = {'ssl': 'on'}
282+
283+ services = ['rabbitmq-server']
284+
285+ # Pause the unit
286+ u.log.debug('Pausing the unit...')
287+ pause_action_id = u.run_action(sentry, "pause")
288+ assert u.wait_on_action(pause_action_id), "Resume action failed."
289+ # Make config change, check for service restarts
290+ u.log.debug('Making config change on {}...'.format(juju_service))
291+ self.d.configure(juju_service, set_alternate)
292+
293+ for service in services:
294+ u.log.debug("Checking that service didn't start while "
295+ "paused: {}".format(service))
296+ # No explicit assert because get_process_id_list will do it for us
297+ u.get_process_id_list(
298+ sentry, service, expect_success=False)
299+
300+ self.d.configure(juju_service, set_default)
301+ resume_action_id = u.run_action(sentry, "resume")
302+ assert u.wait_on_action(resume_action_id), "Resume action failed."
303
304=== modified file 'unit_tests/test_rabbitmq_server_relations.py'
305--- unit_tests/test_rabbitmq_server_relations.py 2015-10-19 03:33:39 +0000
306+++ unit_tests/test_rabbitmq_server_relations.py 2015-11-17 15:16:48 +0000
307@@ -3,7 +3,8 @@
308 from mock import patch, MagicMock
309
310 os.environ['JUJU_UNIT_NAME'] = 'UNIT_TEST/0' # noqa - needed for import
311-import rabbitmq_server_relations
312+with patch('rabbit_utils.is_paused') as is_paused:
313+ import rabbitmq_server_relations
314
315
316 class RelationUtil(TestCase):

Subscribers

People subscribed via source and target branches