Merge lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions into lp:~openstack-charmers-archive/charms/trusty/rabbitmq-server/next

Proposed by Chris Glass
Status: Work in progress
Proposed branch: lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions
Merge into: lp:~openstack-charmers-archive/charms/trusty/rabbitmq-server/next
Diff against target: 316 lines (+225/-3)
7 files modified
Makefile (+1/-1)
actions.yaml (+4/-0)
actions/actions.py (+61/-0)
hooks/rabbit_utils.py (+31/-0)
hooks/rabbitmq_server_relations.py (+1/-1)
tests/basic_deployment.py (+125/-0)
unit_tests/test_rabbitmq_server_relations.py (+2/-1)
To merge this branch: bzr merge lp:~tribaal/charms/trusty/rabbitmq-server/add-pause-resume-actions
Reviewer Review Type Date Requested Status
David Ames (community) Needs Fixing
James Page Needs Fixing
Geoff Teale (community) Approve
Review via email: mp+277418@code.launchpad.net

Description of the change

This branch adds two unit-level actions, "pause" and "resume" to the charm, that users can execute to pause and respectively resume a particular unit of a rabbitmq-server cluster, similar to the way other charms implement the same feature (ceilometer, swift*).

A unit that is "paused" should not have any responsability anymore and should be allowed to be taken offline without further warning. In Rabbitmq's case this is pretty simple, since the application itself allows for this to happen by default. Pausing the unit will simply stop the rabbitmq service on the given unit. Other charms need to make sure no data is lost.

One small non-obvious change here: the @restart_on_change decorator is further wrapped to make it "pause aware" otherwise some service-level configuration changes would restart the rabbitmq services on paused units. Again, this is not critical in rabbitmq's case since the application logic should handle a killed service gracefully, but was added anyway for consistency's sake.

To post a comment you must log in.
Revision history for this message
Geoff Teale (tealeg) wrote :

+1 Approve.

review: Approve
Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12768 rabbitmq-server-next for tribaal mp277418
    UNIT FAIL: unit-test failed

UNIT Results (max last 2 lines):
make: *** [test] Error 1
ERROR:root:Make target returned non-zero.

Full unit test output: http://paste.ubuntu.com/13246423/
Build: http://10.245.162.77:8080/job/charm_unit_test/12768/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13698 rabbitmq-server-next for tribaal mp277418
    LINT FAIL: lint-test failed

LINT Results (max last 2 lines):
make: *** [lint] Error 1
ERROR:root:Make target returned non-zero.

Full lint test output: http://paste.ubuntu.com/13246424/
Build: http://10.245.162.77:8080/job/charm_lint_check/13698/

Revision history for this message
James Page (james-page) wrote :

I think we need to avoid calling status_set directly in the actions, and move to updating the assess_status function in the main relations file to be pause aware; this will ensure that the next 'update-status' hook execution (which happens periodically in 1.25) will set the correct status of 'paused' rather than determining that rabbitmq is not running and setting 'blocked'

review: Needs Fixing
Revision history for this message
Chris Glass (tribaal) wrote :

> I think we need to avoid calling status_set directly in the actions, and move
> to updating the assess_status function in the main relations file to be pause
> aware; this will ensure that the next 'update-status' hook execution (which
> happens periodically in 1.25) will set the correct status of 'paused' rather
> than determining that rabbitmq is not running and setting 'blocked'

Excellent point. Changed the "status_set" to be in assess_status() as you suggested.

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7855 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
make: *** [functional_test] Error 1
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13246587/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7855/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12769 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12769/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13699 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13699/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7856 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
make: *** [functional_test] Error 1
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13246796/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7856/

Revision history for this message
James Page (james-page) wrote :

Hmm - that amulet test look suspicious as the rabbitmq cluster is not clustering correctly

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7859 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-13 18:31:20,197 publish_amqp_message_by_unit DEBUG: Closing connection...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13249544/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7859/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Please also add lint coverage for the new actions dir via the Makefile's lint target. Thank you.

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7860 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
MESSAGE 1@172.17.107.152 [4B29C6B3-6F9A-4DF2-9A09-3675F6DE0EEA-1447444725.03]
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13250328/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7860/

Revision history for this message
Chris Glass (tribaal) wrote :

> Please also add lint coverage for the new actions dir via the Makefile's lint
> target. Thank you.

Good catch! Done.

Thanks!

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12924 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12924/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13863 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13863/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13866 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13866/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12926 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12926/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7900 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-16 22:33:08,393 _test_rmq_amqp_messages_all_units DEBUG: Publish message to: 172.17.105.97 (rabbitmq-server/0 juju-osci-sv05-machine-4)
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13303433/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7900/

Revision history for this message
David Ames (thedac) wrote :

Tests need fixing:
Traceback (most recent call last):
  File "tests/014-basic-precise-icehouse", line 11, in <module>
    deployment.run_tests()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/charmhelpers/contrib/amulet/deployment.py", line 95, in run_tests
    getattr(self, test)()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/basic_deployment.py", line 587, in test_416_pause_resume_actions
    self._test_pause()
  File "/var/lib/jenkins/checkout/rabbitmq-server/tests/basic_deployment.py", line 551, in _test_pause
    self._assert_services(should_run=True)
AttributeError: 'RmqBasicDeployment' object has no attribute '_assert_services'

review: Needs Fixing
Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12978 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12978/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13923 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13923/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7905 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-17 14:35:37,575 connect_amqp_by_unit DEBUG: Connecting to amqp on 172.17.107.199:5671 (rabbitmq-server/2) as testuser1...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13313632/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7905/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_unit_test #12981 rabbitmq-server-next for tribaal mp277418
    UNIT OK: passed

Build: http://10.245.162.77:8080/job/charm_unit_test/12981/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_lint_check #13925 rabbitmq-server-next for tribaal mp277418
    LINT OK: passed

Build: http://10.245.162.77:8080/job/charm_lint_check/13925/

Revision history for this message
uosci-testing-bot (uosci-testing-bot) wrote :

charm_amulet_test #7908 rabbitmq-server-next for tribaal mp277418
    AMULET FAIL: amulet-test failed

AMULET Results (max last 2 lines):
2015-11-17 16:00:50,154 connect_amqp_by_unit DEBUG: Connecting to amqp on 172.17.106.18:5671 (rabbitmq-server/1) as testuser1...
ERROR:root:Make target returned non-zero.

Full amulet test output: http://paste.ubuntu.com/13314342/
Build: http://10.245.162.77:8080/job/charm_amulet_test/7908/

Revision history for this message
Ryan Beisner (1chb1n) wrote :

hi @tribaal

It's hard to say why the test failed on the pause action. I'd suggest adding a check + debug output when it fails the == check, before returning from wait_on_action() in charmhelpers/contrib/amulet/utils.py.

Unmerged revisions

134. By Chris Glass

This should hopefully fix tests once and for all.

133. By Chris Glass

Added missing method (straight copy from the swift-proxy charm).

132. By Chris Glass

Re-added actions directory to linting after commit revert.

131. By Chris Glass

Reverting last change - unintended charmhelpers sync was pulled in as well.

130. By Chris Glass

Added new actions directory to lint makefile target.

129. By Chris Glass

Fixed missing mock for tests.

128. By Chris Glass

Lint fixes.

127. By Chris Glass

Move the maintenance status set to the assess_status() function. (james's review)

126. By Chris Glass

Forgot the tests

125. By Chris Glass

First pass to make rabbitmq-server maintenance-aware.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'Makefile'
--- Makefile 2015-10-21 13:36:00 +0000
+++ Makefile 2015-11-17 15:16:48 +0000
@@ -17,7 +17,7 @@
1717
18lint: .venv18lint: .venv
19 @.venv/bin/flake8 --exclude hooks/charmhelpers,tests/charmhelpers hooks \19 @.venv/bin/flake8 --exclude hooks/charmhelpers,tests/charmhelpers hooks \
20 unit_tests tests20 unit_tests tests actions
21 @charm proof21 @charm proof
2222
23bin/charm_helpers_sync.py:23bin/charm_helpers_sync.py:
2424
=== added directory 'actions'
=== added file 'actions.yaml'
--- actions.yaml 1970-01-01 00:00:00 +0000
+++ actions.yaml 2015-11-17 15:16:48 +0000
@@ -0,0 +1,4 @@
1pause:
2 description: Pause the RabbitMQ unit. This action will stop RabbitMQ services.
3resume:
4 descrpition: Resume the RabbitMQ unit. This action will start RabbitMQ services.
05
=== added file 'actions/actions.py'
--- actions/actions.py 1970-01-01 00:00:00 +0000
+++ actions/actions.py 2015-11-17 15:16:48 +0000
@@ -0,0 +1,61 @@
1#!/usr/bin/python
2
3import os
4import sys
5
6from charmhelpers.core.host import service_pause, service_resume
7from charmhelpers.core.hookenv import action_fail
8from charmhelpers.core.unitdata import kv
9
10from rabbit_utils import assess_status
11
12
13def pause(args):
14 """Pause the RabbitMQ services.
15
16 @raises Exception should the service fail to stop.
17 """
18 if not service_pause("rabbitmq-server"):
19 raise Exception("Failed to pause rabbitmq-server.")
20
21 db = kv()
22 db.set('unit-paused', True)
23 db.flush()
24
25 assess_status()
26
27
28def resume(args):
29 """Resume the RabbitMQ services.
30
31 @raises Exception should the service fail to start."""
32 if not service_resume("rabbitmq-server"):
33 raise Exception("Failed to resume rabbitmq-server")
34
35 db = kv()
36 db.set('unit-paused', False)
37 db.flush()
38
39 assess_status()
40
41
42# A dictionary of all the defined actions to callables (which take
43# parsed arguments).
44ACTIONS = {"pause": pause, "resume": resume}
45
46
47def main(args):
48 action_name = os.path.basename(args[0])
49 try:
50 action = ACTIONS[action_name]
51 except KeyError:
52 return "Action %s undefined" % action_name
53 else:
54 try:
55 action(args)
56 except Exception as e:
57 action_fail(str(e))
58
59
60if __name__ == "__main__":
61 sys.exit(main(sys.argv))
062
=== added symlink 'actions/charmhelpers'
=== target is u'../hooks/charmhelpers'
=== added symlink 'actions/pause'
=== target is u'actions.py'
=== added symlink 'actions/rabbit_utils.py'
=== target is u'../hooks/rabbit_utils.py'
=== added symlink 'actions/resume'
=== target is u'actions.py'
=== modified file 'hooks/rabbit_utils.py'
--- hooks/rabbit_utils.py 2015-10-22 02:05:54 +0000
+++ hooks/rabbit_utils.py 2015-11-17 15:16:48 +0000
@@ -27,6 +27,7 @@
27 INFO,27 INFO,
28 service_name,28 service_name,
29 status_set,29 status_set,
30 status_get,
30 cached,31 cached,
31 unit_get,32 unit_get,
32 relation_set,33 relation_set,
@@ -49,6 +50,11 @@
4950
50from collections import OrderedDict51from collections import OrderedDict
5152
53from charmhelpers.core.unitdata import (
54 HookData,
55 kv,
56)
57
52PACKAGES = ['rabbitmq-server', 'python-amqplib', 'lockfile-progs']58PACKAGES = ['rabbitmq-server', 'python-amqplib', 'lockfile-progs']
5359
54RABBITMQ_CTL = '/usr/sbin/rabbitmqctl'60RABBITMQ_CTL = '/usr/sbin/rabbitmqctl'
@@ -691,6 +697,12 @@
691 # NOTE: ensure rabbitmq is actually installed before doing697 # NOTE: ensure rabbitmq is actually installed before doing
692 # any checks698 # any checks
693 if os.path.exists(RABBITMQ_CTL):699 if os.path.exists(RABBITMQ_CTL):
700 # Pause check
701 if is_paused():
702 status_set(
703 'maintenance',
704 "Unit paused - use 'resume' action to resume normal service")
705 return
694 # Clustering Check706 # Clustering Check
695 peer_ids = relation_ids('cluster')707 peer_ids = relation_ids('cluster')
696 if peer_ids and len(related_units(peer_ids[0])):708 if peer_ids and len(related_units(peer_ids[0])):
@@ -751,3 +763,22 @@
751 wait_app()763 wait_app()
752 return wrapped_f764 return wrapped_f
753 return wrap765 return wrap
766
767
768def is_paused(status_get=status_get):
769 """Is the unit paused?"""
770 with HookData()():
771 if kv().get('unit-paused'):
772 return True
773 else:
774 return False
775
776
777def pause_aware_restart_on_change(restart_map):
778 """Avoids restarting services if config changes when unit is paused."""
779 def wrapper(f):
780 if is_paused():
781 return f
782 else:
783 return restart_on_change(restart_map)(f)
784 return wrapper
754785
=== modified file 'hooks/rabbitmq_server_relations.py'
--- hooks/rabbitmq_server_relations.py 2015-10-28 11:33:12 +0000
+++ hooks/rabbitmq_server_relations.py 2015-11-17 15:16:48 +0000
@@ -631,7 +631,7 @@
631631
632632
633@hooks.hook('config-changed')633@hooks.hook('config-changed')
634@rabbit.restart_on_change(rabbit.restart_map())634@rabbit.pause_aware_restart_on_change(rabbit.restart_map())
635def config_changed():635def config_changed():
636636
637 if config('prefer-ipv6'):637 if config('prefer-ipv6'):
638638
=== modified file 'tests/basic_deployment.py'
--- tests/basic_deployment.py 2015-10-21 04:54:45 +0000
+++ tests/basic_deployment.py 2015-11-17 15:16:48 +0000
@@ -545,3 +545,128 @@
545 u.rmq_wait_for_cluster(self)545 u.rmq_wait_for_cluster(self)
546546
547 u.log.info('OK\n')547 u.log.info('OK\n')
548
549 def _assert_services(self, should_run):
550 rabbitmq_services = ['rabbitmq-server']
551
552 sentry_units = self._get_rmq_sentry_units()
553 sentry = sentry_units[0] # We need to run actions on a single unit
554
555 u.get_unit_process_ids(
556 {sentry: rabbitmq_services}, expect_success=should_run)
557
558 def _test_pause(self):
559 u.log.info("Testing pause action")
560 self._assert_services(should_run=True)
561 sentry_units = self._get_rmq_sentry_units()
562 sentry = sentry_units[0] # We need to run actions on a single unit
563
564 pause_action_id = u.run_action(sentry, "pause")
565 assert u.wait_on_action(pause_action_id), "Pause action failed."
566
567 self._assert_services(should_run=False)
568 status, message = u.status_get(sentry)
569 if status != "maintenance":
570 msg = ("Pause action failed to move unit to maintenance "
571 "status (got {} instead)".format(status))
572 amulet.raise_status(amulet.FAIL, msg=msg)
573 if message != "Paused. Use 'resume' action to resume normal service.":
574 msg = ("Pause action failed to set message"
575 " (got {} instead)".format(message))
576 amulet.raise_status(amulet.FAIL, msg=msg)
577
578 def _test_resume(self):
579 u.log.info("Testing resume action")
580 # service is left paused by _test_pause
581 self._assert_services(should_run=False)
582 sentry_units = self._get_rmq_sentry_units()
583 sentry = sentry_units[0] # We need to run actions on a single unit
584
585 resume_action_id = u.run_action(sentry, "resume")
586 assert u.wait_on_action(resume_action_id), "Resume action failed."
587
588 self._assert_services(should_run=True)
589 status, message = u.status_get(sentry)
590 if status != "active":
591 msg = ("Resume action failed to move unit to active "
592 "status (got {} instead)".format(status))
593 amulet.raise_status(amulet.FAIL, msg=msg)
594 if message != "Unit is ready":
595 msg = ("Resume action failed to clear message"
596 " (got {} instead)".format(message))
597 amulet.raise_status(amulet.FAIL, msg=msg)
598
599 def test_416_pause_resume_actions(self):
600 """Pause and then resume swift-proxy."""
601 u.log.debug('Checking pause/resume actions...')
602 self._test_pause()
603 self._test_resume()
604
605 def test_417_restart_on_config_change(self):
606 """Verify that the specified services are restarted when the config
607 is changed."""
608 u.log.info('Checking that conf files and system services respond '
609 'to a charm config change...')
610
611 sentry_units = self._get_rmq_sentry_units()
612 sentry = sentry_units[0] # We need to run actions on a single unit
613 juju_service = 'rabbitmq-server'
614
615 # Process names, corresponding conf files
616 services = {'rabbitmq-server': '/etc/rabbitmq/rabbitmq-env.conf'}
617
618 # Expected default and alternate values
619 set_default = {'ssl': 'off'}
620 set_alternate = {'ssl': 'on'}
621
622 # Make config change, check for service restarts
623 u.log.debug('Making config change on {}...'.format(juju_service))
624 mtime = u.get_sentry_time(sentry)
625 self.d.configure(juju_service, set_alternate)
626
627 sleep_time = 40
628 for s, conf_file in services.iteritems():
629 u.log.debug("Checking that service restarted: {}".format(s))
630 if not u.validate_service_config_changed(sentry, mtime, s,
631 conf_file,
632 sleep_time=sleep_time):
633 self.d.configure(juju_service, set_default)
634 msg = "service {} didn't restart after config change".format(s)
635 amulet.raise_status(amulet.FAIL, msg=msg)
636 sleep_time = 0
637
638 self.d.configure(juju_service, set_default)
639
640 def test_418_no_restart_on_config_change_when_paused(self):
641 """Verify that the specified services are not restarted when the config
642 is changed and the unit is paused."""
643 u.log.info('Checking that system services do not get restarted '
644 'when charm config changes but unit is paused...')
645 sentry_units = self._get_rmq_sentry_units()
646 sentry = sentry_units[0] # We need to run actions on a single unit
647 juju_service = 'rabbitmq-server'
648
649 # Expected default and alternate values
650 set_default = {'ssl': 'off'}
651 set_alternate = {'ssl': 'on'}
652
653 services = ['rabbitmq-server']
654
655 # Pause the unit
656 u.log.debug('Pausing the unit...')
657 pause_action_id = u.run_action(sentry, "pause")
658 assert u.wait_on_action(pause_action_id), "Resume action failed."
659 # Make config change, check for service restarts
660 u.log.debug('Making config change on {}...'.format(juju_service))
661 self.d.configure(juju_service, set_alternate)
662
663 for service in services:
664 u.log.debug("Checking that service didn't start while "
665 "paused: {}".format(service))
666 # No explicit assert because get_process_id_list will do it for us
667 u.get_process_id_list(
668 sentry, service, expect_success=False)
669
670 self.d.configure(juju_service, set_default)
671 resume_action_id = u.run_action(sentry, "resume")
672 assert u.wait_on_action(resume_action_id), "Resume action failed."
548673
=== modified file 'unit_tests/test_rabbitmq_server_relations.py'
--- unit_tests/test_rabbitmq_server_relations.py 2015-10-19 03:33:39 +0000
+++ unit_tests/test_rabbitmq_server_relations.py 2015-11-17 15:16:48 +0000
@@ -3,7 +3,8 @@
3from mock import patch, MagicMock3from mock import patch, MagicMock
44
5os.environ['JUJU_UNIT_NAME'] = 'UNIT_TEST/0' # noqa - needed for import5os.environ['JUJU_UNIT_NAME'] = 'UNIT_TEST/0' # noqa - needed for import
6import rabbitmq_server_relations6with patch('rabbit_utils.is_paused') as is_paused:
7 import rabbitmq_server_relations
78
89
9class RelationUtil(TestCase):10class RelationUtil(TestCase):

Subscribers

People subscribed via source and target branches