Merge lp:~chad.smith/landscape-client/ha-manager-skeleton into lp:~landscape/landscape-client/trunk
- ha-manager-skeleton
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Chad Smith |
Approved revision: | 636 |
Merged at revision: | 628 |
Proposed branch: | lp:~chad.smith/landscape-client/ha-manager-skeleton |
Merge into: | lp:~landscape/landscape-client/trunk |
Diff against target: |
597 lines (+547/-3) 5 files modified
landscape/manager/config.py (+1/-1) landscape/manager/haservice.py (+205/-0) landscape/manager/tests/test_config.py (+2/-1) landscape/manager/tests/test_haservice.py (+331/-0) landscape/message_schemas.py (+8/-1) |
To merge this branch: | bzr merge lp:~chad.smith/landscape-client/ha-manager-skeleton |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jerry Seutter (community) | Approve | ||
Christopher Armstrong (community) | Approve | ||
Review via email: mp+148593@code.launchpad.net |
Commit message
Initial HA service manager plugin for landscape-client to better enable OpenStack live upgrades. This manager expects generic HA enablement and health scripts (add_to_cluster, remove_from_cluster and a health_checks.d directory) delivered by a charm at /var/lib/juju/units/<unit-name>/charm/.
This plugin only activates upon receipt of a change-ha-service message from landscape-server. It will only take action on haproxy-configured charms that deliver the above-mentioned scripts. Without scripts or a health_checks.d dir, this plugin will log success and continue on with any package maintenance or updates.
Description of the change
Initial HA service manager plugin for landscape-client. Please take a good look over the deferreds and callbacks I'm using to make sure I'm not overusing callbacks.
This is round 1, a skeleton that depends on charm-delivered scripts which allow for local service health checks and setting the HA cluster online or standby. There will likely be iterations on this as the server team (HA charm writers) add functionality to the OpenStack HA charms.
The manager will allow Landscape server to send change-ha-service messages to request service-state: "online" or "standby".
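For concreteness, a representative payload looks like the following (field names match the tests and message schema in the preview diff; the values are illustrative):

```python
# A representative change-ha-service payload, matching the field names used
# by the tests and the CHANGE_HA_SERVICE schema (values are illustrative).
message = {"type": "change-ha-service",
           "service-name": "keystone",   # the juju service
           "unit-name": "keystone-0",    # the specific unit on this machine
           "service-state": "standby",   # "online" or "standby"
           "operation-id": 1}

# The handler rejects any other service-state value.
assert message["service-state"] in (u"online", u"standby")
```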
When change-ha-service requests the "standby" state:
The manager will run the charm's remove_from_cluster script and validate the exit code, returning an operation-result message with SUCCEEDED or FAILED status. If the charm doesn't deliver a remove_from_cluster script, a SUCCEEDED result is returned.
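A minimal sketch of that standby path, using subprocess in place of Twisted's process APIs (the `standby` function and its returned tuples are illustrative, not the plugin's actual interface):

```python
import os
import subprocess

def standby(unit_dir):
    """Run the charm's remove_from_cluster script if present; the exit
    code decides the operation-result status (illustrative sketch)."""
    script = os.path.join(unit_dir, "remove_from_cluster")
    if not os.path.exists(script):
        # No script delivered by the charm: report success anyway.
        return ("SUCCEEDED", "No cluster script; no settings changed.")
    code = subprocess.run([script], env=os.environ).returncode
    if code != 0:
        return ("FAILED",
                "Failed charm script: %s exited with return code %d."
                % (script, code))
    return ("SUCCEEDED", "%s succeeded." % script)
```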
When change-ha-service requests the "online" state:
The manager will first run and validate any charm health check scripts delivered at /var/lib/juju/units/<unit-name>/charm/health_checks.d, then run the charm's add_to_cluster script.
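The health-check pass mirrors run-parts semantics: every script in the directory must exit 0, and an absent or empty directory is treated as success. A self-contained sketch (the plugin itself shells out to run-parts via Twisted; subprocess is used here only to keep the example runnable):

```python
import os
import subprocess

def run_health_checks(health_dir):
    """Every executable in health_checks.d must exit 0 (illustrative
    sketch of run-parts semantics, not the plugin's actual code)."""
    if not os.path.isdir(health_dir) or not os.listdir(health_dir):
        # No scripts, no problem: the plugin logs and succeeds.
        return ("Skipping juju charm health checks. No scripts at %s."
                % health_dir)
    for name in sorted(os.listdir(health_dir)):
        path = os.path.join(health_dir, name)
        proc = subprocess.run([path], env=os.environ)
        if proc.returncode != 0:
            raise RuntimeError(
                "Failed charm script: %s exited with return code %d."
                % (path, proc.returncode))
    return "All health checks succeeded."
```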
To test:
In my local copy of landscape/trunk I hacked the RebootComputer message to instead send a static change-ha-service message so I could test from a local client. I'll attach the trunk patch I was using, which might better enable integration testing.
Chad Smith (chad.smith) wrote:
- 629. By Chad Smith
  reintroduce CharmScriptError and RunPartsError to return as twisted deferred fails.
- 630. By Chad Smith
  fixed config unit test, adding HAService to the ALL_PLUGINS test
Christopher Armstrong (radix) wrote:
[1] + def run_parts(
You have an inner run_parts method that doesn't need to exist; you can just inline the code.
[2] + def _respond_failure
You should use landscape.lib.log.log_failure instead of logging.error here.
[3] _format_exception and the following code at the end of handle_change_ha_service:
+ except Exception, e:
+     self._respond_failure(...)
Instead, do
except:
    self._respond_failure(Failure(), operation_id)
Failure(), when constructed with no arguments, automatically grabs the "current" exception and traceback.
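That capture behaviour can be illustrated without Twisted: a no-argument Failure() snapshots sys.exc_info(), so the exception and traceback survive past the except block. MiniFailure below is a toy stand-in invented for this sketch, not Twisted's implementation:

```python
import sys

class MiniFailure:
    """Toy stand-in for twisted.python.failure.Failure (illustration only)."""

    def __init__(self):
        # Grab the "current" exception, as a no-argument Failure() does.
        self.type, self.value, self.tb = sys.exc_info()

    def getErrorMessage(self):
        return str(self.value)

try:
    raise ValueError("boom")
except:
    f = MiniFailure()

# The captured exception is still available after the handler exits.
assert f.type is ValueError
assert f.getErrorMessage() == "boom"
```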
[4] I recommend separating _respond_failure into two different functions, one for handling failure instances and another for handling string messages.
[5] In the places where you invoke getProcessValue, I think you'll still need to provide an environment in case the script relies on basic things like PATH, etc. It should be reasonable to just pass through os.environ like you do for the getProcessOutputAndValue call.
[6]
+ def validate_exit_code(code, script):
+     if code != 0:
+         return fail(CharmScriptError(script, code))
+     else:
+         return succeed("%s succeeded." % script)
This could be rewritten a bit nicer as
if code != 0:
    raise CharmScriptError(script, code)
else:
    return "%s succeeded." % script
[7] Same for parse_output.
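The reason the rewrite works: inside a callback chain, raising an exception and returning a failed Deferred are equivalent, because the chain catches the exception and converts it into a failure result. A toy illustration (the Chain class is invented for this sketch and is not a Twisted API):

```python
class Chain:
    """Toy stand-in for a Deferred callback chain (illustration only)."""

    def __init__(self, value):
        self.result, self.failed = value, False

    def add_callback(self, fn, *args):
        if not self.failed:
            try:
                self.result = fn(self.result, *args)
            except Exception as e:
                # A raised exception becomes a failure result, just as
                # Twisted wraps it in a Failure for the errback chain.
                self.result, self.failed = e, True
        return self

def validate_exit_code(code, script):
    # The reviewer's suggested shape: raise on failure, return plain value.
    if code != 0:
        raise RuntimeError("%s exited with return code %d." % (script, code))
    return "%s succeeded." % script

ok = Chain(0).add_callback(validate_exit_code, "add_to_cluster")
bad = Chain(2).add_callback(validate_exit_code, "add_to_cluster")
assert ok.result == "add_to_cluster succeeded."
assert bad.failed and "return code 2" in str(bad.result)
```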
- 631. By Chad Smith
  per review comments:
  - drop unneeded run_parts method in favor of inlining the code
  - use log_failure instead of logging.error
  - drop format_exception and use Failure() instead
  - on success, return "some string" instead of succeed("some string")
  - raise CharmScriptError instead of returning fail(CharmScriptError)
- 632. By Chad Smith
  - add _respond_failure_string to handle failure strings using logging.error
  - _respond_failure handles any raised exceptions using log_failure
Chad Smith (chad.smith) wrote:
Thanks Chris for the input. I worked in those changes you suggested.
Christopher Armstrong (radix) wrote:
[8]
+ failure_string = "%s" % (failure.value)
You should probably just use failure.getErrorMessage().
Looks good!
- 633. By Chad Smith
  use failure.getErrorMessage() instead of str(failure.value)
Jerry Seutter (jseutter) wrote:
+1 looks good
Regarding "opid" - it looks like operator process id to me. I have been told to stop over-abbreviating variable names in the past. It would probably be best to use operation_id instead.
- 634. By Chad Smith
  charm directory needs to tack on the 'charm' subdir for housing all charm deliverables. An example: /var/lib/juju/units/keystone-2/charm/
- 635. By Chad Smith
  opid -> operation_id
- 636. By Chad Smith
  lint fixes
Preview Diff
1 | === modified file 'landscape/manager/config.py' |
2 | --- landscape/manager/config.py 2013-01-24 20:15:36 +0000 |
3 | +++ landscape/manager/config.py 2013-02-22 00:26:21 +0000 |
4 | @@ -6,7 +6,7 @@ |
5 | |
6 | ALL_PLUGINS = ["ProcessKiller", "PackageManager", "UserManager", |
7 | "ShutdownManager", "Eucalyptus", "AptSources", "HardwareInfo", |
8 | - "CephUsage", "KeystoneToken"] |
9 | + "CephUsage", "KeystoneToken", "HAService"] |
10 | |
11 | |
12 | class ManagerConfiguration(Configuration): |
13 | |
14 | === added file 'landscape/manager/haservice.py' |
15 | --- landscape/manager/haservice.py 1970-01-01 00:00:00 +0000 |
16 | +++ landscape/manager/haservice.py 2013-02-22 00:26:21 +0000 |
17 | @@ -0,0 +1,205 @@ |
18 | +import logging |
19 | +import os |
20 | + |
21 | +from twisted.python.failure import Failure |
22 | +from twisted.internet.utils import getProcessValue, getProcessOutputAndValue |
23 | +from twisted.internet.defer import succeed |
24 | + |
25 | +from landscape.lib.log import log_failure |
26 | +from landscape.manager.plugin import ManagerPlugin, SUCCEEDED, FAILED |
27 | + |
28 | + |
29 | +class CharmScriptError(Exception): |
30 | + """ |
31 | + Raised when a charm-provided script fails with a non-zero exit code. |
32 | + |
33 | + @ivar script: the name of the failed script |
34 | + @ivar code: the exit code of the failed script |
35 | + """ |
36 | + |
37 | + def __init__(self, script, code): |
38 | + self.script = script |
39 | + self.code = code |
40 | + Exception.__init__(self, self._get_message()) |
41 | + |
42 | + def _get_message(self): |
43 | + return ("Failed charm script: %s exited with return code %d." % |
44 | + (self.script, self.code)) |
45 | + |
46 | + |
47 | +class RunPartsError(Exception): |
48 | + """ |
49 | + Raised when a charm-provided health script run-parts directory contains |
50 | + a health script that fails with a non-zero exit code. |
51 | + |
52 | + @ivar stderr: the stderr from the failed run-parts command |
53 | + """ |
54 | + |
55 | + def __init__(self, stderr): |
56 | + self.message = ("%s" % stderr.split(":")[1].strip()) |
57 | + Exception.__init__(self, self._get_message()) |
58 | + |
59 | + def _get_message(self): |
60 | + return "Failed charm script: %s." % self.message |
61 | + |
62 | + |
63 | +class HAService(ManagerPlugin): |
64 | + """ |
65 | + Plugin to manage this computer's active participation in a |
66 | + high-availability cluster. It depends on charms delivering both health |
67 | + scripts and cluster_add cluster_remove scripts to function. |
68 | + """ |
69 | + |
70 | + JUJU_UNITS_BASE = "/var/lib/juju/units" |
71 | + CLUSTER_ONLINE = "add_to_cluster" |
72 | + CLUSTER_STANDBY = "remove_from_cluster" |
73 | + HEALTH_SCRIPTS_DIR = "health_checks.d" |
74 | + STATE_STANDBY = u"standby" |
75 | + STATE_ONLINE = u"online" |
76 | + |
77 | + def register(self, registry): |
78 | + super(HAService, self).register(registry) |
79 | + registry.register_message("change-ha-service", |
80 | + self.handle_change_ha_service) |
81 | + |
82 | + def _respond(self, status, data, operation_id): |
83 | + message = {"type": "operation-result", |
84 | + "status": status, |
85 | + "operation-id": operation_id} |
86 | + if data: |
87 | + if not isinstance(data, unicode): |
88 | + # Let's decode result-text, replacing non-printable |
89 | + # characters |
90 | + message["result-text"] = data.decode("utf-8", "replace") |
91 | + else: |
92 | + message["result-text"] = data.decode("utf-8", "replace") |
93 | + return self.registry.broker.send_message(message, True) |
94 | + |
95 | + def _respond_success(self, data, message, operation_id): |
96 | + logging.info(message) |
97 | + return self._respond(SUCCEEDED, data, operation_id) |
98 | + |
99 | + def _respond_failure(self, failure, operation_id): |
100 | + """Handle exception failures.""" |
101 | + log_failure(failure) |
102 | + return self._respond(FAILED, failure.getErrorMessage(), operation_id) |
103 | + |
104 | + def _respond_failure_string(self, failure_string, operation_id): |
105 | + """Only handle string failures.""" |
106 | + logging.error(failure_string) |
107 | + return self._respond(FAILED, failure_string, operation_id) |
108 | + |
109 | + def _run_health_checks(self, unit_name): |
110 | + """ |
111 | + Exercise any discovered health check scripts, will return a deferred |
112 | + success or fail. |
113 | + """ |
114 | + health_dir = "%s/%s/charm/%s" % ( |
115 | + self.JUJU_UNITS_BASE, unit_name, self.HEALTH_SCRIPTS_DIR) |
116 | + if not os.path.exists(health_dir) or len(os.listdir(health_dir)) == 0: |
117 | + # No scripts, no problem |
118 | + message = ( |
119 | + "Skipping juju charm health checks. No scripts at %s." % |
120 | + health_dir) |
121 | + logging.info(message) |
122 | + return succeed(message) |
123 | + |
124 | + def parse_output((stdout_data, stderr_data, status)): |
125 | + if status != 0: |
126 | + raise RunPartsError(stderr_data) |
127 | + else: |
128 | + return "All health checks succeeded." |
129 | + |
130 | + result = getProcessOutputAndValue( |
131 | + "run-parts", [health_dir], env=os.environ) |
132 | + return result.addCallback(parse_output) |
133 | + |
134 | + def _change_cluster_participation(self, _, unit_name, service_state): |
135 | + """ |
136 | + Enables or disables a unit's participation in a cluster based on |
137 | + running charm-delivered CLUSTER_ONLINE and CLUSTER_STANDBY scripts |
138 | + if they exist. If the charm doesn't deliver scripts, return succeed(). |
139 | + """ |
140 | + |
141 | + unit_dir = "%s/%s/charm/" % (self.JUJU_UNITS_BASE, unit_name) |
142 | + if service_state == u"online": |
143 | + script = unit_dir + self.CLUSTER_ONLINE |
144 | + else: |
145 | + script = unit_dir + self.CLUSTER_STANDBY |
146 | + |
147 | + if not os.path.exists(script): |
148 | + logging.info("Ignoring juju charm cluster state change to '%s'. " |
149 | + "Charm script does not exist at %s." % |
150 | + (service_state, script)) |
151 | + return succeed( |
152 | + "This computer is always a participant in its high-availabilty" |
153 | + " cluster. No juju charm cluster settings changed.") |
154 | + |
155 | + def run_script(script): |
156 | + result = getProcessValue(script, env=os.environ) |
157 | + |
158 | + def validate_exit_code(code, script): |
159 | + if code != 0: |
160 | + raise CharmScriptError(script, code) |
161 | + else: |
162 | + return "%s succeeded." % script |
163 | + return result.addCallback(validate_exit_code, script) |
164 | + |
165 | + return run_script(script) |
166 | + |
167 | + def _perform_state_change(self, unit_name, service_state, operation_id): |
168 | + """ |
169 | + Handle specific state change requests through calls to available |
170 | + charm scripts like C{CLUSTER_ONLINE}, C{CLUSTER_STANDBY} and any |
171 | + health check scripts. Assume success in any case where no scripts |
172 | + exist for a given task. |
173 | + """ |
174 | + d = succeed(None) |
175 | + if service_state == self.STATE_ONLINE: |
176 | + # Validate health of local service before we bring it online |
177 | + # in the HAcluster |
178 | + d = self._run_health_checks(unit_name) |
179 | + d.addCallback( |
180 | + self._change_cluster_participation, unit_name, service_state) |
181 | + return d |
182 | + |
183 | + def handle_change_ha_service(self, message): |
184 | + """Parse incoming change-ha-service messages""" |
185 | + operation_id = message["operation-id"] |
186 | + try: |
187 | + error_message = u"" |
188 | + |
189 | + service_name = message["service-name"] # keystone |
190 | + unit_name = message["unit-name"] # keystone-0 |
191 | + service_state = message["service-state"] # "online" | "standby" |
192 | + change_message = ( |
193 | + "%s high-availability service set to %s" % |
194 | + (service_name, service_state)) |
195 | + |
196 | + if service_state not in [self.STATE_STANDBY, self.STATE_ONLINE]: |
197 | + error_message = ( |
198 | + u"Invalid cluster participation state requested %s." % |
199 | + service_state) |
200 | + |
201 | + unit_dir = "%s/%s/charm" % (self.JUJU_UNITS_BASE, unit_name) |
202 | + if not os.path.exists(self.JUJU_UNITS_BASE): |
203 | + error_message = ( |
204 | + u"This computer is not deployed with juju. " |
205 | + u"Changing high-availability service not supported.") |
206 | + elif not os.path.exists(unit_dir): |
207 | + error_message = ( |
208 | + u"This computer is not juju unit %s. Unable to " |
209 | + u"modify high-availability services." % unit_name) |
210 | + |
211 | + if error_message: |
212 | + return self._respond_failure_string( |
213 | + error_message, operation_id) |
214 | + |
215 | + d = self._perform_state_change( |
216 | + unit_name, service_state, operation_id) |
217 | + d.addCallback(self._respond_success, change_message, operation_id) |
218 | + d.addErrback(self._respond_failure, operation_id) |
219 | + return d |
220 | + except: |
221 | + self._respond_failure(Failure(), operation_id) |
222 | + return d |
223 | |
224 | === modified file 'landscape/manager/tests/test_config.py' |
225 | --- landscape/manager/tests/test_config.py 2013-01-24 18:22:32 +0000 |
226 | +++ landscape/manager/tests/test_config.py 2013-02-22 00:26:21 +0000 |
227 | @@ -13,7 +13,8 @@ |
228 | """By default all plugins are enabled.""" |
229 | self.assertEqual(["ProcessKiller", "PackageManager", "UserManager", |
230 | "ShutdownManager", "Eucalyptus", "AptSources", |
231 | - "HardwareInfo", "CephUsage", "KeystoneToken"], |
232 | + "HardwareInfo", "CephUsage", "KeystoneToken", |
233 | + "HAService"], |
234 | ALL_PLUGINS) |
235 | self.assertEqual(ALL_PLUGINS, self.config.plugin_factories) |
236 | |
237 | |
238 | === added file 'landscape/manager/tests/test_haservice.py' |
239 | --- landscape/manager/tests/test_haservice.py 1970-01-01 00:00:00 +0000 |
240 | +++ landscape/manager/tests/test_haservice.py 2013-02-22 00:26:21 +0000 |
241 | @@ -0,0 +1,331 @@ |
242 | +import os |
243 | + |
244 | +from twisted.internet.defer import Deferred |
245 | + |
246 | + |
247 | +from landscape.manager.haservice import HAService |
248 | +from landscape.manager.plugin import SUCCEEDED, FAILED |
249 | +from landscape.tests.helpers import LandscapeTest, ManagerHelper |
250 | +from landscape.tests.mocker import ANY |
251 | + |
252 | + |
253 | +class HAServiceTests(LandscapeTest): |
254 | + helpers = [ManagerHelper] |
255 | + |
256 | + def setUp(self): |
257 | + super(HAServiceTests, self).setUp() |
258 | + self.ha_service = HAService() |
259 | + self.ha_service.JUJU_UNITS_BASE = self.makeDir() |
260 | + self.unit_name = "my-service-9" |
261 | + |
262 | + self.health_check_d = os.path.join( |
263 | + self.ha_service.JUJU_UNITS_BASE, self.unit_name, "charm", |
264 | + self.ha_service.HEALTH_SCRIPTS_DIR) |
265 | + # create entire dir path |
266 | + os.makedirs(self.health_check_d) |
267 | + |
268 | + self.manager.add(self.ha_service) |
269 | + |
270 | + unit_dir = "%s/%s/charm" % ( |
271 | + self.ha_service.JUJU_UNITS_BASE, self.unit_name) |
272 | + cluster_online = file( |
273 | + "%s/add_to_cluster" % unit_dir, "w") |
274 | + cluster_online.write("#!/bin/bash\nexit 0") |
275 | + cluster_online.close() |
276 | + cluster_standby = file( |
277 | + "%s/remove_from_cluster" % unit_dir, "w") |
278 | + cluster_standby.write("#!/bin/bash\nexit 0") |
279 | + cluster_standby.close() |
280 | + |
281 | + os.chmod( |
282 | + "%s/add_to_cluster" % unit_dir, 0755) |
283 | + os.chmod( |
284 | + "%s/remove_from_cluster" % unit_dir, 0755) |
285 | + |
286 | + service = self.broker_service |
287 | + service.message_store.set_accepted_types(["operation-result"]) |
288 | + |
289 | + def test_invalid_server_service_state_request(self): |
290 | + """ |
291 | + When the landscape server requests a C{service-state} other than |
292 | + 'online' or 'standby' the client responds with the appropriate error. |
293 | + """ |
294 | + logging_mock = self.mocker.replace("logging.error") |
295 | + logging_mock("Invalid cluster participation state requested BOGUS.") |
296 | + self.mocker.replay() |
297 | + |
298 | + self.manager.dispatch_message( |
299 | + {"type": "change-ha-service", "service-name": "my-service", |
300 | + "unit-name": self.unit_name, "service-state": "BOGUS", |
301 | + "operation-id": 1}) |
302 | + |
303 | + service = self.broker_service |
304 | + self.assertMessages( |
305 | + service.message_store.get_pending_messages(), |
306 | + [{"type": "operation-result", "result-text": |
307 | + u"Invalid cluster participation state requested BOGUS.", |
308 | + "status": FAILED, "operation-id": 1}]) |
309 | + |
310 | + def test_not_a_juju_computer(self): |
311 | + """ |
312 | + When not a juju charmed computer, L{HAService} reponds with an error |
313 | + due to missing JUJU_UNITS_BASE dir. |
314 | + """ |
315 | + self.ha_service.JUJU_UNITS_BASE = "/I/don't/exist" |
316 | + |
317 | + logging_mock = self.mocker.replace("logging.error") |
318 | + logging_mock("This computer is not deployed with juju. " |
319 | + "Changing high-availability service not supported.") |
320 | + self.mocker.replay() |
321 | + |
322 | + self.manager.dispatch_message( |
323 | + {"type": "change-ha-service", "service-name": "my-service", |
324 | + "unit-name": self.unit_name, |
325 | + "service-state": self.ha_service.STATE_STANDBY, |
326 | + "operation-id": 1}) |
327 | + |
328 | + service = self.broker_service |
329 | + self.assertMessages( |
330 | + service.message_store.get_pending_messages(), |
331 | + [{"type": "operation-result", "result-text": |
332 | + u"This computer is not deployed with juju. Changing " |
333 | + u"high-availability service not supported.", |
334 | + "status": FAILED, "operation-id": 1}]) |
335 | + |
336 | + def test_incorrect_juju_unit(self): |
337 | + """ |
338 | + When not the specific juju charmed computer, L{HAService} reponds |
339 | + with an error due to missing the JUJU_UNITS_BASE/$JUJU_UNIT dir. |
340 | + """ |
341 | + logging_mock = self.mocker.replace("logging.error") |
342 | + logging_mock("This computer is not juju unit some-other-service-0. " |
343 | + "Unable to modify high-availability services.") |
344 | + self.mocker.replay() |
345 | + |
346 | + self.manager.dispatch_message( |
347 | + {"type": "change-ha-service", "service-name": "some-other-service", |
348 | + "unit-name": "some-other-service-0", "service-state": "standby", |
349 | + "operation-id": 1}) |
350 | + |
351 | + service = self.broker_service |
352 | + self.assertMessages( |
353 | + service.message_store.get_pending_messages(), |
354 | + [{"type": "operation-result", "result-text": |
355 | + u"This computer is not juju unit some-other-service-0. " |
356 | + u"Unable to modify high-availability services.", |
357 | + "status": FAILED, "operation-id": 1}]) |
358 | + |
359 | + def test_wb_no_health_check_directory(self): |
360 | + """ |
361 | + When unable to find a valid C{HEALTH_CHECK_DIR}, L{HAService} will |
362 | + succeed but log an informational message. |
363 | + """ |
364 | + self.ha_service.HEALTH_SCRIPTS_DIR = "I/don't/exist" |
365 | + |
366 | + def should_not_be_called(result): |
367 | + self.fail( |
368 | + "_run_health_checks failed on absent health check directory.") |
369 | + |
370 | + def check_success_result(result): |
371 | + self.assertEqual( |
372 | + result, |
373 | + "Skipping juju charm health checks. No scripts at " |
374 | + "%s/%s/charm/I/don't/exist." % |
375 | + (self.ha_service.JUJU_UNITS_BASE, self.unit_name)) |
376 | + |
377 | + result = self.ha_service._run_health_checks(self.unit_name) |
378 | + result.addCallbacks(check_success_result, should_not_be_called) |
379 | + |
380 | + def test_wb_no_health_check_scripts(self): |
381 | + """ |
382 | + When C{HEALTH_CHECK_DIR} exists but, no scripts exist, L{HAService} |
383 | + will log an informational message, but succeed. |
384 | + """ |
385 | + # In setup we created a health check directory but placed no health |
386 | + # scripts in it. |
387 | + def should_not_be_called(result): |
388 | + self.fail( |
389 | + "_run_health_checks failed on empty health check directory.") |
390 | + |
391 | + def check_success_result(result): |
392 | + self.assertEqual( |
393 | + result, |
394 | + "Skipping juju charm health checks. No scripts at " |
395 | + "%s/%s/charm/%s." % |
396 | + (self.ha_service.JUJU_UNITS_BASE, self.unit_name, |
397 | + self.ha_service.HEALTH_SCRIPTS_DIR)) |
398 | + |
399 | + result = self.ha_service._run_health_checks(self.unit_name) |
400 | + result.addCallbacks(check_success_result, should_not_be_called) |
401 | + |
402 | + def test_wb_failed_health_script(self): |
403 | + """ |
404 | + L{HAService} runs all health check scripts found in the |
405 | + C{HEALTH_CHECK_DIR}. If any script fails, L{HAService} will return a |
406 | + deferred L{fail}. |
407 | + """ |
408 | + |
409 | + def expected_failure(result): |
410 | + self.assertEqual( |
411 | + str(result.value), |
412 | + "Failed charm script: %s/%s/charm/%s/my-health-script-2 " |
413 | + "exited with return code 1." % |
414 | + (self.ha_service.JUJU_UNITS_BASE, self.unit_name, |
415 | + self.ha_service.HEALTH_SCRIPTS_DIR)) |
416 | + |
417 | + def check_success_result(result): |
418 | + self.fail( |
419 | + "_run_health_checks succeded despite a failed health script.") |
420 | + |
421 | + for number in [1, 2, 3]: |
422 | + script_path = ( |
423 | + "%s/my-health-script-%d" % (self.health_check_d, number)) |
424 | + health_script = file(script_path, "w") |
425 | + if number == 2: |
426 | + health_script.write("#!/bin/bash\nexit 1") |
427 | + else: |
428 | + health_script.write("#!/bin/bash\nexit 0") |
429 | + health_script.close() |
430 | + os.chmod(script_path, 0755) |
431 | + |
432 | + result = self.ha_service._run_health_checks(self.unit_name) |
433 | + result.addCallbacks(check_success_result, expected_failure) |
434 | + return result |
435 | + |
436 | + def test_missing_cluster_standby_or_cluster_online_scripts(self): |
437 | + """ |
438 | + When no cluster status change scripts are delivered by the charm, |
439 | + L{HAService} will still return a L{succeeded}. |
440 | + C{HEALTH_CHECK_DIR}. If any script fails, L{HAService} will return a |
441 | + deferred L{fail}. |
442 | + """ |
443 | + |
444 | + def should_not_be_called(result): |
445 | + self.fail( |
446 | + "_change_cluster_participation failed on absent charm script.") |
447 | + |
448 | + def check_success_result(result): |
449 | + self.assertEqual( |
450 | + result, |
451 | + "This computer is always a participant in its high-availabilty" |
452 | + " cluster. No juju charm cluster settings changed.") |
453 | + |
454 | + self.ha_service.CLUSTER_ONLINE = "I/don't/exist" |
455 | + self.ha_service.CLUSTER_STANDBY = "I/don't/exist" |
456 | + |
457 | + result = self.ha_service._change_cluster_participation( |
458 | + None, self.unit_name, self.ha_service.STATE_ONLINE) |
459 | + result.addCallbacks(check_success_result, should_not_be_called) |
460 | + |
461 | + # Now test the cluster standby script |
462 | + result = self.ha_service._change_cluster_participation( |
463 | + None, self.unit_name, self.ha_service.STATE_STANDBY) |
464 | + result.addCallbacks(check_success_result, should_not_be_called) |
465 | + return result |
466 | + |
467 | + def test_failed_cluster_standby_or_cluster_online_scripts(self): |
468 | + def expected_failure(result, script_path): |
469 | + self.assertEqual( |
470 | + str(result.value), |
471 | + "Failed charm script: %s exited with return code 2." % |
472 | + (script_path)) |
473 | + |
474 | + def check_success_result(result): |
475 | + self.fail( |
476 | + "_change_cluster_participation ignored charm script failure.") |
477 | + |
478 | + # Rewrite both cluster scripts as failures |
479 | + unit_dir = "%s/%s/charm" % ( |
480 | + self.ha_service.JUJU_UNITS_BASE, self.unit_name) |
481 | + for script_name in [ |
482 | + self.ha_service.CLUSTER_ONLINE, self.ha_service.CLUSTER_STANDBY]: |
483 | + |
484 | + cluster_online = file("%s/%s" % (unit_dir, script_name), "w") |
485 | + cluster_online.write("#!/bin/bash\nexit 2") |
486 | + cluster_online.close() |
487 | + |
488 | + result = self.ha_service._change_cluster_participation( |
489 | + None, self.unit_name, self.ha_service.STATE_ONLINE) |
490 | + result.addCallback(check_success_result) |
491 | + script_path = ("%s/%s" % (unit_dir, self.ha_service.CLUSTER_ONLINE)) |
492 | + result.addErrback(expected_failure, script_path) |
493 | + |
494 | + # Now test the cluster standby script |
495 | + result = self.ha_service._change_cluster_participation( |
496 | + None, self.unit_name, self.ha_service.STATE_STANDBY) |
497 | + result.addCallback(check_success_result) |
498 | + script_path = ("%s/%s" % (unit_dir, self.ha_service.CLUSTER_STANDBY)) |
499 | + result.addErrback(expected_failure, script_path) |
500 | + return result |
501 | + |
502 | + def test_run_success_cluster_standby(self): |
503 | + """ |
504 | + When receives a C{change-ha-service message} with C{STATE_STANDBY} |
505 | + requested the manager runs the C{CLUSTER_STANDBY} script and returns |
506 | + a successful operation-result to the server. |
507 | + """ |
508 | + message = ({"type": "change-ha-service", "service-name": "my-service", |
509 | + "unit-name": self.unit_name, |
510 | + "service-state": self.ha_service.STATE_STANDBY, |
511 | + "operation-id": 1}) |
512 | + deferred = Deferred() |
513 | + |
514 | + def validate_messages(value): |
515 | + cluster_script = "%s/%s/charm/%s" % ( |
516 | + self.ha_service.JUJU_UNITS_BASE, self.unit_name, |
517 | + self.ha_service.CLUSTER_STANDBY) |
518 | + service = self.broker_service |
519 | + self.assertMessages( |
520 | + service.message_store.get_pending_messages(), |
521 | + [{"type": "operation-result", |
522 | + "result-text": u"%s succeeded." % cluster_script, |
523 | + "status": SUCCEEDED, "operation-id": 1}]) |
524 | + |
525 | + def handle_has_run(handle_result_deferred): |
526 | + handle_result_deferred.chainDeferred(deferred) |
527 | + return deferred.addCallback(validate_messages) |
528 | + |
529 | + ha_service_mock = self.mocker.patch(self.ha_service) |
530 | + ha_service_mock.handle_change_ha_service(ANY) |
531 | + self.mocker.passthrough(handle_has_run) |
532 | + self.mocker.replay() |
533 | + self.manager.add(self.ha_service) |
534 | + self.manager.dispatch_message(message) |
535 | + |
536 | + return deferred |
537 | + |
538 | + def test_run_success_cluster_online(self): |
539 | + """ |
540 | + When receives a C{change-ha-service message} with C{STATE_ONLINE} |
541 | + requested the manager runs the C{CLUSTER_ONLINE} script and returns |
542 | + a successful operation-result to the server. |
543 | + """ |
544 | + message = ({"type": "change-ha-service", "service-name": "my-service", |
545 | + "unit-name": self.unit_name, |
546 | + "service-state": self.ha_service.STATE_ONLINE, |
547 | + "operation-id": 1}) |
548 | + deferred = Deferred() |
549 | + |
550 | + def validate_messages(value): |
551 | + cluster_script = "%s/%s/charm/%s" % ( |
552 | + self.ha_service.JUJU_UNITS_BASE, self.unit_name, |
553 | + self.ha_service.CLUSTER_ONLINE) |
554 | + service = self.broker_service |
555 | + self.assertMessages( |
556 | + service.message_store.get_pending_messages(), |
557 | + [{"type": "operation-result", |
558 | + "result-text": u"%s succeeded." % cluster_script, |
559 | + "status": SUCCEEDED, "operation-id": 1}]) |
560 | + |
561 | + def handle_has_run(handle_result_deferred): |
562 | + handle_result_deferred.chainDeferred(deferred) |
563 | + return deferred.addCallback(validate_messages) |
564 | + |
565 | + ha_service_mock = self.mocker.patch(self.ha_service) |
566 | + ha_service_mock.handle_change_ha_service(ANY) |
567 | + self.mocker.passthrough(handle_has_run) |
568 | + self.mocker.replay() |
569 | + self.manager.add(self.ha_service) |
570 | + self.manager.dispatch_message(message) |
571 | + |
572 | + return deferred |
573 | |
574 | === modified file 'landscape/message_schemas.py' |
575 | --- landscape/message_schemas.py 2013-02-21 13:35:54 +0000 |
576 | +++ landscape/message_schemas.py 2013-02-22 00:26:21 +0000 |
577 | @@ -125,6 +125,12 @@ |
578 | "data": Any(String(), Constant(None)) |
579 | }) |
580 | |
581 | +CHANGE_HA_SERVICE = Message( |
582 | + "change-ha-service", |
583 | + {"service-name": String(), # keystone |
584 | + "unit-name": String(), # keystone-9 |
585 | + "state": String()}) # online or standby |
586 | + |
587 | MEMORY_INFO = Message("memory-info", { |
588 | "memory-info": List(Tuple(Float(), Int(), Int())), |
589 | }) |
590 | @@ -445,5 +451,6 @@ |
591 | CUSTOM_GRAPH, REBOOT_REQUIRED, APT_PREFERENCES, EUCALYPTUS_INFO, |
592 | EUCALYPTUS_INFO_ERROR, NETWORK_DEVICE, NETWORK_ACTIVITY, |
593 | REBOOT_REQUIRED_INFO, UPDATE_MANAGER_INFO, CPU_USAGE, |
594 | - CEPH_USAGE, SWIFT_DEVICE_INFO, KEYSTONE_TOKEN]: |
595 | + CEPH_USAGE, SWIFT_DEVICE_INFO, KEYSTONE_TOKEN, |
596 | + CHANGE_HA_SERVICE]: |
597 | message_schemas[schema.type] = schema |
Hmmm, no attach functionality on a merge proposal. Let's try a pastebin (which will probably need updating):
https://pastebin.canonical.com/84692/