Comment 4 for bug 1500552

Jorge Niedbalski (niedbalski) wrote :

After reviewing the full logs (available at http://10.245.162.77:8080/job/charm_amulet_test/6470/), here is what happens on juju-osci-sv05-machine-4.

The cluster is sane:

2015-09-16 19:10:39 INFO juju-log cluster:1: cluster status is Cluster status of node 'rabbit@juju-osci-sv05-machine-4' ...
[{nodes,[{disc,['rabbit@juju-osci-sv05-machine-4']}]},
 {running_nodes,['rabbit@juju-osci-sv05-machine-4']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]

At this point it tries to cluster with juju-osci-sv05-machine-2:

2015-09-16 19:10:40 INFO juju-log cluster:1: Clustering with remote rabbit host (juju-osci-sv05-machine-2).

The node rabbit@juju-osci-sv05-machine-4 then stops itself:

2015-09-16 19:10:40 INFO cluster-relation-changed Stopping node 'rabbit@juju-osci-sv05-machine-4' ...

The issue reported here is then a side effect of that stop:

2015-09-16 19:10:41 INFO worker.uniter.jujuc server.go:158 running hook tool "juju-log" ["Failed to cluster with juju-osci-sv05-machine-2."]

So it seems this is failing in the hooks/rabbit_utils:cluster_with method, on the following call (line 302):

        try:
            cmd = [RABBITMQ_CTL, 'stop_app']
            subprocess.check_call(cmd)
            cmd = [RABBITMQ_CTL, cluster_cmd, 'rabbit@%s' % node]
            try:
                subprocess.check_output(cmd, stderr=subprocess.STDOUT)
            except subprocess.CalledProcessError as e:
                if not e.returncode == 2 or \
                        "{ok,already_member}" not in e.output:
                    raise e

So at this point the exception is caught and goes unnoticed. Everything from here on, such as add_vhost, will fail because the unit is stopped.
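
One possible way to avoid leaving the node stopped is sketched below. This is only an illustration, not the charm's code: join_cluster_safely and its injectable run parameter are hypothetical names, the RABBITMQ_CTL path is assumed, and the only rabbitmqctl subcommands used are stop_app, join_cluster and start_app. The idea is to always run start_app again, whether or not the join succeeded, and to report the result to the caller.

```python
import subprocess

RABBITMQ_CTL = '/usr/sbin/rabbitmqctl'  # assumed path, adjust as needed


def join_cluster_safely(node, cluster_cmd='join_cluster',
                        run=subprocess.check_output):
    """Hypothetical helper: try to join `node`, always restart the app.

    Returns True if the join succeeded (or the node was already a
    member), False otherwise. `run` is injectable so the logic can be
    exercised without a real RabbitMQ installation.
    """
    run([RABBITMQ_CTL, 'stop_app'], stderr=subprocess.STDOUT)
    try:
        run([RABBITMQ_CTL, cluster_cmd, 'rabbit@%s' % node],
            stderr=subprocess.STDOUT)
        return True
    except subprocess.CalledProcessError as e:
        output = e.output or b''
        if isinstance(output, bytes):
            output = output.decode('utf-8', 'replace')
        # Same condition as the original excerpt: exit code 2 together
        # with "already_member" is treated as success.
        if e.returncode == 2 and '{ok,already_member}' in output:
            return True
        return False
    finally:
        # Never leave the local node stopped, even when the join failed.
        run([RABBITMQ_CTL, 'start_app'], stderr=subprocess.STDOUT)
```

With something along these lines, a failed join would still be logged by the caller, but later steps such as add_vhost would run against a started node.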