Merge lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix into lp:charm-helpers

Proposed by David Ames
Status: Work in progress
Proposed branch: lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix
Merge into: lp:charm-helpers
Diff against target: 31 lines (+10/-1)
1 file modified
charmhelpers/contrib/peerstorage/__init__.py (+10/-1)
To merge this branch: bzr merge lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix
Reviewer Review Type Date Requested Status
charmers Pending
Review via email: mp+269400@code.launchpad.net

Description of the change

Fixes clustering race condition LP Bug#1486177 for juju versions < 1.24
Use remote_unit() in relation_get when in a native cluster relation hook

To post a comment you must log in.
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Just noting that this MP is pre-requisite for https://code.launchpad.net/~thedac/charms/trusty/rabbitmq-server/native-cluster-race-fixes/+merge/269395 but is also pending test completion there.

Thank you for your work on this!

Revision history for this message
Ryan Beisner (1chb1n) wrote :

+$0.02:

I've got this branch, plus the native-cluster-race-fixes branch both merged into my MP for the refactored and extended rmq amulet tests @

https://code.launchpad.net/~1chb1n/charms/trusty/rabbitmq-server/amulet-refactor-1508/+merge/269749

With that:

Test automation using juju with LE (1.24.5) is looking solid. I have yet to observe a single failure-to-cluster occurrence. I've cycled 50+ on bare metal and NaN virtually. ;-)

Manual testing using juju sans LE (1.22.6) is significantly improved.

I did some non-LE iterations on bare metal last week which showed only a rare failure-to-cluster occurrence (2 out of 53 loops). That, versus a complete failure (25 of 25) to cluster without the proposed cluster fixes when using pre-LE jujus + rmq 15.07 charm. FWIW, that was on bare metal, with all hosts resolving forward and reverse.

Revision history for this message
Liam Young (gnuoy) wrote :

I'm nervous that this has the potential to break existing charms if they have been previously querying data they set and are now switching to retrieve data from the remote unit.

Revision history for this message
David Ames (thedac) wrote :

Note this is only for juju versions with no leadership election. LE versions of juju skip this code entirely.

When relation_get is called in a non-cluster relation hook and we are checking for *cluster* relation data,
the remote unit is not a peer and the remote unit is ignorant of the data set on the *cluster* relation.
So we use the local unit, which for some reason (possibly a bug) gives us the cluster relation data.

However, when we are in a native cluster relation hook using the local unit gives an empty set (AFAIK always).
So we need to use the remote_unit to get the relation data. In this case cookie, without this rabbit never sees a peer cookie and never clusters.

In rabbit's case it is a chicken/egg problem. The non-leader never gets the peer data in order to echo it.

### Example in a cluster relation

# echo $JUJU_RELATION
cluster
# echo $JUJU_RELATION_ID
cluster:1
# echo $JUJU_UNIT_NAME
rabbitmq-server/2
# echo $JUJU_REMOTE_UNIT
rabbitmq-server/0

## Using local unit gets no peer data
# relation-get -r cluster:1 - $JUJU_UNIT_NAME
private-address: 10.5.19.154

## Using remote unit gets peer data
# relation-get -r cluster:1 - $JUJU_REMOTE_UNIT
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
amqp:2_private-address: 10.5.19.152
cinder.passwd: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
cookie: GYIRHLPVKCKJHNBLSQRE
private-address: 10.5.19.152

## Relation get by itself also works in a cluster relation
# relation-get
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
amqp:2_private-address: 10.5.19.152
cinder.passwd: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
cookie: GYIRHLPVKCKJHNBLSQRE
private-address: 10.5.19.152

### Example in amqp relation

# echo $JUJU_RELATION
amqp
# echo $JUJU_RELATION_ID
amqp:2
# echo $JUJU_UNIT_NAME
rabbitmq-server/2
# echo $JUJU_REMOTE_UNIT
cinder/0

## Using local unit gets data
# relation-get -r cluster:1 - $JUJU_UNIT_NAME
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
amqp:2_private-address: 10.5.19.152
cinder.passwd: JXyBbWFXsPZqwSjZXhJmkNLxgC5PnnPWpz9SChqGS6X7MsZKCSp4fzbXTSy9mm53
cookie: GYIRHLPVKCKJHNBLSQRE
private-address: 10.5.19.154

## Using remote unit gets permission denied
# relation-get -r cluster:1 - $JUJU_REMOTE_UNIT
error: permission denied

## Relation get by itself gets the rabbitmq-server <-> cinder data not peer data
# relation-get
private-address: 10.5.19.151
username: cinder
vhost: openstack

Unmerged revisions

434. By David Ames

Fix Bug#1486177
Handle legacy leadership peer_retrieve
relation_get needs to use remote unit in a native cluster relation hook but local unit in a non-cluster relation hook as the remote unit is not a part of the cluster relation

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'charmhelpers/contrib/peerstorage/__init__.py'
2--- charmhelpers/contrib/peerstorage/__init__.py 2015-08-03 13:23:37 +0000
3+++ charmhelpers/contrib/peerstorage/__init__.py 2015-08-27 15:57:17 +0000
4@@ -27,6 +27,8 @@
5 leader_get as _leader_get,
6 leader_set,
7 is_leader,
8+ remote_unit,
9+ relation_type,
10 )
11
12
13@@ -162,10 +164,17 @@
14 def peer_retrieve(key, relation_name='cluster'):
15 """Retrieve a named key from peer relation `relation_name`."""
16 cluster_rels = relation_ids(relation_name)
17+ # Remote unit if we are in a native cluster relation hook
18+ if relation_type() == 'cluster':
19+ unit = remote_unit()
20+ # Local unit if we are in a non-cluster relation hook
21+ # The remote unit is not a part of the cluster relation
22+ else:
23+ unit = local_unit()
24 if len(cluster_rels) > 0:
25 cluster_rid = cluster_rels[0]
26 return relation_get(attribute=key, rid=cluster_rid,
27- unit=local_unit())
28+ unit=unit)
29 else:
30 raise ValueError('Unable to detect'
31 'peer relation {}'.format(relation_name))

Subscribers

People subscribed via source and target branches