Merge lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix into lp:charm-helpers
| Status: | Work in progress |
|---|---|
| Proposed branch: | lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix |
| Merge into: | lp:charm-helpers |
| Diff against target: |
31 lines (+10/-1) 1 file modified
charmhelpers/contrib/peerstorage/__init__.py (+10/-1) |
| To merge this branch: | bzr merge lp:~thedac/charm-helpers/legacy_leadership_peer_retrieve_fix |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| charmers | 2015-08-27 | Pending | |
|
Review via email:
|
|||
Description of the Change
Fixes clustering race condition LP Bug#1486177 for juju versions < 1.24
Use remote_unit() in relation_get when in a native cluster relation hook
| Ryan Beisner (1chb1n) wrote : | # |
| Ryan Beisner (1chb1n) wrote : | # |
+$0.02:
I've got this branch, plus the native-
https:/
With that:
Test automation using juju with LE (1.24.5) is looking solid. I have yet to observe a single failure-to-cluster occurrence. I've cycled 50+ on bare metal and NaN virtually. ;-)
Manual testing using juju sans LE (1.22.6) is significantly improved.
I did some non-LE iterations on bare metal last week which showed only a rare failure-to-cluster occurrence (2 out of 53 loops). That, versus a complete failure (25 of 25) to cluster without the proposed cluster fixes when using pre-LE jujus + rmq 15.07 charm. FWIW, that was on bare metal, with all hosts resolving forward and reverse.
| Liam Young (gnuoy) wrote : | # |
I'm nervous that this has the potential to break existing charms if they have been previously querying data they set and are now switching to retrieve data from the remote unit.
| David Ames (thedac) wrote : | # |
Note this is only for juju versions with no leadership election. LE versions of juju skip this code entirely.
When relation_get is called in a non-cluster relation hook and we are checking for *cluster* relation data,
the remote unit is not a peer and the remote unit is ignorant of the data set on the *cluster* relation.
So we use the local unit, which for some reason (possibly a bug) gives us the cluster relation data.
However, when we are in a native cluster relation hook using the local unit gives an empty set (AFAIK always).
So we need to use the remote_unit to get the relation data. In this case cookie, without this rabbit never sees a peer cookie and never clusters.
In rabbit's case it is a chicken/egg problem. The non-leader never gets the peer data in order to echo it.
### Example in a cluster relation
# echo $JUJU_RELATION
cluster
# echo $JUJU_RELATION_ID
cluster:1
# echo $JUJU_UNIT_NAME
rabbitmq-server/2
# echo $JUJU_REMOTE_UNIT
rabbitmq-server/0
## Using local unit gets no peer data
# relation-get -r cluster:1 - $JUJU_UNIT_NAME
private-address: 10.5.19.154
## Using remote unit gets peer data
# relation-get -r cluster:1 - $JUJU_REMOTE_UNIT
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSj
amqp:2_
cinder.passwd: JXyBbWFXsPZqwSj
cookie: GYIRHLPVKCKJHNB
private-address: 10.5.19.152
## Relation get by itself also works in a cluster relation
# relation-get
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSj
amqp:2_
cinder.passwd: JXyBbWFXsPZqwSj
cookie: GYIRHLPVKCKJHNB
private-address: 10.5.19.152
### Example in amqp relation
# echo $JUJU_RELATION
amqp
# echo $JUJU_RELATION_ID
amqp:2
# echo $JUJU_UNIT_NAME
rabbitmq-server/2
# echo $JUJU_REMOTE_UNIT
cinder/0
## Using local unit gets data
# relation-get -r cluster:1 - $JUJU_UNIT_NAME
amqp:2_hostname: 10.5.19.152
amqp:2_password: JXyBbWFXsPZqwSj
amqp:2_
cinder.passwd: JXyBbWFXsPZqwSj
cookie: GYIRHLPVKCKJHNB
private-address: 10.5.19.154
## Using remote unit gets permission denied
# relation-get -r cluster:1 - $JUJU_REMOTE_UNIT
error: permission denied
## Relation get by itself gets the rabbitmq-server <-> cinder data not peer data
# relation-get
private-address: 10.5.19.151
username: cinder
vhost: openstack
Unmerged revisions
- 434. By David Ames on 2015-08-26
-
Fix Bug#1486177
Handle legacy leadership peer_retrieve
relation_get needs to use remote unit in a native cluster relation hook but local unit in a non-cluster relation hook as the remote unit is not a part of the cluster relation


Just noting that this MP is pre-requisite for https:/ /code.launchpad .net/~thedac/ charms/ trusty/ rabbitmq- server/ native- cluster- race-fixes/ +merge/ 269395 but is also pending test completion there.
Thank you for your work on this!