Corosync 2.99 changed the status output for udp/udpu rings so that it
is hardcoded to 'OK'. This breaks the check_corosync_rings NRPE check,
which looks for 'ring $number active with no faults'. Since the value
is now hardcoded to 'OK', the check no longer provides any meaningful
signal.
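For illustration, a minimal sketch of the kind of parsing the check
relies on, assuming it reads 'corosync-cfgtool -s' output; the exact
command and wording are assumptions here, not taken from the real
check_corosync_rings script:

    #!/usr/bin/env python3
    # Illustrative sketch only: assumes the check parses the output of
    # 'corosync-cfgtool -s'; the real check_corosync_rings may differ.
    import re
    import subprocess
    import sys

    def check_rings():
        out = subprocess.run(['corosync-cfgtool', '-s'],
                             capture_output=True, text=True).stdout
        # Pre-2.99 corosync reports lines like
        #   status = ring 0 active with no faults
        # Once the status is hardcoded to 'OK' for udp/udpu rings this
        # pattern never matches, so the check loses its meaning.
        rings = re.findall(r'ring (\d+) active with no faults', out)
        if not rings:
            print('CRITICAL: no ring reported active with no faults')
            return 2
        print('OK: ring(s) {} active with no faults'.format(', '.join(rings)))
        return 0

    if __name__ == '__main__':
        sys.exit(check_rings())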
Change-Id: I642ecf11946b1ea791a27c54f0bec54adbfecb83
Closes-Bug: #1902919
(cherry picked from commit 3080d64281afa5e65f39bb47d224d66d25bf702c)
* charm-helpers sync for classic charms
* charms.ceph sync for ceph charms
* rebuild for reactive charms
* sync tox.ini files as needed
* sync requirements.txt files to the standard
There appears to be a window between a pacemaker remote resource
being added and the location properties for that resource being
added. In this window the resource is down and pacemaker may fence
the node.
The window is present because the charm currently does:
1) Set stonith-enabled=true cluster property
2) Add maas stonith device that controls pacemaker remote node that
has not yet been added.
3) Add pacemaker remote node
4) Add pacemaker location rules.
I think the following two fixes are needed (see the sketch after this
list):
1) For initial deploys, update the charm so that it does not enable
   stonith until the stonith resources and pacemaker remotes have been
   added.
2) For scale-out, do not add the new pacemaker remote stonith resource
   until the corresponding pacemaker remote resource has been added
   along with its location rules.
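A rough sketch of what the reordered initial-deploy flow from fix 1)
could look like. The helper names below are hypothetical placeholders
for the charm's real configuration steps, not its actual API:

    # Hypothetical ordering sketch; none of these helpers exist in the charm.

    def set_cluster_property(name, value):
        print('set cluster property {}={}'.format(name, value))

    def add_pacemaker_remote(node):
        print('add ocf:pacemaker:remote resource for {}'.format(node))

    def add_location_rules(node):
        print('add location rules for {}'.format(node))

    def add_maas_stonith_device(node):
        print('add maas stonith device for {}'.format(node))

    def configure_cluster_initial(remote_nodes):
        # Keep stonith disabled while the resources are being assembled so
        # a half-configured pacemaker remote cannot be fenced in the window
        # described above.
        set_cluster_property('stonith-enabled', 'false')
        for node in remote_nodes:
            add_pacemaker_remote(node)
            add_location_rules(node)
            add_maas_stonith_device(node)
        # Only enable fencing once every remote has its resource, location
        # rules and stonith device in place.
        set_cluster_property('stonith-enabled', 'true')

    if __name__ == '__main__':
        configure_cluster_initial(['compute-0', 'compute-1'])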
Use location directives to spread pacemaker remote resources across the
cluster. This prevents multiple resources from being taken down when a
single node fails. That would usually not be a problem, but if the node
is being queried by the masakari host monitors at the time it goes
down, the query can hang.
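As a sketch of the intent, the directives could be generated by cycling
the remote resources over the available cluster nodes; the scores,
names and crm-shell-style syntax below are assumptions, not the charm's
actual output:

    # Illustrative only: emits crm-shell-style location directives that give
    # each pacemaker remote resource a different preferred host, so a single
    # node failure does not take several remotes down at once.
    import itertools

    def location_directives(remote_resources, cluster_nodes, score=200):
        nodes = itertools.cycle(cluster_nodes)
        for resource in remote_resources:
            yield 'location loc-{rsc} {rsc} {score}: {node}'.format(
                rsc=resource, score=score, node=next(nodes))

    if __name__ == '__main__':
        for line in location_directives(['compute-0', 'compute-1', 'compute-2'],
                                        ['node-a', 'node-b']):
            print(line)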