Merge ~xavpaice/juju-verify:ceph-mon into ~canonical-solutions-engineering/juju-verify:master

Proposed by Xav Paice
Status: Superseded
Proposed branch: ~xavpaice/juju-verify:ceph-mon
Merge into: ~canonical-solutions-engineering/juju-verify:master
Prerequisite: ~xavpaice/juju-verify:add_functest
Diff against target: 378 lines (+196/-23) (has conflicts)
7 files modified
juju_verify/verifiers/__init__.py (+2/-1)
juju_verify/verifiers/ceph.py (+74/-3)
tests/functional/requirements.txt (+1/-0)
tests/functional/tests/bundles/hacluster.yaml (+1/-1)
tests/functional/tests/test_ceph.py (+29/-0)
tests/unit/verifiers/test_ceph.py (+87/-16)
tox.ini (+2/-2)
Conflict in juju_verify/verifiers/ceph.py
Conflict in tests/unit/verifiers/test_ceph.py
Reviewer Review Type Date Requested Status
Robert Gildein Needs Fixing
🤖 prod-jenkaas-bootstack (community) continuous-integration Approve
Canonical Solutions Engineering Team Pending
Review via email: mp+400081@code.launchpad.net

This proposal supersedes a proposal from 2021-03-24.

This proposal has been superseded by a proposal from 2021-04-28.

Commit message

Add verifier for ceph-mon

This change adds a CephMon verifier for the ceph-mon charm. It uses
existing actions to ensure the cluster has HEALTH_OK status.

The change relies on ceph-mon get-quorum-status action, introduced in
https://review.opendev.org/c/openstack/charm-ceph-mon/+/778837.

Also uses Juju changes in 2.8.10, and introduces a dependency on that
version.

To post a comment you must log in.
Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : Posted in a previous version of this proposal

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote : Posted in a previous version of this proposal

Unable to determine commit message from repository - please click "Set commit message" and enter the commit message manually.

Revision history for this message
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote :

A CI job is currently in progress. A follow up comment will be added when it completes.

Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote :
review: Approve (continuous-integration)
Revision history for this message
Martin Kalcok (martin-kalcok) wrote :

Code-wise this change is LGTM, thanks. I just have one in-line suggestion that I think is worth considering.

General question: I looked through the documentation and I see that odd number of monitors is not necessary for quorum, but it's strongly recommended [1]. Do you think it would be worth it to add warning output (via logger) that would warn user that the action would result in sub-optimal configuration of ceph-mon? (i.e. having 2 mons is not much better than 1, as it provides no fault tolerance).

[1] https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

~xavpaice/juju-verify:ceph-mon updated
f12707b... by Xav Paice

Update ceph action output field names

https://review.opendev.org/c/openstack/charm-ceph-mon/+/778837 changed
the field names and format during the review process, this reflects that
change.

Revision history for this message
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote :

A CI job is currently in progress. A follow up comment will be added when it completes.

Revision history for this message
🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote :
review: Approve (continuous-integration)
Revision history for this message
Xav Paice (xavpaice) wrote :

This will want a rebase and functest update once the functest change is finalized.

Revision history for this message
Robert Gildein (rgildein) wrote :

Overall, it looks good, the only problem is using "\n".

Revision history for this message
Robert Gildein (rgildein) :
review: Needs Fixing

Unmerged commits

f12707b... by Xav Paice

Update ceph action output field names

https://review.opendev.org/c/openstack/charm-ceph-mon/+/778837 changed
the field names and format during the review process, this reflects that
change.

afd975b... by Xav Paice

Add verifier for ceph-mon

This change adds a CephMon verifier for the ceph-mon charm. It uses
existing actions to ensure the cluster has HEALTH_OK status.

The change relies on ceph-mon get-quorum-status action, introduced in
https://review.opendev.org/c/openstack/charm-ceph-mon/+/778837.

Also uses Juju changes in 2.8.10, and introduces a dependency on that
version.

4def417... by Xav Paice

update unittests target to exclude functests

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/juju_verify/verifiers/__init__.py b/juju_verify/verifiers/__init__.py
2index 4400e0f..83329e1 100644
3--- a/juju_verify/verifiers/__init__.py
4+++ b/juju_verify/verifiers/__init__.py
5@@ -25,7 +25,7 @@ from juju.unit import Unit
6 from juju_verify.exceptions import CharmException
7 from juju_verify.utils.unit import parse_charm_name
8 from juju_verify.verifiers.base import BaseVerifier
9-from juju_verify.verifiers.ceph import CephOsd
10+from juju_verify.verifiers.ceph import CephOsd, CephMon
11 from juju_verify.verifiers.nova_compute import NovaCompute
12 from juju_verify.verifiers.result import Result
13
14@@ -35,6 +35,7 @@ logger = logging.getLogger(__name__)
15 SUPPORTED_CHARMS = {
16 'nova-compute': NovaCompute,
17 'ceph-osd': CephOsd,
18+ 'ceph-mon': CephMon,
19 }
20
21
22diff --git a/juju_verify/verifiers/ceph.py b/juju_verify/verifiers/ceph.py
23index 96253bd..524d31f 100644
24--- a/juju_verify/verifiers/ceph.py
25+++ b/juju_verify/verifiers/ceph.py
26@@ -15,6 +15,10 @@
27 # You should have received a copy of the GNU General Public License along with
28 # this program. If not, see https://www.gnu.org/licenses/.
29 """ceph-osd verification."""
30+<<<<<<< juju_verify/verifiers/ceph.py
31+=======
32+
33+>>>>>>> juju_verify/verifiers/ceph.py
34 import json
35 import logging
36 from typing import Dict, Optional, Any, List
37@@ -22,9 +26,9 @@ from typing import Dict, Optional, Any, List
38 from juju.unit import Unit
39
40 from juju_verify.utils.action import data_from_action
41-from juju_verify.utils.unit import verify_charm_unit, run_action_on_units
42+from juju_verify.utils.unit import run_action_on_units, verify_charm_unit
43 from juju_verify.verifiers.base import BaseVerifier
44-from juju_verify.verifiers.result import aggregate_results, Result
45+from juju_verify.verifiers.result import Result, aggregate_results
46
47 logger = logging.getLogger()
48
49@@ -43,6 +47,7 @@ class CephCommon(BaseVerifier): # pylint: disable=W0223
50 verify_charm_unit("ceph-mon", *units)
51 result = Result(success=True)
52 action_map = run_action_on_units(list(units), "get-health")
53+
54 for unit, action in action_map.items():
55 cluster_health = data_from_action(action, "message")
56 logger.debug("Unit (%s): Ceph cluster health '%s'", unit, cluster_health)
57@@ -85,7 +90,7 @@ class CephCommon(BaseVerifier): # pylint: disable=W0223
58 class CephOsd(CephCommon):
59 """Implementation of verification checks for the ceph-osd charm."""
60
61- NAME = 'ceph-osd'
62+ NAME = "ceph-osd"
63
64 def __init__(self, units: List[Unit]):
65 """Ceph-osd charm verifier."""
66@@ -174,3 +179,69 @@ class CephOsd(CephCommon):
67 def verify_shutdown(self) -> Result:
68 """Verify that it's safe to shutdown selected ceph-osd units."""
69 return self.verify_reboot()
70+
71+
72+class CephMon(CephCommon):
73+ """Implementation of verification checks for the ceph-mon charm."""
74+
75+ NAME = "ceph-mon"
76+
77+ def verify_reboot(self) -> Result:
78+ """Verify that it's safe to reboot selected ceph-mon units."""
79+ return aggregate_results(
80+ self.check_quorum(), self.check_cluster_health(*self.units)
81+ )
82+
83+ def verify_shutdown(self) -> Result:
84+ """Verify that it's safe to shutdown selected units."""
85+ return self.verify_reboot()
86+
87+ def check_quorum(self) -> Result:
88+ """Check that the shutdown does not result in <50% mons alive."""
89+ result = Result(success=True)
90+
91+ action_name = "get-quorum-status"
92+ action_results = self.run_action_on_all(action_name)
93+
94+ mons = {}
95+ affected_hosts = set()
96+
97+ for unit in self.units:
98+ # set of affected hostnames needs to be fully populated early
99+ # from online mons
100+
101+ if unit.machine.hostname:
102+ affected_hosts.update(
103+ [unit.machine.hostname]
104+ )
105+ else:
106+ # we don't have a hostname for this unit
107+ logger.error(
108+ "The machine for unit %s did not return a hostname", unit.entity_id
109+ )
110+ result.success = False
111+ result.reason += (
112+ "The machine for unit {} does not have a hostname attribute, please"
113+ "ensure that Juju is 2.8.10+\n".format(unit.entity_id)
114+ )
115+
116+ # populate a list of known mons, online mons and hostnames per unit
117+ for unit_id, action in action_results.items():
118+ # run this per unit because we might have multiple clusters
119+ mons[unit_id] = {
120+ "known": set(json.loads(data_from_action(action, "known-mons"))),
121+ "online": set(json.loads(data_from_action(action, "online-mons"))),
122+ }
123+
124+ for unit in self.units:
125+ mon_count = len(mons[unit.entity_id]["known"])
126+ mons_after_change = len(mons[unit.entity_id]["online"] - affected_hosts)
127+
128+ if mons_after_change <= mon_count / 2:
129+ result.success = False
130+ result.reason += (
131+ "Removing unit {} will lose Ceph mon "
132+ "quorum\n".format(unit.entity_id)
133+ )
134+
135+ return result
136diff --git a/tests/functional/requirements.txt b/tests/functional/requirements.txt
137index e6b19ec..fae35b2 100644
138--- a/tests/functional/requirements.txt
139+++ b/tests/functional/requirements.txt
140@@ -1,2 +1,3 @@
141 python-openstackclient
142 git+https://github.com/openstack-charmers/zaza.git#egg=zaza
143+git+https://github.com/juju/python-libjuju.git#egg=juju
144diff --git a/tests/functional/tests/bundles/hacluster.yaml b/tests/functional/tests/bundles/hacluster.yaml
145index dd166b7..6f54d8a 100644
146--- a/tests/functional/tests/bundles/hacluster.yaml
147+++ b/tests/functional/tests/bundles/hacluster.yaml
148@@ -1,7 +1,7 @@
149 series: xenial
150 applications:
151 dashboard-hacluster:
152- charm: cs:~xavpaice/hacluster-0
153+ charm: cs:~xavpaice/hacluster
154 keystone:
155 charm: cs:keystone
156 num_units: 1
157diff --git a/tests/functional/tests/test_ceph.py b/tests/functional/tests/test_ceph.py
158index 12f4ca4..3a116c8 100644
159--- a/tests/functional/tests/test_ceph.py
160+++ b/tests/functional/tests/test_ceph.py
161@@ -1,6 +1,7 @@
162 """Test deployment and functionality of juju-verify."""
163
164 from juju import loop
165+import zaza.model
166 from tests.base import BaseTestCase
167
168 from juju_verify import juju_verify
169@@ -23,3 +24,31 @@ class CephTests(BaseTestCase):
170 verifier = get_verifier(unit_objects)
171 result = verifier.verify(check)
172 self.assertTrue(result.success is True)
173+
174+ def test_single_mon_unit(self):
175+ """Test that shutdown of a single mon unit returns OK."""
176+ # juju-verify shutdown --units ceph-mon/0
177+
178+ units = ['ceph-mon/0']
179+ log_level = 'info'
180+ check = 'shutdown'
181+ juju_verify.config_logger(log_level)
182+ model = loop.run(juju_verify.connect_model(zaza.model.CURRENT_MODEL))
183+ unit_objects = loop.run(juju_verify.find_units(model, units))
184+ verifier = get_verifier(unit_objects)
185+ result = verifier.verify(check)
186+ self.assertTrue(result.success is True)
187+
188+ def test_two_mon_unit(self):
189+ """Test that shutdown of multiple mon units fails."""
190+ # juju-verify shutdown --units ceph-mon/0 ceph-mon/1
191+
192+ units = ['ceph-mon/0', 'ceph-mon/1']
193+ log_level = 'info'
194+ check = 'shutdown'
195+ juju_verify.config_logger(log_level)
196+ model = loop.run(juju_verify.connect_model(zaza.model.CURRENT_MODEL))
197+ unit_objects = loop.run(juju_verify.find_units(model, units))
198+ verifier = get_verifier(unit_objects)
199+ result = verifier.verify(check)
200+ self.assertTrue(result.success is False)
201diff --git a/tests/unit/verifiers/test_ceph.py b/tests/unit/verifiers/test_ceph.py
202index b9fb4b5..e443a02 100644
203--- a/tests/unit/verifiers/test_ceph.py
204+++ b/tests/unit/verifiers/test_ceph.py
205@@ -21,20 +21,31 @@ from unittest import mock
206 from unittest.mock import MagicMock
207
208 import pytest
209+from juju.model import Model
210+from juju.unit import Unit
211
212+<<<<<<< tests/unit/verifiers/test_ceph.py
213 from juju_verify.exceptions import CharmException
214 from juju_verify.verifiers import CephOsd, Result
215+=======
216+from juju_verify.verifiers import CephMon, CephOsd, Result
217+>>>>>>> tests/unit/verifiers/test_ceph.py
218 from juju_verify.verifiers.ceph import CephCommon
219
220
221 @mock.patch("juju_verify.verifiers.ceph.run_action_on_units")
222-@pytest.mark.parametrize("message, exp_result", [
223- ("HEALTH_OK ...", Result(True, "ceph-mon/0: Ceph cluster is healthy")),
224- ("HEALTH_WARN ...", Result(False, "ceph-mon/0: Ceph cluster is unhealthy")),
225- ("HEALTH_ERR ...", Result(False, "ceph-mon/0: Ceph cluster is unhealthy")),
226- ("not valid message",
227- Result(False, "ceph-mon/0: Ceph cluster is in an unknown state")),
228-])
229+@pytest.mark.parametrize(
230+ "message, exp_result",
231+ [
232+ ("HEALTH_OK ...", Result(True, "ceph-mon/0: Ceph cluster is healthy")),
233+ ("HEALTH_WARN ...", Result(False, "ceph-mon/0: Ceph cluster is unhealthy")),
234+ ("HEALTH_ERR ...", Result(False, "ceph-mon/0: Ceph cluster is unhealthy")),
235+ (
236+ "not valid message",
237+ Result(False, "ceph-mon/0: Ceph cluster is in an unknown state"),
238+ ),
239+ ],
240+)
241 def test_check_cluster_health(mock_run_action_on_units, message, exp_result, model):
242 """Test check Ceph cluster health."""
243 action = MagicMock()
244@@ -49,18 +60,27 @@ def test_check_cluster_health(mock_run_action_on_units, message, exp_result, mod
245 @mock.patch("juju_verify.verifiers.ceph.run_action_on_units")
246 def test_check_cluster_health_combination(mock_run_action_on_units, model):
247 """Test check Ceph cluster health combination of two diff state."""
248- exp_result = Result(False,
249- os.linesep.join(["ceph-mon/0: Ceph cluster is healthy",
250- "ceph-mon/1: Ceph cluster is unhealthy"]))
251+ exp_result = Result(
252+ False,
253+ os.linesep.join(
254+ [
255+ "ceph-mon/0: Ceph cluster is healthy",
256+ "ceph-mon/1: Ceph cluster is unhealthy",
257+ ]
258+ ),
259+ )
260 action_healthy = MagicMock()
261 action_healthy.data.get.side_effect = {"results": {"message": "HEALTH_OK"}}.get
262 action_unhealthy = MagicMock()
263 action_unhealthy.data.get.side_effect = {"results": {"message": "HEALTH_ERR"}}.get
264- mock_run_action_on_units.return_value = {"ceph-mon/0": action_healthy,
265- "ceph-mon/1": action_unhealthy}
266+ mock_run_action_on_units.return_value = {
267+ "ceph-mon/0": action_healthy,
268+ "ceph-mon/1": action_unhealthy,
269+ }
270
271- result = CephCommon.check_cluster_health(model.units["ceph-mon/0"],
272- model.units["ceph-mon/1"])
273+ result = CephCommon.check_cluster_health(
274+ model.units["ceph-mon/0"], model.units["ceph-mon/1"]
275+ )
276
277 assert result == exp_result
278
279@@ -70,8 +90,9 @@ def test_check_cluster_health_unknown_state(mock_run_action_on_units, model):
280 """Test check Ceph cluster health in unknown state."""
281 mock_run_action_on_units.return_value = {}
282
283- result = CephCommon.check_cluster_health(model.units["ceph-mon/0"],
284- model.units["ceph-mon/1"])
285+ result = CephCommon.check_cluster_health(
286+ model.units["ceph-mon/0"], model.units["ceph-mon/1"]
287+ )
288
289 assert result == Result(False, "Ceph cluster is in an unknown state")
290
291@@ -207,12 +228,17 @@ def test_verify_reboot(
292
293 @mock.patch("juju_verify.verifiers.ceph.CephOsd._get_ceph_mon_app_map")
294 @mock.patch("juju_verify.verifiers.ceph.CephCommon.check_cluster_health")
295+<<<<<<< tests/unit/verifiers/test_ceph.py
296 @mock.patch("juju_verify.verifiers.ceph.CephCommon.get_replication_number")
297 def test_verify_shutdown(
298 mock_get_replication_number,
299 mock_check_cluster_health,
300 mock_get_ceph_mon_app_map,
301 model
302+=======
303+def test_verify_ceph_osd_shutdown(
304+ mock_check_cluster_health, mock_get_ceph_mon_units, model
305+>>>>>>> tests/unit/verifiers/test_ceph.py
306 ):
307 """Test shutdown verification on CephOsd."""
308 mock_get_replication_number.return_value = 1
309@@ -225,3 +251,48 @@ def test_verify_shutdown(
310 mock_get_replication_number.return_value = None # empty list of pools
311 result = CephOsd([model.units["ceph-osd/0"]]).verify_shutdown()
312 assert result == Result(True, "Ceph cluster is healthy")
313+
314+
315+@pytest.mark.parametrize(
316+ "action_return_value, expected_result, hostname",
317+ [('["host0", "host1", "host2"]', Result(True), "host1"),
318+ ('["host0", "host1"]', Result(False), "host1"),
319+ ('["host0", "host1", "host2"]', Result(False), None),
320+ ('["host0", "host1", "host2"]', Result(True), "host5"),
321+ ],
322+)
323+def test_check_ceph_mon_quorum(mocker, action_return_value, expected_result, hostname):
324+ """Test Ceph quorum verification on CephMon."""
325+ unit = Unit("ceph-mon/0", Model())
326+ verifier = CephMon([unit])
327+ mocker.patch.object(verifier, "run_action_on_all").return_value = {
328+ "ceph-mon/0": action_return_value,
329+ }
330+ unit.machine = mock.Mock(hostname=hostname)
331+ mocker.patch(
332+ "juju_verify.verifiers.ceph.data_from_action"
333+ ).return_value = action_return_value
334+ result = verifier.check_quorum()
335+ assert result.success == expected_result.success
336+
337+
338+def test_verify_ceph_mon_shutdown(mocker):
339+ """Test that verify_shutdown links to verify_reboot."""
340+ mocker.patch.object(CephMon, "verify_reboot")
341+ unit = Unit("ceph-mon/0", Model())
342+ verifier = CephMon([unit])
343+ verifier.verify_shutdown()
344+ verifier.verify_reboot.assert_called_once()
345+
346+
347+@mock.patch("juju_verify.verifiers.ceph.CephCommon.check_cluster_health")
348+@mock.patch("juju_verify.verifiers.ceph.CephMon.check_quorum")
349+def test_verify_ceph_mon_reboot(mock_quorum, mock_health):
350+ """Test reboot verification on CephMon."""
351+ unit = Unit("ceph-mon/0", Model())
352+ mock_health.return_value = Result(True, "Ceph cluster is healthy")
353+ mock_quorum.return_value = Result(True)
354+ verifier = CephMon([unit])
355+ result = verifier.verify_reboot()
356+
357+ assert result.success
358diff --git a/tox.ini b/tox.ini
359index 0cce781..321a23d 100644
360--- a/tox.ini
361+++ b/tox.ini
362@@ -78,8 +78,8 @@ commands =
363 pip install --editable {toxinidir}/
364 functest-run-suite {posargs}
365 deps =
366- .[dev]
367 -r{toxinidir}/tests/functional/requirements.txt
368+ .[dev]
369
370
371 [testenv:func-debug]
372@@ -88,5 +88,5 @@ commands =
373 pip install --editable {toxinidir}/
374 functest-run-suite {posargs} --keep-model --log DEBUG
375 deps =
376- .[dev]
377 -r{toxinidir}/tests/functional/requirements.txt
378+ .[dev]

Subscribers

People subscribed via source and target branches