Merge lp:~hazmat/pyjuju/ensemble-resolved into lp:pyjuju

Proposed by Kapil Thangavelu
Status: Merged
Approved by: Gustavo Niemeyer
Approved revision: 255
Merged at revision: 223
Proposed branch: lp:~hazmat/pyjuju/ensemble-resolved
Merge into: lp:pyjuju
Prerequisite: lp:~hazmat/pyjuju/resolved-state-api
Diff against target: 542 lines (+503/-1)
4 files modified
ensemble/control/__init__.py (+2/-0)
ensemble/control/resolved.py (+110/-0)
ensemble/control/tests/test_resolved.py (+390/-0)
ensemble/unit/lifecycle.py (+1/-1)
To merge this branch: bzr merge lp:~hazmat/pyjuju/ensemble-resolved
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Approve
Review via email: mp+58607@code.launchpad.net

Description of the change

This adds a new command to ensemble to allow for manual correction of any errors that might occur in unit or unit relation hooks. The new command allows marking a unit or unit relation as needing to be resolved by the unit agent. It validates that the units are not running before scheduling them. It optionally allows specifying that hooks should run again.

To post a comment you must log in.
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Looks good. Mostly just points we already talked about, plus moving
some of the error checking logic into the state API itself.

[1]

+ sub_parser = subparsers.add_parser("resolved", help=command.__doc__)

The nice docs are in the resolved function.

[2]

+ service_name, _ = unit_name.split("/")
+ service_state = yield service_manager.get_service_state(service_name)
+ unit_state = yield service_state.get_unit_state(unit_name)

Please invert the ordering here so that unit_state is retrieved
first. This will automatically validate the unit_name value
provided by the user, and avoid an ugly traceback in case the
user provides a bad unit_name value.

[3]

+ running, workflow_state = yield is_unit_running(client, unit_state)

It feels like that verification should be part of set_resolved(), since it
will never make sense to mark as resolved something which is already
running.

[4]

+ log.info("Unit %r already running:%s", unit_name, workflow_state)

s/running:%s/running: %s/

[5]

+ # Check for the matching relations
+ service_relations = yield relation_manager.get_relations_for_service(

As per our conversation, the mapping between user-provided relation names and
internal ids this should be moved into the state.

[6]

+ # Verify the relations are in need of resolution.
+ resolve_relations = {}

Same case as unit resolution: the error checking can be put within
set_relation_resolved() itself, and the command becomes a simple user
of the API.

review: Needs Fixing
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Excerpts from Gustavo Niemeyer's message of Tue Apr 26 19:30:57 UTC 2011:
> Review: Needs Fixing
> Looks good. Mostly just points we already talked about, plus moving
> some of the error checking logic into the state API itself.
>
>
> [1]
>
> + sub_parser = subparsers.add_parser("resolved", help=command.__doc__)
>
> The nice docs are in the resolved function.

Moved the long docs to subcommand description for resolved -h, updated the
command doc with a better summary.

>
> [2]
>
> + service_name, _ = unit_name.split("/")
> + service_state = yield service_manager.get_service_state(service_name)
> + unit_state = yield service_state.get_unit_state(unit_name)
>
> Please invert the ordering here so that unit_state is retrieved
> first. This will automatically validate the unit_name value
> provided by the user, and avoid an ugly traceback in case the
> user provides a bad unit_name value.
>

done.

[3-6]

as per irc discussions, moving the validation logic and internal-ids to inside
set-relation-resolved is eing deferred to another bug. The unit agent
implementation of resolved will need to verify values as well before acting
upon a unit/relation.

lp:~hazmat/pyjuju/ensemble-resolved updated
255. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

This looks great, +1! Only one nit related to terminology:

[7]

+ # Verify the relations are in need of resolution.
+ resolve_relations = {}
(...)
+ # Mark the relations for resolution.
+ yield unit_state.set_relation_resolved(resolve_relations)

There's some terminology conflict here, in the spirit we debated in the IRC channel regarding the pre-requisite branch.

E.g.:

s/are in need of resolution/are not running/
s/resolve_relations/resolved_relations/
s/for resolution/as resolved/

review: Approve
lp:~hazmat/pyjuju/ensemble-resolved updated
256. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

257. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

258. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

259. By Kapil Thangavelu

update code comments per review comment

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ensemble/control/__init__.py'
2--- ensemble/control/__init__.py 2011-05-03 08:54:13 +0000
3+++ ensemble/control/__init__.py 2011-05-05 16:55:55 +0000
4@@ -17,6 +17,7 @@
5 import open_tunnel
6 import remove_relation
7 import remove_unit
8+import resolved
9 import status
10 import shutdown
11 import ssh
12@@ -39,6 +40,7 @@
13 open_tunnel,
14 remove_relation,
15 remove_unit,
16+ resolved,
17 status,
18 ssh,
19 shutdown,
20
21=== added file 'ensemble/control/resolved.py'
22--- ensemble/control/resolved.py 1970-01-01 00:00:00 +0000
23+++ ensemble/control/resolved.py 2011-05-05 16:55:55 +0000
24@@ -0,0 +1,110 @@
25+"""Implementation of resolved subcommand"""
26+
27+
28+from twisted.internet.defer import inlineCallbacks, returnValue
29+
30+from ensemble.control.utils import get_environment
31+
32+from ensemble.state.service import ServiceStateManager, RETRY_HOOKS, NO_HOOKS
33+from ensemble.state.relation import RelationStateManager
34+from ensemble.state.errors import RelationStateNotFound
35+from ensemble.unit.workflow import is_unit_running, is_relation_running
36+
37+
38+def configure_subparser(subparsers):
39+ """Configure resolved subcommand"""
40+ sub_parser = subparsers.add_parser(
41+ "resolved", help=command.__doc__, description=resolved.__doc__)
42+ sub_parser.add_argument(
43+ "--retry", "-r", action="store_true",
44+ help="Retry failed hook."),
45+ sub_parser.add_argument(
46+ "--environment", "-e",
47+ help="Ensemble environment to operate in.")
48+ sub_parser.add_argument(
49+ "service_unit_name",
50+ help="Name of the service unit that should be resolved")
51+ sub_parser.add_argument(
52+ "relation_name", nargs="?", default=None,
53+ help="Name of the unit relation that should be resolved")
54+ return sub_parser
55+
56+
57+def command(options):
58+ """Mark an error as resolved in a unit or unit relation."""
59+ environment = get_environment(options)
60+ return resolved(
61+ options.environments,
62+ environment,
63+ options.verbose,
64+ options.log,
65+ options.service_unit_name,
66+ options.relation_name,
67+ options.retry)
68+
69+
70+@inlineCallbacks
71+def resolved(
72+ config, environment, verbose, log, unit_name, relation_name, retry):
73+ """Mark an error as resolved in a unit or unit relation.
74+
75+ If one of a unit's formula non-relation hooks returns a non-zero exit
76+ status, the entire unit can be considered to be in a non-running state.
77+
78+ As a resolution, the the unit can be manually returned a running state
79+ via the ensemble resolved command. Optionally this command can also
80+ rerun the failed hook.
81+
82+ This resolution also applies separately to each of the unit's relations.
83+ If one of the relation-hooks failed. In that case there is no
84+ notion of retrying (the change is gone), but resolving will allow
85+ additional relation hooks for that relation to proceed.
86+ """
87+ provider = environment.get_machine_provider()
88+ client = yield provider.connect()
89+ service_manager = ServiceStateManager(client)
90+ relation_manager = RelationStateManager(client)
91+
92+ unit_state = yield service_manager.get_unit_state(unit_name)
93+ service_state = yield service_manager.get_service_state(
94+ unit_name.split("/")[0])
95+
96+ retry = retry and RETRY_HOOKS or NO_HOOKS
97+
98+ if not relation_name:
99+ running, workflow_state = yield is_unit_running(client, unit_state)
100+ if running:
101+ log.info("Unit %r already running: %s", unit_name, workflow_state)
102+ client.close()
103+ returnValue(False)
104+
105+ yield unit_state.set_resolved(retry)
106+ log.info("Marked unit %r as resolved", unit_name)
107+ returnValue(True)
108+
109+ # Check for the matching relations
110+ service_relations = yield relation_manager.get_relations_for_service(
111+ service_state)
112+ service_relations = [
113+ sr for sr in service_relations if sr.relation_name == relation_name]
114+ if not service_relations:
115+ raise RelationStateNotFound()
116+
117+ # Verify the relations are in need of resolution.
118+ resolved_relations = {}
119+ for service_relation in service_relations:
120+ unit_relation = yield service_relation.get_unit_state(unit_state)
121+ running, state = yield is_relation_running(client, unit_relation)
122+ if not running:
123+ resolved_relations[unit_relation.internal_relation_id] = retry
124+
125+ if not resolved_relations:
126+ log.warning("Matched relations are all running")
127+ client.close()
128+ returnValue(False)
129+
130+ # Mark the relations as resolved.
131+ yield unit_state.set_relation_resolved(resolved_relations)
132+ log.info(
133+ "Marked unit %r relation %r as resolved", unit_name, relation_name)
134+ client.close()
135
136=== added file 'ensemble/control/tests/test_resolved.py'
137--- ensemble/control/tests/test_resolved.py 1970-01-01 00:00:00 +0000
138+++ ensemble/control/tests/test_resolved.py 2011-05-05 16:55:55 +0000
139@@ -0,0 +1,390 @@
140+from twisted.internet.defer import inlineCallbacks, returnValue
141+from yaml import dump
142+
143+from ensemble.control import main
144+from ensemble.control.resolved import resolved
145+from ensemble.control.tests.common import ControlToolTest
146+from ensemble.formula.tests.test_repository import RepositoryTestBase
147+
148+from ensemble.state.service import RETRY_HOOKS, NO_HOOKS
149+from ensemble.state.tests.test_service import ServiceStateManagerTestBase
150+from ensemble.state.errors import ServiceStateNotFound
151+
152+from ensemble.unit.workflow import UnitWorkflowState, RelationWorkflowState
153+from ensemble.unit.lifecycle import UnitRelationLifecycle
154+from ensemble.hooks.executor import HookExecutor
155+
156+
157+class ControlResolvedTest(
158+ ServiceStateManagerTestBase, ControlToolTest, RepositoryTestBase):
159+
160+ @inlineCallbacks
161+ def setUp(self):
162+ yield super(ControlResolvedTest, self).setUp()
163+ config = {
164+ "ensemble": "environments",
165+ "environments": {"firstenv": {"type": "dummy"}}}
166+
167+ self.write_config(dump(config))
168+ self.config.load()
169+
170+ yield self.add_relation_state("wordpress", "mysql")
171+ yield self.add_relation_state("wordpress", "varnish")
172+
173+ self.service1 = yield self.service_state_manager.get_service_state(
174+ "mysql")
175+ self.service_unit1 = yield self.service1.add_unit_state()
176+ self.service_unit2 = yield self.service1.add_unit_state()
177+
178+ self.unit1_workflow = UnitWorkflowState(
179+ self.client, self.service_unit1, None, self.makeDir())
180+ yield self.unit1_workflow.set_state("started")
181+
182+ self.environment = self.config.get_default()
183+ self.provider = self.environment.get_machine_provider()
184+
185+ self.output = self.capture_logging()
186+ self.stderr = self.capture_stream("stderr")
187+ self.executor = HookExecutor()
188+
189+ @inlineCallbacks
190+ def add_relation_state(self, *service_names):
191+ for service_name in service_names:
192+ try:
193+ yield self.service_state_manager.get_service_state(
194+ service_name)
195+ except ServiceStateNotFound:
196+ yield self.add_service_from_formula(service_name)
197+
198+ endpoint_pairs = yield self.service_state_manager.join_descriptors(
199+ *service_names)
200+ endpoints = endpoint_pairs[0]
201+ endpoints = endpoint_pairs[0]
202+ if endpoints[0] == endpoints[1]:
203+ endpoints = endpoints[0:1]
204+ relation_state = (yield self.relation_state_manager.add_relation_state(
205+ *endpoints))[0]
206+ returnValue(relation_state)
207+
208+ @inlineCallbacks
209+ def get_named_service_relation(self, service_state, relation_name):
210+ if isinstance(service_state, str):
211+ service_state = yield self.service_state_manager.get_service_state(
212+ service_state)
213+
214+ rels = yield self.relation_state_manager.get_relations_for_service(
215+ service_state)
216+
217+ rels = [sr for sr in rels if sr.relation_name == relation_name]
218+ if len(rels) == 1:
219+ returnValue(rels[0])
220+ returnValue(rels)
221+
222+ @inlineCallbacks
223+ def setup_unit_relations(self, service_relation, *units):
224+ """
225+ Given a service relation and set of unit tuples in the form
226+ unit_state, unit_relation_workflow_state, will add unit relations
227+ for these units and update their workflow state to the desired/given
228+ state.
229+ """
230+ for unit, state in units:
231+ unit_relation = yield service_relation.add_unit_state(unit)
232+ lifecycle = UnitRelationLifecycle(
233+ self.client, unit_relation, service_relation.relation_name,
234+ self.makeDir(), self.executor)
235+ workflow_state = RelationWorkflowState(
236+ self.client, unit_relation, lifecycle, self.makeDir())
237+ yield workflow_state.set_state(state)
238+
239+ @inlineCallbacks
240+ def test_resolved(self):
241+ """
242+ 'ensemble resolved <unit_name>' will schedule a unit for
243+ retrying from an error state.
244+ """
245+ # Push the unit into an error state
246+ yield self.unit1_workflow.set_state("start_error")
247+ self.setup_exit(0)
248+ finished = self.setup_cli_reactor()
249+ self.mocker.replay()
250+
251+ self.assertEqual(
252+ (yield self.service_unit1.get_resolved()), None)
253+
254+ main(["resolved", "mysql/0"])
255+ yield finished
256+
257+ self.assertEqual(
258+ (yield self.service_unit1.get_resolved()), {"retry": NO_HOOKS})
259+ self.assertIn(
260+ "Marked unit 'mysql/0' as resolved",
261+ self.output.getvalue())
262+
263+ @inlineCallbacks
264+ def test_resolved_retry(self):
265+ """
266+ 'ensemble resolved --retry <unit_name>' will schedule a unit
267+ for retrying from an error state with a retry of hooks
268+ executions.
269+ """
270+ yield self.unit1_workflow.set_state("start_error")
271+ self.setup_exit(0)
272+ finished = self.setup_cli_reactor()
273+ self.mocker.replay()
274+
275+ self.assertEqual(
276+ (yield self.service_unit1.get_resolved()), None)
277+
278+ main(["resolved", "--retry", "mysql/0"])
279+ yield finished
280+
281+ self.assertEqual(
282+ (yield self.service_unit1.get_resolved()), {"retry": RETRY_HOOKS})
283+ self.assertIn(
284+ "Marked unit 'mysql/0' as resolved",
285+ self.output.getvalue())
286+
287+ @inlineCallbacks
288+ def test_relation_resolved(self):
289+ """
290+ 'ensemble relation <unit_name> <rel_name>' will schedule
291+ the broken unit relations for being resolved.
292+ """
293+ service_relation = yield self.get_named_service_relation(
294+ self.service1, "server")
295+
296+ self.setup_unit_relations(
297+ service_relation,
298+ (self.service_unit1, "down"),
299+ (self.service_unit2, "up"))
300+
301+ yield self.unit1_workflow.set_state("start_error")
302+ self.setup_exit(0)
303+ finished = self.setup_cli_reactor()
304+ self.mocker.replay()
305+
306+ self.assertEqual(
307+ (yield self.service_unit1.get_relation_resolved()), None)
308+
309+ main(["resolved", "--retry", "mysql/0",
310+ service_relation.relation_name])
311+ yield finished
312+
313+ self.assertEqual(
314+ (yield self.service_unit1.get_relation_resolved()),
315+ {service_relation.internal_relation_id: RETRY_HOOKS})
316+ self.assertEqual(
317+ (yield self.service_unit2.get_relation_resolved()),
318+ None)
319+ self.assertIn(
320+ "Marked unit 'mysql/0' relation 'server' as resolved",
321+ self.output.getvalue())
322+
323+ @inlineCallbacks
324+ def test_resolved_relation_some_already_resolved(self):
325+ """
326+ 'ensemble resolved <service_name> <rel_name>' will mark
327+ resolved all down units that are not already marked resolved.
328+ """
329+
330+ service2 = yield self.service_state_manager.get_service_state(
331+ "wordpress")
332+ service_unit1 = yield service2.add_unit_state()
333+
334+ service_relation = yield self.get_named_service_relation(
335+ service2, "db")
336+ yield self.setup_unit_relations(
337+ service_relation, (service_unit1, "down"))
338+
339+ service_relation2 = yield self.get_named_service_relation(
340+ service2, "cache")
341+ yield self.setup_unit_relations(
342+ service_relation2, (service_unit1, "down"))
343+
344+ yield service_unit1.set_relation_resolved(
345+ {service_relation.internal_relation_id: NO_HOOKS})
346+
347+ self.setup_exit(0)
348+ finished = self.setup_cli_reactor()
349+ self.mocker.replay()
350+
351+ main(["resolved", "--retry", "wordpress/0", "cache"])
352+ yield finished
353+
354+ self.assertEqual(
355+ (yield service_unit1.get_relation_resolved()),
356+ {service_relation.internal_relation_id: NO_HOOKS,
357+ service_relation2.internal_relation_id: RETRY_HOOKS})
358+
359+ self.assertIn(
360+ "Marked unit 'wordpress/0' relation 'cache' as resolved",
361+ self.output.getvalue())
362+
363+ @inlineCallbacks
364+ def test_resolved_relation_some_already_resolved_conflict(self):
365+ """
366+ 'ensemble resolved <service_name> <rel_name>' will mark
367+ resolved all down units that are not already marked resolved.
368+ """
369+
370+ service2 = yield self.service_state_manager.get_service_state(
371+ "wordpress")
372+ service_unit1 = yield service2.add_unit_state()
373+
374+ service_relation = yield self.get_named_service_relation(
375+ service2, "db")
376+ yield self.setup_unit_relations(
377+ service_relation, (service_unit1, "down"))
378+
379+ yield service_unit1.set_relation_resolved(
380+ {service_relation.internal_relation_id: NO_HOOKS})
381+
382+ self.setup_exit(0)
383+ finished = self.setup_cli_reactor()
384+ self.mocker.replay()
385+
386+ main(["resolved", "--retry", "wordpress/0", "db"])
387+ yield finished
388+
389+ self.assertEqual(
390+ (yield service_unit1.get_relation_resolved()),
391+ {service_relation.internal_relation_id: NO_HOOKS})
392+
393+ self.assertIn(
394+ "Service unit 'wordpress/0' already has relations marked as resol",
395+ self.output.getvalue())
396+
397+ @inlineCallbacks
398+ def test_resolved_unknown_service(self):
399+ """
400+ 'ensemble resolved <unit_name>' will report if a service is
401+ invalid.
402+ """
403+ self.setup_exit(0)
404+ finished = self.setup_cli_reactor()
405+ self.mocker.replay()
406+ main(["resolved", "zebra/0"])
407+ yield finished
408+ self.assertIn("Service 'zebra' was not found", self.stderr.getvalue())
409+
410+ @inlineCallbacks
411+ def test_resolved_unknown_unit(self):
412+ """
413+ 'ensemble resolved <unit_name>' will report if a unit is
414+ invalid.
415+ """
416+ self.setup_exit(0)
417+ finished = self.setup_cli_reactor()
418+ self.mocker.replay()
419+ main(["resolved", "mysql/5"])
420+ yield finished
421+ self.assertIn(
422+ "Service unit 'mysql/5' was not found", self.output.getvalue())
423+
424+ @inlineCallbacks
425+ def test_resolved_unknown_unit_relation(self):
426+ """
427+ 'ensemble resolved <unit_name>' will report if a relation is
428+ invalid.
429+ """
430+ self.setup_exit(0)
431+ finished = self.setup_cli_reactor()
432+ self.mocker.replay()
433+
434+ self.assertEqual(
435+ (yield self.service_unit1.get_resolved()), None)
436+
437+ main(["resolved", "mysql/0", "magic"])
438+ yield finished
439+
440+ self.assertIn("Relation not found", self.output.getvalue())
441+
442+ @inlineCallbacks
443+ def test_resolved_already_running(self):
444+ """
445+ 'ensemble resolved <unit_name>' will report if
446+ the unit is already running.
447+ """
448+ # Just verify we don't accidentally mark up another unit of the service
449+ unit2_workflow = UnitWorkflowState(
450+ self.client, self.service_unit2, None, self.makeDir())
451+ unit2_workflow.set_state("start_error")
452+
453+ self.setup_exit(0)
454+ finished = self.setup_cli_reactor()
455+ self.mocker.replay()
456+
457+ main(["resolved", "mysql/0"])
458+ yield finished
459+
460+ self.assertEqual(
461+ (yield self.service_unit2.get_resolved()), None)
462+ self.assertEqual(
463+ (yield self.service_unit1.get_resolved()), None)
464+
465+ self.assertNotIn(
466+ "Unit 'mysql/0 already running: started",
467+ self.output.getvalue())
468+
469+ @inlineCallbacks
470+ def test_resolved_already_resolved(self):
471+ """
472+ 'ensemble resolved <unit_name>' will report if
473+ the unit is already resolved.
474+ """
475+ # Mark the unit as resolved and as in an error state.
476+ yield self.service_unit1.set_resolved(RETRY_HOOKS)
477+ yield self.unit1_workflow.set_state("start_error")
478+
479+ unit2_workflow = UnitWorkflowState(
480+ self.client, self.service_unit1, None, self.makeDir())
481+ unit2_workflow.set_state("start_error")
482+
483+ self.assertEqual(
484+ (yield self.service_unit2.get_resolved()), None)
485+
486+ self.setup_exit(0)
487+ finished = self.setup_cli_reactor()
488+ self.mocker.replay()
489+
490+ main(["resolved", "mysql/0"])
491+ yield finished
492+
493+ self.assertEqual(
494+ (yield self.service_unit1.get_resolved()),
495+ {"retry": RETRY_HOOKS})
496+ self.assertNotIn(
497+ "Marked unit 'mysql/0' as resolved",
498+ self.output.getvalue())
499+ self.assertIn(
500+ "Service unit 'mysql/0' is already marked as resolved.",
501+ self.stderr.getvalue(), "")
502+
503+ @inlineCallbacks
504+ def test_resolved_relation_already_running(self):
505+ """
506+ 'ensemble resolved <unit_name> <rel_name>' will report
507+ if the relation is already running.
508+ """
509+ service2 = yield self.service_state_manager.get_service_state(
510+ "wordpress")
511+ service_unit1 = yield service2.add_unit_state()
512+
513+ service_relation = yield self.get_named_service_relation(
514+ service2, "db")
515+ yield self.setup_unit_relations(
516+ service_relation, (service_unit1, "up"))
517+
518+ self.setup_exit(0)
519+ finished = self.setup_cli_reactor()
520+ self.mocker.replay()
521+
522+ main(["resolved", "wordpress/0", "db"])
523+ yield finished
524+
525+ self.assertIn("Matched relations are all running",
526+ self.output.getvalue())
527+ self.assertEqual(
528+ (yield service_unit1.get_relation_resolved()),
529+ None)
530
531=== modified file 'ensemble/unit/lifecycle.py'
532--- ensemble/unit/lifecycle.py 2011-05-05 14:31:41 +0000
533+++ ensemble/unit/lifecycle.py 2011-05-05 16:55:55 +0000
534@@ -9,7 +9,7 @@
535 from ensemble.state.hook import RelationChange
536 from ensemble.state.errors import StopWatcher, UnitRelationStateNotFound
537
538-from ensemble.unit.workflow import RelationWorkflowState, RelationWorkflow
539+from ensemble.unit.workflow import RelationWorkflowState
540
541
542 HOOK_SOCKET_FILE = ".ensemble.hookcli.sock"

Subscribers

People subscribed via source and target branches

to status/vote changes: