Merge lp:~hazmat/pyjuju/ensemble-resolved into lp:pyjuju

Proposed by Kapil Thangavelu
Status: Merged
Approved by: Gustavo Niemeyer
Approved revision: 255
Merged at revision: 223
Proposed branch: lp:~hazmat/pyjuju/ensemble-resolved
Merge into: lp:pyjuju
Prerequisite: lp:~hazmat/pyjuju/resolved-state-api
Diff against target: 542 lines (+503/-1)
4 files modified
ensemble/control/__init__.py (+2/-0)
ensemble/control/resolved.py (+110/-0)
ensemble/control/tests/test_resolved.py (+390/-0)
ensemble/unit/lifecycle.py (+1/-1)
To merge this branch: bzr merge lp:~hazmat/pyjuju/ensemble-resolved
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Approve
Review via email: mp+58607@code.launchpad.net

Description of the change

This adds a new command to ensemble to allow for manual correction of any errors that might occur in unit or unit relation hooks. The new command allows marking a unit or unit relation as needing to be resolved by the unit agent. It validates that the units are not running before scheduling them. It optionally allows specifying that hooks should run again.

To post a comment you must log in.
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Looks good. Mostly just points we already talked about, plus moving
some of the error checking logic into the state API itself.

[1]

+ sub_parser = subparsers.add_parser("resolved", help=command.__doc__)

The nice docs are in the resolved function.

[2]

+ service_name, _ = unit_name.split("/")
+ service_state = yield service_manager.get_service_state(service_name)
+ unit_state = yield service_state.get_unit_state(unit_name)

Please invert the ordering here so that unit_state is retrieved
first. This will automatically validate the unit_name value
provided by the user, and avoid an ugly traceback in case the
user provides a bad unit_name value.

[3]

+ running, workflow_state = yield is_unit_running(client, unit_state)

It feels like that verification should be part of set_resolved(), since it
will never make sense to mark as resolved something which is already
running.

[4]

+ log.info("Unit %r already running:%s", unit_name, workflow_state)

s/running:%s/running: %s/

[5]

+ # Check for the matching relations
+ service_relations = yield relation_manager.get_relations_for_service(

As per our conversation, the mapping between user-provided relation names and
internal ids this should be moved into the state.

[6]

+ # Verify the relations are in need of resolution.
+ resolve_relations = {}

Same case as unit resolution: the error checking can be put within
set_relation_resolved() itself, and the command becomes a simple user
of the API.

review: Needs Fixing
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Excerpts from Gustavo Niemeyer's message of Tue Apr 26 19:30:57 UTC 2011:
> Review: Needs Fixing
> Looks good. Mostly just points we already talked about, plus moving
> some of the error checking logic into the state API itself.
>
>
> [1]
>
> + sub_parser = subparsers.add_parser("resolved", help=command.__doc__)
>
> The nice docs are in the resolved function.

Moved the long docs to subcommand description for resolved -h, updated the
command doc with a better summary.

>
> [2]
>
> + service_name, _ = unit_name.split("/")
> + service_state = yield service_manager.get_service_state(service_name)
> + unit_state = yield service_state.get_unit_state(unit_name)
>
> Please invert the ordering here so that unit_state is retrieved
> first. This will automatically validate the unit_name value
> provided by the user, and avoid an ugly traceback in case the
> user provides a bad unit_name value.
>

done.

[3-6]

as per irc discussions, moving the validation logic and internal-ids to inside
set-relation-resolved is eing deferred to another bug. The unit agent
implementation of resolved will need to verify values as well before acting
upon a unit/relation.

lp:~hazmat/pyjuju/ensemble-resolved updated
255. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

This looks great, +1! Only one nit related to terminology:

[7]

+ # Verify the relations are in need of resolution.
+ resolve_relations = {}
(...)
+ # Mark the relations for resolution.
+ yield unit_state.set_relation_resolved(resolve_relations)

There's some terminology conflict here, in the spirit we debated in the IRC channel regarding the pre-requisite branch.

E.g.:

s/are in need of resolution/are not running/
s/resolve_relations/resolved_relations/
s/for resolution/as resolved/

review: Approve
lp:~hazmat/pyjuju/ensemble-resolved updated
256. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

257. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

258. By Kapil Thangavelu

Merged resolved-state-api into ensemble-resolved.

259. By Kapil Thangavelu

update code comments per review comment

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'ensemble/control/__init__.py'
--- ensemble/control/__init__.py 2011-05-03 08:54:13 +0000
+++ ensemble/control/__init__.py 2011-05-05 16:55:55 +0000
@@ -17,6 +17,7 @@
17import open_tunnel17import open_tunnel
18import remove_relation18import remove_relation
19import remove_unit19import remove_unit
20import resolved
20import status21import status
21import shutdown22import shutdown
22import ssh23import ssh
@@ -39,6 +40,7 @@
39 open_tunnel,40 open_tunnel,
40 remove_relation,41 remove_relation,
41 remove_unit,42 remove_unit,
43 resolved,
42 status,44 status,
43 ssh,45 ssh,
44 shutdown,46 shutdown,
4547
=== added file 'ensemble/control/resolved.py'
--- ensemble/control/resolved.py 1970-01-01 00:00:00 +0000
+++ ensemble/control/resolved.py 2011-05-05 16:55:55 +0000
@@ -0,0 +1,110 @@
1"""Implementation of resolved subcommand"""
2
3
4from twisted.internet.defer import inlineCallbacks, returnValue
5
6from ensemble.control.utils import get_environment
7
8from ensemble.state.service import ServiceStateManager, RETRY_HOOKS, NO_HOOKS
9from ensemble.state.relation import RelationStateManager
10from ensemble.state.errors import RelationStateNotFound
11from ensemble.unit.workflow import is_unit_running, is_relation_running
12
13
14def configure_subparser(subparsers):
15 """Configure resolved subcommand"""
16 sub_parser = subparsers.add_parser(
17 "resolved", help=command.__doc__, description=resolved.__doc__)
18 sub_parser.add_argument(
19 "--retry", "-r", action="store_true",
20 help="Retry failed hook."),
21 sub_parser.add_argument(
22 "--environment", "-e",
23 help="Ensemble environment to operate in.")
24 sub_parser.add_argument(
25 "service_unit_name",
26 help="Name of the service unit that should be resolved")
27 sub_parser.add_argument(
28 "relation_name", nargs="?", default=None,
29 help="Name of the unit relation that should be resolved")
30 return sub_parser
31
32
33def command(options):
34 """Mark an error as resolved in a unit or unit relation."""
35 environment = get_environment(options)
36 return resolved(
37 options.environments,
38 environment,
39 options.verbose,
40 options.log,
41 options.service_unit_name,
42 options.relation_name,
43 options.retry)
44
45
46@inlineCallbacks
47def resolved(
48 config, environment, verbose, log, unit_name, relation_name, retry):
49 """Mark an error as resolved in a unit or unit relation.
50
51 If one of a unit's formula non-relation hooks returns a non-zero exit
52 status, the entire unit can be considered to be in a non-running state.
53
54 As a resolution, the the unit can be manually returned a running state
55 via the ensemble resolved command. Optionally this command can also
56 rerun the failed hook.
57
58 This resolution also applies separately to each of the unit's relations.
59 If one of the relation-hooks failed. In that case there is no
60 notion of retrying (the change is gone), but resolving will allow
61 additional relation hooks for that relation to proceed.
62 """
63 provider = environment.get_machine_provider()
64 client = yield provider.connect()
65 service_manager = ServiceStateManager(client)
66 relation_manager = RelationStateManager(client)
67
68 unit_state = yield service_manager.get_unit_state(unit_name)
69 service_state = yield service_manager.get_service_state(
70 unit_name.split("/")[0])
71
72 retry = retry and RETRY_HOOKS or NO_HOOKS
73
74 if not relation_name:
75 running, workflow_state = yield is_unit_running(client, unit_state)
76 if running:
77 log.info("Unit %r already running: %s", unit_name, workflow_state)
78 client.close()
79 returnValue(False)
80
81 yield unit_state.set_resolved(retry)
82 log.info("Marked unit %r as resolved", unit_name)
83 returnValue(True)
84
85 # Check for the matching relations
86 service_relations = yield relation_manager.get_relations_for_service(
87 service_state)
88 service_relations = [
89 sr for sr in service_relations if sr.relation_name == relation_name]
90 if not service_relations:
91 raise RelationStateNotFound()
92
93 # Verify the relations are in need of resolution.
94 resolved_relations = {}
95 for service_relation in service_relations:
96 unit_relation = yield service_relation.get_unit_state(unit_state)
97 running, state = yield is_relation_running(client, unit_relation)
98 if not running:
99 resolved_relations[unit_relation.internal_relation_id] = retry
100
101 if not resolved_relations:
102 log.warning("Matched relations are all running")
103 client.close()
104 returnValue(False)
105
106 # Mark the relations as resolved.
107 yield unit_state.set_relation_resolved(resolved_relations)
108 log.info(
109 "Marked unit %r relation %r as resolved", unit_name, relation_name)
110 client.close()
0111
=== added file 'ensemble/control/tests/test_resolved.py'
--- ensemble/control/tests/test_resolved.py 1970-01-01 00:00:00 +0000
+++ ensemble/control/tests/test_resolved.py 2011-05-05 16:55:55 +0000
@@ -0,0 +1,390 @@
1from twisted.internet.defer import inlineCallbacks, returnValue
2from yaml import dump
3
4from ensemble.control import main
5from ensemble.control.resolved import resolved
6from ensemble.control.tests.common import ControlToolTest
7from ensemble.formula.tests.test_repository import RepositoryTestBase
8
9from ensemble.state.service import RETRY_HOOKS, NO_HOOKS
10from ensemble.state.tests.test_service import ServiceStateManagerTestBase
11from ensemble.state.errors import ServiceStateNotFound
12
13from ensemble.unit.workflow import UnitWorkflowState, RelationWorkflowState
14from ensemble.unit.lifecycle import UnitRelationLifecycle
15from ensemble.hooks.executor import HookExecutor
16
17
18class ControlResolvedTest(
19 ServiceStateManagerTestBase, ControlToolTest, RepositoryTestBase):
20
21 @inlineCallbacks
22 def setUp(self):
23 yield super(ControlResolvedTest, self).setUp()
24 config = {
25 "ensemble": "environments",
26 "environments": {"firstenv": {"type": "dummy"}}}
27
28 self.write_config(dump(config))
29 self.config.load()
30
31 yield self.add_relation_state("wordpress", "mysql")
32 yield self.add_relation_state("wordpress", "varnish")
33
34 self.service1 = yield self.service_state_manager.get_service_state(
35 "mysql")
36 self.service_unit1 = yield self.service1.add_unit_state()
37 self.service_unit2 = yield self.service1.add_unit_state()
38
39 self.unit1_workflow = UnitWorkflowState(
40 self.client, self.service_unit1, None, self.makeDir())
41 yield self.unit1_workflow.set_state("started")
42
43 self.environment = self.config.get_default()
44 self.provider = self.environment.get_machine_provider()
45
46 self.output = self.capture_logging()
47 self.stderr = self.capture_stream("stderr")
48 self.executor = HookExecutor()
49
50 @inlineCallbacks
51 def add_relation_state(self, *service_names):
52 for service_name in service_names:
53 try:
54 yield self.service_state_manager.get_service_state(
55 service_name)
56 except ServiceStateNotFound:
57 yield self.add_service_from_formula(service_name)
58
59 endpoint_pairs = yield self.service_state_manager.join_descriptors(
60 *service_names)
61 endpoints = endpoint_pairs[0]
62 endpoints = endpoint_pairs[0]
63 if endpoints[0] == endpoints[1]:
64 endpoints = endpoints[0:1]
65 relation_state = (yield self.relation_state_manager.add_relation_state(
66 *endpoints))[0]
67 returnValue(relation_state)
68
69 @inlineCallbacks
70 def get_named_service_relation(self, service_state, relation_name):
71 if isinstance(service_state, str):
72 service_state = yield self.service_state_manager.get_service_state(
73 service_state)
74
75 rels = yield self.relation_state_manager.get_relations_for_service(
76 service_state)
77
78 rels = [sr for sr in rels if sr.relation_name == relation_name]
79 if len(rels) == 1:
80 returnValue(rels[0])
81 returnValue(rels)
82
83 @inlineCallbacks
84 def setup_unit_relations(self, service_relation, *units):
85 """
86 Given a service relation and set of unit tuples in the form
87 unit_state, unit_relation_workflow_state, will add unit relations
88 for these units and update their workflow state to the desired/given
89 state.
90 """
91 for unit, state in units:
92 unit_relation = yield service_relation.add_unit_state(unit)
93 lifecycle = UnitRelationLifecycle(
94 self.client, unit_relation, service_relation.relation_name,
95 self.makeDir(), self.executor)
96 workflow_state = RelationWorkflowState(
97 self.client, unit_relation, lifecycle, self.makeDir())
98 yield workflow_state.set_state(state)
99
100 @inlineCallbacks
101 def test_resolved(self):
102 """
103 'ensemble resolved <unit_name>' will schedule a unit for
104 retrying from an error state.
105 """
106 # Push the unit into an error state
107 yield self.unit1_workflow.set_state("start_error")
108 self.setup_exit(0)
109 finished = self.setup_cli_reactor()
110 self.mocker.replay()
111
112 self.assertEqual(
113 (yield self.service_unit1.get_resolved()), None)
114
115 main(["resolved", "mysql/0"])
116 yield finished
117
118 self.assertEqual(
119 (yield self.service_unit1.get_resolved()), {"retry": NO_HOOKS})
120 self.assertIn(
121 "Marked unit 'mysql/0' as resolved",
122 self.output.getvalue())
123
124 @inlineCallbacks
125 def test_resolved_retry(self):
126 """
127 'ensemble resolved --retry <unit_name>' will schedule a unit
128 for retrying from an error state with a retry of hooks
129 executions.
130 """
131 yield self.unit1_workflow.set_state("start_error")
132 self.setup_exit(0)
133 finished = self.setup_cli_reactor()
134 self.mocker.replay()
135
136 self.assertEqual(
137 (yield self.service_unit1.get_resolved()), None)
138
139 main(["resolved", "--retry", "mysql/0"])
140 yield finished
141
142 self.assertEqual(
143 (yield self.service_unit1.get_resolved()), {"retry": RETRY_HOOKS})
144 self.assertIn(
145 "Marked unit 'mysql/0' as resolved",
146 self.output.getvalue())
147
148 @inlineCallbacks
149 def test_relation_resolved(self):
150 """
151 'ensemble relation <unit_name> <rel_name>' will schedule
152 the broken unit relations for being resolved.
153 """
154 service_relation = yield self.get_named_service_relation(
155 self.service1, "server")
156
157 self.setup_unit_relations(
158 service_relation,
159 (self.service_unit1, "down"),
160 (self.service_unit2, "up"))
161
162 yield self.unit1_workflow.set_state("start_error")
163 self.setup_exit(0)
164 finished = self.setup_cli_reactor()
165 self.mocker.replay()
166
167 self.assertEqual(
168 (yield self.service_unit1.get_relation_resolved()), None)
169
170 main(["resolved", "--retry", "mysql/0",
171 service_relation.relation_name])
172 yield finished
173
174 self.assertEqual(
175 (yield self.service_unit1.get_relation_resolved()),
176 {service_relation.internal_relation_id: RETRY_HOOKS})
177 self.assertEqual(
178 (yield self.service_unit2.get_relation_resolved()),
179 None)
180 self.assertIn(
181 "Marked unit 'mysql/0' relation 'server' as resolved",
182 self.output.getvalue())
183
184 @inlineCallbacks
185 def test_resolved_relation_some_already_resolved(self):
186 """
187 'ensemble resolved <service_name> <rel_name>' will mark
188 resolved all down units that are not already marked resolved.
189 """
190
191 service2 = yield self.service_state_manager.get_service_state(
192 "wordpress")
193 service_unit1 = yield service2.add_unit_state()
194
195 service_relation = yield self.get_named_service_relation(
196 service2, "db")
197 yield self.setup_unit_relations(
198 service_relation, (service_unit1, "down"))
199
200 service_relation2 = yield self.get_named_service_relation(
201 service2, "cache")
202 yield self.setup_unit_relations(
203 service_relation2, (service_unit1, "down"))
204
205 yield service_unit1.set_relation_resolved(
206 {service_relation.internal_relation_id: NO_HOOKS})
207
208 self.setup_exit(0)
209 finished = self.setup_cli_reactor()
210 self.mocker.replay()
211
212 main(["resolved", "--retry", "wordpress/0", "cache"])
213 yield finished
214
215 self.assertEqual(
216 (yield service_unit1.get_relation_resolved()),
217 {service_relation.internal_relation_id: NO_HOOKS,
218 service_relation2.internal_relation_id: RETRY_HOOKS})
219
220 self.assertIn(
221 "Marked unit 'wordpress/0' relation 'cache' as resolved",
222 self.output.getvalue())
223
224 @inlineCallbacks
225 def test_resolved_relation_some_already_resolved_conflict(self):
226 """
227 'ensemble resolved <service_name> <rel_name>' will mark
228 resolved all down units that are not already marked resolved.
229 """
230
231 service2 = yield self.service_state_manager.get_service_state(
232 "wordpress")
233 service_unit1 = yield service2.add_unit_state()
234
235 service_relation = yield self.get_named_service_relation(
236 service2, "db")
237 yield self.setup_unit_relations(
238 service_relation, (service_unit1, "down"))
239
240 yield service_unit1.set_relation_resolved(
241 {service_relation.internal_relation_id: NO_HOOKS})
242
243 self.setup_exit(0)
244 finished = self.setup_cli_reactor()
245 self.mocker.replay()
246
247 main(["resolved", "--retry", "wordpress/0", "db"])
248 yield finished
249
250 self.assertEqual(
251 (yield service_unit1.get_relation_resolved()),
252 {service_relation.internal_relation_id: NO_HOOKS})
253
254 self.assertIn(
255 "Service unit 'wordpress/0' already has relations marked as resol",
256 self.output.getvalue())
257
258 @inlineCallbacks
259 def test_resolved_unknown_service(self):
260 """
261 'ensemble resolved <unit_name>' will report if a service is
262 invalid.
263 """
264 self.setup_exit(0)
265 finished = self.setup_cli_reactor()
266 self.mocker.replay()
267 main(["resolved", "zebra/0"])
268 yield finished
269 self.assertIn("Service 'zebra' was not found", self.stderr.getvalue())
270
271 @inlineCallbacks
272 def test_resolved_unknown_unit(self):
273 """
274 'ensemble resolved <unit_name>' will report if a unit is
275 invalid.
276 """
277 self.setup_exit(0)
278 finished = self.setup_cli_reactor()
279 self.mocker.replay()
280 main(["resolved", "mysql/5"])
281 yield finished
282 self.assertIn(
283 "Service unit 'mysql/5' was not found", self.output.getvalue())
284
285 @inlineCallbacks
286 def test_resolved_unknown_unit_relation(self):
287 """
288 'ensemble resolved <unit_name>' will report if a relation is
289 invalid.
290 """
291 self.setup_exit(0)
292 finished = self.setup_cli_reactor()
293 self.mocker.replay()
294
295 self.assertEqual(
296 (yield self.service_unit1.get_resolved()), None)
297
298 main(["resolved", "mysql/0", "magic"])
299 yield finished
300
301 self.assertIn("Relation not found", self.output.getvalue())
302
303 @inlineCallbacks
304 def test_resolved_already_running(self):
305 """
306 'ensemble resolved <unit_name>' will report if
307 the unit is already running.
308 """
309 # Just verify we don't accidentally mark up another unit of the service
310 unit2_workflow = UnitWorkflowState(
311 self.client, self.service_unit2, None, self.makeDir())
312 unit2_workflow.set_state("start_error")
313
314 self.setup_exit(0)
315 finished = self.setup_cli_reactor()
316 self.mocker.replay()
317
318 main(["resolved", "mysql/0"])
319 yield finished
320
321 self.assertEqual(
322 (yield self.service_unit2.get_resolved()), None)
323 self.assertEqual(
324 (yield self.service_unit1.get_resolved()), None)
325
326 self.assertNotIn(
327 "Unit 'mysql/0 already running: started",
328 self.output.getvalue())
329
330 @inlineCallbacks
331 def test_resolved_already_resolved(self):
332 """
333 'ensemble resolved <unit_name>' will report if
334 the unit is already resolved.
335 """
336 # Mark the unit as resolved and as in an error state.
337 yield self.service_unit1.set_resolved(RETRY_HOOKS)
338 yield self.unit1_workflow.set_state("start_error")
339
340 unit2_workflow = UnitWorkflowState(
341 self.client, self.service_unit1, None, self.makeDir())
342 unit2_workflow.set_state("start_error")
343
344 self.assertEqual(
345 (yield self.service_unit2.get_resolved()), None)
346
347 self.setup_exit(0)
348 finished = self.setup_cli_reactor()
349 self.mocker.replay()
350
351 main(["resolved", "mysql/0"])
352 yield finished
353
354 self.assertEqual(
355 (yield self.service_unit1.get_resolved()),
356 {"retry": RETRY_HOOKS})
357 self.assertNotIn(
358 "Marked unit 'mysql/0' as resolved",
359 self.output.getvalue())
360 self.assertIn(
361 "Service unit 'mysql/0' is already marked as resolved.",
362 self.stderr.getvalue(), "")
363
364 @inlineCallbacks
365 def test_resolved_relation_already_running(self):
366 """
367 'ensemble resolved <unit_name> <rel_name>' will report
368 if the relation is already running.
369 """
370 service2 = yield self.service_state_manager.get_service_state(
371 "wordpress")
372 service_unit1 = yield service2.add_unit_state()
373
374 service_relation = yield self.get_named_service_relation(
375 service2, "db")
376 yield self.setup_unit_relations(
377 service_relation, (service_unit1, "up"))
378
379 self.setup_exit(0)
380 finished = self.setup_cli_reactor()
381 self.mocker.replay()
382
383 main(["resolved", "wordpress/0", "db"])
384 yield finished
385
386 self.assertIn("Matched relations are all running",
387 self.output.getvalue())
388 self.assertEqual(
389 (yield service_unit1.get_relation_resolved()),
390 None)
0391
=== modified file 'ensemble/unit/lifecycle.py'
--- ensemble/unit/lifecycle.py 2011-05-05 14:31:41 +0000
+++ ensemble/unit/lifecycle.py 2011-05-05 16:55:55 +0000
@@ -9,7 +9,7 @@
9from ensemble.state.hook import RelationChange9from ensemble.state.hook import RelationChange
10from ensemble.state.errors import StopWatcher, UnitRelationStateNotFound10from ensemble.state.errors import StopWatcher, UnitRelationStateNotFound
1111
12from ensemble.unit.workflow import RelationWorkflowState, RelationWorkflow12from ensemble.unit.workflow import RelationWorkflowState
1313
1414
15HOOK_SOCKET_FILE = ".ensemble.hookcli.sock"15HOOK_SOCKET_FILE = ".ensemble.hookcli.sock"

Subscribers

People subscribed via source and target branches

to status/vote changes: