Merge lp:~hazmat/pyjuju/unit-agent-resolved into lp:pyjuju
| Status: | Merged |
| --- | --- |
| Approved by: | Gustavo Niemeyer |
| Approved revision: | 275 |
| Merged at revision: | 224 |
| Proposed branch: | lp:~hazmat/pyjuju/unit-agent-resolved |
| Merge into: | lp:pyjuju |
| Prerequisite: | lp:~hazmat/pyjuju/ensemble-resolved |
| Diff against target: | 1276 lines (+718/-94), 7 files modified |
| To merge this branch: | bzr merge lp:~hazmat/pyjuju/unit-agent-resolved |
| Related bugs: | |

Files modified: ensemble/control/tests/test_resolved.py (+0/-1), ensemble/state/service.py (+18/-5), ensemble/state/tests/test_service.py (+71/-1), ensemble/unit/lifecycle.py (+85/-21), ensemble/unit/tests/test_lifecycle.py (+242/-3), ensemble/unit/tests/test_workflow.py (+154/-12), ensemble/unit/workflow.py (+148/-51)

| Reviewer | Review Type | Date Requested | Status |
| --- | --- | --- | --- |
| Gustavo Niemeyer | | | Approve |

Review via email: mp+59713@code.launchpad.net
This proposal supersedes a proposal from 2011-05-02.
Commit message
Description of the change
This implements unit lifecycle resolution of unit relations, plus additional changes needed to enable resolution (transition actions receiving hook execution flags).
The branch is large enough that I'm going to split the remaining unit agent resolution of unit relations into another branch.
Gustavo Niemeyer (niemeyer) wrote:
[1]
+ def do_retry_
+ return self._invoke_
+ self._lifecycle
We've already debated this live over a voice call, and had already
covered the topic in a previous conversation: I think it is a mistake
to introduce variables which define how the transition should work.
This is increasing the complexity of actions in an unpredictable way
(a single action with parameters could handle *all* the possible
transitions in the state machine).
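For illustration only, the two designs being debated can be sketched as follows. All class and method names here are hypothetical stand-ins, not identifiers from this branch: one workflow exposes a single action whose behavior branches on a flag, the other exposes one parameterless action per transition.

```python
class _Recorder:
    """Shared stub that records what each action does."""

    def __init__(self):
        self.log = []

    def execute_hooks(self):
        self.log.append("hooks")

    def restart(self):
        self.log.append("restart")


class FlaggedWorkflow(_Recorder):
    """The criticized shape: one action whose behavior depends on a flag."""

    def do_retry(self, fire_hooks=False):
        # Every new flag multiplies the paths through this action, and
        # callers must know which combination of flags they mean.
        if fire_hooks:
            self.execute_hooks()
        self.restart()


class TransitionWorkflow(_Recorder):
    """The adopted shape: a dedicated, parameterless action per transition."""

    def do_retry(self):
        self.restart()

    def do_retry_hook(self):
        self.execute_hooks()
        self.restart()
```

The second shape keeps each state-machine transition's behavior fixed and inspectable, which is the direction the branch moved in below.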
Kapil Thangavelu (hazmat) wrote:
Changed over to using additional transitions; it's a bit nicer now, thanks.
Gustavo Niemeyer (niemeyer) wrote:
Nice, thanks for the changes. Looking good!
+1, taking these into consideration:
[2]
+ def do_retry_
The fire_hooks seems to be a leftover.
[3]
+ return self._invoke_
+ self._lifecycle
+ lambda x: self._invoke_
+ self._lifecycle
(...)
+ return self._invoke_
+ self._lifecycle
+ lambda x: self._invoke_
+ self._lifecycle
Breaking these down would make them more readable than the huge one-liners.
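As a hypothetical illustration of the formatting request (none of these names are the branch's actual identifiers), the refactor asked for is to name the intermediate callbacks instead of nesting them inside one expression:

```python
class FakeLifecycle:
    """Stand-in for a relation lifecycle with two steps."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


def invoke(action, callback=None):
    """Run an action, then hand its result to an optional callback."""
    result = action()
    if callback is not None:
        callback(result)


def transition_one_liner(lifecycle):
    # Hard to scan: the follow-up step is buried inside the expression.
    return invoke(lifecycle.start, lambda result: invoke(lifecycle.stop))


def transition_broken_down(lifecycle):
    # Same behavior, but each step has a name a reader can follow.
    def on_started(result):
        return invoke(lifecycle.stop)

    return invoke(lifecycle.start, on_started)
```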
[4]
+# Another interesting issue, process recovery using the on disk state,
+# is complicated by consistency to the in memory state, which
+# won't be directly recoverable anymore without some state specific
Sweet. Thanks for capturing those ideas.
[5]
+ # If the unit lifecycle isn't running we shouldn't process
+ # any relation resolutions.
+ if not self._running:
+ self._log.
+ self._watching_
+ raise StopWatcher()
It's not clear what the intention is here, and test coverage also
looks a bit poor in this area (e.g. replacing everything under the "if" with
"return" still passes the tests fine).
What do we want to happen in this case, and what's the rationale
behind it?
[6]
+ if self._client.
+ yield self._process_
What's the "if connected" test about? It feels like pretty much every
interaction with zk could have such a test. Why is this location
special?
Kapil Thangavelu (hazmat) wrote:
Excerpts from Gustavo Niemeyer's message of Thu May 05 02:01:15 UTC 2011:
> Review: Approve
> Nice, thanks for the changes. Looking good!
>
> +1, taking these in consideration:
>
> [5]
>
> + # If the unit lifecycle isn't running we shouldn't process
> + # any relation resolutions.
> + if not self._running:
> + self._log.
> + self._watching_
> + raise StopWatcher()
>
> It's not clear what's the intention here, and test coverage also
> looks a bit poor in this area (e.g. replacing everything under the "if" by
> "return" still passes tests fine).
>
> What do we want to happen in this case, and what's the rationale
> behind it?
>
The notion is that if we're not running, the watcher terminates; when we start
watching again, the watcher is restarted if it doesn't already exist. It's
designed to avoid concurrent watchers and to allow for watch termination.
The design rationale is that if a unit is not running, it needs to be resolved
before its unit relations can be fixed.
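The stop/restart protocol described here can be condensed into a synchronous sketch (no Twisted or ZooKeeper; the class and method names below are simplified stand-ins, not the branch's code):

```python
class StopWatcher(Exception):
    """Raised by a watch callback to terminate its watch."""


class Lifecycle:
    def __init__(self):
        self._running = False
        self._watching = False

    def start(self):
        self._running = True
        if not self._watching:
            # Re-establish the watch only if none is active, so repeated
            # stop/start cycles never leave two concurrent watchers behind.
            self._watching = True

    def stop(self):
        self._running = False

    def on_resolved_change(self, event):
        if not self._running:
            # Record that the watch is gone so start() knows to
            # re-establish it, then unwind out of the watch machinery.
            self._watching = False
            raise StopWatcher()
        return "processed"
```

The key invariant is that `_watching` is cleared exactly when the watcher raises `StopWatcher`, so a later `start()` knows it must re-establish the watch.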
Being able to correctly test this runs into another behavior on the unit
lifecycle, where we automatically repair unit relations when the unit is
started. I've introduced a separate error state for relations in this branch,
so we can choose not to have unit relation errors automatically recovered, in
which case the logic in question can be tested easily, by bringing the unit
back online and verifying that the resolved action was performed.
>
> [6]
>
> + if self._client.
> + yield self._process_
>
> What's the "if connected" test about? It feels like pretty much every
> interaction with zk could have such a test. Why is this location
> special?
>
It's typically done in watch callbacks, so that async background execution
while the test is closing does not trigger a zookeeper closing exception; it
minimizes errors.
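A minimal sketch of that guard, with a fake client standing in for the ZooKeeper connection (the names here are illustrative, not the branch's code):

```python
class FakeClient:
    """Stand-in for a ZooKeeper client with a `connected` flag."""

    def __init__(self):
        self.connected = True
        self.reads = 0

    def get(self, path):
        # A real client would raise a closing/connection exception here
        # once the connection is torn down.
        if not self.connected:
            raise RuntimeError("connection closed")
        self.reads += 1
        return "resolved"


def on_change(client, path):
    """Watch callback: skip quietly if the client is already gone."""
    if not client.connected:
        # Client torn down (e.g. test shutdown): become a no-op rather
        # than erroring inside a background callback nobody waits on.
        return None
    return client.get(path)
```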
- 276. By Kapil Thangavelu: Merged ensemble-resolved into unit-agent-resolved.
- 277. By Kapil Thangavelu: Merged ensemble-resolved into unit-agent-resolved.
- 278. By Kapil Thangavelu: Merged ensemble-resolved into unit-agent-resolved.
- 279. By Kapil Thangavelu: address review comments re formatting
Kapil Thangavelu (hazmat) wrote:
Excerpts from Kapil Thangavelu's message of Thu May 05 13:26:02 -0400 2011:
> [...]
> Being able to correctly test this runs into another behavior on the unit
> lifecycle, where we automatically repair unit relations when the unit is
> started. I've introduced a separate error state for relations in this
> branch, so we can choose not to have unit relation errors automatically
> recovered, in which case the logic in question can be tested easily, by
> bringing the unit back online and verifying the resolved action was
> performed.
Just to be clear, this is a test for the overall functionality in
test_resolved_
But because of the above constraint regarding automatic recovery on start, it
can only verify that the watcher has stopped, not that the recovery proceeds
normally after restart; so it instead verifies that the resolved setting has
persisted past the end of the watcher.
Preview Diff
1 | === modified file 'ensemble/control/tests/test_resolved.py' |
2 | --- ensemble/control/tests/test_resolved.py 2011-04-28 17:09:36 +0000 |
3 | +++ ensemble/control/tests/test_resolved.py 2011-05-05 17:43:23 +0000 |
4 | @@ -2,7 +2,6 @@ |
5 | from yaml import dump |
6 | |
7 | from ensemble.control import main |
8 | -from ensemble.control.resolved import resolved |
9 | from ensemble.control.tests.common import ControlToolTest |
10 | from ensemble.formula.tests.test_repository import RepositoryTestBase |
11 | |
12 | |
13 | === modified file 'ensemble/state/service.py' |
14 | --- ensemble/state/service.py 2011-05-05 16:18:34 +0000 |
15 | +++ ensemble/state/service.py 2011-05-05 17:43:23 +0000 |
16 | @@ -16,11 +16,10 @@ |
17 | ServiceUnitStateMachineAlreadyAssigned, ServiceStateNameInUse, |
18 | BadDescriptor, BadServiceStateName, NoUnusedMachines, |
19 | ServiceUnitDebugAlreadyEnabled, ServiceUnitResolvedAlreadyEnabled, |
20 | - ServiceUnitRelationResolvedAlreadyEnabled) |
21 | + ServiceUnitRelationResolvedAlreadyEnabled, StopWatcher) |
22 | from ensemble.state.formula import FormulaStateManager |
23 | from ensemble.state.relation import ServiceRelationState, RelationStateManager |
24 | from ensemble.state.machine import _public_machine_id, MachineState |
25 | - |
26 | from ensemble.state.utils import remove_tree, dict_merge, YAMLState |
27 | |
28 | RETRY_HOOKS = 1000 |
29 | @@ -516,7 +515,7 @@ |
30 | its `write` method invoked to publish the state to Zookeeper. |
31 | """ |
32 | config_node = YAMLState(self._client, |
33 | - "/services/%s/config" % self._internal_id ) |
34 | + "/services/%s/config" % self._internal_id) |
35 | yield config_node.read() |
36 | returnValue(config_node) |
37 | |
38 | @@ -944,9 +943,13 @@ |
39 | def watcher(change_event): |
40 | if not self._client.connected: |
41 | returnValue(None) |
42 | + |
43 | exists_d, watch_d = self._client.exists_and_watch( |
44 | self._unit_resolve_path) |
45 | - yield callback(change_event) |
46 | + try: |
47 | + yield callback(change_event) |
48 | + except StopWatcher: |
49 | + returnValue(None) |
50 | watch_d.addCallback(watcher) |
51 | |
52 | exists_d, watch_d = self._client.exists_and_watch( |
53 | @@ -958,6 +961,8 @@ |
54 | callback_d = maybeDeferred(callback, bool(exists)) |
55 | callback_d.addCallback( |
56 | lambda x: watch_d.addCallback(watcher) and x) |
57 | + callback_d.addErrback( |
58 | + lambda failure: failure.trap(StopWatcher)) |
59 | |
60 | @property |
61 | def _relation_resolved_path(self): |
62 | @@ -1033,6 +1038,8 @@ |
63 | |
64 | @inlineCallbacks |
65 | def clear_relation_resolved(self): |
66 | + """ Clear the relation resolved setting. |
67 | + """ |
68 | try: |
69 | yield self._client.delete(self._relation_resolved_path) |
70 | except zookeeper.NoNodeException: |
71 | @@ -1055,7 +1062,11 @@ |
72 | returnValue(None) |
73 | exists_d, watch_d = self._client.exists_and_watch( |
74 | self._relation_resolved_path) |
75 | - yield callback(change_event) |
76 | + try: |
77 | + yield callback(change_event) |
78 | + except StopWatcher: |
79 | + returnValue(None) |
80 | + |
81 | watch_d.addCallback(watcher) |
82 | |
83 | exists_d, watch_d = self._client.exists_and_watch( |
84 | @@ -1068,6 +1079,8 @@ |
85 | callback_d = maybeDeferred(callback, bool(exists)) |
86 | callback_d.addCallback( |
87 | lambda x: watch_d.addCallback(watcher) and x) |
88 | + callback_d.addErrback( |
89 | + lambda failure: failure.trap(StopWatcher)) |
90 | |
91 | |
92 | def _parse_unit_name(unit_name): |
93 | |
94 | === modified file 'ensemble/state/tests/test_service.py' |
95 | --- ensemble/state/tests/test_service.py 2011-05-05 14:31:41 +0000 |
96 | +++ ensemble/state/tests/test_service.py 2011-05-05 17:43:23 +0000 |
97 | @@ -17,7 +17,7 @@ |
98 | ServiceUnitStateMachineAlreadyAssigned, ServiceStateNameInUse, |
99 | BadDescriptor, BadServiceStateName, ServiceUnitDebugAlreadyEnabled, |
100 | MachineStateNotFound, NoUnusedMachines, ServiceUnitResolvedAlreadyEnabled, |
101 | - ServiceUnitRelationResolvedAlreadyEnabled) |
102 | + ServiceUnitRelationResolvedAlreadyEnabled, StopWatcher) |
103 | |
104 | |
105 | from ensemble.state.tests.common import StateTestBase |
106 | @@ -790,6 +790,42 @@ |
107 | {"retry": NO_HOOKS}) |
108 | |
109 | @inlineCallbacks |
110 | + def test_stop_watch_resolved(self): |
111 | + """A unit resolved watch can be instituted on a permanent basis. |
112 | + |
113 | + However the callback can raise StopWatcher at anytime to stop the watch |
114 | + """ |
115 | + unit_state = yield self.get_unit_state() |
116 | + |
117 | + results = [] |
118 | + |
119 | + def callback(value): |
120 | + results.append(value) |
121 | + if len(results) == 1: |
122 | + raise StopWatcher() |
123 | + if len(results) == 3: |
124 | + raise StopWatcher() |
125 | + |
126 | + unit_state.watch_resolved(callback) |
127 | + yield unit_state.set_resolved(RETRY_HOOKS) |
128 | + yield unit_state.clear_resolved() |
129 | + yield self.poke_zk() |
130 | + |
131 | + unit_state.watch_resolved(callback) |
132 | + yield unit_state.set_resolved(NO_HOOKS) |
133 | + yield unit_state.clear_resolved() |
134 | + |
135 | + yield self.poke_zk() |
136 | + |
137 | + self.assertEqual(len(results), 3) |
138 | + self.assertIdentical(results.pop(0), False) |
139 | + self.assertIdentical(results.pop(0), False) |
140 | + self.assertEqual(results.pop(0).type_name, "created") |
141 | + |
142 | + self.assertEqual( |
143 | + (yield unit_state.get_resolved()), None) |
144 | + |
145 | + @inlineCallbacks |
146 | def test_get_set_clear_relation_resolved(self): |
147 | """The a unit's realtions can be set to resolved to mark a |
148 | future transition, with an optional retry flag.""" |
149 | @@ -850,6 +886,40 @@ |
150 | {"0": NO_HOOKS}) |
151 | |
152 | @inlineCallbacks |
153 | + def test_stop_watch_relation_resolved(self): |
154 | + """A unit resolved watch can be instituted on a permanent basis.""" |
155 | + unit_state = yield self.get_unit_state() |
156 | + |
157 | + results = [] |
158 | + |
159 | + def callback(value): |
160 | + results.append(value) |
161 | + |
162 | + if len(results) == 1: |
163 | + raise StopWatcher() |
164 | + |
165 | + if len(results) == 3: |
166 | + raise StopWatcher() |
167 | + |
168 | + unit_state.watch_relation_resolved(callback) |
169 | + yield unit_state.set_relation_resolved({"0": RETRY_HOOKS}) |
170 | + yield unit_state.clear_relation_resolved() |
171 | + yield self.poke_zk() |
172 | + self.assertEqual(len(results), 1) |
173 | + |
174 | + unit_state.watch_relation_resolved(callback) |
175 | + yield unit_state.set_relation_resolved({"0": RETRY_HOOKS}) |
176 | + yield unit_state.clear_relation_resolved() |
177 | + yield self.poke_zk() |
178 | + self.assertEqual(len(results), 3) |
179 | + self.assertIdentical(results.pop(0), False) |
180 | + self.assertIdentical(results.pop(0), False) |
181 | + self.assertEqual(results.pop(0).type_name, "created") |
182 | + |
183 | + self.assertEqual( |
184 | + (yield unit_state.get_relation_resolved()), None) |
185 | + |
186 | + @inlineCallbacks |
187 | def test_watch_resolved_slow_callback(self): |
188 | """A slow watch callback is still invoked serially.""" |
189 | unit_state = yield self.get_unit_state() |
190 | |
191 | === modified file 'ensemble/unit/lifecycle.py' |
192 | --- ensemble/unit/lifecycle.py 2011-05-05 14:31:43 +0000 |
193 | +++ ensemble/unit/lifecycle.py 2011-05-05 17:43:23 +0000 |
194 | @@ -2,7 +2,7 @@ |
195 | import logging |
196 | |
197 | from twisted.internet.defer import ( |
198 | - inlineCallbacks, DeferredLock) |
199 | + inlineCallbacks, DeferredLock, returnValue) |
200 | |
201 | from ensemble.hooks.invoker import Invoker |
202 | from ensemble.hooks.scheduler import HookScheduler |
203 | @@ -32,7 +32,8 @@ |
204 | self._unit_path = unit_path |
205 | self._relations = {} |
206 | self._running = False |
207 | - self._watching = False |
208 | + self._watching_relation_memberships = False |
209 | + self._watching_relation_resolved = False |
210 | self._run_lock = DeferredLock() |
211 | self._log = logging.getLogger("unit.lifecycle") |
212 | |
213 | @@ -45,21 +46,23 @@ |
214 | return self._relations[relation_id] |
215 | |
216 | @inlineCallbacks |
217 | - def install(self): |
218 | + def install(self, fire_hooks=True): |
219 | """Invoke the unit's install hook. |
220 | """ |
221 | - yield self._execute_hook("install") |
222 | + if fire_hooks: |
223 | + yield self._execute_hook("install") |
224 | |
225 | @inlineCallbacks |
226 | - def upgrade_formula(self): |
227 | + def upgrade_formula(self, fire_hooks=True): |
228 | """Invoke the unit's upgrade-formula hook. |
229 | """ |
230 | - yield self._execute_hook("upgrade-formula", now=True) |
231 | + if fire_hooks: |
232 | + yield self._execute_hook("upgrade-formula", now=True) |
233 | # Restart hook queued hook execution. |
234 | self._executor.start() |
235 | |
236 | @inlineCallbacks |
237 | - def start(self): |
238 | + def start(self, fire_hooks=True): |
239 | """Invoke the start hook, and setup relation watching. |
240 | """ |
241 | self._log.debug("pre-start acquire, running:%s", self._running) |
242 | @@ -70,22 +73,29 @@ |
243 | assert not self._running, "Already started" |
244 | |
245 | # Execute the start hook |
246 | - yield self._execute_hook("start") |
247 | + if fire_hooks: |
248 | + yield self._execute_hook("start") |
249 | |
250 | # If we have any existing relations in memory, start them. |
251 | if self._relations: |
252 | self._log.debug("starting relation lifecycles") |
253 | |
254 | for workflow in self._relations.values(): |
255 | - # should not transition an |
256 | yield workflow.transition_state("up") |
257 | |
258 | # Establish a watch on the existing relations. |
259 | - if not self._watching: |
260 | + if not self._watching_relation_memberships: |
261 | self._log.debug("starting service relation watch") |
262 | yield self._service.watch_relation_states( |
263 | self._on_service_relation_changes) |
264 | - self._watching = True |
265 | + self._watching_relation_memberships = True |
266 | + |
267 | + # Establish a watch for resolved relations |
268 | + if not self._watching_relation_resolved: |
269 | + self._log.debug("starting unit relation resolved watch") |
270 | + yield self._unit.watch_relation_resolved( |
271 | + self._on_relation_resolved_changes) |
272 | + self._watching_relation_resolved = True |
273 | |
274 | # Set current status |
275 | self._running = True |
276 | @@ -94,7 +104,7 @@ |
277 | self._log.debug("started unit lifecycle") |
278 | |
279 | @inlineCallbacks |
280 | - def stop(self): |
281 | + def stop(self, fire_hooks=True): |
282 | """Stop the unit, executes the stop hook, and stops relation watching. |
283 | """ |
284 | self._log.debug("pre-stop acquire, running:%s", self._running) |
285 | @@ -110,7 +120,8 @@ |
286 | for workflow in self._relations.values(): |
287 | yield workflow.transition_state("down") |
288 | |
289 | - yield self._execute_hook("stop") |
290 | + if fire_hooks: |
291 | + yield self._execute_hook("stop") |
292 | |
293 | # Set current status |
294 | self._running = False |
295 | @@ -119,9 +130,11 @@ |
296 | self._log.debug("stopped unit lifecycle") |
297 | |
298 | @inlineCallbacks |
299 | - def configure(self): |
300 | + def configure(self, fire_hooks=True): |
301 | """Inform the unit that its service config has changed. |
302 | """ |
303 | + if not fire_hooks: |
304 | + returnValue(None) |
305 | yield self._run_lock.acquire() |
306 | try: |
307 | # Verify State |
308 | @@ -134,6 +147,49 @@ |
309 | self._log.debug("configured unit") |
310 | |
311 | @inlineCallbacks |
312 | + def _on_relation_resolved_changes(self, event): |
313 | + """Callback for unit relation resolved watching. |
314 | + |
315 | + The callback is invoked whenever the relation resolved |
316 | + settings change. |
317 | + """ |
318 | + self._log.debug("relation resolved changed") |
319 | + # Acquire the run lock, and process the changes. |
320 | + yield self._run_lock.acquire() |
321 | + |
322 | + try: |
323 | + # If the unit lifecycle isn't running we shouldn't process |
324 | + # any relation resolutions. |
325 | + if not self._running: |
326 | + self._log.debug("stop watch relation resolved changes") |
327 | + self._watching_relation_resolved = False |
328 | + raise StopWatcher() |
329 | + |
330 | + self._log.info("processing relation resolved changed") |
331 | + if self._client.connected: |
332 | + yield self._process_relation_resolved_changes() |
333 | + finally: |
334 | + yield self._run_lock.release() |
335 | + |
336 | + @inlineCallbacks |
337 | + def _process_relation_resolved_changes(self): |
338 | + """Invoke retry transitions on relations if their not running. |
339 | + """ |
340 | + relation_resolved = yield self._unit.get_relation_resolved() |
341 | + if relation_resolved is None: |
342 | + returnValue(None) |
343 | + else: |
344 | + yield self._unit.clear_relation_resolved() |
345 | + |
346 | + keys = set(relation_resolved).intersection(self._relations) |
347 | + for rel_id in keys: |
348 | + relation_workflow = self._relations[rel_id] |
349 | + relation_state = yield relation_workflow.get_state() |
350 | + if relation_state == "up": |
351 | + continue |
352 | + yield relation_workflow.transition_state("up") |
353 | + |
354 | + @inlineCallbacks |
355 | def _on_service_relation_changes(self, old_relations, new_relations): |
356 | """Callback for service relation watching. |
357 | |
358 | @@ -153,7 +209,7 @@ |
359 | # If the lifecycle is not running, then stop the watcher |
360 | if not self._running: |
361 | self._log.debug("stop service-rel watcher, discarding changes") |
362 | - self._watching = False |
363 | + self._watching_relation_memberships = False |
364 | raise StopWatcher() |
365 | |
366 | self._log.debug("processing relations changed") |
367 | @@ -163,9 +219,9 @@ |
368 | |
369 | @inlineCallbacks |
370 | def _process_service_changes(self, old_relations, new_relations): |
371 | - """Add and remove unit lifecycles per the service relations changes. |
372 | + """Add and remove unit lifecycles per the service relations Determine. |
373 | """ |
374 | - # Determine relation delta of global zk state with our memory state. |
375 | + # changes relation delta of global zk state with our memory state. |
376 | new_relations = dict([(service_relation.internal_relation_id, |
377 | service_relation) for |
378 | service_relation in new_relations]) |
379 | @@ -352,8 +408,11 @@ |
380 | self._error_handler = handler |
381 | |
382 | @inlineCallbacks |
383 | - def start(self): |
384 | + def start(self, watches=True): |
385 | """Start watching related units and executing change hooks. |
386 | + |
387 | + @param watches: boolean parameter denoting if relation watches |
388 | + should be started. |
389 | """ |
390 | yield self._run_lock.acquire() |
391 | try: |
392 | @@ -364,19 +423,24 @@ |
393 | self._watcher = yield self._unit_relation.watch_related_units( |
394 | self._scheduler.notify_change) |
395 | # And start the watcher. |
396 | - yield self._watcher.start() |
397 | + if watches: |
398 | + yield self._watcher.start() |
399 | finally: |
400 | self._run_lock.release() |
401 | self._log.debug( |
402 | "started relation:%s lifecycle", self._relation_name) |
403 | |
404 | @inlineCallbacks |
405 | - def stop(self): |
406 | + def stop(self, watches=True): |
407 | """Stop watching changes and stop executing relation change hooks. |
408 | + |
409 | + @param watches: boolean parameter denoting if relation watches |
410 | + should be stopped. |
411 | """ |
412 | yield self._run_lock.acquire() |
413 | try: |
414 | - self._watcher.stop() |
415 | + if watches and self._watcher: |
416 | + self._watcher.stop() |
417 | self._scheduler.stop() |
418 | finally: |
419 | yield self._run_lock.release() |
420 | |
421 | === modified file 'ensemble/unit/tests/test_lifecycle.py' |
422 | --- ensemble/unit/tests/test_lifecycle.py 2011-05-05 09:41:11 +0000 |
423 | +++ ensemble/unit/tests/test_lifecycle.py 2011-05-05 17:43:23 +0000 |
424 | @@ -12,6 +12,8 @@ |
425 | from ensemble.unit.lifecycle import ( |
426 | UnitLifecycle, UnitRelationLifecycle, RelationInvoker) |
427 | |
428 | +from ensemble.unit.workflow import RelationWorkflowState |
429 | + |
430 | from ensemble.hooks.invoker import Invoker |
431 | from ensemble.hooks.executor import HookExecutor |
432 | |
433 | @@ -19,9 +21,11 @@ |
434 | |
435 | from ensemble.state.endpoint import RelationEndpoint |
436 | from ensemble.state.relation import ClientServerUnitWatcher |
437 | +from ensemble.state.service import NO_HOOKS |
438 | from ensemble.state.tests.test_relation import RelationTestBase |
439 | from ensemble.state.hook import RelationChange |
440 | |
441 | + |
442 | from ensemble.lib.testing import TestCase |
443 | from ensemble.lib.mocker import MATCH |
444 | |
445 | @@ -97,6 +101,8 @@ |
446 | results.append(hook_name) |
447 | if debug: |
448 | print "-> exec hook", hook_name |
449 | + if d.called: |
450 | + return |
451 | if results == sequence: |
452 | d.callback(True) |
453 | if hook_name == name and count is None: |
454 | @@ -145,6 +151,182 @@ |
455 | return output |
456 | |
457 | |
458 | +class LifecycleResolvedTest(LifecycleTestBase): |
459 | + |
460 | + @inlineCallbacks |
461 | + def setUp(self): |
462 | + yield super(LifecycleResolvedTest, self).setUp() |
463 | + yield self.setup_default_test_relation() |
464 | + self.lifecycle = UnitLifecycle( |
465 | + self.client, self.states["unit"], self.states["service"], |
466 | + self.unit_directory, self.executor) |
467 | + |
468 | + def get_unit_relation_workflow(self, states): |
469 | + state_dir = os.path.join(self.ensemble_directory, "state") |
470 | + lifecycle = UnitRelationLifecycle( |
471 | + self.client, |
472 | + states["unit_relation"], |
473 | + states["service_relation"].relation_name, |
474 | + self.unit_directory, |
475 | + self.executor) |
476 | + |
477 | + workflow = RelationWorkflowState( |
478 | + self.client, |
479 | + states["unit_relation"], |
480 | + lifecycle, |
481 | + state_dir) |
482 | + |
483 | + return (workflow, lifecycle) |
484 | + |
485 | + @inlineCallbacks |
486 | + def test_resolved_relation_watch_unit_lifecycle_not_running(self): |
487 | + """If the unit is not running then no relation resolving is performed. |
488 | + However the resolution value remains the same. |
489 | + """ |
490 | + # Start the unit. |
491 | + yield self.lifecycle.start() |
492 | + |
493 | + # Wait for the relation to be started.... TODO: async background work |
494 | + yield self.sleep(0.1) |
495 | + |
496 | + # Simulate relation down on an individual unit relation |
497 | + workflow = self.lifecycle.get_relation_workflow( |
498 | + self.states["unit_relation"].internal_relation_id) |
499 | + self.assertEqual("up", (yield workflow.get_state())) |
500 | + |
501 | + yield workflow.transition_state("down") |
502 | + resolved = self.wait_on_state(workflow, "up") |
503 | + |
504 | + # Stop the unit lifecycle |
505 | + yield self.lifecycle.stop() |
506 | + |
507 | + # Set the relation to resolved |
508 | + yield self.states["unit"].set_relation_resolved( |
509 | + {self.states["unit_relation"].internal_relation_id: NO_HOOKS}) |
510 | + |
511 | + # Give a moment for the watch to fire erroneously |
512 | + yield self.sleep(0.2) |
513 | + |
514 | + # Ensure we didn't attempt a transition. |
515 | + self.assertFalse(resolved.called) |
516 | + self.assertEqual( |
517 | + {self.states["unit_relation"].internal_relation_id: NO_HOOKS}, |
518 | + (yield self.states["unit"].get_relation_resolved())) |
519 | + |
520 | + # If the unit is restarted start, we currently have the |
521 | + # behavior that the unit relation workflow will automatically |
522 | + # be transitioned back to running, as part of the normal state |
523 | + # transition. Sigh.. we should have a separate error |
524 | + # state for relation hooks then down with state variable usage. |
525 | + |
526 | + @inlineCallbacks |
527 | + def test_resolved_relation_watch_relation_up(self): |
528 | + """If a relation marked as to be resolved is already running, |
529 | + then no work is performed. |
530 | + """ |
531 | + # Start the unit. |
532 | + yield self.lifecycle.start() |
533 | + |
534 | + # Wait for the relation to be started.... TODO: async background work |
535 | + yield self.sleep(0.1) |
536 | + |
537 | + # get a hold of the unit relation and verify state |
538 | + workflow = self.lifecycle.get_relation_workflow( |
539 | + self.states["unit_relation"].internal_relation_id) |
540 | + self.assertEqual("up", (yield workflow.get_state())) |
541 | + |
542 | + # Set the relation to resolved |
543 | + yield self.states["unit"].set_relation_resolved( |
544 | + {self.states["unit_relation"].internal_relation_id: NO_HOOKS}) |
545 | + |
546 | + # Give a moment for the async background work. |
547 | + yield self.sleep(0.1) |
548 | + |
549 | + # Ensure we're still up and the relation resolved setting has been |
550 | + # cleared. |
551 | + self.assertEqual( |
552 | + None, (yield self.states["unit"].get_relation_resolved())) |
553 | + self.assertEqual("up", (yield workflow.get_state())) |
554 | + |
555 | + @inlineCallbacks |
556 | + def test_resolved_relation_watch_from_error(self): |
557 | + """Unit lifecycle's will process a unit relation resolved |
558 | + setting, and transition a down relation back to a running |
559 | + state. |
560 | + """ |
561 | + log_output = self.capture_logging( |
562 | + "unit.lifecycle", level=logging.DEBUG) |
563 | + |
564 | + # Start the unit. |
565 | + yield self.lifecycle.start() |
566 | + |
567 | + # Wait for the relation to be started... TODO: async background work |
568 | + yield self.sleep(0.1) |
569 | + |
570 | + # Simulate an error condition |
571 | + workflow = self.lifecycle.get_relation_workflow( |
572 | + self.states["unit_relation"].internal_relation_id) |
573 | + self.assertEqual("up", (yield workflow.get_state())) |
574 | + yield workflow.fire_transition("error") |
575 | + |
576 | + resolved = self.wait_on_state(workflow, "up") |
577 | + |
578 | + # Set the relation to resolved |
579 | + yield self.states["unit"].set_relation_resolved( |
580 | + {self.states["unit_relation"].internal_relation_id: NO_HOOKS}) |
581 | + |
582 | + # Wait for the relation to come back up |
583 | + value = yield self.states["unit"].get_relation_resolved() |
584 | + |
585 | + yield resolved |
586 | + |
587 | + # Verify state |
588 | + value = yield workflow.get_state() |
589 | + self.assertEqual(value, "up") |
590 | + |
591 | + self.assertIn( |
592 | + "processing relation resolved changed", log_output.getvalue()) |
593 | + |
594 | + @inlineCallbacks |
595 | + def test_resolved_relation_watch(self): |
596 | + """Unit lifecycle's will process a unit relation resolved |
597 | + setting, and transition a down relation back to a running |
598 | + state. |
599 | + """ |
600 | + log_output = self.capture_logging( |
601 | + "unit.lifecycle", level=logging.DEBUG) |
602 | + |
603 | + # Start the unit. |
604 | + yield self.lifecycle.start() |
605 | + |
606 | + # Wait for the relation to be started... TODO: async background work |
607 | + yield self.sleep(0.1) |
608 | + |
609 | + # Simulate an error condition |
610 | + workflow = self.lifecycle.get_relation_workflow( |
611 | + self.states["unit_relation"].internal_relation_id) |
612 | + self.assertEqual("up", (yield workflow.get_state())) |
613 | + yield workflow.transition_state("down") |
614 | + |
615 | + resolved = self.wait_on_state(workflow, "up") |
616 | + |
617 | + # Set the relation to resolved |
618 | + yield self.states["unit"].set_relation_resolved( |
619 | + {self.states["unit_relation"].internal_relation_id: NO_HOOKS}) |
620 | + |
621 | + # Wait for the relation to come back up |
622 | + value = yield self.states["unit"].get_relation_resolved() |
623 | + |
624 | + yield resolved |
625 | + |
626 | + # Verify state |
627 | + value = yield workflow.get_state() |
628 | + self.assertEqual(value, "up") |
629 | + |
630 | + self.assertIn( |
631 | + "processing relation resolved changed", log_output.getvalue()) |
632 | + |
633 | + |
634 | class UnitLifecycleTest(LifecycleTestBase): |
635 | |
636 | @inlineCallbacks |
637 | @@ -187,6 +369,45 @@ |
638 | # verify the sockets are cleaned up. |
639 | self.assertEqual(os.listdir(self.unit_directory), ["formula"]) |
640 | |
641 | + @inlineCallbacks |
642 | + def test_start_sans_hook(self): |
643 | + """The lifecycle start can be invoked without firing hooks.""" |
644 | + self.write_hook("start", "#!/bin/sh\n exit 1") |
645 | + start_executed = self.wait_on_hook("start") |
646 | + yield self.lifecycle.start(fire_hooks=False) |
647 | + # Wait for unit relation background processing.... |
648 | + yield self.sleep(0.1) |
649 | + self.assertFalse(start_executed.called) |
650 | + |
651 | + @inlineCallbacks |
652 | + def test_stop_sans_hook(self): |
653 | + """The lifecycle stop can be invoked without firing hooks.""" |
654 | + self.write_hook("stop", "#!/bin/sh\n exit 1") |
655 | + stop_executed = self.wait_on_hook("stop") |
656 | + yield self.lifecycle.start() |
657 | + yield self.lifecycle.stop(fire_hooks=False) |
658 | + # Wait for unit relation background processing.... |
659 | + yield self.sleep(0.1) |
660 | + self.assertFalse(stop_executed.called) |
661 | + |
662 | + @inlineCallbacks |
663 | + def test_install_sans_hook(self): |
664 | + """The lifecycle install can be invoked without firing hooks.""" |
665 | + self.write_hook("install", "#!/bin/sh\n exit 1") |
666 | + install_executed = self.wait_on_hook("install") |
667 | + yield self.lifecycle.install(fire_hooks=False) |
668 | + self.assertFalse(install_executed.called) |
669 | + |
670 | + @inlineCallbacks |
671 | + def test_upgrade_sans_hook(self): |
672 | + """The lifecycle upgrade can be invoked without firing hooks.""" |
673 | + self.executor.stop() |
674 | + self.write_hook("upgrade-formula", "#!/bin/sh\n exit 1") |
675 | + upgrade_executed = self.wait_on_hook("upgrade-formula") |
676 | + yield self.lifecycle.upgrade_formula(fire_hooks=False) |
677 | + self.assertFalse(upgrade_executed.called) |
678 | + self.assertTrue(self.executor.running) |
679 | + |
680 | def test_hook_error(self): |
681 | """Verify hook execution error, raises an exception.""" |
682 | self.write_hook("install", '#!/bin/sh\n exit 1') |
683 | @@ -196,14 +417,12 @@ |
684 | def test_hook_not_executable(self): |
685 | """A hook not executable, raises an exception.""" |
686 | self.write_hook("install", '#!/bin/sh\n exit 0', no_exec=True) |
687 | - # It would be preferrable if this was also a formulainvocation error. |
688 | return self.failUnlessFailure( |
689 | self.lifecycle.install(), FormulaError) |
690 | |
691 | def test_hook_not_formatted_correctly(self): |
692 | """Hook execution error, raises an exception.""" |
693 | self.write_hook("install", '!/bin/sh\n exit 0') |
694 | - # It would be preferrable if this was also a formulainvocation error. |
695 | return self.failUnlessFailure( |
696 | self.lifecycle.install(), FormulaInvocationError) |
697 | |
698 | @@ -532,7 +751,7 @@ |
699 | @inlineCallbacks |
700 | def test_initial_start_lifecycle_no_related_no_exec(self): |
701 | """ |
702 | - If there are no related units on startup, the relation changed hook |
703 | + If there are no related units on startup, the relation joined hook |
704 | is not invoked. |
705 | """ |
706 | file_path = self.makeFile() |
707 | @@ -545,6 +764,26 @@ |
708 | self.assertFalse(os.path.exists(file_path)) |
709 | |
710 | @inlineCallbacks |
711 | + def test_stop_can_continue_watching(self): |
712 | + """A stop that leaves watches running continues to observe |
713 | + changes, which execute once the lifecycle is restarted.""" |
714 | + file_path = self.makeFile() |
715 | + self.write_hook( |
716 | + "%s-relation-changed" % self.relation_name, |
717 | + ("#!/bin/bash\n" "echo executed >> %s\n" % file_path)) |
718 | + rel_states = yield self.add_opposite_service_unit(self.states) |
719 | + yield self.lifecycle.start() |
720 | + yield self.wait_on_hook( |
721 | + sequence=["app-relation-joined", "app-relation-changed"]) |
722 | + changed_executed = self.wait_on_hook("app-relation-changed") |
723 | + yield self.lifecycle.stop(watches=False) |
724 | + rel_states["unit_relation"].set_data(yaml.dump(dict(hello="world"))) |
725 | + yield self.sleep(0.1) |
726 | + self.assertFalse(changed_executed.called) |
727 | + yield self.lifecycle.start(watches=False) |
728 | + yield changed_executed |
729 | + |
730 | + @inlineCallbacks |
731 | def test_initial_start_lifecycle_with_related(self): |
732 | """ |
733 | If there are related units on startup, the relation changed hook |
734 | |
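The resolved handling exercised above hinges on mapping the relation's resolved setting (NO_HOOKS vs. RETRY_HOOKS) to the transition alias fired on the relation workflow. A minimal sketch of that mapping, with assumed flag names mirroring `ensemble.state.service` (the concrete constant values here are illustrative, not the real ones):

```python
# Hypothetical flag values; the real constants live in
# ensemble.state.service and their concrete values may differ.
RETRY_HOOKS = 1000
NO_HOOKS = 1001


def resolved_alias(resolved_value):
    """Map a relation's resolved setting to a workflow transition alias.

    NO_HOOKS means "transition back up without re-firing the failed
    hook"; RETRY_HOOKS means "transition back up and re-fire hooks".
    """
    if resolved_value == NO_HOOKS:
        return "retry"
    if resolved_value == RETRY_HOOKS:
        return "retry_hook"
    raise ValueError("unknown resolved setting: %r" % (resolved_value,))
```

Under this sketch the lifecycle's resolved watcher would fire `workflow.fire_transition_alias(resolved_alias(value))`, which is why the test above reaches "up" after setting NO_HOOKS.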
735 | === modified file 'ensemble/unit/tests/test_workflow.py' |
736 | --- ensemble/unit/tests/test_workflow.py 2011-05-05 14:31:43 +0000 |
737 | +++ ensemble/unit/tests/test_workflow.py 2011-05-05 17:43:23 +0000 |
738 | @@ -83,11 +83,30 @@ |
739 | self.assertFalse(result) |
740 | current_state = yield self.workflow.get_state() |
741 | yield self.assertEqual(current_state, "install_error") |
742 | - self.write_hook("install", "#!/bin/bash\necho hello\n") |
743 | result = yield self.workflow.fire_transition("retry_install") |
744 | yield self.assertState(self.workflow, "installed") |
745 | |
746 | @inlineCallbacks |
747 | + def test_install_error_with_retry_hook(self): |
748 | + """If the install hook fails, the workflow transitions to the |
749 | + install_error state. |
750 | + """ |
751 | + self.write_hook("install", "#!/bin/bash\nexit 1") |
752 | + result = yield self.workflow.fire_transition("install") |
753 | + self.assertFalse(result) |
754 | + current_state = yield self.workflow.get_state() |
755 | + yield self.assertEqual(current_state, "install_error") |
756 | + |
757 | + result = yield self.workflow.fire_transition("retry_install_hook") |
758 | + yield self.assertState(self.workflow, "install_error") |
759 | + |
760 | + self.write_hook("install", "#!/bin/bash\necho hello\n") |
761 | + hook_deferred = self.wait_on_hook("install") |
762 | + result = yield self.workflow.fire_transition_alias("retry_hook") |
763 | + yield hook_deferred |
764 | + yield self.assertState(self.workflow, "installed") |
765 | + |
766 | + @inlineCallbacks |
767 | def test_start(self): |
768 | file_path = self.makeFile() |
769 | self.write_hook( |
770 | @@ -131,12 +150,41 @@ |
771 | current_state = yield self.workflow.get_state() |
772 | self.assertEqual(current_state, "start_error") |
773 | |
774 | - self.write_hook("start", "#!/bin/bash\necho hello\n") |
775 | result = yield self.workflow.fire_transition("retry_start") |
776 | yield self.assertState(self.workflow, "started") |
777 | - # If we don't stop, we'll end up with the relation lifecycle |
778 | - # watches firing in the background when the test stops. |
779 | - self.write_hook("stop", "#!/bin/bash\necho hello\n") |
780 | + |
781 | + # If we don't stop, we'll end up with the relation lifecycle |
782 | + # watches firing in the background when the test stops. |
783 | + result = yield self.workflow.fire_transition("stop") |
784 | + yield self.assertState(self.workflow, "stopped") |
785 | + |
786 | + @inlineCallbacks |
787 | + """Executing the start transition with a hook error results in the |
788 | + """Executing the start transition with a hook error, results in the |
789 | + workflow going to the start_error state. The start can be retried. |
790 | + """ |
791 | + self.write_hook("install", "#!/bin/bash\necho hello\n") |
792 | + result = yield self.workflow.fire_transition("install") |
793 | + self.assertTrue(result) |
794 | + self.write_hook("start", "#!/bin/bash\nexit 1") |
795 | + result = yield self.workflow.fire_transition("start") |
796 | + self.assertFalse(result) |
797 | + current_state = yield self.workflow.get_state() |
798 | + self.assertEqual(current_state, "start_error") |
799 | + |
800 | + hook_deferred = self.wait_on_hook("start") |
801 | + result = yield self.workflow.fire_transition("retry_start_hook") |
802 | + yield hook_deferred |
803 | + yield self.assertState(self.workflow, "start_error") |
804 | + |
805 | + self.write_hook("start", "#!/bin/bash\nexit 0") |
806 | + hook_deferred = self.wait_on_hook("start") |
807 | + result = yield self.workflow.fire_transition_alias("retry_hook") |
808 | + yield hook_deferred |
809 | + yield self.assertState(self.workflow, "started") |
810 | + |
811 | + # If we don't stop, we'll end up with the relation lifecycle |
812 | + # watches firing in the background when the test stops. |
813 | result = yield self.workflow.fire_transition("stop") |
814 | yield self.assertState(self.workflow, "stopped") |
815 | |
816 | @@ -193,13 +241,49 @@ |
817 | yield self.assertState(self.workflow, "configure_error") |
818 | |
819 | # Verify recovery from error state |
820 | + result = yield self.workflow.fire_transition_alias("retry") |
821 | + self.assertTrue(result) |
822 | + yield self.assertState(self.workflow, "started") |
823 | + |
824 | + # Stop any background processing |
825 | + yield self.workflow.fire_transition("stop") |
826 | + |
827 | + @inlineCallbacks |
828 | + def test_configure_error_and_retry_hook(self): |
829 | + """An error while configuring transitions the unit to the |
830 | + configure_error state and stops the lifecycle.""" |
832 | + yield self.workflow.fire_transition("install") |
833 | + result = yield self.workflow.fire_transition("start") |
834 | + self.assertTrue(result) |
835 | + self.assertState(self.workflow, "started") |
836 | + |
837 | + # Verify transition to error state |
838 | + hook_deferred = self.wait_on_hook("config-changed") |
839 | + self.write_hook("config-changed", "#!/bin/bash\nexit 1") |
840 | + result = yield self.workflow.fire_transition("reconfigure") |
841 | + yield hook_deferred |
842 | + self.assertFalse(result) |
843 | + yield self.assertState(self.workflow, "configure_error") |
844 | + |
845 | + # Verify retry hook with hook error stays in error state |
846 | + hook_deferred = self.wait_on_hook("config-changed") |
847 | + result = yield self.workflow.fire_transition("retry_configure_hook") |
848 | + |
849 | + self.assertFalse(result) |
850 | + yield hook_deferred |
851 | + yield self.assertState(self.workflow, "configure_error") |
852 | + |
853 | hook_deferred = self.wait_on_hook("config-changed") |
854 | self.write_hook("config-changed", "#!/bin/bash\nexit 0") |
855 | - result = yield self.workflow.fire_transition("retry_configure") |
856 | + result = yield self.workflow.fire_transition_alias("retry_hook") |
857 | yield hook_deferred |
858 | - self.assertTrue(result) |
859 | yield self.assertState(self.workflow, "started") |
860 | |
861 | + # Stop any background processing |
862 | + yield self.workflow.fire_transition("stop") |
863 | + yield self.sleep(0.1) |
864 | + |
865 | @inlineCallbacks |
866 | def test_upgrade(self): |
867 | """Upgrading a workflow results in the upgrade hook being |
868 | @@ -235,7 +319,7 @@ |
869 | yield self.workflow.fire_transition("stop") |
870 | |
871 | @inlineCallbacks |
872 | - def test_upgrade_error_state(self): |
873 | + def test_upgrade_error_retry(self): |
874 | """A hook error during an upgrade transitions to |
875 | upgrade_error. |
876 | """ |
877 | @@ -259,6 +343,41 @@ |
878 | yield self.workflow.fire_transition("retry_upgrade_formula") |
879 | current_state = yield self.workflow.get_state() |
880 | self.assertEqual(current_state, "started") |
881 | + |
882 | + # Stop any background activity |
883 | + yield self.workflow.fire_transition("stop") |
884 | + |
885 | + @inlineCallbacks |
886 | + def test_upgrade_error_retry_hook(self): |
887 | + """A hook error during an upgrade transitions to |
888 | + upgrade_error, and can be retried with hook execution. |
889 | + """ |
890 | + yield self.workflow.fire_transition("install") |
891 | + yield self.workflow.fire_transition("start") |
892 | + current_state = yield self.workflow.get_state() |
893 | + self.assertEqual(current_state, "started") |
894 | + |
895 | + # Agent prepares this. |
896 | + self.executor.stop() |
897 | + |
898 | + self.write_hook("upgrade-formula", "#!/bin/bash\nexit 1") |
899 | + hook_deferred = self.wait_on_hook("upgrade-formula") |
900 | + yield self.workflow.fire_transition("upgrade_formula") |
901 | + yield hook_deferred |
902 | + current_state = yield self.workflow.get_state() |
903 | + self.assertEqual(current_state, "formula_upgrade_error") |
904 | + |
905 | + hook_deferred = self.wait_on_hook("upgrade-formula") |
906 | + self.write_hook("upgrade-formula", "#!/bin/bash\nexit 0") |
907 | + # The upgrade error hook should ensure that the executor is stopped. |
908 | + self.assertFalse(self.executor.running) |
909 | + yield self.workflow.fire_transition_alias("retry_hook") |
910 | + yield hook_deferred |
911 | + current_state = yield self.workflow.get_state() |
912 | + self.assertEqual(current_state, "started") |
913 | + self.assertTrue(self.executor.running) |
914 | + |
915 | + # Stop any background activity |
916 | yield self.workflow.fire_transition("stop") |
917 | |
918 | @inlineCallbacks |
919 | @@ -301,13 +420,36 @@ |
920 | self.assertTrue(result) |
921 | result = yield self.workflow.fire_transition("start") |
922 | self.assertTrue(result) |
923 | + |
924 | self.write_hook("stop", "#!/bin/bash\nexit 1") |
925 | result = yield self.workflow.fire_transition("stop") |
926 | self.assertFalse(result) |
927 | - current_state = yield self.workflow.get_state() |
928 | - self.assertEqual(current_state, "stop_error") |
929 | + |
930 | + yield self.assertState(self.workflow, "stop_error") |
931 | self.write_hook("stop", "#!/bin/bash\necho hello\n") |
932 | result = yield self.workflow.fire_transition("retry_stop") |
933 | + |
934 | + yield self.assertState(self.workflow, "stopped") |
935 | + |
936 | + @inlineCallbacks |
937 | + def test_stop_error_with_retry_hook(self): |
938 | + self.write_hook("install", "#!/bin/bash\necho hello\n") |
939 | + self.write_hook("start", "#!/bin/bash\necho hello\n") |
940 | + result = yield self.workflow.fire_transition("install") |
941 | + self.assertTrue(result) |
942 | + result = yield self.workflow.fire_transition("start") |
943 | + self.assertTrue(result) |
944 | + |
945 | + self.write_hook("stop", "#!/bin/bash\nexit 1") |
946 | + result = yield self.workflow.fire_transition("stop") |
947 | + self.assertFalse(result) |
948 | + yield self.assertState(self.workflow, "stop_error") |
949 | + |
950 | + result = yield self.workflow.fire_transition_alias("retry_hook") |
951 | + yield self.assertState(self.workflow, "stop_error") |
952 | + |
953 | + self.write_hook("stop", "#!/bin/bash\nexit 0") |
954 | + result = yield self.workflow.fire_transition_alias("retry_hook") |
955 | yield self.assertState(self.workflow, "stopped") |
956 | |
957 | @inlineCallbacks |
958 | @@ -447,7 +589,7 @@ |
959 | # Add a new unit, and wait for the broken hook to result in |
960 | # the transition to the down state. |
961 | yield self.add_opposite_service_unit(self.states) |
962 | - yield self.wait_on_state(self.workflow, "down") |
963 | + yield self.wait_on_state(self.workflow, "error") |
964 | |
965 | f_state, history, zk_state = yield self.read_persistent_state( |
966 | history_id=self.workflow.zk_state_id) |
967 | @@ -458,7 +600,7 @@ |
968 | "formula", "hooks", "app-relation-changed")) |
969 | |
970 | self.assertEqual(f_state, |
971 | - {"state": "down", |
972 | + {"state": "error", |
973 | "state_variables": { |
974 | "change_type": "joined", |
975 | "error_message": error}}) |
976 | |
977 | === modified file 'ensemble/unit/workflow.py' |
978 | --- ensemble/unit/workflow.py 2011-05-05 14:31:43 +0000 |
979 | +++ ensemble/unit/workflow.py 2011-05-05 17:43:23 +0000 |
980 | @@ -14,31 +14,56 @@ |
981 | |
982 | |
983 | UnitWorkflow = Workflow( |
984 | + # Install transitions |
985 | Transition("install", "Install", None, "installed", |
986 | error_transition_id="error_install"), |
987 | - Transition("error_install", "Install Error", None, "install_error"), |
988 | - Transition("retry_install", "Retry Install", "install_error", "installed"), |
989 | + Transition("error_install", "Install error", None, "install_error"), |
990 | + |
991 | + Transition("retry_install", "Retry install", "install_error", "installed", |
992 | + alias="retry"), |
993 | + Transition("retry_install_hook", "Retry install with hook", |
994 | + "install_error", "installed", alias="retry_hook"), |
995 | + |
996 | + # Start transitions |
997 | Transition("start", "Start", "installed", "started", |
998 | error_transition_id="error_start"), |
999 | - Transition("error_start", "Start Error", "installed", "start_error"), |
1000 | - Transition("retry_start", "Retry Start", "start_error", "started"), |
1001 | + Transition("error_start", "Start error", "installed", "start_error"), |
1002 | + Transition("retry_start", "Retry start", "start_error", "started", |
1003 | + alias="retry"), |
1004 | + Transition("retry_start_hook", "Retry start with hook", |
1005 | + "start_error", "started", alias="retry_hook"), |
1006 | + |
1007 | + # Stop transitions |
1008 | Transition("stop", "Stop", "started", "stopped", |
1009 | error_transition_id="error_stop"), |
1010 | - Transition("error_stop", "Stop Error", "started", "stop_error"), |
1011 | - Transition("retry_stop", "Retry Stop", "stop_error", "stopped"), |
1012 | - |
1013 | - # Upgrade Transitions (stay in state, with success transitition) |
1014 | + Transition("error_stop", "Stop error", "started", "stop_error"), |
1015 | + Transition("retry_stop", "Retry stop", "stop_error", "stopped", |
1016 | + alias="retry"), |
1017 | + Transition("retry_stop_hook", "Retry stop with hook", |
1018 | + "stop_error", "stopped", alias="retry_hook"), |
1019 | + |
1020 | + # Restart transitions |
1021 | + Transition("restart", "Restart", "stop", "start", |
1022 | + error_transition_id="error_start", alias="retry"), |
1023 | + Transition("restart_with_hook", "Restart with hook", |
1024 | + "stop", "start", alias="retry_hook", |
1025 | + error_transition_id="error_start"), |
1026 | + |
1027 | + # Upgrade transitions |
1028 | Transition( |
1029 | "upgrade_formula", "Upgrade", "started", "started", |
1030 | error_transition_id="upgrade_formula_error"), |
1031 | Transition( |
1032 | - "upgrade_formula_error", "On upgrade error", |
1033 | + "upgrade_formula_error", "Upgrade error", |
1034 | "started", "formula_upgrade_error"), |
1035 | Transition( |
1036 | - "retry_upgrade_formula", "Retry failed upgrade", |
1037 | - "formula_upgrade_error", "started"), |
1038 | + "retry_upgrade_formula", "Retry upgrade", |
1039 | + "formula_upgrade_error", "started", alias="retry"), |
1040 | + Transition( |
1041 | + "retry_upgrade_formula_hook", "Retry upgrade with hook", |
1042 | + "formula_upgrade_error", "started", alias="retry_hook"), |
1043 | |
1044 | - # Configuration Transitions (stay in state, with success transition) |
1045 | + # Configuration transitions |
1046 | Transition( |
1047 | "reconfigure", "Reconfigure", "started", "started", |
1048 | error_transition_id="error_configure"), |
1049 | @@ -46,28 +71,56 @@ |
1050 | "error_configure", "On configure error", |
1051 | "started", "configure_error"), |
1052 | Transition( |
1053 | + "retry_error", "On retry configure error", |
1054 | + "configure_error", "configure_error"), |
1055 | + Transition( |
1056 | "retry_configure", "Retry configure", |
1057 | - "configure_error", "started") |
1058 | + "configure_error", "started", alias="retry", |
1059 | + error_transition_id="retry_error"), |
1060 | + Transition( |
1061 | + "retry_configure_hook", "Retry configure with hooks", |
1062 | + "configure_error", "started", alias="retry_hook", |
1063 | + error_transition_id="retry_error") |
1064 | ) |
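The retry/retry_hook aliases above let a caller recover from any error state without knowing the concrete transition id for that state. A rough sketch of how alias dispatch can work, assuming a simplified Transition/Workflow shape (the real classes carry more metadata, such as `error_transition_id` and bound transition actions):

```python
from collections import namedtuple

# Simplified stand-ins; the real Transition/Workflow carry more metadata.
Transition = namedtuple("Transition", "id label source dest alias")


class Workflow:
    def __init__(self, *transitions):
        self._transitions = transitions

    def find_by_alias(self, state, alias):
        """Resolve an alias to the single transition leaving `state`."""
        matches = [t for t in self._transitions
                   if t.source == state and t.alias == alias]
        if len(matches) != 1:
            raise LookupError(
                "expected one %r transition from %r, found %d"
                % (alias, state, len(matches)))
        return matches[0]


workflow = Workflow(
    Transition("retry_start", "Retry start",
               "start_error", "started", "retry"),
    Transition("retry_start_hook", "Retry start with hook",
               "start_error", "started", "retry_hook"),
)
```

Given the current state, `workflow.find_by_alias("start_error", "retry_hook")` selects `retry_start_hook`; this is the lookup a `fire_transition_alias` call would perform before firing the transition.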
1065 | |
1066 | |
1067 | -# There's been some discussion, if we should have per change type error states |
1068 | -# here, corresponding to the different changes that the relation-changed hook |
1069 | -# is invoked for. The important aspects to capture are both observability of |
1070 | -# error type locally and globally (zk), and per error type and instance |
1071 | -# recovery of the same. To provide for this functionality without additional |
1072 | -# states, the error information (change type, and error message) are captured |
1073 | -# in state variables which are locally and globally observable. Future |
1074 | -# extension of the restart transition action, will allow for customized |
1075 | -# recovery based on the change type state variable. Effectively this |
1076 | -# differs from the unit definition, in that it collapses three possible error |
1077 | -# states, into a behavior off switch. A separate state will be needed to |
1078 | -# denote departing. |
1079 | +# Unit relation error states |
1080 | +# |
1081 | +# There's been some discussion about whether we should have |
1082 | +# per-change-type error states here, corresponding to the different |
1083 | +# changes that the relation-changed hook is invoked for. The |
1084 | +# important aspects to capture are observability of the error type, |
1085 | +# both locally and globally (zk), and recovery per error type and |
1086 | +# instance. To provide this functionality without additional states, |
1087 | +# the error information (change type and error message) is captured |
1088 | +# in state variables which are locally and globally observable. |
1089 | +# Future extension of the restart transition action will allow for |
1090 | +# customized recovery based on the change type state variable. |
1091 | +# Effectively this differs from the unit definition in that it |
1092 | +# collapses three possible error states into a single behavior-off |
1093 | +# switch. A separate state will be needed to denote departing. |
1094 | + |
1095 | + |
1096 | +# Process recovery using on disk workflow state |
1097 | +# |
1098 | +# Another interesting issue: process recovery using the on-disk state |
1099 | +# is complicated by keeping it consistent with the in-memory state, |
1100 | +# which is no longer directly recoverable without state-specific |
1101 | +# recovery semantics. For example, a restarted unit agent with a |
1102 | +# relation in an error state needs special handling when loading from |
1103 | +# disk, to ensure the in-memory process state (watching and |
1104 | +# scheduling, but not executing) matches the recovery transition |
1105 | +# actions (which just restart hook execution and assume the watch |
1106 | +# continues). This functionality exists so that, while down due to a |
1107 | +# hook error, the relation continues to schedule pending hooks. |
1109 | |
1110 | RelationWorkflow = Workflow( |
1111 | Transition("start", "Start", None, "up"), |
1112 | Transition("stop", "Stop", "up", "down"), |
1113 | - Transition("restart", "Restart", "down", "up"), |
1114 | + Transition("restart", "Restart", "down", "up", alias="retry"), |
1115 | + Transition("error", "Relation hook error", "up", "error"), |
1116 | + Transition("reset", "Recover from hook error", "error", "up"), |
1117 | Transition("depart", "Relation broken", "up", "departed"), |
1118 | Transition("down_depart", "Relation broken", "down", "departed"), |
1119 | ) |
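The new error state differs from down in what it leaves running: error keeps the change watches alive (so pending hooks stay scheduled) while down stops them entirely. A toy model of the two lifecycle flags involved, with method signatures assumed to mirror the lifecycle's `start(watches=...)`/`stop(watches=...)`:

```python
class RelationLifecycleSketch:
    """Toy stand-in for the unit relation lifecycle (names assumed)."""

    def __init__(self):
        self.watching = False
        self.executing = False

    def start(self, watches=True):
        if watches:
            self.watching = True
        self.executing = True

    def stop(self, watches=True):
        if watches:
            self.watching = False
        self.executing = False


lifecycle = RelationLifecycleSketch()
lifecycle.start()               # "start" transition: up, watching and executing
lifecycle.stop(watches=False)   # "error": keep watching, stop hook execution
assert lifecycle.watching and not lifecycle.executing
lifecycle.start(watches=False)  # "reset": back up, resume hook execution
assert lifecycle.watching and lifecycle.executing
lifecycle.stop()                # "stop": down, nothing runs
```

This mirrors why `do_error` calls `stop(watches=False)` and `do_reset` calls `start(watches=False)` in the transition actions below, while `do_stop`/`do_restart` toggle both watches and execution.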
1120 | @@ -223,7 +276,6 @@ |
1121 | row per entry with CSV escaping. |
1122 | """ |
1123 | state_serialized = yaml.safe_dump(state_dict) |
1124 | - |
1125 | # State File |
1126 | with open(self.state_file_path, "w") as handle: |
1127 | handle.write(state_serialized) |
1128 | @@ -243,6 +295,7 @@ |
1129 | return {"state": None} |
1130 | with open(self.state_file_path, "r") as handle: |
1131 | content = handle.read() |
1132 | + |
1133 | return yaml.load(content) |
1134 | |
1135 | |
1136 | @@ -282,51 +335,77 @@ |
1137 | self._lifecycle = lifecycle |
1138 | |
1139 | @inlineCallbacks |
1140 | - def _invoke_lifecycle(self, method): |
1141 | + def _invoke_lifecycle(self, method, *args, **kw): |
1142 | try: |
1143 | - result = yield method() |
1144 | + result = yield method(*args, **kw) |
1145 | except (FileNotFound, FormulaError, FormulaInvocationError), e: |
1146 | raise TransitionError(e) |
1147 | returnValue(result) |
1148 | |
1149 | - # Transition Actions |
1150 | + # Install transitions |
1151 | def do_install(self): |
1152 | return self._invoke_lifecycle(self._lifecycle.install) |
1153 | |
1154 | + def do_retry_install(self): |
1155 | + return self._invoke_lifecycle(self._lifecycle.install, |
1156 | + fire_hooks=False) |
1157 | + |
1158 | + def do_retry_install_hook(self): |
1159 | + return self._invoke_lifecycle(self._lifecycle.install) |
1160 | + |
1161 | + # Start transitions |
1162 | def do_start(self): |
1163 | return self._invoke_lifecycle(self._lifecycle.start) |
1164 | |
1165 | - def do_stop(self): |
1166 | - return self._invoke_lifecycle(self._lifecycle.stop) |
1167 | - |
1168 | def do_retry_start(self): |
1169 | + return self._invoke_lifecycle(self._lifecycle.start, |
1170 | + fire_hooks=False) |
1171 | + |
1172 | + def do_retry_start_hook(self): |
1173 | return self._invoke_lifecycle(self._lifecycle.start) |
1174 | |
1175 | + # Stop transitions |
1176 | + def do_stop(self): |
1177 | + return self._invoke_lifecycle(self._lifecycle.stop) |
1178 | + |
1179 | def do_retry_stop(self): |
1180 | - self._invoke_lifecycle(self._lifecycle.stop) |
1181 | - |
1182 | - def do_retry_install(self): |
1183 | - return self._invoke_lifecycle(self._lifecycle.install) |
1184 | + return self._invoke_lifecycle(self._lifecycle.stop, |
1185 | + fire_hooks=False) |
1186 | + |
1187 | + def do_retry_stop_hook(self): |
1188 | + return self._invoke_lifecycle(self._lifecycle.stop) |
1189 | + |
1190 | + # Upgrade transitions |
1191 | + def do_upgrade_formula(self): |
1192 | + return self._invoke_lifecycle(self._lifecycle.upgrade_formula) |
1193 | |
1194 | def do_retry_upgrade_formula(self): |
1195 | - return self._invoke_lifecycle(self._lifecycle.upgrade_formula) |
1196 | - |
1197 | - def do_upgrade_formula(self): |
1198 | - return self._invoke_lifecycle(self._lifecycle.upgrade_formula) |
1199 | - |
1200 | - # Some of this needs support from the resolved branches, as we |
1201 | - # want to fire some of these lifecycle methods sans hooks. |
1202 | + return self._invoke_lifecycle(self._lifecycle.upgrade_formula, |
1203 | + fire_hooks=False) |
1204 | + |
1205 | + def do_retry_upgrade_formula_hook(self): |
1206 | + return self._invoke_lifecycle(self._lifecycle.upgrade_formula) |
1207 | + |
1208 | + # Config transitions |
1209 | def do_error_configure(self): |
1210 | - # self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False) |
1211 | - pass |
1212 | + return self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False) |
1213 | |
1214 | def do_reconfigure(self): |
1215 | return self._invoke_lifecycle(self._lifecycle.configure) |
1216 | |
1217 | + def do_retry_error(self): |
1218 | + return self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False) |
1219 | + |
1220 | + @inlineCallbacks |
1221 | def do_retry_configure(self): |
1222 | - # self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False) |
1223 | - self._invoke_lifecycle( |
1224 | - self._lifecycle.configure) # fire_hooks=False) |
1225 | + yield self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False) |
1226 | + yield self._invoke_lifecycle(self._lifecycle.configure, |
1227 | + fire_hooks=False) |
1228 | + |
1229 | + @inlineCallbacks |
1230 | + def do_retry_configure_hook(self): |
1231 | + yield self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False) |
1232 | + yield self._invoke_lifecycle(self._lifecycle.configure) |
1233 | |
1234 | |
1235 | class RelationWorkflowState(DiskWorkflowState): |
1236 | @@ -360,7 +439,7 @@ |
1237 | |
1238 | @param: error: The error from hook invocation. |
1239 | """ |
1240 | - yield self.fire_transition("stop", |
1241 | + yield self.fire_transition("error", |
1242 | change_type=relation_change.change_type, |
1243 | error_message=str(error)) |
1244 | |
1245 | @@ -369,12 +448,30 @@ |
1246 | """Transition the workflow to the 'down' state. |
1247 | |
1248 | Turns off the unit-relation lifecycle monitoring and hook execution. |
1249 | + |
1250 | + :param error_info: If called on relation hook error, contains |
1251 | + error variables. |
1252 | """ |
1253 | yield self._lifecycle.stop() |
1254 | |
1255 | @inlineCallbacks |
1256 | + def do_reset(self): |
1257 | + """Transition the workflow to the 'up' state from an error state. |
1258 | + |
1259 | + Turns on the unit-relation lifecycle monitoring and hook execution. |
1260 | + """ |
1261 | + yield self._lifecycle.start(watches=False) |
1262 | + |
1263 | + @inlineCallbacks |
1264 | + def do_error(self, **error_info): |
1265 | + """A relation hook error stops further hook execution but |
1266 | + continues to watch for changes. |
1267 | + """ |
1268 | + yield self._lifecycle.stop(watches=False) |
1269 | + |
1270 | + @inlineCallbacks |
1271 | def do_restart(self): |
1272 | - """Transition the workflow to the 'up' state. |
1273 | + """Transition the workflow to the 'up' state from the down state. |
1274 | |
1275 | Turns on the unit-relation lifecycle monitoring and hook execution. |
1276 | """ |
Kapil mentioned he's still working on this one.