Merge lp:~hazmat/pyjuju/unit-agent-resolved into lp:pyjuju

Proposed by Kapil Thangavelu
Status: Merged
Approved by: Gustavo Niemeyer
Approved revision: 275
Merged at revision: 224
Proposed branch: lp:~hazmat/pyjuju/unit-agent-resolved
Merge into: lp:pyjuju
Prerequisite: lp:~hazmat/pyjuju/ensemble-resolved
Diff against target: 1276 lines (+718/-94)
7 files modified
ensemble/control/tests/test_resolved.py (+0/-1)
ensemble/state/service.py (+18/-5)
ensemble/state/tests/test_service.py (+71/-1)
ensemble/unit/lifecycle.py (+85/-21)
ensemble/unit/tests/test_lifecycle.py (+242/-3)
ensemble/unit/tests/test_workflow.py (+154/-12)
ensemble/unit/workflow.py (+148/-51)
To merge this branch: bzr merge lp:~hazmat/pyjuju/unit-agent-resolved
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Approve
Review via email: mp+59713@code.launchpad.net

This proposal supersedes a proposal from 2011-05-02.

Description of the change

This implements the unit lifecycle resolving unit relations, and additional changes to enable resolution (transition actions receiving hook execution flags).

The branch is large enough that i'm going to split the remaining unit agent resolving unit relations into another branch.

To post a comment you must log in.
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Kapil mentioned he's still working on this one.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

[1]

+ def do_retry_start(self, fire_hooks=True):
+ return self._invoke_lifecycle(
+ self._lifecycle.start, fire_hooks=fire_hooks)

We've already debated this live over a voice call, and had already
covered the topic in a previous conversation: I think it is a mistake
to introduce variables which define how the transition should work.

This is increasing the complexity of actions in an unpredictable way
(a single action with parameters could handle *all* the possible
transitions in the state machine).

review: Needs Fixing
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Changed over to using additional transitions, its a bit nicer now, thanks.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Nice, thanks for the changes. Looking good!

+1, taking these in consideration:

[2]

+ def do_retry_upgrade_formula_hook(self, fire_hooks=True):

The fire_hooks seems to be a left over.

[3]

+ return self._invoke_lifecycle(
+ self._lifecycle.start, fire_hooks=False).addCallback(
+ lambda x: self._invoke_lifecycle(
+ self._lifecycle.configure, fire_hooks=False))
(...)
+ return self._invoke_lifecycle(
+ self._lifecycle.start, fire_hooks=False).addCallback(
+ lambda x: self._invoke_lifecycle(
+ self._lifecycle.configure))

Breaking these down would make them more readable than the huge one-liners.

[4]

+# Another interesting issue, process recovery using the on disk state,
+# is complicated by consistency to the the in memory state, which
+# won't be directly recoverable anymore without some state specific

Sweet. Thanks for capturing those ideas.

[5]

+ # If the unit lifecycle isn't running we shouldn't process
+ # any relation resolutions.
+ if not self._running:
+ self._log.debug("stop watch relation resolved changes")
+ self._watching_relation_resolved = False
+ raise StopWatcher()

It's not clear what's the intention here, and test coverage also
looks a bit poor in this area (e.g. replacing everything under the "if" by
"return" still passes tests fine).

What do we want to happen in this case, and what's the rationale
behind it?

[6]

+ if self._client.connected:
+ yield self._process_relation_resolved_changes()

What's the "if connected" test about? It feels like pretty much every
interaction with zk could have such a test. Why is this location
special?

review: Approve
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Excerpts from Gustavo Niemeyer's message of Thu May 05 02:01:15 UTC 2011:
> Review: Approve
> Nice, thanks for the changes. Looking good!
>
> +1, taking these in consideration:
>
> [5]
>
> + # If the unit lifecycle isn't running we shouldn't process
> + # any relation resolutions.
> + if not self._running:
> + self._log.debug("stop watch relation resolved changes")
> + self._watching_relation_resolved = False
> + raise StopWatcher()
>
> It's not clear what's the intention here, and test coverage also
> looks a bit poor in this area (e.g. replacing everything under the "if" by
> "return" still passes tests fine).
>
> What do we want to happen in this case, and what's the rationale
> behind it?
>

The notion is if we're not running, the watcher terminates, when we start watching
again the watcher is restarted, if it doesn't already exist. Its designed to
avoid concurrent watchers, and to allow for watch termination.

The design rationale being if a unit is not running, it needs to be resolved before
its unit relations can be fixed.

Being able to correctly tests runs into another behavior on the unit lifecycle,
where we automatically repair unit relations when the unit is started. I've
introduced a separate error state for relations in this branch, so we can choose
not to have unit relation errors automatically recovered, in which case the logic
in question can be tested easily, by bringing the unit back online, and the verifying
the resolve action was performed.

>
> [6]
>
> + if self._client.connected:
> + yield self._process_relation_resolved_changes()
>
> What's the "if connected" test about? It feels like pretty much every
> interaction with zk could have such a test. Why is this location
> special?
>

Its typically done in watch callbacks, such that an async background execution while the test
is closing, does not trigger a zookeeper closing exception, and minimizes errors.

276. By Kapil Thangavelu

Merged ensemble-resolved into unit-agent-resolved.

277. By Kapil Thangavelu

Merged ensemble-resolved into unit-agent-resolved.

278. By Kapil Thangavelu

Merged ensemble-resolved into unit-agent-resolved.

279. By Kapil Thangavelu

address review comments re formatting

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Excerpts from Kapil Thangavelu's message of Thu May 05 13:26:02 -0400 2011:
> Excerpts from Gustavo Niemeyer's message of Thu May 05 02:01:15 UTC 2011:
> > Review: Approve
> > Nice, thanks for the changes. Looking good!
> >
> > +1, taking these in consideration:
> >
> > [5]
> >
> > + # If the unit lifecycle isn't running we shouldn't process
> > + # any relation resolutions.
> > + if not self._running:
> > + self._log.debug("stop watch relation resolved changes")
> > + self._watching_relation_resolved = False
> > + raise StopWatcher()
> >
> > It's not clear what's the intention here, and test coverage also
> > looks a bit poor in this area (e.g. replacing everything under the "if" by
> > "return" still passes tests fine).
> >
> > What do we want to happen in this case, and what's the rationale
> > behind it?
> >
>
> The notion is if we're not running, the watcher terminates, when we start watching
> again the watcher is restarted, if it doesn't already exist. Its designed to
> avoid concurrent watchers, and to allow for watch termination.
>
> The design rationale being if a unit is not running, it needs to be resolved before
> its unit relations can be fixed.
>
> Being able to correctly tests runs into another behavior on the unit lifecycle,
> where we automatically repair unit relations when the unit is started. I've
> introduced a separate error state for relations in this branch, so we can choose
> not to have unit relation errors automatically recovered, in which case the logic
> in question can be tested easily, by bringing the unit back online, and the verifying
> the resolve action was performed.

Just to be clear this is a test for the overall functionality in

test_resolved_relation_watch_unit_lifecycle_not_running

But it because of the above constraint regarding automatic recover on start, it can
only verify that the watcher has stopped, not that the recovery proceeds normally
after restart, so it instead verifies that the resolved setting has persisted
past the end of the watcher.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'ensemble/control/tests/test_resolved.py'
2--- ensemble/control/tests/test_resolved.py 2011-04-28 17:09:36 +0000
3+++ ensemble/control/tests/test_resolved.py 2011-05-05 17:43:23 +0000
4@@ -2,7 +2,6 @@
5 from yaml import dump
6
7 from ensemble.control import main
8-from ensemble.control.resolved import resolved
9 from ensemble.control.tests.common import ControlToolTest
10 from ensemble.formula.tests.test_repository import RepositoryTestBase
11
12
13=== modified file 'ensemble/state/service.py'
14--- ensemble/state/service.py 2011-05-05 16:18:34 +0000
15+++ ensemble/state/service.py 2011-05-05 17:43:23 +0000
16@@ -16,11 +16,10 @@
17 ServiceUnitStateMachineAlreadyAssigned, ServiceStateNameInUse,
18 BadDescriptor, BadServiceStateName, NoUnusedMachines,
19 ServiceUnitDebugAlreadyEnabled, ServiceUnitResolvedAlreadyEnabled,
20- ServiceUnitRelationResolvedAlreadyEnabled)
21+ ServiceUnitRelationResolvedAlreadyEnabled, StopWatcher)
22 from ensemble.state.formula import FormulaStateManager
23 from ensemble.state.relation import ServiceRelationState, RelationStateManager
24 from ensemble.state.machine import _public_machine_id, MachineState
25-
26 from ensemble.state.utils import remove_tree, dict_merge, YAMLState
27
28 RETRY_HOOKS = 1000
29@@ -516,7 +515,7 @@
30 its `write` method invoked to publish the state to Zookeeper.
31 """
32 config_node = YAMLState(self._client,
33- "/services/%s/config" % self._internal_id )
34+ "/services/%s/config" % self._internal_id)
35 yield config_node.read()
36 returnValue(config_node)
37
38@@ -944,9 +943,13 @@
39 def watcher(change_event):
40 if not self._client.connected:
41 returnValue(None)
42+
43 exists_d, watch_d = self._client.exists_and_watch(
44 self._unit_resolve_path)
45- yield callback(change_event)
46+ try:
47+ yield callback(change_event)
48+ except StopWatcher:
49+ returnValue(None)
50 watch_d.addCallback(watcher)
51
52 exists_d, watch_d = self._client.exists_and_watch(
53@@ -958,6 +961,8 @@
54 callback_d = maybeDeferred(callback, bool(exists))
55 callback_d.addCallback(
56 lambda x: watch_d.addCallback(watcher) and x)
57+ callback_d.addErrback(
58+ lambda failure: failure.trap(StopWatcher))
59
60 @property
61 def _relation_resolved_path(self):
62@@ -1033,6 +1038,8 @@
63
64 @inlineCallbacks
65 def clear_relation_resolved(self):
66+ """ Clear the relation resolved setting.
67+ """
68 try:
69 yield self._client.delete(self._relation_resolved_path)
70 except zookeeper.NoNodeException:
71@@ -1055,7 +1062,11 @@
72 returnValue(None)
73 exists_d, watch_d = self._client.exists_and_watch(
74 self._relation_resolved_path)
75- yield callback(change_event)
76+ try:
77+ yield callback(change_event)
78+ except StopWatcher:
79+ returnValue(None)
80+
81 watch_d.addCallback(watcher)
82
83 exists_d, watch_d = self._client.exists_and_watch(
84@@ -1068,6 +1079,8 @@
85 callback_d = maybeDeferred(callback, bool(exists))
86 callback_d.addCallback(
87 lambda x: watch_d.addCallback(watcher) and x)
88+ callback_d.addErrback(
89+ lambda failure: failure.trap(StopWatcher))
90
91
92 def _parse_unit_name(unit_name):
93
94=== modified file 'ensemble/state/tests/test_service.py'
95--- ensemble/state/tests/test_service.py 2011-05-05 14:31:41 +0000
96+++ ensemble/state/tests/test_service.py 2011-05-05 17:43:23 +0000
97@@ -17,7 +17,7 @@
98 ServiceUnitStateMachineAlreadyAssigned, ServiceStateNameInUse,
99 BadDescriptor, BadServiceStateName, ServiceUnitDebugAlreadyEnabled,
100 MachineStateNotFound, NoUnusedMachines, ServiceUnitResolvedAlreadyEnabled,
101- ServiceUnitRelationResolvedAlreadyEnabled)
102+ ServiceUnitRelationResolvedAlreadyEnabled, StopWatcher)
103
104
105 from ensemble.state.tests.common import StateTestBase
106@@ -790,6 +790,42 @@
107 {"retry": NO_HOOKS})
108
109 @inlineCallbacks
110+ def test_stop_watch_resolved(self):
111+ """A unit resolved watch can be instituted on a permanent basis.
112+
113+ However the callback can raise StopWatcher at anytime to stop the watch
114+ """
115+ unit_state = yield self.get_unit_state()
116+
117+ results = []
118+
119+ def callback(value):
120+ results.append(value)
121+ if len(results) == 1:
122+ raise StopWatcher()
123+ if len(results) == 3:
124+ raise StopWatcher()
125+
126+ unit_state.watch_resolved(callback)
127+ yield unit_state.set_resolved(RETRY_HOOKS)
128+ yield unit_state.clear_resolved()
129+ yield self.poke_zk()
130+
131+ unit_state.watch_resolved(callback)
132+ yield unit_state.set_resolved(NO_HOOKS)
133+ yield unit_state.clear_resolved()
134+
135+ yield self.poke_zk()
136+
137+ self.assertEqual(len(results), 3)
138+ self.assertIdentical(results.pop(0), False)
139+ self.assertIdentical(results.pop(0), False)
140+ self.assertEqual(results.pop(0).type_name, "created")
141+
142+ self.assertEqual(
143+ (yield unit_state.get_resolved()), None)
144+
145+ @inlineCallbacks
146 def test_get_set_clear_relation_resolved(self):
147 """The a unit's realtions can be set to resolved to mark a
148 future transition, with an optional retry flag."""
149@@ -850,6 +886,40 @@
150 {"0": NO_HOOKS})
151
152 @inlineCallbacks
153+ def test_stop_watch_relation_resolved(self):
154+ """A unit resolved watch can be instituted on a permanent basis."""
155+ unit_state = yield self.get_unit_state()
156+
157+ results = []
158+
159+ def callback(value):
160+ results.append(value)
161+
162+ if len(results) == 1:
163+ raise StopWatcher()
164+
165+ if len(results) == 3:
166+ raise StopWatcher()
167+
168+ unit_state.watch_relation_resolved(callback)
169+ yield unit_state.set_relation_resolved({"0": RETRY_HOOKS})
170+ yield unit_state.clear_relation_resolved()
171+ yield self.poke_zk()
172+ self.assertEqual(len(results), 1)
173+
174+ unit_state.watch_relation_resolved(callback)
175+ yield unit_state.set_relation_resolved({"0": RETRY_HOOKS})
176+ yield unit_state.clear_relation_resolved()
177+ yield self.poke_zk()
178+ self.assertEqual(len(results), 3)
179+ self.assertIdentical(results.pop(0), False)
180+ self.assertIdentical(results.pop(0), False)
181+ self.assertEqual(results.pop(0).type_name, "created")
182+
183+ self.assertEqual(
184+ (yield unit_state.get_relation_resolved()), None)
185+
186+ @inlineCallbacks
187 def test_watch_resolved_slow_callback(self):
188 """A slow watch callback is still invoked serially."""
189 unit_state = yield self.get_unit_state()
190
191=== modified file 'ensemble/unit/lifecycle.py'
192--- ensemble/unit/lifecycle.py 2011-05-05 14:31:43 +0000
193+++ ensemble/unit/lifecycle.py 2011-05-05 17:43:23 +0000
194@@ -2,7 +2,7 @@
195 import logging
196
197 from twisted.internet.defer import (
198- inlineCallbacks, DeferredLock)
199+ inlineCallbacks, DeferredLock, returnValue)
200
201 from ensemble.hooks.invoker import Invoker
202 from ensemble.hooks.scheduler import HookScheduler
203@@ -32,7 +32,8 @@
204 self._unit_path = unit_path
205 self._relations = {}
206 self._running = False
207- self._watching = False
208+ self._watching_relation_memberships = False
209+ self._watching_relation_resolved = False
210 self._run_lock = DeferredLock()
211 self._log = logging.getLogger("unit.lifecycle")
212
213@@ -45,21 +46,23 @@
214 return self._relations[relation_id]
215
216 @inlineCallbacks
217- def install(self):
218+ def install(self, fire_hooks=True):
219 """Invoke the unit's install hook.
220 """
221- yield self._execute_hook("install")
222+ if fire_hooks:
223+ yield self._execute_hook("install")
224
225 @inlineCallbacks
226- def upgrade_formula(self):
227+ def upgrade_formula(self, fire_hooks=True):
228 """Invoke the unit's upgrade-formula hook.
229 """
230- yield self._execute_hook("upgrade-formula", now=True)
231+ if fire_hooks:
232+ yield self._execute_hook("upgrade-formula", now=True)
233 # Restart hook queued hook execution.
234 self._executor.start()
235
236 @inlineCallbacks
237- def start(self):
238+ def start(self, fire_hooks=True):
239 """Invoke the start hook, and setup relation watching.
240 """
241 self._log.debug("pre-start acquire, running:%s", self._running)
242@@ -70,22 +73,29 @@
243 assert not self._running, "Already started"
244
245 # Execute the start hook
246- yield self._execute_hook("start")
247+ if fire_hooks:
248+ yield self._execute_hook("start")
249
250 # If we have any existing relations in memory, start them.
251 if self._relations:
252 self._log.debug("starting relation lifecycles")
253
254 for workflow in self._relations.values():
255- # should not transition an
256 yield workflow.transition_state("up")
257
258 # Establish a watch on the existing relations.
259- if not self._watching:
260+ if not self._watching_relation_memberships:
261 self._log.debug("starting service relation watch")
262 yield self._service.watch_relation_states(
263 self._on_service_relation_changes)
264- self._watching = True
265+ self._watching_relation_memberships = True
266+
267+ # Establish a watch for resolved relations
268+ if not self._watching_relation_resolved:
269+ self._log.debug("starting unit relation resolved watch")
270+ yield self._unit.watch_relation_resolved(
271+ self._on_relation_resolved_changes)
272+ self._watching_relation_resolved = True
273
274 # Set current status
275 self._running = True
276@@ -94,7 +104,7 @@
277 self._log.debug("started unit lifecycle")
278
279 @inlineCallbacks
280- def stop(self):
281+ def stop(self, fire_hooks=True):
282 """Stop the unit, executes the stop hook, and stops relation watching.
283 """
284 self._log.debug("pre-stop acquire, running:%s", self._running)
285@@ -110,7 +120,8 @@
286 for workflow in self._relations.values():
287 yield workflow.transition_state("down")
288
289- yield self._execute_hook("stop")
290+ if fire_hooks:
291+ yield self._execute_hook("stop")
292
293 # Set current status
294 self._running = False
295@@ -119,9 +130,11 @@
296 self._log.debug("stopped unit lifecycle")
297
298 @inlineCallbacks
299- def configure(self):
300+ def configure(self, fire_hooks=True):
301 """Inform the unit that its service config has changed.
302 """
303+ if not fire_hooks:
304+ returnValue(None)
305 yield self._run_lock.acquire()
306 try:
307 # Verify State
308@@ -134,6 +147,49 @@
309 self._log.debug("configured unit")
310
311 @inlineCallbacks
312+ def _on_relation_resolved_changes(self, event):
313+ """Callback for unit relation resolved watching.
314+
315+ The callback is invoked whenever the relation resolved
316+ settings change.
317+ """
318+ self._log.debug("relation resolved changed")
319+ # Acquire the run lock, and process the changes.
320+ yield self._run_lock.acquire()
321+
322+ try:
323+ # If the unit lifecycle isn't running we shouldn't process
324+ # any relation resolutions.
325+ if not self._running:
326+ self._log.debug("stop watch relation resolved changes")
327+ self._watching_relation_resolved = False
328+ raise StopWatcher()
329+
330+ self._log.info("processing relation resolved changed")
331+ if self._client.connected:
332+ yield self._process_relation_resolved_changes()
333+ finally:
334+ yield self._run_lock.release()
335+
336+ @inlineCallbacks
337+ def _process_relation_resolved_changes(self):
338+ """Invoke retry transitions on relations if their not running.
339+ """
340+ relation_resolved = yield self._unit.get_relation_resolved()
341+ if relation_resolved is None:
342+ returnValue(None)
343+ else:
344+ yield self._unit.clear_relation_resolved()
345+
346+ keys = set(relation_resolved).intersection(self._relations)
347+ for rel_id in keys:
348+ relation_workflow = self._relations[rel_id]
349+ relation_state = yield relation_workflow.get_state()
350+ if relation_state == "up":
351+ continue
352+ yield relation_workflow.transition_state("up")
353+
354+ @inlineCallbacks
355 def _on_service_relation_changes(self, old_relations, new_relations):
356 """Callback for service relation watching.
357
358@@ -153,7 +209,7 @@
359 # If the lifecycle is not running, then stop the watcher
360 if not self._running:
361 self._log.debug("stop service-rel watcher, discarding changes")
362- self._watching = False
363+ self._watching_relation_memberships = False
364 raise StopWatcher()
365
366 self._log.debug("processing relations changed")
367@@ -163,9 +219,9 @@
368
369 @inlineCallbacks
370 def _process_service_changes(self, old_relations, new_relations):
371- """Add and remove unit lifecycles per the service relations changes.
372+ """Add and remove unit lifecycles per the service relations Determine.
373 """
374- # Determine relation delta of global zk state with our memory state.
375+ # changes relation delta of global zk state with our memory state.
376 new_relations = dict([(service_relation.internal_relation_id,
377 service_relation) for
378 service_relation in new_relations])
379@@ -352,8 +408,11 @@
380 self._error_handler = handler
381
382 @inlineCallbacks
383- def start(self):
384+ def start(self, watches=True):
385 """Start watching related units and executing change hooks.
386+
387+ @param watches: boolean parameter denoting if relation watches
388+ should be started.
389 """
390 yield self._run_lock.acquire()
391 try:
392@@ -364,19 +423,24 @@
393 self._watcher = yield self._unit_relation.watch_related_units(
394 self._scheduler.notify_change)
395 # And start the watcher.
396- yield self._watcher.start()
397+ if watches:
398+ yield self._watcher.start()
399 finally:
400 self._run_lock.release()
401 self._log.debug(
402 "started relation:%s lifecycle", self._relation_name)
403
404 @inlineCallbacks
405- def stop(self):
406+ def stop(self, watches=True):
407 """Stop watching changes and stop executing relation change hooks.
408+
409+ @param watches: boolean parameter denoting if relation watches
410+ should be stopped.
411 """
412 yield self._run_lock.acquire()
413 try:
414- self._watcher.stop()
415+ if watches and self._watcher:
416+ self._watcher.stop()
417 self._scheduler.stop()
418 finally:
419 yield self._run_lock.release()
420
421=== modified file 'ensemble/unit/tests/test_lifecycle.py'
422--- ensemble/unit/tests/test_lifecycle.py 2011-05-05 09:41:11 +0000
423+++ ensemble/unit/tests/test_lifecycle.py 2011-05-05 17:43:23 +0000
424@@ -12,6 +12,8 @@
425 from ensemble.unit.lifecycle import (
426 UnitLifecycle, UnitRelationLifecycle, RelationInvoker)
427
428+from ensemble.unit.workflow import RelationWorkflowState
429+
430 from ensemble.hooks.invoker import Invoker
431 from ensemble.hooks.executor import HookExecutor
432
433@@ -19,9 +21,11 @@
434
435 from ensemble.state.endpoint import RelationEndpoint
436 from ensemble.state.relation import ClientServerUnitWatcher
437+from ensemble.state.service import NO_HOOKS
438 from ensemble.state.tests.test_relation import RelationTestBase
439 from ensemble.state.hook import RelationChange
440
441+
442 from ensemble.lib.testing import TestCase
443 from ensemble.lib.mocker import MATCH
444
445@@ -97,6 +101,8 @@
446 results.append(hook_name)
447 if debug:
448 print "-> exec hook", hook_name
449+ if d.called:
450+ return
451 if results == sequence:
452 d.callback(True)
453 if hook_name == name and count is None:
454@@ -145,6 +151,182 @@
455 return output
456
457
458+class LifecycleResolvedTest(LifecycleTestBase):
459+
460+ @inlineCallbacks
461+ def setUp(self):
462+ yield super(LifecycleResolvedTest, self).setUp()
463+ yield self.setup_default_test_relation()
464+ self.lifecycle = UnitLifecycle(
465+ self.client, self.states["unit"], self.states["service"],
466+ self.unit_directory, self.executor)
467+
468+ def get_unit_relation_workflow(self, states):
469+ state_dir = os.path.join(self.ensemble_directory, "state")
470+ lifecycle = UnitRelationLifecycle(
471+ self.client,
472+ states["unit_relation"],
473+ states["service_relation"].relation_name,
474+ self.unit_directory,
475+ self.executor)
476+
477+ workflow = RelationWorkflowState(
478+ self.client,
479+ states["unit_relation"],
480+ lifecycle,
481+ state_dir)
482+
483+ return (workflow, lifecycle)
484+
485+ @inlineCallbacks
486+ def test_resolved_relation_watch_unit_lifecycle_not_running(self):
487+ """If the unit is not running then no relation resolving is performed.
488+ However the resolution value remains the same.
489+ """
490+ # Start the unit.
491+ yield self.lifecycle.start()
492+
493+ # Wait for the relation to be started.... TODO: async background work
494+ yield self.sleep(0.1)
495+
496+ # Simulate relation down on an individual unit relation
497+ workflow = self.lifecycle.get_relation_workflow(
498+ self.states["unit_relation"].internal_relation_id)
499+ self.assertEqual("up", (yield workflow.get_state()))
500+
501+ yield workflow.transition_state("down")
502+ resolved = self.wait_on_state(workflow, "up")
503+
504+ # Stop the unit lifecycle
505+ yield self.lifecycle.stop()
506+
507+ # Set the relation to resolved
508+ yield self.states["unit"].set_relation_resolved(
509+ {self.states["unit_relation"].internal_relation_id: NO_HOOKS})
510+
511+ # Give a moment for the watch to fire erroneously
512+ yield self.sleep(0.2)
513+
514+ # Ensure we didn't attempt a transition.
515+ self.assertFalse(resolved.called)
516+ self.assertEqual(
517+ {self.states["unit_relation"].internal_relation_id: NO_HOOKS},
518+ (yield self.states["unit"].get_relation_resolved()))
519+
520+ # If the unit is restarted start, we currently have the
521+ # behavior that the unit relation workflow will automatically
522+ # be transitioned back to running, as part of the normal state
523+ # transition. Sigh.. we should have a separate error
524+ # state for relation hooks then down with state variable usage.
525+
526+ @inlineCallbacks
527+ def test_resolved_relation_watch_relation_up(self):
528+ """If a relation marked as to be resolved is already running,
529+ then no work is performed.
530+ """
531+ # Start the unit.
532+ yield self.lifecycle.start()
533+
534+ # Wait for the relation to be started.... TODO: async background work
535+ yield self.sleep(0.1)
536+
537+ # get a hold of the unit relation and verify state
538+ workflow = self.lifecycle.get_relation_workflow(
539+ self.states["unit_relation"].internal_relation_id)
540+ self.assertEqual("up", (yield workflow.get_state()))
541+
542+ # Set the relation to resolved
543+ yield self.states["unit"].set_relation_resolved(
544+ {self.states["unit_relation"].internal_relation_id: NO_HOOKS})
545+
546+ # Give a moment for the async background work.
547+ yield self.sleep(0.1)
548+
549+ # Ensure we're still up and the relation resolved setting has been
550+ # cleared.
551+ self.assertEqual(
552+ None, (yield self.states["unit"].get_relation_resolved()))
553+ self.assertEqual("up", (yield workflow.get_state()))
554+
555+ @inlineCallbacks
556+ def test_resolved_relation_watch_from_error(self):
557+ """Unit lifecycle's will process a unit relation resolved
558+ setting, and transition a down relation back to a running
559+ state.
560+ """
561+ log_output = self.capture_logging(
562+ "unit.lifecycle", level=logging.DEBUG)
563+
564+ # Start the unit.
565+ yield self.lifecycle.start()
566+
567+ # Wait for the relation to be started... TODO: async background work
568+ yield self.sleep(0.1)
569+
570+ # Simulate an error condition
571+ workflow = self.lifecycle.get_relation_workflow(
572+ self.states["unit_relation"].internal_relation_id)
573+ self.assertEqual("up", (yield workflow.get_state()))
574+ yield workflow.fire_transition("error")
575+
576+ resolved = self.wait_on_state(workflow, "up")
577+
578+ # Set the relation to resolved
579+ yield self.states["unit"].set_relation_resolved(
580+ {self.states["unit_relation"].internal_relation_id: NO_HOOKS})
581+
582+ # Wait for the relation to come back up
583+ value = yield self.states["unit"].get_relation_resolved()
584+
585+ yield resolved
586+
587+ # Verify state
588+ value = yield workflow.get_state()
589+ self.assertEqual(value, "up")
590+
591+ self.assertIn(
592+ "processing relation resolved changed", log_output.getvalue())
593+
594+ @inlineCallbacks
595+ def test_resolved_relation_watch(self):
596+ """Unit lifecycle's will process a unit relation resolved
597+ setting, and transition a down relation back to a running
598+ state.
599+ """
600+ log_output = self.capture_logging(
601+ "unit.lifecycle", level=logging.DEBUG)
602+
603+ # Start the unit.
604+ yield self.lifecycle.start()
605+
606+ # Wait for the relation to be started... TODO: async background work
607+ yield self.sleep(0.1)
608+
609+ # Simulate an error condition
610+ workflow = self.lifecycle.get_relation_workflow(
611+ self.states["unit_relation"].internal_relation_id)
612+ self.assertEqual("up", (yield workflow.get_state()))
613+ yield workflow.transition_state("down")
614+
615+ resolved = self.wait_on_state(workflow, "up")
616+
617+ # Set the relation to resolved
618+ yield self.states["unit"].set_relation_resolved(
619+ {self.states["unit_relation"].internal_relation_id: NO_HOOKS})
620+
621+ # Wait for the relation to come back up
622+ value = yield self.states["unit"].get_relation_resolved()
623+
624+ yield resolved
625+
626+ # Verify state
627+ value = yield workflow.get_state()
628+ self.assertEqual(value, "up")
629+
630+ self.assertIn(
631+ "processing relation resolved changed", log_output.getvalue())
632+
633+
634 class UnitLifecycleTest(LifecycleTestBase):
635
636 @inlineCallbacks
637@@ -187,6 +369,45 @@
638 # verify the sockets are cleaned up.
639 self.assertEqual(os.listdir(self.unit_directory), ["formula"])
640
641+ @inlineCallbacks
642+ def test_start_sans_hook(self):
643+ """The lifecycle start can be invoked without firing hooks."""
644+ self.write_hook("start", "#!/bin/sh\n exit 1")
645+ start_executed = self.wait_on_hook("start")
646+ yield self.lifecycle.start(fire_hooks=False)
647+ # Wait for unit relation background processing....
648+ yield self.sleep(0.1)
649+ self.assertFalse(start_executed.called)
650+
651+ @inlineCallbacks
652+ def test_stop_sans_hook(self):
653+ """The lifecycle stop can be invoked without firing hooks."""
654+ self.write_hook("stop", "#!/bin/sh\n exit 1")
655+ stop_executed = self.wait_on_hook("stop")
656+ yield self.lifecycle.start()
657+ yield self.lifecycle.stop(fire_hooks=False)
658+ # Wait for unit relation background processing....
659+ yield self.sleep(0.1)
660+ self.assertFalse(stop_executed.called)
661+
662+ @inlineCallbacks
663+ def test_install_sans_hook(self):
664+ """The lifecycle install can be invoked without firing hooks."""
665+ self.write_hook("install", "#!/bin/sh\n exit 1")
666+ install_executed = self.wait_on_hook("install")
667+ yield self.lifecycle.install(fire_hooks=False)
668+ self.assertFalse(install_executed.called)
669+
670+ @inlineCallbacks
671+ def test_upgrade_sans_hook(self):
672+ """The lifecycle upgrade can be invoked without firing hooks."""
673+ self.executor.stop()
674+ self.write_hook("upgrade-formula", "#!/bin/sh\n exit 1")
675+ upgrade_executed = self.wait_on_hook("upgrade-formula")
676+ yield self.lifecycle.upgrade_formula(fire_hooks=False)
677+ self.assertFalse(upgrade_executed.called)
678+ self.assertTrue(self.executor.running)
679+
680 def test_hook_error(self):
681 """Verify hook execution error, raises an exception."""
682 self.write_hook("install", '#!/bin/sh\n exit 1')
683@@ -196,14 +417,12 @@
684 def test_hook_not_executable(self):
685 """A hook not executable, raises an exception."""
686 self.write_hook("install", '#!/bin/sh\n exit 0', no_exec=True)
687- # It would be preferrable if this was also a formulainvocation error.
688 return self.failUnlessFailure(
689 self.lifecycle.install(), FormulaError)
690
691 def test_hook_not_formatted_correctly(self):
692 """Hook execution error, raises an exception."""
693 self.write_hook("install", '!/bin/sh\n exit 0')
694- # It would be preferrable if this was also a formulainvocation error.
695 return self.failUnlessFailure(
696 self.lifecycle.install(), FormulaInvocationError)
697
698@@ -532,7 +751,7 @@
699 @inlineCallbacks
700 def test_initial_start_lifecycle_no_related_no_exec(self):
701 """
702- If there are no related units on startup, the relation changed hook
703+ If there are no related units on startup, the relation joined hook
704 is not invoked.
705 """
706 file_path = self.makeFile()
707@@ -545,6 +764,26 @@
708 self.assertFalse(os.path.exists(file_path))
709
710 @inlineCallbacks
711+ def test_stop_can_continue_watching(self):
712+ """
713+ """
714+ file_path = self.makeFile()
715+ self.write_hook(
716+ "%s-relation-changed" % self.relation_name,
717+ ("#!/bin/bash\n" "echo executed >> %s\n" % file_path))
718+ rel_states = yield self.add_opposite_service_unit(self.states)
719+ yield self.lifecycle.start()
720+ yield self.wait_on_hook(
721+ sequence=["app-relation-joined", "app-relation-changed"])
722+ changed_executed = self.wait_on_hook("app-relation-changed")
723+ yield self.lifecycle.stop(watches=False)
724+ rel_states["unit_relation"].set_data(yaml.dump(dict(hello="world")))
725+ yield self.sleep(0.1)
726+ self.assertFalse(changed_executed.called)
727+ yield self.lifecycle.start(watches=False)
728+ yield changed_executed
729+
730+ @inlineCallbacks
731 def test_initial_start_lifecycle_with_related(self):
732 """
733 If there are related units on startup, the relation changed hook
734
735=== modified file 'ensemble/unit/tests/test_workflow.py'
736--- ensemble/unit/tests/test_workflow.py 2011-05-05 14:31:43 +0000
737+++ ensemble/unit/tests/test_workflow.py 2011-05-05 17:43:23 +0000
738@@ -83,11 +83,30 @@
739 self.assertFalse(result)
740 current_state = yield self.workflow.get_state()
741 yield self.assertEqual(current_state, "install_error")
742- self.write_hook("install", "#!/bin/bash\necho hello\n")
743 result = yield self.workflow.fire_transition("retry_install")
744 yield self.assertState(self.workflow, "installed")
745
746 @inlineCallbacks
747+ def test_install_error_with_retry_hook(self):
748+ """If the install hook fails, the workflow is transition to the
749+ install_error state.
750+ """
751+ self.write_hook("install", "#!/bin/bash\nexit 1")
752+ result = yield self.workflow.fire_transition("install")
753+ self.assertFalse(result)
754+ current_state = yield self.workflow.get_state()
755+ yield self.assertEqual(current_state, "install_error")
756+
757+ result = yield self.workflow.fire_transition("retry_install_hook")
758+ yield self.assertState(self.workflow, "install_error")
759+
760+ self.write_hook("install", "#!/bin/bash\necho hello\n")
761+ hook_deferred = self.wait_on_hook("install")
762+ result = yield self.workflow.fire_transition_alias("retry_hook")
763+ yield hook_deferred
764+ yield self.assertState(self.workflow, "installed")
765+
766+ @inlineCallbacks
767 def test_start(self):
768 file_path = self.makeFile()
769 self.write_hook(
770@@ -131,12 +150,41 @@
771 current_state = yield self.workflow.get_state()
772 self.assertEqual(current_state, "start_error")
773
774- self.write_hook("start", "#!/bin/bash\necho hello\n")
775 result = yield self.workflow.fire_transition("retry_start")
776 yield self.assertState(self.workflow, "started")
777- # If we don't stop, we'll end up with the relation lifecycle
778- # watches firing in the background when the test stops.
779- self.write_hook("stop", "#!/bin/bash\necho hello\n")
780+
781+ # If we don't stop, we'll end up with the relation lifecycle
782+ # watches firing in the background when the test stops.
783+ result = yield self.workflow.fire_transition("stop")
784+ yield self.assertState(self.workflow, "stopped")
785+
786+ @inlineCallbacks
787+ def test_start_error_with_retry_hook(self):
788+ """Executing the start transition with a hook error, results in the
789+ workflow going to the start_error state. The start can be retried.
790+ """
791+ self.write_hook("install", "#!/bin/bash\necho hello\n")
792+ result = yield self.workflow.fire_transition("install")
793+ self.assertTrue(result)
794+ self.write_hook("start", "#!/bin/bash\nexit 1")
795+ result = yield self.workflow.fire_transition("start")
796+ self.assertFalse(result)
797+ current_state = yield self.workflow.get_state()
798+ self.assertEqual(current_state, "start_error")
799+
800+ hook_deferred = self.wait_on_hook("start")
801+ result = yield self.workflow.fire_transition("retry_start_hook")
802+ yield hook_deferred
803+ yield self.assertState(self.workflow, "start_error")
804+
805+ self.write_hook("start", "#!/bin/bash\nexit 0")
806+ hook_deferred = self.wait_on_hook("start")
807+ result = yield self.workflow.fire_transition_alias("retry_hook")
808+ yield hook_deferred
809+ yield self.assertState(self.workflow, "started")
810+
811+ # If we don't stop, we'll end up with the relation lifecycle
812+ # watches firing in the background when the test stops.
813 result = yield self.workflow.fire_transition("stop")
814 yield self.assertState(self.workflow, "stopped")
815
816@@ -193,13 +241,49 @@
817 yield self.assertState(self.workflow, "configure_error")
818
819 # Verify recovery from error state
820+ result = yield self.workflow.fire_transition_alias("retry")
821+ self.assertTrue(result)
822+ yield self.assertState(self.workflow, "started")
823+
824+ # Stop any background processing
825+ yield self.workflow.fire_transition("stop")
826+
827+ @inlineCallbacks
828+ def test_configure_error_and_retry_hook(self):
829+ """An error while configuring, transitions the unit and
830+ stops the lifecycle."""
831+ #self.capture_output()
832+ yield self.workflow.fire_transition("install")
833+ result = yield self.workflow.fire_transition("start")
834+ self.assertTrue(result)
835+ self.assertState(self.workflow, "started")
836+
837+ # Verify transition to error state
838+ hook_deferred = self.wait_on_hook("config-changed")
839+ self.write_hook("config-changed", "#!/bin/bash\nexit 1")
840+ result = yield self.workflow.fire_transition("reconfigure")
841+ yield hook_deferred
842+ self.assertFalse(result)
843+ yield self.assertState(self.workflow, "configure_error")
844+
845+ # Verify retry hook with hook error stays in error state
846+ hook_deferred = self.wait_on_hook("config-changed")
847+ result = yield self.workflow.fire_transition("retry_configure_hook")
848+
849+ self.assertFalse(result)
850+ yield hook_deferred
851+ yield self.assertState(self.workflow, "configure_error")
852+
853 hook_deferred = self.wait_on_hook("config-changed")
854 self.write_hook("config-changed", "#!/bin/bash\nexit 0")
855- result = yield self.workflow.fire_transition("retry_configure")
856+ result = yield self.workflow.fire_transition_alias("retry_hook")
857 yield hook_deferred
858- self.assertTrue(result)
859 yield self.assertState(self.workflow, "started")
860
861+ # Stop any background processing
862+ yield self.workflow.fire_transition("stop")
863+ yield self.sleep(0.1)
864+
865 @inlineCallbacks
866 def test_upgrade(self):
867 """Upgrading a workflow results in the upgrade hook being
868@@ -235,7 +319,7 @@
869 yield self.workflow.fire_transition("stop")
870
871 @inlineCallbacks
872- def test_upgrade_error_state(self):
873+ def test_upgrade_error_retry(self):
874 """A hook error during an upgrade transitions to
875 upgrade_error.
876 """
877@@ -259,6 +343,41 @@
878 yield self.workflow.fire_transition("retry_upgrade_formula")
879 current_state = yield self.workflow.get_state()
880 self.assertEqual(current_state, "started")
881+
882+ # Stop any background activity
883+ yield self.workflow.fire_transition("stop")
884+
885+ @inlineCallbacks
886+ def test_upgrade_error_retry_hook(self):
887+ """A hook error during an upgrade transitions to
888+ upgrade_error, and can be re-tried with hook execution.
889+ """
890+ yield self.workflow.fire_transition("install")
891+ yield self.workflow.fire_transition("start")
892+ current_state = yield self.workflow.get_state()
893+ self.assertEqual(current_state, "started")
894+
895+ # Agent prepares this.
896+ self.executor.stop()
897+
898+ self.write_hook("upgrade-formula", "#!/bin/bash\nexit 1")
899+ hook_deferred = self.wait_on_hook("upgrade-formula")
900+ yield self.workflow.fire_transition("upgrade_formula")
901+ yield hook_deferred
902+ current_state = yield self.workflow.get_state()
903+ self.assertEqual(current_state, "formula_upgrade_error")
904+
905+ hook_deferred = self.wait_on_hook("upgrade-formula")
906+ self.write_hook("upgrade-formula", "#!/bin/bash\nexit 0")
907+ # The upgrade error hook should ensure that the executor is stoppped.
908+ self.assertFalse(self.executor.running)
909+ yield self.workflow.fire_transition_alias("retry_hook")
910+ yield hook_deferred
911+ current_state = yield self.workflow.get_state()
912+ self.assertEqual(current_state, "started")
913+ self.assertTrue(self.executor.running)
914+
915+ # Stop any background activity
916 yield self.workflow.fire_transition("stop")
917
918 @inlineCallbacks
919@@ -301,13 +420,36 @@
920 self.assertTrue(result)
921 result = yield self.workflow.fire_transition("start")
922 self.assertTrue(result)
923+
924 self.write_hook("stop", "#!/bin/bash\nexit 1")
925 result = yield self.workflow.fire_transition("stop")
926 self.assertFalse(result)
927- current_state = yield self.workflow.get_state()
928- self.assertEqual(current_state, "stop_error")
929+
930+ yield self.assertState(self.workflow, "stop_error")
931 self.write_hook("stop", "#!/bin/bash\necho hello\n")
932 result = yield self.workflow.fire_transition("retry_stop")
933+
934+ yield self.assertState(self.workflow, "stopped")
935+
936+ @inlineCallbacks
937+ def test_stop_error_with_retry_hook(self):
938+ self.write_hook("install", "#!/bin/bash\necho hello\n")
939+ self.write_hook("start", "#!/bin/bash\necho hello\n")
940+ result = yield self.workflow.fire_transition("install")
941+ self.assertTrue(result)
942+ result = yield self.workflow.fire_transition("start")
943+ self.assertTrue(result)
944+
945+ self.write_hook("stop", "#!/bin/bash\nexit 1")
946+ result = yield self.workflow.fire_transition("stop")
947+ self.assertFalse(result)
948+ yield self.assertState(self.workflow, "stop_error")
949+
950+ result = yield self.workflow.fire_transition_alias("retry_hook")
951+ yield self.assertState(self.workflow, "stop_error")
952+
953+ self.write_hook("stop", "#!/bin/bash\nexit 0")
954+ result = yield self.workflow.fire_transition_alias("retry_hook")
955 yield self.assertState(self.workflow, "stopped")
956
957 @inlineCallbacks
958@@ -447,7 +589,7 @@
959 # Add a new unit, and wait for the broken hook to result in
960 # the transition to the down state.
961 yield self.add_opposite_service_unit(self.states)
962- yield self.wait_on_state(self.workflow, "down")
963+ yield self.wait_on_state(self.workflow, "error")
964
965 f_state, history, zk_state = yield self.read_persistent_state(
966 history_id=self.workflow.zk_state_id)
967@@ -458,7 +600,7 @@
968 "formula", "hooks", "app-relation-changed"))
969
970 self.assertEqual(f_state,
971- {"state": "down",
972+ {"state": "error",
973 "state_variables": {
974 "change_type": "joined",
975 "error_message": error}})
976
977=== modified file 'ensemble/unit/workflow.py'
978--- ensemble/unit/workflow.py 2011-05-05 14:31:43 +0000
979+++ ensemble/unit/workflow.py 2011-05-05 17:43:23 +0000
980@@ -14,31 +14,56 @@
981
982
983 UnitWorkflow = Workflow(
984+ # Install transitions
985 Transition("install", "Install", None, "installed",
986 error_transition_id="error_install"),
987- Transition("error_install", "Install Error", None, "install_error"),
988- Transition("retry_install", "Retry Install", "install_error", "installed"),
989+ Transition("error_install", "Install error", None, "install_error"),
990+
991+ Transition("retry_install", "Retry install", "install_error", "installed",
992+ alias="retry"),
993+ Transition("retry_install_hook", "Retry install with hook",
994+ "install_error", "installed", alias="retry_hook"),
995+
996+ # Start transitions
997 Transition("start", "Start", "installed", "started",
998 error_transition_id="error_start"),
999- Transition("error_start", "Start Error", "installed", "start_error"),
1000- Transition("retry_start", "Retry Start", "start_error", "started"),
1001+ Transition("error_start", "Start error", "installed", "start_error"),
1002+ Transition("retry_start", "Retry start", "start_error", "started",
1003+ alias="retry"),
1004+ Transition("retry_start_hook", "Retry start with hook",
1005+ "start_error", "started", alias="retry_hook"),
1006+
1007+ # Stop transitions
1008 Transition("stop", "Stop", "started", "stopped",
1009 error_transition_id="error_stop"),
1010- Transition("error_stop", "Stop Error", "started", "stop_error"),
1011- Transition("retry_stop", "Retry Stop", "stop_error", "stopped"),
1012-
1013- # Upgrade Transitions (stay in state, with success transitition)
1014+ Transition("error_stop", "Stop error", "started", "stop_error"),
1015+ Transition("retry_stop", "Retry stop", "stop_error", "stopped",
1016+ alias="retry"),
1017+ Transition("retry_stop_hook", "Retry stop with hook",
1018+ "stop_error", "stopped", alias="retry_hook"),
1019+
1020+ # Restart transitions
1021+ Transition("restart", "Restart", "stop", "start",
1022+ error_transition_id="error_start", alias="retry"),
1023+ Transition("restart_with_hook", "Restart with hook",
1024+ "stop", "start", alias="retry_hook",
1025+ error_transition_id="error_start"),
1026+
1027+ # Upgrade transitions
1028 Transition(
1029 "upgrade_formula", "Upgrade", "started", "started",
1030 error_transition_id="upgrade_formula_error"),
1031 Transition(
1032- "upgrade_formula_error", "On upgrade error",
1033+ "upgrade_formula_error", "Upgrade from stop error",
1034 "started", "formula_upgrade_error"),
1035 Transition(
1036- "retry_upgrade_formula", "Retry failed upgrade",
1037- "formula_upgrade_error", "started"),
1038+ "retry_upgrade_formula", "Upgrade from stop error",
1039+ "formula_upgrade_error", "started", alias="retry"),
1040+ Transition(
1041+ "retry_upgrade_formula_hook", "Upgrade from stop error with hook",
1042+ "formula_upgrade_error", "started", alias="retry_hook"),
1043
1044- # Configuration Transitions (stay in state, with success transition)
1045+ # Configuration Transitions
1046 Transition(
1047 "reconfigure", "Reconfigure", "started", "started",
1048 error_transition_id="error_configure"),
1049@@ -46,28 +71,56 @@
1050 "error_configure", "On configure error",
1051 "started", "configure_error"),
1052 Transition(
1053+ "retry_error", "On retry configure error",
1054+ "configure_error", "configure_error"),
1055+ Transition(
1056 "retry_configure", "Retry configure",
1057- "configure_error", "started")
1058+ "configure_error", "started", alias="retry",
1059+ error_transition_id="retry_error"),
1060+ Transition(
1061+ "retry_configure_hook", "Retry configure with hooks",
1062+ "configure_error", "started", alias="retry_hook",
1063+ error_transition_id="retry_error")
1064 )
1065
1066
1067-# There's been some discussion, if we should have per change type error states
1068-# here, corresponding to the different changes that the relation-changed hook
1069-# is invoked for. The important aspects to capture are both observability of
1070-# error type locally and globally (zk), and per error type and instance
1071-# recovery of the same. To provide for this functionality without additional
1072-# states, the error information (change type, and error message) are captured
1073-# in state variables which are locally and globally observable. Future
1074-# extension of the restart transition action, will allow for customized
1075-# recovery based on the change type state variable. Effectively this
1076-# differs from the unit definition, in that it collapses three possible error
1077-# states, into a behavior off switch. A separate state will be needed to
1078-# denote departing.
1079+# Unit relation error states
1080+#
1081+# There's been some discussion, if we should have per change type
1082+# error states here, corresponding to the different changes that the
1083+# relation-changed hook is invoked for. The important aspects to
1084+# capture are both observability of error type locally and globally
1085+# (zk), and per error type and instance recovery of the same. To
1086+# provide for this functionality without additional states, the error
1087+# information (change type, and error message) are captured in state
1088+# variables which are locally and globally observable. Future
1089+# extension of the restart transition action, will allow for
1090+# customized recovery based on the change type state
1091+# variable. Effectively this differs from the unit definition, in that
1092+# it collapses three possible error states, into a behavior off
1093+# switch. A separate state will be needed to denote departing.
1094+
1095+
1096+# Process recovery using on disk workflow state
1097+#
1098+# Another interesting issue, process recovery using the on disk state,
1099+# is complicated by consistency to the the in memory state, which
1100+# won't be directly recoverable anymore without some state specific
1101+# semantics to recovering from on disk state, ie a restarted unit
1102+# agent, with a relation in an error state would require special
1103+# semantics around loading from disk to ensure that the in-memory
1104+# process state (watching and scheduling but not executing) matches
1105+# the recovery transition actions (which just restart hook execution,
1106+# but assume the watch continues).. this functionality added to better
1107+# allow for the behavior that while down due to a hook error, the
1108+# relation would continues to schedule pending hooks
1109
1110 RelationWorkflow = Workflow(
1111 Transition("start", "Start", None, "up"),
1112 Transition("stop", "Stop", "up", "down"),
1113- Transition("restart", "Restart", "down", "up"),
1114+ Transition("restart", "Restart", "down", "up", alias="retry"),
1115+ Transition("error", "Relation hook error", "up", "error"),
1116+ Transition("reset", "Recover from hook error", "error", "up"),
1117 Transition("depart", "Relation broken", "up", "departed"),
1118 Transition("down_depart", "Relation broken", "down", "departed"),
1119 )
1120@@ -223,7 +276,6 @@
1121 row per entry with CSV escaping.
1122 """
1123 state_serialized = yaml.safe_dump(state_dict)
1124-
1125 # State File
1126 with open(self.state_file_path, "w") as handle:
1127 handle.write(state_serialized)
1128@@ -243,6 +295,7 @@
1129 return {"state": None}
1130 with open(self.state_file_path, "r") as handle:
1131 content = handle.read()
1132+
1133 return yaml.load(content)
1134
1135
1136@@ -282,51 +335,77 @@
1137 self._lifecycle = lifecycle
1138
1139 @inlineCallbacks
1140- def _invoke_lifecycle(self, method):
1141+ def _invoke_lifecycle(self, method, *args, **kw):
1142 try:
1143- result = yield method()
1144+ result = yield method(*args, **kw)
1145 except (FileNotFound, FormulaError, FormulaInvocationError), e:
1146 raise TransitionError(e)
1147 returnValue(result)
1148
1149- # Transition Actions
1150+ # Install transitions
1151 def do_install(self):
1152 return self._invoke_lifecycle(self._lifecycle.install)
1153
1154+ def do_retry_install(self):
1155+ return self._invoke_lifecycle(self._lifecycle.install,
1156+ fire_hooks=False)
1157+
1158+ def do_retry_install_hook(self):
1159+ return self._invoke_lifecycle(self._lifecycle.install)
1160+
1161+ # Start transitions
1162 def do_start(self):
1163 return self._invoke_lifecycle(self._lifecycle.start)
1164
1165- def do_stop(self):
1166- return self._invoke_lifecycle(self._lifecycle.stop)
1167-
1168 def do_retry_start(self):
1169+ return self._invoke_lifecycle(self._lifecycle.start,
1170+ fire_hooks=False)
1171+
1172+ def do_retry_start_hook(self):
1173 return self._invoke_lifecycle(self._lifecycle.start)
1174
1175+ # Stop transitions
1176+ def do_stop(self):
1177+ return self._invoke_lifecycle(self._lifecycle.stop)
1178+
1179 def do_retry_stop(self):
1180- self._invoke_lifecycle(self._lifecycle.stop)
1181-
1182- def do_retry_install(self):
1183- return self._invoke_lifecycle(self._lifecycle.install)
1184+ return self._invoke_lifecycle(self._lifecycle.stop,
1185+ fire_hooks=False)
1186+
1187+ def do_retry_stop_hook(self):
1188+ return self._invoke_lifecycle(self._lifecycle.stop)
1189+
1190+ # Upgrade transititions
1191+ def do_upgrade_formula(self):
1192+ return self._invoke_lifecycle(self._lifecycle.upgrade_formula)
1193
1194 def do_retry_upgrade_formula(self):
1195- return self._invoke_lifecycle(self._lifecycle.upgrade_formula)
1196-
1197- def do_upgrade_formula(self):
1198- return self._invoke_lifecycle(self._lifecycle.upgrade_formula)
1199-
1200- # Some of this needs support from the resolved branches, as we
1201- # want to fire some of these lifecycle methods sans hooks.
1202+ return self._invoke_lifecycle(self._lifecycle.upgrade_formula,
1203+ fire_hooks=False)
1204+
1205+ def do_retry_upgrade_formula_hook(self):
1206+ return self._invoke_lifecycle(self._lifecycle.upgrade_formula)
1207+
1208+ # Config transitions
1209 def do_error_configure(self):
1210- # self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False)
1211- pass
1212+ return self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False)
1213
1214 def do_reconfigure(self):
1215 return self._invoke_lifecycle(self._lifecycle.configure)
1216
1217+ def do_retry_error(self):
1218+ return self._invoke_lifecycle(self._lifecycle.stop, fire_hooks=False)
1219+
1220+ @inlineCallbacks
1221 def do_retry_configure(self):
1222- # self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False)
1223- self._invoke_lifecycle(
1224- self._lifecycle.configure) # fire_hooks=False)
1225+ yield self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False)
1226+ yield self._invoke_lifecycle(self._lifecycle.configure,
1227+ fire_hooks=False)
1228+
1229+ @inlineCallbacks
1230+ def do_retry_configure_hook(self):
1231+ yield self._invoke_lifecycle(self._lifecycle.start, fire_hooks=False)
1232+ yield self._invoke_lifecycle(self._lifecycle.configure)
1233
1234
1235 class RelationWorkflowState(DiskWorkflowState):
1236@@ -360,7 +439,7 @@
1237
1238 @param: error: The error from hook invocation.
1239 """
1240- yield self.fire_transition("stop",
1241+ yield self.fire_transition("error",
1242 change_type=relation_change.change_type,
1243 error_message=str(error))
1244
1245@@ -369,12 +448,30 @@
1246 """Transition the workflow to the 'down' state.
1247
1248 Turns off the unit-relation lifecycle monitoring and hook execution.
1249+
1250+ :param error_info: If called on relation hook error, contains
1251+ error variables.
1252 """
1253 yield self._lifecycle.stop()
1254
1255 @inlineCallbacks
1256+ def do_reset(self):
1257+ """Transition the workflow to the 'up' state from an error state.
1258+
1259+ Turns on the unit-relation lifecycle monitoring and hook execution.
1260+ """
1261+ yield self._lifecycle.start(watches=False)
1262+
1263+ @inlineCallbacks
1264+ def do_error(self, **error_info):
1265+ """A relation hook error, stops further execution hooks but
1266+ continues to watch for changes.
1267+ """
1268+ yield self._lifecycle.stop(watches=False)
1269+
1270+ @inlineCallbacks
1271 def do_restart(self):
1272- """Transition the workflow to the 'up' state.
1273+ """Transition the workflow to the 'up' state from the down state.
1274
1275 Turns on the unit-relation lifecycle monitoring and hook execution.
1276 """

Subscribers

People subscribed via source and target branches

to status/vote changes: