Merge lp:~rvb/maas/retry-power-changes into lp:~maas-committers/maas/trunk
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 2595
Proposed branch: lp:~rvb/maas/retry-power-changes
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 424 lines (+331/-17), 4 files modified
  src/provisioningserver/rpc/clusterservice.py (+5/-5)
  src/provisioningserver/rpc/power.py (+83/-0)
  src/provisioningserver/rpc/tests/test_clusterservice.py (+11/-12)
  src/provisioningserver/rpc/tests/test_power.py (+232/-0)
To merge this branch: bzr merge lp:~rvb/maas/retry-power-changes
Related bugs: (none)

Reviewer | Review Type | Status
---|---|---
Gavin Panella (community) | | Needs Fixing
Julian Edwards (community) | | Approve

Review via email: mp+227884@code.launchpad.net
Commit message
Add utility to retry powering up or down a node.
Description of the change
This is part of the robustness work.
Between retries, the code waits a little longer each time. This is to "avoid" a race condition where a node gets power-cycled and is still in the process of being power-cycled when the power state check occurs. This is obviously a bit fragile, but it's better than nothing.
I deliberately didn't use the retries() utility that we have because: a) I don't think the waiting time should be constant (to avoid the race condition mentioned above); b) I didn't want to use a fixed timeout: some power templates take a long time to execute, and that shouldn't mean they can't be retried.
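The retry-with-growing-waits strategy described above can be sketched in plain Python. This is an illustrative sketch, not the branch's code: the real implementation is Twisted-based and asynchronous, and the names `set_power` and `query_power` are invented stand-ins for the power-template calls.

```python
import time

def change_with_retries(set_power, query_power, target, waits=(3, 5, 10),
                        sleep=time.sleep):
    """Try to put a node into `target` power state ('on' or 'off').

    After each attempt, wait an increasing amount of time before
    checking whether the change took effect; give up after the last
    attempt. Returns True on success, False otherwise.
    """
    for waiting_time in waits:
        set_power(target)              # issue the 'on'/'off' command
        sleep(waiting_time)            # let the node settle (3s, 5s, 10s)
        if query_power() == target:    # did the change stick?
            return True
    return False
```

Injecting `sleep` makes the waits testable without real delays, which is essentially the `clock` argument discussed later in this review.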
Raphaël Badin (rvb) wrote:
Thanks for the review. First batch of changes published. I'm working on your suggestion to use Clock() now.
Julian Edwards (julian-edwards) wrote:
On Wednesday 23 Jul 2014 12:23:49 you wrote:
> Not dealing with errors was deliberate: I don't want the retry loop to catch
> failures at all. Failures should only happen when there is a problem
> executing the power template (syntax problem, missing argument, etc.) and I
> don't think we should catch any of those. If something inside
> perform_power_change blows up, Twisted won't swallow it, right? It will
> simply be re-raised from 'yield deferToThread(…)', correct?
I'm afraid I don't agree with this approach. :)
Yes, Twisted internally turns it into a Failure, and then yield will re-raise
the exception it contains.
We've had instances in the past where we've had support requests for "power
not working" which was caused by a dodgy template that someone had edited, and
they were not looking at the cluster log at all!
I really think that we should catch these and flag them up somewhere.
> That's not part of the interface (yet). I think the node event log will
> contain the explanation of why it failed but we can always refine this
> later based on actual user testing.
OK.
I'd like to see a failure_reason text on the node for this situation. The
user should be able to see this alongside the BROKEN status IMO.
> Fixed (I didn't realize we had so many matchers to deal with mocks). Here
> and elsewhere.
They're great, add more :)
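The Failure behaviour Julian confirms above can be seen outside Twisted as well. A minimal analogy using the stdlib's concurrent.futures (not MAAS code, and `dodgy_template` is invented): an exception raised in a worker thread is captured and re-raised at the point where the result is collected, just as a Failure from deferToThread is re-raised at the `yield`.

```python
from concurrent.futures import ThreadPoolExecutor

def dodgy_template():
    # Stand-in for a power template that blows up when executed.
    raise RuntimeError("bad power template")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(dodgy_template)
    try:
        future.result()  # the worker's exception is re-raised here
        caught = None
    except RuntimeError as err:
        caught = err
```

The point of the review discussion is that without a try/except at the collection point, the traceback only lands in the cluster log, where users were not looking.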
Raphaël Badin (rvb) wrote:
On 07/23/2014 02:33 PM, Julian Edwards wrote:
> On Wednesday 23 Jul 2014 12:23:49 you wrote:
>> Not dealing with errors was deliberate: I don't want the retry loop to catch
>> failures at all. Failures should only happen when there is a problem
>> executing the power template (syntax problem, missing argument, etc.) and I
>> don't think we should catch any of those. If something inside
>> perform_power_change blows up, Twisted won't swallow it, right? It will
>> simply be re-raised from 'yield deferToThread(…)', correct?
>
> I'm afraid I don't agree with this approach. :)
>
> Yes, Twisted internally turns it into a Failure, and then yield will re-raise
> the exception it contains.
>
> We've had instances in the past where we've had support requests for "power
> not working" which was caused by a dodgy template that someone had edited, and
> they were not looking at the cluster log at all!
Hum, I still think the stacktrace belongs in the log file… but one thing
is sure (and it means that, like you said, we need to take care of this
case): we want the node marked as broken when the template execution
blows up.
>> Fixed (I didn't realize we had so many matchers to deal with mocks). Here
>> and elsewhere.
>
> They're great, add more :)
The errors I'm getting aren't so great sometimes:

    [...]
    raise mismatch_error
    MismatchError: !=:
    reference = [call(context-…, power_change=…)]
    actual = [call(context-…, power_change=…)]
    : after <function get_mock_calls> on <Mock name='mock(…' id='14054746274…'>
Julian Edwards (julian-edwards) wrote:
On Wednesday 23 Jul 2014 12:48:24 you wrote:
> Hum, I still think the stacktrace belongs to the log file… but one thing
> is sure (and means that, like you said, we need to take care of this
> case): we want the node marked as broken when the template execution
> blows up.
There's no point putting a stack trace in the log for template errors. Other
errors, yes.
Julian Edwards (julian-edwards) wrote:
Getting there! Couple of comments inline
Raphaël Badin (rvb) wrote:
Thanks for the review!
Julian Edwards (julian-edwards) wrote:
review: approve
On Thursday 24 Jul 2014 10:36:00 you wrote:
> Turns out simply using 'yield power.change_power_state(…)' (instead of
> calling clock.callLater) is enough to remove the clock.advance(0.1). I
> /think/ this is due to the fact that all the waiting done inside
> power.change_power_state goes through the clock passed into the method,
> so the initial yield ensures the first run is done (and thus the
> first check succeeds) and after that, you can control the waiting by using
> clock.advance.
\o/
Awesome.
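The Clock() technique being celebrated here can be mimicked without Twisted. Below is a rough, illustrative stand-in for twisted.internet.task.Clock (this `FakeClock` is not the real API): callbacks scheduled against it fire only when the test advances time explicitly, which is what makes pause durations assertable.

```python
class FakeClock:
    """A deterministic clock: scheduled callbacks run only via advance()."""

    def __init__(self):
        self.now = 0.0
        self.pending = []  # (due_time, callback) pairs, like callLater

    def callLater(self, delay, callback):
        self.pending.append((self.now + delay, callback))

    def advance(self, amount):
        # Move time forward and fire everything that has come due.
        self.now += amount
        due = [item for item in self.pending if item[0] <= self.now]
        for item in due:
            self.pending.remove(item)
            item[1]()

fired = []
clock = FakeClock()
clock.callLater(5, lambda: fired.append("waited 5s"))
clock.advance(4)  # not yet due: nothing fires
clock.advance(1)  # now 5 "seconds" have passed and the callback runs
```

This is the same control flow the branch's pause-testing relies on: advance the clock by exactly the expected waiting time and assert what ran.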
Gavin Panella (allenap) wrote:
A couple of things I noticed that mean this branch needs fixing. Also, perform_…
Julian Edwards (julian-edwards) wrote:
On Sunday 27 Jul 2014 11:47:18 you wrote:
> Same here, mark_node_broken() does not exist.
>
> I think this highlights a shortcoming with using Mock: it lets you do
> anything. A couple of things come to mind that might help here:
>
> 1. Don't mock the whole client object, just mock __call__.
> 2. Use mock.create_autospec so the mock respects what the
> underlying implementation does (so you can mock the whole client).
*cough* end-to-end fixture *cough*
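The Mock shortcoming under discussion is easy to demonstrate with the mock library itself. In this sketch, `Client` is a made-up stand-in for the region RPC client, not MAAS code; the import path is the Python 3 stdlib location (the branch itself uses the standalone `mock` distribution):

```python
from unittest.mock import Mock, create_autospec

class Client:
    """Made-up stand-in for the region RPC client."""
    def __call__(self, command, **arguments):
        return {}

loose = Mock()
loose.mark_node_broken("node-1")  # no such method, but Mock accepts it silently

strict = create_autospec(Client, instance=True)
try:
    strict.mark_node_broken("node-1")  # autospec rejects the bogus call
    error = None
except AttributeError as err:
    error = err
```

This is exactly the mistake the review caught: a test against a plain Mock happily "passed" while calling a method the real client does not have.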
Gavin Panella (allenap) wrote:
> *cough* end-to-end fixture *cough*
This would mean configuring Django and creating databases in the pserv tests. The pserv tests are currently fast and quick to iterate on, and they won't be if we do that.
We actually need to test against the RPC schemas. That's why I've been doing my utmost to keep them tightly defined. The call_responder helper function does that back-and-forth through the schemas. Where we need to stub out actual RPC calls we ought to use call_responder to keep us honest.
Julian Edwards (julian-edwards) wrote:
On Sunday 27 Jul 2014 13:49:06 Gavin Panella wrote:
> > *cough* end-to-end fixture *cough*
>
> This would mean configuring Django and creating databases in the pserv
> tests. The pserv tests are currently fast and quick to iterate on, and they
> won't be if we do that.
>
> We actually need to test against the RPC schemas. That's why I've been doing
> my utmost to keep them tightly defined. The call_responder helper function
> does that back-and-forth through the schemas. Where we need to stub out
> actual RPC calls we ought to use call_responder to keep us honest.
I didn't mean that we need a real Django running; a mock one will do.
But we need something that responds like a real call to avoid the mistake that
rvb made.
Gavin Panella (allenap) wrote:
> I didn't mean that we need a real Django running; a mock one
> will do. But we need something that responds like a real call to
> avoid the mistake that rvb made.
Ah, okeydoke, then I agree. I think we can dynamically create an amp.AMP
subclass with stub responders that correspond to a subset of amp.Command
calls.
Maybe something like:

    def make_stub_amp(commands):
        responders = {
            …  # a stub responder for each command
            for command in commands
        }
        return type("DynamicAMP", (amp.AMP,), responders)

    amp = make_stub_amp(…)
    call_responder(amp, …, {
        "fillings": ["ketchup", "crisps", "scratchings"],
    })
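The three-argument type() call in the sketch above is ordinary dynamic class creation. A self-contained illustration (no Twisted; `Base` and the method names are invented for the example, and note that the bases argument must be a tuple):

```python
class Base:
    def greet(self):
        return "hello"

def make_stub(name, methods):
    # type(name, bases, namespace): bases must be a tuple such as (Base,),
    # and the namespace dict supplies the new class's attributes/methods.
    return type(name, (Base,), methods)

Stub = make_stub("DynamicStub", {"respond": lambda self: "ok"})
stub = Stub()
```

This is the mechanism that would let a test build an amp.AMP subclass on the fly with one stub responder per command.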
The `Client` returned from ClusterService… needs to be patched to use
call_responder in this way. Or, as ClusterRPCFixture does, the
connections that the client uses need to be patched.
Julian Edwards (julian-edwards) wrote:
Right, my end aim is to be able to do client(…) and have that talk to a
mock backend with canned responses, maybe even hook it into a function in
the tests that does that responding.
Perhaps we can talk more about it at the sprint this week!
Raphaël Badin (rvb) wrote:
> The client has no method mark_node_broken(). This needs to be client(
>
Damn. Using Mock is indeed a bit nasty, and I wasn't really happy
about this… but I thought you had the end-to-end fixture in progress, so
my intention was to switch to that once it was done. Let's work on the
fixture this week.
Preview Diff
=== modified file 'src/provisioningserver/rpc/clusterservice.py'
--- src/provisioningserver/rpc/clusterservice.py  2014-07-22 14:38:34 +0000
+++ src/provisioningserver/rpc/clusterservice.py  2014-07-24 10:46:38 +0000
@@ -32,7 +32,6 @@
     ArchitectureRegistry,
     PowerTypeRegistry,
     )
-from provisioningserver.power.poweraction import PowerAction
 from provisioningserver.rpc import (
     cluster,
     common,
@@ -45,6 +44,7 @@
     get_preseed_data,
     validate_license_key,
     )
+from provisioningserver.rpc.power import change_power_state
 from twisted.application.internet import (
     StreamServerEndpointService,
     TimerService,
@@ -150,15 +150,15 @@
     @cluster.PowerOn.responder
     def power_on(self, system_id, power_type, context):
         """Turn a node on."""
-        action = PowerAction(power_type)
-        action.execute(power_change='on', **context)
+        change_power_state(
+            system_id, power_type, power_change='on', context=context)
         return {}

     @cluster.PowerOff.responder
     def power_off(self, system_id, power_type, context):
         """Turn a node off."""
-        action = PowerAction(power_type)
-        action.execute(power_change='off', **context)
+        change_power_state(
+            system_id, power_type, power_change='off', context=context)
         return {}

     @amp.StartTLS.responder
=== added file 'src/provisioningserver/rpc/power.py'
--- src/provisioningserver/rpc/power.py  1970-01-01 00:00:00 +0000
+++ src/provisioningserver/rpc/power.py  2014-07-24 10:46:38 +0000
@@ -0,0 +1,83 @@
+# Copyright 2014 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""RPC helpers relating to power control."""
+
+from __future__ import (
+    absolute_import,
+    print_function,
+    unicode_literals,
+    )
+
+str = None
+
+__metaclass__ = type
+__all__ = [
+    "change_power_state",
+    ]
+
+
+from provisioningserver.power.poweraction import PowerAction
+from provisioningserver.rpc import getRegionClient
+from provisioningserver.utils import pause
+from twisted.internet import reactor
+from twisted.internet.defer import inlineCallbacks
+from twisted.internet.threads import deferToThread
+
+# List of power_types that support querying the power state.
+# change_power_state() will only retry changing the power
+# state for these power types.
+# This is meant to be temporary until all the power types support
+# querying the power state of a node.
+QUERY_POWER_TYPES = ['amt', 'ipmi']
+
+
+def perform_power_change(system_id, power_type, power_change, context):
+    """Issue the given `power_change` command.
+
+    If any exception is raised during the execution of the command, mark
+    the node as broken and re-raise the exception.
+    """
+    action = PowerAction(power_type)
+    try:
+        return action.execute(power_change=power_change, **context)
+    except Exception:
+        client = getRegionClient()
+        client.mark_node_broken(system_id)
+        raise
+
+
+@inlineCallbacks
+def change_power_state(system_id, power_type, power_change, context,
+                       clock=reactor):
+    """Change the power state of a node.
+
+    Monitor the result of the power change action by querying the
+    power state of the node and mark the node as failed if it doesn't
+    work.
+    """
+    assert power_change in ('on', 'off'), (
+        "Unknown power change: %s" % power_change)
+
+    # Use increasing waiting times to work around race conditions that could
+    # arise when power-cycling the node.
+    for waiting_time in (3, 5, 10):
+        # Perform power change.
+        yield deferToThread(
+            perform_power_change, system_id, power_type, power_change,
+            context)
+        # If the power_type doesn't support querying the power state:
+        # exit now.
+        if power_type not in QUERY_POWER_TYPES:
+            return
+        # Wait to let the node some time to change its power state.
+        yield pause(waiting_time, clock)
+        # Check current power state.
+        new_power_state = yield deferToThread(
+            perform_power_change, system_id, power_type, 'query', context)
+        if new_power_state == power_change:
+            return
+
+    # Failure: the power state of the node hasn't changed: mark it as broken.
+    client = getRegionClient()
+    client.mark_node_broken(system_id)
=== modified file 'src/provisioningserver/rpc/tests/test_clusterservice.py'
--- src/provisioningserver/rpc/tests/test_clusterservice.py  2014-07-22 14:38:34 +0000
+++ src/provisioningserver/rpc/tests/test_clusterservice.py  2014-07-24 10:46:38 +0000
@@ -857,8 +857,8 @@
         responder = protocol.locateResponder(self.command.commandName)
         self.assertIsNot(responder, None)

-    def test_executes_a_power_action(self):
-        PowerAction = self.patch(clusterservice, "PowerAction")
+    def test_executes_change_power_state(self):
+        change_power_state = self.patch(clusterservice, "change_power_state")

         system_id = factory.make_name("system_id")
         power_type = factory.make_name("power_type")
@@ -873,17 +873,16 @@
             })

         def check(response):
-            self.assertThat(PowerAction, MockCalledOnceWith(power_type))
             self.assertThat(
-                PowerAction.return_value.execute,
+                change_power_state,
                 MockCalledOnceWith(
-                    power_change=self.expected_power_change,
-                    **context))
+                    system_id, power_type,
+                    power_change=self.expected_power_change, context=context))
         return d.addCallback(check)

     def test_power_on_can_propagate_UnknownPowerType(self):
-        PowerAction = self.patch(clusterservice, "PowerAction")
-        PowerAction.side_effect = UnknownPowerType
+        self.patch(clusterservice, "change_power_state").side_effect = (
+            UnknownPowerType)

         d = call_responder(Cluster(), self.command, {
             "system_id": "id", "power_type": "type", "context": {},
@@ -897,8 +896,8 @@
         return d.addErrback(check)

     def test_power_on_can_propagate_NotImplementedError(self):
-        PowerAction = self.patch(clusterservice, "PowerAction")
-        PowerAction.side_effect = NotImplementedError
+        self.patch(clusterservice, "change_power_state").side_effect = (
+            NotImplementedError)

         d = call_responder(Cluster(), self.command, {
             "system_id": "id", "power_type": "type", "context": {},
@@ -912,8 +911,8 @@
         return d.addErrback(check)

     def test_power_on_can_propagate_PowerActionFail(self):
-        PowerAction = self.patch(clusterservice, "PowerAction")
-        PowerAction.return_value.execute.side_effect = PowerActionFail
+        self.patch(clusterservice, "change_power_state").side_effect = (
+            PowerActionFail)

         d = call_responder(Cluster(), self.command, {
             "system_id": "id", "power_type": "type", "context": {},
=== added file 'src/provisioningserver/rpc/tests/test_power.py'
--- src/provisioningserver/rpc/tests/test_power.py  1970-01-01 00:00:00 +0000
+++ src/provisioningserver/rpc/tests/test_power.py  2014-07-24 10:46:38 +0000
@@ -0,0 +1,232 @@
+# Copyright 2014 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Tests for :py:module:`~provisioningserver.rpc.power`."""
+
+from __future__ import (
+    absolute_import,
+    print_function,
+    unicode_literals,
+    )
+
+str = None
+
+__metaclass__ = type
+__all__ = []
+
+
+import random
+
+from maastesting.factory import factory
+from maastesting.matchers import (
+    MockCalledOnceWith,
+    MockCallsMatch,
+    MockNotCalled,
+    )
+from maastesting.testcase import MAASTestCase
+from mock import (
+    call,
+    Mock,
+    )
+from provisioningserver.rpc import power
+from testtools.deferredruntest import (
+    assert_fails_with,
+    AsynchronousDeferredRunTest,
+    )
+from twisted.internet.defer import (
+    inlineCallbacks,
+    maybeDeferred,
+    )
+from twisted.internet.task import Clock
+
+
+class TestPowerHelpers(MAASTestCase):
+
+    run_tests_with = AsynchronousDeferredRunTest.make_factory(timeout=5)
+
+    def patch_power_action(self, return_value=None, side_effect=None):
+        """Patch the PowerAction object.
+
+        Patch the PowerAction object so that PowerAction().execute
+        is replaced by a Mock object created using the given `return_value`
+        and `side_effect`.
+
+        This can be used to simulate various successes or failures patterns
+        while manipulating the power state of a node.
+
+        Returns a tuple of mock objects: power.PowerAction and
+        power.PowerAction().execute.
+        """
+        power_action_obj = Mock()
+        power_action_obj_execute = Mock(
+            return_value=return_value, side_effect=side_effect)
+        power_action_obj.execute = power_action_obj_execute
+        power_action = self.patch(power, 'PowerAction')
+        power_action.return_value = power_action_obj
+        return power_action, power_action_obj_execute
+
+    @inlineCallbacks
+    def test_change_power_state_changes_power_state(self):
+        system_id = factory.make_name('system_id')
+        power_type = random.choice(power.QUERY_POWER_TYPES)
+        power_change = random.choice(['on', 'off'])
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        self.patch(power, 'pause')
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        # Patch the power action utility so that it says the node is
+        # in the required power state.
+        power_action, execute = self.patch_power_action(
+            return_value=power_change)
+
+        yield power.change_power_state(
+            system_id, power_type, power_change, context)
+        self.assertThat(
+            execute,
+            MockCallsMatch(
+                # One call to change the power state.
+                call(power_change=power_change, **context),
+                # One call to query the power state.
+                call(power_change='query', **context),
+            ),
+        )
+        # The node hasn't been marked broken.
+        self.assertThat(client.mark_node_broken, MockNotCalled())
+
+    @inlineCallbacks
+    def test_change_power_state_doesnt_retry_for_certain_power_types(self):
+        system_id = factory.make_name('system_id')
+        # Use a power type that is not among power.QUERY_POWER_TYPES.
+        power_type = factory.make_name('power_type')
+        power_change = random.choice(['on', 'off'])
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        self.patch(power, 'pause')
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        power_action, execute = self.patch_power_action(
+            return_value=random.choice(['on', 'off']))
+
+        yield power.change_power_state(
+            system_id, power_type, power_change, context)
+        self.assertThat(
+            execute,
+            MockCallsMatch(
+                # Only one call to change the power state.
+                call(power_change=power_change, **context),
+            ),
+        )
+        # The node hasn't been marked broken.
+        self.assertThat(client.mark_node_broken, MockNotCalled())
+
+    @inlineCallbacks
+    def test_change_power_state_retries_if_power_state_doesnt_change(self):
+        system_id = factory.make_name('system_id')
+        power_type = random.choice(power.QUERY_POWER_TYPES)
+        power_change = 'on'
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        self.patch(power, 'pause')
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        # Simulate a failure to power up the node, then a success.
+        power_action, execute = self.patch_power_action(
+            side_effect=[None, 'off', None, 'on'])
+
+        yield power.change_power_state(
+            system_id, power_type, power_change, context)
+        self.assertThat(
+            execute,
+            MockCallsMatch(
+                call(power_change=power_change, **context),
+                call(power_change='query', **context),
+                call(power_change=power_change, **context),
+                call(power_change='query', **context),
+            )
+        )
+        # The node hasn't been marked broken.
+        self.assertThat(client.mark_node_broken, MockNotCalled())
+
+    @inlineCallbacks
+    def test_change_power_state_marks_the_node_broken_if_failure(self):
+        system_id = factory.make_name('system_id')
+        power_type = random.choice(power.QUERY_POWER_TYPES)
+        power_change = 'on'
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        self.patch(power, 'pause')
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        # Simulate a persistent failure.
+        power_action, execute = self.patch_power_action(return_value='off')
+
+        yield power.change_power_state(
+            system_id, power_type, power_change, context)
+
+        # The node has been marked broken.
+        self.assertThat(
+            client.mark_node_broken, MockCalledOnceWith(system_id))
+
+    def test_change_power_state_marks_the_node_broken_if_exception(self):
+        system_id = factory.make_name('system_id')
+        power_type = random.choice(power.QUERY_POWER_TYPES)
+        power_change = 'on'
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        self.patch(power, 'pause')
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        # Simulate an exception.
+        power_action, execute = self.patch_power_action(side_effect=Exception)
+
+        d = power.change_power_state(
+            system_id, power_type, power_change, context)
+        assert_fails_with(d, Exception)
+        return d.addErrback(
+            lambda failure: self.assertThat(
+                client.mark_node_broken, MockCalledOnceWith(system_id)))
+
+    def test_change_power_state_pauses_in_between_retries(self):
+        system_id = factory.make_name('system_id')
+        power_type = random.choice(power.QUERY_POWER_TYPES)
+        power_change = 'on'
+        context = {
+            factory.make_name('context-key'): factory.make_name('context-val')
+        }
+        client = Mock()
+        self.patch(power, 'getRegionClient').return_value = client
+        # Simulate two failures to power up the node, then a success.
+        power_action, execute = self.patch_power_action(
+            side_effect=[None, 'off', None, 'off', None, 'on'])
+        self.patch(power, "deferToThread", maybeDeferred)
+        clock = Clock()
+
+        calls_and_pause = [
+            ([
+                call(power_change=power_change, **context),
+            ], 3),
+            ([
+                call(power_change='query', **context),
+                call(power_change=power_change, **context),
+            ], 5),
+            ([
+                call(power_change='query', **context),
+                call(power_change=power_change, **context),
+            ], 10),
+            ([
+                call(power_change='query', **context),
+            ], 0),
+        ]
+        calls = []
+        yield power.change_power_state(
+            system_id, power_type, power_change, context, clock=clock)
+        for newcalls, waiting_time in calls_and_pause:
+            calls.extend(newcalls)
+            self.assertThat(execute, MockCallsMatch(*calls))
+            clock.advance(waiting_time)
Good branch but a few things to fix:
1. Use the test matchers, Luke! (See inline)
2. You're not catching Failures from the deferToThread calls (yield turns errbacks into exceptions)
3. The tests aren't very Twisted-like in terms of using a Clock(), which means that you've not tested the pause durations anywhere. If you don't know how to use Clock() ping me on IRC and I'll show you.