Merge lp:~jimbaker/pyjuju/expose-cleanup into lp:pyjuju

Proposed by Jim Baker
Status: Merged
Approved by: Gustavo Niemeyer
Approved revision: 331
Merged at revision: 335
Proposed branch: lp:~jimbaker/pyjuju/expose-cleanup
Merge into: lp:pyjuju
Diff against target: 835 lines (+510/-100)
9 files modified
ensemble/control/shutdown.py (+2/-1)
ensemble/providers/ec2/__init__.py (+30/-5)
ensemble/providers/ec2/machine.py (+0/-2)
ensemble/providers/ec2/securitygroup.py (+110/-1)
ensemble/providers/ec2/tests/common.py (+48/-2)
ensemble/providers/ec2/tests/test_connect.py (+1/-1)
ensemble/providers/ec2/tests/test_provider.py (+1/-0)
ensemble/providers/ec2/tests/test_securitygroup.py (+178/-6)
ensemble/providers/ec2/tests/test_shutdown.py (+140/-82)
To merge this branch: bzr merge lp:~jimbaker/pyjuju/expose-cleanup
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Approve
William Reade (community) Approve
Review via email: mp+72132@code.launchpad.net

Description of the change

When shutting down EC2 machines, this branch deletes the associated machine security groups (identified from the Reservation metadata returned when describing instances) and, if possible, the environment security group.

This code necessarily polls EC2, via describe_instances, to determine when a machine has moved from shutting-down to terminated, since only then can its security groups be deleted. In practice, the polling used here should be generous, but we may want to consider fine-tuning the sleep interval (necessary to avoid hammering EC2 unnecessarily) and the maximum number of polls. It might also be desirable to expose the latter as an option on the shutdown command.
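
A minimal sketch of that polling loop, assuming a provider object holding a txaws EC2 client (the helper name and poll limit here are illustrative, not the final API):

    from twisted.internet.defer import inlineCallbacks

    @inlineCallbacks
    def wait_for_termination(provider, instance_ids, max_polls=60):
        # Poll describe_instances until every instance reports the
        # terminated state; only then can its security groups be deleted.
        wait_on = set(instance_ids)
        for i in xrange(max_polls):
            if not wait_on:
                break
            instances = yield provider.ec2.describe_instances(*wait_on)
            for instance in instances:
                if instance.instance_state == "terminated":
                    wait_on.discard(instance.instance_id)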

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

The logic looks mostly good, aside from [8] maybe. Some organizational issues, though.

[1]

+    def _get_ensemble_security_group(self):
+        return "ensemble-%s" % self.environment_name
+
+    def _get_machine_security_group(self, instance):
+        ensemble_security_group = self._get_ensemble_security_group()
+        for group in instance.reservation.groups:
+            if group != ensemble_security_group:
+                return group
+
+        # Ignore if no such group exists; this allows some limited
+        # backwards compatibility with old setups without machine
+        # security group
+        log.debug("No associated group for instance %r", instance.instance_id)
+        return None
+
+    @inlineCallbacks
+    def _delete_security_group(self, group):
+        yield self.ec2.delete_security_group(group)
+        log.debug("Deleted security group %r", group)

These utility functions also have little relation to the provider interface
itself. Please move them to top levels under ensemble.providers.ec2.securitygroup
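
For reference, a module-level shape for these helpers, taking the provider explicitly (this matches what the preview diff below ends up with):

    def _get_ensemble_security_group(provider):
        """Get EC2 security group name for environment of `provider`."""
        return "ensemble-%s" % provider.environment_name

    def _get_machine_security_group_from_instance(provider, instance):
        """Find the machine-specific group in the instance's reservation."""
        ensemble_security_group = _get_ensemble_security_group(provider)
        for group in instance.reservation.groups:
            if group != ensemble_security_group:
                return group
        return None  # Tolerate old setups without a machine group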

[2]

+        # Repeatedly poll EC2 until instances are in terminated state;
(...)
+            except EC2Error:
+                pass  # Ignore, cannot delete if other machines are using
+
+        returnValue(killable_machines)

Same thing. Big algorithm polluting the interface of the provider with
lots of domain-specific knowledge about a minor aspect.

Please create a function named remove_security_groups under the same
module as suggested above, and test it in isolation.

[3]

+        i = 0
+        while wait_on and i < 60:
(...)
+            i += 1
+            if wait_on:
+                yield self._sleep(1)  # Reasonable sleep to continue polling

Minor hint:

    for i in range(60):
        (...)
        if not wait_on:
            break
        yield self._sleep(1)

[4]

Try dropping the sleep in that loop entirely. The describe_instances
operation is a natural limiter.

When you do that, how many iterations do you get in practice before
the operation completes, and how many seconds does it take?
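
In other words, the describe_instances round trip itself paces the loop; a sketch of the hint from [3] with the sleep dropped:

    for i in range(60):
        instances = yield self.ec2.describe_instances(*wait_on)
        (...)
        if not wait_on:
            break
        # No sleep: each describe_instances round trip paces the loop.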

[5]

+    def _sleep(self, delay):
+        """Non-blocking sleep."""
+        from twisted.internet import reactor
+
+        deferred = Deferred()
+        reactor.callLater(delay, deferred.callback, None)
+        return deferred

If you dropped it in the point above, remove this. If not, please move
this to ensemble.lib.twistedutils and test it accordingly.

[6]

+ log.debug("%d Info on instance: %s, %s",
+ i, instance.instance_id, instance.instance_state)

Slightly better debug message:

    "[iteraction %d] Instance %s state: %s"

[7]

+ log.debug("Instance %r was terminated",
+ instance.instance_id)

Drop this... the exact same information was just logged.

[8]

My understanding is that describe_instances would blow up if some of the
instances were not found. If that's the case, the algorithm won't work
in practice.

[9]

+        # Wait for the security groups deleted in this round
+        yield DeferredList(deletions)

Why? It's killing parallelism unnecessarily, it seems.
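
One way to keep the deletions parallel, and the shape the final diff settles on, is to accumulate the deferreds across all polling rounds and gather them once at the end:

    pending_deletions = []
    for i in xrange(200):
        (...)
        if group:
            pending_deletions.append(
                _background_delete_security_group(provider, group))

    # Wait once, after every group's deletion has been kicked off
    yield gather_results(pending_deletions)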

[10]

+ log.info("Shutdo...

review: Needs Fixing
Revision history for this message
Jim Baker (jimbaker) wrote :

On Fri, Aug 19, 2011 at 2:27 PM, Gustavo Niemeyer <email address hidden> wrote:

> Review: Needs Fixing
> The logic looks mostly good, aside from [8] maybe. Some organizational
> issues, though.
>
> [1]
>
> + def _get_ensemble_security_group(self):
> + return "ensemble-%s" % self.environment_name
> +
> + def _get_machine_security_group(self, instance):
> + ensemble_security_group = self._get_ensemble_security_group()
> + for group in instance.reservation.groups:
> + if group != ensemble_security_group:
> + return group
> +
> + # Ignore if no such group exists; this allows some limited
> + # backwards compatibility with old setups without machine
> + # security group
> + log.debug("No associated group for instance %r",
> instance.instance_id)
> + return None
> +
> + @inlineCallbacks
> + def _delete_security_group(self, group):
> + yield self.ec2.delete_security_group(group)
> + log.debug("Deleted security group %r", group)
>
> These utility functions also have little relation to the provider interface
> itself. Please move them to top levels under
> ensemble.providers.ec2.securitygroup
>

Moved

>
> [2]
>
> + # Repeatedly poll EC2 until instances are in terminated state;
> (...)
> + except EC2Error:
> + pass # Ignore, cannot delete if other machines are using
> +
> + returnValue(killable_machines)
>
> Same thing. Big algorithm polluting the interface of the provider with
> lots of domain-specific knowledge about a minor aspect.
>
> Please create a function named remove_security_groups under the same
> module as suggested above, and test it in isolation.
>

Moved, and now also tested in test_securitygroup. Note there are still
tests in test_shutdown.

> [3]
>
> + i = 0
> + while wait_on and i < 60:
> (...)
> + i += 1
> + if wait_on:
> + yield self._sleep(1) # Reasonable sleep to continue
> polling
>
> Minor hint:
>
> for i in range(60):
> (...)
> if not wait_on:
> break
> yield self._sleep(1)
>

Changed to this form

>
> [4]
>
> Try dropping the sleep in that loop entirely. The describe_instances
> operation is a natural limiter.
>

Removed the sleep

>
> When you do that, how many iterations do you get in practice before
> the operation completes, and how many seconds does it take?
>

and empirically observed approximately 500 ms per operation.

> [5]
>
> + def _sleep(self, delay):
> + """Non-blocking sleep."""
> + from twisted.internet import reactor
> +
> + deferred = Deferred()
> + reactor.callLater(delay, deferred.callback, None)
> + return deferred
>
> If you dropped it in the point above, remove this. If not, please move
> this to ensemble.lib.twistedutils and test it accordingly.
>

Removed

>
> [6]
>
> + log.debug("%d Info on instance: %s, %s",
> + i, instance.instance_id,
> instance.instance_state)
>
> Slightly better debug message:
>
> "[iteraction %d] Instance %s state: %s"
>
>
This was not intended to be in the pushed branch, so it has been removed.

> [...

Revision history for this message
William Reade (fwereade) wrote :

Just a few notes:

[0]

Sorry, you've got a potentially nasty merge inbound :(. Give me a shout if there's anything I can help with.

[1]

The security_group methods on EC2's MachineProvider, which moved to securitygroup.py, haven't been deleted from MachineProvider yet (but they do seem to be unused now)

[2]

+ returnValue(killable_machines)

I think it would be slightly more technically correct (the best kind of correct!) to construct terminated_machines from the list of terminated_ids, since you have them available.
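
A sketch of that suggestion, reusing names from the diff (hypothetical, since it wasn't merged in this form):

    terminated_ids = [info[0] for info in terminated]
    terminated_machines = [
        m for m in killable_machines if m.instance_id in terminated_ids]
    returnValue(terminated_machines)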

[3]

+def remove_security_groups(provider, killable_ids):

Consider renaming killable_ids to terminated_ids, or perhaps instance_ids -- I can think of arguments either way, but I don't think "killable" is the right word to describe them at this stage.

[4]

+ # Delete the security group for the environment as a whole.

I think it would be more sensible to override MachineProvider.destroy_environment() on ec2.MachineProvider, and do this in there, rather than trying and failing on shutdown of any other machine. Note that the base destroy_environment can raise, so you might want something like this:

    def destroy_environment(self):
        try:
            super(MachineProvider, self).destroy_environment()
        finally:
            destroy_environment_security_group(self)

...which would at least ensure we made a good-faith attempt to do everything required to destroy the full environment, although the current logging in the face of multiple errors is a bit nonexistent.

All that said, LGTM; I'm very happy to see a clean API again. Thanks, and +1.

review: Approve
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

This looks good. A couple of comments on top of the changes:

[11]

You got [9] reversed, as I pointed out on IRC:

<niemeyer> jimbaker: I was pointing out that you don't have to wait for deletion to finish on all of the removed groups before running another round
<niemeyer> jimbaker: since it kills parallelism unnecessarily
<niemeyer> jimbaker: You can wait for them to finish after _all_ of the groups have been deleted
<niemeyer> jimbaker: Instead, you've moved backwards, and made it wait on each individual group's deletion

[12]

+def _get_ensemble_security_group(provider):
+    """Get EC2 security group name for environment of `provider`."""
+    return "ensemble-%s" % provider.environment_name

+    def _get_ensemble_security_group(self):
+        return "ensemble-%s" % self.environment_name

There's no point in having that in a function if it's going to be
duplicated either way. Either unify, or just inline.

review: Approve
lp:~jimbaker/pyjuju/expose-cleanup updated
332. By Jim Baker

Delete security groups in the background

333. By Jim Baker

Removed debugging

334. By Jim Baker

Minor cleanup

335. By Jim Baker

Only remove env security group in context of provider destroy_environment

336. By Jim Baker

Reworked remaining mocks

337. By Jim Baker

Test new securitygroup function

338. By Jim Baker

Merged trunk & resolved conflicts

339. By Jim Baker

Fixed merge issues

340. By Jim Baker

Merged trunk

341. By Jim Baker

PEP8 & PyFlakes

Revision history for this message
Jim Baker (jimbaker) wrote :

On Wed, Aug 24, 2011 at 2:20 AM, William Reade
<email address hidden> wrote:

> Review: Approve
> Just a few notes:
>
> [0]
>
> Sorry, you've got a potentially nasty merge inbound :(. Give me a shout if
> there's anything I can help with.
>

It wasn't so bad, but definitely it wasn't trivial.

>
> [1]
>
> The security_group methods on EC2's MachineProvider, which moved to
> securitygroup.py, haven't been deleted from MachineProvider yet (but they do
> seem to be unused now)
>

Removed these (Gustavo pointed out one in his review too)

>
> [2]
>
> + returnValue(killable_machines)
>
> I think it would be slightly more technically correct (the best kind of
> correct!) to construct terminated_machines from the list of terminated_ids,
> since you have them available.
>

Sorry, that was going to make it in, but I just realized it wasn't done. I
could do that as a trivial follow-up.

> [3]
>
> +def remove_security_groups(provider, killable_ids):
>
> Consider renaming killable_ids to terminated_ids, or perhaps instance_ids
> -- I can think of arguments either way, but I don't think "killable" is the
> right word to describe them at this stage.
>

Renamed.

>
> [4]
>
> + # Delete the security group for the environment as a whole.
>
> I think it would be more sensible to override
> MachineProvider.destroy_environment() on ec2.MachineProvider, and do this in
> there, rather than trying and failing on shutdown of any other machine. Note
> that the base destroy_environment can raise, so you might want something
> like this:
>
> def destroy_environment(self):
> try:
> super(MachineProvider, self).destroy_environment()
> finally:
> destroy_environment_security_group(self)
>
> ...which would at least ensure we made a good-faith attempt to do
> everything required to destroy the full environment, although the current
> logging in the face of multiple errors is a bit nonexistent.
>

Changed as requested and a number of tests suitably adjusted in their
mocking expectations.

Revision history for this message
Jim Baker (jimbaker) wrote :

On Thu, Aug 25, 2011 at 2:24 PM, Gustavo Niemeyer <email address hidden> wrote:

> Review: Approve
> This looks good. A couple of comments on top of the changes:
>
> [11]
>
> You got [9] reversed, as I pointed out on IRC:
>
> <niemeyer> jimbaker: I was pointing out that you don't have to wait for
> deletion to finish on all of the removed groups before running another round
> <niemeyer> jimbaker: since it kills parallelism unnecessarily
> <niemeyer> jimbaker: You can wait for them to finish after _all_ of the
> groups have been deleted
> <niemeyer> jimbaker: Instead, you've moved backwards, and made it wait on
> each individual group's deletion
>

Machine security groups are now deleted in the background. The environment
security group deletion is performed in a separate step, due to William's
point that it should be done from destroy_environment, which made a lot of
sense.
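
The background deletion itself just schedules the delete via the reactor and hands back a deferred to gather later; simplified from the preview diff (dropping the deletion_required flag):

    from twisted.internet.defer import Deferred

    def _background_delete_security_group(provider, group):
        """Run the deletion task in the background; gather the result later."""
        from twisted.internet import reactor

        completed = Deferred()
        reactor.callLater(
            0, _delete_security_group, provider, group, completed)
        return completed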

>
> [12]
>
>
> +def _get_ensemble_security_group(provider):
> + """Get EC2 security group name for environment of `provider`."""
> + return "ensemble-%s" % provider.environment_name
>
> + def _get_ensemble_security_group(self):
> + return "ensemble-%s" % self.environment_name
>
> There's no point in having that in a function if it's going to be
> duplicated either way. Either unify, or just inline.
>
Removed; this was code that wasn't deleted as part of the move to
securitygroup.py (a move operation can be thought of as add + delete :)).
William noticed the same thing.

Preview Diff

1=== modified file 'ensemble/control/shutdown.py'
2--- ensemble/control/shutdown.py 2011-08-11 19:49:32 +0000
3+++ ensemble/control/shutdown.py 2011-08-29 18:36:25 +0000
4@@ -22,7 +22,8 @@
5
6 value = raw_input(
7 "Warning, this will destroy all machines and services in the\n"
8- "environment. Continue [y/N]")
9+ "%r (type: %s) environment. Continue [y/N]" % (
10+ environment.name, environment.type))
11
12 if value.strip().lower() not in ("y", "yes"):
13 options.log.info("Shutdown aborted")
14
15=== modified file 'ensemble/providers/ec2/__init__.py'
16--- ensemble/providers/ec2/__init__.py 2011-08-26 12:24:00 +0000
17+++ ensemble/providers/ec2/__init__.py 2011-08-29 18:36:25 +0000
18@@ -2,7 +2,6 @@
19 import re
20
21 from twisted.internet.defer import inlineCallbacks, returnValue
22-
23 from txaws.ec2.exception import EC2Error
24 from txaws.service import AWSServiceRegion
25
26@@ -14,7 +13,8 @@
27 from .launch import EC2LaunchMachine
28 from .machine import EC2ProviderMachine, machine_from_instance
29 from .securitygroup import (
30- open_provider_port, close_provider_port, get_provider_opened_ports)
31+ open_provider_port, close_provider_port, get_provider_opened_ports,
32+ remove_security_groups, destroy_environment_security_group)
33 from .utils import get_region_uri
34
35
36@@ -104,6 +104,21 @@
37 returnValue(machines)
38
39 @inlineCallbacks
40+ def destroy_environment(self):
41+ """Terminate all associated machines and security groups.
42+
43+ The super definition of this method terminates each machine in
44+ the environment; this needs to be augmented here by also
45+ removing the security group for the environment.
46+ """
47+ try:
48+ killed_machines = yield super(MachineProvider, self).\
49+ destroy_environment()
50+ returnValue(killed_machines)
51+ finally:
52+ yield destroy_environment_security_group(self)
53+
54+ @inlineCallbacks
55 def shutdown_machines(self, machines):
56 """Terminate machines associated with this provider.
57
58@@ -119,9 +134,19 @@
59
60 ids = [m.instance_id for m in machines]
61 killable_machines = yield self.get_machines(ids)
62- if killable_machines:
63- killable_ids = [m.instance_id for m in killable_machines]
64- yield self.ec2.terminate_instances(*killable_ids)
65+ if not killable_machines:
66+ returnValue([]) # Nothing to do
67+
68+ killable_ids = [m.instance_id for m in killable_machines]
69+ terminated = yield self.ec2.terminate_instances(*killable_ids)
70+
71+ # Pass on what was actually terminated, in the case the
72+ # machine has somehow disappeared since get_machines
73+ # above. This is to avoid getting EC2Error: Error Message:
74+ # Invalid id when running ec2.describe_instances in
75+ # remove_security_groups
76+ terminated_ids = [info[0] for info in terminated]
77+ yield remove_security_groups(self, terminated_ids)
78 returnValue(killable_machines)
79
80 def open_port(self, machine, machine_id, port, protocol="tcp"):
81
82=== modified file 'ensemble/providers/ec2/machine.py'
83--- ensemble/providers/ec2/machine.py 2011-08-17 13:34:45 +0000
84+++ ensemble/providers/ec2/machine.py 2011-08-29 18:36:25 +0000
85@@ -1,5 +1,3 @@
86-import datetime
87-
88 from ensemble.machine import ProviderMachine
89
90
91
92=== modified file 'ensemble/providers/ec2/securitygroup.py'
93--- ensemble/providers/ec2/securitygroup.py 2011-08-23 09:59:59 +0000
94+++ ensemble/providers/ec2/securitygroup.py 2011-08-29 18:36:25 +0000
95@@ -1,11 +1,17 @@
96-from twisted.internet.defer import inlineCallbacks, returnValue
97+from twisted.internet.defer import Deferred, inlineCallbacks, returnValue
98 from txaws.ec2.exception import EC2Error
99
100 from ensemble.errors import ProviderInteractionError
101+from ensemble.lib.twistutils import gather_results
102
103 from .utils import log
104
105
106+def _get_ensemble_security_group(provider):
107+ """Get EC2 security group name for environment of `provider`."""
108+ return "ensemble-%s" % provider.environment_name
109+
110+
111 def _get_machine_group_name(provider, machine_id):
112 """Get EC2 security group name associated just with `machine_id`."""
113 return "ensemble-%s-%s" % (provider.environment_name, machine_id)
114@@ -87,3 +93,106 @@
115 # Ensemble (at this time at least)
116 opened_ports.add((from_port, ip_permission.ip_protocol))
117 returnValue(opened_ports)
118+
119+
120+def _get_machine_security_group_from_instance(provider, instance):
121+ """Parses the `reservation` of `instance` to get assoc machine group."""
122+ ensemble_security_group = _get_ensemble_security_group(provider)
123+ for group in instance.reservation.groups:
124+ if group != ensemble_security_group:
125+ return group
126+
127+ # Ignore if no such group exists; this allows some limited
128+ # backwards compatibility with old setups without machine
129+ # security group
130+ log.info("Ignoring missing machine security group for instance %r",
131+ instance.instance_id)
132+ return None
133+
134+
135+@inlineCallbacks
136+def _delete_security_group(provider, group, completed, deletion_required=True):
137+ """Wrap EC2 delete_security_group."""
138+ try:
139+ yield provider.ec2.delete_security_group(group)
140+ log.debug("Deleted security group %r", group)
141+ completed.callback(None)
142+ except EC2Error, e:
143+ if deletion_required:
144+ completed.errback(ProviderInteractionError(
145+ "EC2 error when attempting to delete group %s: %s" % (
146+ group, e)))
147+ else:
148+ completed.callback(None)
149+
150+
151+def _background_delete_security_group(provider, group, deletion_required=True):
152+ """Runs deletion task in the background.
153+
154+ Wait on this returned deferred for completion of the deletion.
155+ """
156+ from twisted.internet import reactor
157+
158+ completed = Deferred()
159+ reactor.callLater(
160+ 0, _delete_security_group,
161+ provider, group, completed, deletion_required=deletion_required)
162+ return completed
163+
164+
165+@inlineCallbacks
166+def remove_security_groups(provider, instance_ids):
167+ """Remove security groups associated with `instance_ids` for `provider`"""
168+ log.info(
169+ "Waiting on %d EC2 instances to transition to terminated state, "
170+ "this may take a while", len(instance_ids))
171+
172+ # Repeatedly poll EC2 until instances are in terminated state;
173+ # upon reaching that state, delete associated machine security
174+ # groups. The limit of 200 polls is arbitrary and could be
175+ # specified by a command line option (and/or an overall
176+ # timeout). It's based on an observed ~500 ms roundtrip time per
177+ # call of the describe_instances web service, along with typically
178+ # taking about 40s to move all instances to a terminated state.
179+ wait_on = set(instance_ids)
180+ pending_deletions = []
181+ for i in xrange(200):
182+ if not wait_on:
183+ break
184+ instances = yield provider.ec2.describe_instances(*wait_on)
185+ for instance in instances:
186+ if instance.instance_state == "terminated":
187+ log.debug("Instance %r was terminated",
188+ instance.instance_id)
189+ wait_on.discard(instance.instance_id)
190+ group = _get_machine_security_group_from_instance(
191+ provider, instance)
192+ if group:
193+ pending_deletions.append(
194+ _background_delete_security_group(provider, group))
195+ if wait_on:
196+ outstanding = [
197+ _get_machine_security_group_from_instance(provider, instance)
198+ for instance in instances]
199+ log.error("Instance shutdown taking too long, "
200+ "could not delete groups %s",
201+ ", ".join(sorted(outstanding)))
202+
203+ # Wait for all pending deletions to complete
204+ yield gather_results(pending_deletions)
205+
206+
207+@inlineCallbacks
208+def destroy_environment_security_group(provider):
209+ """Delete the security group for the environment of `provider`"""
210+ group = _get_ensemble_security_group(provider)
211+ try:
212+ yield provider.ec2.delete_security_group(group)
213+ log.debug("Deleted environment security group %r", group)
214+ returnValue(True)
215+ except EC2Error, e:
216+ # Ignore, since this is only attempting to cleanup
217+ log.debug(
218+ "Ignoring EC2 error when attempting to delete group %s: %s" % (
219+ group, e))
220+ returnValue(False)
221
222=== modified file 'ensemble/providers/ec2/tests/common.py'
223--- ensemble/providers/ec2/tests/common.py 2011-08-23 13:01:13 +0000
224+++ ensemble/providers/ec2/tests/common.py 2011-08-29 18:36:25 +0000
225@@ -14,6 +14,7 @@
226
227
228 MATCH_AMI = MATCH(lambda image_id: image_id.startswith("ami-"))
229+MATCH_GROUP = MATCH(lambda x: x.startswith("ensemble-moon"))
230
231
232 class EC2TestMixin(object):
233@@ -36,8 +37,11 @@
234 """
235 return MachineProvider(self.env_name, self.get_config())
236
237- def get_instance(self, instance_id, state="running", **kwargs):
238- groups = kwargs.pop("groups", ["ensemble-%s" % self.env_name])
239+ def get_instance(self,
240+ instance_id, state="running", machine_id=42, **kwargs):
241+ groups = kwargs.pop("groups",
242+ ["ensemble-%s" % self.env_name,
243+ "ensemble-%s-%s" % (self.env_name, machine_id)])
244 reservation = Reservation("x", "y", groups=groups)
245 return Instance(instance_id, state, reservation=reservation, **kwargs)
246
247@@ -168,3 +172,45 @@
248 # connect grabs the first host of a set.
249 self.ec2.describe_instances(hosts[0].instance_id)
250 self.mocker.result(succeed([hosts[0]]))
251+
252+
253+class MockInstanceState(object):
254+ """Mock the result of ec2_describe_instances when called successively.
255+
256+ Each call of :method:`get_round` returns a list of mock `Instance`
257+ objects, using the state for that round. Instance IDs not used in
258+ the round (and passed in from ec2_describe_instances) are
259+ automatically skipped."""
260+
261+ def __init__(self, tester, instance_ids, machine_ids, states):
262+ self.tester = tester
263+ self.instance_ids = instance_ids
264+ self.machine_ids = machine_ids
265+ self.states = states
266+ self.round = 0
267+
268+ def get_round(self, *current_instance_ids):
269+ result = []
270+ for instance_id, machine_id, state in zip(
271+ self.instance_ids, self.machine_ids, self.states[self.round]):
272+ if instance_id not in current_instance_ids:
273+ # Ignore instance_ids that are no longer being
274+ # described, because they have since moved into a
275+ # terminated state
276+ continue
277+ result.append(self.tester.get_instance(instance_id,
278+ machine_id=machine_id,
279+ state=state))
280+ self.round += 1
281+ return succeed(result)
282+
283+
284+class Observed(object):
285+ """Minimal wrapper just to ensure :method:`add` returns a `Deferred`."""
286+
287+ def __init__(self):
288+ self.items = set()
289+
290+ def add(self, item):
291+ self.items.add(item)
292+ return succeed(True)
293
294=== modified file 'ensemble/providers/ec2/tests/test_connect.py'
295--- ensemble/providers/ec2/tests/test_connect.py 2011-08-19 16:21:13 +0000
296+++ ensemble/providers/ec2/tests/test_connect.py 2011-08-29 18:36:25 +0000
297@@ -113,7 +113,7 @@
298 # Hand back a real connected client to test the wait on initialization.
299 instance = self.get_instance("i-foobar", dns_name="foo.example.com")
300 connected_client = ZookeeperClient()
301- self.client = connected_client # for poke_zk
302+ self.client = connected_client # for poke_zk
303 self.mock_connect(False, instance, succeed(connected_client))
304
305 self.mocker.replay()
306
307=== modified file 'ensemble/providers/ec2/tests/test_provider.py'
308--- ensemble/providers/ec2/tests/test_provider.py 2011-08-18 12:19:22 +0000
309+++ ensemble/providers/ec2/tests/test_provider.py 2011-08-29 18:36:25 +0000
310@@ -115,6 +115,7 @@
311 serialized.pop("authorized-keys", None)
312 self.assertEqual(config, serialized)
313
314+
315 class FailCreateTest(TestCase):
316
317 def test_conflicting_authorized_keys_options(self):
318
319=== modified file 'ensemble/providers/ec2/tests/test_securitygroup.py'
320--- ensemble/providers/ec2/tests/test_securitygroup.py 2011-08-18 12:19:22 +0000
321+++ ensemble/providers/ec2/tests/test_securitygroup.py 2011-08-29 18:36:25 +0000
322@@ -7,12 +7,13 @@
323 from ensemble.lib.testing import TestCase
324 from ensemble.machine import ProviderMachine
325 from ensemble.providers.ec2.securitygroup import (
326- open_provider_port, close_provider_port, get_provider_opened_ports)
327-
328-from .common import EC2TestMixin
329-
330-
331-class EC2SecurityGroupTest(EC2TestMixin, TestCase):
332+ open_provider_port, close_provider_port, get_provider_opened_ports,
333+ remove_security_groups, destroy_environment_security_group)
334+from ensemble.providers.ec2.tests.common import (
335+ EC2TestMixin, MATCH_GROUP, Observed, MockInstanceState)
336+
337+
338+class EC2PortMgmtTest(EC2TestMixin, TestCase):
339
340 @inlineCallbacks
341 def test_open_provider_port(self):
342@@ -127,3 +128,174 @@
343 str(ex),
344 "Unexpected EC2Error getting open ports on machine i-foobar: "
345 "The instance ID 'i-foobar' does not exist")
346+
347+
348+class EC2RemoveGroupsTest(EC2TestMixin, TestCase):
349+
350+ @inlineCallbacks
351+ def test_remove_security_groups(self):
352+ """Test normal running seen in polling instances for their state."""
353+ system_state = MockInstanceState(
354+ self,
355+ ["i-amkillable", "i-amkillabletoo"], [0, 2],
356+ [["shutting-down", "shutting-down"],
357+ ["shutting-down", "shutting-down"],
358+ ["shutting-down", "terminated"],
359+ ["terminated"]])
360+
361+ self.ec2.describe_instances("i-amkillable", "i-amkillabletoo")
362+ self.mocker.call(system_state.get_round)
363+ self.mocker.count(3)
364+
365+ self.ec2.describe_instances("i-amkillable")
366+ self.mocker.call(system_state.get_round)
367+
368+ self.ec2.delete_security_group(MATCH_GROUP)
369+ deleted_groups = Observed()
370+ self.mocker.call(deleted_groups.add)
371+ self.mocker.count(2)
372+ self.mocker.replay()
373+
374+ provider = self.get_provider()
375+ yield remove_security_groups(
376+ provider, ["i-amkillable", "i-amkillabletoo"])
377+ self.assertEquals(
378+ deleted_groups.items,
379+ set(["ensemble-moon-0", "ensemble-moon-2"]))
380+
381+ @inlineCallbacks
382+ def test_machine_not_in_machine_security_group(self):
383+ """Old environments do not have machines in security groups.
384+
385+ TODO It might be desirable to change this behavior to log as an
386+ error, or even raise as a provider exception.
387+ """
388+ log = self.capture_logging()
389+
390+ # Instance is not in a machine security group, just its env
391+ # security group (ensemble-moon)
392+ self.ec2.describe_instances("i-amkillable")
393+ self.mocker.result(succeed([
394+ self.get_instance("i-amkillable", "terminated", groups=[])]))
395+ self.mocker.replay()
396+
397+ provider = self.get_provider()
398+ yield remove_security_groups(provider, ["i-amkillable"])
399+ self.assertIn(
400+ "Ignoring missing machine security group for instance "
401+ "'i-amkillable'",
402+ log.getvalue())
403+
404+ @inlineCallbacks
405+ def test_cannot_delete_machine_security_group(self):
406+ """Verify this raises error.
407+
408+ Note it is possible for a malicious admin to start other
409+ machines with an existing machine security group, so need to
410+ test this removal is robust in this case. Another scenario
411+ that might cause this is that there's an EC2 timeout.
412+ """
413+ system_state = MockInstanceState(
414+ self, ["i-amkillable"], [0],
415+ [["shutting-down"],
416+ ["shutting-down"],
417+ ["shutting-down"],
418+ ["terminated"]])
419+
420+ self.ec2.describe_instances("i-amkillable")
421+ self.mocker.call(system_state.get_round)
422+ self.mocker.count(4)
423+
424+ self.ec2.delete_security_group("ensemble-moon-0")
425+ self.mocker.result(fail(
426+ self.get_ec2_error(
427+ "ensemble-moon-0",
428+ format="There are active instances using security group %r"
429+ )))
430+
431+ self.mocker.replay()
432+
433+ provider = self.get_provider()
434+ ex = yield self.assertFailure(
435+ remove_security_groups(provider, ["i-amkillable"]),
436+ ProviderInteractionError)
437+ self.assertEquals(
438+ str(ex),
439+ "EC2 error when attempting to delete group ensemble-moon-0: "
440+ "Error Message: There are active instances using security group "
441+ "'ensemble-moon-0'")
442+
443+ @inlineCallbacks
444+ def test_remove_security_groups_takes_too_long(self):
445+ """Verify that removing security groups doesn't continue indefinitely.
446+
447+ This scenario might happen if the instance is taking too long
448+ to report as going into the terminated state, although not
449+ likely after 1000 polls!
450+ """
451+ log = self.capture_logging()
452+ states = [["shutting-down", "shutting-down"],
453+ ["shutting-down", "shutting-down"],
454+ ["shutting-down", "terminated"]]
455+ keeps_on_going = [["shutting-down"] for i in xrange(1000)]
456+ states.extend(keeps_on_going)
457+
458+ system_state = MockInstanceState(
459+ self, ["i-amkillable", "i-amkillabletoo"], [0, 2], states)
460+
461+ self.ec2.describe_instances("i-amkillable", "i-amkillabletoo")
462+ self.mocker.call(system_state.get_round)
463+ self.mocker.count(3)
464+
465+ # Keep the rounds going until it exits with the error message
466+ self.ec2.describe_instances("i-amkillable")
467+ self.mocker.call(system_state.get_round)
468+ self.mocker.count(197)
469+
470+ # Verify that at least this machine security group was deleted
471+ deleted_groups = Observed()
472+ self.ec2.delete_security_group("ensemble-moon-2")
473+ self.mocker.call(deleted_groups.add)
474+
475+ self.mocker.replay()
476+
477+ provider = self.get_provider()
478+ yield remove_security_groups(
479+ provider, ["i-amkillable", "i-amkillabletoo"])
480+ self.assertEquals(deleted_groups.items, set(["ensemble-moon-2"]))
481+ self.assertIn(
482+ "Instance shutdown taking too long, "
483+ "could not delete groups ensemble-moon-0",
484+ log.getvalue())
485+
486+ @inlineCallbacks
487+ def test_destroy_environment_security_group(self):
488+ """Verify the deletion of the security group for the environment"""
489+ self.ec2.delete_security_group("ensemble-moon")
490+ self.mocker.result(succeed(True))
491+ self.mocker.replay()
492+
493+ provider = self.get_provider()
494+ destroyed = yield destroy_environment_security_group(provider)
495+ self.assertTrue(destroyed)
496+
497+ @inlineCallbacks
498+ def test_destroy_environment_security_group_missing(self):
499+ """Verify ignores errors in deleting the env security group"""
500+ log = self.capture_logging(level=logging.DEBUG)
501+ self.ec2.delete_security_group("ensemble-moon")
502+ self.mocker.result(fail(
503+ self.get_ec2_error(
504+ "ensemble-moon",
505+ format="The security group %r does not exist"
506+ )))
507+ self.mocker.replay()
508+
509+ provider = self.get_provider()
510+ destroyed = yield destroy_environment_security_group(provider)
511+ self.assertFalse(destroyed)
512+ self.assertIn(
513+ "Ignoring EC2 error when attempting to delete group "
514+ "ensemble-moon: Error Message: The security group "
515+ "'ensemble-moon' does not exist",
516+ log.getvalue())
517
518=== modified file 'ensemble/providers/ec2/tests/test_shutdown.py'
519--- ensemble/providers/ec2/tests/test_shutdown.py 2011-08-16 08:33:47 +0000
520+++ ensemble/providers/ec2/tests/test_shutdown.py 2011-08-29 18:36:25 +0000
521@@ -1,7 +1,8 @@
522-from twisted.internet.defer import fail, succeed
523+from twisted.internet.defer import succeed, fail, inlineCallbacks
524
525 from ensemble.lib.testing import TestCase
526-from ensemble.providers.ec2.tests.common import EC2TestMixin
527+from ensemble.providers.ec2.tests.common import (
528+ EC2TestMixin, MATCH_GROUP, Observed, MockInstanceState)
529
530 from ensemble.machine import ProviderMachine
531
532@@ -15,23 +16,36 @@
533
534 class EC2ShutdownMachineTest(EC2TestMixin, TestCase):
535
536+ @inlineCallbacks
537 def test_shutdown_machine(self):
538- instance = self.get_instance("i-foobar")
539+ system_state = MockInstanceState(
540+ self, ["i-foobar"], [0],
541+ [["running"],
542+ ["shutting-down"],
543+ ["shutting-down"],
544+ ["shutting-down"],
545+ ["terminated"]])
546+
547 self.ec2.describe_instances("i-foobar")
548- self.mocker.result(succeed([instance]))
549+ self.mocker.call(system_state.get_round)
550+ self.mocker.count(5)
551+
552 self.ec2.terminate_instances("i-foobar")
553 self.mocker.result(succeed([("i-foobar", "running", "shutting-down")]))
554+
555+ self.ec2.delete_security_group("ensemble-moon-0")
556+ deleted_groups = Observed()
557+ self.mocker.call(deleted_groups.add)
558+
559 self.mocker.replay()
560
561 machine = EC2ProviderMachine("i-foobar")
562 provider = self.get_provider()
563- d = provider.shutdown_machine(machine)
564-
565- def verify(machine):
566- self.assertTrue(isinstance(machine, EC2ProviderMachine))
567- self.assertEquals(machine.instance_id, "i-foobar")
568- d.addCallback(verify)
569- return d
570+ machine = yield provider.shutdown_machine(machine)
571+ self.assertTrue(isinstance(machine, EC2ProviderMachine))
572+ self.assertEquals(machine.instance_id, "i-foobar")
573+ # Notably, ensemble-moon is not mock deleted
574+ self.assertEquals(deleted_groups.items, set(["ensemble-moon-0"]))
575
576 def test_shutdown_machine_invalid_group(self):
577 """
578@@ -56,7 +70,7 @@
579 def test_shutdown_machine_invalid_machine(self):
580 """
581 Attempting to shutdown a machine that from a different provider
582- type will raise a syntaxerror.
583+ type will raise a `ProviderError`.
584 """
585 self.mocker.replay()
586 machine = ProviderMachine("i-foobar")
587@@ -71,16 +85,14 @@
588 d.addCallback(check_error)
589 return d
590
591+ @inlineCallbacks
592 def test_shutdown_machines_none(self):
593 self.mocker.replay()
594 provider = self.get_provider()
595- d = provider.shutdown_machines([])
596-
597- def verify(result):
598- self.assertEquals(result, [])
599- d.addCallback(verify)
600- return d
601-
602+ result = yield provider.shutdown_machines([])
603+ self.assertEquals(result, [])
604+
605+ @inlineCallbacks
606 def test_shutdown_machines_some_invalid(self):
607 self.ec2.describe_instances("i-amkillable", "i-amalien", "i-amdead")
608 self.mocker.result(succeed([
609@@ -90,104 +102,148 @@
610 self.mocker.replay()
611
612 provider = self.get_provider()
613- d = provider.shutdown_machines([
614- EC2ProviderMachine("i-amkillable"),
615- EC2ProviderMachine("i-amalien"),
616- EC2ProviderMachine("i-amdead")])
617- self.failUnlessFailure(d, MachinesNotFound)
618-
619- def verify(error):
620- self.assertEquals(str(error),
621- "Cannot find machines: i-amalien, i-amdead")
622- d.addCallback(verify)
623- return d
624-
625+ ex = yield self.assertFailure(
626+ provider.shutdown_machines([
627+ EC2ProviderMachine("i-amkillable"),
628+ EC2ProviderMachine("i-amalien"),
629+ EC2ProviderMachine("i-amdead")]),
630+ MachinesNotFound)
631+ self.assertEquals(str(ex),
632+ "Cannot find machines: i-amalien, i-amdead")
633+
634+ @inlineCallbacks
635 def test_shutdown_machines_some_success(self):
636+ """Verify that shutting down some machines works.
637+
638+ In particular, the environment as a whole is not removed
639+ because there's still the environment security group left."""
640+ system_state = MockInstanceState(
641+ self,
642+ ["i-amkillable", "i-amkillabletoo"], [0, 2],
643+ [["running", "running"],
644+ ["shutting-down", "shutting-down"],
645+ ["shutting-down", "shutting-down"],
646+ ["shutting-down", "terminated"],
647+ ["terminated"]])
648+
649 self.ec2.describe_instances("i-amkillable", "i-amkillabletoo")
650- self.mocker.result(succeed([
651- self.get_instance("i-amkillable"),
652- self.get_instance("i-amkillabletoo")]))
653+ self.mocker.call(system_state.get_round)
654+ self.mocker.count(4)
655+
656+ self.ec2.describe_instances("i-amkillable")
657+ self.mocker.call(system_state.get_round)
658+
659 self.ec2.terminate_instances("i-amkillable", "i-amkillabletoo")
660 self.mocker.result(succeed([
661 ("i-amkillable", "running", "shutting-down"),
662 ("i-amkillabletoo", "running", "shutting-down")]))
663+
664+ deleted_groups = Observed()
665+ self.ec2.delete_security_group("ensemble-moon-0")
666+ self.mocker.call(deleted_groups.add)
667+
668+ self.ec2.delete_security_group("ensemble-moon-2")
669+ self.mocker.call(deleted_groups.add)
670+
671 self.mocker.replay()
672
673 provider = self.get_provider()
674- d = provider.shutdown_machines([
675- EC2ProviderMachine("i-amkillable"),
676- EC2ProviderMachine("i-amkillabletoo")])
677-
678- def verify(result):
679- (machine_1, machine_2) = result
680- self.assertTrue(isinstance(machine_1, EC2ProviderMachine))
681- self.assertEquals(machine_1.instance_id, "i-amkillable")
682- self.assertTrue(isinstance(machine_2, EC2ProviderMachine))
683- self.assertEquals(machine_2.instance_id, "i-amkillabletoo")
684- d.addCallback(verify)
685- return d
686+ machine_1, machine_2 = yield provider.shutdown_machines([
687+ EC2ProviderMachine("i-amkillable"),
688+ EC2ProviderMachine("i-amkillabletoo")])
689+ self.assertTrue(isinstance(machine_1, EC2ProviderMachine))
690+ self.assertEquals(machine_1.instance_id, "i-amkillable")
691+ self.assertTrue(isinstance(machine_2, EC2ProviderMachine))
692+ self.assertEquals(machine_2.instance_id, "i-amkillabletoo")
693+ self.assertEquals(
694+ deleted_groups.items,
695+ set(["ensemble-moon-0", "ensemble-moon-2"]))
696
697
698 class EC2DestroyTest(EC2TestMixin, TestCase):
699
700+ @inlineCallbacks
701 def test_destroy_environment(self):
702 """
703 The destroy_environment operation terminates all running and pending
704- instances associated to the C{MachineProvider} instance.
705+ instances associated to the `MachineProvider` instance.
706 """
707 self.s3.put_object("moon", "provider-state", "{}\n")
708 self.mocker.result(succeed(None))
709 self.ec2.describe_instances()
710- self.mocker.result(succeed([
711- self.get_instance("i-canbekilled"),
712- self.get_instance("i-amdead", state="terminated"),
713+ instances = [
714+ self.get_instance("i-canbekilled", machine_id=0),
715+ self.get_instance("i-amdead", machine_id=1, state="terminated"),
716 self.get_instance("i-dontbelong", groups=["unknown"]),
717- self.get_instance("i-canbekilledtoo", state="pending")]))
718+ self.get_instance(
719+ "i-canbekilledtoo", machine_id=2, state="pending")]
720+ self.mocker.result(succeed(instances))
721 self.ec2.describe_instances("i-canbekilled", "i-canbekilledtoo")
722 self.mocker.result(succeed([
723- self.get_instance("i-canbekilled"),
724- self.get_instance("i-canbekilledtoo", state="pending")]))
725+ self.get_instance("i-canbekilled"),
726+ self.get_instance("i-canbekilledtoo", state="pending")]))
727 self.ec2.terminate_instances("i-canbekilled", "i-canbekilledtoo")
728 self.mocker.result(succeed([
729- ("i-canbekilled", "running", "shutting-down"),
730- ("i-canbekilledtoo", "pending", "shutting-down")]))
731+ ("i-canbekilled", "running", "shutting-down"),
732+ ("i-canbekilledtoo", "pending", "shutting-down")]))
733+
734+ self.ec2.describe_instances("i-canbekilled", "i-canbekilledtoo")
735+ states = [
736+ self.get_instance(
737+ "i-canbekilled", state="terminated", machine_id=0),
738+ self.get_instance(
739+ "i-canbekilledtoo", state="terminated", machine_id=2)]
740+ self.mocker.result(succeed(states))
741+
742+ self.ec2.delete_security_group(MATCH_GROUP)
743+ deleted_groups = Observed()
744+ self.mocker.call(deleted_groups.add)
745+ self.mocker.count(3)
746+
747 self.mocker.replay()
748
749 provider = self.get_provider()
750- d = provider.destroy_environment()
751-
752- def verify(result):
753- (machine_1, machine_2) = result
754- self.assertTrue(isinstance(machine_1, EC2ProviderMachine))
755- self.assertEquals(machine_1.instance_id, "i-canbekilled")
756- self.assertTrue(isinstance(machine_2, EC2ProviderMachine))
757- self.assertEquals(machine_2.instance_id, "i-canbekilledtoo")
758- d.addCallback(verify)
759- return d
760-
761+ machine_1, machine_2 = yield provider.destroy_environment()
762+ self.assertTrue(isinstance(machine_1, EC2ProviderMachine))
763+ self.assertEquals(machine_1.instance_id, "i-canbekilled")
764+ self.assertTrue(isinstance(machine_2, EC2ProviderMachine))
765+ self.assertEquals(machine_2.instance_id, "i-canbekilledtoo")
766+ self.assertEquals(
767+ deleted_groups.items,
768+ set(["ensemble-moon-0", "ensemble-moon-2", "ensemble-moon"]))
769+
770+ @inlineCallbacks
771 def test_s3_failure(self):
772 """Failing to store empty state should not stop us killing machines"""
773 self.s3.put_object("moon", "provider-state", "{}\n")
774 self.mocker.result(fail(SomeError()))
775 self.ec2.describe_instances()
776 self.mocker.result(succeed([self.get_instance("i-canbekilled")]))
777+ system_state = MockInstanceState(
778+ self, ["i-canbekilled"], [0],
779+ [["running"],
780+ ["shutting-down"],
781+ ["shutting-down"],
782+ ["shutting-down"],
783+ ["terminated"]])
784 self.ec2.describe_instances("i-canbekilled")
785- self.mocker.result(succeed([self.get_instance("i-canbekilled")]))
786+ self.mocker.call(system_state.get_round)
787+ self.mocker.count(5)
788 self.ec2.terminate_instances("i-canbekilled")
789 self.mocker.result(succeed([
790 ("i-canbekilled", "running", "shutting-down")]))
791+ self.ec2.delete_security_group("ensemble-moon-0")
792+ self.mocker.result(succeed(True))
793+ self.ec2.delete_security_group("ensemble-moon")
794+ self.mocker.result(succeed(True))
795 self.mocker.replay()
796
797 provider = self.get_provider()
798- d = provider.destroy_environment()
799-
800- def verify(result):
801- (machine,) = result
802- self.assertTrue(isinstance(machine, EC2ProviderMachine))
803- self.assertEquals(machine.instance_id, "i-canbekilled")
804- d.addCallback(verify)
805- return d
806-
807+ machine, = yield provider.destroy_environment()
808+ self.assertTrue(isinstance(machine, EC2ProviderMachine))
809+ self.assertEquals(machine.instance_id, "i-canbekilled")
810+
811+ @inlineCallbacks
812 def test_shutdown_no_instances(self):
813 """
814 If there are no instances to shutdown, running the destroy_environment
815@@ -197,12 +253,14 @@
816 self.mocker.result(succeed(None))
817 self.ec2.describe_instances()
818 self.mocker.result(succeed([]))
819+ self.ec2.delete_security_group("ensemble-moon")
820+ self.mocker.result(fail(
821+ self.get_ec2_error(
822+ "ensemble-moon",
823+ format="The security group %r does not exist"
824+ )))
825 self.mocker.replay()
826
827 provider = self.get_provider()
828- d = provider.destroy_environment()
829-
830- def verify(result):
831- self.assertEquals(result, [])
832- d.addCallback(verify)
833- return d
834+ result = yield provider.destroy_environment()
835+ self.assertEquals(result, [])
