Merge lp:~salgado/launchpad/kill-rogue-ec2-instances into lp:launchpad

Proposed by Guilherme Salgado
Status: Merged
Merged at revision: not available
Proposed branch: lp:~salgado/launchpad/kill-rogue-ec2-instances
Merge into: lp:launchpad
Diff against target: 74 lines (+15/-4)
3 files modified
lib/devscripts/ec2test/builtins.py (+2/-1)
lib/devscripts/ec2test/instance.py (+1/-1)
lib/devscripts/ec2test/testrunner.py (+12/-2)
To merge this branch: bzr merge lp:~salgado/launchpad/kill-rogue-ec2-instances
Reviewer: Jonathan Lange (community), status: Approve
Review via email: mp+18834@code.launchpad.net
Guilherme Salgado (salgado) wrote :

= Summary =

Today, for the second time in less than six months, I started work and
realized that an 'ec2 land' I had run last Friday on behalf of a community
member had hung and therefore failed to shut down its EC2 instance.

This is pretty bad: in this case the instance was left running for more
than 48 hours. So this branch schedules a forced shutdown for 8 hours
(480 minutes) after we start preparing to run the test suite. That is
more than twice the time our test suite takes to complete, and the
shutdown is not scheduled for demo instances.
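As a rough illustration of the arithmetic (not code from this branch): 8 hours is the 480 minutes passed as `timeout=480`, and the halt fires that long after setup begins. The helper names, the "twice the suite" check and the example start time below are hypothetical.

```python
from datetime import datetime, timedelta

# 8 hours expressed in minutes; builtins.py passes this value as timeout=480.
TIMEOUT_MINUTES = 8 * 60


def forced_shutdown_time(start, timeout_minutes=TIMEOUT_MINUTES):
    """Return when the scheduled halt fires, given when setup started."""
    return start + timedelta(minutes=timeout_minutes)


def covers_suite(timeout_minutes, suite_minutes, factor=2):
    """Hypothetical sanity check: is the timeout at least `factor` times
    the test suite's runtime (in minutes)?"""
    return timeout_minutes >= factor * suite_minutes
```

For a run whose setup began at a hypothetical 18:00 on Friday, `forced_shutdown_time(datetime(2010, 2, 5, 18, 0))` returns `datetime(2010, 2, 6, 2, 0)`, i.e. 02:00 on Saturday, instead of the instance staying up all weekend.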

Jonathan Lange (jml) wrote :

You're right: we need some sort of timeout, given that normal termination is unreliable. Ideally, the timeout would be based on inactivity rather than total time, but I think this patch is a net win. Please land.

review: Approve
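To make the distinction in this review concrete, here is a minimal sketch of an inactivity-based timeout, which is not what this branch implements. `InactivityWatchdog` and its methods are hypothetical names. The idea is that any sign of progress (say, a new line of test output) resets the timer, so a long but healthy run never expires while a hung one does.

```python
import time


class InactivityWatchdog:
    """Hypothetical idle timer: expires only after `idle_limit` seconds
    pass without a call to touch(), unlike a fixed total-time deadline."""

    def __init__(self, idle_limit, clock=time.monotonic):
        self.idle_limit = idle_limit
        self._clock = clock
        self._last_activity = clock()

    def touch(self):
        """Record activity, e.g. a new line of test output."""
        self._last_activity = self._clock()

    def expired(self):
        """True once no activity has been seen for idle_limit seconds."""
        return self._clock() - self._last_activity >= self.idle_limit
```

The injectable `clock` keeps the logic testable without real waiting; a caller would poll `expired()` and trigger the shutdown when it returns true.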
Jonathan Lange (jml) wrote :

Looks good to me.

review: Approve

Preview Diff

=== modified file 'lib/devscripts/ec2test/builtins.py'
--- lib/devscripts/ec2test/builtins.py 2009-11-30 16:26:24 +0000
+++ lib/devscripts/ec2test/builtins.py 2010-02-08 13:41:20 +0000
@@ -301,7 +301,8 @@
 pqm_submit_location=pqm_submit_location,
 open_browser=open_browser, pqm_email=pqm_email,
 include_download_cache_changes=include_download_cache_changes,
- instance=instance, launchpad_login=instance._launchpad_login)
+ instance=instance, launchpad_login=instance._launchpad_login,
+ timeout=480)

 instance.set_up_and_run(postmortem, not headless, runner.run_tests)


=== modified file 'lib/devscripts/ec2test/instance.py'
--- lib/devscripts/ec2test/instance.py 2009-11-27 07:24:49 +0000
+++ lib/devscripts/ec2test/instance.py 2010-02-08 13:41:20 +0000
@@ -129,7 +129,7 @@

 apt-get -y install launchpad-developer-dependencies apache2 apache2-mpm-worker

-# Creat the ec2test user, give them passwordless sudo.
+# Create the ec2test user, give them passwordless sudo.
 adduser --gecos "" --disabled-password ec2test
 echo 'ec2test\tALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers


=== modified file 'lib/devscripts/ec2test/testrunner.py'
--- lib/devscripts/ec2test/testrunner.py 2009-12-01 22:53:47 +0000
+++ lib/devscripts/ec2test/testrunner.py 2010-02-08 13:41:20 +0000
@@ -115,10 +115,14 @@
 pqm_submit_location=None,
 open_browser=False, pqm_email=None,
 include_download_cache_changes=None, instance=None,
- launchpad_login=None):
+ launchpad_login=None,
+ timeout=None):
 """Create a new EC2TestRunner.

- This sets the following attributes:
+ :param timeout: Number of minutes before we force a shutdown. This is
+ useful because sometimes the normal instance termination might
+ fail.
+
 - original_branch
 - test_options
 - headless
@@ -129,6 +133,7 @@
 - email (after validating email capabilities)
 - image (after connecting to ec2)
 - file
+ - timeout
 """
 self.original_branch = branch
 self.test_options = test_options
@@ -137,6 +142,7 @@
 self.open_browser = open_browser
 self.file = file
 self._launchpad_login = launchpad_login
+ self.timeout = timeout

 trunk_specified = False
 trunk_branch = TRUNK_BRANCH
@@ -314,6 +320,10 @@

 def configure_system(self):
 user_connection = self._instance.connect()
+ if self.timeout is not None:
+ user_connection.perform(
+ "echo sudo shutdown -h now | at today + %d minutes"
+ % self.timeout)
 as_user = user_connection.perform
 # Set up bazaar.conf with smtp information if necessary
 if self.email or self.message:
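
The new `configure_system()` logic can be exercised in isolation with a stand-in connection. This is a sketch assuming only that the connection object exposes `perform(command)`, as used in the diff; `RecordingConnection` and `schedule_forced_shutdown` are hypothetical names. With `timeout=None` (the constructor default, which the summary says applies to demo instances) nothing is queued; otherwise `at` receives `sudo shutdown -h now` on standard input.

```python
class RecordingConnection:
    """Hypothetical stand-in for the object returned by
    self._instance.connect(); it only records commands passed to perform()."""

    def __init__(self):
        self.commands = []

    def perform(self, cmd):
        self.commands.append(cmd)


def schedule_forced_shutdown(connection, timeout):
    """Same logic as the lines added to configure_system(): do nothing when
    timeout is None, otherwise queue a halt `timeout` minutes later via at."""
    if timeout is not None:
        connection.perform(
            "echo sudo shutdown -h now | at today + %d minutes" % timeout)
```

Because the `at` job runs `sudo`, it relies on the passwordless sudo that instance.py grants to the ec2test user.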