OpenStack Compute (Nova)

Merge lp:~tr3buchet/nova/lock into lp:~hudson-openstack/nova/trunk

Proposed by Trey Morris on 2010-12-29
Status: Merged
Approved by: Eric Day on 2011-01-07
Approved revision: 529
Merged at revision: 524
Proposed branch: lp:~tr3buchet/nova/lock
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 330 lines (+184/-2) 5 files modified
To merge this branch: bzr merge lp:~tr3buchet/nova/lock
Reviewer Review Type Date Requested Status
Eric Day (community) 2010-12-29 Approve on 2011-01-07
Devin Carlen (community) Approve on 2010-12-30
Matt Dietz (community) Needs Fixing on 2010-12-29
Review via email: mp+44874@code.launchpad.net

Description of the Change

This branch implements lock functionality. The lock is stored in the compute worker database. Decorators have been added to the openstack API actions which alter instances in any way.

To post a comment you must log in.
Trey Morris (tr3buchet) wrote :

also, not shown is me modifying the users and instances tables in another window to set the state of is_admin and locked attributes which are commented about above.

lp:~tr3buchet/nova/lock updated on 2010-12-29
507. By Trey Morris on 2010-12-29

removed some code i didn't end up using

Trey Morris (tr3buchet) wrote :

also, I apologize for launchpad mangling my commit message

Rick Harris (rconradharris) wrote :

Overall lgtm. A few small comments:

> + if(admin or not locked):

No parens needed :-)

> + @checks_lock

I'm wondering if we should give this particular lock a memorable/unambiguous name-- there will be various kinds of locks, mutexs, etc throughout the system, so down the road, check_lock might not be as clear as it is today.

Maybe, @check_instance_lock, or @check_state_lock. Just a thought.

Trey Morris (tr3buchet) wrote :

i like that
i'll fix up both

lp:~tr3buchet/nova/lock updated on 2010-12-29
508. By Trey Morris on 2010-12-29

removed () from if (can't believe i did that) and renamed checks_lock decorator

Trey Morris (tr3buchet) wrote :
Download full text (6.0 KiB)

fixed.

also moved my testing notes from above to this comment so it isn't in the commit message

Notes: unittests passed locally. pep8 returned not errors locally. My integration test with output is as follows (remember that the openstack API maps state "pause" to "error"):

(trey|3nova)~/openstack/n3vascript> n3va.sh teardown
(trey|3nova)~/openstack/n3vascript> n3va.sh run

(screen window)
(trey|3nova)~/openstack/n3vascript> . /home/trey/openstack/n3vascript/novarc
(trey|3nova)~/openstack/n3vascript> euca-add-keypair test > test.pem
(trey|3nova)~/openstack/n3vascript> export CLOUD_SERVERS_USERNAME=admin
(trey|3nova)~/openstack/n3vascript> export CLOUD_SERVERS_API_KEY=admin
(trey|3nova)~/openstack/n3vascript> export CLOUD_SERVERS_URL=http://localhost:8774/v1.0/
(trey|3nova)~/openstack/n3vascript> euca-run-instances -k test -t m1.tiny ami-tiny
RESERVATION r-27hxqjra admin
INSTANCE i-qssosb ami-tiny scheduling test (admin, None) 0 m1.tiny 2010-12-29 20:54:38.391163
(trey|3nova)~/openstack/n3vascript>
(trey|3nova)~/openstack/n3vascript> # unlocked + admin
(trey|3nova)~/openstack/n3vascript> cloudservers list
+------------+-------------------+--------+-----------+------------+
| ID | Name | Status | Public IP | Private IP |
+------------+-------------------+--------+-----------+------------+
| 1620488315 | Server 1620488315 | active | | |
+------------+-------------------+--------+-----------+------------+
(trey|3nova)~/openstack/n3vascript> cloudservers pause 1620488315
(trey|3nova)~/openstack/n3vascript> cloudservers list
+------------+-------------------+--------+-----------+------------+
| ID | Name | Status | Public IP | Private IP |
+------------+-------------------+--------+-----------+------------+
| 1620488315 | Server 1620488315 | error | | |
+------------+-------------------+--------+-----------+------------+
(trey|3nova)~/openstack/n3vascript> cloudservers unpause 1620488315
(trey|3nova)~/openstack/n3vascript> cloudservers list
+------------+-------------------+--------+-----------+------------+
| ID | Name | Status | Public IP | Private IP |
+------------+-------------------+--------+-----------+------------+
| 1620488315 | Server 1620488315 | active | | |
+------------+-------------------+--------+-----------+------------+
(trey|3nova)~/openstack/n3vascript>
(trey|3nova)~/openstack/n3vascript> # locked + admin
(trey|3nova)~/openstack/n3vascript> cloudservers list
+------------+-------------------+--------+-----------+------------+
| ID | Name | Status | Public IP | Private IP |
+------------+-------------------+--------+-----------+------------+
| 1620488315 | Server 1620488315 | active | | |
+------------+-------------------+--------+-----------+------------+
(trey|3nova)~/openstack/n3vascript> cloudservers pause 1620488315
(trey|3nova)~/openstack/n3vascript> cloudservers list
+------------+-------------------+--------+-----------+------------+
| ID | Name | Status | Public ...

Read more...

Trey Morris (tr3buchet) wrote :

oooh less mangling this way too :D

Eric Day (eday) wrote :

I think the lock check should be happening on the compute worker node, not within the API server (and if it were happening in the API server, it should be in compute.api, not in the openstack API specific code). We need to start planning for race conditions in a distributed system, and this means handling operations at the canonical source.

review: Needs Fixing
Matt Dietz (cerberus) wrote :

I'm not sure I agree with the idea of a locked instance returning not found. I think it delivers an idea inconsistent with the intent of the locking mechanism. I would say HTTP 405 (MethodNotAllowed) or something like that instead, as a 404 wouldn't let the end user know that it needs to be unlocked. Also, I don't think we want to lock the show action, as the lock is supposed to represent immutability, not visibility.

review: Needs Fixing
lp:~tr3buchet/nova/lock updated on 2010-12-29
509. By Trey Morris on 2010-12-29

removed lock check from show and changed returning 404 to 405

Trey Morris (tr3buchet) wrote :

re matthew: fixed.
re eric: yikes, good point.

Trey Morris (tr3buchet) wrote :

also re eric: you devil...

lp:~tr3buchet/nova/lock updated on 2010-12-30
510. By Trey Morris on 2010-12-30

moved check lock decorator to compute api level. altered openstack.test_servers according and wrote test for lock in tests.test_compute

511. By Trey Morris on 2010-12-30

fixd variables being out of scope in lock decorator

512. By Trey Morris on 2010-12-30

altered error exception/logging

513. By Trey Morris on 2010-12-30

altered error exception/logging

514. By Trey Morris on 2010-12-30

typo, trying to hurry.. look where that got me

515. By Trey Morris on 2010-12-30

added some logging

516. By Trey Morris on 2010-12-30

removed db.set_lock, using update_instance instead

517. By Trey Morris on 2010-12-30

moved check lock decorator from the compute api to the come manager... when it rains it pours

518. By Trey Morris on 2010-12-30

syntax error

519. By Trey Morris on 2010-12-30

fixed up the compute lock test, was failing because the context was always admin

520. By Trey Morris on 2010-12-30

removed tests.api.openstack.test_servers test_lock, to hell with it. i'm not even sure if testing lock needs to be at this level

Trey Morris (tr3buchet) wrote :

re eric:
ok moved the lock check business to the compute api. Tests spent forever fixing the unittests, gave up. Moved the lock check business to the compute manager. Tests are passing. Testing shows it works. Attempting to perform a lock checking action on a locked instance when not an admin will throw an exception which is currently uncaught. Also I removed the test for lock in the openstack api tests because I'm not sure a lock test should go there.

lp:~tr3buchet/nova/lock updated on 2010-12-30
521. By Trey Morris on 2010-12-30

altered the compute lock test

522. By Trey Morris on 2010-12-30

fixed the compute lock test

Trey Morris (tr3buchet) wrote :

alright here we go. Now it works, passes unittests and when you attempt to do something to a locked instance and are not an admin, nothing happens and this shows up in the logs (for example):

ERROR:root:Instance |1| is locked, cannot execute |<function pause_instance at 0x2d910c8>|

Devin Carlen (devcamcar) wrote :

good work, lock name is clear and code is in compute api now.

review: Approve
Eric Day (eday) wrote :

140: I would simplify this and assume (self, context, instance_id) for args[0/1/2] and not looks in kwargs. In fact, on 129, I would instead do:

+ def decorated_function(self, context, instance_id, *args, **kwargs):

and remove the try block altogether.

lp:~tr3buchet/nova/lock updated on 2011-01-06
523. By Trey Morris on 2011-01-06

altered argument handling

524. By Trey Morris on 2011-01-06

pep8

525. By Trey Morris on 2011-01-06

merged trunk

526. By Trey Morris on 2011-01-06

accidentally left unlocked in there, it should have been locked

527. By Trey Morris on 2011-01-06

passing the correct parameters to decorated function

528. By Trey Morris on 2011-01-06

refers to instance_id instead of instance_ref[instance_id]

529. By Trey Morris on 2011-01-06

typo

Trey Morris (tr3buchet) wrote :

alright eric day, i've got your changes implemented. locally unittests pass, pep8 returns nothing and the lock works.

Eric Day (eday) wrote :

lgtm

review: Approve

Preview Diff

1=== modified file 'nova/api/openstack/servers.py'
2--- nova/api/openstack/servers.py 2011-01-05 17:50:19 +0000
3+++ nova/api/openstack/servers.py 2011-01-07 00:00:12 +0000
4@@ -170,6 +170,50 @@
5 return faults.Fault(exc.HTTPUnprocessableEntity())
6 return exc.HTTPAccepted()
7
8+ def lock(self, req, id):
9+ """
10+ lock the instance with id
11+ admin only operation
12+
13+ """
14+ context = req.environ['nova.context']
15+ try:
16+ self.compute_api.lock(context, id)
17+ except:
18+ readable = traceback.format_exc()
19+ logging.error(_("Compute.api::lock %s"), readable)
20+ return faults.Fault(exc.HTTPUnprocessableEntity())
21+ return exc.HTTPAccepted()
22+
23+ def unlock(self, req, id):
24+ """
25+ unlock the instance with id
26+ admin only operation
27+
28+ """
29+ context = req.environ['nova.context']
30+ try:
31+ self.compute_api.unlock(context, id)
32+ except:
33+ readable = traceback.format_exc()
34+ logging.error(_("Compute.api::unlock %s"), readable)
35+ return faults.Fault(exc.HTTPUnprocessableEntity())
36+ return exc.HTTPAccepted()
37+
38+ def get_lock(self, req, id):
39+ """
40+ return the boolean state of (instance with id)'s lock
41+
42+ """
43+ context = req.environ['nova.context']
44+ try:
45+ self.compute_api.get_lock(context, id)
46+ except:
47+ readable = traceback.format_exc()
48+ logging.error(_("Compute.api::get_lock %s"), readable)
49+ return faults.Fault(exc.HTTPUnprocessableEntity())
50+ return exc.HTTPAccepted()
51+
52 def pause(self, req, id):
53 """ Permit Admins to Pause the server. """
54 ctxt = req.environ['nova.context']
55
56=== modified file 'nova/compute/api.py'
57--- nova/compute/api.py 2011-01-05 17:50:19 +0000
58+++ nova/compute/api.py 2011-01-07 00:00:12 +0000
59@@ -147,6 +147,7 @@
60 'user_data': user_data or '',
61 'key_name': key_name,
62 'key_data': key_data,
63+ 'locked': False,
64 'availability_zone': availability_zone}
65
66 elevated = context.elevated()
67@@ -354,6 +355,38 @@
68 {"method": "unrescue_instance",
69 "args": {"instance_id": instance_id}})
70
71+ def lock(self, context, instance_id):
72+ """
73+ lock the instance with instance_id
74+
75+ """
76+ instance = self.get_instance(context, instance_id)
77+ host = instance['host']
78+ rpc.cast(context,
79+ self.db.queue_get_for(context, FLAGS.compute_topic, host),
80+ {"method": "lock_instance",
81+ "args": {"instance_id": instance['id']}})
82+
83+ def unlock(self, context, instance_id):
84+ """
85+ unlock the instance with instance_id
86+
87+ """
88+ instance = self.get_instance(context, instance_id)
89+ host = instance['host']
90+ rpc.cast(context,
91+ self.db.queue_get_for(context, FLAGS.compute_topic, host),
92+ {"method": "unlock_instance",
93+ "args": {"instance_id": instance['id']}})
94+
95+ def get_lock(self, context, instance_id):
96+ """
97+ return the boolean state of (instance with instance_id)'s lock
98+
99+ """
100+ instance = self.get_instance(context, instance_id)
101+ return instance['locked']
102+
103 def attach_volume(self, context, instance_id, volume_id, device):
104 if not re.match("^/dev/[a-z]d[a-z]+$", device):
105 raise exception.ApiError(_("Invalid device specified: %s. "
106
107=== modified file 'nova/compute/manager.py'
108--- nova/compute/manager.py 2011-01-06 13:12:57 +0000
109+++ nova/compute/manager.py 2011-01-07 00:00:12 +0000
110@@ -36,6 +36,7 @@
111
112 import datetime
113 import logging
114+import functools
115
116 from nova import exception
117 from nova import flags
118@@ -53,6 +54,38 @@
119 'Stub network related code')
120
121
122+def checks_instance_lock(function):
123+ """
124+ decorator used for preventing action against locked instances
125+ unless, of course, you happen to be admin
126+
127+ """
128+
129+ @functools.wraps(function)
130+ def decorated_function(self, context, instance_id, *args, **kwargs):
131+
132+ logging.info(_("check_instance_lock: decorating: |%s|"), function)
133+ logging.info(_("check_instance_lock: arguments: |%s| |%s| |%s|"),
134+ self,
135+ context,
136+ instance_id)
137+ locked = self.get_lock(context, instance_id)
138+ admin = context.is_admin
139+ logging.info(_("check_instance_lock: locked: |%s|"), locked)
140+ logging.info(_("check_instance_lock: admin: |%s|"), admin)
141+
142+ # if admin or unlocked call function otherwise log error
143+ if admin or not locked:
144+ logging.info(_("check_instance_lock: executing: |%s|"), function)
145+ function(self, context, instance_id, *args, **kwargs)
146+ else:
147+ logging.error(_("check_instance_lock: not executing |%s|"),
148+ function)
149+ return False
150+
151+ return decorated_function
152+
153+
154 class ComputeManager(manager.Manager):
155
156 """Manages the running instances from creation to destruction."""
157@@ -158,6 +191,7 @@
158 self._update_state(context, instance_id)
159
160 @exception.wrap_exception
161+ @checks_instance_lock
162 def terminate_instance(self, context, instance_id):
163 """Terminate an instance on this machine."""
164 context = context.elevated()
165@@ -202,6 +236,7 @@
166 self.db.instance_destroy(context, instance_id)
167
168 @exception.wrap_exception
169+ @checks_instance_lock
170 def reboot_instance(self, context, instance_id):
171 """Reboot an instance on this server."""
172 context = context.elevated()
173@@ -246,6 +281,7 @@
174 self.driver.snapshot(instance_ref, name)
175
176 @exception.wrap_exception
177+ @checks_instance_lock
178 def rescue_instance(self, context, instance_id):
179 """Rescue an instance on this server."""
180 context = context.elevated()
181@@ -261,6 +297,7 @@
182 self._update_state(context, instance_id)
183
184 @exception.wrap_exception
185+ @checks_instance_lock
186 def unrescue_instance(self, context, instance_id):
187 """Rescue an instance on this server."""
188 context = context.elevated()
189@@ -280,6 +317,7 @@
190 self._update_state(context, instance_id)
191
192 @exception.wrap_exception
193+ @checks_instance_lock
194 def pause_instance(self, context, instance_id):
195 """Pause an instance on this server."""
196 context = context.elevated()
197@@ -297,6 +335,7 @@
198 result))
199
200 @exception.wrap_exception
201+ @checks_instance_lock
202 def unpause_instance(self, context, instance_id):
203 """Unpause a paused instance on this server."""
204 context = context.elevated()
205@@ -324,8 +363,12 @@
206 return self.driver.get_diagnostics(instance_ref)
207
208 @exception.wrap_exception
209+ @checks_instance_lock
210 def suspend_instance(self, context, instance_id):
211- """suspend the instance with instance_id"""
212+ """
213+ suspend the instance with instance_id
214+
215+ """
216 context = context.elevated()
217 instance_ref = self.db.instance_get(context, instance_id)
218
219@@ -340,8 +383,12 @@
220 result))
221
222 @exception.wrap_exception
223+ @checks_instance_lock
224 def resume_instance(self, context, instance_id):
225- """resume the suspended instance with instance_id"""
226+ """
227+ resume the suspended instance with instance_id
228+
229+ """
230 context = context.elevated()
231 instance_ref = self.db.instance_get(context, instance_id)
232
233@@ -356,6 +403,41 @@
234 result))
235
236 @exception.wrap_exception
237+ def lock_instance(self, context, instance_id):
238+ """
239+ lock the instance with instance_id
240+
241+ """
242+ context = context.elevated()
243+ instance_ref = self.db.instance_get(context, instance_id)
244+
245+ logging.debug(_('instance %s: locking'), instance_id)
246+ self.db.instance_update(context, instance_id, {'locked': True})
247+
248+ @exception.wrap_exception
249+ def unlock_instance(self, context, instance_id):
250+ """
251+ unlock the instance with instance_id
252+
253+ """
254+ context = context.elevated()
255+ instance_ref = self.db.instance_get(context, instance_id)
256+
257+ logging.debug(_('instance %s: unlocking'), instance_id)
258+ self.db.instance_update(context, instance_id, {'locked': False})
259+
260+ @exception.wrap_exception
261+ def get_lock(self, context, instance_id):
262+ """
263+ return the boolean state of (instance with instance_id)'s lock
264+
265+ """
266+ context = context.elevated()
267+ logging.debug(_('instance %s: getting locked state'), instance_id)
268+ instance_ref = self.db.instance_get(context, instance_id)
269+ return instance_ref['locked']
270+
271+ @exception.wrap_exception
272 def get_console_output(self, context, instance_id):
273 """Send the console output for an instance."""
274 context = context.elevated()
275@@ -365,6 +447,7 @@
276 return self.driver.get_console_output(instance_ref)
277
278 @exception.wrap_exception
279+ @checks_instance_lock
280 def attach_volume(self, context, instance_id, volume_id, mountpoint):
281 """Attach a volume to an instance."""
282 context = context.elevated()
283@@ -394,6 +477,7 @@
284 return True
285
286 @exception.wrap_exception
287+ @checks_instance_lock
288 def detach_volume(self, context, instance_id, volume_id):
289 """Detach a volume from an instance."""
290 context = context.elevated()
291
292=== modified file 'nova/db/sqlalchemy/models.py'
293--- nova/db/sqlalchemy/models.py 2011-01-05 22:59:41 +0000
294+++ nova/db/sqlalchemy/models.py 2011-01-07 00:00:12 +0000
295@@ -224,6 +224,8 @@
296 display_name = Column(String(255))
297 display_description = Column(String(255))
298
299+ locked = Column(Boolean)
300+
301 # TODO(vish): see Ewan's email about state improvements, probably
302 # should be in a driver base class or some such
303 # vmstate_state = running, halted, suspended, paused
304
305=== modified file 'nova/tests/test_compute.py'
306--- nova/tests/test_compute.py 2010-12-31 03:55:00 +0000
307+++ nova/tests/test_compute.py 2011-01-07 00:00:12 +0000
308@@ -178,3 +178,22 @@
309 self.context,
310 instance_id)
311 self.compute.terminate_instance(self.context, instance_id)
312+
313+ def test_lock(self):
314+ """ensure locked instance cannot be changed"""
315+ instance_id = self._create_instance()
316+ self.compute.run_instance(self.context, instance_id)
317+
318+ non_admin_context = context.RequestContext(None, None, False, False)
319+
320+ # decorator should return False (fail) with locked nonadmin context
321+ self.compute.lock_instance(self.context, instance_id)
322+ ret_val = self.compute.reboot_instance(non_admin_context, instance_id)
323+ self.assertEqual(ret_val, False)
324+
325+ # decorator should return None (success) with unlocked nonadmin context
326+ self.compute.unlock_instance(self.context, instance_id)
327+ ret_val = self.compute.reboot_instance(non_admin_context, instance_id)
328+ self.assertEqual(ret_val, None)
329+
330+ self.compute.terminate_instance(self.context, instance_id)