Merge lp:~salvatore-orlando/nova/bug723301 into lp:~hudson-openstack/nova/trunk

Proposed by Salvatore Orlando
Status: Merged
Approved by: Brian Waldon
Approved revision: 798
Merged at revision: 1250
Proposed branch: lp:~salvatore-orlando/nova/bug723301
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 621 lines (+320/-95)
5 files modified
nova/tests/test_xenapi.py (+42/-0)
nova/tests/xenapi/stubs.py (+39/-0)
nova/virt/xenapi/vm_utils.py (+116/-49)
nova/virt/xenapi/vmops.py (+121/-44)
plugins/xenserver/xenapi/etc/xapi.d/plugins/glance (+2/-2)
To merge this branch: bzr merge lp:~salvatore-orlando/nova/bug723301
Reviewer Review Type Date Requested Status
Brian Waldon (community) Approve
Soren Hansen (community) Abstain
Brian Lamar (community) Approve
Sandy Walsh (community) Approve
termie (community) Needs Fixing
Matt Dietz (community) Approve
Rick Harris (community) Approve
Review via email: mp+53482@code.launchpad.net

Commit message

implemented clean-up logic when VM fails to spawn for xenapi back-end

Description of the change

The spawn method in VMOps has been modified to handle exceptions occurring before a VM record is created; the exception handling procedure destroys any VDIs created so far and removes the kernel/ramdisk files from /boot/guest in the dom0 file system.

Exceptions generated by the methods called by spawn that deal with resources on the hypervisor are caught and then re-raised, with the identifiers of any VDIs created so far appended, so the clean-up code knows what to remove.
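In outline (condensed from the preview diff at the bottom of this page), the modified spawn wraps the existing steps in a try/except and delegates clean-up to a new _handle_spawn_error method:

    def spawn(self, instance, network_info):
        vdis = None
        try:
            vdis = self._create_disks(instance)
            vm_ref = self._create_vm(instance, vdis, network_info)
            self._spawn(instance, vm_ref)
        except (self.XenAPI.Failure, OSError, IOError) as spawn_error:
            LOG.exception(_("instance %s: Failed to spawn"),
                          instance.id, exc_info=sys.exc_info())
            # Destroy any VDIs/files created so far, then re-raise.
            self._handle_spawn_error(vdis, spawn_error)
            raise spawn_error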

Revision history for this message
Brian Lamar (blamar) wrote :

I'm not familiar with a lot of the terminology here but hopefully I can be of some assistance.

33: self.assertRaises(Exception,
34: + self._test_spawn, 1, 2, 3)

Can this look for a particular exception instead of `Exception`?

139 + task = session.call_xenapi('Async.VDI.destroy', vdi_ref)
140 + session.wait_for_task(task)

Do both lines 139 and 140 raise the cls.XenAPI.Failure exception? If not, only the calls that can raise the exception in question should be encapsulated in the try/except.

I have the same question for lines 181 through 209. Which exact calls are you looking to catch an exception from? I don't like "except BaseException" which will catch all sorts of things (SystemExit, etc.).

Same for line 216, is it possible to be more explicit than "except:"?

Same for 262 through 320. Error handling is best handled at a more granular level.
Same for 407 through 452. That is a lot of logic to handle for the error case.

Overall I'd like to see more concise error handling using more specific errors if/where possible.

review: Needs Fixing
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Hi Brian,

thanks for your prompt review.
Please see my comments inline.

> I'm not familiar with a lot of the terminology here but hopefully I can be of
> some assistance.
>
> 33: self.assertRaises(Exception,
> 34: + self._test_spawn, 1, 2, 3)
>
> Can this look for a particular exception instead of `Exception`?

The reason I'm looking for a generic exception is that the spawn process might fail in several ways. However, since I'm forcing it to fail in a particular way by stubbing out a method in the test case, I can easily use a specific exception type in assertRaises.
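For example, something like this (which is essentially what the final revision of the branch ends up doing, see the preview diff):

    stubs.stubout_fetch_image_glance_disk(self.stubs)
    self.assertRaises(xenapi_fake.Failure,
                      self._test_spawn, 1, 2, 3)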

>
> 139 + task = session.call_xenapi('Async.VDI.destroy', vdi_ref)
> 140 + session.wait_for_task(task)
>
> Do both lines 139 and 140 raise the cls.XenAPI.Failure exception? If not, only
> the calls that can raise the exception in question should be encapsulated in
> the try/except.

Yes, they can both raise a XenAPI Failure. Line 139 might fail, for instance, if we lose connection to the server, whereas line 140 raises a Failure if the async operation started at line 139 fails for some reason.

>
> I have the same question for lines 181 through 209. Which exact calls are you
> looking to catch an exception from? I don't like "except BaseException" which
> will catch all sorts of things (SystemExit, etc.).
>

I used BaseException in order to be able to handle both system and user exceptions. However, as far as lines 181 to 209 are concerned, I'm not interested in anything other than XenAPI.Failure. I will change the code to catch only that kind of failure.

> Same for line 216, is it possible to be more explicit than "except:"?
>

Definitely.

> Same for 262 through 320. Error handling is best handled at a more granular
> level.

In this code fragment we can have several kinds of failures (Glance failures, OS failures during partition preparation, XenAPI failures). I used a single try/except block with BaseException to have a single point for handling all those exceptions. I understand, however, that this code will not deal properly with unexpected exceptions that I'm not prepared to handle in the except block.

I can either replace BaseException with the exact types of exception I'm ready to handle, or have smaller try/except blocks only for the bits that I'm interested in.
Which approach would you suggest is better (or should I perhaps have smaller try/except blocks catching only specific exceptions)?
> Same for 407 through 452. That is a lot of logic to handle for the error case.
>
The except block in this code fragment deals with exceptions raised in the spawn method and exceptions thrown by other methods. I agree that is quite a lot of code. I'll try to find a way to reduce the error handling logic.
>
> Overall I'd like to see more concise error handling using more specific errors
> if/where possible.
Agreed.

Revision history for this message
Brian Lamar (blamar) wrote :

> I can either replace BaseException with the exact types of exception I'm ready to handle, or
> have smaller try/except blocks only for the bits that I'm interested in.
> Which approach would you suggest is better (or should I perhaps have smaller try/except
> blocks catching only specific exceptions)?

I'd love to see several smaller try/except blocks encapsulating the relevant calls. If/when this isn't possible, you can catch multiple types of exceptions. If code can be shared between the except blocks, then break it out into functions, etc.
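Something along these lines, with purely illustrative names (fetch_kernel, cleanup_partial_vdis and created_vdis are placeholders, not code from the branch):

    # Narrow try/except around the call that can actually fail,
    # catching only the exceptions it is known to raise.
    try:
        kernel = fetch_kernel(session, instance)
    except XenAPI.Failure:
        cleanup_partial_vdis(session, created_vdis)
        raise

    try:
        _stream_disk(dev, image_type, virtual_size, image_file)
    except (IOError, OSError):
        # Shared clean-up lives in a helper instead of being
        # duplicated in each except block.
        cleanup_partial_vdis(session, created_vdis)
        raise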

Thanks for the code!

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Some comments to help with the reviewers' task:
During the spawn process, we create several resources before the VM record is created (VDIs and files in dom0's filesystem). The aim of this bug fix is to make sure all these resources are removed if the spawn process fails *before* the VM record is created; if the spawn process fails after that, all the resources will be removed upon instance termination, as the VM record has a reference to all of them.

Resources to be removed might include:
- disk VDI
- kernel/ramdisk files on dom0's filesystem (in /boot/guest)
- temporary VDIs for kernel and ramdisk

The resources to be removed will be stored in a dict. Keys of the dict are image types (KERNEL, RAMDISK, VHD, DISK), whereas values are tuples in which the 1st element is the VDI uuid and the 2nd element, if present, is the filename on dom0.

We look for the following exceptions:
- XenAPI.Failure, which can occur in several places
- OSError, which can occur when operations like parted or pygrub fail
- IOError, which can occur when _stream_disk fails (e.g.: backend failures, glance failures)

When an exception occurs, the exception handling blocks in the affected functions add information about the resources created so far to the dict, and then re-raise the error. The resources are then finally removed in _handle_spawn_error.

As an example:
{
  ImageType.KERNEL: (None, /boot/guest/kernel-file),
  ImageType.RAMDISK: (ramdisk-vdi-uuid,),
  ImageType.DISK: (disk-vdi-uuid,)
}

This is the dict that would have been built if an exception occurred while copying the temporary ramdisk VDI to dom0. The 'main' exception handling function, VMOps._handle_spawn_error, will therefore remove the ramdisk and disk VDIs using XenAPI's VDI.destroy, and invoke a XenAPI plugin call to remove /boot/guest/kernel-file.
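A rough sketch of that flow, with illustrative names (copy_vdi_to_dom0, created_resources and ramdisk_vdi_uuid are placeholders, not actual names from the branch):

    try:
        # Copy the temporary ramdisk VDI to dom0's /boot/guest.
        copy_vdi_to_dom0(session, ramdisk_vdi_ref)
    except XenAPI.Failure:
        # Record what has been created so far, then re-raise; the caller's
        # _handle_spawn_error walks this dict and destroys the resources.
        created_resources[ImageType.RAMDISK] = (ramdisk_vdi_uuid,)
        raise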

Revision history for this message
Matt Dietz (cerberus) wrote :

Thanks for this Salvatore! We've needed this for a while now.

Could only find a few things to pick on ;-)

PEP8 / Style nit-picks:

96 import os
97 +import sys
98 import pickle

pep8: sys should come after re in this list

351 import subprocess
352 import tempfile
353 import uuid
354 +import sys

Same thing

review: Needs Fixing
Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Thanks for the clarification of what's going on here. That makes my life much easier. Perhaps consider putting that in a comment somewhere? :)

Great branch by the way! Much needed.

Minor style point: I think there're supposed to be a space after the hash in:
 # Comment ...

Easy changes:

433 + if len(vm_create_error.args) > 0:
434 + last_arg = vm_create_error.args[len(vm_create_error.args) - 1]
if vm_create_error.args:
   last_arg = vm_create_error.args[-1]

same thing in 455-456, 475, 479

465 + if vdi_to_remove != None:
if vdi_to_remove:

Set WIP so I can approve afterwards ...

review: Needs Fixing
Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

easy to tell I'm fried today ... s/there're/there is/

(there're ... wtf?)

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Style comments have been addressed...
all but one:

if len(items_to_remove) > 1:
476 + # there is also a file to remove
477 + files_to_remove[vdi_type] = items_to_remove[1]

as we are actually checking that there are at least two elements in the list.

Thanks for spending time reviewing this branch, I hope you can find some time to give another round of review!

Revision history for this message
termie (termie) wrote :

- line 251: please put one arg per line, or put them all together on the next line, double indented
- please format all docstrings according to guide in HACKING
- line 258: perhaps assign the lambda on the line before and pass it in to this function instead of defining it inline here as it gets a bit messy
- line 526: put the dictionary contents on the same line as the opening paren; you can remove the comment, and foo[:255] is the normal way of saying foo[0:255]
- line 551: you actually need that extra line

review: Needs Fixing
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Guys, I have addressed the last review's comments, ball in your court now :)

Cheers!

Revision history for this message
Brian Lamar (blamar) :
review: Approve
Revision history for this message
termie (termie) wrote :

almost there, it just looks like the couple of docstrings in nova/tests/test_xenapi.py need to be formatted correctly.

review: Needs Fixing
Revision history for this message
Soren Hansen (soren) wrote :

> === modified file 'nova/tests/test_xenapi.py'
> --- nova/tests/test_xenapi.py 2011-03-28 21:00:17 +0000
> +++ nova/tests/test_xenapi.py 2011-03-31 09:25:51 +0000
> @@ -327,6 +327,19 @@
> self.assertEquals(self.vm['HVM_boot_params'], {})
> self.assertEquals(self.vm['HVM_boot_policy'], '')
>
> + def _check_no_unbound_vdi(self):
> + url = FLAGS.xenapi_connection_url
> + username = FLAGS.xenapi_connection_username
> + password = FLAGS.xenapi_connection_password
> + session = xenapi_conn.XenAPISession(url, username, password)
> + vdi_refs = session.call_xenapi('VDI.get_all')
> + for vdi_ref in vdi_refs:
> + vdi_rec = session.call_xenapi('VDI.get_record', vdi_ref)
> + if 'VBDs' in vdi_rec:
> + self.assertEquals(vdi_rec['VBDs'], {})
> + else:
> + self.fail('Found unexpected unbound VDI:%s' % vdi_rec['uuid'])

I'm having trouble following the data model.
If vdi_rec has a 'VBDs' key, it has to be empty. If there's no 'VBDs' key, that
means unexpected unbound VDIs are found? Is that accurate?

review: Needs Information
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

@termie:

docstrings are now formatted according to PEP-257

@Soren:

I agree the code is cryptic, to say the least. Basically, the test fails if a VDI without a VBD is found (hence 'unbound'). The VBD is the 'Virtual Block Device' that binds a VDI to a VM.
The other check:

if 'VBDs' in vdi_rec:
             self.assertEquals(vdi_rec['VBDs'], {})

was in place because we are dealing with the fake xenapi driver, but was not strictly required.

However, I have improved the unit tests to try to make them clearer: we now extract the whole list of VDIs before and after the spawn process, and verify that no new VDIs have been added.
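Concretely, the test now does roughly this (condensed from the preview diff):

    def test_spawn_fail_cleanup_1(self):
        """Simulates an error while downloading an image.

        Verifies that VDIs created are properly cleaned up.

        """
        vdi_recs_start = self._list_vdis()
        FLAGS.xenapi_image_service = 'glance'
        stubs.stubout_fetch_image_glance_disk(self.stubs)
        self.assertRaises(xenapi_fake.Failure,
                          self._test_spawn, 1, 2, 3)
        # No additional VDI should be found.
        vdi_recs_end = self._list_vdis()
        self._check_vdis(vdi_recs_start, vdi_recs_end)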

Revision history for this message
Rick Harris (rconradharris) wrote :

Really nice work here, Salvatore. Marking as approved with a few optional cleanup suggestions:

> 401 + if not image_type in (ImageType.KERNEL, ImageType.RAMDISK):

`X not in Y` seems a bit more readable:

    if image_type not in (ImageType.KERNEL, ImageType.RAMDISK):

> 205 + def destroy_vdi(cls, session, vdi_ref):

Now that we have a destroy_vdi helper, can we modify vmops:_destroy_vdis and
vmops:_destroy_rescue_vdis to use it? That should DRY up the code somewhat.

May have to add a `raise_on_error` kwarg since _destroy_rescue_vdis (for some
reason) just `continue`s.
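A sketch of what that might look like (the raise_on_error kwarg is just the suggestion above, not code in the branch):

    @classmethod
    def destroy_vdi(cls, session, vdi_ref, raise_on_error=True):
        try:
            task = session.call_xenapi('Async.VDI.destroy', vdi_ref)
            session.wait_for_task(task)
        except cls.XenAPI.Failure, exc:
            LOG.exception(exc)
            if raise_on_error:
                raise StorageError(_('Unable to destroy VDI %s') % vdi_ref)
            # Otherwise just log and continue, as _destroy_rescue_vdis does.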

> + except (self.XenAPI.Failure, OSError, IOError) as vm_create_error:

For me, the code which propagates the `rollback-context` to the handler is a
little hard to follow. I think it may be due to trying to re-use the original
exception and attaching the rollback-context as a positional arg.

As a point of discussion, another route might be to create a wrapper exception called
`ExceptionWithRollback`, where any information needed to roll back after the
original exception is triggered is stored within a `rollback_context` dict.

Then, after handling the `ExceptionWithRollback` exception, we can just
re-raise the inner exception, something like:

  raise rollback_exc.orig_exc # re-raise the wrapped exception

The advantage of this approach is that it clearly delineates the original
exception from its rollback-context, we don't have to fool around with
last-positional-arg code, and we can potentially re-use this concept/pattern in
other portions of the Xenapi virt layer.
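A sketch of what that wrapper could look like (again, this is a suggestion, not code in the branch):

    class ExceptionWithRollback(Exception):
        """Wraps an exception together with the context needed to roll back."""
        def __init__(self, orig_exc, rollback_context=None):
            super(ExceptionWithRollback, self).__init__(str(orig_exc))
            self.orig_exc = orig_exc
            self.rollback_context = rollback_context or {}

    # Raising side:
    #     raise ExceptionWithRollback(exc, {'vdis': created_vdis})
    # Handling side:
    #     except ExceptionWithRollback as rollback_exc:
    #         cleanup(rollback_exc.rollback_context)
    #         raise rollback_exc.orig_exc  # re-raise the wrapped exception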

review: Approve
Revision history for this message
Matt Dietz (cerberus) wrote :

lgtm. We'll see what Termie has to say

review: Approve
Revision history for this message
termie (termie) wrote :

These two docstrings should have a one-line summary, followed by a newline and then the additional information.

37 + """Simulates an error while downloading an image.
38 + Verifies that VDIs created are properly cleaned up.
39 +
40 + """

For example, the first one would become:

"""Simulates an error while downloading an image.

Verifies that VDIs created are properly cleaned up.

"""

50 + def test_spawn_fail_cleanup_2(self):
51 + """Simulates an error while creating VM record. It
52 + verifies that VDIs created are properly cleaned up.
53 +
54 + """

review: Needs Fixing
Revision history for this message
Dan Prince (dan-prince) wrote :

Hi Salvatore,

I get conflicts when trying to merge your latest code w/ lp:nova. Can you have a look?

Dan

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Sorry for the delay in fixing this code, I was on vacation.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Hi,

I am reproposing this branch for merging.
The code has been adapted to the latest changes in the routines for creating VDIs and fetching images from glance and objectstore.

However, I think the code can be improved, as we are now using two distinct sets of values for describing image types: the ImageType enumeration and the vdi_type key returned by the methods which fetch images.

I was thinking of using just one enumeration, thus simplifying future code maintenance.
I submitted a question on this topic: https://answers.launchpad.net/nova/+question/161496

Thanks,
Salvatore

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

+252/259 commented out?
+295/303 ... remove?
+436 necessary?

Needs a trunk merge ... conflicts.

Nit picky: Comments should be proper sentences. Capitalized and punctuated. Space after #

review: Needs Fixing
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Hi Sandy,

I've addressed your comments and resolved the conflicts.

Thanks,
Salvatore

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Thanks ... works for me!

review: Approve
Revision history for this message
Vish Ishaya (vishvananda) wrote :

looks good to go as soon as termie's docstring concerns are addressed.

Revision history for this message
Brian Waldon (bcwaldon) wrote :

Setting to WIP until test_xenapi.py docstrings are formatted correctly.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

docstrings fixed

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Not setting it back to 'needs review' yet, though, as I get the following exception while running tests:

======================================================================
ERROR: <nose.suite.ContextSuite context=nova.tests>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/nose/suite.py", line 183, in run
    self.setUp()
  File "/usr/lib/pymodules/python2.6/nose/suite.py", line 264, in setUp
    self.setupContext(ancestor)
  File "/usr/lib/pymodules/python2.6/nose/suite.py", line 287, in setupContext
    try_run(context, names)
  File "/usr/lib/pymodules/python2.6/nose/util.py", line 487, in try_run
    return func()
  File "/home/salvatore/workspace/Openstack-bug723301/nova/tests/__init__.py", line 73, in setup
    cleandb = os.path.join(FLAGS.state_path, FLAGS.sqlite_clean_db)
  File "/home/salvatore/workspace/Openstack-bug723301/nova/flags.py", line 144, in __getattr__
    val = gflags.FlagValues.__getattr__(self, name)
  File "/usr/lib/pymodules/python2.6/gflags.py", line 810, in __getattr__
    raise AttributeError(name)
AttributeError: sqlite_clean_db

It is the first time I've seen it and I don't think it depends on my code, but I'd prefer to fix it before proposing for merge again.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

The previous issue with the tests has now been solved.
Setting back to 'needs review'.

Revision history for this message
Brian Lamar (blamar) :
review: Approve
Revision history for this message
Brian Waldon (bcwaldon) wrote :

I think termie's concerns have been addressed. Waiting to hear back from Soren.

Revision history for this message
Soren Hansen (soren) wrote :

My question has been answered, and this MP has more than enough +1's. Just ignore me :)

review: Abstain
Revision history for this message
Brian Waldon (bcwaldon) wrote :

Looks good to me

review: Approve

Preview Diff

1=== modified file 'nova/tests/test_xenapi.py'
2--- nova/tests/test_xenapi.py 2011-06-27 21:48:03 +0000
3+++ nova/tests/test_xenapi.py 2011-07-04 16:24:47 +0000
4@@ -381,6 +381,18 @@
5 self.assertEquals(self.vm['HVM_boot_params'], {})
6 self.assertEquals(self.vm['HVM_boot_policy'], '')
7
8+ def _list_vdis(self):
9+ url = FLAGS.xenapi_connection_url
10+ username = FLAGS.xenapi_connection_username
11+ password = FLAGS.xenapi_connection_password
12+ session = xenapi_conn.XenAPISession(url, username, password)
13+ return session.call_xenapi('VDI.get_all')
14+
15+ def _check_vdis(self, start_list, end_list):
16+ for vdi_ref in end_list:
17+ if not vdi_ref in start_list:
18+ self.fail('Found unexpected VDI:%s' % vdi_ref)
19+
20 def _test_spawn(self, image_ref, kernel_id, ramdisk_id,
21 instance_type_id="3", os_type="linux",
22 architecture="x86-64", instance_id=1,
23@@ -422,6 +434,36 @@
24 self._test_spawn,
25 1, 2, 3, "4") # m1.xlarge
26
27+ def test_spawn_fail_cleanup_1(self):
28+ """Simulates an error while downloading an image.
29+
30+ Verifies that VDIs created are properly cleaned up.
31+
32+ """
33+ vdi_recs_start = self._list_vdis()
34+ FLAGS.xenapi_image_service = 'glance'
35+ stubs.stubout_fetch_image_glance_disk(self.stubs)
36+ self.assertRaises(xenapi_fake.Failure,
37+ self._test_spawn, 1, 2, 3)
38+ # No additional VDI should be found.
39+ vdi_recs_end = self._list_vdis()
40+ self._check_vdis(vdi_recs_start, vdi_recs_end)
41+
42+ def test_spawn_fail_cleanup_2(self):
43+ """Simulates an error while creating VM record.
44+
45+ It verifies that VDIs created are properly cleaned up.
46+
47+ """
48+ vdi_recs_start = self._list_vdis()
49+ FLAGS.xenapi_image_service = 'glance'
50+ stubs.stubout_create_vm(self.stubs)
51+ self.assertRaises(xenapi_fake.Failure,
52+ self._test_spawn, 1, 2, 3)
53+ # No additional VDI should be found.
54+ vdi_recs_end = self._list_vdis()
55+ self._check_vdis(vdi_recs_start, vdi_recs_end)
56+
57 def test_spawn_raw_objectstore(self):
58 FLAGS.xenapi_image_service = 'objectstore'
59 self._test_spawn(1, None, None)
60
61=== modified file 'nova/tests/xenapi/stubs.py'
62--- nova/tests/xenapi/stubs.py 2011-06-06 15:00:51 +0000
63+++ nova/tests/xenapi/stubs.py 2011-07-04 16:24:47 +0000
64@@ -98,6 +98,42 @@
65 stubs.Set(vm_utils, '_is_vdi_pv', f)
66
67
68+def stubout_determine_is_pv_objectstore(stubs):
69+ """Assumes VMs never have PV kernels"""
70+
71+ @classmethod
72+ def f(cls, *args):
73+ return False
74+ stubs.Set(vm_utils.VMHelper, '_determine_is_pv_objectstore', f)
75+
76+
77+def stubout_lookup_image(stubs):
78+ """Simulates a failure in lookup image."""
79+ def f(_1, _2, _3, _4):
80+ raise Exception("Test Exception raised by fake lookup_image")
81+ stubs.Set(vm_utils, 'lookup_image', f)
82+
83+
84+def stubout_fetch_image_glance_disk(stubs):
85+ """Simulates a failure in fetch image_glance_disk."""
86+
87+ @classmethod
88+ def f(cls, *args):
89+ raise fake.Failure("Test Exception raised by " +
90+ "fake fetch_image_glance_disk")
91+ stubs.Set(vm_utils.VMHelper, '_fetch_image_glance_disk', f)
92+
93+
94+def stubout_create_vm(stubs):
95+ """Simulates a failure in create_vm."""
96+
97+ @classmethod
98+ def f(cls, *args):
99+ raise fake.Failure("Test Exception raised by " +
100+ "fake create_vm")
101+ stubs.Set(vm_utils.VMHelper, 'create_vm', f)
102+
103+
104 def stubout_loopingcall_start(stubs):
105 def fake_start(self, interval, now=True):
106 self.f(*self.args, **self.kw)
107@@ -120,6 +156,9 @@
108 super(FakeSessionForVMTests, self).__init__(uri)
109
110 def host_call_plugin(self, _1, _2, plugin, method, _5):
111+ # If the call is for 'copy_kernel_vdi' return None.
112+ if method == 'copy_kernel_vdi':
113+ return
114 sr_ref = fake.get_all('SR')[0]
115 vdi_ref = fake.create_vdi('', False, sr_ref, False)
116 vdi_rec = fake.get_record('VDI', vdi_ref)
117
118=== modified file 'nova/virt/xenapi/vm_utils.py'
119--- nova/virt/xenapi/vm_utils.py 2011-06-24 12:01:51 +0000
120+++ nova/virt/xenapi/vm_utils.py 2011-07-04 16:24:47 +0000
121@@ -23,6 +23,7 @@
122 import os
123 import pickle
124 import re
125+import sys
126 import tempfile
127 import time
128 import urllib
129@@ -71,17 +72,51 @@
130 class ImageType:
131 """
132 Enumeration class for distinguishing different image types
133- 0 - kernel/ramdisk image (goes on dom0's filesystem)
134- 1 - disk image (local SR, partitioned by objectstore plugin)
135- 2 - raw disk image (local SR, NOT partitioned by plugin)
136- 3 - vhd disk image (local SR, NOT inspected by XS, PV assumed for
137+ 0 - kernel image (goes on dom0's filesystem)
138+ 1 - ramdisk image (goes on dom0's filesystem)
139+ 2 - disk image (local SR, partitioned by objectstore plugin)
140+ 3 - raw disk image (local SR, NOT partitioned by plugin)
141+ 4 - vhd disk image (local SR, NOT inspected by XS, PV assumed for
142 linux, HVM assumed for Windows)
143 """
144
145- KERNEL_RAMDISK = 0
146- DISK = 1
147- DISK_RAW = 2
148- DISK_VHD = 3
149+ KERNEL = 0
150+ RAMDISK = 1
151+ DISK = 2
152+ DISK_RAW = 3
153+ DISK_VHD = 4
154+
155+ KERNEL_STR = "kernel"
156+ RAMDISK_STR = "ramdisk"
157+ DISK_STR = "os"
158+ DISK_RAW_STR = "os_raw"
159+ DISK_VHD_STR = "vhd"
160+
161+ @classmethod
162+ def to_string(cls, image_type):
163+ if image_type == ImageType.KERNEL:
164+ return ImageType.KERNEL_STR
165+ elif image_type == ImageType.RAMDISK:
166+ return ImageType.RAMDISK_STR
167+ elif image_type == ImageType.DISK:
168+ return ImageType.DISK_STR
169+ elif image_type == ImageType.DISK_RAW:
170+ return ImageType.DISK_RAW_STR
171+ elif image_type == ImageType.DISK_VHD:
172+ return ImageType.VHD_STR
173+
174+ @classmethod
175+ def from_string(cls, image_type_str):
176+ if image_type_str == ImageType.KERNEL_STR:
177+ return ImageType.KERNEL
178+ elif image_type == ImageType.RAMDISK_STR:
179+ return ImageType.RAMDISK
180+ elif image_type == ImageType.DISK_STR:
181+ return ImageType.DISK
182+ elif image_type == ImageType.DISK_RAW_STR:
183+ return ImageType.DISK_RAW
184+ elif image_type == ImageType.DISK_VHD_STR:
185+ return ImageType.VHD
186
187
188 class VMHelper(HelperBase):
189@@ -145,7 +180,6 @@
190 'VCPUs_max': vcpus,
191 'VCPUs_params': {},
192 'xenstore_data': {}}
193-
194 # Complete VM configuration record according to the image type
195 # non-raw/raw with PV kernel/raw in HVM mode
196 if use_pv_kernel:
197@@ -240,6 +274,15 @@
198 raise StorageError(_('Unable to destroy VBD %s') % vbd_ref)
199
200 @classmethod
201+ def destroy_vdi(cls, session, vdi_ref):
202+ try:
203+ task = session.call_xenapi('Async.VDI.destroy', vdi_ref)
204+ session.wait_for_task(task)
205+ except cls.XenAPI.Failure, exc:
206+ LOG.exception(exc)
207+ raise StorageError(_('Unable to destroy VDI %s') % vdi_ref)
208+
209+ @classmethod
210 def create_vif(cls, session, vm_ref, network_ref, mac_address,
211 dev, rxtx_cap=0):
212 """Create a VIF record. Returns a Deferred that gives the new
213@@ -394,12 +437,12 @@
214 """
215 LOG.debug(_("Asking xapi to fetch vhd image %(image)s")
216 % locals())
217-
218 sr_ref = safe_find_sr(session)
219
220- # NOTE(sirp): The Glance plugin runs under Python 2.4 which does not
221- # have the `uuid` module. To work around this, we generate the uuids
222- # here (under Python 2.6+) and pass them as arguments
223+ # NOTE(sirp): The Glance plugin runs under Python 2.4
224+ # which does not have the `uuid` module. To work around this,
225+ # we generate the uuids here (under Python 2.6+) and
226+ # pass them as arguments
227 uuid_stack = [str(uuid.uuid4()) for i in xrange(2)]
228
229 glance_host, glance_port = \
230@@ -449,18 +492,20 @@
231 # FIXME(sirp): Since the Glance plugin seems to be required for the
232 # VHD disk, it may be worth using the plugin for both VHD and RAW and
233 # DISK restores
234+ LOG.debug(_("Fetching image %(image)s") % locals())
235+ LOG.debug(_("Image Type: %s"), ImageType.to_string(image_type))
236 sr_ref = safe_find_sr(session)
237
238 glance_client, image_id = nova.image.get_glance_client(image)
239 meta, image_file = glance_client.get_image(image_id)
240 virtual_size = int(meta['size'])
241 vdi_size = virtual_size
242- LOG.debug(_("Size for image %(image)s:%(virtual_size)d") % locals())
243-
244+ LOG.debug(_("Size for image %(image)s:" +
245+ "%(virtual_size)d") % locals())
246 if image_type == ImageType.DISK:
247 # Make room for MBR.
248 vdi_size += MBR_SIZE_BYTES
249- elif image_type == ImageType.KERNEL_RAMDISK and \
250+ elif image_type in (ImageType.KERNEL, ImageType.RAMDISK) and \
251 vdi_size > FLAGS.max_kernel_ramdisk_size:
252 max_size = FLAGS.max_kernel_ramdisk_size
253 raise exception.Error(
254@@ -469,29 +514,45 @@
255
256 name_label = get_name_label_for_image(image)
257 vdi_ref = cls.create_vdi(session, sr_ref, name_label, vdi_size, False)
258-
259- with_vdi_attached_here(session, vdi_ref, False,
260- lambda dev:
261- _stream_disk(dev, image_type,
262- virtual_size, image_file))
263- if image_type == ImageType.KERNEL_RAMDISK:
264- #we need to invoke a plugin for copying VDI's
265- #content into proper path
266- LOG.debug(_("Copying VDI %s to /boot/guest on dom0"), vdi_ref)
267- fn = "copy_kernel_vdi"
268- args = {}
269- args['vdi-ref'] = vdi_ref
270- #let the plugin copy the correct number of bytes
271- args['image-size'] = str(vdi_size)
272- task = session.async_call_plugin('glance', fn, args)
273- filename = session.wait_for_task(task, instance_id)
274- #remove the VDI as it is not needed anymore
275- session.get_xenapi().VDI.destroy(vdi_ref)
276- LOG.debug(_("Kernel/Ramdisk VDI %s destroyed"), vdi_ref)
277- return filename
278- else:
279+ # From this point we have a VDI on Xen host;
280+ # If anything goes wrong, we need to remember its uuid.
281+ try:
282+ filename = None
283 vdi_uuid = session.get_xenapi().VDI.get_uuid(vdi_ref)
284- return [dict(vdi_type='os', vdi_uuid=vdi_uuid)]
285+ with_vdi_attached_here(session, vdi_ref, False,
286+ lambda dev:
287+ _stream_disk(dev, image_type,
288+ virtual_size, image_file))
289+ if image_type in (ImageType.KERNEL, ImageType.RAMDISK):
290+ # We need to invoke a plugin for copying the
291+ # content of the VDI into the proper path.
292+ LOG.debug(_("Copying VDI %s to /boot/guest on dom0"), vdi_ref)
293+ fn = "copy_kernel_vdi"
294+ args = {}
295+ args['vdi-ref'] = vdi_ref
296+ # Let the plugin copy the correct number of bytes.
297+ args['image-size'] = str(vdi_size)
298+ task = session.async_call_plugin('glance', fn, args)
299+ filename = session.wait_for_task(task, instance_id)
300+ # Remove the VDI as it is not needed anymore.
301+ session.get_xenapi().VDI.destroy(vdi_ref)
302+ LOG.debug(_("Kernel/Ramdisk VDI %s destroyed"), vdi_ref)
303+ return [dict(vdi_type=ImageType.to_string(image_type),
304+ vdi_uuid=None,
305+ file=filename)]
306+ else:
307+ return [dict(vdi_type=ImageType.to_string(image_type),
308+ vdi_uuid=vdi_uuid,
309+ file=None)]
310+ except (cls.XenAPI.Failure, IOError, OSError) as e:
311+ # We look for XenAPI and OS failures.
312+ LOG.exception(_("instance %s: Failed to fetch glance image"),
313+ instance_id, exc_info=sys.exc_info())
314+ e.args = e.args + ([dict(vdi_type=ImageType.
315+ to_string(image_type),
316+ vdi_uuid=vdi_uuid,
317+ file=filename)],)
318+ raise e
319
320 @classmethod
321 def determine_disk_image_type(cls, instance):
322@@ -506,7 +567,8 @@
323 whether a kernel_id is specified.
324 """
325 def log_disk_format(image_type):
326- pretty_format = {ImageType.KERNEL_RAMDISK: 'KERNEL_RAMDISK',
327+ pretty_format = {ImageType.KERNEL: 'KERNEL',
328+ ImageType.RAMDISK: 'RAMDISK',
329 ImageType.DISK: 'DISK',
330 ImageType.DISK_RAW: 'DISK_RAW',
331 ImageType.DISK_VHD: 'DISK_VHD'}
332@@ -519,8 +581,8 @@
333 def determine_from_glance():
334 glance_disk_format2nova_type = {
335 'ami': ImageType.DISK,
336- 'aki': ImageType.KERNEL_RAMDISK,
337- 'ari': ImageType.KERNEL_RAMDISK,
338+ 'aki': ImageType.KERNEL,
339+ 'ari': ImageType.RAMDISK,
340 'raw': ImageType.DISK_RAW,
341 'vhd': ImageType.DISK_VHD}
342 image_ref = instance.image_ref
343@@ -553,7 +615,7 @@
344 image_type):
345 """Fetch image from glance based on image type.
346
347- Returns: A single filename if image_type is KERNEL_RAMDISK
348+ Returns: A single filename if image_type is KERNEL or RAMDISK
349 A list of dictionaries that describe VDIs, otherwise
350 """
351 if image_type == ImageType.DISK_VHD:
352@@ -568,13 +630,13 @@
353 secret, image_type):
354 """Fetch an image from objectstore.
355
356- Returns: A single filename if image_type is KERNEL_RAMDISK
357+ Returns: A single filename if image_type is KERNEL or RAMDISK
358 A list of dictionaries that describe VDIs, otherwise
359 """
360 url = "http://%s:%s/_images/%s/image" % (FLAGS.s3_host, FLAGS.s3_port,
361 image)
362 LOG.debug(_("Asking xapi to fetch %(url)s as %(access)s") % locals())
363- if image_type == ImageType.KERNEL_RAMDISK:
364+ if image_type in (ImageType.KERNEL, ImageType.RAMDISK):
365 fn = 'get_kernel'
366 else:
367 fn = 'get_vdi'
368@@ -584,15 +646,20 @@
369 args['password'] = secret
370 args['add_partition'] = 'false'
371 args['raw'] = 'false'
372- if image_type != ImageType.KERNEL_RAMDISK:
373+ if not image_type in (ImageType.KERNEL, ImageType.RAMDISK):
374 args['add_partition'] = 'true'
375 if image_type == ImageType.DISK_RAW:
376 args['raw'] = 'true'
377 task = session.async_call_plugin('objectstore', fn, args)
378- uuid_or_fn = session.wait_for_task(task, instance_id)
379- if image_type != ImageType.KERNEL_RAMDISK:
380- return [dict(vdi_type='os', vdi_uuid=uuid_or_fn)]
381- return uuid_or_fn
382+ vdi_uuid = None
383+ filename = None
384+ if image_type in (ImageType.KERNEL, ImageType.RAMDISK):
385+ filename = session.wait_for_task(task, instance_id)
386+ else:
387+ vdi_uuid = session.wait_for_task(task, instance_id)
388+ return [dict(vdi_type=ImageType.to_string(image_type),
389+ vdi_uuid=vdi_uuid,
390+ file=filename)]
391
392 @classmethod
393 def determine_is_pv(cls, session, instance_id, vdi_ref, disk_image_type,
394
395=== modified file 'nova/virt/xenapi/vmops.py'
396--- nova/virt/xenapi/vmops.py 2011-06-30 19:20:59 +0000
397+++ nova/virt/xenapi/vmops.py 2011-07-04 16:24:47 +0000
398@@ -25,6 +25,7 @@
399 import os
400 import pickle
401 import subprocess
402+import sys
403 import time
404 import uuid
405
406@@ -137,9 +138,18 @@
407 return vdis
408
409 def spawn(self, instance, network_info):
410- vdis = self._create_disks(instance)
411- vm_ref = self._create_vm(instance, vdis, network_info)
412- self._spawn(instance, vm_ref)
413+ vdis = None
414+ try:
415+ vdis = self._create_disks(instance)
416+ vm_ref = self._create_vm(instance, vdis, network_info)
417+ self._spawn(instance, vm_ref)
418+ except (self.XenAPI.Failure, OSError, IOError) as spawn_error:
419+ LOG.exception(_("instance %s: Failed to spawn"),
420+ instance.id, exc_info=sys.exc_info())
421+ LOG.debug(_('Instance %s failed to spawn - performing clean-up'),
422+ instance.id)
423+ self._handle_spawn_error(vdis, spawn_error)
424+ raise spawn_error
425
426 def spawn_rescue(self, instance):
427 """Spawn a rescue instance."""
428@@ -165,42 +175,64 @@
429 project = AuthManager().get_project(instance.project_id)
430
431 disk_image_type = VMHelper.determine_disk_image_type(instance)
432-
433 kernel = None
434- if instance.kernel_id:
435- kernel = VMHelper.fetch_image(self._session, instance.id,
436- instance.kernel_id, user, project,
437- ImageType.KERNEL_RAMDISK)
438-
439 ramdisk = None
440- if instance.ramdisk_id:
441- ramdisk = VMHelper.fetch_image(self._session, instance.id,
442- instance.ramdisk_id, user, project,
443- ImageType.KERNEL_RAMDISK)
444-
445- # Create the VM ref and attach the first disk
446- first_vdi_ref = self._session.call_xenapi('VDI.get_by_uuid',
447- vdis[0]['vdi_uuid'])
448-
449- vm_mode = instance.vm_mode and instance.vm_mode.lower()
450- if vm_mode == 'pv':
451- use_pv_kernel = True
452- elif vm_mode in ('hv', 'hvm'):
453- use_pv_kernel = False
454- vm_mode = 'hvm' # Normalize
455- else:
456- use_pv_kernel = VMHelper.determine_is_pv(self._session,
457- instance.id, first_vdi_ref, disk_image_type,
458- instance.os_type)
459- vm_mode = use_pv_kernel and 'pv' or 'hvm'
460-
461- if instance.vm_mode != vm_mode:
462- # Update database with normalized (or determined) value
463- db.instance_update(context.get_admin_context(),
464- instance['id'], {'vm_mode': vm_mode})
465-
466- vm_ref = VMHelper.create_vm(self._session, instance,
467- kernel, ramdisk, use_pv_kernel)
468+ try:
469+ if instance.kernel_id:
470+ kernel = VMHelper.fetch_image(self._session, instance.id,
471+ instance.kernel_id, user, project,
472+ ImageType.KERNEL)[0]
473+ if instance.ramdisk_id:
474+ ramdisk = VMHelper.fetch_image(self._session, instance.id,
475+ instance.ramdisk_id, user, project,
476+ ImageType.RAMDISK)[0]
477+ # Create the VM ref and attach the first disk
478+ first_vdi_ref = self._session.call_xenapi('VDI.get_by_uuid',
479+ vdis[0]['vdi_uuid'])
480+
481+ vm_mode = instance.vm_mode and instance.vm_mode.lower()
482+ if vm_mode == 'pv':
483+ use_pv_kernel = True
484+ elif vm_mode in ('hv', 'hvm'):
485+ use_pv_kernel = False
486+ vm_mode = 'hvm' # Normalize
487+ else:
488+ use_pv_kernel = VMHelper.determine_is_pv(self._session,
489+ instance.id, first_vdi_ref, disk_image_type,
490+ instance.os_type)
491+ vm_mode = use_pv_kernel and 'pv' or 'hvm'
492+
493+ if instance.vm_mode != vm_mode:
494+ # Update database with normalized (or determined) value
495+ db.instance_update(context.get_admin_context(),
496+ instance['id'], {'vm_mode': vm_mode})
497+ vm_ref = VMHelper.create_vm(self._session, instance,
498+ kernel and kernel.get('file', None) or None,
499+ ramdisk and ramdisk.get('file', None) or None,
500+ use_pv_kernel)
501+ except (self.XenAPI.Failure, OSError, IOError) as vm_create_error:
502+ # Collect VDI/file resources to clean up;
503+ # These resources will be removed by _handle_spawn_error.
504+ LOG.exception(_("instance %s: Failed to spawn - " +
505+ "Unable to create VM"),
506+ instance.id, exc_info=sys.exc_info())
507+ last_arg = None
508+ resources = []
509+
510+ if vm_create_error.args:
511+ last_arg = vm_create_error.args[-1]
512+ if isinstance(last_arg, list):
513+ resources = last_arg
514+ else:
515+ vm_create_error.args = vm_create_error.args + (resources,)
516+
517+ if kernel:
518+ resources.append(kernel)
519+ if ramdisk:
520+ resources.append(ramdisk)
521+
522+ raise vm_create_error
523+
524 VMHelper.create_vbd(session=self._session, vm_ref=vm_ref,
525 vdi_ref=first_vdi_ref, userdevice=0, bootable=True)
526
527@@ -321,6 +353,47 @@
528
529 return timer.start(interval=0.5, now=True)
530
531+ def _handle_spawn_error(self, vdis, spawn_error):
532+ # Extract resource list from spawn_error.
533+ resources = []
534+ if spawn_error.args:
535+ last_arg = spawn_error.args[-1]
536+ resources = last_arg
537+ if vdis:
538+ for vdi in vdis:
539+ resources.append(dict(vdi_type=vdi['vdi_type'],
540+ vdi_uuid=vdi['vdi_uuid'],
541+ file=None))
542+
543+ LOG.debug(_("Resources to remove:%s"), resources)
544+ kernel_file = None
545+ ramdisk_file = None
546+
547+ for item in resources:
548+ vdi_type = item['vdi_type']
549+ vdi_to_remove = item['vdi_uuid']
550+ if vdi_to_remove:
551+ try:
552+ vdi_ref = self._session.call_xenapi('VDI.get_by_uuid',
553+ vdi_to_remove)
554+ LOG.debug(_('Removing VDI %(vdi_ref)s' +
555+ '(uuid:%(vdi_to_remove)s)'), locals())
556+ VMHelper.destroy_vdi(self._session, vdi_ref)
557+ except self.XenAPI.Failure:
558+ # Vdi has already been deleted
559+ LOG.debug(_("Skipping VDI destroy for %s"), vdi_to_remove)
560+ if item['file']:
561+ # There is also a file to remove.
562+ if vdi_type == ImageType.KERNEL_STR:
563+ kernel_file = item['file']
564+ elif vdi_type == ImageType.RAMDISK_STR:
565+ ramdisk_file = item['file']
566+
567+ if kernel_file or ramdisk_file:
568+ LOG.debug(_("Removing kernel/ramdisk files from dom0"))
569+ self._destroy_kernel_ramdisk_plugin_call(kernel_file,
570+ ramdisk_file)
571+
572 def _get_vm_opaque_ref(self, instance_or_vm):
573 """
574 Refactored out the common code of many methods that receive either
575@@ -698,6 +771,16 @@
576 VMHelper.unplug_vbd(self._session, vbd_ref)
577 VMHelper.destroy_vbd(self._session, vbd_ref)
578
579+ def _destroy_kernel_ramdisk_plugin_call(self, kernel, ramdisk):
580+ args = {}
581+ if kernel:
582+ args['kernel-file'] = kernel
583+ if ramdisk:
584+ args['ramdisk-file'] = ramdisk
585+ task = self._session.async_call_plugin(
586+ 'glance', 'remove_kernel_ramdisk', args)
587+ self._session.wait_for_task(task)
588+
589 def _destroy_kernel_ramdisk(self, instance, vm_ref):
590 """Three situations can occur:
591
592@@ -727,13 +810,7 @@
593 (kernel, ramdisk) = VMHelper.lookup_kernel_ramdisk(self._session,
594 vm_ref)
595
596- LOG.debug(_("Removing kernel/ramdisk files"))
597-
598- args = {'kernel-file': kernel, 'ramdisk-file': ramdisk}
599- task = self._session.async_call_plugin(
600- 'glance', 'remove_kernel_ramdisk', args)
601- self._session.wait_for_task(task, instance.id)
602-
603+ self._destroy_kernel_ramdisk_plugin_call(kernel, ramdisk)
604 LOG.debug(_("kernel/ramdisk files removed"))
605
606 def _destroy_vm(self, instance, vm_ref):
607
608=== modified file 'plugins/xenserver/xenapi/etc/xapi.d/plugins/glance'
609--- plugins/xenserver/xenapi/etc/xapi.d/plugins/glance 2011-06-03 18:14:28 +0000
610+++ plugins/xenserver/xenapi/etc/xapi.d/plugins/glance 2011-07-04 16:24:47 +0000
611@@ -412,8 +412,8 @@
612
613 def remove_kernel_ramdisk(session, args):
614 """Removes kernel and/or ramdisk from dom0's file system"""
615- kernel_file = exists(args, 'kernel-file')
616- ramdisk_file = exists(args, 'ramdisk-file')
617+ kernel_file = optional(args, 'kernel-file')
618+ ramdisk_file = optional(args, 'ramdisk-file')
619 if kernel_file:
620 os.remove(kernel_file)
621 if ramdisk_file: