Merge lp:~jk0/nova/diagnostics-per-instance into lp:~hudson-openstack/nova/trunk

Proposed by Josh Kearney
Status: Merged
Approved by: Rick Clark
Approved revision: 478
Merged at revision: 475
Proposed branch: lp:~jk0/nova/diagnostics-per-instance
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 403 lines (+101/-66)
6 files modified
nova/db/api.py (+5/-0)
nova/db/sqlalchemy/api.py (+12/-0)
nova/db/sqlalchemy/models.py (+0/-1)
nova/tests/virt_unittest.py (+24/-22)
nova/virt/xenapi/vmops.py (+22/-18)
nova/virt/xenapi_conn.py (+38/-25)
To merge this branch: bzr merge lp:~jk0/nova/diagnostics-per-instance
Reviewer Review Type Date Requested Status
Eric Day (community) Needs Information
Rick Clark (community) Approve
Ed Leafe (community) Approve
Review via email: mp+44394@code.launchpad.net

Commit message

Log all XenAPI actions to InstanceActions.

475. By Josh Kearney

Filter templates and dom0 from list_instances()

476. By Josh Kearney

Style correction

477. By Josh Kearney

Merged trunk

478. By Josh Kearney

Merged trunk

Revision history for this message
Ed Leafe (ed-leafe) wrote :

Looks good.

review: Approve
Revision history for this message
Rick Clark (dendrobates) wrote :

lgtm and tests pass

review: Approve
Revision history for this message
Eric Day (eday) wrote :

How is the diagnostics data intended to be pulled out and analyzed? Right now it looks like it's only being stored, but there is no API or tools to read it.

Part of my concern with this and the previous branch sticking it into the same core database is that this seems like more of a logging hook, where you could push to some plugin interface, and then write implementations to a queue, another DB with monitoring, etc. It would be great to see more design details around how this data will be accessed and the use cases behind them before we start sticking it all in the DB.

review: Needs Information
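Eric's suggestion of treating this as a logging hook with pluggable backends, rather than a direct write to the core database, could be sketched roughly as follows. This is a hypothetical illustration only; none of these names exist in nova, and the in-memory backend merely stands in for the DB write this branch adds.

```python
# Hypothetical sketch of the pluggable logging-hook idea: fan each
# instance-action record out to registered backends (a DB writer, a queue
# publisher, a monitoring sink, ...). Names are illustrative.


class ActionLogger(object):
    """Dispatch instance-action records to pluggable backends."""

    def __init__(self):
        self._backends = []

    def register(self, backend):
        """Register a callable that accepts an action-record dict."""
        self._backends.append(backend)

    def log(self, instance_id, action, error=None):
        record = dict(instance_id=instance_id, action=action, error=error)
        for backend in self._backends:
            backend(record)


# An in-memory list stands in for the DB-writing backend here.
store = []
logger = ActionLogger()
logger.register(store.append)
logger.log(42, "Async.VM.clean_reboot")
```

Swapping the DB write for another backend (queue, monitoring store) would then be a registration change rather than a change to the polling code.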
Revision history for this message
Josh Kearney (jk0) wrote :

> How is the diagnostics data intended to be pulled out and analyzed? Right now
> it looks like it's only being stored, but there is no API or tools to read it.

Currently there is no way to retrieve the data short of querying the database directly in SQL (that is my project today -- making this available via the API). My thought was to propose these merges in small steps instead of one big chunk, getting lots of good feedback along the way.

> Part of my concern with this and the previous branch sticking it into the same
> core database is that this seems like more of a logging hook, where you could
> push to some plugin interface, and then write implementations to a queue,
> another DB with monitoring, etc. It would be great to see more design details
> around how this data will be accessed and the use cases behind them before we
> start sticking it all in the DB.

I'm just going off of what the blueprint wants for now. I agree that this might not be the best way to store the data, but it's easy enough to change once the distributed data store design is in place. Otherwise I just don't see this getting done in time for Bexar.
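Until the API work lands, a direct-SQL read of the new table might look like the sketch below. This is a hypothetical helper, not code from this branch; it assumes a SQLite-backed nova DB exposing the `instance_id`, `action`, and `error` columns defined in models.py.

```python
# Hypothetical read helper for the instance_actions table -- illustrative
# only, assuming the columns from nova/db/sqlalchemy/models.py.
import sqlite3


def recent_actions(db_path, instance_id, limit=10):
    """Return (action, error) rows for an instance, newest first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT action, error FROM instance_actions"
            " WHERE instance_id = ? ORDER BY id DESC LIMIT ?",
            (instance_id, limit)).fetchall()
    finally:
        conn.close()
```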

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Would like to see tests around _poll_task calling done.send_exception(XenAPI.Failure(error_info)) to ensure the action gets logged. This is independent of the pending xenserver tests.

If len(action) > 255, will that break the operation? Shouldn't only the logging fail, while the action itself still completes?

Perhaps more of a general question, but should failures carry an error code?
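One way to address both questions -- a sketch, not code from this branch -- is to truncate the action name to the column width and swallow logging failures so the operation itself still completes:

```python
# Hypothetical defensive wrapper around the DB write. Truncates the
# action name to fit the String(255) column and reports, rather than
# raises, logging failures. Names are illustrative.


def safe_log_action(create_fn, instance_id, name, error=None):
    """Record an instance action; never let logging break the operation."""
    record = dict(instance_id=int(instance_id),
                  action=name[:255],  # fit the String(255) column
                  error=str(error) if error else None)
    try:
        create_fn(record)
        return True
    except Exception as exc:
        # Logging failed; the caller's operation should still complete.
        print("instance action logging failed: %s" % exc)
        return False
```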

Preview Diff

=== modified file 'nova/db/api.py'
--- nova/db/api.py	2010-11-03 22:06:00 +0000
+++ nova/db/api.py	2010-12-22 16:46:27 +0000
@@ -334,6 +334,11 @@
                                            security_group_id)
 
 
+def instance_action_create(context, values):
+    """Create an instance action from the values dictionary."""
+    return IMPL.instance_action_create(context, values)
+
+
 ###################
 
 
 
=== modified file 'nova/db/sqlalchemy/api.py'
--- nova/db/sqlalchemy/api.py	2010-12-21 15:25:39 +0000
+++ nova/db/sqlalchemy/api.py	2010-12-22 16:46:27 +0000
@@ -749,6 +749,18 @@
         instance_ref.save(session=session)
 
 
+@require_context
+def instance_action_create(context, values):
+    """Create an instance action from the values dictionary."""
+    action_ref = models.InstanceActions()
+    action_ref.update(values)
+
+    session = get_session()
+    with session.begin():
+        action_ref.save(session=session)
+    return action_ref
+
+
 ###################
 
 
 
=== modified file 'nova/db/sqlalchemy/models.py'
--- nova/db/sqlalchemy/models.py	2010-12-20 20:21:01 +0000
+++ nova/db/sqlalchemy/models.py	2010-12-22 16:46:27 +0000
@@ -248,7 +248,6 @@
     instance_id = Column(Integer, ForeignKey('instances.id'))
 
     action = Column(String(255))
-    result = Column(Boolean)
     error = Column(Text)
 
 
 
=== modified file 'nova/tests/virt_unittest.py'
--- nova/tests/virt_unittest.py	2010-12-17 12:07:43 +0000
+++ nova/tests/virt_unittest.py	2010-12-22 16:46:27 +0000
@@ -129,43 +129,45 @@
                 check_list.append(check)
             else:
                 if expect_kernel:
-                    check = (lambda t: t.find('./os/kernel').text.split('/'
-                        )[1], 'kernel')
+                    check = (lambda t: t.find('./os/kernel').text.split(
+                        '/')[1], 'kernel')
                 else:
                     check = (lambda t: t.find('./os/kernel'), None)
                 check_list.append(check)
 
                 if expect_ramdisk:
-                    check = (lambda t: t.find('./os/initrd').text.split('/'
-                        )[1], 'ramdisk')
+                    check = (lambda t: t.find('./os/initrd').text.split(
+                        '/')[1], 'ramdisk')
                 else:
                     check = (lambda t: t.find('./os/initrd'), None)
                 check_list.append(check)
 
         common_checks = [
             (lambda t: t.find('.').tag, 'domain'),
-            (lambda t: t.find('./devices/interface/filterref/parameter'
-                ).get('name'), 'IP'),
-            (lambda t: t.find('./devices/interface/filterref/parameter'
-                ).get('value'), '10.11.12.13'),
-            (lambda t: t.findall('./devices/interface/filterref/parameter'
-                )[1].get('name'), 'DHCPSERVER'),
-            (lambda t: t.findall('./devices/interface/filterref/parameter'
-                )[1].get('value'), '10.0.0.1'),
-            (lambda t: t.find('./devices/serial/source').get('path'
-                ).split('/')[1], 'console.log'),
+            (lambda t: t.find(
+                './devices/interface/filterref/parameter').get('name'), 'IP'),
+            (lambda t: t.find(
+                './devices/interface/filterref/parameter').get(
+                    'value'), '10.11.12.13'),
+            (lambda t: t.findall(
+                './devices/interface/filterref/parameter')[1].get(
+                    'name'), 'DHCPSERVER'),
+            (lambda t: t.findall(
+                './devices/interface/filterref/parameter')[1].get(
+                    'value'), '10.0.0.1'),
+            (lambda t: t.find('./devices/serial/source').get(
+                'path').split('/')[1], 'console.log'),
             (lambda t: t.find('./memory').text, '2097152')]
 
         if rescue:
-            common_checks += [(lambda t: t.findall('./devices/disk/source'
-                )[0].get('file').split('/')[1],
-                'rescue-disk'),
-                (lambda t: t.findall('./devices/disk/source'
-                )[1].get('file').split('/')[1],
-                'disk')]
+            common_checks += [
+                (lambda t: t.findall('./devices/disk/source')[0].get(
+                    'file').split('/')[1], 'rescue-disk'),
+                (lambda t: t.findall('./devices/disk/source')[1].get(
+                    'file').split('/')[1], 'disk')]
         else:
-            common_checks += [(lambda t: t.findall('./devices/disk/source'
-                )[0].get('file').split('/')[1],
-                'disk')]
+            common_checks += [(lambda t: t.findall(
+                './devices/disk/source')[0].get('file').split('/')[1],
+                'disk')]
 
         for (libvirt_type, (expected_uri, checks)) in type_uri_map.iteritems():
 
=== modified file 'nova/virt/xenapi/vmops.py'
--- nova/virt/xenapi/vmops.py	2010-12-15 18:19:39 +0000
+++ nova/virt/xenapi/vmops.py	2010-12-22 16:46:27 +0000
@@ -44,12 +44,16 @@
         VMHelper.late_import()
 
     def list_instances(self):
-        """ List VM instances """
-        return [self._session.get_xenapi().VM.get_name_label(vm) \
-            for vm in self._session.get_xenapi().VM.get_all()]
+        """List VM instances"""
+        vms = []
+        for vm in self._session.get_xenapi().VM.get_all():
+            rec = self._session.get_xenapi().VM.get_record(vm)
+            if not rec["is_a_template"] and not rec["is_control_domain"]:
+                vms.append(rec["name_label"])
+        return vms
 
     def spawn(self, instance):
-        """ Create VM instance """
+        """Create VM instance"""
         vm = VMHelper.lookup(self._session, instance.name)
         if vm is not None:
             raise Exception('Attempted to create non-unique name %s' %
@@ -81,16 +85,16 @@
                                  vm_ref)
 
     def reboot(self, instance):
-        """ Reboot VM instance """
+        """Reboot VM instance"""
         instance_name = instance.name
         vm = VMHelper.lookup(self._session, instance_name)
         if vm is None:
             raise Exception('instance not present %s' % instance_name)
         task = self._session.call_xenapi('Async.VM.clean_reboot', vm)
-        self._session.wait_for_task(task)
+        self._session.wait_for_task(instance.id, task)
 
     def destroy(self, instance):
-        """ Destroy VM instance """
+        """Destroy VM instance"""
         vm = VMHelper.lookup(self._session, instance.name)
         if vm is None:
             # Don't complain, just return.  This lets us clean up instances
@@ -101,7 +105,7 @@
         try:
             task = self._session.call_xenapi('Async.VM.hard_shutdown',
                                              vm)
-            self._session.wait_for_task(task)
+            self._session.wait_for_task(instance.id, task)
         except XenAPI.Failure, exc:
             logging.warn(exc)
         # Disk clean-up
@@ -109,43 +113,43 @@
         for vdi in vdis:
             try:
                 task = self._session.call_xenapi('Async.VDI.destroy', vdi)
-                self._session.wait_for_task(task)
+                self._session.wait_for_task(instance.id, task)
             except XenAPI.Failure, exc:
                 logging.warn(exc)
         try:
             task = self._session.call_xenapi('Async.VM.destroy', vm)
-            self._session.wait_for_task(task)
+            self._session.wait_for_task(instance.id, task)
         except XenAPI.Failure, exc:
             logging.warn(exc)
 
-    def _wait_with_callback(self, task, callback):
+    def _wait_with_callback(self, instance_id, task, callback):
         ret = None
         try:
-            ret = self._session.wait_for_task(task)
+            ret = self._session.wait_for_task(instance_id, task)
         except XenAPI.Failure, exc:
             logging.warn(exc)
         callback(ret)
 
     def pause(self, instance, callback):
-        """ Pause VM instance """
+        """Pause VM instance"""
         instance_name = instance.name
        vm = VMHelper.lookup(self._session, instance_name)
         if vm is None:
             raise Exception('instance not present %s' % instance_name)
         task = self._session.call_xenapi('Async.VM.pause', vm)
-        self._wait_with_callback(task, callback)
+        self._wait_with_callback(instance.id, task, callback)
 
     def unpause(self, instance, callback):
-        """ Unpause VM instance """
+        """Unpause VM instance"""
         instance_name = instance.name
         vm = VMHelper.lookup(self._session, instance_name)
         if vm is None:
             raise Exception('instance not present %s' % instance_name)
         task = self._session.call_xenapi('Async.VM.unpause', vm)
-        self._wait_with_callback(task, callback)
+        self._wait_with_callback(instance.id, task, callback)
 
     def get_info(self, instance_id):
-        """ Return data about VM instance """
+        """Return data about VM instance"""
         vm = VMHelper.lookup_blocking(self._session, instance_id)
         if vm is None:
             raise Exception('instance not present %s' % instance_id)
@@ -161,6 +165,6 @@
         return VMHelper.compile_diagnostics(self._session, rec)
 
     def get_console_output(self, instance):
-        """ Return snapshot of console """
+        """Return snapshot of console"""
         # TODO: implement this to fix pylint!
         return 'FAKE CONSOLE OUTPUT of instance'
 
=== modified file 'nova/virt/xenapi_conn.py'
--- nova/virt/xenapi_conn.py	2010-12-21 15:25:39 +0000
+++ nova/virt/xenapi_conn.py	2010-12-22 16:46:27 +0000
@@ -54,6 +54,8 @@
 from eventlet import event
 from eventlet import tpool
 
+from nova import context
+from nova import db
 from nova import utils
 from nova import flags
 from nova.virt.xenapi.vmops import VMOps
@@ -101,7 +103,7 @@
 
 
 class XenAPIConnection(object):
-    """ A connection to XenServer or Xen Cloud Platform """
+    """A connection to XenServer or Xen Cloud Platform"""
 
     def __init__(self, url, user, pw):
         session = XenAPISession(url, user, pw)
@@ -109,31 +111,31 @@
         self._volumeops = VolumeOps(session)
 
     def list_instances(self):
-        """ List VM instances """
+        """List VM instances"""
         return self._vmops.list_instances()
 
     def spawn(self, instance):
-        """ Create VM instance """
+        """Create VM instance"""
         self._vmops.spawn(instance)
 
     def reboot(self, instance):
-        """ Reboot VM instance """
+        """Reboot VM instance"""
         self._vmops.reboot(instance)
 
     def destroy(self, instance):
-        """ Destroy VM instance """
+        """Destroy VM instance"""
         self._vmops.destroy(instance)
 
     def pause(self, instance, callback):
-        """ Pause VM instance """
+        """Pause VM instance"""
         self._vmops.pause(instance, callback)
 
     def unpause(self, instance, callback):
-        """ Unpause paused VM instance """
+        """Unpause paused VM instance"""
         self._vmops.unpause(instance, callback)
 
     def get_info(self, instance_id):
-        """ Return data about VM instance """
+        """Return data about VM instance"""
         return self._vmops.get_info(instance_id)
 
     def get_diagnostics(self, instance_id):
@@ -141,33 +143,33 @@
         return self._vmops.get_diagnostics(instance_id)
 
     def get_console_output(self, instance):
-        """ Return snapshot of console """
+        """Return snapshot of console"""
         return self._vmops.get_console_output(instance)
 
     def attach_volume(self, instance_name, device_path, mountpoint):
-        """ Attach volume storage to VM instance """
+        """Attach volume storage to VM instance"""
         return self._volumeops.attach_volume(instance_name,
                                              device_path,
                                              mountpoint)
 
     def detach_volume(self, instance_name, mountpoint):
-        """ Detach volume storage to VM instance """
+        """Detach volume storage to VM instance"""
         return self._volumeops.detach_volume(instance_name, mountpoint)
 
 
 class XenAPISession(object):
-    """ The session to invoke XenAPI SDK calls """
+    """The session to invoke XenAPI SDK calls"""
 
     def __init__(self, url, user, pw):
         self._session = XenAPI.Session(url)
         self._session.login_with_password(user, pw)
 
     def get_xenapi(self):
-        """ Return the xenapi object """
+        """Return the xenapi object"""
         return self._session.xenapi
 
     def get_xenapi_host(self):
-        """ Return the xenapi host """
+        """Return the xenapi host"""
         return self._session.xenapi.session.get_this_host(self._session.handle)
 
     def call_xenapi(self, method, *args):
@@ -183,42 +185,53 @@
             self._session.xenapi.Async.host.call_plugin,
             self.get_xenapi_host(), plugin, fn, args)
 
-    def wait_for_task(self, task):
+    def wait_for_task(self, instance_id, task):
         """Return a Deferred that will give the result of the given task.
         The task is polled until it completes."""
 
         done = event.Event()
-        loop = utils.LoopingCall(self._poll_task, task, done)
+        loop = utils.LoopingCall(self._poll_task, instance_id, task, done)
         loop.start(FLAGS.xenapi_task_poll_interval, now=True)
         rv = done.wait()
         loop.stop()
         return rv
 
-    def _poll_task(self, task, done):
+    def _poll_task(self, instance_id, task, done):
         """Poll the given XenAPI task, and fire the given Deferred if we
         get a result."""
         try:
-            #logging.debug('Polling task %s...', task)
+            name = self._session.xenapi.task.get_name_label(task)
             status = self._session.xenapi.task.get_status(task)
-            if status == 'pending':
+            action = dict(
+                instance_id=int(instance_id),
+                action=name,
+                error=None)
+            if status == "pending":
                 return
-            elif status == 'success':
+            elif status == "success":
                 result = self._session.xenapi.task.get_result(task)
-                logging.info(_('Task %s status: success. %s'), task, result)
+                logging.info(_("Task [%s] %s status: success %s") % (
+                    name,
+                    task,
+                    result))
                 done.send(_parse_xmlrpc_value(result))
             else:
                 error_info = self._session.xenapi.task.get_error_info(task)
-                logging.warn(_('Task %s status: %s. %s'), task, status,
-                             error_info)
+                action["error"] = str(error_info)
+                logging.warn(_("Task [%s] %s status: %s %s") % (
+                    name,
+                    task,
+                    status,
+                    error_info))
                 done.send_exception(XenAPI.Failure(error_info))
-            #logging.debug('Polling task %s done.', task)
+            db.instance_action_create(context.get_admin_context(), action)
         except XenAPI.Failure, exc:
             logging.warn(exc)
             done.send_exception(*sys.exc_info())
 
 
 def _unwrap_plugin_exceptions(func, *args, **kwargs):
-    """ Parse exception details """
+    """Parse exception details"""
     try:
         return func(*args, **kwargs)
     except XenAPI.Failure, exc: