Merge lp:~jk0/nova/diagnostics-per-instance into lp:~hudson-openstack/nova/trunk

Proposed by Josh Kearney
Status: Merged
Approved by: Eric Day
Approved revision: 503
Merged at revision: 507
Proposed branch: lp:~jk0/nova/diagnostics-per-instance
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 304 lines (+92/-24)
14 files modified
nova/api/openstack/__init__.py (+2/-0)
nova/api/openstack/servers.py (+10/-0)
nova/compute/api.py (+14/-0)
nova/compute/manager.py (+9/-0)
nova/db/api.py (+5/-0)
nova/db/sqlalchemy/api.py (+12/-0)
nova/db/sqlalchemy/models.py (+2/-17)
nova/tests/api/openstack/test_servers.py (+16/-0)
nova/tests/test_xenapi.py (+5/-1)
nova/virt/fake.py (+3/-0)
nova/virt/libvirt_conn.py (+3/-0)
nova/virt/xenapi/vm_utils.py (+4/-0)
nova/virt/xenapi/vmops.py (+4/-3)
nova/virt/xenapi_conn.py (+3/-3)
To merge this branch: bzr merge lp:~jk0/nova/diagnostics-per-instance
Reviewer Review Type Date Requested Status
Eric Day (community) Approve
Matt Dietz (community) Approve
Review via email: mp+44872@code.launchpad.net

Commit message

Make InstanceActions and live diagnostics available through the Admin API.

Description of the change

The InstanceDiagnostics table was removed since storing diagnostics is beyond the scope of this blueprint and is a big enough project to have its own blueprint.

To post a comment you must log in.
Revision history for this message
Matt Dietz (cerberus) wrote :

I'm not going to make you do this, but we need to more explicitly establish what "admin" API means, because right now it's simply whether or not a whole slew of extra routes are available. No vetting of any kind is being done. I know that the DB layer will ultimately reject the request, but I think that's a silly waste of time if the API 1) already knows what you are and 2) can do something about it

Also, all controller verbs returning data are expected to format it into a dictionary for proper formatting into the Accept'd type. It doesn't look like this would return anything we would want to the client. We should be consistent

review: Needs Fixing
Revision history for this message
Matt Dietz (cerberus) wrote :

Josh corrected me offline. I'm fine with the rest

review: Approve
Revision history for this message
Eric Day (eday) wrote :

70, 81: Does this need to be elevated?

For get_actions, why are you diving all the way down into the virt worker when you could just read the table from compute.api? I know we discussed storing this data elsewhere eventually since it's not critical for operation, but for now we are still central (and this call will probably happen in compuite.api. I imagine when libvirt support is added it will use the same reporting mechanism (which is currently the actions table), so I wouldn't worry about different implementations yet either.

review: Needs Fixing
Revision history for this message
Cory Wright (corywright) wrote :

I don't understand the changes to test_run_terminate. Why change

  self.assertEqual(len(instances), 1)

to

  self.assertNotEqual(instance_count, 0)

? The former is more specific and is a tighter test.

Also, the assertion should probably compare against a constant value that you are expecting, 0 or 1 in this case, rather than len(instances) + 1.

You may have a good reason for these changes and I'm just not aware of it. If so, let me know. :)

Revision history for this message
Josh Kearney (jk0) wrote :

> 70, 81: Does this need to be elevated?

I don't believe so -- good catch. I'll remove that and test again without it.

> For get_actions, why are you diving all the way down into the virt worker when
> you could just read the table from compute.api? I know we discussed storing
> this data elsewhere eventually since it's not critical for operation, but for
> now we are still central (and this call will probably happen in compuite.api.
> I imagine when libvirt support is added it will use the same reporting
> mechanism (which is currently the actions table), so I wouldn't worry about
> different implementations yet either.

Sure thing. I'll clean that up.

Revision history for this message
Josh Kearney (jk0) wrote :

> I don't understand the changes to test_run_terminate. Why change
>
> self.assertEqual(len(instances), 1)
>
> to
>
> self.assertNotEqual(instance_count, 0)
>
> ? The former is more specific and is a tighter test.

I had to change that because the test was failing when more than one instance was returned. By doing it this way, as long as the instance count is >0, we know there is one or more running. The restriction comes from Python 2.6's unittest. There just isn't much to work with - it's either Equal or Not Equal.

> Also, the assertion should probably compare against a constant value that you
> are expecting, 0 or 1 in this case, rather than len(instances) + 1.
>
> You may have a good reason for these changes and I'm just not aware of it. If
> so, let me know. :)

The thing is we don't know that it will *always* be 1 or 0. We could start off with 15 running instances. Terminate one. Assert that the original instance_count is equal to the current instance count + 1. If we were running Python 2.7's unittest, there is assertGreater, assertGreaterEqual, etc, but unfortunately those aren't available in 2.6.

Revision history for this message
Eric Day (eday) wrote :

But what actually changed in this patch that would have broken the tests? Is this unrelated to the changes you propose? I've not seen any issues with this test case failing as it was.

Revision history for this message
Josh Kearney (jk0) wrote :

> But what actually changed in this patch that would have broken the tests? Is
> this unrelated to the changes you propose? I've not seen any issues with this
> test case failing as it was.

Nothing changed in this branch that would break the tests -- I just merged trunk yesterday and ran the tests and had that problem.

Revision history for this message
Eric Day (eday) wrote :

get_actions is still going over an RPC call to the compute manager, where it could be answered directly in compute.api. Push it one more layer up :)

Revision history for this message
Cory Wright (corywright) wrote :

> > Also, the assertion should probably compare against a constant value that you
> > are expecting, 0 or 1 in this case, rather than len(instances) + 1.
> >
> The thing is we don't know that it will *always* be 1 or 0. We could start off
> with 15 running instances. Terminate one. Assert that the original
> instance_count is equal to the current instance count + 1.

Shouldn't those instances be torn down between each test case?

> If we were running
> Python 2.7's unittest, there is assertGreater, assertGreaterEqual, etc, but
> unfortunately those aren't available in 2.6.

Don't forget that you can always just use Python's built-in assert:

  assert len(instances) > 0

Revision history for this message
Eric Day (eday) wrote :

Sorry to be a bother, but compute.api methods should include a verb, such as get_actions, not just 'actions'. In the future I could see having other operations on actions. :)

Revision history for this message
Eric Day (eday) wrote :

lgtm!

review: Approve
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :
Download full text (40.3 KiB)

The attempt to merge lp:~jk0/nova/diagnostics-per-instance into lp:nova failed. Below is the output from the failed tests.

TrialTestCase
    runTest ok
AdminAPITest
    test_admin_disabled ok
    test_admin_enabled ok
APITest
    test_exceptions_are_converted_to_faults ok
Test
    test_authorize_token ok
    test_authorize_user ok
    test_bad_token ok
    test_bad_user ok
    test_no_user ok
    test_token_expiry ok
TestLimiter
    test_authorize_token ok
TestFaults
    test_fault_parts ok
    test_raise ok
    test_retry_header ok
FlavorsTest
    test_get_flavor_by_id ok
    test_get_flavor_list ok
GlanceImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
ImageControllerWithGlanceServiceTest
    test_get_image_details ok
    test_get_image_index ok
LocalImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
LimiterTest
    test_minute ok
    test_one_per_period ok
    test_second ok
    test_users_get_separate_buckets ok
    test_we_can_go_indefinitely_if_we_spread_out_requests ok
WSGIAppProxyTest
    test_200 ok
    test_403 ok
    test_failure ok
WSGIAppTest
    test_escaping ok
    test_good_urls ok
    test_invalid_methods ok
    test_invalid_urls ok
    test_response_to_delays ok
ServersTest
    test_create_backup_schedules ERROR
    test_create_instance ERROR
    test_delete_backup_schedules ERROR
    test_delete_server_instance ...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :
Download full text (17.7 KiB)

The attempt to merge lp:~jk0/nova/diagnostics-per-instance into lp:nova failed. Below is the output from the failed tests.

TrialTestCase
    runTest ok
AdminAPITest
    test_admin_disabled ok
    test_admin_enabled ok
APITest
    test_exceptions_are_converted_to_faults ok
Test
    test_authorize_token ok
    test_authorize_user ok
    test_bad_token ok
    test_bad_user ok
    test_no_user ok
    test_token_expiry ok
TestLimiter
    test_authorize_token ok
TestFaults
    test_fault_parts ok
    test_raise ok
    test_retry_header ok
FlavorsTest
    test_get_flavor_by_id ok
    test_get_flavor_list ok
GlanceImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
ImageControllerWithGlanceServiceTest
    test_get_image_details ok
    test_get_image_index ok
LocalImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
LimiterTest
    test_minute ok
    test_one_per_period ok
    test_second ok
    test_users_get_separate_buckets ok
    test_we_can_go_indefinitely_if_we_spread_out_requests ok
WSGIAppProxyTest
    test_200 ok
    test_403 ok
    test_failure ok
WSGIAppTest
    test_escaping ok
    test_good_urls ok
    test_invalid_methods ok
    test_invalid_urls ok
    test_response_to_delays ok
ServersTest
    test_create_backup_schedules ok
    test_create_instance ok
    test_delete_backup_schedules ok
    test_delete_server_instance ...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :
Download full text (17.7 KiB)

The attempt to merge lp:~jk0/nova/diagnostics-per-instance into lp:nova failed. Below is the output from the failed tests.

TrialTestCase
    runTest ok
AdminAPITest
    test_admin_disabled ok
    test_admin_enabled ok
APITest
    test_exceptions_are_converted_to_faults ok
Test
    test_authorize_token ok
    test_authorize_user ok
    test_bad_token ok
    test_bad_user ok
    test_no_user ok
    test_token_expiry ok
TestLimiter
    test_authorize_token ok
TestFaults
    test_fault_parts ok
    test_raise ok
    test_retry_header ok
FlavorsTest
    test_get_flavor_by_id ok
    test_get_flavor_list ok
GlanceImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
ImageControllerWithGlanceServiceTest
    test_get_image_details ok
    test_get_image_index ok
LocalImageServiceTest
    test_create ok
    test_create_and_show_non_existing_image ok
    test_delete ok
    test_update ok
LimiterTest
    test_minute ok
    test_one_per_period ok
    test_second ok
    test_users_get_separate_buckets ok
    test_we_can_go_indefinitely_if_we_spread_out_requests ok
WSGIAppProxyTest
    test_200 ok
    test_403 ok
    test_failure ok
WSGIAppTest
    test_escaping ok
    test_good_urls ok
    test_invalid_methods ok
    test_invalid_urls ok
    test_response_to_delays ok
ServersTest
    test_create_backup_schedules ok
    test_create_instance ok
    test_delete_backup_schedules ok
    test_delete_server_instance ...

501. By Josh Kearney

Removed problematic test

502. By Josh Kearney

Improved test

503. By Josh Kearney

Cleanup

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'nova/api/openstack/__init__.py'
2--- nova/api/openstack/__init__.py 2010-12-30 17:26:21 +0000
3+++ nova/api/openstack/__init__.py 2010-12-31 01:09:13 +0000
4@@ -93,6 +93,8 @@
5 logging.debug("Including admin operations in API.")
6 server_members['pause'] = 'POST'
7 server_members['unpause'] = 'POST'
8+ server_members["diagnostics"] = "GET"
9+ server_members["actions"] = "GET"
10 server_members['suspend'] = 'POST'
11 server_members['resume'] = 'POST'
12
13
14=== modified file 'nova/api/openstack/servers.py'
15--- nova/api/openstack/servers.py 2010-12-28 23:34:51 +0000
16+++ nova/api/openstack/servers.py 2010-12-31 01:09:13 +0000
17@@ -218,3 +218,13 @@
18 logging.error(_("compute.api::resume %s"), readable)
19 return faults.Fault(exc.HTTPUnprocessableEntity())
20 return exc.HTTPAccepted()
21+
22+ def diagnostics(self, req, id):
23+ """Permit Admins to retrieve server diagnostics."""
24+ ctxt = req.environ["nova.context"]
25+ return self.compute_api.get_diagnostics(ctxt, id)
26+
27+ def actions(self, req, id):
28+ """Permit Admins to retrieve server actions."""
29+ ctxt = req.environ["nova.context"]
30+ return self.compute_api.get_actions(ctxt, id)
31
32=== modified file 'nova/compute/api.py'
33--- nova/compute/api.py 2010-12-28 19:01:54 +0000
34+++ nova/compute/api.py 2010-12-31 01:09:13 +0000
35@@ -293,6 +293,20 @@
36 {"method": "unpause_instance",
37 "args": {"instance_id": instance['id']}})
38
39+ def get_diagnostics(self, context, instance_id):
40+ """Retrieve diagnostics for the given instance."""
41+ instance = self.db.instance_get_by_internal_id(context, instance_id)
42+ host = instance["host"]
43+ return rpc.call(context,
44+ self.db.queue_get_for(context, FLAGS.compute_topic, host),
45+ {"method": "get_diagnostics",
46+ "args": {"instance_id": instance["id"]}})
47+
48+ def get_actions(self, context, instance_id):
49+ """Retrieve actions for the given instance."""
50+ instance = self.db.instance_get_by_internal_id(context, instance_id)
51+ return self.db.instance_get_actions(context, instance["id"])
52+
53 def suspend(self, context, instance_id):
54 """suspend the instance with instance_id"""
55 instance = self.db.instance_get_by_internal_id(context, instance_id)
56
57=== modified file 'nova/compute/manager.py'
58--- nova/compute/manager.py 2010-12-28 19:01:54 +0000
59+++ nova/compute/manager.py 2010-12-31 01:09:13 +0000
60@@ -318,6 +318,15 @@
61 result))
62
63 @exception.wrap_exception
64+ def get_diagnostics(self, context, instance_id):
65+ """Retrieve diagnostics for an instance on this server."""
66+ instance_ref = self.db.instance_get(context, instance_id)
67+
68+ if instance_ref["state"] == power_state.RUNNING:
69+ logging.debug(_("instance %s: retrieving diagnostics"),
70+ instance_ref["internal_id"])
71+ return self.driver.get_diagnostics(instance_ref)
72+
73 def suspend_instance(self, context, instance_id):
74 """suspend the instance with instance_id"""
75 context = context.elevated()
76
77=== modified file 'nova/db/api.py'
78--- nova/db/api.py 2010-12-22 21:17:30 +0000
79+++ nova/db/api.py 2010-12-31 01:09:13 +0000
80@@ -383,6 +383,11 @@
81 return IMPL.instance_action_create(context, values)
82
83
84+def instance_get_actions(context, instance_id):
85+ """Get instance actions by instance id."""
86+ return IMPL.instance_get_actions(context, instance_id)
87+
88+
89 ###################
90
91
92
93=== modified file 'nova/db/sqlalchemy/api.py'
94--- nova/db/sqlalchemy/api.py 2010-12-23 21:41:54 +0000
95+++ nova/db/sqlalchemy/api.py 2010-12-31 01:09:13 +0000
96@@ -856,6 +856,18 @@
97 return action_ref
98
99
100+@require_admin_context
101+def instance_get_actions(context, instance_id):
102+ """Return the actions associated to the given instance id"""
103+ session = get_session()
104+ actions = {}
105+ for action in session.query(models.InstanceActions).\
106+ filter_by(instance_id=instance_id).\
107+ all():
108+ actions[action.action] = action.error
109+ return actions
110+
111+
112 ###################
113
114
115
116=== modified file 'nova/db/sqlalchemy/models.py'
117--- nova/db/sqlalchemy/models.py 2010-12-22 21:17:30 +0000
118+++ nova/db/sqlalchemy/models.py 2010-12-31 01:09:13 +0000
119@@ -22,7 +22,7 @@
120 import datetime
121
122 from sqlalchemy.orm import relationship, backref, object_mapper
123-from sqlalchemy import Column, Integer, Float, String, schema
124+from sqlalchemy import Column, Integer, String, schema
125 from sqlalchemy import ForeignKey, DateTime, Boolean, Text
126 from sqlalchemy.exc import IntegrityError
127 from sqlalchemy.ext.declarative import declarative_base
128@@ -236,21 +236,6 @@
129 # 'shutdown', 'shutoff', 'crashed'])
130
131
132-class InstanceDiagnostics(BASE, NovaBase):
133- """Represents a guest VM's diagnostics"""
134- __tablename__ = "instance_diagnostics"
135- id = Column(Integer, primary_key=True)
136- instance_id = Column(Integer, ForeignKey('instances.id'))
137-
138- memory_available = Column(Float)
139- memory_free = Column(Float)
140- cpu_load = Column(Float)
141- disk_read = Column(Float)
142- disk_write = Column(Float)
143- net_tx = Column(Float)
144- net_rx = Column(Float)
145-
146-
147 class InstanceActions(BASE, NovaBase):
148 """Represents a guest VM's actions and results"""
149 __tablename__ = "instance_actions"
150@@ -561,7 +546,7 @@
151 it will never need to be called explicitly elsewhere.
152 """
153 from sqlalchemy import create_engine
154- models = (Service, Instance, InstanceDiagnostics, InstanceActions,
155+ models = (Service, Instance, InstanceActions,
156 Volume, ExportDevice, IscsiTarget, FixedIp, FloatingIp,
157 Network, SecurityGroup, SecurityGroupIngressRule,
158 SecurityGroupInstanceAssociation, AuthToken, User,
159
160=== modified file 'nova/tests/api/openstack/test_servers.py'
161--- nova/tests/api/openstack/test_servers.py 2010-12-23 21:33:16 +0000
162+++ nova/tests/api/openstack/test_servers.py 2010-12-31 01:09:13 +0000
163@@ -95,6 +95,10 @@
164 fake_compute_api)
165 self.stubs.Set(nova.compute.api.ComputeAPI, 'resume',
166 fake_compute_api)
167+ self.stubs.Set(nova.compute.api.ComputeAPI, "get_diagnostics",
168+ fake_compute_api)
169+ self.stubs.Set(nova.compute.api.ComputeAPI, "get_actions",
170+ fake_compute_api)
171 self.allow_admin = FLAGS.allow_admin_api
172
173 def tearDown(self):
174@@ -274,6 +278,18 @@
175 res = req.get_response(nova.api.API('os'))
176 self.assertEqual(res.status_int, 202)
177
178+ def test_server_diagnostics(self):
179+ req = webob.Request.blank("/v1.0/servers/1/diagnostics")
180+ req.method = "GET"
181+ res = req.get_response(nova.api.API("os"))
182+ self.assertEqual(res.status_int, 404)
183+
184+ def test_server_actions(self):
185+ req = webob.Request.blank("/v1.0/servers/1/actions")
186+ req.method = "GET"
187+ res = req.get_response(nova.api.API("os"))
188+ self.assertEqual(res.status_int, 404)
189+
190 def test_server_reboot(self):
191 body = dict(server=dict(
192 name='server_test', imageId=2, flavorId=2, metadata={},
193
194=== modified file 'nova/tests/test_xenapi.py'
195--- nova/tests/test_xenapi.py 2010-12-28 18:53:32 +0000
196+++ nova/tests/test_xenapi.py 2010-12-31 01:09:13 +0000
197@@ -166,6 +166,10 @@
198 instances = self.conn.list_instances()
199 self.assertEquals(instances, [])
200
201+ def test_get_diagnostics(self):
202+ instance = self._create_instance()
203+ self.conn.get_diagnostics(instance)
204+
205 def test_instance_snapshot(self):
206 stubs.stubout_instance_snapshot(self.stubs)
207 instance = self._create_instance()
208@@ -253,7 +257,7 @@
209 'kernel_id': 2,
210 'ramdisk_id': 3,
211 'instance_type': 'm1.large',
212- 'mac_address': 'aa:bb:cc:dd:ee:ff',
213+ 'mac_address': 'aa:bb:cc:dd:ee:ff'
214 }
215 instance = db.instance_create(values)
216 self.conn.spawn(instance)
217
218=== modified file 'nova/virt/fake.py'
219--- nova/virt/fake.py 2010-12-28 19:01:54 +0000
220+++ nova/virt/fake.py 2010-12-31 01:09:13 +0000
221@@ -216,6 +216,9 @@
222 'num_cpu': 2,
223 'cpu_time': 0}
224
225+ def get_diagnostics(self, instance_name):
226+ pass
227+
228 def list_disks(self, instance_name):
229 """
230 Return the IDs of all the virtual disks attached to the specified
231
232=== modified file 'nova/virt/libvirt_conn.py'
233--- nova/virt/libvirt_conn.py 2010-12-30 17:26:21 +0000
234+++ nova/virt/libvirt_conn.py 2010-12-31 01:09:13 +0000
235@@ -587,6 +587,9 @@
236 'num_cpu': num_cpu,
237 'cpu_time': cpu_time}
238
239+ def get_diagnostics(self, instance_name):
240+ raise exception.APIError("diagnostics are not supported for libvirt")
241+
242 def get_disks(self, instance_name):
243 """
244 Note that this function takes an instance name, not an Instance, so
245
246=== modified file 'nova/virt/xenapi/vm_utils.py'
247--- nova/virt/xenapi/vm_utils.py 2010-12-28 19:01:54 +0000
248+++ nova/virt/xenapi/vm_utils.py 2010-12-31 01:09:13 +0000
249@@ -347,6 +347,10 @@
250 try:
251 host = session.get_xenapi_host()
252 host_ip = session.get_xenapi().host.get_record(host)["address"]
253+ except (cls.XenAPI.Failure, KeyError) as e:
254+ return {"Unable to retrieve diagnostics": e}
255+
256+ try:
257 diags = {}
258 xml = get_rrd(host_ip, record["uuid"])
259 if xml:
260
261=== modified file 'nova/virt/xenapi/vmops.py'
262--- nova/virt/xenapi/vmops.py 2010-12-28 19:01:54 +0000
263+++ nova/virt/xenapi/vmops.py 2010-12-31 01:09:13 +0000
264@@ -268,11 +268,12 @@
265 rec = self._session.get_xenapi().VM.get_record(vm)
266 return VMHelper.compile_info(rec)
267
268- def get_diagnostics(self, instance_id):
269+ def get_diagnostics(self, instance):
270 """Return data about VM diagnostics"""
271- vm = VMHelper.lookup(self._session, instance_id)
272+ vm = VMHelper.lookup(self._session, instance.name)
273 if vm is None:
274- raise exception.NotFound(_("Instance not found %s") % instance_id)
275+ raise exception.NotFound(_("Instance not found %s") %
276+ instance.name)
277 rec = self._session.get_xenapi().VM.get_record(vm)
278 return VMHelper.compile_diagnostics(self._session, rec)
279
280
281=== modified file 'nova/virt/xenapi_conn.py'
282--- nova/virt/xenapi_conn.py 2010-12-28 19:01:54 +0000
283+++ nova/virt/xenapi_conn.py 2010-12-31 01:09:13 +0000
284@@ -167,9 +167,9 @@
285 """Return data about VM instance"""
286 return self._vmops.get_info(instance_id)
287
288- def get_diagnostics(self, instance_id):
289+ def get_diagnostics(self, instance):
290 """Return data about VM diagnostics"""
291- return self._vmops.get_diagnostics(instance_id)
292+ return self._vmops.get_diagnostics(instance)
293
294 def get_console_output(self, instance):
295 """Return snapshot of console"""
296@@ -242,7 +242,7 @@
297 status = self._session.xenapi.task.get_status(task)
298 action = dict(
299 instance_id=int(id),
300- action=name,
301+ action=name[0:255], # Ensure action is never > 255
302 error=None)
303 if status == "pending":
304 return