Merge lp:~jk0/nova/xs-rescue-periodic-tasks into lp:~hudson-openstack/nova/trunk
Status: Merged
Approved by: Rick Harris
Approved revision: 851
Merged at revision: 858
Proposed branch: lp:~jk0/nova/xs-rescue-periodic-tasks
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 235 lines (+101/-26), 6 files modified
  - nova/compute/manager.py (+11/-2)
  - nova/utils.py (+8/-0)
  - nova/virt/hyperv.py (+3/-0)
  - nova/virt/libvirt_conn.py (+4/-0)
  - nova/virt/xenapi/vmops.py (+71/-24)
  - nova/virt/xenapi_conn.py (+4/-0)
To merge this branch: bzr merge lp:~jk0/nova/xs-rescue-periodic-tasks
Reviewer | Review Type | Status
---|---|---
Sandy Walsh | community | Needs Fixing
Rick Harris | community | Approve
Matt Dietz | community | Approve

Review via email: mp+54597@code.launchpad.net
Commit message
Description of the change
Offers the ability to run a periodic_task that sweeps through rescued instances older than 24 hours and forcibly unrescues them.
Flag added: rescue_timeout (default is 0 - disabled)
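
For example, the 24-hour sweep described above corresponds to a flag value in seconds (a hypothetical flagfile entry; 86400 = 24 * 3600):

    # Sweep instances that have been in rescue mode for over 24 hours.
    --rescue_timeout=86400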
- 848. By Josh Kearney: Added docstring
- 849. By Josh Kearney: Better method name
Rick Harris (rconradharris) wrote:
Hey Josh. Overall looks good, just a few suggestions:
> 12 +flags.DEFINE_integer("rescue_timeout", 0,
> 13 +    "Automatically unrescue an instance after N hours."
> 14 +    " Set to 0 to disable.")
I'd consider using minutes or seconds here for maximum flexibility.
Suppose some other provider wants to auto-unrescue after 15 minutes; since
we're using an integer, that would be impossible with hours as the unit.
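
To make the units problem concrete (an illustration, not code from the branch):

    # With integer hours, a 15-minute timeout is unrepresentable:
    # 15 minutes = 0.25 hours, which an integer flag forces to 0 ("disabled").
    # With seconds as the unit, it is simply:
    rescue_timeout = 15 * 60  # 900 seconds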
> 25 +        if FLAGS.rescue_timeout > 0:
> 26 +            self.driver.poll_rescued_instances(FLAGS.rescue_timeout)
It looks like you only defined an implementation in xenapi_conn.py. We may
want to add `poll_rescued_instances` stubs to the other virt drivers that raise
NotImplementedError so someone can come along and add that in for us.
> 105 +        _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref)
> 106 +        if _vbd_ref["userdevice"] == "1":
Per the new convention, this should probably be:
    vbd_rec = self._session.get_xenapi().VBD.get_record(vbd_ref)
    if vbd_rec["userdevice"] == "1":
> 106 +        if _vbd_ref["userdevice"] == "1":
Could you clarify the significance of userdevice == "1"? A comment would be
useful, but even better would be assigning it to a nicely named variable:
    something_descriptive = vbd_rec["userdevice"] == "1"
    if something_descriptive:
        # blah
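
One concrete spelling, with a hypothetical variable name (the landed code opted for a "# primary VBD is always 1" comment instead):

    is_primary_vbd = vbd_rec["userdevice"] == "1"
    if is_primary_vbd:
        VMHelper.unplug_vbd(self._session, vbd_ref)
        VMHelper.destroy_vbd(self._session, vbd_ref)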
> 186 +        vms = []
> 187 +        for instance in self.list_instances():
> 188 +            if instance.endswith("-rescue"):
Since these are only rescue VMs, the var might be clearer as `rescue_vms`.
> 196 +            original_name = vm["name"].split("-rescue", 1)[0]
> 197 +            original_vm_ref = VMHelper.lookup(self._session, original_name)
Might make sense to move these lines down so they're next to where they are used:
    original_name = vm["name"].split("-rescue", 1)[0]
    original_vm_ref = VMHelper.lookup(self._session, original_name)
    self._release_bootlock(original_vm_ref)
    self._session.call_xenapi("VM.start", original_vm_ref, False, False)
> self._destroy_rescue_vbds(rescue_vm_ref)
> self._shutdown_rescue(rescue_vm_ref)
> self._destroy_rescue_vdis(rescue_vm_ref)
> self._destroy_rescue_vm(rescue_vm_ref)
> self._release_bootlock(original_vm_ref)
> self._start(instance, original_vm_ref)
This block of code is effectively repeated twice in vmops. Might be worth
refactoring it out to a separate method to DRY it up.
> 81 +    def _shutdown_rescue(self, rescue_vm_ref):
> 82 +        """Shutdown a rescue instance"""
> 83 +        self._session.call_xenapi("Async.VM.hard_shutdown", rescue_vm_ref)
I'm not sure we need an additional method here (the less the better!). I
*think* we could replace it with a call to:
    self._shutdown(instance, rescue_vm_ref)
This gives us the benefit of logging (and the wait_for_task code that is
missing).
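
For reference, the pre-existing unrescue path (removed by this diff) shows the wait_for_task pattern being referred to:

    task = self._session.call_xenapi("Async.VM.hard_shutdown", rescue_vm_ref)
    self._session.wait_for_task(task, instance.id)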
> 92 +    def _destroy_rescue_vdis(self, rescue_vm_ref):
> 93 +        """Destroys all VDIs associated with a rescued VM"""
> 94 +        vdi_refs = VMHelper.lookup_vm_vdis(self._session, rescue_vm_ref)
> 95 +        for vdi_ref in vdi_refs:
> 96 ...
As above, I think it would be better to use the regular destroy_*
implementations and just pass the rescue_vm_ref instead of the usual vm_ref.
- 850. By Josh Kearney: Review feedback
Josh Kearney (jk0) wrote:
Thanks Rick, this is some great feedback. I was able to fix everything with the exception of a few things below (comments inline):
> It looks like you only defined an implementation in xenapi_conn.py. We may
> want to add `poll_rescued_instances` stubs to the other virt drivers that raise
> NotImplementedError so someone can come along and add that in for us.
I did implement a `pass` method for libvirt; however, I didn't bother with HyperV since that layer is extremely bare at the moment. I suspect that will improve and change quite rapidly over time.
> As above, I think it would be better to use the regular destroy_*
> implementations and just pass the rescue_vm_ref instead of the usual vm_ref.
Unfortunately since the `instance` object isn't available in these circumstances, I had to break them down into their own rescue methods respectively. Looking at the bright side, this will allow me to provide more thorough test coverage for Rescue/Unrescue once this lands. Another possibility is the big XenAPI refactor that we've talked about, but that will likely warrant its own BP for a future release.
Rick Harris (rconradharris) wrote:
> I didn't bother with HyperV since that layer is extremely bare at the moment.
Not a deal-breaker, but it might be a good idea to add it to HyperV anyway so it doesn't fall any further behind. Whoever ends up owning HyperV will benefit greatly from the stubs, and they cost next to nothing to add.
> Another possibility is the big XenAPI refactor that we've talked about
Yeah, I'm pretty much in agreement with you here. As much as I hate to see methods which are near-duplicates, I don't see an easy way around without some refactoring.
`_poll_tasks` in xenapi_conn.py is particularly nefarious, considering it takes an instance_id, optionally.
So, with both those items cleared up, I think this patch is good-to-go. Great job, Josh!
- 851. By Josh Kearney: Added hyperv stub
Sandy Walsh (sandy-walsh) wrote:
39 +def is_then_greater(then, seconds):
40 +    if utcnow() - then > datetime.timedelta(seconds=seconds):
41 +        return True
42 +    else:
43 +        return False
Beyond the naming, as Matt pointed out, how about:
    return utcnow() - then > datetime.timedelta(seconds=seconds)
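
Combined with the is_older_than rename Matt suggests (see his comment below the diff), the helper collapses to a one-liner; a sketch using the landed signature:

    def is_older_than(before, seconds):
        """Return True if 'before' is older than 'seconds'."""
        return utcnow() - before > datetime.timedelta(seconds=seconds)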
98 + continue
log message?
105 +        if _vbd_ref["userdevice"] == "1":
Is there a risk of userdevice not being defined?
Perhaps: if _vbd_ref.get("userdevice") == "1":
173 +        if last_ran:
174 +            if not utils.is_then_greater(last_ran, timeout):
175 +                # Do not run. Let's bail.
176 +                return
177 +            else:
178 +                # Update the time tracker and proceed.
179 +                self.poll_rescue_last_ran = utils.utcnow()
180 +        else:
181 +            # We need a base time to start tracking.
182 +            self.poll_rescue_last_ran = utils.utcnow()
183 +            return
I try to keep my returns together to keep the if blocks smaller:
173 +        if not last_ran:
181 +            # We need a base time to start tracking.
182 +            self.poll_rescue_last_ran = utils.utcnow()
183 +            return
174 +        if not utils.is_then_greater(last_ran, timeout):
175 +            # Do not run. Let's bail.
176 +            return
177 +        # Update the time tracker and proceed.
179 +        self.poll_rescue_last_ran = utils.utcnow()
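
Assembled, the early-return shape Sandy is suggesting reads roughly like this (a sketch using the method and helper names that landed in the diff):

    def poll_rescued_instances(self, timeout):
        last_ran = self.poll_rescue_last_ran
        if not last_ran:
            # We need a base time to start tracking.
            self.poll_rescue_last_ran = utils.utcnow()
            return
        if not utils.is_older_than(last_ran, timeout):
            # Do not run. Let's bail.
            return
        # Update the time tracker and proceed.
        self.poll_rescue_last_ran = utils.utcnow()
        # ... then sweep and unrescue the expired "-rescue" VMs ...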
No tests?
Josh Kearney (jk0) wrote:
Thanks Sandy, I've worked all your suggestions into my other branch (where the unit tests live :)).
Preview Diff
1 | === modified file 'nova/compute/manager.py' |
2 | --- nova/compute/manager.py 2011-03-23 18:56:23 +0000 |
3 | +++ nova/compute/manager.py 2011-03-24 03:56:32 +0000 |
4 | @@ -65,8 +65,11 @@ |
5 | 'Console proxy host to use to connect to instances on' |
6 | 'this host.') |
7 | flags.DEFINE_integer('live_migration_retry_count', 30, |
8 | - ("Retry count needed in live_migration." |
9 | - " sleep 1 sec for each count")) |
10 | + "Retry count needed in live_migration." |
11 | + " sleep 1 sec for each count") |
12 | +flags.DEFINE_integer("rescue_timeout", 0, |
13 | + "Automatically unrescue an instance after N seconds." |
14 | + " Set to 0 to disable.") |
15 | |
16 | LOG = logging.getLogger('nova.compute.manager') |
17 | |
18 | @@ -132,6 +135,12 @@ |
19 | """ |
20 | self.driver.init_host(host=self.host) |
21 | |
22 | + def periodic_tasks(self, context=None): |
23 | + """Tasks to be run at a periodic interval.""" |
24 | + super(ComputeManager, self).periodic_tasks(context) |
25 | + if FLAGS.rescue_timeout > 0: |
26 | + self.driver.poll_rescued_instances(FLAGS.rescue_timeout) |
27 | + |
28 | def _update_state(self, context, instance_id): |
29 | """Update the state of an instance from the driver info.""" |
30 | # FIXME(ja): include other fields from state? |
31 | |
32 | === modified file 'nova/utils.py' |
33 | --- nova/utils.py 2011-03-22 16:13:48 +0000 |
34 | +++ nova/utils.py 2011-03-24 03:56:32 +0000 |
35 | @@ -335,6 +335,14 @@ |
36 | utcnow.override_time = None |
37 | |
38 | |
39 | +def is_older_than(before, seconds): |
40 | + """Return True if before is older than 'seconds'""" |
41 | + if utcnow() - before > datetime.timedelta(seconds=seconds): |
42 | + return True |
43 | + else: |
44 | + return False |
45 | + |
46 | + |
47 | def utcnow_ts(): |
48 | """Timestamp version of our utcnow function.""" |
49 | return time.mktime(utcnow().timetuple()) |
50 | |
51 | === modified file 'nova/virt/hyperv.py' |
52 | --- nova/virt/hyperv.py 2011-01-27 17:56:54 +0000 |
53 | +++ nova/virt/hyperv.py 2011-03-24 03:56:32 +0000 |
54 | @@ -467,3 +467,6 @@ |
55 | if vm is None: |
56 | raise exception.NotFound('Cannot detach volume from missing %s ' |
57 | % instance_name) |
58 | + |
59 | + def poll_rescued_instances(self, timeout): |
60 | + pass |
61 | |
62 | === modified file 'nova/virt/libvirt_conn.py' |
63 | --- nova/virt/libvirt_conn.py 2011-03-23 23:51:08 +0000 |
64 | +++ nova/virt/libvirt_conn.py 2011-03-24 03:56:32 +0000 |
65 | @@ -417,6 +417,10 @@ |
66 | self.reboot(instance) |
67 | |
68 | @exception.wrap_exception |
69 | + def poll_rescued_instances(self, timeout): |
70 | + pass |
71 | + |
72 | + @exception.wrap_exception |
73 | def spawn(self, instance): |
74 | xml = self.to_xml(instance) |
75 | db.instance_set_state(context.get_admin_context(), |
76 | |
77 | === modified file 'nova/virt/xenapi/vmops.py' |
78 | --- nova/virt/xenapi/vmops.py 2011-03-23 18:58:08 +0000 |
79 | +++ nova/virt/xenapi/vmops.py 2011-03-24 03:56:32 +0000 |
80 | @@ -51,6 +51,7 @@ |
81 | def __init__(self, session): |
82 | self.XenAPI = session.get_imported_xenapi() |
83 | self._session = session |
84 | + self.poll_rescue_last_ran = None |
85 | |
86 | VMHelper.XenAPI = self.XenAPI |
87 | |
88 | @@ -488,6 +489,10 @@ |
89 | except self.XenAPI.Failure, exc: |
90 | LOG.exception(exc) |
91 | |
92 | + def _shutdown_rescue(self, rescue_vm_ref): |
93 | + """Shutdown a rescue instance""" |
94 | + self._session.call_xenapi("Async.VM.hard_shutdown", rescue_vm_ref) |
95 | + |
96 | def _destroy_vdis(self, instance, vm_ref): |
97 | """Destroys all VDIs associated with a VM""" |
98 | instance_id = instance.id |
99 | @@ -505,6 +510,24 @@ |
100 | except self.XenAPI.Failure, exc: |
101 | LOG.exception(exc) |
102 | |
103 | + def _destroy_rescue_vdis(self, rescue_vm_ref): |
104 | + """Destroys all VDIs associated with a rescued VM""" |
105 | + vdi_refs = VMHelper.lookup_vm_vdis(self._session, rescue_vm_ref) |
106 | + for vdi_ref in vdi_refs: |
107 | + try: |
108 | + self._session.call_xenapi("Async.VDI.destroy", vdi_ref) |
109 | + except self.XenAPI.Failure: |
110 | + continue |
111 | + |
112 | + def _destroy_rescue_vbds(self, rescue_vm_ref): |
113 | + """Destroys all VBDs tied to a rescue VM""" |
114 | + vbd_refs = self._session.get_xenapi().VM.get_VBDs(rescue_vm_ref) |
115 | + for vbd_ref in vbd_refs: |
116 | + vbd_rec = self._session.get_xenapi().VBD.get_record(vbd_ref) |
117 | + if vbd_rec["userdevice"] == "1": # primary VBD is always 1 |
118 | + VMHelper.unplug_vbd(self._session, vbd_ref) |
119 | + VMHelper.destroy_vbd(self._session, vbd_ref) |
120 | + |
121 | def _destroy_kernel_ramdisk(self, instance, vm_ref): |
122 | """ |
123 | Three situations can occur: |
124 | @@ -555,6 +578,14 @@ |
125 | |
126 | LOG.debug(_("Instance %(instance_id)s VM destroyed") % locals()) |
127 | |
128 | + def _destroy_rescue_instance(self, rescue_vm_ref): |
129 | + """Destroy a rescue instance""" |
130 | + self._destroy_rescue_vbds(rescue_vm_ref) |
131 | + self._shutdown_rescue(rescue_vm_ref) |
132 | + self._destroy_rescue_vdis(rescue_vm_ref) |
133 | + |
134 | + self._session.call_xenapi("Async.VM.destroy", rescue_vm_ref) |
135 | + |
136 | def destroy(self, instance): |
137 | """ |
138 | Destroy VM instance |
139 | @@ -658,41 +689,57 @@ |
140 | |
141 | """ |
142 | rescue_vm_ref = VMHelper.lookup(self._session, |
143 | - instance.name + "-rescue") |
144 | + instance.name + "-rescue") |
145 | |
146 | if not rescue_vm_ref: |
147 | raise exception.NotFound(_( |
148 | "Instance is not in Rescue Mode: %s" % instance.name)) |
149 | |
150 | original_vm_ref = self._get_vm_opaque_ref(instance) |
151 | - vbd_refs = self._session.get_xenapi().VM.get_VBDs(rescue_vm_ref) |
152 | - |
153 | instance._rescue = False |
154 | |
155 | - for vbd_ref in vbd_refs: |
156 | - _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref) |
157 | - if _vbd_ref["userdevice"] == "1": |
158 | - VMHelper.unplug_vbd(self._session, vbd_ref) |
159 | - VMHelper.destroy_vbd(self._session, vbd_ref) |
160 | - |
161 | - task1 = self._session.call_xenapi("Async.VM.hard_shutdown", |
162 | - rescue_vm_ref) |
163 | - self._session.wait_for_task(task1, instance.id) |
164 | - |
165 | - vdi_refs = VMHelper.lookup_vm_vdis(self._session, rescue_vm_ref) |
166 | - for vdi_ref in vdi_refs: |
167 | - try: |
168 | - task = self._session.call_xenapi('Async.VDI.destroy', vdi_ref) |
169 | - self._session.wait_for_task(task, instance.id) |
170 | - except self.XenAPI.Failure: |
171 | - continue |
172 | - |
173 | - task2 = self._session.call_xenapi('Async.VM.destroy', rescue_vm_ref) |
174 | - self._session.wait_for_task(task2, instance.id) |
175 | - |
176 | + self._destroy_rescue_instance(rescue_vm_ref) |
177 | self._release_bootlock(original_vm_ref) |
178 | self._start(instance, original_vm_ref) |
179 | |
180 | + def poll_rescued_instances(self, timeout): |
181 | + """Look for expirable rescued instances |
182 | + - forcibly exit rescue mode for any instances that have been |
183 | + in rescue mode for >= the provided timeout |
184 | + """ |
185 | + last_ran = self.poll_rescue_last_ran |
186 | + if last_ran: |
187 | + if not utils.is_older_than(last_ran, timeout): |
188 | + # Do not run. Let's bail. |
189 | + return |
190 | + else: |
191 | + # Update the time tracker and proceed. |
192 | + self.poll_rescue_last_ran = utils.utcnow() |
193 | + else: |
194 | + # We need a base time to start tracking. |
195 | + self.poll_rescue_last_ran = utils.utcnow() |
196 | + return |
197 | + |
198 | + rescue_vms = [] |
199 | + for instance in self.list_instances(): |
200 | + if instance.endswith("-rescue"): |
201 | + rescue_vms.append(dict(name=instance, |
202 | + vm_ref=VMHelper.lookup(self._session, |
203 | + instance))) |
204 | + |
205 | + for vm in rescue_vms: |
206 | + rescue_name = vm["name"] |
207 | + rescue_vm_ref = vm["vm_ref"] |
208 | + |
209 | + self._destroy_rescue_instance(rescue_vm_ref) |
210 | + |
211 | + original_name = vm["name"].split("-rescue", 1)[0] |
212 | + original_vm_ref = VMHelper.lookup(self._session, original_name) |
213 | + |
214 | + self._release_bootlock(original_vm_ref) |
215 | + self._session.call_xenapi("VM.start", original_vm_ref, False, |
216 | + False) |
217 | + |
218 | def get_info(self, instance): |
219 | """Return data about VM instance""" |
220 | vm_ref = self._get_vm_opaque_ref(instance) |
221 | |
222 | === modified file 'nova/virt/xenapi_conn.py' |
223 | --- nova/virt/xenapi_conn.py 2011-03-17 16:03:07 +0000 |
224 | +++ nova/virt/xenapi_conn.py 2011-03-24 03:56:32 +0000 |
225 | @@ -223,6 +223,10 @@ |
226 | """Unrescue the specified instance""" |
227 | self._vmops.unrescue(instance, callback) |
228 | |
229 | + def poll_rescued_instances(self, timeout): |
230 | + """Poll for rescued instances""" |
231 | + self._vmops.poll_rescued_instances(timeout) |
232 | + |
233 | def reset_network(self, instance): |
234 | """reset networking for specified instance""" |
235 | self._vmops.reset_network(instance) |
Matt Dietz wrote:
39 +def is_then_greater(then, seconds):
That's rather awkwardly named. Perhaps `is_older_than(before, seconds)`?
Otherwise I think this looks good.