Hey Josh. Overall looks good, just a few suggestions: > 12 +flags.DEFINE_integer("rescue_timeout", 0, > 13 + "Automatically unrescue an instance after N hours." > 14 + " Set to 0 to disable.") I'd consider using minutes or seconds here for maximum flexibility. Suppose some other provider wants to auto-unrescue after 15 minutes, since we're using an integer, that would be impossible with hours as the metric. > 25 + if FLAGS.rescue_timeout > 0: > 26 + self.driver.poll_rescued_instances(FLAGS.rescue_timeout) It looks like you only defined an implementation in xenapi_conn.py. We may want to add `poll_rescued_instances` to the other drivers with a NotImplementedError so someone can come along add at that in for us. > 105 + _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref) > 106 + if _vbd_ref["userdevice"] == "1": Per new-convention, this should probably be: vbd_rec = self._session.get_xenapi().VBD.get_record(vbd_ref) if vbd_rec["userdevice"] == "1": > 106 + if _vbd_ref["userdevice"] == "1": Could you clarify the significance of userdevice == "1". A comment would be useful, but even better would be setting it nicely named variable: something_meaningful_is_true = (vbd_rec["userdevice"] == "1") if something_meaningful_is_true: # blah > 186 + vms = [] > 187 + for instance in self.list_instances(): > 188 + if instance.endswith("-rescue"): Since these are only rescue VMs, var might be clearer as `rescue_vms`. > 196 + original_name = vm["name"].split("-rescue", 1)[0] > 197 + original_vm_ref = VMHelper.lookup(self._session, original_name) Might sense to move these lines down so they're next to where they are used: original_name = vm["name"].split("-rescue", 1)[0] original_vm_ref = VMHelper.lookup(self._session, original_name) self._release_bootlock(original_vm_ref) self._session.call_xenapi("VM.start", original_vm_ref, False, False) > self._destroy_rescue_vbds(rescue_vm_ref) > self._shutdown_rescue(rescue_vm_ref) > self._destroy_rescue_vdis(rescue_vm_ref) > self._destroy_rescue_instance(rescue_vm_ref) > self._release_bootlock(original_vm_ref) > self._start(instance, original_vm_ref) This block of code is effectively repeated twice in vmops. Might be work refactoring it out to a separate method to DRY it up. > 81 + def _shutdown_rescue(self, rescue_vm_ref): > 82 + """Shutdown a rescue instance""" > 83 + self._session.call_xenapi("Async.VM.hard_shutdown", rescue_vm_ref) I'm not sure we need an additional method here (the less the better!). I *think* we could replace with a call to: self._shutdown(instance, rescue_vm_ref, hard=True) This gives us the benefit of logging (and the wait_for_task code that is missing). > 92 + def _destroy_rescue_vdis(self, rescue_vm_ref): > 93 + """Destroys all VDIs associated with a rescued VM""" > 94 + vdi_refs = VMHelper.lookup_vm_vdis(self._session, rescue_vm_ref) > 95 + for vdi_ref in vdi_refs: > 96 + try: > 97 + self._session.call_xenapi("Async.VDI.destroy", vdi_ref) > 98 + except self.XenAPI.Failure: > 99 + continue > 100 + > 101 + def _destroy_rescue_vbds(self, rescue_vm_ref): > 102 + """Destroys all VBDs tied to a rescue VM""" > 103 + vbd_refs = self._session.get_xenapi().VM.get_VBDs(rescue_vm_ref) > 104 + for vbd_ref in vbd_refs: > 105 + _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref) > 106 + if _vbd_ref["userdevice"] == "1": > 107 + VMHelper.unplug_vbd(self._session, vbd_ref) > 108 + VMHelper.destroy_vbd(self._session, vbd_ref) > 109 + As above, I think it would be better to use the regular destroy_* implementations and just pass the rescue_vm_ref instead of the usual vm_ref.