Hey Josh. Overall looks good, just a few suggestions:

> 12	+flags.DEFINE_integer("rescue_timeout", 0,
> 13	+                     "Automatically unrescue an instance after N hours."
> 14	+                     " Set to 0 to disable.")

I'd consider using minutes or seconds here for maximum flexibility.

Suppose some other provider wants to auto-unrescue after 15 minutes, since
we're using an integer, that would be impossible with hours as the metric.

> 25	+        if FLAGS.rescue_timeout > 0:
> 26	+            self.driver.poll_rescued_instances(FLAGS.rescue_timeout)

It looks like you only defined an implementation in xenapi_conn.py. We may
want to add `poll_rescued_instances` to the other drivers with a
NotImplementedError so someone can come along add at that in for us.


> 105	+            _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref)
> 106	+            if _vbd_ref["userdevice"] == "1":

Per new-convention, this should probably be:

    vbd_rec = self._session.get_xenapi().VBD.get_record(vbd_ref)
    if vbd_rec["userdevice"] == "1":

> 106	+            if _vbd_ref["userdevice"] == "1":

Could you clarify the significance of userdevice == "1". A comment would be
useful, but even better would be setting it nicely named variable:

    something_meaningful_is_true = (vbd_rec["userdevice"] == "1")
    if something_meaningful_is_true:
        # blah

> 186	+        vms = []
> 187	+        for instance in self.list_instances():
> 188	+            if instance.endswith("-rescue"):

Since these are only rescue VMs, var might be clearer as `rescue_vms`.

> 196	+            original_name = vm["name"].split("-rescue", 1)[0]
> 197	+            original_vm_ref = VMHelper.lookup(self._session, original_name)

Might sense to move these lines down so they're next to where they are used:

    original_name = vm["name"].split("-rescue", 1)[0]
    original_vm_ref = VMHelper.lookup(self._session, original_name)
    self._release_bootlock(original_vm_ref)
    self._session.call_xenapi("VM.start", original_vm_ref, False,
                              False)

>        self._destroy_rescue_vbds(rescue_vm_ref)
>        self._shutdown_rescue(rescue_vm_ref)
>        self._destroy_rescue_vdis(rescue_vm_ref)
>        self._destroy_rescue_instance(rescue_vm_ref)
>        self._release_bootlock(original_vm_ref)
>        self._start(instance, original_vm_ref)

This block of code is effectively repeated twice in vmops. Might be work
refactoring it out to a separate method to DRY it up.

> 81	+    def _shutdown_rescue(self, rescue_vm_ref):
> 82	+        """Shutdown a rescue instance"""
> 83	+        self._session.call_xenapi("Async.VM.hard_shutdown", rescue_vm_ref)

I'm not sure we need an additional method here (the less the better!). I
*think* we could replace with a call to:

    self._shutdown(instance, rescue_vm_ref, hard=True)

This gives us the benefit of logging (and the wait_for_task code that is
missing).

> 92	+    def _destroy_rescue_vdis(self, rescue_vm_ref):
> 93	+        """Destroys all VDIs associated with a rescued VM"""
> 94	+        vdi_refs = VMHelper.lookup_vm_vdis(self._session, rescue_vm_ref)
> 95	+        for vdi_ref in vdi_refs:
> 96	+            try:
> 97	+                self._session.call_xenapi("Async.VDI.destroy", vdi_ref)
> 98	+            except self.XenAPI.Failure:
> 99	+                continue
> 100	+
> 101	+    def _destroy_rescue_vbds(self, rescue_vm_ref):
> 102	+        """Destroys all VBDs tied to a rescue VM"""
> 103	+        vbd_refs = self._session.get_xenapi().VM.get_VBDs(rescue_vm_ref)
> 104	+        for vbd_ref in vbd_refs:
> 105	+            _vbd_ref = self._session.get_xenapi().VBD.get_record(vbd_ref)
> 106	+            if _vbd_ref["userdevice"] == "1":
> 107	+                VMHelper.unplug_vbd(self._session, vbd_ref)
> 108	+                VMHelper.destroy_vbd(self._session, vbd_ref)
> 109	+

As above, I think it would be better to use the regular destroy_*
implementations and just pass the rescue_vm_ref instead of the usual vm_ref.