Rebooting instance doesn't restore mounted volume

Bug #747922 reported by Tushar Patil
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Masanori Itoh
Milestone: 2011.3

Bug Description

Tested on Revision No 925.

Steps to reproduce (a scripted sketch follows the traceback below):
1) Run one VM instance
2) Attach volume to the VM instance
3) SSH to the VM instance, mount the volume and logout from SSH
4) reboot the VM instance
5) Again SSH to the VM instance and try to mount the volume.
It fails with the error message:
{{{
Could not stat /dev/vdb --- No such file or directory

The device apparently does not exist; did you specify it correctly?
}}}

6) euca-describe-volumes still shows that the volume is attached to the VM instance and is in use.
{{{
root@ubuntu-openstack-single-server:/home/tpatil# euca-describe-volumes
VOLUME vol-00000001 1 nova in-use (admin, ubuntu-openstack-single-server, i-00000002[ubuntu-openstack-single-server], /dev/vdb) 2011-04-02T00:48:20Z
}}}

7) If I try to detach the volume, it fails with the following error in nova-compute.log:
{{{
2011-04-01 17:59:04,743 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/home/tpatil/nova/nova/rpc.py", line 190, in _receive
(nova): TRACE: rval = node_func(context=ctxt, **node_args)
(nova): TRACE: File "/home/tpatil/nova/nova/exception.py", line 120, in _wrap
(nova): TRACE: return f(*args, **kw)
(nova): TRACE: File "/home/tpatil/nova/nova/compute/manager.py", line 105, in decorated_function
(nova): TRACE: function(self, context, instance_id, *args, **kwargs)
(nova): TRACE: File "/home/tpatil/nova/nova/compute/manager.py", line 779, in detach_volume
(nova): TRACE: volume_ref['mountpoint'])
(nova): TRACE: File "/home/tpatil/nova/nova/exception.py", line 120, in _wrap
(nova): TRACE: return f(*args, **kw)
(nova): TRACE: File "/home/tpatil/nova/nova/virt/libvirt_conn.py", line 405, in detach_volume
(nova): TRACE: raise exception.NotFound(_("No disk at %s") % mount_device)
(nova): TRACE: NotFound: No disk at vdb
(nova): TRACE:
}}}
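
For reference, the reproduction above can also be scripted against nova's EC2 endpoint with boto. This is only a hypothetical sketch, not part of the report: the endpoint, credentials, image ID, keypair and volume ID are placeholders for a particular deployment, and the mount step still has to be done by hand over SSH.

{{{
# Hypothetical reproduction script (a sketch): drives the steps above through
# nova's EC2 API using boto. Endpoint, credentials, image ID and keypair name
# are placeholders.
import time
import boto
from boto.ec2.regioninfo import RegionInfo

conn = boto.connect_ec2(aws_access_key_id='ADMIN_ACCESS_KEY',
                        aws_secret_access_key='ADMIN_SECRET_KEY',
                        is_secure=False,
                        region=RegionInfo(name='nova', endpoint='127.0.0.1'),
                        port=8773, path='/services/Cloud')

# 1) Run one VM instance and wait for it to come up
instance = conn.run_instances('ami-00000001', key_name='mykey').instances[0]
while instance.update() != 'running':
    time.sleep(5)

# 2) Attach a volume to it as /dev/vdb
conn.attach_volume('vol-00000001', instance.id, '/dev/vdb')

# 3) (by hand) SSH in, mount /dev/vdb, log out

# 4) Reboot the VM instance
conn.reboot_instances([instance.id])

# 5) (by hand) SSH in again; mounting /dev/vdb now fails with
#    "No such file or directory"

# 6) The volume still shows up as in-use on the API side
for vol in conn.get_all_volumes():
    print("%s %s" % (vol.id, vol.status))

# 7) Detaching raises NotFound in nova-compute (see the traceback above)
conn.detach_volume('vol-00000001')
}}}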

Related branches

  lp:~itoumsn/nova/lp747922

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Masanori Itoh (itohm) wrote :

Hi Tushar,

You use KVM on Ubuntu, right?
Also, did you reboot the instance from inside the guest OS?
I mean, not via euca-reboot-instances.

I suspect the root cause of this issue is a KVM problem.
If the issue is reproducible, please collect the following information before and after rebooting the instance on the compute node:

  # virsh dumpxml VM_NAME
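
If it helps, the same check can be scripted with the libvirt Python bindings. A minimal sketch only; the connection URI and the domain name below are placeholders for this particular deployment (assuming nova's usual instance-XXXXXXXX naming):

{{{
# Sketch: dump the live domain XML before and after the reboot and see whether
# the attached disk's target (vdb) is still present.
import libvirt

conn = libvirt.open('qemu:///system')            # same hypervisor nova-compute uses
dom = conn.lookupByName('instance-00000002')     # placeholder domain name
xml = dom.XMLDesc(0)                             # same output as `virsh dumpxml`
print("vdb still attached: %s" % ("dev='vdb'" in xml))
}}}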

I guess the device you attached to your VM vanished from the VM configuration after the guest OS reboot.
In that case, all we can do anyway is log the exception and clean up the database, I think.
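
A minimal sketch of that clean-up idea (not an actual fix): the self.driver / self.db names and the NotFound exception follow the manager code quoted in the traceback above; the exact helpers and signatures in the real tree may differ.

{{{
# Hedged sketch: in ComputeManager.detach_volume, tolerate a disk that has
# already vanished from the guest, log it, and still mark the volume detached
# in the database.
def detach_volume(self, context, instance_id, volume_id):
    volume_ref = self.db.volume_get(context, volume_id)
    instance_ref = self.db.instance_get(context, instance_id)
    try:
        self.driver.detach_volume(instance_ref['name'],
                                  volume_ref['mountpoint'])
    except exception.NotFound:
        # e.g. "No disk at vdb" after a reboot: the device is already gone
        # from the domain, so just log it and fall through to the DB cleanup.
        LOG.exception(_("Disk %s already gone from the guest; cleaning up "
                        "the database only"), volume_ref['mountpoint'])
    self.db.volume_detached(context, volume_id)
    return True
}}}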

-Masanori

Revision history for this message
Masanori Itoh (itohm) wrote :

BTW, if you used euca-reboot-instances on a KVM-based system, the issue comes back to the nova side.
At this moment, libvirt does not support rebooting KVM instances, and the current implementation of RebootInstance looks like the following.

  trunk/nova/virt/libvirt_conn.py

{{{
def reboot(self, instance):
    self.destroy(instance, False)  # DESTROY ONCE
    xml = self.to_xml(instance)
}}}

One idea could be calling virsh dumpxml for the instance being rebooted and using its output to update the xml above.

{{{
    self.firewall_driver.setup_basic_filtering(instance)
    self.firewall_driver.prepare_instance_filter(instance)
    self._conn.createXML(xml, 0)  # CREATE AGAIN, AND THERE IS NO CODE TO RE-ATTACH EBSs
    self.firewall_driver.apply_instance_filter(instance)

    timer = utils.LoopingCall(f=None)

    def _wait_for_reboot():
        try:
            state = self.get_info(instance['name'])['state']
            db.instance_set_state(context.get_admin_context(),
                                  instance['id'], state)
            if state == power_state.RUNNING:
                LOG.debug(_('instance %s: rebooted'), instance['name'])
                timer.stop()
        except Exception, exn:
            LOG.exception(_('_wait_for_reboot failed: %s'), exn)
            db.instance_set_state(context.get_admin_context(),
                                  instance['id'],
                                  power_state.SHUTDOWN)
            timer.stop()

    timer.f = _wait_for_reboot
    return timer.start(interval=0.5, now=True)
}}}
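
A minimal sketch of that dumpxml idea (only a sketch, not an actual patch): lookupByName/XMLDesc are the libvirt Python equivalents of virsh dumpxml, and the other names come from the code quoted above.

{{{
# Sketch of the dumpxml idea above:
def reboot(self, instance):
    # Capture the running domain's XML first; unlike to_xml(instance), it
    # still contains the <disk> elements for any attached volumes.
    virt_dom = self._conn.lookupByName(instance['name'])
    xml = virt_dom.XMLDesc(0)           # libvirt equivalent of `virsh dumpxml`

    self.destroy(instance, False)
    self.firewall_driver.setup_basic_filtering(instance)
    self.firewall_driver.prepare_instance_filter(instance)
    self._conn.createXML(xml, 0)        # attached volumes come back with the saved XML
    self.firewall_driver.apply_instance_filter(instance)
    # ... then wait for the reboot as in _wait_for_reboot() above.
}}}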

-Masanori

Revision history for this message
Masanori Itoh (itohm) wrote :

Hi,

I wrote an ultimately STUPID workaround fix for this issue and linked my branch here.
Actually, this issue is hard to resolve in an elegant way, I think.

Anyway, it seems to be working on my Ubuntu 10.10 box at least.

Tushar, if you are testing a volume driver other than iSCSI, or a multi-node nova installation,
could you try the branch below?

  lp:~itoumsn/nova/lp747922

Thanks,

Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Masanori Itoh (itoumsn)
Revision history for this message
Masanori Itoh (itohm) wrote :

I will volunteer as the assignee of this issue until someone with a more elegant solution appears...

-Masanori

Revision history for this message
Tushar Patil (tpatil) wrote :

I tested using your branch lp:~itoumsn/nova/lp747922 and it seems to be working as expected now.

After rebooting the instance, I see the volume is still attached and I can see all files intact.

Thank you.

Revision history for this message
Masanori Itoh (itohm) wrote :

Hi Tushar,

Thanks for testing. :)
I will post a merge request soon after the cactus release.

Thanks,
Masanori

Masanori Itoh (itohm)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → diablo-1
Revision history for this message
ady (imelurang) wrote :

Pardon me, I'm a newbie here. How can I apply the above patch to my existing cloud? :)

Thierry Carrez (ttx)
Changed in nova:
milestone: diablo-1 → 2011.3
status: Fix Committed → Fix Released