Merge lp:~cbjchen/nova/juno-sru-lp1353939 into lp:~ubuntu-server-dev/nova/juno

Proposed by Liang Chen
Status: Needs review
Proposed branch: lp:~cbjchen/nova/juno-sru-lp1353939
Merge into: lp:~ubuntu-server-dev/nova/juno
Diff against target: 163 lines (+141/-0)
3 files modified
debian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/shutdown-timeout-retry.patch (+133/-0)
To merge this branch: bzr merge lp:~cbjchen/nova/juno-sru-lp1353939
Reviewer: Corey Bryant
Status: Abstain
Review via email: mp+265466@code.launchpad.net
lp:~cbjchen/nova/juno-sru-lp1353939 updated
726. By Liang Chen <email address hidden>

edit changelog

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Looks good, thanks Liang!

review: Approve
Revision history for this message
Corey Bryant (corey.bryant) wrote :

I'm moving my vote to abstain for now until this lands upstream in stable/juno.

review: Abstain
Revision history for this message
Billy Olsen (billy-olsen) wrote :

Proposed change upstream in stable/juno (https://review.openstack.org/#/c/221529/)

Unmerged revisions

726. By Liang Chen <email address hidden>

edit changelog

725. By lchen <<email address hidden>@canonical.com>

SRU LP: #1353939

Preview Diff

=== modified file 'debian/changelog'
--- debian/changelog 2015-07-16 09:42:45 +0000
+++ debian/changelog 2015-07-24 18:56:41 +0000
@@ -1,5 +1,12 @@
-nova (1:2014.2.3-0ubuntu1.2) utopic; urgency=medium
+nova (1:2014.2.3-0ubuntu1.3) utopic; urgency=medium
 
+  * Cherry-pick fix for "Device or resource busy" on instance stop
+    - d/p/shutdown-timeout-retry.patch (LP: #1353939)
+
+ -- Liang Chen <liang.chen@canonical.com>  Tue, 21 Jul 2015 16:33:47 -0400
+
+nova (1:2014.2.3-0ubuntu1.2) utopic; urgency=medium
+
   * Add rsyslog retry support (LP: #1459046)
     - d/p/add-support-for-syslog-connect-retries.patch
 
=== modified file 'debian/patches/series'
--- debian/patches/series 2015-07-15 16:11:19 +0000
+++ debian/patches/series 2015-07-24 18:56:41 +0000
@@ -6,3 +6,4 @@
 disable-websockify-tests.patch
 neutron-floating-ip-list.patch
 add-support-for-syslog-connect-retries.patch
+shutdown-timeout-retry.patch
=== added file 'debian/patches/shutdown-timeout-retry.patch'
--- debian/patches/shutdown-timeout-retry.patch 1970-01-01 00:00:00 +0000
+++ debian/patches/shutdown-timeout-retry.patch 2015-07-24 18:56:41 +0000
@@ -0,0 +1,133 @@
+commit add5b4f751ff27a1e1af82a0799cf75ef6169619
+Author: Matt Riedemann <mriedem@us.ibm.com>
+Date:   Sun May 10 18:46:37 2015 -0700
+
+    libvirt: handle code=38 + sigkill (ebusy) in destroy()
+
+    Handle the libvirt error during destroy when the sigkill fails due to an
+    EBUSY. This is taken from a comment by danpb in the bug report as a
+    potential workaround.
+
+    Co-authored-by: Daniel Berrange <berrange@redhat.com>
+
+    Closes-Bug: #1353939
+
+    Conflicts:
+        nova/tests/unit/virt/libvirt/test_driver.py
+
+    NOTE (kashyapc): 'stable/kilo' branch doesn't have the
+    'libvirt_guest' object, so, adjust the below unit tests accordingly:
+
+        test_private_destroy_ebusy_timeout
+        test_private_destroy_ebusy_multiple_attempt_ok
+
+    Change-Id: I128bf6b939fbbc85df521fd3fe23c3c6f93b1b2c
+    (cherry picked from commit 3907867601d1044eaadebff68a590d176abff6cf)
+
+    Conflicts:
+        nova/tests/unit/virt/libvirt/test_driver.py
+
+--- a/nova/tests/virt/libvirt/test_driver.py
++++ b/nova/tests/virt/libvirt/test_driver.py
+@@ -7905,6 +7905,53 @@ class LibvirtConnTestCase(test.TestCase):
+         # NOTE(vish): verifies destroy doesn't raise if the instance disappears
+         conn._destroy(instance)
+ 
++    def test_private_destroy_ebusy_timeout(self):
++        # Tests that _destroy will retry 3 times to destroy the guest when an
++        # EBUSY is raised, but eventually times out and raises the libvirtError
++        ex = fakelibvirt.make_libvirtError(
++            libvirt.libvirtError,
++            "Failed to terminate process 26425 with SIGKILL: "
++            "Device or resource busy",
++            error_code=libvirt.VIR_ERR_SYSTEM_ERROR,
++            int1=errno.EBUSY)
++
++        instance = self.create_instance_obj(self.context)
++        drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
++
++        with mock.patch.object(drvr._conn, 'lookupByName') as mock_get_domain:
++            mock_domain = mock.MagicMock()
++            mock_domain.ID.return_value = 1
++            mock_get_domain.return_value = mock_domain
++            mock_domain.destroy.side_effect = ex
++
++            self.assertRaises(libvirt.libvirtError, drvr._destroy, instance)
++
++        self.assertEqual(3, mock_domain.destroy.call_count)
++
++    def test_private_destroy_ebusy_multiple_attempt_ok(self):
++        # Tests that the _destroy attempt loop is broken when EBUSY is no
++        # longer raised.
++        ex = fakelibvirt.make_libvirtError(
++            libvirt.libvirtError,
++            "Failed to terminate process 26425 with SIGKILL: "
++            "Device or resource busy",
++            error_code=libvirt.VIR_ERR_SYSTEM_ERROR,
++            int1=errno.EBUSY)
++
++        inst_info = {'state': power_state.SHUTDOWN, 'id': 1}
++        instance = self.create_instance_obj(self.context)
++        drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
++
++        with mock.patch.object(drvr._conn, 'lookupByName') as mock_get_domain, \
++                mock.patch.object(drvr, 'get_info', return_value=inst_info):
++            mock_domain = mock.MagicMock()
++            mock_domain.ID.return_value = 1
++            mock_get_domain.return_value = mock_domain
++            mock_domain.destroy.side_effect = ex, None
++            drvr._destroy(instance)
++
++        self.assertEqual(2, mock_domain.destroy.call_count)
++
+     def test_undefine_domain_with_not_found_instance(self):
+         def fake_lookup(instance_name):
+             raise libvirt.libvirtError("not found")
+Index: nova-2014.2.3/nova/virt/libvirt/driver.py
+===================================================================
+--- nova-2014.2.3.orig/nova/virt/libvirt/driver.py
++++ nova-2014.2.3/nova/virt/libvirt/driver.py
+@@ -965,7 +965,7 @@ class LibvirtDriver(driver.ComputeDriver
+         rootfs_dev = instance.system_metadata.get('rootfs_device_name')
+         disk.teardown_container(container_dir, rootfs_dev)
+ 
+-    def _destroy(self, instance):
++    def _destroy(self, instance, attempt=1):
+         try:
+             virt_dom = self._lookup_by_name(instance['name'])
+         except exception.InstanceNotFound:
+@@ -1002,6 +1002,34 @@ class LibvirtDriver(driver.ComputeDriver
+                              instance=instance)
+                     reason = _("operation time out")
+                     raise exception.InstancePowerOffFailure(reason=reason)
++                elif errcode == libvirt.VIR_ERR_SYSTEM_ERROR:
++                    if e.get_int1() == errno.EBUSY:
++                        # NOTE(danpb): When libvirt kills a process it sends it
++                        # SIGTERM first and waits 10 seconds. If it hasn't gone
++                        # it sends SIGKILL and waits another 5 seconds. If it
++                        # still hasn't gone then you get this EBUSY error.
++                        # Usually when a QEMU process fails to go away upon
++                        # SIGKILL it is because it is stuck in an
++                        # uninterruptable kernel sleep waiting on I/O from
++                        # some non-responsive server.
++                        # Given the CPU load of the gate tests though, it is
++                        # conceivable that the 15 second timeout is too short,
++                        # particularly if the VM running tempest has a high
++                        # steal time from the cloud host. ie 15 wallclock
++                        # seconds may have passed, but the VM might have only
++                        # have a few seconds of scheduled run time.
++                        LOG.warn(_LW('Error from libvirt during destroy. '
++                                     'Code=%(errcode)s Error=%(e)s; '
++                                     'attempt %(attempt)d of 3'),
++                                 {'errcode': errcode, 'e': e,
++                                  'attempt': attempt},
++                                 instance=instance)
++                        with excutils.save_and_reraise_exception() as ctxt:
++                            # Try up to 3 times before giving up.
++                            if attempt < 3:
++                                ctxt.reraise = False
++                                self._destroy(instance, attempt + 1)
++                                return
+ 
+         if not is_okay:
+             with excutils.save_and_reraise_exception():
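
The substance of the backport is the new VIR_ERR_SYSTEM_ERROR/EBUSY branch in _destroy(): libvirt SIGTERMs the QEMU process, waits 10 seconds, SIGKILLs it, waits another 5, and reports EBUSY if the process still has not exited, so the patched driver re-enters _destroy() with an incremented attempt counter and only re-raises after the third EBUSY. Below is a minimal standalone sketch of that retry pattern, not code from this branch; FakeLibvirtError, FlakyDomain and destroy_with_retry are hypothetical stand-ins for libvirt.libvirtError, a libvirt domain and the patched LibvirtDriver._destroy().

import errno


class FakeLibvirtError(Exception):
    # Hypothetical stand-in for libvirt.libvirtError raised with
    # error_code=VIR_ERR_SYSTEM_ERROR and errno.EBUSY in int1.
    def __init__(self, message, int1):
        super(FakeLibvirtError, self).__init__(message)
        self.int1 = int1

    def get_int1(self):
        return self.int1


def destroy_with_retry(domain, attempt=1):
    # Mirrors the patched _destroy(): retry while SIGKILL fails with
    # EBUSY, re-raise on any other error or after the third attempt.
    try:
        domain.destroy()
    except FakeLibvirtError as e:
        if e.get_int1() == errno.EBUSY and attempt < 3:
            destroy_with_retry(domain, attempt + 1)
        else:
            raise


class FlakyDomain(object):
    # Test double that fails with EBUSY a fixed number of times before
    # destroy finally succeeds (cf. side_effect = ex, None above).
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def destroy(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise FakeLibvirtError(
                "Failed to terminate process with SIGKILL: "
                "Device or resource busy", int1=errno.EBUSY)


transient = FlakyDomain(failures=1)
destroy_with_retry(transient)    # EBUSY once, then succeeds
assert transient.calls == 2

stuck = FlakyDomain(failures=10)
try:
    destroy_with_retry(stuck)
except FakeLibvirtError:
    assert stuck.calls == 3      # gives up after three attempts

The bound of three attempts matches the call counts asserted in the patch's unit tests: test_private_destroy_ebusy_timeout expects destroy() to be called 3 times before the error propagates, and test_private_destroy_ebusy_multiple_attempt_ok expects 2 calls when the second attempt succeeds.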
