During post_live_migration the nova libvirt driver assumes that the destination connection info is the same as the source, which is not always true

Bug #1475411 reported by Anthony Lee
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Anthony Lee
  Juno                    Fix Released  Undecided   Unassigned
  Kilo                    Fix Released  Undecided   Unassigned
nova (Ubuntu)             Fix Released  Undecided   Unassigned
  Trusty                  Fix Released  Undecided   Unassigned

Bug Description

[Impact]

The post_live_migration step of the Nova libvirt driver makes a bad assumption about the source and destination connector information: it assumes they are identical. The destination connection info may differ from the source, and because the BDM (block device mapping) has had its connection info overwritten with that of the destination, LUNs are left dangling on the source.

Code section where this problem is occurring:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L6036

At line 6038 the potentially wrong connection info is passed to _disconnect_volume, which then fails to find the proper LUNs to remove (and may remove the LUNs of a different volume instead).

By adding debug logging after line 6036 and comparing its output with the connection info for the source host (obtained by calling Cinder's initialize_connection API), you can see that the two do not match:

http://paste.openstack.org/show/TjBHyPhidRuLlrxuGktz/
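
The comparison described above can be sketched in Python. This is only an illustration: the helper name diff_connection_info and every sample value (WWNs, LUN numbers) are made up for the example and are not taken from the paste or from nova. The key names under "data" are assumed to follow the usual Fibre Channel connection info layout.

```python
def diff_connection_info(bdm_info, source_info):
    """Return {key: (bdm_value, source_value)} for 'data' keys that differ.

    Hypothetical helper for illustration; it is not part of nova.
    """
    bdm_data = bdm_info.get("data", {})
    src_data = source_info.get("data", {})
    diffs = {}
    for key in sorted(set(bdm_data) | set(src_data)):
        if bdm_data.get(key) != src_data.get(key):
            diffs[key] = (bdm_data.get(key), src_data.get(key))
    return diffs


# Made-up sample: what the BDM holds after migration (destination view).
bdm_info = {
    "driver_volume_type": "fibre_channel",
    "data": {
        "target_lun": 2,
        "target_wwn": ["50014380186af83c"],
        "initiator_target_map": {"10008c7cff523b01": ["50014380186af83c"]},
    },
}

# Made-up sample: what Cinder would report for the source host.
source_info = {
    "driver_volume_type": "fibre_channel",
    "data": {
        "target_lun": 1,
        "target_wwn": ["50014380186af83c"],
        "initiator_target_map": {"10008c7cff523b02": ["50014380186af83c"]},
    },
}

diffs = diff_connection_info(bdm_info, source_info)
for key, (bdm_value, source_value) in diffs.items():
    print(f"{key}: BDM={bdm_value!r} source={source_value!r}")
```

Any key reported here is a property that _disconnect_volume would get wrong if it were given the BDM copy instead of the source host's own connection info.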

Version of nova being used:

commit 35375133398d862a61334783c1e7a90b95f34cdb
Merge: 83623dd b2c5542
Author: Jenkins <email address hidden>
Date: Thu Jul 16 02:01:05 2015 +0000

    Merge "Port crypto to Python 3"

[Test Case]

Live-migrate an instance that is connected to a volume through multipath, where the source and target connection information differ. Verify that the correct device/LUN is removed (instead of the wrong one).

[Regression Potential]

The regression potential is small, as the fix has run in newer versions of nova for a while now (since Juno, the release immediately following Icehouse). If a regression were to occur, it would likely prevent a live migration from completing (failing in the post-processing), leaving the instance in an error state. However, the instance should already have been migrated to the target hypervisor with access to the LUN, so recovery would require manual cleanup of the LUN on the source hypervisor and a reset of the instance state to active.

Changed in nova:
assignee: nobody → Anthony Lee (anthony-mic-lee)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/202770

Changed in nova:
status: New → In Progress
tags: added: live-migrate
removed: live-migration
tags: added: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/211051

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/202770
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b649aa86fb26e998d66e75e5cebfd19c396942d
Submitter: Jenkins
Branch: master

commit 8b649aa86fb26e998d66e75e5cebfd19c396942d
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772
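
The approach described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration and not the merged patch: FakeVolumeAPI, disconnect_source_volumes, the host names and the LUN numbers are all hypothetical, and the sketch assumes that the "serial" field of the BDM's connection info carries the volume ID.

```python
class FakeVolumeAPI:
    """Hypothetical stand-in for a Cinder client; not the real API."""

    def __init__(self, per_host_info):
        # Maps (volume_id, host) -> connection info as seen from that host.
        self._per_host_info = per_host_info

    def initialize_connection(self, context, volume_id, connector):
        # Return the connection info that is valid for the connector's host.
        return self._per_host_info[(volume_id, connector["host"])]


def disconnect_source_volumes(context, volume_api, source_connector,
                              block_device_mapping, disconnect_volume):
    """Disconnect each volume using connection info fetched for the source."""
    for vol in block_device_mapping:
        # Assumption: 'serial' in the stored connection info is the volume ID.
        volume_id = vol["connection_info"]["serial"]
        # Ask the volume service for the source host's view instead of
        # trusting the BDM copy, which describes the destination.
        source_info = volume_api.initialize_connection(
            context, volume_id, source_connector)
        disk_dev = vol["mount_device"].rpartition("/")[2]
        disconnect_volume(source_info, disk_dev)


# Made-up data: the same volume is exposed as LUN 1 on the source host
# and as LUN 2 on the destination host.
per_host = {
    ("vol-1", "source-host"): {"serial": "vol-1", "data": {"target_lun": 1}},
    ("vol-1", "dest-host"): {"serial": "vol-1", "data": {"target_lun": 2}},
}
bdms = [{"mount_device": "/dev/vdb",
         "connection_info": per_host[("vol-1", "dest-host")]}]

disconnected = []
disconnect_source_volumes(
    None, FakeVolumeAPI(per_host), {"host": "source-host"}, bdms,
    lambda info, dev: disconnected.append((info["data"]["target_lun"], dev)))
```

In this sketch the disconnect callback receives the source host's LUN even though the BDM holds the destination's, which is the behaviour the fix is aiming for.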

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/211051
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=587092c909e15e983f7aef31d7bc0862271a32c7
Submitter: Jenkins
Branch: stable/kilo

commit 587092c909e15e983f7aef31d7bc0862271a32c7
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    --

    NOTE(sahid): The TODO comment in the original change on master is
    omitted here since os-brick wasn't used by nova in kilo so leaving
    it in the backport would be confusing.

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Matt Riedemann (mriedem) wrote :

This might fix part of OSSA bug 1419577.

tags: removed: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/228517

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/228517
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9d2abbd9ab60ca873650759feaba98b4d8d35566
Submitter: Jenkins
Branch: stable/juno

commit 9d2abbd9ab60ca873650759feaba98b4d8d35566
Author: Anthony Lee <email address hidden>
Date: Thu Jul 16 13:02:00 2015 -0700

    Fix live-migrations usage of the wrong connector information

    During the post_live_migration step for the Nova libvirt driver
    an incorrect assumption is being made about the connector
    information being sent to _disconnect_volume. It is assumed that
    the connection information on the source and destination is the
    same but that is not always the case. The BDM, where the
    connector information is being retrieved from only contains the
    connection information for the destination. This will not work
    when trying to disconnect volumes from the source during live
    migration as the properties such as the target_lun and
    initiator_target_map could be different. This ends up leaving
    behind dangling LUNs and possibly removing the incorrect
    volume's LUNs.

    The solution proposed here utilizes the connection_info that
    can be retrieved for a host from Cinder's initialize_connection
    API. This connection information contains the correct data for
    the source host and allows volume LUNs to be removed properly.

    Conflicts:
            nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(mriedem): The conflicts are due to the tests being moved
    in Kilo and 41f80226e0a1f73af76c7968617ebfda0aeb40b1 not being
    in stable/juno (renamed conn var to drvr in libvirt tests).

    Change-Id: I3dfb75eb58dfbc66b218bcee473af4c2ac282eb6
    Closes-Bug: #1475411
    Closes-Bug: #1288039
    Closes-Bug: #1423772
    (cherry picked from commit 587092c909e15e983f7aef31d7bc0862271a32c7)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-3 → 12.0.0
Martin Pitt (pitti) wrote :

There is an SRU waiting in the trusty-proposed queue for this. Please clarify in which Ubuntu release(s) this is already fixed, or upload the fix to yakkety, so that the trusty SRU can proceed.

Corey Bryant (corey.bryant) wrote :

Hi Martin. This was fixed upstream in nova for OpenStack Juno which mapped to Utopic. So it is already fixed in Utopic and all releases after that.

Martin Pitt (pitti) wrote :

OK. Please always set correct task states, to avoid stalling SRUs due to that.

Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote : Please test proposed package

Hello Anthony, or anyone else affected,

Accepted nova into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/1:2014.1.5-0ubuntu1.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

description: updated
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2014.1.5-0ubuntu1.5

---------------
nova (1:2014.1.5-0ubuntu1.5) trusty; urgency=medium

  * Fix live migration usage of the wrong connector (LP: #1475411)
    - d/p/Fix-live-migrations-usage-of-the-wrong-connector-inf.patch
  * Fix wrong used ProcessExecutionError exception (LP: #1308839)
    - d/p/Fix-wrong-used-ProcessExecutionError-exception.patch
  * Clean up iSCSI multipath devices in Post Live Migration (LP: #1357368)
    - d/p/Clean-up-iSCSI-multipath-devices-in-Post-Live-Migrat.patch
  * Detach iSCSI latest path for latest disk (LP: #1374999)
    - d/p/Detach-iSCSI-latest-path-for-latest-disk.patch

 -- Billy Olsen <email address hidden> Fri, 29 Apr 2016 15:35:01 -0700

Changed in nova (Ubuntu Trusty):
status: Fix Committed → Fix Released