forget to release resource when terminate an instance from a failed compute node

Bug #1067214 reported by Tiantian Gao
This bug affects 4 people
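+--------------------------+--------------+------------+-------------+-----------+
| Affects                  | Status       | Importance | Assigned to | Milestone |
+--------------------------+--------------+------------+-------------+-----------+
| OpenStack Compute (nova) | Fix Released | Medium     | Vish Ishaya |           |
| Folsom                   | Fix Released | Medium     | Vish Ishaya |           |
| nova (Ubuntu)            | Fix Released | Undecided  | Unassigned  |           |
| Quantal                  | Fix Released | Undecided  | Unassigned  |           |
+--------------------------+--------------+------------+-------------+-----------+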

Bug Description

Steps to reproduce:
1. Start a VM.
# nova list
+--------------------------------------+----------------+---------+----------------------+
| ID                                   | Name           | Status  | Networks             |
+--------------------------------------+----------------+---------+----------------------+
| 6d74b0cc-5bb2-4233-b9a0-f80013412a3d | test-fixed-ip  | ACTIVE  | private=10.120.34.10 |
+--------------------------------------+----------------+---------+----------------------+

2. Shut down the host, or stop nova-compute on the host. Wait about 60 seconds so that OpenStack sees the host as down.

3. Terminate the VM.
# nova show 6d74b0cc-5bb2-4233-b9a0-f80013412a3d
ERROR: No server with a name or ID of '6d74b0cc-5bb2-4233-b9a0-f80013412a3d' exists.

4. The fixed IP is still marked as allocated in the database:
mysql> select * from fixed_ips where address='10.120.34.10';
+---------------------+---------------------+------------+---------+----+--------------+------------+-----------+--------+----------+----------------------+------+--------------------------------------+
| created_at          | updated_at          | deleted_at | deleted | id | address      | network_id | allocated | leased | reserved | virtual_interface_id | host | instance_uuid                        |
+---------------------+---------------------+------------+---------+----+--------------+------------+-----------+--------+----------+----------------------+------+--------------------------------------+
| 2012-08-21 07:30:35 | 2012-10-16 05:00:42 | NULL       |       0 | 11 | 10.120.34.10 |          1 |         1 |      0 |        0 |                  500 | NULL | 6d74b0cc-5bb2-4233-b9a0-f80013412a3d |
+---------------------+---------------------+------------+---------+----+--------------+------------+-----------+--------+----------+----------------------+------+--------------------------------------+
1 row in set (0.00 sec)
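
A stale row like the one in step 4 can be looked for by joining fixed IPs against instances. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the nova MySQL database, the two-table schema is a simplified assumption that keeps only the columns needed here, and `find_leaked_fixed_ips` and `build_demo_db` are hypothetical helpers, not part of nova. The second and third demo rows are made-up data for contrast.

```python
import sqlite3

# UUID taken from the reproduction steps above.
DELETED_UUID = "6d74b0cc-5bb2-4233-b9a0-f80013412a3d"


def find_leaked_fixed_ips(conn):
    """Return (address, instance_uuid) pairs that are still allocated to an
    instance which is soft-deleted or has no row at all."""
    return conn.execute(
        """
        SELECT f.address, f.instance_uuid
        FROM fixed_ips AS f
        LEFT JOIN instances AS i ON i.uuid = f.instance_uuid
        WHERE f.allocated = 1
          AND f.deleted = 0
          AND f.instance_uuid IS NOT NULL
          AND (i.uuid IS NULL OR i.deleted != 0)
        ORDER BY f.address
        """
    ).fetchall()


def build_demo_db():
    """Build a tiny demo database (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER);
        CREATE TABLE fixed_ips (
            address TEXT, allocated INTEGER, deleted INTEGER, instance_uuid TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?)",
        [(DELETED_UUID, 1), ("live-instance", 0)],  # second row is demo data
    )
    conn.executemany(
        "INSERT INTO fixed_ips VALUES (?, ?, ?, ?)",
        [
            ("10.120.34.10", 1, 0, DELETED_UUID),     # the leaked address
            ("10.120.34.11", 1, 0, "live-instance"),  # demo: legitimately in use
            ("10.120.34.12", 0, 0, None),             # demo: free address
        ],
    )
    return conn


if __name__ == "__main__":
    for address, uuid in find_leaked_fixed_ips(build_demo_db()):
        print(address, uuid)
```

Only the address whose owning instance is already deleted is reported; addresses held by live instances and free addresses are ignored.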

Michael Still (mikal) wrote :

What version of nova are you running?

Changed in nova:
status: New → Incomplete
Tiantian Gao (gtt116) wrote :

The Folsom release.

Tiantian Gao (gtt116) wrote :

nova-api checks whether the host is down. If it is, nova-api just deletes the instance from the database but forgets to release the fixed IP.

Tiantian Gao (gtt116)
Changed in nova:
status: Incomplete → Confirmed
Tiantian Gao (gtt116)
summary: - terminate instance from a down node forget release fixed-ip
+ forget to release resource when terminate an instance from a failed
+ compute node
Tiantian Gao (gtt116) wrote :

The code below runs in nova-api when nova-compute is down.
=========================
if is_up == False:
    # If compute node isn't up, just delete from DB
    LOG.warning(_('host for instance is down, deleting from '
               'database'), instance=instance)
    self.db.instance_destroy(context, instance['uuid'])
==========================

In that case, resources such as fixed IPs, floating IPs, and volumes stay unavailable forever, and the quota may be wrong.

Vish Ishaya (vishvananda) wrote :

The remaining data "should" be cleaned up if the compute node ever comes back, but we really should be clearing as much as we can here, specifically network deallocation and volume detach.
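
The cleanup order suggested in this comment can be sketched with in-memory stand-ins. This is an illustration under stated assumptions, not the merged fix: `FakeNetworkAPI`, `FakeVolumeAPI`, `FakeDB`, `service_is_up`, and `delete_instance` are hypothetical names, and the 60-second threshold simply mirrors the wait in the reproduction steps.

```python
from datetime import datetime, timedelta

# Assumption: a host counts as down after 60 s without a heartbeat,
# matching the wait described in the reproduction steps.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def service_is_up(last_heartbeat, now):
    """True if the compute service reported in recently enough."""
    return now - last_heartbeat <= SERVICE_DOWN_TIME


class FakeNetworkAPI:
    """Hypothetical stand-in: maps instance uuid -> list of fixed IPs."""

    def __init__(self, allocations):
        self.allocations = allocations

    def deallocate_for_instance(self, uuid):
        return self.allocations.pop(uuid, [])


class FakeVolumeAPI:
    """Hypothetical stand-in: maps volume id -> attached instance uuid."""

    def __init__(self, attachments):
        self.attachments = attachments

    def detach_all(self, uuid):
        detached = [v for v, owner in self.attachments.items() if owner == uuid]
        for volume_id in detached:
            del self.attachments[volume_id]
        return detached


class FakeDB:
    """Hypothetical stand-in for the instance table."""

    def __init__(self, instances):
        self.instances = set(instances)

    def instance_destroy(self, uuid):
        self.instances.discard(uuid)


def delete_instance(uuid, last_heartbeat, now, db, network_api, volume_api):
    if service_is_up(last_heartbeat, now):
        # Normal path: the compute node would handle the delete (not modelled).
        return "sent-to-compute"
    # Host is down: release what the API side can reach first,
    # then remove the instance record.
    network_api.deallocate_for_instance(uuid)
    volume_api.detach_all(uuid)
    db.instance_destroy(uuid)
    return "deleted-locally"


if __name__ == "__main__":
    now = datetime(2012, 10, 16, 5, 0, 42)
    db = FakeDB({"vm-1"})
    net = FakeNetworkAPI({"vm-1": ["10.120.34.10"]})
    vol = FakeVolumeAPI({"vol-1": "vm-1"})
    print(delete_instance("vm-1", now - timedelta(minutes=5), now, db, net, vol))
    print(net.allocations, vol.attachments, db.instances)
```

Releasing network and volume resources before destroying the record means a failure partway through leaves an instance row that can still be retried, rather than orphaned resources with no owner.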

Changed in nova:
status: Confirmed → Triaged
importance: Undecided → Medium
assignee: nobody → Joe Gordon (joe-gordon0)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/14805

Changed in nova:
status: Triaged → In Progress
tags: added: folsom-backport-potential
Changed in nova:
assignee: Joe Gordon (joe-gordon0) → Vish Ishaya (vishvananda)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/15069

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/14805
Committed: http://github.com/openstack/nova/commit/3dff43356619b4a935a5f395b066e96e8856bb4f
Submitter: Jenkins
Branch: master

commit 3dff43356619b4a935a5f395b066e96e8856bb4f
Author: Joe Gordon <email address hidden>
Date: Wed Oct 24 18:07:28 2012 -0700

    Fix VM deletion from down compute node

    * free network resources
    * free volume resources
    * delete.start and delete.stop notifications added
    * Handle network deallocate in multi_host mode

    Fixes bug 1067214

    Co-authored-by: Vishvananda Ishaya <email address hidden>
    Change-Id: I0d4a7dc5836d39e405824528de214f23b214849f

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/15069
Committed: http://github.com/openstack/nova/commit/c0e12477f2d85003b4f8fa223f769e285ec4cb93
Submitter: Jenkins
Branch: stable/folsom

commit c0e12477f2d85003b4f8fa223f769e285ec4cb93
Author: Joe Gordon <email address hidden>
Date: Wed Oct 24 18:07:28 2012 -0700

    Fix VM deletion from down compute node

    * free network resources
    * free volume resources
    * delete.start and delete.stop notifications added
    * Handle network deallocate in multi_host mode

    Fixes bug 1067214

    Co-authored-by: Vishvananda Ishaya <email address hidden>
    Change-Id: I0d4a7dc5836d39e405824528de214f23b214849f
    (cherry picked from commit 3dff43356619b4a935a5f395b066e96e8856bb4f)

Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
tags: removed: folsom-backport-potential
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello TianTian, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with insance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignement (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...


Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-1 → 2013.1