Resource reservation isn't rolled back properly for certain failures during Instance Create

Bug #1065092 reported by Andy McCrae
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Andy McCrae
Folsom
Fix Released
Medium
Andy McCrae
nova (Ubuntu)
Fix Released
Undecided
Unassigned
Quantal
Fix Released
Undecided
Unassigned

Bug Description

The resources are reserved at the start of the instance creation, but are never rolled back/removed if certain exceptions occur, for example if you pass an incorrect network ID (404), or the image isn't currently 'active' in glance, the reservation counter will increase and not be removed.

You can recreate this pretty simply by performing the following (assuming that net-id below isn't a legitimate net-id on your setup):

nova boot --flavor 1 --image cirros-0.3.0-x86_64-uec --nic net-id=9596314f-0006-4a5e-ad13-2232674e2102 Test

Rerunning the above command until you hit your quota limit will begin returning "Quota exceeded for instances: Requested 1, but already used 10 of 10 instances" - although there are no running instances. The nova.quota_usages table will show the reservation counter increasing.

This works for some exceptions where it is explicitly rolled back, but essentially it the rollback seems inconsistent.

Changed in nova:
assignee: nobody → Andy McCrae (andrew-mccrae)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/14289

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
tags: added: folsom-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/14289
Committed: http://github.com/openstack/nova/commit/49eca4f8309cebb1e083a06b17b002bb843ad0ab
Submitter: Jenkins
Branch: master

commit 49eca4f8309cebb1e083a06b17b002bb843ad0ab
Author: Andy McCrae <email address hidden>
Date: Wed Oct 24 13:21:33 2012 +0000

    Consistent Rollback for instance creation failures

    Fixes Bug 1065092

    Adds a single QUOTA.rollback for all exceptions rather than performing
    the rollback individually. Ensures that no stale "reservations" are left
    in place after a failed instance creation.

    Change-Id: I354726b651ae8feb6153d5865e0b14b54c73314b

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/14877

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

I've proposed the backport now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/14877
Committed: http://github.com/openstack/nova/commit/c9cade2d73194a56f4f0de03f11c4d2d6e481535
Submitter: Jenkins
Branch: stable/folsom

commit c9cade2d73194a56f4f0de03f11c4d2d6e481535
Author: Andy McCrae <email address hidden>
Date: Wed Oct 24 13:21:33 2012 +0000

    Consistent Rollback for instance creation failures

    Fixes Bug 1065092

    Adds a single QUOTA.rollback for all exceptions rather than performing
    the rollback individually. Ensures that no stale "reservations" are left
    in place after a failed instance creation.

    Change-Id: I354726b651ae8feb6153d5865e0b14b54c73314b

tags: added: in-stable-folsom
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
tags: removed: folsom-backport-potential
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Andy, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Mark McLoughlin (markmc)
tags: removed: in-stable-folsom
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (8.3 KiB)

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with insance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignement (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...

Read more...

Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-1 → 2013.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.