[SRU] race between neutron-ovs-cleanup and nova-compute

Bug #1420572 reported by Liang Chen
This bug affects 5 people
Affects          Status        Importance  Assigned to         Milestone
nova (Ubuntu)    Fix Released  High        Edward Hope-Morley
  Trusty         Fix Released  High        Edward Hope-Morley
  Utopic         Fix Released  High        Edward Hope-Morley
  Vivid          Fix Released  High        Edward Hope-Morley

Bug Description

[Impact]

 * We run neutron-ovs-cleanup at startup if neutron is installed. If
   nova-compute does not wait for it to complete, it will try to use
   veth/bridge devices that may be in the process of being deleted.

[Test Case]

 * Create neutron (ovs) network and boot an instance with this network
   as --nic

 * Check that creation was successful and the network is functional. Also
   make a note of the corresponding veth and bridge devices (ip a).

 * Reboot the system, then check that the expected veth and bridge devices
   are still there and that nova-compute is happy, e.g. try sshing to your
   instance. Also check /var/log/upstart/nova-compute.log to see whether the
   service waited for ovs-cleanup to finish.
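The device check in the test case can be scripted. A minimal sketch, assuming you capture `ip -o link show` output before and after the reboot, and assuming the OVS hybrid-plug naming prefixes `qvb`/`qvo` (veth ends, as in this report) and `qbr` (per-instance bridge). The helper names and the sample output are hypothetical:

```python
import re
from typing import Set

# Assumption: hybrid OVS plugging names the veth ends qvb*/qvo* and the
# per-instance Linux bridge qbr*.
VIF_PREFIXES = ("qvb", "qvo", "qbr")

# `ip -o link show` prints one device per line, e.g.
#   "8: qvoXXXX@qvbXXXX: <BROADCAST,...> mtu 1500 ..."
# Capture the name up to the first ':' or '@' (veth peers print as name@peer).
_LINK_RE = re.compile(r"^\d+:\s+([^:@\s]+)")


def vif_devices(ip_link_output: str) -> Set[str]:
    """Return the qvb/qvo/qbr device names found in `ip -o link show` output."""
    names = set()
    for line in ip_link_output.splitlines():
        match = _LINK_RE.match(line)
        if match and match.group(1).startswith(VIF_PREFIXES):
            names.add(match.group(1))
    return names


def missing_after_reboot(before: str, after: str) -> Set[str]:
    """Devices that existed before the reboot but are gone afterwards."""
    return vif_devices(before) - vif_devices(after)


# Made-up sample output; in practice capture `ip -o link show` before/after.
BEFORE = """1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
7: qbr1234abcd-ef: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
8: qvo1234abcd-ef@qvb1234abcd-ef: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
9: qvb1234abcd-ef@qvo1234abcd-ef: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast"""
AFTER = "\n".join(BEFORE.splitlines()[:3])  # simulate the qvb device being lost

print(sorted(missing_after_reboot(BEFORE, AFTER)))  # ['qvb1234abcd-ef']
```

An empty result after a real reboot means every recorded device came back; it does not by itself prove nova-compute waited, so the upstart log check is still needed.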

[Regression Potential]

 * None

---- ---- ---- ----

There is a race when neutron-ovs-cleanup and nova-compute both try to operate on the qvb*** and qvo*** devices. Below is a scenario I recently hit:

1. nova-compute was started and was creating the veth pair for the VM instances running on the host - https://github.com/openstack/nova/blob/stable/icehouse/nova/network/linux_net.py#L1298

2. neutron-ovs-cleanup was kicked off and deleted all the ports.

3. When nova-compute tried to set the MTU at https://github.com/openstack/nova/blob/stable/icehouse/nova/network/linux_net.py#L1280 , Stderr: u'Cannot find device "qvo***"\n' was reported, because the device it had just created had been deleted again by neutron-ovs-cleanup.

As both processes operate on the same resources, there needs to be a way to synchronize the operations they perform on them.
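To make the interleaving in steps 1-3 concrete, here is a deterministic toy replay. None of this is nova or neutron code: `FakeHost`, `plug` and `ovs_cleanup` are hypothetical stand-ins that only track a set of device names, and the "race" is forced by calling cleanup between creating the pair and setting the MTU.

```python
from typing import Set

# Devices neutron-ovs-cleanup removes in this toy model (qvb*/qvo*, per the report).
CLEANUP_PREFIXES = ("qvb", "qvo")


class FakeHost:
    """Hypothetical stand-in for a compute host's set of network devices."""

    def __init__(self) -> None:
        self.devices: Set[str] = set()

    def create_veth_pair(self, dev1: str, dev2: str) -> None:
        self.devices.update((dev1, dev2))

    def set_mtu(self, dev: str, mtu: int) -> None:
        if dev not in self.devices:
            # Same symptom the report quotes from `ip link set ... mtu ...`.
            raise RuntimeError(f'Cannot find device "{dev}"')

    def ovs_cleanup(self) -> None:
        # Like neutron-ovs-cleanup: delete every qvb*/qvo* device.
        self.devices = {d for d in self.devices if not d.startswith(CLEANUP_PREFIXES)}


def plug(host: FakeHost, vif_id: str, mtu: int, cleanup_in_gap: bool = False) -> None:
    """Replay steps 1-3; optionally let cleanup run between them."""
    qvb, qvo = "qvb" + vif_id, "qvo" + vif_id
    host.create_veth_pair(qvb, qvo)  # step 1: nova-compute creates the pair
    if cleanup_in_gap:
        host.ovs_cleanup()  # step 2: cleanup fires in the gap
    for dev in (qvb, qvo):
        host.set_mtu(dev, mtu)  # step 3: fails if the device is gone


# Racy ordering: fails with the same kind of error as in the report.
racy = FakeHost()
try:
    plug(racy, "1234abcd-ef", 1500, cleanup_in_gap=True)
except RuntimeError as exc:
    print("race:", exc)

# Safe ordering: cleanup runs to completion before nova-compute plugs VIFs.
safe = FakeHost()
safe.ovs_cleanup()
plug(safe, "1234abcd-ef", 1500)
print("ordered:", sorted(safe.devices))
```

Note that a plain mutual-exclusion lock would not be enough here: if nova-compute took the lock first, cleanup would still delete the freshly created devices afterwards. What is needed is ordering, i.e. cleanup must finish before nova-compute starts plugging.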


Changed in neutron:
status: New → Confirmed
tags: added: cts
Edward Hope-Morley (hopem) wrote :

I think this can be fixed by adding an upstart pre-start rule similar to the one used in neutron-*-agent.upstart e.g.

pre-start script
  # Check to see if openvswitch plugin in use by checking
  # status of cleanup upstart configuration
  if status neutron-ovs-cleanup; then
    start wait-for-state WAIT_FOR=neutron-ovs-cleanup WAIT_STATE=running WAITER=nova-compute
  fi
end script
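The gating logic of that stanza can be sketched in Python for readers who want to reason about it or reproduce it under another init system. This is a hypothetical illustration, not the upstart `wait-for-state` job: `cleanup_installed` and `cleanup_finished` are assumed callables (for example wrappers around the init system's status query), and the timeout is an addition of this sketch.

```python
import time
from typing import Callable


def wait_for_cleanup(
    cleanup_installed: Callable[[], bool],
    cleanup_finished: Callable[[], bool],
    timeout: float = 60.0,
    poll: float = 0.5,
) -> bool:
    """Block until the cleanup job has finished.

    Returns False if the cleanup job is not installed (nothing to wait for),
    True once it has finished. Raises TimeoutError if it never finishes.
    """
    if not cleanup_installed():
        # Roughly mirrors `if status neutron-ovs-cleanup; then ...`:
        # no cleanup job means nova-compute may start straight away.
        return False
    deadline = time.monotonic() + timeout
    while not cleanup_finished():
        if time.monotonic() >= deadline:
            raise TimeoutError("neutron-ovs-cleanup did not finish in time")
        time.sleep(poll)
    return True


# Example: cleanup not installed, so nova-compute would start immediately.
print(wait_for_cleanup(lambda: False, lambda: False))  # False
```

Failing loudly on timeout is a design choice of this sketch; an init-system implementation might instead log and continue.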

Changed in neutron:
assignee: nobody → Edward Hope-Morley (hopem)
affects: neutron → nova (Ubuntu)
Changed in nova (Ubuntu):
assignee: Edward Hope-Morley (hopem) → nobody
Changed in nova (Ubuntu):
importance: Undecided → High
assignee: nobody → Edward Hope-Morley (hopem)
status: Confirmed → In Progress
description: updated
description: updated
Edward Hope-Morley (hopem) wrote :
Edward Hope-Morley (hopem) wrote :
Changed in nova (Ubuntu Utopic):
status: New → In Progress
importance: Undecided → High
Changed in nova (Ubuntu Trusty):
importance: Undecided → High
status: New → In Progress
Changed in nova (Ubuntu Utopic):
assignee: nobody → Edward Hope-Morley (hopem)
Changed in nova (Ubuntu Trusty):
assignee: nobody → Edward Hope-Morley (hopem)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "nova-compute-2014.1-lp1420572.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Edward Hope-Morley (hopem) wrote :

I have tested both the attached Icehouse and Juno patches and can confirm that they behave as expected i.e.

Installed nova-compute + neutron-plugin-openvswitch-agent (which installs neutron-ovs-cleanup)

In /var/log/upstart/nova-compute.log I get as expected:

libvirt-bin start/running, process 1409
wait-for-state stop/waiting
neutron-ovs-cleanup stop/waiting
wait-for-state stop/waiting

And if I add a 10 second delay to /usr/bin/neutron-ovs-cleanup I get as expected:

(time sudo service neutron-ovs-cleanup restart &); time sudo service nova-compute restart
nova-compute stop/waiting
neutron-ovs-cleanup stop/waiting
neutron-ovs-cleanup start/running

real 0m10.460s
user 0m0.010s
sys 0m0.015s
nova-compute start/running, process 3026

real 0m10.468s
user 0m0.010s
sys 0m0.014s

So nova-compute will now always wait for ovs-cleanup to complete. I also tested that if ovs-cleanup is not installed, the check is skipped and nova-compute starts normally.

Edward Hope-Morley (hopem) wrote :
James Page (james-page)
Changed in nova (Ubuntu Vivid):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2015.1~b2-0ubuntu4

---------------
nova (1:2015.1~b2-0ubuntu4) vivid; urgency=medium

  * Fixed race between nova-compute and neutron-ovs-cleanup (LP: #1420572).
 -- Edward Hope-Morley <email address hidden> Mon, 23 Feb 2015 13:41:47 +0000

Changed in nova (Ubuntu Vivid):
status: Fix Committed → Fix Released
summary: - race between neutron-ovs-cleanup and nova-compute
+ [SRU] race between neutron-ovs-cleanup and nova-compute
Chris J Arges (arges) wrote : Please test proposed package

Hello Liang, or anyone else affected,

Accepted nova into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/1:2014.1.4-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in nova (Ubuntu Utopic):
status: In Progress → Fix Committed
Chris J Arges (arges) wrote :

Hello Liang, or anyone else affected,

Accepted nova into utopic-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/1:2014.2.2-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Sebastien Bacher (seb128) wrote :

(seems like the changes got sponsored, unsubscribing sponsors)

Edward Hope-Morley (hopem) wrote :

trusty-proposed verified

Corey Bryant (corey.bryant) wrote :

Hello Chris,

This bug has been verified against nova 2014.1.4-0ubuntu2 for trusty.

Thanks,
Corey

Liang Chen (cbjchen) wrote :

Hi Chris,

I have installed the package (version 1:2014.2.2-0ubuntu2) and no longer see the problem mentioned above. Thanks for the fix.

Thanks,
Liang

tags: added: verification-done
removed: verification-needed
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2014.1.4-0ubuntu2

---------------
nova (1:2014.1.4-0ubuntu2) trusty; urgency=medium

  [ Edward Hope-Morley ]
  * Fixed race between nova-compute and neutron-ovs-cleanup (LP: #1420572)

  [ Corey Bryant ]
  * d/control: Set minimum python-six dependency to 1.5.2 (LP: #1403114).
 -- Corey Bryant <email address hidden> Mon, 30 Mar 2015 09:28:30 -0400

Changed in nova (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2014.2.2-0ubuntu2

---------------
nova (1:2014.2.2-0ubuntu2) utopic; urgency=medium

  [ Edward Hope-Morley ]
  * Fixed race between nova-compute and neutron-ovs-cleanup (LP: #1420572)
 -- Corey Bryant <email address hidden> Mon, 30 Mar 2015 09:43:24 -0400

Changed in nova (Ubuntu Utopic):
status: Fix Committed → Fix Released
Gustavo Randich (gustavo-randich) wrote :

Testing Mitaka on Ubuntu Xenial and rebooting hosts with more than 30 instances, we recently hit a race condition that seems similar to the one in this issue. Maybe we need a wait condition in nova-compute's systemd unit file?

ERROR oslo_service.service [req-34d48ca5-bd93-4d10-a80a-bafad4228467 - - - - -] Error starting thread.
ERROR oslo_service.service Traceback (most recent call last):
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 680, in run_service
ERROR oslo_service.service service.start()
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/service.py", line 198, in start
ERROR oslo_service.service self.manager.init_host()
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1329, in init_host
ERROR oslo_service.service self._init_instance(context, instance)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1142, in _init_instance
ERROR oslo_service.service self.driver.plug_vifs(instance, net_info)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 880, in plug_vifs
ERROR oslo_service.service self.vif_driver.plug(instance, vif)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 756, in plug
ERROR oslo_service.service func(instance, vif)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 529, in plug_ovs
ERROR oslo_service.service self.plug_ovs_hybrid(instance, vif)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 525, in plug_ovs_hybrid
ERROR oslo_service.service self._plug_bridge_with_port(instance, vif, port='ovs')
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 505, in _plug_bridge_with_port
ERROR oslo_service.service linux_net._create_veth_pair(v1_name, v2_name, mtu)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/network/linux_net.py", line 1356, in _create_veth_pair
ERROR oslo_service.service _set_device_mtu(dev, mtu)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/network/linux_net.py", line 1340, in _set_device_mtu
ERROR oslo_service.service check_exit_code=[0, 2, 254])
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 388, in execute
ERROR oslo_service.service return RootwrapProcessHelper().execute(*cmd, **kwargs)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 271, in execute
ERROR oslo_service.service return processutils.execute(*cmd, **kwargs)
ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 389, in execute
ERROR oslo_service.service cmd=sanitized_cmd)
ERROR oslo_service.service ProcessExecutionError: Unexpected error while running command.
ERROR oslo_service.service Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ip link set qvo5ab170bb-a8 mtu 8950
ERROR oslo_service.se...
