[SRU] trusty/icehouse neutron-plugin-openvswitch-agent: lvm.tun_ofports.remove crashes with KeyError

Bug #1531963 reported by JuanJo Ciarlante
This bug affects 5 people
Affects            Status         Importance   Assigned to    Milestone
neutron (Ubuntu)   Fix Released   High         Corey Bryant
  Trusty           Fix Released   High         Unassigned

Bug Description

[Impact]
Neutron OVS breaks with unhandled exceptions on compute nodes.

[Test Case]
For reproduction see the original bug description below.

[Regression Potential]

The backported patch is very straightforward, with a few minor conflicts noted in the patch.

-------------

Original bug description:

Filing this on ubuntu/neutron package, as neutron itself is EOL'd for Icehouse.

FYI this is a nonHA icehouse/trusty deploy using serverteam's juju charms.

On one of our production environments with a rather high rate of API calls (esp. for transient VMs from CI), we frequently get neutron OVS breakage on compute nodes¹, which we've been able to more or less correlate with errors like the following in /var/log/neutron/openvswitch-agent.log:

2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 399, in _del_fdb_flow
2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp lvm.tun_ofports.remove(ofport)
2016-01-07 06:33:48.917 18357 TRACE neutron.openstack.common.rpc.amqp KeyError: '13'

Detailed log: http://paste.ubuntu.com/14431656/ - note the same time of occurrence on the 3 different compute nodes shown there.

¹ What we then observe are missing tun_ids in the output of
  ovs-ofctl dump-flows br-tun
i.e. the provider:segmentation_id is not present on the compute node for a VM whose neutron network has one.
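
A rough way to automate the check in the footnote above is to extract the tun_id values from the dump-flows output and compare them with the segmentation IDs expected on the node. This is only a sketch: the function names are hypothetical, the sample flow lines are illustrative rather than taken from the affected environment, and it assumes tun_id appears as a plain `tun_id=<value>` match field.

```python
import re

# Matches "tun_id=0x3e9" or "tun_id=1001"; a trailing "/mask" is ignored.
TUN_ID_RE = re.compile(r"\btun_id=(0x[0-9a-fA-F]+|\d+)")


def tun_ids_in_flows(dump_flows_text):
    """Return the set of tun_id values (as ints) found in dump-flows text."""
    return {int(m.group(1), 0) for m in TUN_ID_RE.finditer(dump_flows_text)}


def missing_tun_ids(dump_flows_text, expected_segmentation_ids):
    """Return expected segmentation IDs that have no tun_id match in the flows."""
    return set(expected_segmentation_ids) - tun_ids_in_flows(dump_flows_text)


# Illustrative input only (assumed shape of `ovs-ofctl dump-flows br-tun` output).
SAMPLE = """\
 cookie=0x0, duration=10.1s, table=2, n_packets=0, n_bytes=0, priority=1,tun_id=0x3e9 actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=10.1s, table=2, n_packets=0, n_bytes=0, priority=0 actions=drop
"""

if __name__ == "__main__":
    # Segmentation ID 1002 is expected on this node but has no flow.
    print(sorted(missing_tun_ids(SAMPLE, [1001, 1002])))
```

In practice the text would come from running ovs-ofctl on the compute node, and the expected IDs from the networks that have ports bound there.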

AFAICS this has been fixed upstream (lp#1421105):
https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260
Please consider backporting it to Icehouse; it's a pretty trivial fix.
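
For readers unfamiliar with the failure mode: the traceback is consistent with tun_ofports being a Python set, where remove() raises KeyError if the element is already gone (for example, when the same port removal is processed twice). The sketch below illustrates the idempotent-removal idea only; it is not the upstream patch, and the class and function names are simplified, hypothetical stand-ins.

```python
class LocalVLANMapping:
    """Simplified, hypothetical stand-in for the agent's per-network mapping."""

    def __init__(self, vlan):
        self.vlan = vlan
        self.tun_ofports = set()


def remove_flood_port(lvm, ofport):
    """Remove ofport from the flood set; return True only if it was present.

    Calling this twice for the same port is harmless, unlike a bare
    lvm.tun_ofports.remove(ofport), which raises KeyError the second time.
    """
    if ofport not in lvm.tun_ofports:
        return False
    lvm.tun_ofports.remove(ofport)
    return True


lvm = LocalVLANMapping(vlan=1)
lvm.tun_ofports.update({"12", "13"})

first = remove_flood_port(lvm, "13")   # port was present
second = remove_flood_port(lvm, "13")  # duplicate removal, no exception
```

Returning a flag rather than silently discarding lets the caller skip any follow-up flow updates when nothing actually changed.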

JuanJo Ciarlante (jjo)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
James Troup (elmo)
tags: added: openstack sts
Antonio Rosales (arosales) wrote :

After chatting with Anthony and Tom: this hasn't been tested on OpenStack releases newer than Icehouse, but the fix (https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260) is suggested to resolve the issue they are seeing.

Next action: triage https://git.openstack.org/cgit/openstack/neutron/commit/?id=841b2f58f375df53b380cf5796bb31c82cd09260 and confirm that it can be backported to Icehouse.

Changed in neutron (Ubuntu):
status: Confirmed → Triaged
Corey Bryant (corey.bryant) wrote :

Seems like a fairly straightforward backport for icehouse (assuming you don't need the ofagent bits).

Corey Bryant (corey.bryant) wrote :

Note, fix is already in juno so we're ready to go straight to trusty/icehouse.

Changed in neutron (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Corey Bryant (corey.bryant)
importance: Undecided → High
summary: - trusty/icehouse neutron-plugin-openvswitch-agent: lvm.tun_ofports.remove
- crashes with KeyError
+ [SRU] trusty/icehouse neutron-plugin-openvswitch-agent:
+ lvm.tun_ofports.remove crashes with KeyError
description: updated
James Page (james-page)
Changed in neutron (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Changed in neutron (Ubuntu):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Please test proposed package

Hello JuanJo, or anyone else affected,

Accepted neutron into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/1:2014.1.5-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in neutron (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
JuanJo Ciarlante (jjo) wrote :

Thanks Chris for the updates - FYI we've upgraded all of our compute nodes to
1:2014.1.5-0ubuntu3 from proposed, with no (extra) issues so far after some hours.
FYI this stack has ~30 nodes and ~1k+ active instances.

We expect this change to (obviously) stop those KeyError messages in the log,
and likely also stop nodes from missing tun_ids - FYI we regularly get alerted
for the latter (several times a week). I'll add an update next week on how
it went.

JuanJo Ciarlante (jjo) wrote :

Chris: confirming this bug is most likely fixed by
1:2014.1.5-0ubuntu3, as there have been no further alerts for missing
tun_ids since it was installed 1 week ago (recall we had been getting
several of those per week).

Thanks! :) --J

Corey Bryant (corey.bryant) wrote :

Thanks JuanJo. Regression tests for neutron have also passed successfully. Marking this as verification-done.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 1:2014.1.5-0ubuntu3

---------------
neutron (1:2014.1.5-0ubuntu3) trusty; urgency=medium

  [ Corey Bryant ]
  * d/p/make_del_fdb_flow_idempotent.patch: Cherry pick from Juno
    to prevent KeyError on duplicate port removal in del_fdb_flow()
    (LP: #1531963).
  * d/tests/*-plugin: Fix race between service restart and pidof test.

  [ James Page ]
  * d/p/ovs-restart.patch: Ensure that tunnels are fully reset on ovs
    restart (LP: #1460164).

 -- Corey Bryant <email address hidden> Wed, 10 Feb 2016 14:52:04 -0500

Changed in neutron (Ubuntu Trusty):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
