test_cleanup_stale_devices functional test sporadic failures

Bug #1604115 reported by Assaf Muller
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
neutron   Fix Released   High         Ihar Hrachyshka

Bug Description

Revision history for this message
Assaf Muller (amuller) wrote :

Both successful and failed runs show the same log output (snipped and simplified):

Found stale device tapfoo_id2, deleting _cleanup_stale_devices neutron/agent/linux/dhcp.py:1215
Running txn command(idx=0): DelPortCommand(bridge=test-br75746765, port=tapfoo_id2, if_exists=True) do_commit neutron/agent/ovsdb/impl_idl.py:83
Found stale device tapfoo_id3, deleting _cleanup_stale_devices neutron/agent/linux/dhcp.py:1215
Running txn command(idx=0): DelPortCommand(bridge=test-br75746765, port=tapfoo_id3, if_exists=True) do_commit neutron/agent/ovsdb/impl_idl.py:83
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-foo_id', 'find', '/sys/class/net', '-maxdepth', '1', '-type', 'l', '-printf', '%f '] execute_rootwrap_daemon neutron/agent/linux/utils.py:99

We then assert that the find command returns exactly one non-loopback device in the namespace: the DHCP interface itself. In failed runs, however, we get 2 or 3 (I've seen both), even though the OVS delete command did not return any errors. The only explanation I can come up with is that the deletion is asynchronous and the command returns before the device is fully removed from the Linux network stack. If so, we have to decide whether we expect Neutron's ovs_lib delete_port to return only once the deletion is complete, or whether the test should loop on get_devices until it returns the expected number of devices.
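
The second option (looping until the device count settles) could look roughly like the sketch below. It is only an illustration: wait_for_device_count, fake_get_devices and the "tap-dhcp" name are hypothetical stand-ins, not Neutron code, and the fake lookup simply drops one stale device per poll to imitate asynchronous deletion.

```python
import time


def wait_for_device_count(get_devices, expected, timeout=5.0, interval=0.1):
    """Poll get_devices() until it returns `expected` devices or time runs out.

    Returns the last observed list so the caller can assert on it and report
    which devices were still present if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    devices = get_devices()
    while len(devices) != expected and time.monotonic() < deadline:
        time.sleep(interval)
        devices = get_devices()
    return devices


# Hypothetical namespace contents: one DHCP device plus two stale ones.
_remaining = ["tap-dhcp", "tapfoo_id2", "tapfoo_id3"]


def fake_get_devices():
    """Return the current devices, then let one stale device disappear."""
    snapshot = list(_remaining)
    if len(_remaining) > 1:
        _remaining.pop()
    return snapshot


devices = wait_for_device_count(fake_get_devices, expected=1, interval=0.01)
```

The helper returns the last snapshot instead of raising, so a test can still fail with a message that lists the leftover devices.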

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/344468

Changed in neutron:
assignee: nobody → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/344468
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5bbb802222ba3b4de9a134ce728421dc2b077b11
Submitter: Jenkins
Branch: master

commit 5bbb802222ba3b4de9a134ce728421dc2b077b11
Author: Armando Migliaccio <email address hidden>
Date: Tue Jul 19 14:04:03 2016 -0700

    Ensure test_cleanup_stale_devices fails gracefully

    Give some time for devices to be cleared before we claim defeat.

    Change-Id: I34b2ec634f1c9ec27a1b82cc3f55a5e0b7d71237
    Closes-bug: #1604115

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/344859
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=11dc21d3a6d765f7bcc95548b55ff13c4397c2e7
Submitter: Jenkins
Branch: master

commit 11dc21d3a6d765f7bcc95548b55ff13c4397c2e7
Author: Terry Wilson <email address hidden>
Date: Fri Apr 22 08:55:11 2016 -0500

    Wait for vswitchd to add interfaces in native ovsdb

    ovs-vsctl, unless --no-wait is passed, will wait until ovs-vswitchd
    has reacted to a successful transaction. This patch implements
    the same logic, waiting for next_cfg to be incremented and checking
    that any added interfaces have actually been assigned ofports.

    Closes-Bug: #1604816
    Closes-Bug: #1604370
    Related-Bug: #1604115
    Change-Id: I638b82c13394f150c0bd23301285bd3375e66139
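
For readers unfamiliar with the mechanism this commit message describes, here is a rough, self-contained simulation of it. It assumes the usual Open vSwitch convention that the client bumps next_cfg with its transaction and the daemon later copies that value into cur_cfg. FakeOvsdb and wait_for_switch are hypothetical names invented for this sketch; they are not the Neutron or OVS Python API.

```python
import threading
import time


class FakeOvsdb:
    """Hypothetical in-memory stand-in for the database and the daemon."""

    def __init__(self):
        self.next_cfg = 0   # bumped by the client with each transaction
        self.cur_cfg = 0    # set by the daemon once it has caught up
        self.ofports = {}   # interface name -> ofport (None until assigned)

    def commit_add_port(self, name):
        """Record the new interface and bump next_cfg, as a client would."""
        self.ofports[name] = None
        self.next_cfg += 1
        return self.next_cfg

    def daemon_react(self, delay):
        """Simulate the daemon applying the change after `delay` seconds."""
        def react():
            time.sleep(delay)
            for index, name in enumerate(sorted(self.ofports), start=1):
                self.ofports[name] = index
            self.cur_cfg = self.next_cfg

        worker = threading.Thread(target=react)
        worker.start()
        return worker


def wait_for_switch(db, target_cfg, added, timeout=2.0, interval=0.01):
    """Return True once cur_cfg reached target_cfg and all ports have ofports."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        caught_up = db.cur_cfg >= target_cfg
        if caught_up and all(db.ofports.get(n) is not None for n in added):
            return True
        time.sleep(interval)
    return False


db = FakeOvsdb()
target = db.commit_add_port("tap0")
ready_before = wait_for_switch(db, target, ["tap0"], timeout=0.05)
worker = db.daemon_react(delay=0.05)
ready_after = wait_for_switch(db, target, ["tap0"])
worker.join()
```

The point of the sketch is that a committed transaction alone is not enough: the wait only succeeds after the simulated daemon has reacted.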

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/349416

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/349416
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a98b6ebe48f4ecda689646d05050e156bc28be4d
Submitter: Jenkins
Branch: stable/mitaka

commit a98b6ebe48f4ecda689646d05050e156bc28be4d
Author: Terry Wilson <email address hidden>
Date: Fri Apr 22 08:55:11 2016 -0500

    Wait for vswitchd to add interfaces in native ovsdb

    ovs-vsctl, unless --no-wait is passed, will wait until ovs-vswitchd
    has reacted to a successful transaction. This patch implements
    the same logic, waiting for next_cfg to be incremented and checking
    that any added interfaces have actually been assigned ofports.

    Closes-Bug: #1604816
    Closes-Bug: #1604370
    Related-Bug: #1604115
    Change-Id: I638b82c13394f150c0bd23301285bd3375e66139
    (cherry picked from commit 11dc21d3a6d765f7bcc95548b55ff13c4397c2e7)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Changed in neutron:
status: Fix Released → Confirmed
assignee: Armando Migliaccio (armando-migliaccio) → Ihar Hrachyshka (ihar-hrachyshka)
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

It may be gre devices popping up. We seem to consider them for calculation.

Changed in neutron:
importance: Medium → High
tags: removed: in-stable-mitaka
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

For posterity, the traceback is:

2017-04-04 09:57:18.991708 | 2017-04-04 09:57:18.990 | Captured traceback:
2017-04-04 09:57:18.993395 | 2017-04-04 09:57:18.993 | ~~~~~~~~~~~~~~~~~~~
2017-04-04 09:57:18.995605 | 2017-04-04 09:57:18.994 | Traceback (most recent call last):
2017-04-04 09:57:18.998772 | 2017-04-04 09:57:18.997 |   File "neutron/tests/base.py", line 116, in func
2017-04-04 09:57:19.001732 | 2017-04-04 09:57:19.000 |     return f(self, *args, **kwargs)
2017-04-04 09:57:19.004336 | 2017-04-04 09:57:19.003 |   File "neutron/tests/functional/agent/linux/test_dhcp.py", line 80, in test_cleanup_stale_devices
2017-04-04 09:57:19.007712 | 2017-04-04 09:57:19.007 |     self.assertEqual(2, len(devices))
2017-04-04 09:57:19.009504 | 2017-04-04 09:57:19.009 |   File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
2017-04-04 09:57:19.011261 | 2017-04-04 09:57:19.010 |     self.assertThat(observed, matcher, message)
2017-04-04 09:57:19.013026 | 2017-04-04 09:57:19.012 |   File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
2017-04-04 09:57:19.014816 | 2017-04-04 09:57:19.014 |     raise mismatch_error
2017-04-04 09:57:19.016733 | 2017-04-04 09:57:19.016 | testtools.matchers._impl.MismatchError: 2 != 4

Changed in neutron:
milestone: none → pike-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/454870

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/454872

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/455431

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/454870
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5c286f590f390a25399093afc61cbf4db396fe1c
Submitter: Jenkins
Branch: master

commit 5c286f590f390a25399093afc61cbf4db396fe1c
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:07:33 2017 -0700

    Ignore gre devices when fetching devices in test_cleanup_stale_devices

    They may show up in namespaces depending on kernel modules loaded.

    Change-Id: I78892244d17c4ab7421d3eae9bdeeec1e69690bc
    Related-Bug: #1604115

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/456727

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/456728

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/456728
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=723df147b91f6397666b6193063c70297a331ce6
Submitter: Jenkins
Branch: stable/newton

commit 723df147b91f6397666b6193063c70297a331ce6
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:07:33 2017 -0700

    Ignore gre devices when fetching devices in test_cleanup_stale_devices

    They may show up in namespaces depending on kernel modules loaded.

    Change-Id: I78892244d17c4ab7421d3eae9bdeeec1e69690bc
    Related-Bug: #1604115
    (cherry picked from commit 5c286f590f390a25399093afc61cbf4db396fe1c)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/456727
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=662b3cf2ad12c3e9f77d4afaee7409614b2b7588
Submitter: Jenkins
Branch: stable/ocata

commit 662b3cf2ad12c3e9f77d4afaee7409614b2b7588
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:07:33 2017 -0700

    Ignore gre devices when fetching devices in test_cleanup_stale_devices

    They may show up in namespaces depending on kernel modules loaded.

    Change-Id: I78892244d17c4ab7421d3eae9bdeeec1e69690bc
    Related-Bug: #1604115
    (cherry picked from commit 5c286f590f390a25399093afc61cbf4db396fe1c)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/454872
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=91c15edf54c07da77ecf4fc335b0ba3951ff1f90
Submitter: Jenkins
Branch: master

commit 91c15edf54c07da77ecf4fc335b0ba3951ff1f90
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:10:06 2017 -0700

    Ignore gre devices in namespaces when cleaning up devices

    Agents and netns_cleanup tool attempt to clean up devices from
    namespaces before destroying namespaces, but they should skip doing it
    for gre devices that are automatic and show up depending on kernel
    modules loaded.

    Change-Id: Ie95890ed92ac73ec8e2d118a9727b9e1624a5178
    Related-Bug: #1604115

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/455431
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1ae91ce9be56fd6952d53ce3f8b094a6958b2709
Submitter: Jenkins
Branch: master

commit 1ae91ce9be56fd6952d53ce3f8b094a6958b2709
Author: Ihar Hrachyshka <email address hidden>
Date: Mon Apr 10 12:21:41 2017 -0700

    ip_lib: ignore gre and lo devices in get_devices by default

    This is the most common use pattern for the method, so it makes sense to
    make it default.

    (Actually, it may be that there is no use for the arguments
    whatsoever, but better safe than sorry.)

    NeutronLibImpact this change potentially breaks callers of get_devices
    that may want to get the automatic devices by default. Those imaginary
    callers may need to set exclude_gre_devices and/or exclude_loopback to
    False from now on.

    Change-Id: Ic32b8abc7f8502b8907ae21c996e13cb8fd5401d
    Related-Bug: #1604115
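
As a rough illustration of the behaviour this commit describes, the sketch below filters the space-separated names that the find command quoted earlier in this report prints. filter_devices is a hypothetical helper, not the real ip_lib method; the GRE names gre0 and gretap0 are an assumption about what the kernel creates automatically, and "tap-dhcp" is a made-up device name.

```python
LOOPBACK_DEVNAME = "lo"
# Assumed names of the kernel's automatic GRE fallback devices.
GRE_TUNNEL_DEVICE_NAMES = ("gre0", "gretap0")


def filter_devices(find_output, exclude_loopback=True,
                   exclude_gre_devices=True):
    """Split space-separated device names and drop the automatic ones.

    Both exclusions default to True, mirroring the "ignore by default"
    behaviour described in the commit message.
    """
    names = find_output.split()
    if exclude_loopback:
        names = [n for n in names if n != LOOPBACK_DEVNAME]
    if exclude_gre_devices:
        names = [n for n in names if n not in GRE_TUNNEL_DEVICE_NAMES]
    return names


# Hypothetical output of: find /sys/class/net -maxdepth 1 -type l -printf '%f '
sample = "lo gre0 gretap0 tap-dhcp "

default_view = filter_devices(sample)
full_view = filter_devices(sample, exclude_loopback=False,
                           exclude_gre_devices=False)
```

In this sketch a caller that still wants the automatic devices has to opt back in by passing False for the relevant argument.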

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

No hits in 7 days. Closing, please reopen if we see it again.

Changed in neutron:
status: Confirmed → Fix Released
tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/474274

tags: removed: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/474275

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/474274
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b744172fb8b96e6779ebc286d55d650788390a20
Submitter: Jenkins
Branch: stable/ocata

commit b744172fb8b96e6779ebc286d55d650788390a20
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:10:06 2017 -0700

    Ignore gre devices in namespaces when cleaning up devices

    Agents and netns_cleanup tool attempt to clean up devices from
    namespaces before destroying namespaces, but they should skip doing it
    for gre devices that are automatic and show up depending on kernel
    modules loaded.

    Change-Id: Ie95890ed92ac73ec8e2d118a9727b9e1624a5178
    Related-Bug: #1604115
    (cherry picked from commit 91c15edf54c07da77ecf4fc335b0ba3951ff1f90)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/474275
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1e3db7e08f57e9660244e68e1e8363e741e3bd67
Submitter: Jenkins
Branch: stable/newton

commit 1e3db7e08f57e9660244e68e1e8363e741e3bd67
Author: Ihar Hrachyshka <email address hidden>
Date: Fri Apr 7 13:10:06 2017 -0700

    Ignore gre devices in namespaces when cleaning up devices

    Agents and netns_cleanup tool attempt to clean up devices from
    namespaces before destroying namespaces, but they should skip doing it
    for gre devices that are automatic and show up depending on kernel
    modules loaded.

    Conflicts:
     neutron/agent/l3/dvr_snat_ns.py

    Change-Id: Ie95890ed92ac73ec8e2d118a9727b9e1624a5178
    Related-Bug: #1604115
    (cherry picked from commit 91c15edf54c07da77ecf4fc335b0ba3951ff1f90)
