fullstack job failing to create namespace because it already exists

Bug #1717582 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Slawek Kaplonski
Milestone: (none listed)

Bug Description

Fullstack tests frequently fail while creating a namespace such as "host-xxxx". The error message says that the namespace already exists, and the test fails because of that.
This looks like a race condition: the ip.netns.exists() method returns false, but a few milliseconds later the namespace already exists and the ip.netns.add() method fails.
The code in question is the ensure_namespace() method in https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_lib.py#L204
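
The check-then-add race described above can be illustrated with a small simulation. This is only a sketch: FakeNetns, ensure_namespace_racy, ensure_namespace_tolerant and the between_check_and_add hook are hypothetical names invented for this illustration, not neutron code, and the in-memory set merely stands in for the kernel's namespace table.

```python
class FakeNetns:
    """Hypothetical in-memory stand-in for the namespace table."""

    def __init__(self):
        self._names = set()

    def exists(self, name):
        return name in self._names

    def add(self, name):
        # Mimics the "already exists" failure described in this report.
        if name in self._names:
            raise FileExistsError(name)
        self._names.add(name)


def ensure_namespace_racy(netns, name, between_check_and_add=lambda: None):
    """Check-then-act: fails if another caller creates the namespace
    after exists() returned False but before add() runs."""
    if not netns.exists(name):
        between_check_and_add()  # hook simulating a concurrent caller
        netns.add(name)


def ensure_namespace_tolerant(netns, name, between_check_and_add=lambda: None):
    """Attempt the add and treat "already exists" as success."""
    between_check_and_add()
    try:
        netns.add(name)
    except FileExistsError:
        pass  # someone else created it first; the goal is still met
```

Passing a hook that creates the same namespace makes the racy variant raise FileExistsError, while the tolerant variant returns normally with the namespace present.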

Tags: fullstack
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/503890
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fd1403fd9a971cf3cbd863fa33ca68eb019fbdc1
Submitter: Jenkins
Branch: master

commit fd1403fd9a971cf3cbd863fa33ca68eb019fbdc1
Author: Sławek Kapłoński <email address hidden>
Date: Thu Sep 14 02:11:50 2017 +0000

    Fix for race condition during netns creation

    In some cases, if the ip_lib.IPWrapper.ensure_namespace() method
    is called more than once for the same namespace within a very
    short period of time, the second call can raise a
    "File already exists" error.
    This happens often, e.g. in fullstack tests.
    The reason for this problem is the Netlink protocol, which is used
    by iproute2 to communicate with the kernel. This protocol, according
    to http://man7.org/linux/man-pages/man7/netlink.7.html, is not
    reliable, so there is no guarantee when a message will be
    delivered to the kernel or when the action will actually be executed.
    Because of that, if ensure_namespace() is executed twice on a
    heavily loaded host, it can lead to the error described above.

    This patch changes how the ensure_namespace() method works
    to avoid raising a ProcessExecutionError exception with this
    error message.

    Closes-Bug: #1717582
    Change-Id: I1898426789c85ce1faa97665bfd47f1fa38ef727

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/505749

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/505750

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/506839

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/506839
Reason: it's already done in https://review.openstack.org/#/c/505701/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/pike)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/505749
Reason: We revert the fix: https://review.openstack.org/#/c/507382/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ocata)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/505750
Reason: We revert the fix: https://review.openstack.org/#/c/507382/

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/505701
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4f627b4e8dfe699944a196fe90e0642cced6278f
Submitter: Jenkins
Branch: master

commit 4f627b4e8dfe699944a196fe90e0642cced6278f
Author: Brian Haley <email address hidden>
Date: Wed Sep 20 16:09:04 2017 -0400

    Change ip_lib network namespace code to use pyroute2

    Change network namespace add/delete/list code to use
    pyroute2 library instead of calling /sbin/ip.

    Also changed all in-tree callers to use the new calls.

    Closes-bug: #1717582
    Related-bug: #1492714

    Change-Id: Id802e77543177fbb95ff15c2c7361172e8824633
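
For readers unfamiliar with pyroute2, a namespace can be created idempotently along the following lines. This is a minimal sketch and not the code from the commit above: create_netns_idempotent and its create parameter are hypothetical names, and the sketch assumes that pyroute2's netns.create() reports an already existing namespace as an OSError with errno EEXIST.

```python
import errno


def create_netns_idempotent(name, create=None):
    """Create a network namespace, tolerating "already exists".

    Returns True if this call created the namespace and False if it
    was already present. `create` is a hypothetical injection point
    so the logic can be exercised without root privileges.
    """
    if create is None:
        # Imported lazily so the sketch loads without pyroute2 installed.
        from pyroute2 import netns
        create = netns.create
    try:
        create(name)
    except OSError as exc:
        # Assumption: an existing namespace surfaces as errno EEXIST.
        if exc.errno != errno.EEXIST:
            raise
        return False
    return True
```

Calling create_netns_idempotent("host-xxxx") twice in a row would then succeed both times under that assumption, with only the first call reporting that it created the namespace; any other OSError, such as a permission error, is still propagated.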

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b1

This issue was fixed in the openstack/neutron 12.0.0.0b1 development milestone.

tags: removed: neutron-proactive-backport-potential