TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost fullstack fails

Bug #1776459 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski

Bug Description

The TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost fullstack test sometimes fails because packets really are lost after the agents restart.
An example of such a failure: http://logs.openstack.org/70/574370/1/check/neutron-fullstack/804a4fa/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost.txt.gz#_2018-06-11_21_26_57_858

The logs show some warnings after the L3 agent restart: http://logs.openstack.org/70/574370/1/check/neutron-fullstack/804a4fa/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-06-11--21-26-43-905027.txt.gz#_2018-06-11_21_27_04_621

These warnings do not appear when the test passes.

Slawek Kaplonski (slaweq) wrote :
Changed in neutron:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/575419

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/575419
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ba1e1eb4dd7b56a72b97204a9a1e1b2f49b34268
Submitter: Zuul
Branch: master

commit ba1e1eb4dd7b56a72b97204a9a1e1b2f49b34268
Author: Slawek Kaplonski <email address hidden>
Date: Thu Jun 14 14:31:28 2018 +0200

    Mark test_ha_router_restart_agents_no_packet_lost as unstable

    The fullstack test
    test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
    is now marked as unstable because it has been failing quite often
    recently. We still need to find the cause of this issue, but to avoid
    blocking the gates with many failures, let's mark it as unstable
    for now.

    Change-Id: I21e590a24390345dfe451b035fd973928445e987
    Related-Bug: #1776459
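
Marking a test as unstable usually means wrapping it so that a failure is reported as a skip, which keeps the gate green while the bug is investigated. The following is a minimal, self-contained sketch of that idea. The decorator name `unstable_test`, the `ExampleTest` class and the `run_example` helper are illustrative assumptions and are not copied from the neutron tree.

```python
import functools
import unittest


def unstable_test(reason):
    """Turn a failure of the decorated test into a skip (illustrative helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except unittest.SkipTest:
                # A test that skips itself is passed through unchanged.
                raise
            except Exception as exc:
                # Report the failure as a skip so the gate is not blocked.
                self.skipTest(
                    "Marked unstable (%s), failure was: %s" % (reason, exc))
        return wrapper
    return decorator


class ExampleTest(unittest.TestCase):
    """Hypothetical test case standing in for the fullstack test."""

    @unstable_test("bug 1776459")
    def test_sometimes_fails(self):
        self.fail("simulated packet loss")


def run_example():
    """Run ExampleTest and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this wrapper the simulated failure shows up in the result's `skipped` list rather than in `failures`, so the reason string (here the bug number) stays visible in the test report.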

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/575710

Miguel Lavalle (minsel)
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/575710
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0b3a64480bc268bec25a4216a793cd7f1c4d1966
Submitter: Zuul
Branch: master

commit 0b3a64480bc268bec25a4216a793cd7f1c4d1966
Author: Slawek Kaplonski <email address hidden>
Date: Fri Jun 15 11:53:06 2018 +0200

    [Fullstack] Ensure connectivity to ext gw before agents restart

    In the TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
    fullstack test we should first ensure that a connection from external_vm
    to the router's external gateway is possible. If it is, we can
    restart the L3 agents and test that connectivity is not broken.

    Change-Id: I1f153c553cd2dfa846ce80c166e2a35acd9169a3
    Related-Bug: #1776459
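
The ordering described in this commit (confirm connectivity first, restart the agents only afterwards) can be sketched as a simple polling helper. This is a hedged illustration only: `wait_until_true`, `restart_agents_after_connectivity`, `can_ping_gateway` and `restart_agents` are hypothetical names chosen for the example, not the functions used in the actual patch.

```python
import time


def wait_until_true(predicate, timeout=10.0, sleep=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(sleep)


def restart_agents_after_connectivity(can_ping_gateway, restart_agents,
                                      timeout=10.0):
    """Restart agents only once the external gateway is reachable.

    Both callables are supplied by the caller (assumed interfaces):
    can_ping_gateway() returns a bool, restart_agents() takes no arguments.
    """
    # Step 1: make sure the gateway answers before touching the agents.
    wait_until_true(can_ping_gateway, timeout=timeout)
    # Step 2: only now restart the agents, so that any packet loss seen
    # afterwards cannot be blamed on a gateway that was never reachable.
    restart_agents()
```

Waiting first separates two failure modes: a gateway that never came up raises `TimeoutError` before the restart, while packet loss after the restart points at the restart itself.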

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/577650

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/577651

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/pike)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/577651

Slawek Kaplonski (slaweq) wrote :

It looks like packets are lost when the master/backup node changes, which is in fact quite normal, as it takes some time to reconfigure everything on the new master node.
I will now investigate why such a change sometimes happens.

Slawek Kaplonski (slaweq) wrote :

I still need to confirm this, but it looks to me like the test fails when the router that was "master" is restarted first.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/578402

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/577650
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cd67ac6f0a9578ddb8a83fd217c5798c26aaa2ab
Submitter: Zuul
Branch: stable/queens

commit cd67ac6f0a9578ddb8a83fd217c5798c26aaa2ab
Author: Slawek Kaplonski <email address hidden>
Date: Thu Jun 14 14:31:28 2018 +0200

    Mark test_ha_router_restart_agents_no_packet_lost as unstable

    The fullstack test
    test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
    is now marked as unstable because it has been failing quite often
    recently. We still need to find the cause of this issue, but to avoid
    blocking the gates with many failures, let's mark it as unstable
    for now.

    Change-Id: I21e590a24390345dfe451b035fd973928445e987
    Related-Bug: #1776459
    (cherry picked from commit ba1e1eb4dd7b56a72b97204a9a1e1b2f49b34268)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/578402
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c57a5d9d8aef4a9b89e3d41c293ceb40d7ebad22
Submitter: Zuul
Branch: master

commit c57a5d9d8aef4a9b89e3d41c293ceb40d7ebad22
Author: Slawek Kaplonski <email address hidden>
Date: Wed Jun 27 16:02:28 2018 +0200

    [Fullstack] HA L3 agent restart only standby agents

    The fullstack test that checks for packet loss during agent
    restarts always restarted all agents hosting the router.
    When the 'master' agent was restarted first, the 'master' role
    could switch to the second L3 agent after the restart, which
    caused the loss of a few packets and a failed test.
    This test should only check that restarting standby agents
    does not cause any packet loss, so this patch restarts only
    those agents.

    Change-Id: I6293169d7d7f35e3a9726071e63003ac569dd01e
    Closes-Bug: #1776459
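
The selection logic behind this fix can be sketched as follows: pick only the agents whose router instance is in the standby state and restart those. The data shape (a dict mapping host name to an HA state string, with the values "active" and "standby") and the function names are assumptions made for illustration; the real test obtains this information from the neutron API.

```python
def standby_agents(ha_states):
    """Return the hosts whose router instance is in the standby state.

    ha_states maps an agent host name to its HA state string, for example
    {"host-1": "active", "host-2": "standby"} (hypothetical data shape).
    """
    return sorted(
        host for host, state in ha_states.items() if state == "standby")


def restart_standby_only(ha_states, restart):
    """Restart every standby agent and leave the active one untouched.

    restart is a caller-supplied callable taking a host name.
    """
    restarted = []
    for host in standby_agents(ha_states):
        restart(host)
        restarted.append(host)
    return restarted
```

Because the active agent is never restarted, no failover is triggered, so any packet loss observed in such a test would come from the standby restarts themselves.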

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/579931

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/581384

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/582922

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/582925

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/582925
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9bc52421eba987b38348ec0a135723c94d7ac6c6
Submitter: Zuul
Branch: stable/ocata

commit 9bc52421eba987b38348ec0a135723c94d7ac6c6
Author: Slawek Kaplonski <email address hidden>
Date: Wed Jun 27 16:02:28 2018 +0200

    [Fullstack] HA L3 agent restart only standby agents

    The fullstack test that checks for packet loss during agent
    restarts always restarted all agents hosting the router.
    When the 'master' agent was restarted first, the 'master' role
    could switch to the second L3 agent after the restart, which
    caused the loss of a few packets and a failed test.
    This test should only check that restarting standby agents
    does not cause any packet loss, so this patch restarts only
    those agents.

    Change-Id: I6293169d7d7f35e3a9726071e63003ac569dd01e
    Closes-Bug: #1776459
    (cherry picked from commit c57a5d9d8aef4a9b89e3d41c293ceb40d7ebad22)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b3

This issue was fixed in the openstack/neutron 13.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/582922
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e5584db43a1764a7baf4d79c590374536d3faba4
Submitter: Zuul
Branch: stable/pike

commit e5584db43a1764a7baf4d79c590374536d3faba4
Author: Slawek Kaplonski <email address hidden>
Date: Wed Jun 27 16:02:28 2018 +0200

    [Fullstack] HA L3 agent restart only standby agents

    The fullstack test that checks for packet loss during agent
    restarts always restarted all agents hosting the router.
    When the 'master' agent was restarted first, the 'master' role
    could switch to the second L3 agent after the restart, which
    caused the loss of a few packets and a failed test.
    This test should only check that restarting standby agents
    does not cause any packet loss, so this patch restarts only
    those agents.

    Change-Id: I6293169d7d7f35e3a9726071e63003ac569dd01e
    Closes-Bug: #1776459
    (cherry picked from commit c57a5d9d8aef4a9b89e3d41c293ceb40d7ebad22)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/579931
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=88a0ebbe7fca2757c2d5d644058c22c896a8e92d
Submitter: Zuul
Branch: master

commit 88a0ebbe7fca2757c2d5d644058c22c896a8e92d
Author: Brian Haley <email address hidden>
Date: Tue Jul 3 13:31:02 2018 -0400

    Add fullstack test to restart agent with active l3-ha router

    Re-start of the l3 agent hosting the active l3-ha router
    shouldn't cause data plane interruption, assuming there
    is no failover. Create a test explicitly for that.

    Change-Id: I5963c21e2b382a09c40b81e2446350696e16d265
    Related-Bug: #1776459
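
The "assuming there is no failover" condition in this commit message can be made explicit by comparing which host is active before and after the restart. The sketch below is an assumption-laden illustration, not the test that was merged: `active_host`, `restart_active_without_failover`, `get_ha_states` and `restart` are hypothetical names, and the HA states are modelled as a plain dict.

```python
def active_host(ha_states):
    """Return the single host in the 'active' state, or raise ValueError."""
    actives = [host for host, state in ha_states.items() if state == "active"]
    if len(actives) != 1:
        raise ValueError("expected exactly one active host, got %r" % actives)
    return actives[0]


def restart_active_without_failover(get_ha_states, restart):
    """Restart the active agent and report whether it stayed active.

    get_ha_states() returns a dict of host name to HA state string and
    restart(host) restarts the agent on that host (assumed interfaces).
    """
    before = active_host(get_ha_states())
    restart(before)
    after = active_host(get_ha_states())
    # True means no failover happened, so the data plane check is meaningful.
    return before == after
```

A test built this way could skip or fail early when the function returns False, since packet loss during a genuine failover is expected behaviour rather than a regression.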

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.4

This issue was fixed in the openstack/neutron 12.0.4 release.

tags: added: neutron-easy-proactive-backport-potential
tags: removed: neutron-easy-proactive-backport-potential neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ocata-eol

This issue was fixed in the openstack/neutron ocata-eol release.
