Neutron 18.6.0 - Wallaby on Ubuntu 20.04, neutron-dhcp-agent RPC unusually slow

Bug #2011513 reported by Zakhar Kirpichenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
Wallaby
Fix Released
High
Unassigned
neutron
Fix Committed
Undecided
Rodolfo Alonso
neutron (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Hi!

We're running Openstack Wallaby on Ubuntu 20.04, 3 high-performance infra nodes with a RabbitMQ cluster. I updated Neutron components to version 18.6.0, which recently became available in the cloud repository (http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main). The exact package versions updated are as follows:

Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic)
Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1)

Installed Neutron packages:

ii neutron-common 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - common
ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - DHCP agent
 Firewall-as-a-Service driver for OpenStack Neutron
ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - l3 agent
ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - linuxbridge agent
ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - metadata agent
ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - ML2 plugin
ii neutron-server 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - server
ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - Python library
ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 all Neutron shared routines and utilities - Python 3.x
ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 all client API library for Neutron - Python 3.x

Normally this would be an easy update, but this time neutron-dhcp-agent doesn't work properly:

2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state complete
2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534
2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for ports ... (A successful configuration here).

While neutron-dhcp-agent is waiting, neutron-server log gets filled up with:

neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76
...
neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76

This repeats for each port of each network neutron-dhcp-agent needs to configure.

Each subsequent configuration for each network takes about 1-2 minutes, depending on the network size. With earlier Neutron versions the whole process of configuring all networks would finish in under a minute, i.e. DHCP configuration per port (and network) is several orders of magnitude slower than it should be. Once neutron-dhcp-agent finishes synchronization, it seems to work without issues although there aren't that many changes in our cloud to tell whether it's fast or slow, individual port updates seem to happen quickly.

All other services are working well, RabbitMQ cluster is working well, infra nodes are not overloaded and there are no apparent issues other than this one with Neutron, thus I am inclined to think that the issue is specific to version 18.6.0 of neutron-dhcp-agent or neutron-server.

I would appreciate any advice!

Best regards,
Zakhar

summary: - Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually
- slow
+ Neutron 18.6.0 - Wallaby on Ubuntu 20.04, neutron-dhcp-agent RPC
+ unusually slow
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
status: New → Confirmed
Changed in neutron:
status: Confirmed → Fix Committed
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

The issue reported was fixed in [1]. The patch introducing this active wait for all ports [2] is in Neutron 18.6.0 but the patch fixing this issue [3] is still not released.

The next release will be the EOL one, but that will be in more than one year. If needed, you can manually patch this version or use 18.5.0. Remember that Wallaby is in extended maintenance mode: we still push patches (like [3]), but no new releases will be delivered until the EOL one.

Regards.

[1]https://bugs.launchpad.net/neutron/+bug/1991092
[2]https://review.opendev.org/c/openstack/neutron/+/857569
[3]https://review.opendev.org/c/openstack/neutron/+/863620

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Thanks for this, Rodolfo.

Please consider making an interim release without the [2] or with a fix, as 18.6.0 was released and it breaks things in a bad way.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is in progress for the wallaby cloud archive. For releases > wallaby, we'll pick the fix up in stable point releases.

Changed in neutron (Ubuntu):
status: New → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Zakhar, or anyone else affected,

Accepted neutron into wallaby-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:wallaby-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-wallaby-needed to verification-wallaby-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-wallaby-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-wallaby-needed
Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

Thanks Corey, I'll try the fix and get back with our results.

Revision history for this message
Zakhar Kirpichenko (kzakhar) wrote :

I'm happy to report that the fix appears to be successful. I used the following packages for testing:

neutron-common 2:18.6.0-0ubuntu1~cloud2
neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud2
neutron-l3-agent 2:18.6.0-0ubuntu1~cloud2
neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud2
neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud2
neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud2
neutron-server 2:18.6.0-0ubuntu1~cloud2
python3-neutron 2:18.6.0-0ubuntu1~cloud2

Performance is back to expected levels, neutron-server logging is limited to 1 entry per port: "Attempt 1 to provision port xxx". I'm updating the tag to verification-wallaby-done.

tags: added: verification-wallaby-done
removed: verification-wallaby-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Zakhar/Rodlofo, thank you for the help on this!

Revision history for this message
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:18.6.0-0ubuntu1~cloud2
---------------

 neutron (2:18.6.0-0ubuntu1~cloud2) focal-wallaby; urgency=medium
 .
   * d/p/port-provisioning-retry-only-for-vm-ports.patch: Fix
     performance regression introduced in 18.6.0 (LP: #2011513).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.