ovsdb: raft: Fix probe intervals after install snapshot request.
If the new snapshot received with an INSTALL_SNAPSHOT request contains
a different election timer value, the timer is updated, but the
probe intervals for RAFT connections are not.
Fix that by updating probe intervals whenever we get the election
timer from the log.
ovsdb: raft: Fix inability to join a cluster with a large database.
The inactivity probe interval on RAFT connections depends on the value
of the election timer. However, the actual value is not known until the
database snapshot with the RAFT information is received by a joining
server. A newly joining server uses the default of 1 second until then.
If a new server is trying to join an existing cluster with a large
database, it may take more than a second to generate and send the
initial database snapshot. This makes it impossible to actually join
the cluster: the joining server sends an ADD_SERVER request, waits
1 second, sends a probe, gets no reply within another second (because
the leader is busy preparing and sending the initial snapshot),
disconnects, and the cycle repeats.
This is not an issue for servers that have already joined, since
their probe intervals are larger than the election timeout.
Cooperative multitasking doesn't fully solve this issue either, since
it depends on the election timer, which is likely higher in an existing
cluster with a very large database.
Fix that by using the maximum election timer value for inactivity
probes until the actual value is known. We still shouldn't disable
the probes completely: in the rare event that the connection is
established but the other side silently goes away, we still want to
disconnect and eventually re-establish the connection.
Since probe intervals also depend on the joining state now, update
them when the server joins the cluster.
Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
Reported-by: Terry Wilson <email address hidden>
Reported-at: https://issues.redhat.com/browse/FDP-144
Acked-by: Mike Pattrick <email address hidden>
Signed-off-by: Ilya Maximets <email address hidden>
rhel/systemd: Set ovsdb-server timeout to 5 minutes.
If the database is particularly large (multi-GB), ovsdb-server can take
several minutes to come up. This tends to fall afoul of the default
systemd start timeout, which is typically 90s, putting the service into
an infinite restart loop.
To avoid this, set the timeout to a more generous 5 minutes.
This change brings ovsdb-server's timeout in line with ovs-vswitchd,
which got the same treatment in commit c1c69e8a45 ("rhel/systemd: Set
ovs-vswitchd timeout to 5 minutes").
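A change like this typically amounts to a one-line directive in the unit file. The fragment below is illustrative only; the exact directive name and unit file layout in the RHEL packaging may differ.

```ini
# Illustrative fragment for the ovsdb-server systemd unit
# (directive and surrounding unit contents are assumptions):
[Service]
TimeoutSec=300
```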
Acked-by: Simon Horman <email address hidden>
Signed-off-by: Chris Riches <email address hidden>
Signed-off-by: Ilya Maximets <email address hidden>
ovsdb-idl: Add python keyword to persistent UUID test.
The Python persistent UUID tests should have the keyword "python"
added so that TESTSUITEFLAGS="-k python" will not miss testing
them.
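In the autotest-based OVS testsuite, keyword matching for `TESTSUITEFLAGS="-k ..."` comes from `AT_KEYWORDS`. The fragment below is a hypothetical sketch of the change; the test name and other keywords shown are assumptions, not the actual test definition.

```m4
dnl Illustrative only: adding "python" to AT_KEYWORDS makes the test
dnl match TESTSUITEFLAGS="-k python".  Test name is hypothetical.
AT_SETUP([idl persistent uuid insert - Python3])
AT_KEYWORDS([python idl])
dnl ... test body ...
AT_CLEANUP
```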
Fixes: 55b9507e6824 ("ovsdb-idl: Add the support to specify the uuid for row insert.")
Signed-off-by: Terry Wilson <email address hidden>
Tested-by: Simon Horman <email address hidden>
Signed-off-by: Simon Horman <email address hidden>
Before the patch, the backlog size depended on the type of socket
(UNIX vs INET) as well as on the language (C vs Python), specifically:
- Python used a backlog size of 10 for all sockets;
- C used 64 for UNIX sockets but 10 for INET sockets.
This consolidates the values across the board. It effectively bumps the
number of simultaneous connections to Python unixctl servers, as well
as to C INET servers, to 64.
The rationale, on top of consistency, is as follows:
- fmt_pkt in the OVN testsuite is limited by the Python server's listen
  backlog, and, as was found out when adopting the tool, it is sometimes
  useful to run many parallel calls to the fmt_pkt unixctl server in
  some tests. (See [1] for an example.)
- there is a recent report [2] on the discuss@ mailing list where the
  reporter noticed significant listen queue overflows in some scenarios
  (large OpenStack deployments; this happens during leader transition
  when hundreds of Neutron nodes, each with dozens of Neutron API
  workers, simultaneously reconnect to the same northbound leader).
  Note: while there is no clear indication that this backlog size bump
  would resolve the reported issues, it would probably help somewhat.
netdev-dpdk: Fix possible memory leak configuring VF MAC address.
VLOG_WARN_BUF() allocates memory for the error string and should be
used when the configuration cannot continue and an error is being
returned, so the caller knows it must release the pointer.
Change to VLOG_WARN() to keep the logic that no error is being
returned.
Fixes: f4336f504b17 ("netdev-dpdk: Add option to configure VF MAC address.")
Signed-off-by: Roi Dayan <email address hidden>
Acked-by: Gaetan Rivet <email address hidden>
Acked-by: Eli Britstein <email address hidden>
Signed-off-by: Simon Horman <email address hidden>