Comment 27 for bug 1817484

Dmitrii Shcherbakov (dmitriis) wrote :

Alberto,

It seems that with the patch from comment #24 (applied on top of 2.5.0), your settings are applied to the sockets created by regiond and override the system-wide ones, as intended:

+ 'keepalives': 1,
+ 'keepalives_idle': 15,
+ 'keepalives_interval': 15,
+ 'keepalives_count': 2
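
For context, here is where such libpq keepalive parameters end up in a Django/psycopg2 setup like regiond's. This is only a minimal sketch (the database name and credentials are assumptions, not the actual MAAS configuration):

# Per-connection libpq keepalive options passed through Django's
# psycopg2 backend; the values mirror the patch above.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'maasdb',                # assumed database name
        'OPTIONS': {
            'keepalives': 1,             # enable TCP keepalives on the socket
            'keepalives_idle': 15,       # idle seconds before the first probe
            'keepalives_interval': 15,   # seconds between probes
            'keepalives_count': 2,       # failed probes before the connection is considered dead
        },
    },
}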

To verify this, I built knetstat with this patch: https://github.com/veithen/knetstat/pull/17. The system-wide keepalive sysctls are set to values different from the patch, so the knetstat output below shows that the per-socket options take precedence:

sysctl -a --pattern keepalive
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 200

sudo cat /proc/net/tcpstat | grep 5432 | tail -n2
     0 0 100.64.0.3:56968 100.64.0.254:5432 ESTB SO_REUSEADDR=0,SO_REUSEPORT=0,SO_KEEPALIVE=1,TCP_KEEPIDLE=15,TCP_KEEPCNT=2,TCP_KEEPINTVL=15,TCP_NODELAY=1,TCP_DEFER_ACCEPT=0
     0 0 100.64.0.3:56870 100.64.0.254:5432 ESTB SO_REUSEADDR=0,SO_REUSEPORT=0,SO_KEEPALIVE=1,TCP_KEEPIDLE=15,TCP_KEEPCNT=2,TCP_KEEPINTVL=15,TCP_NODELAY=1,TCP_DEFER_ACCEPT=0
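
The same can also be confirmed from inside a Python process, without knetstat, by reading the options back from the connection's file descriptor. A quick sketch (host/user/dbname are just example values):

import socket
import psycopg2

# Open a connection with the per-socket keepalive options set.
conn = psycopg2.connect(
    host='100.64.0.254', dbname='maasdb', user='maas',  # example values only
    keepalives=1, keepalives_idle=15,
    keepalives_interval=15, keepalives_count=2,
)

# Duplicate the underlying fd; socket options live on the shared file
# description, so reading them here reflects the real connection socket.
sock = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))    # -> 1
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))   # -> 15
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))  # -> 15
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))    # -> 2
sock.close()
conn.close()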

I also tested the DB failover scenario a few times; it seems to work reliably with our DNS notification use case.

What I would advise considering:

1) making the keepalive parameters tunable via MAAS settings (for region and rack controllers) so that more aggressive keepalives can be applied at the socket level if needed (see the sketch after this list);

2) checking whether similar patching needs to be done for rack controllers (I am not sure whether you have listeners there as with regiond).
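
Regarding 1), a rough illustration of what I have in mind (the setting names below are hypothetical, not existing MAAS configuration keys):

# Hypothetical: build the libpq OPTIONS dict from operator-tunable settings
# instead of hard-coded constants. get_setting(name, default) stands in for
# whatever settings API MAAS would expose for this.
def database_keepalive_options(get_setting):
    return {
        'keepalives': 1,
        'keepalives_idle': get_setting('database_keepalive_idle', 15),
        'keepalives_interval': get_setting('database_keepalive_interval', 15),
        'keepalives_count': get_setting('database_keepalive_count', 2),
    }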