[SRU] Doesn't regain quorum when tracked process restarts with PID > 32767
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
keepalived (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | High | Lucas Kanashiro |
Bug Description
[Impact]
If a user tracks a process whose PID is greater than 32767, keepalived does not regain quorum after that process restarts, so it does not work as expected.
This bug was fixed upstream here: https:/
[Test Plan]
Launch a Focal VM and run the following commands:
# install keepalived and nginx
$ apt install -y nginx keepalived
# configure keepalived to track the nginx process
$ cat << EOF > /etc/keepalived
global_defs {
enable_
}
vrrp_track_process track_nginx {
process nginx
weight 10
delay 1
}
vrrp_instance lb {
interface enp5s0
state MASTER
priority 100
virtual_
authentication {
auth_type PASS
auth_pass password
}
track_process {
track_nginx
}
virtual_
}
}
EOF
$ systemctl restart keepalived
# stop nginx process to lose quorum
$ systemctl stop nginx
$ journalctl -u keepalived | grep Quorum
Feb 07 20:13:04 keepalived-debug Keepalived_
# start nginx process to gain quorum
$ systemctl start nginx
$ pidof nginx
3282 3281
$ journalctl -u keepalived | grep Quorum
Feb 07 20:13:04 keepalived-debug Keepalived_
Feb 07 20:19:58 keepalived-debug Keepalived_
# stop nginx process to lose quorum again
$ systemctl stop nginx
$ journalctl -u keepalived | grep Quorum
Feb 07 20:13:04 keepalived-debug Keepalived_
Feb 07 20:19:58 keepalived-debug Keepalived_
Feb 07 20:21:39 keepalived-debug Keepalived_
# start nginx process forcing its PID to be > 32767
$ echo 32767 > /proc/sys/
$ pidof nginx
32773 32772
# quorum is not gained again
$ journalctl -u keepalived | grep Quorum
Feb 07 20:13:04 keepalived-debug Keepalived_
Feb 07 20:19:58 keepalived-debug Keepalived_
Feb 07 20:21:39 keepalived-debug Keepalived_
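To confirm the precondition of the reproduction, namely that the restarted nginx processes really received PIDs above 32767, a small check like the following can help. This is only a sketch; `pids_over_limit` is a hypothetical helper name, not something shipped with keepalived or Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper (not part of keepalived or the test plan):
# print every PID argument that is greater than 32767.
pids_over_limit() {
  for pid in "$@"; do
    if [ "$pid" -gt 32767 ]; then
      echo "$pid"
    fi
  done
}

# Usage on the test VM, after restarting nginx:
# pids_over_limit $(pidof nginx)
```

With the PIDs from the transcript above, `pids_over_limit 32773 32772` prints both PIDs, while `pids_over_limit 3282 3281` prints nothing, meaning the bug would not be triggered yet.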
To verify the fix, install the fixed keepalived package, then stop and start the nginx process (which still gets a PID > 32767). Quorum should be lost again and then regained:
$ systemctl stop nginx
$ systemctl start nginx
$ pidof nginx
33505 33504
$ journalctl -u keepalived | grep Quorum
Feb 08 14:46:47 keepalived-debug2 Keepalived_
Feb 08 14:47:00 keepalived-debug2 Keepalived_
Feb 08 14:47:19 keepalived-debug2 Keepalived_
Feb 08 14:49:01 keepalived-debug2 Keepalived_
Feb 08 14:49:14 keepalived-debug2 Keepalived_
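The verification step can also be scripted by counting `Quorum` lines in the keepalived journal before and after restarting nginx: with the fix, at least two new lines (quorum lost, then regained) should appear, while the unfixed package only logs the loss. This is a minimal sketch assuming the fixed package is installed and nginx already runs with PIDs > 32767; the function names and the `sleep` durations are hypothetical choices, not part of the original test plan.

```shell
#!/bin/sh
# Sketch of the verification step, assuming the fixed keepalived package
# is installed and nginx already runs with PIDs > 32767.
# Function names and sleep durations are hypothetical.

# Number of keepalived journal lines that mention "Quorum".
quorum_events() {
  journalctl -u keepalived | grep -c Quorum
}

# Succeed if at least two new Quorum lines (lost, then regained) appeared.
expect_new_events() {
  [ $(( $2 - $1 )) -ge 2 ]
}

verify_fix() {
  before=$(quorum_events)
  systemctl stop nginx
  sleep 5   # give keepalived time to notice the process is gone
  systemctl start nginx
  sleep 5   # and time to notice it is back
  echo "nginx PIDs: $(pidof nginx)"
  after=$(quorum_events)
  if expect_new_events "$before" "$after"; then
    echo "PASS: quorum was lost and regained"
  else
    echo "FAIL: only $((after - before)) new Quorum line(s)"
    return 1
  fi
}

# Run as root on the test VM:
# verify_fix
```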
[Where problems could occur]
The upstream fix is straightforward, but if it did introduce a problem, it would show up in keepalived's process-tracking feature (vrrp_track_process). Since keepalived is widely used for HA, such a regression could affect specific setups that rely on process tracking.
[Original description]
Keepalived doesn't regain quorum when using a tracked process, due to a bug with high-numbered PIDs.
Upstream has already fixed this in patch release 2.0.20:
https:/
Could we please get 2.0.20 released to 20.04?
Related branches
- Sergio Durigan Junior (community): Approve
- Canonical Server: Pending requested
Diff: 64 lines (+44/-0), 3 files modified:
- debian/changelog (+6/-0)
- debian/patches/Fix-track_process-with-PID-greater-than-32767.patch (+37/-0)
- debian/patches/series (+1/-0)
summary:
- Doesnt regain quorum when tracked process restarts
+ [SRU] Doesn't regain quorum when tracked process restarts with PID > 32767
description: updated
Changed in keepalived (Ubuntu Focal):
status: Confirmed → In Progress
description: updated
tags: added: verification-done; removed: verification-needed, verification-needed-focal
Thanks for the report, Jason!
This is fixed in: https://github.com/acassen/keepalived/commit/23a5b8113bf0b8ec4718443df0406882e8e4d831
https:/
That commit is included in 2.1.0 and later, as well as in the 2.0.20 backport.
Therefore Impish and later are already fixed:
0.2ubuntu1.1 | impish-updates | source, amd64, arm64, armhf, ppc64el, riscv64, s390x
keepalived | 1:2.0.19-2ubuntu0.1 | focal-updates | source, amd64, arm64, armhf, ppc64el, riscv64, s390x
keepalived | 1:2.1.5-
keepalived | 1:2.2.4-0.2 | jammy | source, amd64, arm64, armhf, ppc64el, riscv64, s390x
I'd leave it to Lucas, who usually looks after the HA bits, to decide whether we want to fix just this bug or consider 2.0.20 as a whole for an MRE.
Assigning him for further triage.