wsrep_sst_xtrabackup-v2 doesn't stop when mysql is SIGKILLed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status tracked in 5.6 | | |
5.5 | Fix Released | Medium | Raghavendra D Prabhu |
5.6 | Fix Released | Medium | Raghavendra D Prabhu |
Bug Description
All xtrabackup-related processes are still present on the system even though the mysql process was killed by pacemaker. As a result, MySQL cannot initiate a new transfer because port 4444 is already in use.
How to reproduce
Run SST between the nodes
kill -9 mysqld
Expected
ps aux | egrep 'socat|
Got
ps aux | egrep 'socat|
mysql 7036 0.0 0.0 4408 612 ? S 15:18 0:00 sh -c wsrep_sst_
mysql 7037 0.0 0.0 9712 1740 ? S 15:18 0:00 /bin/bash -ue /usr//bin/
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:
mysql 7324 0.0 0.0 84696 1852 ? S 15:18 0:00 xbstream -x
As a temporary fix, you may kill all the processes in question and reinitiate SST.
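The workaround above could be sketched roughly as follows. This is a minimal sketch, not part of the fix: the helper name `leftover_sst_pids` is hypothetical, and it assumes the leftover helpers run as the `mysql` user and show up in `ps aux` with the command names seen in this report (`wsrep_sst_...`, `socat -u TCP-LISTEN:...`, `xbstream -x`).

```shell
#!/bin/sh
# leftover_sst_pids (hypothetical helper): read `ps aux` output on stdin and
# print the PID (column 2) of every process owned by user "mysql" whose
# command line looks like one of the SST helpers from this report.
leftover_sst_pids() {
    awk '$1 == "mysql" && /wsrep_sst_|socat -u TCP-LISTEN|xbstream -x/ { print $2 }'
}

# Possible usage; review the PID list before killing anything:
#   ps aux | leftover_sst_pids
#   ps aux | leftover_sst_pids | xargs kill
```

Matching on the owning user keeps the `grep` processes run by root out of the list. Printing the PIDs first, and killing only afterwards, leaves room to check that nothing unrelated matched.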
Related branches
- Alexey Kopytov (community): Approve
- Raghavendra D Prabhu (community): Needs Resubmitting
- Diff: 144 lines (+89/-5) (has conflicts), 1 file modified: scripts/wsrep_sst_xtrabackup-v2.sh (+89/-5)
- Alexey Kopytov (community): Needs Fixing
- Diff: 133 lines (+63/-7), 3 files modified: percona-xtradb-cluster-tests/conf/conf20.cnf-node1 (+1/-1), percona-xtradb-cluster-tests/conf/conf20.cnf-node2 (+1/-1), scripts/wsrep_sst_xtrabackup-v2.sh (+61/-5)
I see xtrabackup issuing select() system calls on an fd that is not open according to the /proc file system.
Here is an output snippet:
root@node-11:~# ps aux | grep mysql
mysql 7036 0.0 0.0 4408 612 ? S 15:18 0:00 sh -c wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.0.4' --auth 'wsrep_sst:password' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '7019' ''
mysql 7037 0.0 0.0 9712 1740 ? S 15:18 0:00 /bin/bash -ue /usr//bin/wsrep_sst_xtrabackup-v2 --role joiner --address 192.168.0.4 --auth wsrep_sst:password --datadir /var/lib/mysql/ --defaults-file /etc/my.cnf --parent 7019
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
mysql 7324 0.0 0.0 84696 1852 ? S 15:18 0:00 xbstream -x
root 19155 0.0 0.0 12708 1940 ? S 15:42 0:00 /bin/bash /usr/lib/ocf/resource.d/mirantis/mysql-wss start
root 21767 0.0 0.0 9396 928 pts/3 S+ 15:45 0:00 grep --color=auto mysql
root@node-11:~# netstat -ntlp | grep 7323
root@node-11:~# ps aux | grep socat
mysql 7323 0.0 0.0 26220 1768 ? S 15:18 0:00 socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
root 22040 0.0 0.0 9396 928 pts/3 S+ 15:46 0:00 grep --color=auto socat
root@node-11:~# netstat -nlp | grep 4444
root@node-11:~# ps aux | grpe 4444
No command 'grpe' found, did you mean:
Command 'grape' from package 'groovy' (universe)
Command 'grpn' from package 'grpn' (universe)
Command 'gpre' from package 'firebird2.5-super' (universe)
Command 'gpre' from package 'firebird2.1-classic' (universe)
Command 'gpre' from package 'firebird2.5-classic-common' (universe)
Command 'gpre' from package 'firebird2.1-super' (universe)
Command 'grep' from package 'grep' (main)
Command 'gvpe' from package 'gvpe' (universe)
Command 'nrpe' from package 'nagios-nrpe-server' (main)
grpe: command not found
root@node-11:~# strace -f -p 7323
Process 7323 attached - interrupt to quit
select(12, [11], [], [], NULL^C <unfinished ...>
Process 7323 detached
root@node-11:~# cat /proc/7323/fd
fd/ fdinfo/
root@node-11:~# cat /proc/7323/fd
fd/ fdinfo/
root@node-11:~# cat /proc/7323/fd/
0 1 10 11 14 15 2 3 5 6 7 8 9
As you can see, xtrabackup is running a select() syscall on a non-existent socket.
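To double-check whether the descriptor named in a select() call is open in the target process, one option is to test the matching entry under /proc directly. The helper below is a sketch that assumes a Linux /proc file system. The name `fd_is_open` is hypothetical, and the PID and fd number in the usage line are taken from the strace output above.

```shell
#!/bin/sh
# fd_is_open PID FD (hypothetical helper): succeed if /proc/PID/fd/FD exists.
# Entries under /proc/PID/fd are symlinks; for sockets the link target is a
# name such as "socket:[12345]" that does not exist as a path, so test the
# link itself (-L) rather than its target (-e).
fd_is_open() {
    [ -L "/proc/$1/fd/$2" ]
}

# Possible usage against the socat process from this report:
#   fd_is_open 7323 11 && ls -l /proc/7323/fd/11
```

When the descriptor is open, `ls -l` on the entry shows what it points to, which helps tell a listening socket from a pipe or a regular file.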