Comment 2 for bug 1004567

Mark W (deviant-dolphin) wrote : Re: pt-heartbeat --update --replace is causing duplicate key errors

Last night I had another HA failover and failback.
On failback I got the same type of error:

Default database: 'test'. Query: 'INSERT INTO `test`.`heartbeat`
(server_id, ts) VALUES ('8006', NOW())'

I reviewed my binary log, and most of the time pt-heartbeat does send a REPLACE,
but it occasionally sends an INSERT.
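
For anyone who wants to check their own binlog for the same pattern, here is a rough sketch of how the two statement types could be counted in the text output of mysqlbinlog. The function name and the sample statements are made up for illustration; they are not lines from my actual binlog.

```python
import re
from collections import Counter

# Matches the leading verb of statements that write to test.heartbeat.
# Backticks around the schema and table names are optional.
HEARTBEAT_WRITE = re.compile(
    r"^\s*(INSERT|REPLACE)\s+INTO\s+`?test`?\.`?heartbeat`?",
    re.IGNORECASE | re.MULTILINE,
)


def count_heartbeat_writes(binlog_text):
    """Count INSERT vs REPLACE statements against test.heartbeat.

    binlog_text is assumed to be decoded binlog output saved as text.
    """
    return Counter(
        match.group(1).upper() for match in HEARTBEAT_WRITE.finditer(binlog_text)
    )


# Hypothetical sample input, only to show the shape of the result.
sample = """\
REPLACE INTO `test`.`heartbeat` (ts, server_id) VALUES (NOW(), '8006')
REPLACE INTO `test`.`heartbeat` (ts, server_id) VALUES (NOW(), '8006')
INSERT INTO `test`.`heartbeat` (server_id, ts) VALUES ('8006', NOW())
REPLACE INTO `test`.`heartbeat` (ts, server_id) VALUES (NOW(), '8006')
"""

counts = count_heartbeat_writes(sample)
print(counts["REPLACE"], counts["INSERT"])
```

Any nonzero INSERT count while running with --update --replace would match what I am seeing.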

In the past I ran into this very same problem, and I created a secondary cron job that deletes a row from the table that matches the current HA primary node's server-id. In retrospect this would not really help, because once replication breaks it's too late.

So my heartbeat goes out every 2 minutes:

*/2 * * * * nagios /usr/bin/pt-heartbeat --defaults-file /home/nagios/.my.cnf -D test --update --replace --create-table -h LBVIP -P LBPORT --run-time 1

Then I had a cron job deleting the row it inserted every 10 minutes (as of today I am no longer running this job):
*/10 * * * * nagios mysql -h LBVIP -P LBPORT test -e "DELETE FROM test.heartbeat WHERE server_id <> LBSERVERID "

I think the problem is that when the table becomes empty, the --create-table option fires off an INSERT, which does not respect the --update --replace parameters. I am not sure why this breaks replication, because the slaves should also have an empty heartbeat table at this point.
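
To make the INSERT vs REPLACE difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for convenience; SQLite also accepts REPLACE INTO). The table layout is simplified, and the sketch assumes the row for server_id 8006 already exists, which is the situation a slave would have to be in to raise a duplicate key error.

```python
import sqlite3


def demo():
    """Show that REPLACE tolerates an existing key while a plain INSERT does not."""
    conn = sqlite3.connect(":memory:")
    # Simplified heartbeat table keyed on server_id (assumed layout).
    conn.execute(
        "CREATE TABLE heartbeat (server_id INTEGER PRIMARY KEY, ts TEXT NOT NULL)"
    )

    # Two REPLACEs in a row: the second overwrites the row written by the first.
    conn.execute(
        "REPLACE INTO heartbeat (server_id, ts) VALUES (8006, datetime('now'))"
    )
    conn.execute(
        "REPLACE INTO heartbeat (server_id, ts) VALUES (8006, datetime('now'))"
    )

    # A plain INSERT for the same key hits the primary key constraint.
    try:
        conn.execute(
            "INSERT INTO heartbeat (server_id, ts) VALUES (8006, datetime('now'))"
        )
        insert_failed = False
    except sqlite3.IntegrityError:
        insert_failed = True

    row_count = conn.execute("SELECT COUNT(*) FROM heartbeat").fetchone()[0]
    conn.close()
    return insert_failed, row_count


insert_failed, row_count = demo()
print(insert_failed, row_count)
```

So an occasional INSERT only causes trouble on a server that still holds a row for that server_id, which suggests the heartbeat tables were not actually in the same state on both sides.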

I have attached the relevant portions of my binlog from last night.

--