Crash resistant replication breaks with binlog XA transaction recovery

Bug #1012715 reported by Laurynas Biveinis
This bug affects 3 people
Affects                                                         Status        Importance  Assigned to        Milestone
Percona Server (moved to https://jira.percona.com/projects/PS)  Fix Released  High        Laurynas Biveinis
5.1                                                             Fix Released  High        Laurynas Biveinis
5.5                                                             Fix Released  High        Laurynas Biveinis

Bug Description

Moved from bug 937852:

There appear to be several potential issues with crash-resistant replication. Here is one way to make it fail on 5.1 that does not involve the case where slave-relay.info is overwritten:

1) Add a crash injection site at trx_commit_off_kernel. It will trigger during the COMMIT phase of the XA 2PC commit protocol.
2) Replicate an event from master to slave that will trigger this crash.
3) At the time of the crash, the relay log info master log position will point to the crashed prepared transaction at position X and the relay log position will point to Y, while the InnoDB transactional fields will point to the same master log position and to relay log position Z, Z < Y.
4) During InnoDB crash recovery, InnoDB will undo the prepared transaction.
5) During binlog crash recovery, InnoDB will redo and commit the prepared transaction.
6) The slave will attempt to start replication assuming position X for the master log and position Y for the relay log.
7) Thus it will attempt to re-execute the transaction that was committed in 5).
Laurynas Biveinis (laurynas-biveinis) wrote :

One of the causes is that the log positions are written in the COMMIT phase of the 2PC commit protocol, whereas they should be written in the PREPARE phase.
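
To illustrate why the phase matters, here is a minimal sketch that compares the two choices under a crash injected in the COMMIT phase. It is a simplified model, not the server implementation: the `recover` function, the dictionary layout, and the position values are assumptions made for the example. The only property it relies on is that whatever is written during PREPARE becomes durable together with the prepared transaction.

```python
# Simplified model, not server code. Names and values are hypothetical.
def recover(position_written_in):
    """Return (committed rows, resume position) after a crash in COMMIT."""
    durable_pos = 100  # position stored before the event was applied
    event = {"start": 100, "end": 200, "row": "r1"}

    # PREPARE phase: anything written here is durable together with the
    # prepared transaction.
    prepared = {"row": event["row"], "pos": None}
    if position_written_in == "prepare":
        prepared["pos"] = event["end"]

    # Crash injected in the COMMIT phase: a position update scheduled for
    # this phase is never made.

    # Binlog XA recovery commits the prepared transaction with whatever
    # it contains.
    rows = [prepared["row"]]
    if prepared["pos"] is not None:
        durable_pos = prepared["pos"]
    return rows, durable_pos


for phase in ("commit", "prepare"):
    print(phase, recover(phase))
# commit (['r1'], 100)
# prepare (['r1'], 200)
```

With the position written in COMMIT, the row is committed but the resume position still points at the start of the event, so the event would be executed again. With the position written in PREPARE, the row and the position advance together.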

A second potential issue (not confirmed yet) is that slave-relay.info is overwritten too early: after the InnoDB crash recovery has run, but before the binlog crash recovery. It is possible, however, that this is a purely theoretical issue right now.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-564
