Percona Server moved to https://jira.percona.com/projects/PS

Merge lp:~laurynas-biveinis/percona-server/bug1012715-5.1 into lp:percona-server/5.1

bug1012715-5.1
Merge into 5.1

Proposed by Laurynas Biveinis on 2012-07-18

Status:

Superseded

Proposed branch:

lp:~laurynas-biveinis/percona-server/bug1012715-5.1

Merge into:

lp:percona-server/5.1

Diff against target:

878 lines (+527/-170)

7 files modified

Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result (+54/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt (+1/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test (+117/-0)
Percona-Server/storage/innodb_plugin/handler/ha_innodb.cc (+225/-113)
Percona-Server/storage/innodb_plugin/include/trx0sys.h (+15/-1)
Percona-Server/storage/innodb_plugin/trx/trx0sys.c (+96/-54)
Percona-Server/storage/innodb_plugin/trx/trx0trx.c (+19/-2)

To merge this branch:

bzr merge lp:~laurynas-biveinis/percona-server/bug1012715-5.1

High

Fix Released

Reviewer	Date Requested	Status
Alexey Kopytov (community)	2012-07-18	Needs Fixing on 2012-08-08
Percona core	2012-07-18	Pending
Laurynas Biveinis		Pending
Review via email: mp+115485@code.launchpad.net

This proposal supersedes a proposal from 2012-06-29.

This proposal has been superseded by a proposal from 2012-08-17.

Description of the change

Issue 22478.

Fix bug 1012715. See the revision commit message for the changes.

Jenkins: http://jenkins.percona.com/job/percona-server-5.1-param/345/

Some of the refactoring might seem excessive for this bug fix. However, I introduced the new functions in order to avoid copy-pasting the same code, and new file-scope log variables in ha_innodb.cc to preserve my sanity.

The functional difference from the previous MP is that both transaction commit and rollback are tested.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2012-06-28: Posted in a previous version of this proposal

Found one issue myself.

The testcase has one sync bug: lines 88--91 do not sync slave with master first, thus might result in slave shutdown while slave SQL thread is still executing.

review: Needs Fixing

Alexey Kopytov (akopytov) wrote on 2012-07-03: Posted in a previous version of this proposal

Laurynas,

The assumption this patch is based on looks wrong to me, i.e.:

> + XA COMMIT. In contrast to that, the slave position is an
> + actual part of the changes made by this transaction and thus
> + must be updated in the XA PREPARE stage. */

A prepared transaction may either be committed or rolled back (see xarecover_handlerton()) depending on whether the corresponding record made it to the binary log (and if the binary log is used at all).
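As a rough illustration of the rule just described, the recovery decision for a single prepared transaction can be modeled as below. This is a minimal sketch, not server code: `decide`, `RecoveryAction`, and both parameters are hypothetical names, and the real logic lives in xarecover_handlerton().

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: what recovery does with one transaction that was
// found in the PREPARED state after a crash.
enum class RecoveryAction { Commit, Rollback };

// binlog_xids: XIDs whose commit record reached the binary log.
// binlog_in_use: false when the server runs without a binary log.
RecoveryAction decide(const std::string& xid,
                      const std::set<std::string>& binlog_xids,
                      bool binlog_in_use) {
    assert(!xid.empty());
    // Commit only if the binlog is in use and contains the XID.
    if (binlog_in_use && binlog_xids.count(xid) > 0) {
        return RecoveryAction::Commit;
    }
    // Otherwise the prepared transaction is rolled back.
    return RecoveryAction::Rollback;
}
```

Under this model, a slave position stored at PREPARE only survives recovery when the transaction's XID is found in the binary log.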

The correct solution would be to update slave coordinates when committing the corresponding XA transaction on recovery, rather than moving the code that stores slave coordinates to persistent storage from the COMMIT to the PREPARE stage. The problem is that slave coordinates are not available at that point, i.e. when committing prepared transactions on recovery.

I wonder if we can fix this by introducing another set of slave coordinates in the trx header. Then we can update only those fields on PREPARE, and update the regular TRX_SYS_MYSQL_RELAY_LOG_INFO fields on COMMIT. On recovery, we could copy the "prepare" fields to the "committed" ones and overwrite the relay log info file, *if* the PREPAREd transaction is being committed, e.g. in innobase_commit_by_xid(). What do you think?
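The two-slot idea can be sketched as a toy model. Everything here is an assumption made for illustration: `SlaveCoords`, `TrxHeaderModel`, and their members are invented names, and the real trx header is an on-disk structure rather than a C++ object.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for slave coordinates; field names are illustrative.
struct SlaveCoords {
    std::string relay_log_name;
    unsigned long long relay_log_pos;
};

// Toy model of a trx header with two coordinate slots.
struct TrxHeaderModel {
    SlaveCoords prepared{};   // written at PREPARE
    SlaveCoords committed{};  // written at COMMIT; the slot recovery trusts
    bool has_prepared = false;

    void on_prepare(const SlaveCoords& c) {
        prepared = c;
        has_prepared = true;
    }

    // A normal commit and a commit-by-XID during recovery do the same
    // thing in this model: promote the prepared coordinates.
    void on_commit() {
        assert(has_prepared);
        committed = prepared;
        has_prepared = false;
    }

    // Rollback discards the prepared slot and leaves `committed` untouched.
    void on_rollback() { has_prepared = false; }
};
```

In this model the relay log info file would be rewritten from `committed` only, so a prepared transaction that ends up rolled back never moves the slave position.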

Minor comments on the test case:
- do we really need the "rpl_" prefix in rpl_percona_crash_resistant_rpl.*?
- please use

SET GLOBAL debug="+d,keyword"

instead of

SET GLOBAL debug="d,keyword"

The latter breaks ./mtr --debug.
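The difference between the two forms can be pictured with a toy model. It assumes, as the comment implies, that the form without `+` replaces whatever debug settings were already in effect while the `+` form adds to them. `apply_debug` and `Settings` are hypothetical names; this is not the DBUG implementation, and only the `d,<keyword>` shape is modeled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: the debug settings currently active in a server.
using Settings = std::set<std::string>;

// Applies a simplified control string of the form "d,kw" or "+d,kw".
Settings apply_debug(const Settings& current, const std::string& control) {
    assert(!control.empty());
    const bool incremental = control[0] == '+';
    const std::string body = incremental ? control.substr(1) : control;
    assert(body.compare(0, 2, "d,") == 0);  // only "d,<keyword>" is modeled
    // Without '+', start from an empty set; with '+', keep what was there.
    Settings result = incremental ? current : Settings{};
    result.insert(body.substr(2));
    return result;
}
```

If the settings installed by ./mtr --debug are represented as an existing entry in `current`, the non-incremental form drops that entry and the incremental form keeps it.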

review: Needs Fixing

Laurynas Biveinis (laurynas-biveinis) wrote on 2012-07-03: Posted in a previous version of this proposal

Alexey -

Thanks. Could you please expand on your comment on why you think this assumption is wrong -

> The assumption this patch is based on looks wrong to me, i.e.:
>
> > + XA COMMIT. In contrast to that, the slave position is an
> > + actual part of the changes made by this transaction and thus
> > + must be updated in the XA PREPARE stage. */

because the following explanation matches my assumptions precisely:

> A prepared transaction may either be committed or rolled back (see
> xarecover_handlerton()) depending on whether the corresponding record made it
> to the binary log (and if the binary log is used at all).

Namely, why is the assumption "slave position update is a part of actual transaction changes" wrong in the light of this? If a prepared transaction is rolled back, the old slave position is restored from an undo seg. If it is 2pc-committed, the new slave position becomes permanent.
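To make the claim concrete, here is a minimal sketch of a position that is changed inside a transaction and protected by undo. The names (`PositionStore`, `undo_position`) are illustrative assumptions and do not correspond to InnoDB structures.

```cpp
#include <cassert>

// Toy model: a persistent slave position that is modified inside a
// transaction, with the previous value kept for rollback.
struct PositionStore {
    unsigned long long position = 0;       // value visible in the header
    unsigned long long undo_position = 0;  // previous value kept for undo
    bool in_prepared_trx = false;

    // PREPARE: save the old value for undo, then install the new one.
    void prepare(unsigned long long new_position) {
        assert(!in_prepared_trx);
        undo_position = position;
        position = new_position;
        in_prepared_trx = true;
    }

    // Second phase of 2PC: the new position becomes permanent.
    void commit() {
        assert(in_prepared_trx);
        in_prepared_trx = false;
    }

    // Rollback restores the position saved at PREPARE.
    void rollback() {
        assert(in_prepared_trx);
        position = undo_position;
        in_prepared_trx = false;
    }
};
```

Either outcome leaves `position` consistent with the set of transactions that actually committed, which is the property being argued for.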

Maybe you view this fix as incomplete or working by chance because of the following (which is the actual action sequence that happens in this bug): on crash recovery the relay status log overwrite will happen before the XA rollback, thus the slave position will look as if the transaction was fully committed. Then the transaction will be rolled back (which would require overwriting the relay status log with the old position) and replayed from the binlog (which would require overwriting the relay status log position again, but the binlog does not have the required info for that). This results in correct positions, although with a shortcut taken. The assumption here is that all replicated InnoDB XA prepared transactions will eventually be committed. It is not perfect, but IMHO "slave position is a part of transaction itself" is a step in the right direction.

> The correct solution would be to update slave coordinates when committing the
> corresponding XA transaction on recovery, rather than moving the code storing
> slave coordinates to persistent storage from COMMIT to PREPARE stage. The
> problem is that slave coordinates are not available at that point, i.e. when
> committing prepared transactions on recovery.

Fully agreed, modulo that ATM I think it's "one of the possible correct solutions."

> I wonder if we can fix this by introducing another set of slave coordinates in
> the trx header. Then we can only update those fields on PREPARE, and update the
> regular TRX_SYS_MYSQL_RELAY_LOG_INFO fields on COMMIT. On recovery, we could
> copy the "prepare" fields to "committed" ones and overwrite the relay log info
> file, *if* the PREPAREd transaction is being committed, e.g. in
> innobase_commit_by_xid(). What do you think?

On first thought it seems workable; however, I would like to postpone further discussion until we agree on the previous point, because there are other bugs in crash-resistant replication too, and fixing them might require implementing a "transactional system table for slave relay log." If we have to do that, then we should discuss such fixes as a whole.

> Minor comments on the test case:
> - do we really need the "rpl_" prefix in rpl_percona_crash_resistant_rpl.*?

Looks funny to me too, I just didn't want to break the convention of all rpl suite tests having a "rpl_" prefix, even though that seems historical.

> - please use
> 
> SET GLOBAL debug="+d,keyword"

Will do.

Alexey Kopytov (akopytov) wrote on 2012-07-04: Posted in a previous version of this proposal

On 07/03/2012 08:35 PM, Laurynas Biveinis wrote:
> Namely, why is the assumption "slave position update is a part of
> actual transaction changes" wrong in the light of this? If a prepared
> transaction is rolled back, the old slave position is restored from an
> undo seg. If it is 2pc-committed, the new slave position becomes permanent.
>

Right, it didn't occur to me that updates of the trx header fields are
done as a part of the current transaction. If they are rolled back when
a prepared transaction is rolled back, then the fix is correct. Can we
have it covered by the test case?

Though it's not clear then what the difference with the binlog position
update is, i.e. what the following means:

> + /* Update the replication position info inside InnoDB. This is
> + different from the binlog position update that happens during
> + XA COMMIT. In contrast to that, the slave position is an

> Maybe you view this fix as incomplete or working by chance because of the following (which is the actual action sequence that happens in this bug): on crash recovery the relay status log overwrite will happen before the XA rollback, thus the slave position will look as if the transaction was fully committed. Then the transaction will be rolled back (which would require overwriting the relay status log with the old position) and replayed from the binlog (which would require overwriting the relay status log position again, but the binlog does not have the required info for that). This results in correct positions, although with a shortcut taken. The assumption here is that all replicated InnoDB XA prepared transactions will eventually be committed. It is not perfect, but IMHO "slave position is a part of transaction itself" is a step in the right direction.
>

I don't understand this. When and how does a replay from the binlog of a
rolled back XA transaction occur? If I'm not mistaken, a rollback happens
when the corresponding event is _not_ in the binary log (or the binlog is
not used at all).

Laurynas Biveinis (laurynas-biveinis) wrote on 2012-07-04: Posted in a previous version of this proposal

> On 07/03/2012 08:35 PM, Laurynas Biveinis wrote:
> > Namely, why is the assumption "slave position update is a part of
> > actual transaction changes" wrong in the light of this? If a prepared
> > transaction is rolled back, the old slave position is restored from an
> > undo seg. If it is 2pc-committed, the new slave position becomes permanent.
> >
>
> Right, it didn't occur to me that updates of the trx header fields are
> done as a part of the current transaction. If they are rolled back when
> a prepared transaction is rolled back, then the fix is correct. Can we
> have it covered by the test case?

Good idea, I will work on this if the points below are agreed upon. I will probably use the explicit XA syntax.

> Though it's not clear then what the difference with the binlog position
> update is, i.e. what following means:
>
> > + /* Update the replication position info inside InnoDB. This
> is
> > + different from the binlog position update that happens during
> > + XA COMMIT. In contrast to that, the slave position is an

My understanding is that the binlog position record in InnoDB means "InnoDB transactions are committed (not prepared) up to this binlog position." Thus it cannot possibly go back and, in contrast to the slave info log, it is not part of the transaction itself, but rather InnoDB/binlog metadata of sorts. I briefly experimented with moving this to PREPARE too and broke crash recovery even worse. I can research further to provide more info if you want me to.

> > Maybe you view this fix as incomplete or working by chance because of the
> > following (which is the actual action sequence that happens in this bug): on
> > crash recovery the relay status log overwrite will happen before the XA
> > rollback, thus the slave position will look as if the transaction was
> > fully committed. Then the transaction will be rolled back (which would
> > require overwriting the relay status log with the old position) and replayed
> > from the binlog (which would require overwriting the relay status log
> > position again, but the binlog does not have the required info for that).
> > This results in correct positions, although with a shortcut taken. The
> > assumption here is that all replicated InnoDB XA prepared transactions will
> > eventually be committed. It is not perfect, but IMHO "slave position is a
> > part of transaction itself" is a step in the right direction.
> >
>
> I don't understand this. When and how a replay from binlog of a rolled
> back XA transaction occurs? If I'm not mistaken, a roll back happens
> when the corresponding event is _not_ in the binary log (or binlog not
> used at all).

Right, sorry, memory failed me. The transaction is never rolled back; it sits there in the prepared state on crash recovery and is committed during the binlog crash recovery. The fix still works as designed.

Alexey Kopytov (akopytov) wrote on 2012-07-04: Posted in a previous version of this proposal

On 07/04/2012 05:50 PM, Laurynas Biveinis wrote:
>> On 07/03/2012 08:35 PM, Laurynas Biveinis wrote:
>>> Namely, why is the assumption "slave position update is a part of
>>> actual transaction changes" wrong in the light of this? If a prepared
>>> transaction is rolled back, the old slave position is restored from an
>>> undo seg. If it is 2pc-committed, the new slave position becomes permanent.
>>>
>>
>> Right, it didn't occur to me that updates of the trx header fields are
>> done as a part of the current transaction. If they are rolled back when
>> a prepared transaction is rolled back, then the fix is correct. Can we
>> have it covered by the test case?
>
> Good idea, I will work on this if the points below are agreed upon. I will probably use the explicit XA syntax.
>

There are also existing injection sites that you may want to use, e.g.
crash_commit_after_prepare will crash the server after preparing a
transaction, but before writing a xid event to binlog, so that would
theoretically lead to a rollback on recovery.

>> Though it's not clear then what the difference with the binlog position
>> update is, i.e. what following means:
>>
>>> + /* Update the replication position info inside InnoDB. This
>> is
>>> + different from the binlog position update that happens during
>>> + XA COMMIT. In contrast to that, the slave position is an
>
> My understanding is that binlog position record in InnoDB means "InnoDB transactions are committed (not prepared) up to this binlog position." Thus it cannot possibly go back and in contrast to the slave info log, it is not part of the transaction itself, but rather InnoDB/binlog metadata of sorts. I briefly experimented with moving this to PREPARE too and broke crash recovery even worse. I can research to provide more info if you want me to.
>

OK, I see what you meant to say, no need for details. Thanks for
clarifications.

Alexey Kopytov (akopytov) wrote on 2012-08-08:

Laurynas,

   - many lines exceed the 80-character line width limit
   - it is a good practice to end replication test cases with "--source
     include/rpl_end.inc"
   - Yoda notation in comparisons ("0 == ...")
   - variable declarations in the middle of a block are C99 (i.e. more
     Windows incompatibilities)
   - unnecessary (char *) cast for the first argument in bzero()
   - I don't think we actually need IO_CACHE in
     innobase_do_overwrite_relay_log_info(). it's basically an enhanced
     version of fwrite() & friends, whereas we only want a single write
     of buff to the file.
   - no spaces around '=' sign in many places
   - no braces in single-statement if()s
   - in general, I would suggest to fix all of the above by leaving the
     (incorrectly formatted) code alone and not moving it into a
     separate function. That would save me about half an hour on reading
     and commenting on changes that are not really changes but rather
     moving the code around, and allow me to focus on what's really been
     changed.
   - same goes for fname -> info_fname renaming
   - otherwise the patch looks good, but in case you decide to follow my
     suggestion and revert unnecessary changes, I'd like to take another
     review round. It's too easy to miss important things with so many
     insignificant changes.
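For the IO_CACHE point in the list above, a single write of a buffer to a file could look roughly like this. It is a portable sketch with standard C I/O used as a stand-in; `overwrite_file` is a hypothetical helper, not the patch's code, and the server would presumably use its own file wrappers instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: overwrite `path` with one buffer using a single
// fwrite() call, with no extra caching layer on top.
bool overwrite_file(const char* path, const char* buf, std::size_t len) {
    assert(path != nullptr && buf != nullptr);
    std::FILE* f = std::fopen(path, "wb");  // "wb" truncates existing content
    if (f == nullptr) {
        return false;
    }
    const bool written = std::fwrite(buf, 1, len, f) == len;
    // fclose() flushes the stream; a failure here means the data may not
    // have reached the file.
    const bool closed = std::fclose(f) == 0;
    return written && closed;
}
```

The helper reports failure from either the write or the close, so a caller can decide whether the relay log info file can be trusted.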

review: Needs Fixing

Laurynas Biveinis (laurynas-biveinis) wrote on 2012-08-09:

Alexey -

Thanks for the review

> - many lines exceed the 80 line width limit
> - it is a good practice to end replication test cases with "--source
> include/rpl_end.inc"
> - Yoda notation in comparisons ("0 == ...")

Noted.

> - variable declarations in the middle of a block is C99 (i.e. more
> Windows incompatibilities)

Isn't ha_innodb.cc C++? But I will fix them in any case.

> - unnecessary (char *) cast for the first argument in bzero()
> - I don't think we actually need IO_CACHE in
> innobase_do_overwrite_relay_log_info(). it's basically an enhanced
> version of fwrite() & friends, whereas we only want a single write
> of buff to the file.
> - no spaces around '=' sign in many places
> - no braces in single-statement if()s

Noted.

> - in general, I would suggest to fix all of the above by leaving the
> (incorrectly formatted) code alone and not moving it into a
> separate function. That would save me about half an hour on reading
> and commenting on changes that are not really changes but rather
> moving the code around, and allow me to focus on what's really been
> changed.
> - same goes for fname -> info_fname renaming

There are two places where I extracted new functions from existing code: in ha_innodb.cc and in trx0sys.c. Which one of them, or both, are you referring to? The trx0sys.c one I can revert, but IMHO it is easy to review too. Re. the ha_innodb.cc changes, I extracted innobase_do_overwrite_relay_log_info() because I needed to call it from another function as well, and I don't see a good alternative to making a new function: if I copy-pasted the code, I'd still need to adjust it heavily due to the different local variable context, and the end result would be very close to innobase_do_overwrite_relay_log_info() anyway. And I did the fname/pos variable rename and split because these variables being repurposed five times is way beyond my pain threshold, and once extracted from innobase_setup() they stop working anyway.

Is there any way I can make the review easier with a separate function? A separate MP with no functional changes perhaps?

Alexey Kopytov (akopytov) wrote on 2012-08-09:

On 09.08.12 6:30, Laurynas Biveinis wrote:
>
> Isn't ha_innodb.cc C++? But I will fix them in any case.
>

Right, it's C++, sorry.

>> - in general, I would suggest to fix all of the above by leaving the
>> (incorrectly formatted) code alone and not moving it into a
>> separate function. That would save me about half an hour on reading
>> and commenting on changes that are not really changes but rather
>> moving the code around, and allow me to focus on what's really been
>> changed.
>> - same goes for fname -> info_fname renaming
>
> There are two places where I extracted new functions from existing code: in ha_innodb.cc and in trx0sys.c. Which one of them, or both, are you referring to? The trx0sys.c one I can revert, but IMHO it is easy to review too. Re. ha_innodb.cc changes, I have extracted innobase_do_overwrite_relay_log_info() because I needed to call it from another function as well and I don't see a good alternative to making a new function: if I copy pasted the code, I'd still need to adjust it heavily due to different local var context and the end result would be very close to innobase_do_overwrite_relay_log_info() anyway then. And I did the fname/pos variable rename & split, because these variables being repurposed five times is way beyond my pain threshold and extracted from innobase_setup() they stop working anyway.
>

I was referring to innobase_do_overwrite_relay_log_info().

The second call of the function looked like a part of the comment to me
when I grepped:

+ /* On rollback of a prepared transaction revert the
+ current slave positions to the ones recorded by the
+ last COMMITTed transaction. This has an effect of
+ undoing the position change caused by the transaction
+ being rolled back. Assumes single-threaded slave SQL
+ thread. If the server has non-master write traffic
+ with XA rollbacks, this will cause additional spurious
+ slave info log overwrites, which should be harmless. */
+ trx_sys_print_committed_mysql_master_log_pos();
+ innobase_do_overwrite_relay_log_info();

In the InnoDB code multi-line comments are usually separated from code
with blank lines. That is easier to read.

Preview Diff

 === added file 'Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result'
 --- Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result	1970-01-01 00:00:00 +0000
 +++ Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result	2012-08-16 13:46:15 +0000
@@ -0,0 +1,54 @@
++include/master-slave.inc
++[connection master]
++DROP TABLE IF EXISTS t1;
++CREATE TABLE t1 (id INT(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY(id)) ENGINE=InnoDB;
++INSERT INTO t1 VALUES ();
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++1
++include/rpl_restart_server.inc [server_number=2]
++include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++1
++STOP SLAVE;
++include/wait_for_slave_to_stop.inc
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++2
++SET GLOBAL debug="+d,crash_commit_before";
++START SLAVE;
++include/rpl_start_server.inc [server_number=2]
++include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++2
++STOP SLAVE;
++include/wait_for_slave_to_stop.inc
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++3
++SET GLOBAL debug="+d,crash_innodb_after_prepare";
++START SLAVE;
++include/rpl_start_server.inc [server_number=2]
++include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++3
++STOP SLAVE;
++include/wait_for_slave_to_stop.inc
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++4
++SET GLOBAL debug="+d,crash_innodb_before_commit";
++START SLAVE;
++include/rpl_start_server.inc [server_number=2]
++include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++COUNT(*)
++4
++DROP TABLE t1;
++include/rpl_end.inc
 === added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt'
 --- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt	1970-01-01 00:00:00 +0000
 +++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt	2012-08-16 13:46:15 +0000
@@ -0,0 +1,1 @@
++--innodb-overwrite-relay-log-info=TRUE --skip-core-file --skip-stack-trace
 === added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test'
 --- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test	1970-01-01 00:00:00 +0000
 +++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test	2012-08-16 13:46:15 +0000
@@ -0,0 +1,117 @@
++# Tests for Percona crash-resistant replication feature
++--source include/have_innodb_plugin.inc
++--source include/master-slave.inc
++--source include/not_valgrind.inc
++--source include/not_crashrep.inc
++--source include/have_debug.inc
++
++#
++# Setup
++#
++
++--disable_query_log
++call mtr.add_suppression("InnoDB: Warning: innodb_overwrite_relay_log_info is enabled.");
++--enable_query_log
++
++connection master;
++
++--disable_warnings
++DROP TABLE IF EXISTS t1;
++--enable_warnings
++
++CREATE TABLE t1 (id INT(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY(id)) ENGINE=InnoDB;
++
++#
++# Test the non-crashing case
++#
++
++INSERT INTO t1 VALUES ();
++SELECT COUNT(*) FROM t1;
++
++sync_slave_with_master;
++--let $rpl_server_number= 2
++--source include/rpl_restart_server.inc
++--source include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++
++#
++# Test the crashing case where relay-log.info does not need to be overwritten
++#
++
++STOP SLAVE;
++--source include/wait_for_slave_to_stop.inc
++
++connection master;
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++
++connection slave;
++SET GLOBAL debug="+d,crash_commit_before";
++--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
++--error 0,2013
++START SLAVE;
++--source include/wait_until_disconnected.inc
++--enable_reconnect
++
++--let $rpl_server_number= 2
++--source include/rpl_start_server.inc
++--source include/start_slave.inc
++connection master;
++sync_slave_with_master;
++SELECT COUNT(*) FROM t1;
++
++#
++# Test the rollback of slave position stored in the InnoDB trx header.
++#
++STOP SLAVE;
++--source include/wait_for_slave_to_stop.inc
++
++connection master;
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++
++connection slave;
++SET GLOBAL debug="+d,crash_innodb_after_prepare";
++--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
++--error 0,2013
++START SLAVE;
++--source include/wait_until_disconnected.inc
++--enable_reconnect
++
++--let $rpl_server_number= 2
++--source include/rpl_start_server.inc
++--source include/start_slave.inc
++connection master;
++sync_slave_with_master;
++SELECT COUNT(*) FROM t1;
++
++#
++# Test crash with XA transaction recovery (bug 1012715)
++#
++STOP SLAVE;
++--source include/wait_for_slave_to_stop.inc
++connection master;
++INSERT INTO t1 VALUES();
++SELECT COUNT(*) FROM t1;
++
++connection slave;
++SET GLOBAL debug="+d,crash_innodb_before_commit";
++--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
++--error 0,2013
++START SLAVE;
++--source include/wait_until_disconnected.inc
++--enable_reconnect
++
++--let $rpl_server_number= 2
++--source include/rpl_start_server.inc
++--source include/start_slave.inc
++SELECT COUNT(*) FROM t1;
++
++#
++# Cleanup
++#
++
++connection master;
++DROP TABLE t1;
++
++--source include/rpl_end.inc
 === modified file 'Percona-Server/storage/innodb_plugin/handler/ha_innodb.cc'
 --- Percona-Server/storage/innodb_plugin/handler/ha_innodb.cc	2012-07-02 02:04:45 +0000
 +++ Percona-Server/storage/innodb_plugin/handler/ha_innodb.cc	2012-08-16 13:46:15 +0000
@@ -2093,6 +2093,118 @@
  	reset_template(prebuilt);
+ }
++/* The last read master log coordinates in the slave info file */
++static char	master_log_fname[TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN] = "";
++static int	master_log_pos;
++/* The slave relay log coordinates in the slave info file after startup */
++static char	original_relay_log_fname[TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN] = "";
++static int	original_relay_log_pos;
++/* The master log coordinates in the slave info file after startup */
++static char	original_master_log_fname[TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN] = "";
++static int	original_master_log_pos;
++
++/*****************************************************************//**
++Overwrites the MySQL relay log info file with the current master and relay log
++coordinates from InnoDB.  Skips overwrite if the master log position did not
++change from the last overwrite.  If the InnoDB master log position is equal
++to position that was read from the info file on startup before any overwrites,
++restores the original positions. */
++static
++void
++innobase_do_overwrite_relay_log_info(void)
++/*======================================*/
++{
++	char	info_fname[FN_REFLEN];
++	File	info_fd = -1;
++	int	error	= 0;
++	char	buff[FN_REFLEN*2+22*2+4];
++	char	*relay_info_log_pos;
++	size_t	buf_len;
++
++	if (master_log_fname[0] == '\0') {
++		fprintf(stderr,
++			"InnoDB: something wrong with relay-log.info. "
++			"InnoDB will not overwrite it.\n");
++		return;
++	}
++
++	if (strcmp(master_log_fname, trx_sys_mysql_master_log_name) == 0
++	    && master_log_pos == trx_sys_mysql_master_log_pos) {
++		fprintf(stderr,
++			"InnoDB: InnoDB and relay-log.info are synchronized. "
++			"InnoDB will not overwrite it.\n");
++		return;
++	}
++
++	/* If we overwrite the file back to the original master log position,
++	restore the original relay log position too.  This is required because
++	we might have rolled back a prepared transaction and restored the
++	original master log position from the InnoDB trx sys header, but the
++	corresponding relay log position points to an already-purged file. */
++	if (strcmp(original_master_log_fname, trx_sys_mysql_master_log_name)
++	    == 0
++	    && (original_master_log_pos	== trx_sys_mysql_master_log_pos)) {
++
++		strncpy(trx_sys_mysql_relay_log_name, original_relay_log_fname,
++			TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++		trx_sys_mysql_relay_log_pos = original_relay_log_pos;
++	}
++
++	fn_format(info_fname, relay_log_info_file, mysql_data_home, "",
++		  MY_UNPACK_FILENAME | MY_RETURN_REAL_PATH);
++
++	if (access(info_fname, F_OK)) {
++		/* File does not exist */
++		error = 1;
++		goto skip_overwrite;
++	}
++
++	/* File exists */
++	info_fd = my_open(info_fname, O_RDWR|O_BINARY, MYF(MY_WME));
++	if (info_fd < 0) {
++		error = 1;
++		goto skip_overwrite;
++	}
++
++	relay_info_log_pos = strmov(buff, trx_sys_mysql_relay_log_name);
++	*relay_info_log_pos ++= '\n';
++	relay_info_log_pos = longlong2str(trx_sys_mysql_relay_log_pos,
++					  relay_info_log_pos, 10);
++	*relay_info_log_pos ++= '\n';
++	relay_info_log_pos = strmov(relay_info_log_pos,
++				    trx_sys_mysql_master_log_name);
++	*relay_info_log_pos ++= '\n';
++	relay_info_log_pos = longlong2str(trx_sys_mysql_master_log_pos,
++					  relay_info_log_pos, 10);
++	*relay_info_log_pos = '\n';
++
++	buf_len = (relay_info_log_pos - buff) + 1;
++	if (my_write(info_fd, (uchar *)buff, buf_len, MY_WME) != buf_len) {
++		error = 1;
++	} else if (my_sync(info_fd, MY_WME)) {
++		error = 1;
++	}
++
++	if (info_fd >= 0) {
++		my_close(info_fd, MYF(0));
++	}
++
++	strncpy(master_log_fname, trx_sys_mysql_master_log_name,
++		TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++	master_log_pos = trx_sys_mysql_master_log_pos;
++
++skip_overwrite:
++	if (error) {
++		fprintf(stderr,
++			"InnoDB: ERROR: error occurred during overwriting "
++			"relay-log.info.\n");
++	} else {
++		fprintf(stderr,
++			"InnoDB: relay-log.info was overwritten.\n");
++	}
++}
++
++
  /*********************************************************************//**
  Opens an InnoDB database.
  @return	0 on success, error code on failure */
@@ -2221,12 +2333,13 @@
  #ifdef HAVE_REPLICATION
  #ifdef MYSQL_SERVER
  	/* read master log position from relay-log.info if exists */
--	char fname[FN_REFLEN+128];
--	int pos;
++	char info_fname[FN_REFLEN];
++	char relay_log_fname[TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN];
++	int relay_log_pos;
  	int info_fd;
  	IO_CACHE info_file;
--	fname[0] = '\0';
++	info_fname[0] = '\0';
  	if(innobase_overwrite_relay_log_info) {
@@ -2235,13 +2348,14 @@
  		" Updates in other storage engines may have problem with consistency.\n");
  	bzero((char*) &info_file, sizeof(info_file));
--	fn_format(fname, relay_log_info_file, mysql_data_home, "", 4+32);
++	fn_format(info_fname, relay_log_info_file, mysql_data_home, "", 4+32);
  	int error=0;
--	if (!access(fname,F_OK)) {
++	if (!access(info_fname,F_OK)) {
  		/* exist */
--		if ((info_fd = my_open(fname, O_RDWR|O_BINARY, MYF(MY_WME))) < 0) {
++		if ((info_fd = my_open(info_fname, O_RDWR | O_BINARY,
++				       MYF(MY_WME))) < 0) {
  			error=1;
  		} else if (init_io_cache(&info_file, info_fd, IO_SIZE*2,
  					READ_CACHE, 0L, 0, MYF(MY_WME))) {
@@ -2252,16 +2366,18 @@
  relay_info_error:
  			if (info_fd >= 0)
  				my_close(info_fd, MYF(0));
--			fname[0] = '\0';
++			master_log_fname[0] = '\0';
  			goto skip_relay;
+ 		}
  	} else {
--		fname[0] = '\0';
++		master_log_fname[0] = '\0';
  		goto skip_relay;
+ 	}
--	if (init_strvar_from_file(fname, sizeof(fname), &info_file, "") || /* dummy (it is relay-log) */
--	    init_intvar_from_file(&pos, &info_file, BIN_LOG_HEADER_SIZE)) {
++	if (init_strvar_from_file(relay_log_fname, sizeof(relay_log_fname),
++				  &info_file, "")
++	    || /* dummy (it is relay-log) */ init_intvar_from_file(
++		    &relay_log_pos, &info_file, BIN_LOG_HEADER_SIZE)) {
  		end_io_cache(&info_file);
  		error=1;
  		goto relay_info_error;
@@ -2270,13 +2386,19 @@
  	fprintf(stderr,
  		"InnoDB: relay-log.info is detected.\n"
  		"InnoDB: relay log: position %u, file name %s\n",
--		pos, fname);
--
--	strncpy(trx_sys_mysql_relay_log_name, fname, TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
--	trx_sys_mysql_relay_log_pos = (ib_int64_t) pos;
--
--	if (init_strvar_from_file(fname, sizeof(fname), &info_file, "") ||
--	    init_intvar_from_file(&pos, &info_file, 0)) {
++		relay_log_pos, relay_log_fname);
++
++	strncpy(trx_sys_mysql_relay_log_name, relay_log_fname,
++		TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++	trx_sys_mysql_relay_log_pos = (ib_int64_t) relay_log_pos;
++
++	strncpy(original_relay_log_fname, relay_log_fname,
++		TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++	original_relay_log_pos = relay_log_pos;
++
++	if (init_strvar_from_file(master_log_fname, sizeof(master_log_fname),
++				  &info_file, "")
++	    || init_intvar_from_file(&master_log_pos, &info_file, 0)) {
  		end_io_cache(&info_file);
  		error=1;
  		goto relay_info_error;
@@ -2284,10 +2406,15 @@
  	fprintf(stderr,
  		"InnoDB: master log: position %u, file name %s\n",
--		pos, fname);
--
--	strncpy(trx_sys_mysql_master_log_name, fname, TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
--	trx_sys_mysql_master_log_pos = (ib_int64_t) pos;
++		master_log_pos, master_log_fname);
++
++	strncpy(trx_sys_mysql_master_log_name, master_log_fname,
++		TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++	trx_sys_mysql_master_log_pos = (ib_int64_t) master_log_pos;
++
++	strncpy(original_master_log_fname, master_log_fname,
++		TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++	original_master_log_pos = master_log_pos;
  	end_io_cache(&info_file);
  	if (info_fd >= 0)
@@ -2587,75 +2714,9 @@
  		goto mem_free_and_error;
+ 	}
--#ifdef HAVE_REPLICATION
--#ifdef MYSQL_SERVER
  	if(innobase_overwrite_relay_log_info) {
--	/* If InnoDB progressed from relay-log.info, overwrite it */
--	if (fname[0] == '\0') {
--		fprintf(stderr,
--			"InnoDB: something wrong with relay-info.log. InnoDB will not overwrite it.\n");
--	} else if (0 != strcmp(fname, trx_sys_mysql_master_log_name)
--		   || pos != trx_sys_mysql_master_log_pos) {
--		/* Overwrite relay-log.info */
--		bzero((char*) &info_file, sizeof(info_file));
--		fn_format(fname, relay_log_info_file, mysql_data_home, "", 4+32);
--
--		int error = 0;
--
--		if (!access(fname,F_OK)) {
--			/* exist */
--			if ((info_fd = my_open(fname, O_RDWR|O_BINARY, MYF(MY_WME))) < 0) {
--				error = 1;
--			} else if (init_io_cache(&info_file, info_fd, IO_SIZE*2,
--						WRITE_CACHE, 0L, 0, MYF(MY_WME))) {
--				error = 1;
--			}
--
--			if (error) {
--				if (info_fd >= 0)
--					my_close(info_fd, MYF(0));
--				goto skip_overwrite;
--			}
--		} else {
--			error = 1;
--			goto skip_overwrite;
--		}
--
--		char buff[FN_REFLEN*2+22*2+4], *pos;
--
--		my_b_seek(&info_file, 0L);
--		pos=strmov(buff, trx_sys_mysql_relay_log_name);
--		*pos++='\n';
--		pos=longlong2str(trx_sys_mysql_relay_log_pos, pos, 10);
--		*pos++='\n';
--		pos=strmov(pos, trx_sys_mysql_master_log_name);
--		*pos++='\n';
--		pos=longlong2str(trx_sys_mysql_master_log_pos, pos, 10);
--		*pos='\n';
--
--		if (my_b_write(&info_file, (uchar*) buff, (size_t) (pos-buff)+1))
--			error = 1;
--		if (flush_io_cache(&info_file))
--			error = 1;
--
--		end_io_cache(&info_file);
--		if (info_fd >= 0)
--			my_close(info_fd, MYF(0));
--skip_overwrite:
--		if (error) {
--			fprintf(stderr,
--				"InnoDB: ERROR: error occured during overwriting relay-log.info.\n");
--		} else {
--			fprintf(stderr,
--				"InnoDB: relay-log.info was overwritten.\n");
--		}
--	} else {
--		fprintf(stderr,
--			"InnoDB: InnoDB and relay-log.info are synchronized. InnoDB will not overwrite it.\n");
--	}
--	}
--#endif /* MYSQL_SERVER */
--#endif /* HAVE_REPLICATION */
++		innobase_do_overwrite_relay_log_info();
++	}
  	innobase_open_tables = hash_create(200);
  	pthread_mutex_init(&innobase_share_mutex, MY_MUTEX_INIT_FAST);
@@ -2757,38 +2818,50 @@
  		| HA_ONLINE_ADD_PK_INDEX_NO_WRITES);
+ }
--/*****************************************************************//**
--Commits a transaction in an InnoDB database. */
++/****************************************************************//**
++Copy the current replication position from MySQL to a transaction. */
  static
  void
--innobase_commit_low(
--/*================*/
--	trx_t*	trx)	/*!< in: transaction handle */
++innobase_copy_repl_coords_to_trx(
++/*=============================*/
++	const THD*	thd,	/*!< in: thread handle */
++	trx_t*		trx)	/*!< in/out: transaction */
+ {
--	if (trx->conc_state == TRX_NOT_STARTED) {
--
--		return;
--	}
--
--#ifdef HAVE_REPLICATION
--#ifdef MYSQL_SERVER
--	THD *thd=current_thd;
--
  	if (thd && thd->slave_thread) {
--		/* Update the replication position info inside InnoDB */
++		const Relay_log_info*	rli = &active_mi->rli;
++
  		trx->mysql_master_log_file_name
--			= active_mi->rli.group_master_log_name;
++			= rli->group_master_log_name;
  		trx->mysql_master_log_pos
--			= ((ib_int64_t)active_mi->rli.group_master_log_pos +
--			   ((ib_int64_t)active_mi->rli.future_event_relay_log_pos -
--			    (ib_int64_t)active_mi->rli.group_relay_log_pos));
++			= ((ib_int64_t)rli->group_master_log_pos
++			   + ((ib_int64_t)
++			      rli->future_event_relay_log_pos
++			      - (ib_int64_t)rli->group_relay_log_pos));
  		trx->mysql_relay_log_file_name
--			= active_mi->rli.group_relay_log_name;
++			= rli->group_relay_log_name;
  		trx->mysql_relay_log_pos
--			= (ib_int64_t)active_mi->rli.future_event_relay_log_pos;
--	}
--#endif /* MYSQL_SERVER */
--#endif /* HAVE_REPLICATION */
++			= (ib_int64_t)rli->future_event_relay_log_pos;
++	}
++}
++
++/*****************************************************************//**
++Commits a transaction in an InnoDB database. */
++static
++void
++innobase_commit_low(
++/*================*/
++	trx_t*	trx)	/*!< in: transaction handle */
++{
++	if (trx->conc_state == TRX_NOT_STARTED) {
++
++		return;
++	}
++
++	/* Save the current replication position for write to trx sys header
++	for undo purposes, see the comment at corresponding call at
++	innobase_xa_prepare(). */
++
++	innobase_copy_repl_coords_to_trx(current_thd, trx);
  	trx_commit_for_mysql(trx);
+ }
@@ -2898,6 +2971,9 @@
  	if (all
  		|| (!thd_test_options(thd, OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))) {
++		DBUG_EXECUTE_IF("crash_innodb_before_commit",
++				DBUG_SUICIDE(););
++
  		/* We were instructed to commit the whole transaction, or
  		this is an SQL statement end and autocommit is on */
@@ -10657,7 +10733,27 @@
  		ut_ad(trx->active_trans);
++		/* Update the replication position info in current trx.  This
++		is different from the binlog position update that happens
++		during XA COMMIT.  In contrast to that, the slave position is
++		an actual part of the changes made by this transaction and thus
++		must be updated in the XA PREPARE stage.  Since the trx sys
++		header page changes are not undo-logged, again store this
++		position in a different field in the XA COMMIT stage, so that
++		it might be used in case of rollbacks. */
++
++		/* Since currently there might be only one slave SQL thread, we
++		don't need to take any precautions (e.g. prepare_commit_mutex)
++		to ensure position ordering.  Revisit this in 5.6 which has
++		both the multi-threaded replication to cause us problems and
++		the group commit to solve them.  */
++
++		innobase_copy_repl_coords_to_trx(thd, trx);
++
  		error = (int) trx_prepare_for_mysql(trx);
++
++		DBUG_EXECUTE_IF("crash_innodb_after_prepare",
++				DBUG_SUICIDE(););
  	} else {
  		/* We just mark the SQL statement ended and do not do a
  		transaction prepare */
@@ -10780,6 +10876,22 @@
  	if (trx) {
  		int	ret = innobase_rollback_trx(trx);
  		trx_free_for_background(trx);
++
++		if (innobase_overwrite_relay_log_info) {
++
++			/* On rollback of a prepared transaction revert the
++			current slave positions to the ones recorded by the
++			last COMMITTed transaction.  This has an effect of
++			undoing the position change caused by the transaction
++			being rolled back.  Assumes single-threaded slave SQL
++			thread.  If the server has non-master write traffic
++			with XA rollbacks, this will cause additional spurious
++			slave info log overwrites, which should be harmless. */
++
++			trx_sys_print_committed_mysql_master_log_pos();
++			innobase_do_overwrite_relay_log_info();
++		}
++
  		return(ret);
  	} else {
  		return(XAER_NOTA);
 === modified file 'Percona-Server/storage/innodb_plugin/include/trx0sys.h'
 --- Percona-Server/storage/innodb_plugin/include/trx0sys.h	2012-04-02 02:09:15 +0000
 +++ Percona-Server/storage/innodb_plugin/include/trx0sys.h	2012-08-16 13:46:15 +0000
@@ -357,6 +357,14 @@
  trx_sys_print_mysql_binlog_offset(void);
  /*===================================*/
  /*****************************************************************//**
++Prints to stderr the MySQL master log offset info in the trx system header
++COMMIT set of fields if the magic number shows it valid and stores it
++in global variables. */
++UNIV_INTERN
++void
++trx_sys_print_committed_mysql_master_log_pos(void);
++/*==============================================*/
++/*****************************************************************//**
  Prints to stderr the MySQL master log offset info in the trx system header if
  the magic number shows it valid. */
  UNIV_INTERN
@@ -536,10 +544,16 @@
  //# error "UNIV_PAGE_SIZE < 4096"
  //#endif
  /** The offset of the MySQL replication info in the trx system header;
--this contains the same fields as TRX_SYS_MYSQL_LOG_INFO below */
++this contains the same fields as TRX_SYS_MYSQL_LOG_INFO below.  These are
++written at prepare time and are the main copy. */
  #define TRX_SYS_MYSQL_MASTER_LOG_INFO	(UNIV_PAGE_SIZE - 2000)
  #define TRX_SYS_MYSQL_RELAY_LOG_INFO	(UNIV_PAGE_SIZE - 1500)
++/** The copy of the above which is made at transaction COMMIT time. If binlog
++crash recovery rolls back a PREPAREd transaction, they are copied back. */
++#define TRX_SYS_COMMIT_MASTER_LOG_INFO	(UNIV_PAGE_SIZE - 3000)
++#define TRX_SYS_COMMIT_RELAY_LOG_INFO	(UNIV_PAGE_SIZE - 2500)
++
  /** The offset of the MySQL binlog offset info in the trx system header */
  #define TRX_SYS_MYSQL_LOG_INFO		(UNIV_PAGE_SIZE - 1000)
  #define	TRX_SYS_MYSQL_LOG_MAGIC_N_FLD	0	/*!< magic number which is
 === modified file 'Percona-Server/storage/innodb_plugin/trx/trx0sys.c'
 --- Percona-Server/storage/innodb_plugin/trx/trx0sys.c	2012-04-02 02:09:15 +0000
 +++ Percona-Server/storage/innodb_plugin/trx/trx0sys.c	2012-08-16 13:46:15 +0000
@@ -947,8 +947,31 @@
+ }
  /*****************************************************************//**
--Prints to stderr the MySQL master log offset info in the trx system header if
--the magic number shows it valid. */
++Reads the log coordinates at the given offset in the trx sys header. */
++static
++void
++trx_sys_read_log_pos(
++/*=================*/
++	const trx_sysf_t*	sys_header,	/*!< in: the trx sys header */
++	uint			header_offset,	/*!< in: coord offset in the
++						header */
++	char*			log_fn,		/*!< out: the log file name */
++	ib_int64_t*		log_pos)	/*!< out: the log position */
++{
++	ut_memcpy(log_fn, sys_header + header_offset + TRX_SYS_MYSQL_LOG_NAME,
++		  TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
++
++	*log_pos =
++		(((ib_int64_t)mach_read_from_4(sys_header + header_offset
++				+ TRX_SYS_MYSQL_LOG_OFFSET_HIGH)) << 32)
++		+ mach_read_from_4(sys_header + header_offset
++				   + TRX_SYS_MYSQL_LOG_OFFSET_LOW);
++}
++
++/*****************************************************************//**
++Prints to stderr the MySQL master log offset info in the trx system header
++PREPARE set of fields if the magic number shows it valid and stores it
++in global variables. */
  UNIV_INTERN
  void
  trx_sys_print_mysql_master_log_pos(void)
@@ -970,60 +993,79 @@
  		return;
+ 	}
--	fprintf(stderr,
--		"InnoDB: In a MySQL replication slave the last"
--		" master binlog file\n"
--		"InnoDB: position %lu %lu, file name %s\n",
--		(ulong) mach_read_from_4(sys_header
--					 + TRX_SYS_MYSQL_MASTER_LOG_INFO
--					 + TRX_SYS_MYSQL_LOG_OFFSET_HIGH),
--		(ulong) mach_read_from_4(sys_header
--					 + TRX_SYS_MYSQL_MASTER_LOG_INFO
--					 + TRX_SYS_MYSQL_LOG_OFFSET_LOW),
--		sys_header + TRX_SYS_MYSQL_MASTER_LOG_INFO
--		+ TRX_SYS_MYSQL_LOG_NAME);
--
--	fprintf(stderr,
--		"InnoDB: and relay log file\n"
--		"InnoDB: position %lu %lu, file name %s\n",
--		(ulong) mach_read_from_4(sys_header
--					 + TRX_SYS_MYSQL_RELAY_LOG_INFO
--					 + TRX_SYS_MYSQL_LOG_OFFSET_HIGH),
--		(ulong) mach_read_from_4(sys_header
--					 + TRX_SYS_MYSQL_RELAY_LOG_INFO
--					 + TRX_SYS_MYSQL_LOG_OFFSET_LOW),
--		sys_header + TRX_SYS_MYSQL_RELAY_LOG_INFO
--		+ TRX_SYS_MYSQL_LOG_NAME);
--
  	/* Copy the master log position info to global variables we can
  	use in ha_innobase.cc to initialize glob_mi to right values */
--
--	ut_memcpy(trx_sys_mysql_master_log_name,
--		  sys_header + TRX_SYS_MYSQL_MASTER_LOG_INFO
--		  + TRX_SYS_MYSQL_LOG_NAME,
--		  TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
--
--	trx_sys_mysql_master_log_pos
--		= (((ib_int64_t) mach_read_from_4(
--			    sys_header + TRX_SYS_MYSQL_MASTER_LOG_INFO
--			    + TRX_SYS_MYSQL_LOG_OFFSET_HIGH)) << 32)
--		+ ((ib_int64_t) mach_read_from_4(
--			   sys_header + TRX_SYS_MYSQL_MASTER_LOG_INFO
--			   + TRX_SYS_MYSQL_LOG_OFFSET_LOW));
--
--	ut_memcpy(trx_sys_mysql_relay_log_name,
--		  sys_header + TRX_SYS_MYSQL_RELAY_LOG_INFO
--		  + TRX_SYS_MYSQL_LOG_NAME,
--		  TRX_SYS_MYSQL_MASTER_LOG_NAME_LEN);
--
--	trx_sys_mysql_relay_log_pos
--		= (((ib_int64_t) mach_read_from_4(
--			    sys_header + TRX_SYS_MYSQL_RELAY_LOG_INFO
--			    + TRX_SYS_MYSQL_LOG_OFFSET_HIGH)) << 32)
--		+ ((ib_int64_t) mach_read_from_4(
--			   sys_header + TRX_SYS_MYSQL_RELAY_LOG_INFO
--			   + TRX_SYS_MYSQL_LOG_OFFSET_LOW));
--	mtr_commit(&mtr);
++	trx_sys_read_log_pos(sys_header, TRX_SYS_MYSQL_MASTER_LOG_INFO,
++			     trx_sys_mysql_master_log_name,
++			     &trx_sys_mysql_master_log_pos);
++
++	trx_sys_read_log_pos(sys_header, TRX_SYS_MYSQL_RELAY_LOG_INFO,
++			     trx_sys_mysql_relay_log_name,
++			     &trx_sys_mysql_relay_log_pos);
++
++	mtr_commit(&mtr);
++
++	fprintf(stderr,
++		"InnoDB: In a MySQL replication slave the last"
++		" master binlog file\n"
++		"InnoDB: position %llu, file name %s\n",
++		trx_sys_mysql_master_log_pos,
++		trx_sys_mysql_master_log_name);
++
++	fprintf(stderr,
++		"InnoDB: and relay log file\n"
++		"InnoDB: position %llu, file name %s\n",
++		trx_sys_mysql_relay_log_pos,
++		trx_sys_mysql_relay_log_name);
++}
++
++/*****************************************************************//**
++Prints to stderr the MySQL master log offset info in the trx system header
++COMMIT set of fields if the magic number shows it valid and stores it
++in global variables. */
++UNIV_INTERN
++void
++trx_sys_print_committed_mysql_master_log_pos(void)
++/*==============================================*/
++{
++	trx_sysf_t*	sys_header;
++	mtr_t		mtr;
++
++	mtr_start(&mtr);
++
++	sys_header = trx_sysf_get(&mtr);
++
++	if (mach_read_from_4(sys_header + TRX_SYS_COMMIT_MASTER_LOG_INFO
++			     + TRX_SYS_MYSQL_LOG_MAGIC_N_FLD)
++	    != TRX_SYS_MYSQL_LOG_MAGIC_N) {
++
++		mtr_commit(&mtr);
++
++		return;
++	}
++
++	/* Copy the master log position info to global variables we can
++	   use in ha_innobase.cc to initialize glob_mi to right values */
++	trx_sys_read_log_pos(sys_header, TRX_SYS_COMMIT_MASTER_LOG_INFO,
++			     trx_sys_mysql_master_log_name,
++			     &trx_sys_mysql_master_log_pos);
++
++	trx_sys_read_log_pos(sys_header, TRX_SYS_COMMIT_RELAY_LOG_INFO,
++			     trx_sys_mysql_relay_log_name,
++			     &trx_sys_mysql_relay_log_pos);
++
++	mtr_commit(&mtr);
++
++	fprintf(stderr,
++		"InnoDB: In a MySQL replication slave the last"
++		" master binlog file\n"
++		"InnoDB: position %llu, file name %s\n",
++		trx_sys_mysql_master_log_pos, trx_sys_mysql_master_log_name);
++
++	fprintf(stderr,
++		"InnoDB: and relay log file\n"
++		"InnoDB: position %llu, file name %s\n",
++		trx_sys_mysql_relay_log_pos, trx_sys_mysql_relay_log_name);
+ }
  /****************************************************************//**
 === modified file 'Percona-Server/storage/innodb_plugin/trx/trx0trx.c'
 --- Percona-Server/storage/innodb_plugin/trx/trx0trx.c	2012-04-02 02:09:15 +0000
 +++ Percona-Server/storage/innodb_plugin/trx/trx0trx.c	2012-08-16 13:46:15 +0000
@@ -904,12 +904,12 @@
  				sys_header,
  				trx->mysql_relay_log_file_name,
  				trx->mysql_relay_log_pos,
--				TRX_SYS_MYSQL_RELAY_LOG_INFO, &mtr);
++				TRX_SYS_COMMIT_RELAY_LOG_INFO, &mtr);
  			trx_sys_update_mysql_binlog_offset(
  				sys_header,
  				trx->mysql_master_log_file_name,
  				trx->mysql_master_log_pos,
--				TRX_SYS_MYSQL_MASTER_LOG_INFO, &mtr);
++				TRX_SYS_COMMIT_MASTER_LOG_INFO, &mtr);
  			trx->mysql_master_log_file_name = "";
+ 		}
@@ -2002,6 +2002,23 @@
  		mutex_exit(&(rseg->mutex));
++		if (trx->mysql_master_log_file_name[0] != '\0') {
++			/* This database server is a MySQL replication slave */
++			trx_sysf_t*	sys_header	= trx_sysf_get(&mtr);
++
++			trx_sys_update_mysql_binlog_offset(
++				sys_header,
++				trx->mysql_relay_log_file_name,
++				trx->mysql_relay_log_pos,
++				TRX_SYS_MYSQL_RELAY_LOG_INFO, &mtr);
++			trx_sys_update_mysql_binlog_offset(
++				sys_header,
++				trx->mysql_master_log_file_name,
++				trx->mysql_master_log_pos,
++				TRX_SYS_MYSQL_MASTER_LOG_INFO, &mtr);
++			trx->mysql_master_log_file_name = "";
++		}
++
  		/*--------------*/
  		mtr_commit(&mtr);	/* This mtr commit makes the
  					transaction prepared in the file-based

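
A second note, on the buffer that innobase_do_overwrite_relay_log_info() assembles in the diff above: it consists of four newline-terminated lines, in the order relay log name, relay log position, master log name, master log position. The sketch below reproduces that layout with snprintf() so the expected file contents are easy to see. format_relay_log_info() and the file names in the usage note are assumptions made for illustration; the patch itself builds the buffer with strmov() and longlong2str().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the four lines into buf.  Returns the number
   of bytes written (excluding the terminating NUL), or -1 if the result
   does not fit into buf_size bytes. */
static int format_relay_log_info(char *buf, size_t buf_size,
				 const char *relay_name, long long relay_pos,
				 const char *master_name, long long master_pos)
{
	int	n;

	assert(buf != NULL && relay_name != NULL && master_name != NULL);

	n = snprintf(buf, buf_size, "%s\n%lld\n%s\n%lld\n",
		     relay_name, relay_pos, master_name, master_pos);

	if (n < 0 || (size_t)n >= buf_size) {
		/* Output error or truncation */
		return -1;
	}

	return n;
}
```

For example, calling it with "relay-bin.000002", 251, "master-bin.000001" and 107 yields those four values on four consecutive lines, each terminated by a newline.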