Percona Server moved to https://jira.percona.com/projects/PS

Merge lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1 into lp:percona-server/5.1

Proposed by Sergei Glushchenko on 2013-04-22

Status: Merged
Approved by: Laurynas Biveinis on 2013-05-14
Approved revision: no longer in the source branch.
Merged at revision: 561
Proposed branch: lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1
Merge into: lp:percona-server/5.1
Diff against target: 136 lines (+112/-0), 4 files modified

Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result (+20/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt (+1/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test (+81/-0)
Percona-Server/storage/innodb_plugin/trx/trx0trx.c (+10/-0)

To merge this branch:

bzr merge lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1

Medium

Fix Released

Reviewer	Review Type	Date Requested	Status
Laurynas Biveinis (community)		2013-04-22	Approve on 2013-05-14
Review via email: mp+160069@code.launchpad.net

Description of the change

Bug 1092593: crash resistant replication doesn't work correctly after change
master or reset slave
- what follows might look rather unstructured, so it may help to
  reread the discussion at
  https://code.launchpad.net/~laurynas-biveinis/percona-server/bug1012715-5.1/+merge/120042
  before reading this
- this is how I imagine the flow of XA transactions
  * BEGIN do something PREPARE COMMIT BEGIN do something PREPARE COMMIT
  * ROLLBACK can be issued in any state, except that after a COMMIT
    has completed successfully and before the next BEGIN is issued
    it is just a no-op
  * ROLLBACK should leave us somewhere before the BEGIN
- when InnoDB performs recovery it takes the binlog position
  from the "prepare" point
- after this the XA transaction can be either rolled back or committed
- if the XA transaction is rolled back then we take the binlog position
  from the "commit" point, which in this case is older than "prepare"
- if the XA transaction is committed then we continue to use "prepare",
  which in this case is of the same age as the "commit" point
- this however does not work when XA transactions are not in use;
  in this case we never write "prepare" points
- this might be masked by the fact that we never have real info
  in "prepare" points, so their content does not overwrite the MySQL *-info
  files; it is still bad because we don't really have transactional
  writes for the binlog position, as has been claimed in the documentation
- the solution to this problem looks trivial to me: on commit we should
  overwrite both "prepare" and "commit"; this does no harm
  because after commit there is no way back to prepare anyway, so the
  XA case continues to work
- the non-XA case will look as follows
  BEGIN do something COMMIT BEGIN do something COMMIT
  * commit and prepare become a single commit point after which
    there is no way back, and this is reflected in our "commit" and
    "prepare" points holding the same binlog position; rollback
    is possible to the previous commit point only
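
The idea in the description can be illustrated with a small stand-alone model. This is only a sketch under simplifying assumptions, not the InnoDB code: `binlog_slots`, `on_prepare`, `on_commit`, `recover_pos` and `run_scenario` are hypothetical names, positions are plain integers with no log file names, and recovery covers only the case where no prepared transaction is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two binlog-position slots kept in the
   InnoDB system header; the real code stores log file names as well. */
typedef struct {
    unsigned long prepare_pos; /* "prepare" point */
    unsigned long commit_pos;  /* "commit" point */
} binlog_slots;

/* 2PC only: the prepare step records the position in the "prepare" slot. */
static void on_prepare(binlog_slots *s, unsigned long pos)
{
    assert(s != NULL);
    s->prepare_pos = pos;
}

/* Commit always updates "commit"; with the fix it also updates "prepare". */
static void on_commit(binlog_slots *s, unsigned long pos, bool with_fix)
{
    assert(s != NULL);
    s->commit_pos = pos;
    if (with_fix) {
        s->prepare_pos = pos;
    }
}

/* Simplified recovery: no prepared transaction is pending, so the
   position is taken from the "prepare" slot. */
static unsigned long recover_pos(const binlog_slots *s)
{
    assert(s != NULL);
    return s->prepare_pos;
}

/* One 2PC transaction at position 100, then one 1PC transaction at
   position 200, then a crash. Returns the position recovery would use. */
static unsigned long run_scenario(bool with_fix)
{
    binlog_slots s = {0, 0};

    on_prepare(&s, 100);          /* 2PC: prepare ... */
    on_commit(&s, 100, with_fix); /* ... then commit */

    on_commit(&s, 200, with_fix); /* 1PC: no prepare step at all */

    return recover_pos(&s);
}
```

Under these assumptions `run_scenario(false)` returns the stale position 100, while `run_scenario(true)` returns 200, because the commit step keeps both slots in step.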

http://jenkins.percona.com/view/PS%205.1/job/percona-server-5.1-param/532/

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-07:

The code is OK.

Please update the bug telling the root cause of the issue (crash-resistant recovery does not work when InnoDB operates in 1PC, i.e. binlog disabled) for SEO.

Comments for the testcase:

    - The testcase does not need not_valgrind.inc. It shuts down the
      server cleanly, which is compatible with Valgrind.
    - What does "# Kill the server without sending a shutdown
      command" mean? Is it supposed to mean the same as just "kill
      the server" as opposed to "shutdown the server"? This comment
      appears before rpl_restart_server.inc, which is a clean
      restart.
    - Why do you do shutdown and start server in separate steps in
      diff lines 108--115 instead of just restarting it? I don't see
      any action in the middle.

review: Needs Fixing

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-05-07:

Laurynas,

> Please update the bug telling the root cause of the issue (crash-resistant recovery does not work when InnoDB operates in 1PC, i.e. binlog disabled) for SEO.

> Comments for the testcase:
>
> - The testcase does not need not_valgrind.inc. It shuts down the
> server cleanly, which is compatible with Valgrind.

I kill the server in this test case (see --shutdown_server 0).

> - What does "# Kill the server without sending a shutdown
> command" mean? Is it supposed to mean the same as just "kill
> the server" as opposed to "shutdown the server"? This comment
> appears before rpl_restart_server.inc, which is a clean
> restart.

Agreed, the first comment is wrong; it should be "restart server".

The meaning of the second one follows from how shutdown_server works.
It takes a timeout as an argument. If the timeout is > 0, it sends
a shutdown command to the server and gives the server that many
seconds to finish. After that it kills the server.
If the timeout is 0, the server is just killed.
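
The timeout behaviour described above can be sketched as a small POSIX program. This is a hypothetical model, not mysqltest code: `stop_server` and `start_fake_server` are made-up names, SIGTERM stands in for the shutdown command, and the "server" is just a child process that sleeps until it is signalled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical model of the described behaviour. With timeout_sec > 0 a
   polite request (SIGTERM here, an assumption) is sent first and the
   child gets up to timeout_sec seconds to exit; otherwise, or when the
   timeout expires, the child is killed. Returns the wait status. */
static int stop_server(pid_t pid, int timeout_sec)
{
    int status = 0;

    assert(pid > 0);          /* never signal a process group or -1 */
    assert(timeout_sec >= 0);

    if (timeout_sec > 0) {
        kill(pid, SIGTERM);
        /* Poll in 100 ms steps until the child exits or time runs out. */
        for (int i = 0; i < timeout_sec * 10; i++) {
            if (waitpid(pid, &status, WNOHANG) == pid) {
                return status;
            }
            struct timespec ts = {0, 100 * 1000 * 1000};
            nanosleep(&ts, NULL);
        }
    }

    /* Timeout of 0, or the grace period expired: kill unconditionally. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return status;
}

/* Starts a stand-in "server": a child that sleeps until signalled.
   Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t start_fake_server(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    return pid;
}
```

In this model a call such as `stop_server(pid, 0)` skips the polite request entirely, which corresponds to the hard kill the test case relies on.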

> - Why do you do shutdown and start server in separate steps in
> diff lines 108--115 instead of just restarting it? I don't see
> any action in the middle.

Because I need to kill the server instead of shutting it down cleanly.

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-05-07:

I need to kill the server because both the "commit" and "prepare" log positions are written on a clean server shutdown.

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-05-07:

I changed the bug title and updated it in the revision comments and test case comments.
I changed the comments before the first and second server restarts.

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-07:

    - Ah, I didn't know that shutdown_server 0 is a guaranteed
      kill. Thanks. That also answers some questions of mine that
      were dependent on this.

    - Empty diff line 59.

    - Diff line 70: s/t1/x

    - Is the first restart required so that, with the bug present,
      the COMMIT position gets a value written to it which is then
      not updated but read from? Please add a comment about this
      there. There is already a comment at "This will fail ... " but
      adding some comments at earlier points would be helpful as
      well.

review: Needs Fixing

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-05-09:

Reviewed the test case once more, which led to dropping one of the runs. More verbose comments have been added.
http://jenkins.percona.com/view/PS%205.1/job/percona-server-5.1-param/535/

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-10:

I suggest different wordings for the testcase comments:

    - "Test the slave running with --log-slave-updates first, then
      restarting without this option, and crashing. With the bug
      present, crash recovery will restore binlog position that was
      written before the restart and thus is outdated."

    - "InnoDB and binlog are operating using two-phase commit protocol
      at slave, both "prepare" and "commit" points are updated with
      binlog coordinates"

    - "Now InnoDB is operating using one-phase commit protocol at
      slave. Before the fix, only the "commit" point was being
      updated."

    - "Kill the slave to trigger binlog position recovery from
      "prepare" point on the next startup."

    - "This will fail if the bug is present: the binlog coordinates
      at "prepare" point have been last updated before the server
      restart. After the restart the slave was running without
      --log-slave-updates, skipping the "prepare" point update. Thus
      on startup slave will read the obsolete position and fail.
      After the fix the "prepare" point will be current."

Since this is a comment-only change, no Jenkins run is necessary. Just a local run of this single testcase.

review: Needs Fixing

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-05-10:

I've fixed the comments following your suggestions. Thanks for improving my English, by the way :)

Laurynas Biveinis (laurynas-biveinis) on 2013-05-11:

review: Approve

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-13:

Actually, we do need a 5.6 branch.

review: Needs Fixing

Laurynas Biveinis (laurynas-biveinis) on 2013-05-14:

review: Approve

Preview Diff

=== added file 'Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result'
--- Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result	1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result	2013-05-10 09:38:28 +0000
@@ -0,0 +1,20 @@
+include/master-slave.inc
+[connection master]
+DROP TABLE IF EXISTS x;
+CREATE TABLE x (a INT) engine=InnoDB;
+INSERT INTO x VALUES (1);
+include/rpl_restart_server.inc [server_number=2 parameters: --log-slave-updates=FALSE]
+include/start_slave.inc
+INSERT INTO x VALUES (2);
+SELECT a FROM x ORDER BY a;
+a
+1
+2
+include/rpl_start_server.inc [server_number=2 parameters: --log-slave-updates=FALSE]
+include/start_slave.inc
+SELECT a FROM x ORDER BY a;
+a
+1
+2
+DROP TABLE x;
+include/rpl_end.inc
=== added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt'
--- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt	1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt	2013-05-10 09:38:28 +0000
@@ -0,0 +1,1 @@
+--innodb-overwrite-relay-log-info --skip-core-file --skip-stack-trace --log-bin --log-slave-updates
=== added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test'
--- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test	1970-01-01 00:00:00 +0000
+++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test	2013-05-10 09:38:28 +0000
@@ -0,0 +1,81 @@
+###########################################################################
+# Bug 1092593: crash-resistant replication doesn't work when InnoDB
+#              operates with binary log disabled
+#
+# Test the slave running with --log-slave-updates first, then
+# restarting without this option, and crashing.  With the bug
+# present, crash recovery will restore binlog position that was
+# written before the restart and thus is outdated
+###########################################################################
+
+--source include/have_innodb_plugin.inc
+--source include/not_valgrind.inc
+--source include/not_crashrep.inc
+--source include/master-slave.inc
+
+--disable_query_log
+call mtr.add_suppression("InnoDB: Warning: innodb_overwrite_relay_log_info is enabled.");
+--enable_query_log
+
+connection master;
+
+# InnoDB and binlog are operating using two-phase commit protocol
+# at slave, both "prepare" and "commit" points are updated with
+# binlog coordinates
+
+--disable_warnings
+DROP TABLE IF EXISTS x;
+--enable_warnings
+
+CREATE TABLE x (a INT) engine=InnoDB;
+
+INSERT INTO x VALUES (1);
+
+sync_slave_with_master;
+
+# Restart the slave.
+# Now InnoDB is operating using one-phase commit protocol at
+# slave.  Before the fix, only the "commit" point was being
+# updated.
+--let $rpl_server_number= 2
+--let $rpl_server_parameters= --log-slave-updates=FALSE
+--source include/rpl_restart_server.inc
+--source include/start_slave.inc
+
+connection master;
+
+INSERT INTO x VALUES (2);
+
+sync_slave_with_master;
+
+SELECT a FROM x ORDER BY a;
+
+# Kill the slave to trigger binlog position recovery from
+# "prepare" point on the next startup
+-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
+-- shutdown_server 0
+-- source include/wait_until_disconnected.inc
+
+--let $rpl_server_number= 2
+--let $rpl_server_parameters= --log-slave-updates=FALSE
+--source include/rpl_start_server.inc
+
+# This will fail if the bug is present: the binlog coordinates
+# at "prepare" point have been last updated before the server
+# restart.  After the restart the slave was running without
+# --log-slave-updates, skipping the "prepare" point update.  Thus
+# on startup slave will read the obsolete position and fail.
+# After the fix the "prepare" point will be current.
+--source include/start_slave.inc
+
+connection master;
+
+sync_slave_with_master;
+
+SELECT a FROM x ORDER BY a;
+
+connection master;
+
+DROP TABLE x;
+
+--source include/rpl_end.inc
=== modified file 'Percona-Server/storage/innodb_plugin/trx/trx0trx.c'
--- Percona-Server/storage/innodb_plugin/trx/trx0trx.c	2012-08-16 13:36:42 +0000
+++ Percona-Server/storage/innodb_plugin/trx/trx0trx.c	2013-05-10 09:38:28 +0000
@@ -910,6 +910,16 @@
 				trx->mysql_master_log_file_name,
 				trx->mysql_master_log_pos,
 				TRX_SYS_COMMIT_MASTER_LOG_INFO, &mtr);
+			trx_sys_update_mysql_binlog_offset(
+				sys_header,
+				trx->mysql_relay_log_file_name,
+				trx->mysql_relay_log_pos,
+				TRX_SYS_MYSQL_RELAY_LOG_INFO, &mtr);
+			trx_sys_update_mysql_binlog_offset(
+				sys_header,
+				trx->mysql_master_log_file_name,
+				trx->mysql_master_log_pos,
+				TRX_SYS_MYSQL_MASTER_LOG_INFO, &mtr);
 			trx->mysql_master_log_file_name = "";
 		}
