Merge lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1 into lp:percona-server/5.1

Proposed by Sergei Glushchenko
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: no longer in the source branch.
Merged at revision: 561
Proposed branch: lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1
Merge into: lp:percona-server/5.1
Diff against target: 136 lines (+112/-0)
4 files modified
Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result (+20/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt (+1/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test (+81/-0)
Percona-Server/storage/innodb_plugin/trx/trx0trx.c (+10/-0)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-server/ST28246-bug1092593-5.1
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Review via email: mp+160069@code.launchpad.net

Description of the change

Bug 1092593: crash resistant replication doesn't work correctly after change
master or reset slave
- further might look rather unstructured; so one might benefit from
  refreshing discussion here
  https://code.launchpad.net/~laurynas-biveinis/percona-server/bug1012715-5.1/+merge/120042
  before reading this
- this is how I imagine XA-transactions flow
  * BEGIN do something PREPARE COMMIT BEGIN do something PREPARE COMMIT
  * on each state ROLLBACK can be issued except after
    the COMMIT has been completed successfully and BEGIN is not issued
    (it this case ROLLBACK will just be noop)
  * ROLLBACK should leave us somewhere before the BEGIN
- when InnoDB performs recovery it takes binlog position
  from the "prepare" point.
- after this XA transaction can be reverted or committed
- if XA transaction is reverted then we take binlog position
  from "commit" point which in this case is older than "prepare"
- if XA transaction is committed then we continue to use "prepare"
  which in this case of same age as "commit" point
- this however does not work when XA transactions are not in use;
  in this case we never write "prepare" points
- this might be masked by the fact that we never have real info
  in "prepare" points so it's content is not overwrite MySQL *-info
  files; it is still bad because we don't really have transactional
  writes for binlog position as has been claimed in the documentation
- solution of this problem looks trivial for me; on commit we should
  overwrite both "prepare" and "commit"; this doesn't make any harm
  as after commit there is no way back to prepare anyway, so XA-case
  continues to work
- non XA case will look as following
  BEGIN do something COMMIT BEGIN do something COMMIT
  * commit and prepare become the one commit point after which
    there is no way back and it is reflected in our "commit" and
    "prepare" points holding the same binlog position; rollback
    is possible to the previous commit point only

http://jenkins.percona.com/view/PS%205.1/job/percona-server-5.1-param/532/

To post a comment you must log in.
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The code is OK.

Please update the bug telling the root cause of the issue (crash-resistant recovery does not work when InnoDB operates in 1PC, i.e. binlog disabled) for SEO.

Comments for the testcase:

    - The testcase does not need not_valgrind.inc. It shutdowns the
      server cleanly, which is compatible with Valgrind.
    - What does "# Kill the server without sending a shutdown
      command" mean? Is it supposed to mean the same as just "kill
      the server" as opposed to "shutdown the server"? This comment
      appears before rpl_restart_server.inc, which is a clean
      restart.
    - Why do you do shutdown and start server in separate steps in
      diff lines 108--115 instead of just restarting it? I don't see
      any action in the middle.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Laurynas,

> Please update the bug telling the root cause of the issue (crash-resistant recovery does not work when InnoDB operates in 1PC, i.e. binlog disabled) for SEO.

> Comments for the testcase:
>
> - The testcase does not need not_valgrind.inc. It shutdowns the
> server cleanly, which is compatible with Valgrind.

I kill server in this test case (look at --shutdown_server 0)

> - What does "# Kill the server without sending a shutdown
> command" mean? Is it supposed to mean the same as just "kill
> the server" as opposed to "shutdown the server"? This comment
> appears before rpl_restart_server.inc, which is a clean
> restart.

Agee, first comment is wrong, should be "restart server".

The meaning of second comes from understanding of how shutdown_server works.
It takes timeout as an argument. If timeout is > 0, then it sends
shutdown command to server and waits gives timeout seconds for
server to finish. After it kills server.
If timeout is 0, server just killed.

> - Why do you do shutdown and start server in separate steps in
> diff lines 108--115 instead of just restarting it? I don't see
> any action in the middle.

Because I need to kill server instead of clean shutdown.

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

I need to kill server because both "commit" and "prepare" log positions are written at server clean shutdown

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

I changed bug title and updated it in revision comments and test case comments.
I changed comments before first server restart and second server restart.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

    - Ah, I didn't know that shutdown_server 0 is a guaranteed
      kill. Thanks. That also answers some questions to me that
      were dependant on this.

    - Empty diff line 59.

    - Diff line 70: s/t1/x

    - Is the first restart required so that, with the bug present,
      the COMMIT position gets a value written to it which is then
      not updated but read from? Please add a comment about this
      there. There is already a comment at "This will fail ... " but
      adding some comments at earlier points would be helpful as
      well.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Reviewed test case once more, which led to drop of one of the runs. More verbose comments has been added.
http://jenkins.percona.com/view/PS%205.1/job/percona-server-5.1-param/535/

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I suggest different wordings for the testcase comments:

    - "Test the slave running with --log-slave-updates first, then
      restarting without this option, and crashing. With the bug
      present, crash recovery will restore binlog position that was
      written before the restart and thus is outdated."

    - "InnoDB and binlog are operating using two-phase commit protocol
      at slave, both "prepare" and "commit" points are updated with
      binlog coordinates"

    - "Now InnoDB is operating using one-phase commit protocol at
      slave. Before the fix, only the "commit" point was being
      updated."

    - "Kill the slave to trigger binlog position recovery from
      "prepare" point on the next startup."

    - "This will fail if the bug is present: the binlog coordinates
      at "prepare" point have been last updated before the server
      restart. After the restart the slave was running without
      --log-slave-updates, skipping the "prepare" point update. Thus
      on startup slave will read the obsolete position and fail.
      After the fix the "prepare" point will be current.

Since this is a comment-only change, no Jenkins run is necessary. Just a local run of this single testcase.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

I've fixed comments following your suggestions. Thanks for improving my English by the way :)

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) :
review: Approve
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Actually, we do need a 5.6 branch.

review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result'
2--- Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result 1970-01-01 00:00:00 +0000
3+++ Percona-Server/mysql-test/suite/rpl/r/rpl_percona_bug1092593.result 2013-05-10 09:38:28 +0000
4@@ -0,0 +1,20 @@
5+include/master-slave.inc
6+[connection master]
7+DROP TABLE IF EXISTS x;
8+CREATE TABLE x (a INT) engine=InnoDB;
9+INSERT INTO x VALUES (1);
10+include/rpl_restart_server.inc [server_number=2 parameters: --log-slave-updates=FALSE]
11+include/start_slave.inc
12+INSERT INTO x VALUES (2);
13+SELECT a FROM x ORDER BY a;
14+a
15+1
16+2
17+include/rpl_start_server.inc [server_number=2 parameters: --log-slave-updates=FALSE]
18+include/start_slave.inc
19+SELECT a FROM x ORDER BY a;
20+a
21+1
22+2
23+DROP TABLE x;
24+include/rpl_end.inc
25
26=== added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt'
27--- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt 1970-01-01 00:00:00 +0000
28+++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593-slave.opt 2013-05-10 09:38:28 +0000
29@@ -0,0 +1,1 @@
30+--innodb-overwrite-relay-log-info --skip-core-file --skip-stack-trace --log-bin --log-slave-updates
31
32=== added file 'Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test'
33--- Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test 1970-01-01 00:00:00 +0000
34+++ Percona-Server/mysql-test/suite/rpl/t/rpl_percona_bug1092593.test 2013-05-10 09:38:28 +0000
35@@ -0,0 +1,81 @@
36+###########################################################################
37+# Bug 1092593: crash-resistant replication doesn't work when InnoDB
38+# operates with binary log disabled
39+#
40+# Test the slave running with --log-slave-updates first, then
41+# restarting without this option, and crashing. With the bug
42+# present, crash recovery will restore binlog position that was
43+# written before the restart and thus is outdated
44+###########################################################################
45+
46+--source include/have_innodb_plugin.inc
47+--source include/not_valgrind.inc
48+--source include/not_crashrep.inc
49+--source include/master-slave.inc
50+
51+--disable_query_log
52+call mtr.add_suppression("InnoDB: Warning: innodb_overwrite_relay_log_info is enabled.");
53+--enable_query_log
54+
55+connection master;
56+
57+# InnoDB and binlog are operating using two-phase commit protocol
58+# at slave, both "prepare" and "commit" points are updated with
59+# binlog coordinates
60+
61+--disable_warnings
62+DROP TABLE IF EXISTS x;
63+--enable_warnings
64+
65+CREATE TABLE x (a INT) engine=InnoDB;
66+
67+INSERT INTO x VALUES (1);
68+
69+sync_slave_with_master;
70+
71+# Restart the slave.
72+# Now InnoDB is operating using one-phase commit protocol at
73+# slave. Before the fix, only the "commit" point was being
74+# updated.
75+--let $rpl_server_number= 2
76+--let $rpl_server_parameters= --log-slave-updates=FALSE
77+--source include/rpl_restart_server.inc
78+--source include/start_slave.inc
79+
80+connection master;
81+
82+INSERT INTO x VALUES (2);
83+
84+sync_slave_with_master;
85+
86+SELECT a FROM x ORDER BY a;
87+
88+# Kill the slave to trigger binlog position recovery from
89+# "prepare" point on the next startup
90+-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
91+-- shutdown_server 0
92+-- source include/wait_until_disconnected.inc
93+
94+--let $rpl_server_number= 2
95+--let $rpl_server_parameters= --log-slave-updates=FALSE
96+--source include/rpl_start_server.inc
97+
98+# This will fail if the bug is present: the binlog coordinates
99+# at "prepare" point have been last updated before the server
100+# restart. After the restart the slave was running without
101+# --log-slave-updates, skipping the "prepare" point update. Thus
102+# on startup slave will read the obsolete position and fail.
103+# After the fix the "prepare" point will be current.
104+--source include/start_slave.inc
105+
106+connection master;
107+
108+sync_slave_with_master;
109+
110+SELECT a FROM x ORDER BY a;
111+
112+connection master;
113+
114+DROP TABLE x;
115+
116+--source include/rpl_end.inc
117
118=== modified file 'Percona-Server/storage/innodb_plugin/trx/trx0trx.c'
119--- Percona-Server/storage/innodb_plugin/trx/trx0trx.c 2012-08-16 13:36:42 +0000
120+++ Percona-Server/storage/innodb_plugin/trx/trx0trx.c 2013-05-10 09:38:28 +0000
121@@ -910,6 +910,16 @@
122 trx->mysql_master_log_file_name,
123 trx->mysql_master_log_pos,
124 TRX_SYS_COMMIT_MASTER_LOG_INFO, &mtr);
125+ trx_sys_update_mysql_binlog_offset(
126+ sys_header,
127+ trx->mysql_relay_log_file_name,
128+ trx->mysql_relay_log_pos,
129+ TRX_SYS_MYSQL_RELAY_LOG_INFO, &mtr);
130+ trx_sys_update_mysql_binlog_offset(
131+ sys_header,
132+ trx->mysql_master_log_file_name,
133+ trx->mysql_master_log_pos,
134+ TRX_SYS_MYSQL_MASTER_LOG_INFO, &mtr);
135 trx->mysql_master_log_file_name = "";
136 }
137

Subscribers

People subscribed via source and target branches