Merge lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586 into lp:percona-server/5.6

Proposed by Sergei Glushchenko
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: no longer in the source branch.
Merged at revision: 674
Proposed branch: lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586
Merge into: lp:percona-server/5.6
Diff against target: 526 lines (+409/-1)
9 files modified
mysql-test/suite/rpl/r/percona_bug1331586.result (+139/-0)
mysql-test/suite/rpl/t/percona_bug1331586-master.opt (+3/-0)
mysql-test/suite/rpl/t/percona_bug1331586-slave.opt (+5/-0)
mysql-test/suite/rpl/t/percona_bug1331586.test (+148/-0)
sql/log_event.h (+25/-0)
sql/rpl_master.cc (+10/-0)
sql/rpl_rli.cc (+2/-1)
sql/rpl_rli.h (+23/-0)
sql/rpl_slave.cc (+54/-0)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Review via email: mp+229168@code.launchpad.net

Description of the change

This patch addresses the issue when incomplete group is read from the
relay log my MTS slave. I tested single-thread slave and it handles
this case correctly.

Incomplete group (not ended by COMMIT/ROLLBACK/XID) can be written
into relay log in two cases:

1. Slave IO thread restart in the middle of the group.
2. Slave crash or shutdown while receiving log group from master.
3. Master gone when sending events to slave.
4. FLUSH LOGS on slave.

First three cases produce incomplete group at the end of the relay
log file which should be rolled back, while 4th produces log group
spread into two relay log files which should be committed. To
distinguish between these two cases we look into log format events
and stop events. Stop event should always do rollback while format
event should cause rollback in case when "created" flag is set.

Rollback is made by returning artificially generated "ROLLBACK" from
next_event() when needed.

To make test case possible I injected debug-only sync points in
certain places in master IO thread and slave IO thread to produce
incomplete log groups.

To post a comment you must log in.
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

    - I wonder whether the Relay_log_info::rli_next_event is properly
      cleaned up in all error code paths, such as slave killed while
      rli_next_event != NULL.

The rest of review is in the diff comments, mostly typos and such.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/729/

Changes:

* typo in cleanup
* should_rollback renamed to should_rollback_current_group and made const
* rli_next_event made private
* removed newline in rpl_slave.cc
* doesnt in comments replaced with doesn't
* To prevent MTS from stall -> To prevent MTS from stalling
* removed #if 0'ed block in rpl_slave.cc
* added cleanup code into rpl_slave.cc (delete rli_nex_event)

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== added file 'mysql-test/suite/rpl/r/percona_bug1331586.result'
--- mysql-test/suite/rpl/r/percona_bug1331586.result 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/r/percona_bug1331586.result 2014-10-06 10:07:51 +0000
@@ -0,0 +1,139 @@
1include/master-slave.inc
2Warnings:
3Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
4Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
5[connection master]
6call mtr.add_suppression("Slave:*");
7create table a(id int) engine=InnoDB;
8insert into a values(11);
9start transaction;
10insert into a values(12);
11set global debug="d,relay_xid_trigger";
12insert into a values(13);
13commit;
14set debug_sync="now wait_for relay_xid_reached";
15flush logs;
16set global debug="";
17set debug_sync="now signal resume";
18# master:
19select * from a;
20id
2111
2212
2313
24# slave:
25select * from a;
26id
2711
2812
2913
30insert into a values(21);
31set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
32set global debug="d,master_xid_trigger";
33start transaction;
34insert into a values(22);
35insert into a values(23);
36commit;
37set debug_sync="now wait_for master_xid_reached";
38include/rpl_restart_server.inc [server_number=2]
39include/start_slave.inc
40set global debug="";
41set debug_sync="now signal resume";
42# master:
43select * from a;
44id
4511
4612
4713
4821
4922
5023
51# slave:
52select * from a;
53id
5411
5512
5613
5721
5822
5923
60insert into a values(31);
61start transaction;
62insert into a values(32);
63set global debug="d,relay_xid_trigger";
64insert into a values(33);
65commit;
66set debug_sync="now wait_for relay_xid_reached";
67include/rpl_restart_server.inc [server_number=2]
68include/start_slave.inc
69set global debug="";
70set debug_sync="now signal resume";
71# master:
72select * from a;
73id
7411
7512
7613
7721
7822
7923
8031
8132
8233
83# slave:
84select * from a;
85id
8611
8712
8813
8921
9022
9123
9231
9332
9433
95insert into a values(41);
96set global debug="d,master_xid_trigger";
97start transaction;
98insert into a values(42);
99insert into a values(43);
100commit;
101set debug_sync="now wait_for master_xid_reached";
102set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
103include/stop_slave.inc
104set global debug="";
105include/start_slave.inc
106set global debug="";
107set debug_sync="now signal resume";
108# master:
109select * from a;
110id
11111
11212
11313
11421
11522
11623
11731
11832
11933
12041
12142
12243
123# slave:
124select * from a;
125id
12611
12712
12813
12921
13022
13123
13231
13332
13433
13541
13642
13743
138drop table a;
139include/rpl_end.inc
0140
=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-master.opt'
--- mysql-test/suite/rpl/t/percona_bug1331586-master.opt 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586-master.opt 2014-10-06 10:07:51 +0000
@@ -0,0 +1,3 @@
1--gtid-mode=ON
2--enforce-gtid-consistency=true
3--log-slave-updates
04
=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-slave.opt'
--- mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 2014-10-06 10:07:51 +0000
@@ -0,0 +1,5 @@
1--slave-parallel-workers=2
2--gtid-mode=ON
3--enforce-gtid-consistency=true
4--log-slave-updates
5--slave-transaction-retries=0
06
=== added file 'mysql-test/suite/rpl/t/percona_bug1331586.test'
--- mysql-test/suite/rpl/t/percona_bug1331586.test 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586.test 2014-10-06 10:07:51 +0000
@@ -0,0 +1,148 @@
1--source include/have_debug.inc
2--source include/have_debug_sync.inc
3--source include/master-slave.inc
4--source include/have_innodb.inc
5--source include/have_binlog_format_statement.inc
6
7call mtr.add_suppression("Slave:*");
8
9create table a(id int) engine=InnoDB;
10insert into a values(11);
11
12sync_slave_with_master;
13connection master;
14
15start transaction;
16insert into a values(12);
17
18connection slave;
19set global debug="d,relay_xid_trigger";
20
21connection master;
22insert into a values(13);
23commit;
24
25connection slave;
26set debug_sync="now wait_for relay_xid_reached";
27
28flush logs;
29
30set global debug="";
31set debug_sync="now signal resume";
32
33connection master;
34echo # master: ;
35select * from a;
36
37sync_slave_with_master;
38echo # slave: ;
39select * from a;
40
41# round two
42connection master;
43insert into a values(21);
44
45sync_slave_with_master;
46set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
47connection master;
48set global debug="d,master_xid_trigger";
49
50start transaction;
51insert into a values(22);
52insert into a values(23);
53commit;
54
55set debug_sync="now wait_for master_xid_reached";
56
57connection slave;
58--let $rpl_server_number= 2
59--source include/rpl_restart_server.inc
60--source include/start_slave.inc
61
62connection master;
63set global debug="";
64set debug_sync="now signal resume";
65
66echo # master: ;
67select * from a;
68
69sync_slave_with_master;
70echo # slave: ;
71select * from a;
72
73# round three
74connection master;
75insert into a values(31);
76
77sync_slave_with_master;
78connection master;
79
80start transaction;
81insert into a values(32);
82
83connection slave;
84set global debug="d,relay_xid_trigger";
85
86connection master;
87insert into a values(33);
88commit;
89
90connection slave;
91set debug_sync="now wait_for relay_xid_reached";
92
93--let $rpl_server_number= 2
94--let $rpl_force_stop= 1
95--source include/rpl_restart_server.inc
96--source include/start_slave.inc
97
98set global debug="";
99set debug_sync="now signal resume";
100
101connection master;
102echo # master: ;
103select * from a;
104
105sync_slave_with_master;
106echo # slave: ;
107select * from a;
108
109# round four
110connection master;
111insert into a values(41);
112
113sync_slave_with_master;
114connection master;
115set global debug="d,master_xid_trigger";
116
117start transaction;
118insert into a values(42);
119insert into a values(43);
120commit;
121
122set debug_sync="now wait_for master_xid_reached";
123
124connection slave;
125set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
126--source include/stop_slave.inc
127set global debug="";
128--source include/start_slave.inc
129
130connection master;
131set global debug="";
132set debug_sync="now signal resume";
133
134echo # master: ;
135select * from a;
136
137sync_slave_with_master;
138echo # slave: ;
139select * from a;
140
141# cleanup
142
143connection master;
144drop table a;
145
146sync_slave_with_master;
147
148--source include/rpl_end.inc
0149
=== modified file 'sql/log_event.h'
--- sql/log_event.h 2014-09-25 14:16:07 +0000
+++ sql/log_event.h 2014-10-06 10:07:51 +0000
@@ -1389,6 +1389,15 @@
13891389
1390#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)1390#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
13911391
1392 /**
1393 Re-implement this function for events which should rollback current group,
1394 such as "ROLLBACK" statement, format description event or stop event.
1395 */
1396 virtual bool should_rollback_current_group() const
1397 {
1398 return false;
1399 }
1400
1392private:1401private:
13931402
1394 /*1403 /*
@@ -2673,6 +2682,14 @@
2673 return START_V3_HEADER_LEN; //no variable-sized part2682 return START_V3_HEADER_LEN; //no variable-sized part
2674 }2683 }
26752684
2685 /**
2686 New log file rolls the current group back if "created" property is set.
2687 */
2688 virtual bool should_rollback_current_group() const
2689 {
2690 return created != 0;
2691 }
2692
2676protected:2693protected:
2677#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)2694#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
2678 virtual int do_apply_event(Relay_log_info const *rli);2695 virtual int do_apply_event(Relay_log_info const *rli);
@@ -3075,6 +3092,14 @@
3075 Log_event_type get_type_code() { return STOP_EVENT;}3092 Log_event_type get_type_code() { return STOP_EVENT;}
3076 bool is_valid() const { return 1; }3093 bool is_valid() const { return 1; }
30773094
3095 /**
3096 Always rollback current group
3097 */
3098 virtual bool should_rollback_current_group() const
3099 {
3100 return true;
3101 }
3102
3078private:3103private:
3079#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)3104#if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
3080 virtual int do_update_pos(Relay_log_info *rli);3105 virtual int do_update_pos(Relay_log_info *rli);
30813106
=== modified file 'sql/rpl_master.cc'
--- sql/rpl_master.cc 2014-09-25 14:16:07 +0000
+++ sql/rpl_master.cc 2014-10-06 10:07:51 +0000
@@ -1402,6 +1402,16 @@
1402 GOTO_ERR;1402 GOTO_ERR;
1403 }1403 }
14041404
1405 DBUG_EXECUTE_IF("master_xid_trigger",
1406 {
1407 Log_event_type event_type= (Log_event_type)
1408 (*packet)[EVENT_TYPE_OFFSET + ev_offset];
1409 if (event_type == XID_EVENT)
1410 {
1411 const char act[]= "now signal master_xid_reached wait_for resume";
1412 DBUG_ASSERT(!debug_sync_set_action(current_thd,
1413 STRING_WITH_LEN(act)));
1414 }});
14051415
1406 DBUG_EXECUTE_IF("dump_thread_wait_before_send_xid",1416 DBUG_EXECUTE_IF("dump_thread_wait_before_send_xid",
1407 {1417 {
14081418
=== modified file 'sql/rpl_rli.cc'
--- sql/rpl_rli.cc 2014-09-25 14:16:07 +0000
+++ sql/rpl_rli.cc 2014-10-06 10:07:51 +0000
@@ -98,7 +98,8 @@
98 mts_group_status(MTS_NOT_IN_GROUP), reported_unsafe_warning(false),98 mts_group_status(MTS_NOT_IN_GROUP), reported_unsafe_warning(false),
99 rli_description_event(NULL),99 rli_description_event(NULL),
100 sql_delay(0), sql_delay_end(0), m_flags(0), row_stmt_start_timestamp(0),100 sql_delay(0), sql_delay_end(0), m_flags(0), row_stmt_start_timestamp(0),
101 long_find_row_note_printed(false), error_on_rli_init_info(false)101 long_find_row_note_printed(false), error_on_rli_init_info(false),
102 rli_next_event(NULL)
102{103{
103 DBUG_ENTER("Relay_log_info::Relay_log_info");104 DBUG_ENTER("Relay_log_info::Relay_log_info");
104105
105106
=== modified file 'sql/rpl_rli.h'
--- sql/rpl_rli.h 2014-09-25 14:16:07 +0000
+++ sql/rpl_rli.h 2014-10-06 10:07:51 +0000
@@ -894,6 +894,22 @@
894 }894 }
895895
896 /**896 /**
897 Remember next event.
898 */
899 void set_rli_next_event(Log_event *ev)
900 {
901 rli_next_event= ev;
902 }
903
904 /**
905 Return the remembered event.
906 */
907 Log_event *get_rli_next_event() const
908 {
909 return rli_next_event;
910 }
911
912 /**
897 adaptation for the slave applier to specific master versions.913 adaptation for the slave applier to specific master versions.
898 */914 */
899 void adapt_to_master_version(Format_description_log_event *fdle);915 void adapt_to_master_version(Format_description_log_event *fdle);
@@ -965,6 +981,13 @@
965 SLAVE must be executed and the problem fixed manually.981 SLAVE must be executed and the problem fixed manually.
966 */982 */
967 bool error_on_rli_init_info;983 bool error_on_rli_init_info;
984
985 /*
986 This variable is used when we want to inject "ROLLBACK" event just before
987 the real event. The real event is remembered here and is returned by
988 the next call.
989 */
990 Log_event *rli_next_event;
968};991};
969992
970bool mysql_show_relaylog_events(THD* thd);993bool mysql_show_relaylog_events(THD* thd);
971994
=== modified file 'sql/rpl_slave.cc'
--- sql/rpl_slave.cc 2014-09-26 04:43:24 +0000
+++ sql/rpl_slave.cc 2014-10-06 10:07:51 +0000
@@ -4485,6 +4485,22 @@
4485 */4485 */
4486 THD_STAGE_INFO(thd, stage_waiting_for_master_to_send_event);4486 THD_STAGE_INFO(thd, stage_waiting_for_master_to_send_event);
4487 event_len= read_event(mysql, mi, &suppress_warnings);4487 event_len= read_event(mysql, mi, &suppress_warnings);
4488
4489 DBUG_EXECUTE_IF("relay_xid_trigger",
4490 if (event_len != packet_error)
4491 {
4492 const char* event_buf= (const char*)mysql->net.read_pos + 1;
4493 Log_event_type event_type= (Log_event_type)
4494 event_buf[EVENT_TYPE_OFFSET];
4495 if (event_type == XID_EVENT)
4496 {
4497 const char act[]= "now signal relay_xid_reached wait_for resume";
4498 DBUG_ASSERT(!debug_sync_set_action(current_thd,
4499 STRING_WITH_LEN(act)));
4500 }
4501 }
4502 );
4503
4488 if (check_io_slave_killed(thd, mi, "Slave I/O thread killed while \4504 if (check_io_slave_killed(thd, mi, "Slave I/O thread killed while \
4489reading event"))4505reading event"))
4490 goto err;4506 goto err;
@@ -6016,6 +6032,8 @@
6016 /* we die so won't remember charset - re-update them on next thread start */6032 /* we die so won't remember charset - re-update them on next thread start */
6017 rli->cached_charset_invalidate();6033 rli->cached_charset_invalidate();
6018 rli->save_temporary_tables = thd->temporary_tables;6034 rli->save_temporary_tables = thd->temporary_tables;
6035 delete rli->get_rli_next_event();
6036 rli->set_rli_next_event(NULL);
60196037
6020 /*6038 /*
6021 TODO: see if we can do this conditionally in next_event() instead6039 TODO: see if we can do this conditionally in next_event() instead
@@ -7194,6 +7212,12 @@
71947212
7195 DBUG_ASSERT(thd != 0);7213 DBUG_ASSERT(thd != 0);
71967214
7215 if ((ev = rli->get_rli_next_event()) != NULL)
7216 {
7217 rli->set_rli_next_event(NULL);
7218 DBUG_RETURN(ev);
7219 }
7220
7197#ifndef DBUG_OFF7221#ifndef DBUG_OFF
7198 if (abort_slave_event_count && !rli->events_until_exit--)7222 if (abort_slave_event_count && !rli->events_until_exit--)
7199 DBUG_RETURN(0);7223 DBUG_RETURN(0);
@@ -7318,6 +7342,36 @@
7318 sql_slave_killed(thd, rli));7342 sql_slave_killed(thd, rli));
7319 mysql_mutex_lock(&rli->data_lock);7343 mysql_mutex_lock(&rli->data_lock);
7320 }7344 }
7345 if (rli->is_parallel_exec() &&
7346 ev->should_rollback_current_group() &&
7347 (rli->curr_group_seen_begin || rli->curr_group_seen_gtid))
7348 {
7349 /* We reached the end of relay log file and it doesn't end with
7350 COMMIT or ROLLBACK. To prevent MTS from stalling we generate
7351 ROLLBACK event. Single-treaded slave takes care of such
7352 cases (see Format_description_log_event::do_apply_event).
7353 When relay_log_recovery is ON, relay log will be redownloaded
7354 from the master. */
7355 rli->report(WARNING_LEVEL, 0,
7356 "injecting ROLLBACK at the end of cold group");
7357 Query_log_event *rollback_event=
7358 new Query_log_event(thd, STRING_WITH_LEN("ROLLBACK"),
7359 TRUE, FALSE, TRUE, 0, FALSE);
7360 if (unlikely(!rollback_event))
7361 {
7362 errmsg= "Slave SQL thread failed to create a ROLLBACK event "
7363 "(out of memory?), MTS may stall";
7364 goto err;
7365 }
7366 rollback_event->data_written = 0;
7367 rollback_event->db= "";
7368 rollback_event->db_len= 0;
7369 rollback_event->server_id= 0; // won't be ignored by slave SQL thread
7370 rollback_event->set_artificial_event();
7371 rollback_event->future_event_relay_log_pos = BIN_LOG_HEADER_SIZE;
7372 rli->set_rli_next_event(ev);
7373 DBUG_RETURN(rollback_event);
7374 }
7321 DBUG_RETURN(ev);7375 DBUG_RETURN(ev);
7322 }7376 }
7323 DBUG_ASSERT(thd==rli->info_thd);7377 DBUG_ASSERT(thd==rli->info_thd);

Subscribers

People subscribed via source and target branches