Merge lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586 into lp:percona-server/5.6

Proposed by Sergei Glushchenko
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: 619
Merged at revision: 674
Proposed branch: lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586
Merge into: lp:percona-server/5.6
Diff against target: 526 lines (+409/-1)
9 files modified
mysql-test/suite/rpl/r/percona_bug1331586.result (+139/-0)
mysql-test/suite/rpl/t/percona_bug1331586-master.opt (+3/-0)
mysql-test/suite/rpl/t/percona_bug1331586-slave.opt (+5/-0)
mysql-test/suite/rpl/t/percona_bug1331586.test (+148/-0)
sql/log_event.h (+25/-0)
sql/rpl_master.cc (+10/-0)
sql/rpl_rli.cc (+2/-1)
sql/rpl_rli.h (+23/-0)
sql/rpl_slave.cc (+54/-0)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-server/5.6-ST43091-ps-bug1331586
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Review via email: mp+229168@code.launchpad.net

Description of the change

This patch addresses the issue when incomplete group is read from the
relay log my MTS slave. I tested single-thread slave and it handles
this case correctly.

Incomplete group (not ended by COMMIT/ROLLBACK/XID) can be written
into relay log in two cases:

1. Slave IO thread restart in the middle of the group.
2. Slave crash or shutdown while receiving log group from master.
3. Master gone when sending events to slave.
4. FLUSH LOGS on slave.

First three cases produce incomplete group at the end of the relay
log file which should be rolled back, while 4th produces log group
spread into two relay log files which should be committed. To
distinguish between these two cases we look into log format events
and stop events. Stop event should always do rollback while format
event should cause rollback in case when "created" flag is set.

Rollback is made by returning artificially generated "ROLLBACK" from
next_event() when needed.

To make test case possible I injected debug-only sync points in
certain places in master IO thread and slave IO thread to produce
incomplete log groups.

To post a comment you must log in.

    - I wonder whether the Relay_log_info::rli_next_event is properly
      cleaned up in all error code paths, such as slave killed while
      rli_next_event != NULL.

The rest of review is in the diff comments, mostly typos and such.

review: Needs Fixing

http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/729/

Changes:

* typo in cleanup
* should_rollback renamed to should_rollback_current_group and made const
* rli_next_event made private
* removed newline in rpl_slave.cc
* doesnt in comments replaced with doesn't
* To prevent MTS from stall -> To prevent MTS from stalling
* removed #if 0'ed block in rpl_slave.cc
* added cleanup code into rpl_slave.cc (delete rli_nex_event)

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'mysql-test/suite/rpl/r/percona_bug1331586.result'
2--- mysql-test/suite/rpl/r/percona_bug1331586.result 1970-01-01 00:00:00 +0000
3+++ mysql-test/suite/rpl/r/percona_bug1331586.result 2014-10-06 10:07:51 +0000
4@@ -0,0 +1,139 @@
5+include/master-slave.inc
6+Warnings:
7+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
8+Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
9+[connection master]
10+call mtr.add_suppression("Slave:*");
11+create table a(id int) engine=InnoDB;
12+insert into a values(11);
13+start transaction;
14+insert into a values(12);
15+set global debug="d,relay_xid_trigger";
16+insert into a values(13);
17+commit;
18+set debug_sync="now wait_for relay_xid_reached";
19+flush logs;
20+set global debug="";
21+set debug_sync="now signal resume";
22+# master:
23+select * from a;
24+id
25+11
26+12
27+13
28+# slave:
29+select * from a;
30+id
31+11
32+12
33+13
34+insert into a values(21);
35+set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
36+set global debug="d,master_xid_trigger";
37+start transaction;
38+insert into a values(22);
39+insert into a values(23);
40+commit;
41+set debug_sync="now wait_for master_xid_reached";
42+include/rpl_restart_server.inc [server_number=2]
43+include/start_slave.inc
44+set global debug="";
45+set debug_sync="now signal resume";
46+# master:
47+select * from a;
48+id
49+11
50+12
51+13
52+21
53+22
54+23
55+# slave:
56+select * from a;
57+id
58+11
59+12
60+13
61+21
62+22
63+23
64+insert into a values(31);
65+start transaction;
66+insert into a values(32);
67+set global debug="d,relay_xid_trigger";
68+insert into a values(33);
69+commit;
70+set debug_sync="now wait_for relay_xid_reached";
71+include/rpl_restart_server.inc [server_number=2]
72+include/start_slave.inc
73+set global debug="";
74+set debug_sync="now signal resume";
75+# master:
76+select * from a;
77+id
78+11
79+12
80+13
81+21
82+22
83+23
84+31
85+32
86+33
87+# slave:
88+select * from a;
89+id
90+11
91+12
92+13
93+21
94+22
95+23
96+31
97+32
98+33
99+insert into a values(41);
100+set global debug="d,master_xid_trigger";
101+start transaction;
102+insert into a values(42);
103+insert into a values(43);
104+commit;
105+set debug_sync="now wait_for master_xid_reached";
106+set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
107+include/stop_slave.inc
108+set global debug="";
109+include/start_slave.inc
110+set global debug="";
111+set debug_sync="now signal resume";
112+# master:
113+select * from a;
114+id
115+11
116+12
117+13
118+21
119+22
120+23
121+31
122+32
123+33
124+41
125+42
126+43
127+# slave:
128+select * from a;
129+id
130+11
131+12
132+13
133+21
134+22
135+23
136+31
137+32
138+33
139+41
140+42
141+43
142+drop table a;
143+include/rpl_end.inc
144
145=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-master.opt'
146--- mysql-test/suite/rpl/t/percona_bug1331586-master.opt 1970-01-01 00:00:00 +0000
147+++ mysql-test/suite/rpl/t/percona_bug1331586-master.opt 2014-10-06 10:07:51 +0000
148@@ -0,0 +1,3 @@
149+--gtid-mode=ON
150+--enforce-gtid-consistency=true
151+--log-slave-updates
152
153=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-slave.opt'
154--- mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 1970-01-01 00:00:00 +0000
155+++ mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 2014-10-06 10:07:51 +0000
156@@ -0,0 +1,5 @@
157+--slave-parallel-workers=2
158+--gtid-mode=ON
159+--enforce-gtid-consistency=true
160+--log-slave-updates
161+--slave-transaction-retries=0
162
163=== added file 'mysql-test/suite/rpl/t/percona_bug1331586.test'
164--- mysql-test/suite/rpl/t/percona_bug1331586.test 1970-01-01 00:00:00 +0000
165+++ mysql-test/suite/rpl/t/percona_bug1331586.test 2014-10-06 10:07:51 +0000
166@@ -0,0 +1,148 @@
167+--source include/have_debug.inc
168+--source include/have_debug_sync.inc
169+--source include/master-slave.inc
170+--source include/have_innodb.inc
171+--source include/have_binlog_format_statement.inc
172+
173+call mtr.add_suppression("Slave:*");
174+
175+create table a(id int) engine=InnoDB;
176+insert into a values(11);
177+
178+sync_slave_with_master;
179+connection master;
180+
181+start transaction;
182+insert into a values(12);
183+
184+connection slave;
185+set global debug="d,relay_xid_trigger";
186+
187+connection master;
188+insert into a values(13);
189+commit;
190+
191+connection slave;
192+set debug_sync="now wait_for relay_xid_reached";
193+
194+flush logs;
195+
196+set global debug="";
197+set debug_sync="now signal resume";
198+
199+connection master;
200+echo # master: ;
201+select * from a;
202+
203+sync_slave_with_master;
204+echo # slave: ;
205+select * from a;
206+
207+# round two
208+connection master;
209+insert into a values(21);
210+
211+sync_slave_with_master;
212+set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
213+connection master;
214+set global debug="d,master_xid_trigger";
215+
216+start transaction;
217+insert into a values(22);
218+insert into a values(23);
219+commit;
220+
221+set debug_sync="now wait_for master_xid_reached";
222+
223+connection slave;
224+--let $rpl_server_number= 2
225+--source include/rpl_restart_server.inc
226+--source include/start_slave.inc
227+
228+connection master;
229+set global debug="";
230+set debug_sync="now signal resume";
231+
232+echo # master: ;
233+select * from a;
234+
235+sync_slave_with_master;
236+echo # slave: ;
237+select * from a;
238+
239+# round three
240+connection master;
241+insert into a values(31);
242+
243+sync_slave_with_master;
244+connection master;
245+
246+start transaction;
247+insert into a values(32);
248+
249+connection slave;
250+set global debug="d,relay_xid_trigger";
251+
252+connection master;
253+insert into a values(33);
254+commit;
255+
256+connection slave;
257+set debug_sync="now wait_for relay_xid_reached";
258+
259+--let $rpl_server_number= 2
260+--let $rpl_force_stop= 1
261+--source include/rpl_restart_server.inc
262+--source include/start_slave.inc
263+
264+set global debug="";
265+set debug_sync="now signal resume";
266+
267+connection master;
268+echo # master: ;
269+select * from a;
270+
271+sync_slave_with_master;
272+echo # slave: ;
273+select * from a;
274+
275+# round four
276+connection master;
277+insert into a values(41);
278+
279+sync_slave_with_master;
280+connection master;
281+set global debug="d,master_xid_trigger";
282+
283+start transaction;
284+insert into a values(42);
285+insert into a values(43);
286+commit;
287+
288+set debug_sync="now wait_for master_xid_reached";
289+
290+connection slave;
291+set global debug="d,stop_slave_middle_group,incomplete_group_in_relay_log";
292+--source include/stop_slave.inc
293+set global debug="";
294+--source include/start_slave.inc
295+
296+connection master;
297+set global debug="";
298+set debug_sync="now signal resume";
299+
300+echo # master: ;
301+select * from a;
302+
303+sync_slave_with_master;
304+echo # slave: ;
305+select * from a;
306+
307+# cleanup
308+
309+connection master;
310+drop table a;
311+
312+sync_slave_with_master;
313+
314+--source include/rpl_end.inc
315
316=== modified file 'sql/log_event.h'
317--- sql/log_event.h 2014-09-25 14:16:07 +0000
318+++ sql/log_event.h 2014-10-06 10:07:51 +0000
319@@ -1389,6 +1389,15 @@
320
321 #if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
322
323+ /**
324+ Re-implement this function for events which should rollback current group,
325+ such as "ROLLBACK" statement, format description event or stop event.
326+ */
327+ virtual bool should_rollback_current_group() const
328+ {
329+ return false;
330+ }
331+
332 private:
333
334 /*
335@@ -2673,6 +2682,14 @@
336 return START_V3_HEADER_LEN; //no variable-sized part
337 }
338
339+ /**
340+ New log file rolls the current group back if "created" property is set.
341+ */
342+ virtual bool should_rollback_current_group() const
343+ {
344+ return created != 0;
345+ }
346+
347 protected:
348 #if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
349 virtual int do_apply_event(Relay_log_info const *rli);
350@@ -3075,6 +3092,14 @@
351 Log_event_type get_type_code() { return STOP_EVENT;}
352 bool is_valid() const { return 1; }
353
354+ /**
355+ Always rollback current group
356+ */
357+ virtual bool should_rollback_current_group() const
358+ {
359+ return true;
360+ }
361+
362 private:
363 #if defined(MYSQL_SERVER) && defined(HAVE_REPLICATION)
364 virtual int do_update_pos(Relay_log_info *rli);
365
366=== modified file 'sql/rpl_master.cc'
367--- sql/rpl_master.cc 2014-09-25 14:16:07 +0000
368+++ sql/rpl_master.cc 2014-10-06 10:07:51 +0000
369@@ -1402,6 +1402,16 @@
370 GOTO_ERR;
371 }
372
373+ DBUG_EXECUTE_IF("master_xid_trigger",
374+ {
375+ Log_event_type event_type= (Log_event_type)
376+ (*packet)[EVENT_TYPE_OFFSET + ev_offset];
377+ if (event_type == XID_EVENT)
378+ {
379+ const char act[]= "now signal master_xid_reached wait_for resume";
380+ DBUG_ASSERT(!debug_sync_set_action(current_thd,
381+ STRING_WITH_LEN(act)));
382+ }});
383
384 DBUG_EXECUTE_IF("dump_thread_wait_before_send_xid",
385 {
386
387=== modified file 'sql/rpl_rli.cc'
388--- sql/rpl_rli.cc 2014-09-25 14:16:07 +0000
389+++ sql/rpl_rli.cc 2014-10-06 10:07:51 +0000
390@@ -98,7 +98,8 @@
391 mts_group_status(MTS_NOT_IN_GROUP), reported_unsafe_warning(false),
392 rli_description_event(NULL),
393 sql_delay(0), sql_delay_end(0), m_flags(0), row_stmt_start_timestamp(0),
394- long_find_row_note_printed(false), error_on_rli_init_info(false)
395+ long_find_row_note_printed(false), error_on_rli_init_info(false),
396+ rli_next_event(NULL)
397 {
398 DBUG_ENTER("Relay_log_info::Relay_log_info");
399
400
401=== modified file 'sql/rpl_rli.h'
402--- sql/rpl_rli.h 2014-09-25 14:16:07 +0000
403+++ sql/rpl_rli.h 2014-10-06 10:07:51 +0000
404@@ -894,6 +894,22 @@
405 }
406
407 /**
408+ Remember next event.
409+ */
410+ void set_rli_next_event(Log_event *ev)
411+ {
412+ rli_next_event= ev;
413+ }
414+
415+ /**
416+ Return the remembered event.
417+ */
418+ Log_event *get_rli_next_event() const
419+ {
420+ return rli_next_event;
421+ }
422+
423+ /**
424 adaptation for the slave applier to specific master versions.
425 */
426 void adapt_to_master_version(Format_description_log_event *fdle);
427@@ -965,6 +981,13 @@
428 SLAVE must be executed and the problem fixed manually.
429 */
430 bool error_on_rli_init_info;
431+
432+ /*
433+ This variable is used when we want to inject "ROLLBACK" event just before
434+ the real event. The real event is remembered here and is returned by
435+ the next call.
436+ */
437+ Log_event *rli_next_event;
438 };
439
440 bool mysql_show_relaylog_events(THD* thd);
441
442=== modified file 'sql/rpl_slave.cc'
443--- sql/rpl_slave.cc 2014-09-26 04:43:24 +0000
444+++ sql/rpl_slave.cc 2014-10-06 10:07:51 +0000
445@@ -4485,6 +4485,22 @@
446 */
447 THD_STAGE_INFO(thd, stage_waiting_for_master_to_send_event);
448 event_len= read_event(mysql, mi, &suppress_warnings);
449+
450+ DBUG_EXECUTE_IF("relay_xid_trigger",
451+ if (event_len != packet_error)
452+ {
453+ const char* event_buf= (const char*)mysql->net.read_pos + 1;
454+ Log_event_type event_type= (Log_event_type)
455+ event_buf[EVENT_TYPE_OFFSET];
456+ if (event_type == XID_EVENT)
457+ {
458+ const char act[]= "now signal relay_xid_reached wait_for resume";
459+ DBUG_ASSERT(!debug_sync_set_action(current_thd,
460+ STRING_WITH_LEN(act)));
461+ }
462+ }
463+ );
464+
465 if (check_io_slave_killed(thd, mi, "Slave I/O thread killed while \
466 reading event"))
467 goto err;
468@@ -6016,6 +6032,8 @@
469 /* we die so won't remember charset - re-update them on next thread start */
470 rli->cached_charset_invalidate();
471 rli->save_temporary_tables = thd->temporary_tables;
472+ delete rli->get_rli_next_event();
473+ rli->set_rli_next_event(NULL);
474
475 /*
476 TODO: see if we can do this conditionally in next_event() instead
477@@ -7194,6 +7212,12 @@
478
479 DBUG_ASSERT(thd != 0);
480
481+ if ((ev = rli->get_rli_next_event()) != NULL)
482+ {
483+ rli->set_rli_next_event(NULL);
484+ DBUG_RETURN(ev);
485+ }
486+
487 #ifndef DBUG_OFF
488 if (abort_slave_event_count && !rli->events_until_exit--)
489 DBUG_RETURN(0);
490@@ -7318,6 +7342,36 @@
491 sql_slave_killed(thd, rli));
492 mysql_mutex_lock(&rli->data_lock);
493 }
494+ if (rli->is_parallel_exec() &&
495+ ev->should_rollback_current_group() &&
496+ (rli->curr_group_seen_begin || rli->curr_group_seen_gtid))
497+ {
498+ /* We reached the end of relay log file and it doesn't end with
499+ COMMIT or ROLLBACK. To prevent MTS from stalling we generate
500+ ROLLBACK event. Single-treaded slave takes care of such
501+ cases (see Format_description_log_event::do_apply_event).
502+ When relay_log_recovery is ON, relay log will be redownloaded
503+ from the master. */
504+ rli->report(WARNING_LEVEL, 0,
505+ "injecting ROLLBACK at the end of cold group");
506+ Query_log_event *rollback_event=
507+ new Query_log_event(thd, STRING_WITH_LEN("ROLLBACK"),
508+ TRUE, FALSE, TRUE, 0, FALSE);
509+ if (unlikely(!rollback_event))
510+ {
511+ errmsg= "Slave SQL thread failed to create a ROLLBACK event "
512+ "(out of memory?), MTS may stall";
513+ goto err;
514+ }
515+ rollback_event->data_written = 0;
516+ rollback_event->db= "";
517+ rollback_event->db_len= 0;
518+ rollback_event->server_id= 0; // won't be ignored by slave SQL thread
519+ rollback_event->set_artificial_event();
520+ rollback_event->future_event_relay_log_pos = BIN_LOG_HEADER_SIZE;
521+ rli->set_rli_next_event(ev);
522+ DBUG_RETURN(rollback_event);
523+ }
524 DBUG_RETURN(ev);
525 }
526 DBUG_ASSERT(thd==rli->info_thd);

Subscribers

People subscribed via source and target branches