Merge lp:~akopytov/percona-server/backup_locks_bugs into lp:percona-server/5.6

Proposed by Alexey Kopytov
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: 657
Merged at revision: 663
Proposed branch: lp:~akopytov/percona-server/backup_locks_bugs
Merge into: lp:percona-server/5.6
Diff against target: 1000 lines (+487/-45)
16 files modified
mysql-test/suite/rpl/r/rpl_backup_locks.result (+7/-0)
mysql-test/suite/rpl/r/rpl_backup_locks_mts.result (+30/-0)
mysql-test/suite/rpl/r/rpl_bug1358836.result (+18/-0)
mysql-test/suite/rpl/t/rpl_backup_locks.test (+32/-0)
mysql-test/suite/rpl/t/rpl_backup_locks_mts-slave.opt (+1/-0)
mysql-test/suite/rpl/t/rpl_backup_locks_mts.test (+84/-0)
mysql-test/suite/rpl/t/rpl_bug1358836-slave.opt (+1/-0)
mysql-test/suite/rpl/t/rpl_bug1358836.test (+49/-0)
sql/binlog.cc (+13/-3)
sql/log_event.cc (+21/-0)
sql/rpl_info_factory.cc (+24/-0)
sql/rpl_rli.cc (+55/-21)
sql/rpl_rli.h (+17/-8)
sql/rpl_rli_pdb.cc (+56/-13)
sql/rpl_slave.cc (+77/-0)
sql/sql_class.h (+2/-0)
To merge this branch: bzr merge lp:~akopytov/percona-server/backup_locks_bugs
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Review via email: mp+235794@code.launchpad.net

Description of the change

Bug #1358836: Backup locks do not guarantee consistent SHOW SLAVE STATUS
              with binary log disabled

Bug #1372806: Deadlock with LOCK BINLOG FOR BACKUP + SHOW SLAVE STATUS
              with binary log enabled

The fixes are inter-dependent, hence a single revision for both bugs.

The problem in bug #1358836 was that with the binary log disabled slave
SQL threads did not acquire binlog protection locks when committing in
some cases. The original backup locks implementation was done in
assumption that the only place where master binary logs are updated is
Relay_log_info::inc_event_relay_log_pos(). Which is incorrect and
replication code does not make figuring things like these
easy. Relay_log_info::inc_event_relay_log_pos() is only called for
multi-statement transactions. There are different code paths for
AUTOCOMMIT transactions and other kinds of updates, depending on the
storage engines involved, event types, and whether the slave is
multi-threaded.

Inspecting all possible code paths was difficult, because Relay_log_info
members that have to be protected by an active BINLOG backup lock were
sometimes modified directly, rather than with the ‘setter’ functions,
i.e. set_group_master_log_name() and set_group_master_log_pos().

Fixed by:

- asserting that the current connection has acquired a binlog protection
lock in Relay_log_info::set_group_master_log_name() and
Relay_log_info::set_group_master_log_pos();

- changing all code that accessed those variables directly to use
getter/setter functions instead;

- making sure that whenever a setter function is called (i.e. master
binary log coordinates are updated), binlog protection is acquired;

- renaming the group_master_log_{name,pos} member variables of
Relay_log_info to m_group_master_log_{name,pos} to prevent other code
that modifies those variables directly from creeping in on future
changes or upstream merges;

The problem in bug #1372806 was that with binary log enabled, an SQL
thread would acquire a binlog protection lock after locking the
Relay_log_info::data_lock mutex, which resulted in a deadlock with a
user connection holding a binlog lock while executing SHOW SLAVE STATUS.

Fixed by inspecting the code and making sure that no binlog protection
locks are taken while the Relay_log_info::data_locks mutex is locked by
the same connection, and adding assertions where possible.

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

    - Should Relay_log_info::purge_relay_logs set errmsg on failure to
      acquire binlog lock protection?

    - Why does Slave_worker::reset_recovery_info try to acquire the
      binlog protection and proceeds to call the setter without
      considering whether the acquisition succeeded? That will fail
      the DBUG_ASSERT in the setter in the case of failure at
      least. If I understand the context of the code correctly, it
      should be just like the other protection sites instead, with an
      early error return in the case of failure.

    - Have you considered extracting

bool rpl_acquire_backup_binlog_lock_protection(thd, rli)
{
  bool ret= true;
  if (thd && !thd->backup_binlog_lock.is_acquired())
  {
    const ulong timeout= info_thd->variables.lock_wait_timeout;

    DBUG_PRINT("debug", ("Acquiring binlog protection lock"));

    if (rli)
      mysql_mutex_assert_not_owner(&rli->data_lock);

   ret= !info_thd->backup_binlog_lock.acquire_protection(thd,
MDL_EXPLICIT, timeout);
  }
  return ret;
}

and void release(thd, bool protection_acquired)
{
  if (thd && prot_acquired)
  {
    DBUG_PRINT("debug", ("Releasing binlog protection lock"));
    thd->backup_binlog_lock.release_protection(thd);
  }
}

review: Needs Information
Revision history for this message
Alexey Kopytov (akopytov) wrote :

>
> - Should Relay_log_info::purge_relay_logs set errmsg on failure to
> acquire binlog lock protection?
>

No, because setting errmsg would cause callers to call
my_error(ER_RELAY_LOG_FAIL, errmsg), which is not what we want, because
that would have been already called by MDL_context::acquire_lock().

> - Why does Slave_worker::reset_recovery_info try to acquire the
> binlog protection and proceeds to call the setter without
> considering whether the acquisition succeeded? That will fail
> the DBUG_ASSERT in the setter in the case of failure at
> least. If I understand the context of the code correctly, it
> should be just like the other protection sites instead, with an
> early error return in the case of failure.
>

For some reasons I wasn’t sure if the callers are able to handle errors
returned by Slave_worker::reset_recovery_info(), but I can’t recall
them. Anyway, this seems to be a rather corner case, so I changed to
return an error instead.

> - Have you considered extracting
>
> bool rpl_acquire_backup_binlog_lock_protection(thd, rli)
> {
> bool ret= true;
> if (thd && !thd->backup_binlog_lock.is_acquired())
> {
> const ulong timeout= info_thd->variables.lock_wait_timeout;
>
> DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
>
> if (rli)
> mysql_mutex_assert_not_owner(&rli->data_lock);
>
> ret= !info_thd->backup_binlog_lock.acquire_protection(thd,
> MDL_EXPLICIT, timeout);
> }
> return ret;
> }
>
> and void release(thd, bool protection_acquired)
> {
> if (thd && prot_acquired)
> {
> DBUG_PRINT("debug", ("Releasing binlog protection lock"));
> thd->backup_binlog_lock.release_protection(thd);
> }
> }

I have, yes. That would prevent some code duplication, but not too much,
as callers would still have to handle errors, the ‘protection_acquired’
flag + some of them also assert that the current connection has not
acquired the binlog lock, while the others do not. Which means the
wrapper code around those acquire/release helper functions would be
about the same number of lines replicating to every site. So I decided
to leave it as is for clarity.

--
Best regards,
Alexey.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

OK if Jenkins OK

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'mysql-test/suite/rpl/r/rpl_backup_locks.result'
2--- mysql-test/suite/rpl/r/rpl_backup_locks.result 2014-02-25 07:30:22 +0000
3+++ mysql-test/suite/rpl/r/rpl_backup_locks.result 2014-09-24 16:59:42 +0000
4@@ -29,6 +29,13 @@
5 UNLOCK TABLES;
6 include/check_slave_param.inc [Exec_Master_Log_Pos]
7 UNLOCK BINLOG;
8+# connection slave
9+LOCK BINLOG FOR BACKUP;
10+# connection master
11+INSERT INTO t_innodb VALUES (1);
12+# connection slave
13+include/check_slave_param.inc [Exec_Master_Log_Pos]
14+UNLOCK BINLOG;
15 # connection master
16 DROP TABLE t_innodb, t_myisam;
17 include/rpl_end.inc
18
19=== added file 'mysql-test/suite/rpl/r/rpl_backup_locks_mts.result'
20--- mysql-test/suite/rpl/r/rpl_backup_locks_mts.result 1970-01-01 00:00:00 +0000
21+++ mysql-test/suite/rpl/r/rpl_backup_locks_mts.result 2014-09-24 16:59:42 +0000
22@@ -0,0 +1,30 @@
23+include/master-slave.inc
24+Warnings:
25+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
26+Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
27+[connection master]
28+CREATE DATABASE db_1;
29+CREATE DATABASE db_2;
30+CREATE TABLE db_1.test_1 (id INTEGER) engine=Innodb;
31+CREATE TABLE db_2.test_1 (id INTEGER) engine=Innodb;
32+include/sync_slave_sql_with_master.inc
33+include/stop_slave.inc
34+SET @save.slave_parallel_workers= @@GLOBAL.slave_parallel_workers;
35+SET @@GLOBAL.slave_parallel_workers= 2;
36+include/start_slave.inc
37+LOCK BINLOG FOR BACKUP;
38+INSERT INTO db_1.test_1 VALUES(1);
39+INSERT INTO db_2.test_1 VALUES(1);
40+include/check_slave_param.inc [Exec_Master_Log_Pos]
41+UNLOCK BINLOG;
42+include/sync_slave_sql_with_master.inc
43+LOCK BINLOG FOR BACKUP;
44+INSERT INTO db_1.test_1 VALUES(2);
45+INSERT INTO db_2.test_1 VALUES(2);
46+include/check_slave_param.inc [Exec_Master_Log_Pos]
47+UNLOCK BINLOG;
48+DROP DATABASE db_1;
49+DROP DATABASE db_2;
50+include/sync_slave_sql_with_master.inc
51+SET @@GLOBAL.slave_parallel_workers= @save.slave_parallel_workers;
52+include/rpl_end.inc
53
54=== added file 'mysql-test/suite/rpl/r/rpl_bug1358836.result'
55--- mysql-test/suite/rpl/r/rpl_bug1358836.result 1970-01-01 00:00:00 +0000
56+++ mysql-test/suite/rpl/r/rpl_bug1358836.result 2014-09-24 16:59:42 +0000
57@@ -0,0 +1,18 @@
58+include/master-slave.inc
59+Warnings:
60+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
61+Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
62+[connection master]
63+CREATE TABLE t1 (
64+a int(11) DEFAULT NULL
65+) ENGINE=InnoDB DEFAULT CHARSET=latin1;
66+# connection slave
67+LOCK BINLOG FOR BACKUP;
68+# connection master
69+INSERT INTO t1 VALUES (1);
70+# connection slave
71+include/check_slave_param.inc [Exec_Master_Log_Pos]
72+UNLOCK BINLOG;
73+# connection master
74+DROP TABLE t1;
75+include/rpl_end.inc
76
77=== modified file 'mysql-test/suite/rpl/t/rpl_backup_locks.test'
78--- mysql-test/suite/rpl/t/rpl_backup_locks.test 2014-02-25 07:30:22 +0000
79+++ mysql-test/suite/rpl/t/rpl_backup_locks.test 2014-09-24 16:59:42 +0000
80@@ -86,6 +86,38 @@
81
82 UNLOCK BINLOG;
83
84+############################################################################
85+# Bug 1372806: Deadlock with LOCK BINLOG FOR BACKUP + SHOW SLAVE STATUS with
86+# binary log enabled
87+############################################################################
88+
89+--connection slave
90+--echo # connection slave
91+LOCK BINLOG FOR BACKUP;
92+
93+--let $master_log_pos= query_get_value(SHOW SLAVE STATUS, Exec_Master_Log_Pos, 1)
94+
95+--connection master
96+--echo # connection master
97+INSERT INTO t_innodb VALUES (1);
98+
99+--connection slave
100+--echo # connection slave
101+let $wait_condition=
102+ SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
103+ WHERE STATE = "Waiting for binlog lock" AND
104+ (INFO = "INSERT INTO t_innodb VALUES (1)" OR INFO IS NULL);
105+--source include/wait_condition.inc
106+
107+# The following would deadlock with binlog_format=stmt on SHOW SLAVE STATUS
108+# before the fix
109+
110+--let $slave_param= Exec_Master_Log_Pos
111+--let $slave_param_value= $master_log_pos
112+--source include/check_slave_param.inc
113+
114+UNLOCK BINLOG;
115+
116 --connection master
117 --echo # connection master
118
119
120=== added file 'mysql-test/suite/rpl/t/rpl_backup_locks_mts-slave.opt'
121--- mysql-test/suite/rpl/t/rpl_backup_locks_mts-slave.opt 1970-01-01 00:00:00 +0000
122+++ mysql-test/suite/rpl/t/rpl_backup_locks_mts-slave.opt 2014-09-24 16:59:42 +0000
123@@ -0,0 +1,1 @@
124+--skip-log-bin --skip-log-slave-updates --slave-transaction-retries=0
125
126=== added file 'mysql-test/suite/rpl/t/rpl_backup_locks_mts.test'
127--- mysql-test/suite/rpl/t/rpl_backup_locks_mts.test 1970-01-01 00:00:00 +0000
128+++ mysql-test/suite/rpl/t/rpl_backup_locks_mts.test 2014-09-24 16:59:42 +0000
129@@ -0,0 +1,84 @@
130+###########################################################################
131+# MTS tests for binlog backup locks
132+###########################################################################
133+
134+--source include/have_innodb.inc
135+--let $rpl_skip_reset_master_and_slave=1
136+--source include/master-slave.inc
137+
138+--connection master
139+CREATE DATABASE db_1;
140+CREATE DATABASE db_2;
141+
142+CREATE TABLE db_1.test_1 (id INTEGER) engine=Innodb;
143+CREATE TABLE db_2.test_1 (id INTEGER) engine=Innodb;
144+
145+--connection master
146+--source include/sync_slave_sql_with_master.inc
147+
148+--connection slave
149+
150+--source include/stop_slave.inc
151+
152+SET @save.slave_parallel_workers= @@GLOBAL.slave_parallel_workers;
153+SET @@GLOBAL.slave_parallel_workers= 2;
154+
155+--source include/start_slave.inc
156+
157+LOCK BINLOG FOR BACKUP;
158+
159+--let $master_log_pos= query_get_value(SHOW SLAVE STATUS, Exec_Master_Log_Pos, 1)
160+
161+--connection master
162+INSERT INTO db_1.test_1 VALUES(1);
163+INSERT INTO db_2.test_1 VALUES(1);
164+
165+--connection slave
166+
167+let $wait_condition=
168+ SELECT COUNT(*) = 2 FROM INFORMATION_SCHEMA.PROCESSLIST
169+ WHERE STATE = "Waiting for binlog lock" AND
170+ (INFO LIKE "INSERT INTO db\__.test\__ VALUES(1))" OR INFO IS NULL);
171+--source include/wait_condition.inc
172+
173+--let $slave_param= Exec_Master_Log_Pos
174+--let $slave_param_value= $master_log_pos
175+--source include/check_slave_param.inc
176+
177+UNLOCK BINLOG;
178+
179+--connection master
180+--source include/sync_slave_sql_with_master.inc
181+
182+--connection slave
183+LOCK BINLOG FOR BACKUP;
184+
185+--let $master_log_pos= query_get_value(SHOW SLAVE STATUS, Exec_Master_Log_Pos, 1)
186+
187+--connection master
188+INSERT INTO db_1.test_1 VALUES(2);
189+INSERT INTO db_2.test_1 VALUES(2);
190+
191+--connection slave
192+
193+let $wait_condition=
194+ SELECT COUNT(*) = 2 FROM INFORMATION_SCHEMA.PROCESSLIST
195+ WHERE STATE = "Waiting for binlog lock" AND
196+ (INFO LIKE "INSERT INTO db\__.test\__ VALUES(2)" OR INFO IS NULL);
197+--source include/wait_condition.inc
198+
199+--let $slave_param= Exec_Master_Log_Pos
200+--let $slave_param_value= $master_log_pos
201+--source include/check_slave_param.inc
202+
203+UNLOCK BINLOG;
204+
205+--connection master
206+DROP DATABASE db_1;
207+DROP DATABASE db_2;
208+
209+--source include/sync_slave_sql_with_master.inc
210+
211+SET @@GLOBAL.slave_parallel_workers= @save.slave_parallel_workers;
212+
213+--source include/rpl_end.inc
214
215=== added file 'mysql-test/suite/rpl/t/rpl_bug1358836-slave.opt'
216--- mysql-test/suite/rpl/t/rpl_bug1358836-slave.opt 1970-01-01 00:00:00 +0000
217+++ mysql-test/suite/rpl/t/rpl_bug1358836-slave.opt 2014-09-24 16:59:42 +0000
218@@ -0,0 +1,1 @@
219+--skip-log-bin --skip-log-slave-updates
220
221=== added file 'mysql-test/suite/rpl/t/rpl_bug1358836.test'
222--- mysql-test/suite/rpl/t/rpl_bug1358836.test 1970-01-01 00:00:00 +0000
223+++ mysql-test/suite/rpl/t/rpl_bug1358836.test 2014-09-24 16:59:42 +0000
224@@ -0,0 +1,49 @@
225+###############################################################################
226+# Bug #1358836: Backup locks do not guarantee consistent SHOW SLAVE STATUS with
227+# binary log disabled
228+###############################################################################
229+
230+--source include/have_innodb.inc
231+--let $rpl_skip_reset_master_and_slave=1
232+--source include/master-slave.inc
233+
234+CREATE TABLE t1 (
235+ a int(11) DEFAULT NULL
236+) ENGINE=InnoDB DEFAULT CHARSET=latin1;
237+
238+sync_slave_with_master;
239+
240+--connection slave
241+--echo # connection slave
242+
243+LOCK BINLOG FOR BACKUP;
244+
245+--let $master_log_pos= query_get_value(SHOW SLAVE STATUS, Exec_Master_Log_Pos, 1)
246+
247+--connection master
248+--echo # connection master
249+
250+INSERT INTO t1 VALUES (1);
251+
252+--connection slave
253+--echo # connection slave
254+
255+let $wait_condition=
256+ SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
257+ WHERE STATE = "Waiting for binlog lock" AND
258+ (INFO = "INSERT INTO t1 VALUES (1)" OR INFO IS NULL);
259+--source include/wait_condition.inc
260+
261+--let $slave_param= Exec_Master_Log_Pos
262+--let $slave_param_value= $master_log_pos
263+--source include/check_slave_param.inc
264+
265+UNLOCK BINLOG;
266+
267+--connection master
268+--echo # connection master
269+DROP TABLE t1;
270+
271+sync_slave_with_master;
272+
273+--source include/rpl_end.inc
274
275=== modified file 'sql/binlog.cc'
276--- sql/binlog.cc 2014-08-22 10:02:24 +0000
277+++ sql/binlog.cc 2014-09-24 16:59:42 +0000
278@@ -6446,6 +6446,7 @@
279 my_xid xid= thd->transaction.xid_state.xid.get_my_xid();
280 int error= RESULT_SUCCESS;
281 bool stuff_logged= false;
282+ bool binlog_prot_acquired= false;
283
284 DBUG_PRINT("enter", ("thd: 0x%llx, all: %s, xid: %llu, cache_mngr: 0x%llx",
285 (ulonglong) thd, YESNO(all), (ulonglong) xid,
286@@ -6590,8 +6591,15 @@
287 {
288 int rc;
289
290- /* Block binlog updates if there's an active BINLOG lock. */
291- if (!thd->backup_binlog_lock.is_acquired())
292+ /*
293+ Block binlog updates if there's an active BINLOG lock.
294+
295+ We allow binlog lock owner to commit, assuming it knows what it does. We
296+ also check if protection has not been acquired earlier, which is possible
297+ in slave threads to protect master binlog coordinates.
298+ */
299+ if (!thd->backup_binlog_lock.is_acquired() &&
300+ !thd->backup_binlog_lock.is_protection_acquired())
301 {
302 const ulong timeout= thd->variables.lock_wait_timeout;
303
304@@ -6604,11 +6612,13 @@
305
306 DBUG_RETURN(RESULT_ABORTED);
307 }
308+
309+ binlog_prot_acquired= true;
310 }
311
312 rc= ordered_commit(thd, all);
313
314- if (!thd->backup_binlog_lock.is_acquired())
315+ if (binlog_prot_acquired)
316 {
317 DBUG_PRINT("debug", ("Releasing binlog protection lock"));
318 thd->backup_binlog_lock.release_protection(thd);
319
320=== modified file 'sql/log_event.cc'
321--- sql/log_event.cc 2014-08-22 10:02:24 +0000
322+++ sql/log_event.cc 2014-09-24 16:59:42 +0000
323@@ -6910,6 +6910,7 @@
324 */
325 DBUG_ASSERT(!thd->backup_binlog_lock.is_acquired());
326 DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
327+ mysql_mutex_assert_not_owner(&rli->data_lock);
328 const ulong timeout= thd->variables.lock_wait_timeout;
329 if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT, timeout))
330 {
331@@ -7431,11 +7432,25 @@
332 lex_start(thd);
333 mysql_reset_thd_for_next_command(thd);
334 Relay_log_info *rli_ptr= const_cast<Relay_log_info *>(rli);
335+ bool binlog_prot_acquired= false;
336
337 /* For a slave Xid_log_event is COMMIT */
338 general_log_print(thd, COM_QUERY,
339 "COMMIT /* implicit, from Xid_log_event */");
340
341+ if (!thd->backup_binlog_lock.is_acquired())
342+ {
343+ const ulong timeout= thd->variables.lock_wait_timeout;
344+
345+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
346+ mysql_mutex_assert_not_owner(&rli->data_lock);
347+ if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT,
348+ timeout))
349+ return 1;
350+
351+ binlog_prot_acquired= true;
352+ }
353+
354 mysql_mutex_lock(&rli_ptr->data_lock);
355
356 DBUG_PRINT("info", ("do_apply group master %s %llu group relay %s %llu event %s %llu\n",
357@@ -7496,6 +7511,12 @@
358 mysql_cond_broadcast(&rli_ptr->data_cond);
359 mysql_mutex_unlock(&rli_ptr->data_lock);
360
361+ if (binlog_prot_acquired)
362+ {
363+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
364+ thd->backup_binlog_lock.release_protection(thd);
365+ }
366+
367 return error;
368 }
369
370
371=== modified file 'sql/rpl_info_factory.cc'
372--- sql/rpl_info_factory.cc 2013-10-15 11:37:57 +0000
373+++ sql/rpl_info_factory.cc 2014-09-24 16:59:42 +0000
374@@ -529,6 +529,8 @@
375 bool error= true;
376 enum_return_check return_check_src= ERROR_CHECKING_REPOSITORY;
377 enum_return_check return_check_dst= ERROR_CHECKING_REPOSITORY;
378+ bool binlog_prot_acquired= false;
379+ THD * const thd= current_thd;
380 DBUG_ENTER("Rpl_info_factory::decide_repository");
381
382 if (option == INFO_REPOSITORY_DUMMY)
383@@ -585,6 +587,23 @@
384 goto err;
385
386 /*
387+ Acquire a backup binlog protection lock, because Rpl_info::copy_info()
388+ will modify master binary log coordinates.
389+ */
390+ if (thd && !thd->backup_binlog_lock.is_acquired())
391+ {
392+ const ulong timeout= thd->variables.lock_wait_timeout;
393+
394+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
395+
396+ if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT,
397+ timeout))
398+ goto err;
399+
400+ binlog_prot_acquired= true;
401+ }
402+
403+ /*
404 Transfer information from source to destination and delete the
405 source. Note this is not fault-tolerant and a crash before removing
406 source may cause the next restart to fail as is_src and is_dest may
407@@ -628,6 +647,11 @@
408 }
409
410 err:
411+ if (binlog_prot_acquired)
412+ {
413+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
414+ thd->backup_binlog_lock.release_protection(thd);
415+ }
416 DBUG_RETURN(error);
417 }
418
419
420=== modified file 'sql/rpl_rli.cc'
421--- sql/rpl_rli.cc 2014-02-27 12:29:47 +0000
422+++ sql/rpl_rli.cc 2014-09-24 16:59:42 +0000
423@@ -78,7 +78,7 @@
424 is_relay_log_recovery(is_slave_recovery),
425 save_temporary_tables(0),
426 cur_log_old_open_count(0), group_relay_log_pos(0), event_relay_log_pos(0),
427- group_master_log_pos(0),
428+ m_group_master_log_pos(0),
429 gtid_set(global_sid_map, global_sid_lock),
430 log_space_total(0), ignore_log_space_limit(0),
431 sql_force_rotate_relay(false),
432@@ -120,7 +120,7 @@
433 #endif
434
435 group_relay_log_name[0]= event_relay_log_name[0]=
436- group_master_log_name[0]= 0;
437+ m_group_master_log_name[0]= 0;
438 until_log_name[0]= ign_master_log_name_end[0]= 0;
439 set_timespec_nsec(last_clock, 0);
440 memset(&cache_buf, 0, sizeof(cache_buf));
441@@ -688,6 +688,8 @@
442 {
443 bool pos_reached;
444 int cmp_result= 0;
445+ const char * const group_master_log_name= get_group_master_log_name();
446+ const ulonglong group_master_log_pos= get_group_master_log_pos();
447
448 DBUG_PRINT("info",
449 ("init_abort_pos_wait: %ld abort_pos_wait: %ld",
450@@ -710,8 +712,8 @@
451 */
452 if (*group_master_log_name)
453 {
454- char *basename= (group_master_log_name +
455- dirname_length(group_master_log_name));
456+ const char * const basename= (group_master_log_name +
457+ dirname_length(group_master_log_name));
458 /*
459 First compare the parts before the extension.
460 Find the dot in the master's log basename,
461@@ -962,6 +964,7 @@
462 */
463 DBUG_ASSERT(!info_thd->backup_binlog_lock.is_acquired());
464 DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
465+ mysql_mutex_assert_not_owner(&data_lock);
466 if (info_thd->backup_binlog_lock.acquire_protection(info_thd, MDL_EXPLICIT,
467 timeout))
468 DBUG_RETURN(1);
469@@ -971,8 +974,7 @@
470 else
471 {
472 mysql_mutex_assert_owner(&data_lock);
473- DBUG_ASSERT(info_thd->mdl_context.is_lock_owner(MDL_key::BINLOG, "", "",
474- MDL_INTENTION_EXCLUSIVE));
475+ DBUG_ASSERT(info_thd->backup_binlog_lock.is_protection_acquired());
476 }
477
478 inc_event_relay_log_pos();
479@@ -1003,10 +1005,10 @@
480 With the end_log_pos solution, we avoid computations involving lengthes.
481 */
482 DBUG_PRINT("info", ("log_pos: %lu group_master_log_pos: %lu",
483- (long) log_pos, (long) group_master_log_pos));
484+ (long) log_pos, (long) get_group_master_log_pos()));
485
486 if (log_pos > 0) // 3.23 binlogs don't have log_posx
487- group_master_log_pos= log_pos;
488+ set_group_master_log_pos(log_pos);
489
490 /*
491 In MTS mode FD or Rotate event commit their solitary group to
492@@ -1080,6 +1082,7 @@
493 {
494 int error=0;
495 DBUG_ENTER("Relay_log_info::purge_relay_logs");
496+ bool binlog_prot_acquired= false;
497
498 /*
499 Even if inited==0, we still try to empty master_log_* variables. Indeed,
500@@ -1097,12 +1100,33 @@
501 SLAVE, he will see old, confusing master_log_*. In other words, we reinit
502 master_log_* for SHOW SLAVE STATUS to display fine in any case.
503 */
504- group_master_log_name[0]= 0;
505- group_master_log_pos= 0;
506+
507+ if (!thd->backup_binlog_lock.is_acquired())
508+ {
509+ const ulong timeout= thd->variables.lock_wait_timeout;
510+
511+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
512+ mysql_mutex_assert_not_owner(&data_lock);
513+ if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT,
514+ timeout))
515+ return 1;
516+
517+ binlog_prot_acquired= true;
518+ }
519+
520+ set_group_master_log_name("");
521+ set_group_master_log_pos(0);
522
523 if (!inited)
524 {
525 DBUG_PRINT("info", ("inited == 0"));
526+
527+ if (binlog_prot_acquired)
528+ {
529+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
530+ thd->backup_binlog_lock.release_protection(thd);
531+ }
532+
533 DBUG_RETURN(0);
534 }
535
536@@ -1154,6 +1178,13 @@
537 #endif
538 DBUG_PRINT("info",("log_space_total: %s",llstr(log_space_total,buf)));
539 mysql_mutex_unlock(&data_lock);
540+
541+ if (binlog_prot_acquired)
542+ {
543+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
544+ thd->backup_binlog_lock.release_protection(thd);
545+ }
546+
547 DBUG_RETURN(error);
548 }
549
550@@ -1206,10 +1237,10 @@
551 {
552 if (ev && ev->server_id == (uint32) ::server_id && !replicate_same_server_id)
553 DBUG_RETURN(false);
554- log_name= group_master_log_name;
555- log_pos= (!ev)? group_master_log_pos :
556+ log_name= get_group_master_log_name();
557+ log_pos= (!ev) ? get_group_master_log_pos() :
558 ((thd->variables.option_bits & OPTION_BEGIN || !ev->log_pos) ?
559- group_master_log_pos : ev->log_pos - ev->data_written);
560+ get_group_master_log_pos() : ev->log_pos - ev->data_written);
561 }
562 else
563 { /* until_condition == UNTIL_RELAY_POS */
564@@ -1221,7 +1252,8 @@
565 {
566 char buf[32];
567 DBUG_PRINT("info", ("group_master_log_name='%s', group_master_log_pos=%s",
568- group_master_log_name, llstr(group_master_log_pos, buf)));
569+ get_group_master_log_name(),
570+ llstr(get_group_master_log_pos(), buf)));
571 DBUG_PRINT("info", ("group_relay_log_name='%s', group_relay_log_pos=%s",
572 group_relay_log_name, llstr(group_relay_log_pos, buf)));
573 DBUG_PRINT("info", ("(%s) log_name='%s', log_pos=%s",
574@@ -1857,8 +1889,8 @@
575 error= 1;
576 goto err;
577 }
578- group_master_log_name[0]= 0;
579- group_master_log_pos= 0;
580+ set_group_master_log_name("");
581+ set_group_master_log_pos(0);
582 }
583 else
584 {
585@@ -2069,6 +2101,7 @@
586 int lines= 0;
587 char *first_non_digit= NULL;
588 ulong temp_group_relay_log_pos= 0;
589+ char temp_group_master_log_name[FN_REFLEN];
590 ulong temp_group_master_log_pos= 0;
591 int temp_sql_delay= 0;
592 int temp_internal_id= internal_id;
593@@ -2133,8 +2166,8 @@
594
595 if (from->get_info((ulong *) &temp_group_relay_log_pos,
596 (ulong) BIN_LOG_HEADER_SIZE) ||
597- from->get_info(group_master_log_name,
598- (size_t) sizeof(group_relay_log_name),
599+ from->get_info(temp_group_master_log_name,
600+ (size_t) sizeof(temp_group_master_log_name),
601 (char *) "") ||
602 from->get_info((ulong *) &temp_group_master_log_pos,
603 (ulong) 0))
604@@ -2159,7 +2192,8 @@
605 }
606
607 group_relay_log_pos= temp_group_relay_log_pos;
608- group_master_log_pos= temp_group_master_log_pos;
609+ set_group_master_log_name(temp_group_master_log_name);
610+ set_group_master_log_pos(temp_group_master_log_pos);
611 sql_delay= (int32) temp_sql_delay;
612 internal_id= (uint) temp_internal_id;
613
614@@ -2182,8 +2216,8 @@
615 to->set_info((int) LINES_IN_RELAY_LOG_INFO_WITH_ID) ||
616 to->set_info(group_relay_log_name) ||
617 to->set_info((ulong) group_relay_log_pos) ||
618- to->set_info(group_master_log_name) ||
619- to->set_info((ulong) group_master_log_pos) ||
620+ to->set_info(get_group_master_log_name()) ||
621+ to->set_info((ulong) get_group_master_log_pos()) ||
622 to->set_info((int) sql_delay) ||
623 to->set_info(recovery_parallel_workers) ||
624 to->set_info((int) internal_id))
625
626=== modified file 'sql/rpl_rli.h'
627--- sql/rpl_rli.h 2014-06-04 14:48:06 +0000
628+++ sql/rpl_rli.h 2014-09-24 16:59:42 +0000
629@@ -223,8 +223,8 @@
630 !belongs_to_client(); client thread executing BINLOG statement if
631 belongs_to_client()).
632 */
633- char group_master_log_name[FN_REFLEN];
634- volatile my_off_t group_master_log_pos;
635+ char m_group_master_log_name[FN_REFLEN];
636+ volatile my_off_t m_group_master_log_pos;
637
638 /*
639 When it commits, InnoDB internally stores the master log position it has
640@@ -417,7 +417,7 @@
641 bool is_until_satisfied(THD *thd, Log_event *ev);
642 inline ulonglong until_pos()
643 {
644- return ((until_condition == UNTIL_MASTER_POS) ? group_master_log_pos :
645+ return ((until_condition == UNTIL_MASTER_POS) ? get_group_master_log_pos() :
646 group_relay_log_pos);
647 }
648
649@@ -733,15 +733,24 @@
650 future_event_relay_log_pos= log_pos;
651 }
652
653- inline const char* get_group_master_log_name() { return group_master_log_name; }
654- inline ulonglong get_group_master_log_pos() { return group_master_log_pos; }
655+ inline const char* get_group_master_log_name()
656+ {
657+ return m_group_master_log_name;
658+ }
659+ inline ulonglong get_group_master_log_pos() { return m_group_master_log_pos; }
660 inline void set_group_master_log_name(const char *log_file_name)
661 {
662- strmake(group_master_log_name,log_file_name, sizeof(group_master_log_name)-1);
663+ DBUG_ASSERT(!info_thd ||
664+ info_thd->backup_binlog_lock.is_protection_acquired());
665+
666+ strmake(m_group_master_log_name, log_file_name,
667+ sizeof(m_group_master_log_name) - 1);
668 }
669 inline void set_group_master_log_pos(ulonglong log_pos)
670 {
671- group_master_log_pos= log_pos;
672+ DBUG_ASSERT(!info_thd ||
673+ info_thd->backup_binlog_lock.is_protection_acquired());
674+ m_group_master_log_pos= log_pos;
675 }
676
677 inline const char* get_group_relay_log_name() { return group_relay_log_name; }
678@@ -775,7 +784,7 @@
679 }
680 inline const char* get_rpl_log_name()
681 {
682- return (group_master_log_name[0] ? group_master_log_name : "FIRST");
683+ return (m_group_master_log_name[0] ? m_group_master_log_name : "FIRST");
684 }
685
686 #if MYSQL_VERSION_ID < 40100
687
688=== modified file 'sql/rpl_rli_pdb.cc'
689--- sql/rpl_rli_pdb.cc 2014-08-22 10:02:24 +0000
690+++ sql/rpl_rli_pdb.cc 2014-09-24 16:59:42 +0000
691@@ -300,6 +300,7 @@
692 DBUG_ENTER("Slave_worker::read_info");
693
694 ulong temp_group_relay_log_pos= 0;
695+ char temp_group_master_log_name[FN_REFLEN];
696 ulong temp_group_master_log_pos= 0;
697 ulong temp_checkpoint_relay_log_pos= 0;
698 ulong temp_checkpoint_master_log_pos= 0;
699@@ -317,8 +318,8 @@
700 (char *) "") ||
701 from->get_info((ulong *) &temp_group_relay_log_pos,
702 (ulong) 0) ||
703- from->get_info(group_master_log_name,
704- (size_t) sizeof(group_master_log_name),
705+ from->get_info(temp_group_master_log_name,
706+ (size_t) sizeof(temp_group_master_log_name),
707 (char *) "") ||
708 from->get_info((ulong *) &temp_group_master_log_pos,
709 (ulong) 0) ||
710@@ -343,7 +344,8 @@
711
712 internal_id=(uint) temp_internal_id;
713 group_relay_log_pos= temp_group_relay_log_pos;
714- group_master_log_pos= temp_group_master_log_pos;
715+ set_group_master_log_name(temp_group_master_log_name);
716+ set_group_master_log_pos(temp_group_master_log_pos);
717 checkpoint_relay_log_pos= temp_checkpoint_relay_log_pos;
718 checkpoint_master_log_pos= temp_checkpoint_master_log_pos;
719 checkpoint_seqno= temp_checkpoint_seqno;
720@@ -363,8 +365,8 @@
721 to->set_info((int) internal_id) ||
722 to->set_info(group_relay_log_name) ||
723 to->set_info((ulong) group_relay_log_pos) ||
724- to->set_info(group_master_log_name) ||
725- to->set_info((ulong) group_master_log_pos) ||
726+ to->set_info(get_group_master_log_name()) ||
727+ to->set_info((ulong) get_group_master_log_pos()) ||
728 to->set_info(checkpoint_relay_log_name) ||
729 to->set_info((ulong) checkpoint_relay_log_pos) ||
730 to->set_info(checkpoint_master_log_name) ||
731@@ -387,11 +389,33 @@
732 */
733 bool Slave_worker::reset_recovery_info()
734 {
735+ bool binlog_prot_acquired= false;
736+
737 DBUG_ENTER("Slave_worker::reset_recovery_info");
738
739+ if (info_thd && !info_thd->backup_binlog_lock.is_acquired())
740+ {
741+ const ulong timeout= info_thd->variables.lock_wait_timeout;
742+
743+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
744+
745+ if (info_thd->backup_binlog_lock.acquire_protection(info_thd, MDL_EXPLICIT,
746+ timeout))
747+ DBUG_RETURN(true);
748+
749+ binlog_prot_acquired= true;
750+ }
751+
752 set_group_master_log_name("");
753 set_group_master_log_pos(0);
754
755+ if (binlog_prot_acquired)
756+ {
757+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
758+ info_thd->backup_binlog_lock.release_protection(info_thd);
759+ }
760+
761+
762 DBUG_RETURN(flush_info(true));
763 }
764
765@@ -410,8 +434,22 @@
766
767 bool Slave_worker::commit_positions(Log_event *ev, Slave_job_group* ptr_g, bool force)
768 {
769+ bool binlog_prot_acquired= false;
770 DBUG_ENTER("Slave_worker::checkpoint_positions");
771
772+ if (info_thd && !info_thd->backup_binlog_lock.is_acquired())
773+ {
774+ const ulong timeout= info_thd->variables.lock_wait_timeout;
775+
776+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
777+
778+ if (info_thd->backup_binlog_lock.acquire_protection(info_thd, MDL_EXPLICIT,
779+ timeout))
780+ DBUG_RETURN(true);
781+
782+ binlog_prot_acquired= true;
783+ }
784+
785 /*
786 Initial value of checkpoint_master_log_name is learned from
787 group_master_log_name. The latter can be passed to Worker
788@@ -422,11 +460,10 @@
789 */
790 if (ptr_g->group_master_log_name != NULL)
791 {
792- strmake(group_master_log_name, ptr_g->group_master_log_name,
793- sizeof(group_master_log_name) - 1);
794+ set_group_master_log_name(ptr_g->group_master_log_name);
795 my_free(ptr_g->group_master_log_name);
796 ptr_g->group_master_log_name= NULL;
797- strmake(checkpoint_master_log_name, group_master_log_name,
798+ strmake(checkpoint_master_log_name, get_group_master_log_name(),
799 sizeof(checkpoint_master_log_name) - 1);
800 }
801 if (ptr_g->checkpoint_log_name != NULL)
802@@ -467,19 +504,19 @@
803 bitmap_set_bit(&group_executed, ptr_g->checkpoint_seqno);
804 checkpoint_seqno= ptr_g->checkpoint_seqno;
805 group_relay_log_pos= ev->future_event_relay_log_pos;
806- group_master_log_pos= ev->log_pos;
807+ set_group_master_log_pos(ev->log_pos);
808
809 /*
810 Directly accessing c_rli->get_group_master_log_name() does not
811 represent a concurrency issue because the current code places
812 a synchronization point when master rotates.
813 */
814- strmake(group_master_log_name, c_rli->get_group_master_log_name(),
815- sizeof(group_master_log_name)-1);
816+ set_group_master_log_name(c_rli->get_group_master_log_name());
817
818 DBUG_PRINT("mts", ("Committing worker-id %lu group master log pos %llu "
819 "group master log name %s checkpoint sequence number %lu.",
820- id, group_master_log_pos, group_master_log_name, checkpoint_seqno));
821+ id, get_group_master_log_pos(),
822+ get_group_master_log_name(), checkpoint_seqno));
823
824 DBUG_EXECUTE_IF("mts_debug_concurrent_access",
825 {
826@@ -487,6 +524,12 @@
827 };
828 );
829
830+ if (binlog_prot_acquired)
831+ {
832+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
833+ info_thd->backup_binlog_lock.release_protection(info_thd);
834+ }
835+
836 DBUG_RETURN(flush_info(force));
837 }
838
839@@ -1060,7 +1103,7 @@
840 );
841 }
842
843- ptr_g->group_master_log_pos= group_master_log_pos;
844+ ptr_g->group_master_log_pos= get_group_master_log_pos();
845 ptr_g->group_relay_log_pos= group_relay_log_pos;
846
847 ptr_g->done= 1; // GAQ index is available to C now
848
849=== modified file 'sql/rpl_slave.cc'
850--- sql/rpl_slave.cc 2014-08-22 10:02:24 +0000
851+++ sql/rpl_slave.cc 2014-09-24 16:59:42 +0000
852@@ -507,6 +507,21 @@
853 int init_error= 0;
854 enum_return_check check_return= ERROR_CHECKING_REPOSITORY;
855 THD *thd= current_thd;
856+ bool binlog_prot_acquired= false;
857+
858+ if (thd && !thd->backup_binlog_lock.is_acquired())
859+ {
860+ const ulong timeout= thd->variables.lock_wait_timeout;
861+
862+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
863+ mysql_mutex_assert_not_owner(&mi->rli->data_lock);
864+
865+ if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT,
866+ timeout))
867+ DBUG_RETURN(1);
868+
869+ binlog_prot_acquired= true;
870+ }
871
872 /*
873 We need a mutex while we are changing master info parameters to
874@@ -570,6 +585,13 @@
875
876 mysql_mutex_unlock(&mi->rli->data_lock);
877 mysql_mutex_unlock(&mi->data_lock);
878+
879+ if (binlog_prot_acquired)
880+ {
881+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
882+ thd->backup_binlog_lock.release_protection(thd);
883+ }
884+
885 DBUG_RETURN(check_return == ERROR_CHECKING_REPOSITORY || init_error);
886 }
887
888@@ -5066,6 +5088,7 @@
889 ulong cnt;
890 bool error= FALSE;
891 struct timespec curr_clock;
892+ bool binlog_prot_acquired= false;
893
894 DBUG_ENTER("checkpoint_routine");
895
896@@ -5137,9 +5160,32 @@
897 sort_dynamic(&rli->least_occupied_workers, (qsort_cmp) ulong_cmp);
898
899 if (need_data_lock)
900+ {
901+ THD * const info_thd= rli->info_thd;
902+ const ulong timeout= info_thd->variables.lock_wait_timeout;
903+
904+ /*
905+ Acquire protection against global BINLOG lock before rli->data_lock is
906+ locked (otherwise we would also block SHOW SLAVE STATUS).
907+ */
908+ DBUG_ASSERT(!info_thd->backup_binlog_lock.is_acquired());
909+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
910+ mysql_mutex_assert_not_owner(&rli->data_lock);
911+ error= info_thd->backup_binlog_lock.acquire_protection(info_thd,
912+ MDL_EXPLICIT,
913+ timeout);
914+ if (error)
915+ goto end;
916+
917+ binlog_prot_acquired= true;
918+
919 mysql_mutex_lock(&rli->data_lock);
920+ }
921 else
922+ {
923 mysql_mutex_assert_owner(&rli->data_lock);
924+ DBUG_ASSERT(rli->info_thd->backup_binlog_lock.is_protection_acquired());
925+ }
926
927 /*
928 "Coordinator::commit_positions" {
929@@ -5184,6 +5230,13 @@
930 /* end-of "Coordinator::"commit_positions" */
931
932 end:
933+
934+ if (binlog_prot_acquired)
935+ {
936+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
937+ rli->info_thd->backup_binlog_lock.release_protection(rli->info_thd);
938+ }
939+
940 #ifndef DBUG_OFF
941 if (DBUG_EVALUATE_IF("check_slave_debug_group", 1, 0))
942 DBUG_SUICIDE();
943@@ -8134,6 +8187,7 @@
944 my_off_t saved_log_pos= 0;
945 my_bool save_relay_log_purge= relay_log_purge;
946 bool mts_remove_workers= false;
947+ bool binlog_prot_acquired= false;
948
949 DBUG_ENTER("change_master");
950
951@@ -8481,6 +8535,22 @@
952 }
953 relay_log_purge= save_relay_log_purge;
954
955+ if (!thd->backup_binlog_lock.is_acquired())
956+ {
957+ const ulong timeout= thd->variables.lock_wait_timeout;
958+
959+ DBUG_PRINT("debug", ("Acquiring binlog protection lock"));
960+ mysql_mutex_assert_not_owner(&mi->rli->data_lock);
961+ if (thd->backup_binlog_lock.acquire_protection(thd, MDL_EXPLICIT,
962+ timeout))
963+ {
964+ ret= true;
965+ goto err;
966+ }
967+
968+ binlog_prot_acquired= true;
969+ }
970+
971 /*
972 Coordinates in rli were spoilt by the 'if (need_relay_log_purge)' block,
973 so restore them to good values. If we left them to ''/0, that would work;
974@@ -8533,6 +8603,13 @@
975 mysql_mutex_unlock(&mi->rli->data_lock);
976
977 err:
978+
979+ if (binlog_prot_acquired)
980+ {
981+ DBUG_PRINT("debug", ("Releasing binlog protection lock"));
982+ thd->backup_binlog_lock.release_protection(thd);
983+ }
984+
985 unlock_slave_threads(mi);
986 if (ret == FALSE)
987 {
988
989=== modified file 'sql/sql_class.h'
990--- sql/sql_class.h 2014-08-22 10:02:24 +0000
991+++ sql/sql_class.h 2014-09-24 16:59:42 +0000
992@@ -2170,6 +2170,8 @@
993
994 bool is_acquired() const { return m_lock != NULL; }
995
996+ bool is_protection_acquired() const { return m_prot_lock != NULL; }
997+
998 private:
999 MDL_key::enum_mdl_namespace m_namespace;
1000 MDL_ticket *m_lock;

Subscribers

People subscribed via source and target branches