maria:bb-10.5-sujatha

Last commit made on 2019-09-18
Get this branch:
git clone -b bb-10.5-sujatha https://git.launchpad.net/maria

Branch information

Name:
bb-10.5-sujatha
Repository:
lp:maria

Recent commits

cde9170... by Sujatha <email address hidden> on 2019-09-18

MDEV-18648: slave_parallel_mode= optimistic default in 10.5

Description:
============
Change the default of @@global.slave_parallel_mode from CONSERVATIVE to
OPTIMISTIC in 10.5.

@sql/sys_vars.cc
Changed default parallel_mode to 'OPTIMISTIC'
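
For illustration only (the real declaration in sys_vars.cc may use a
different sys_var class and help text; only the DEFAULT() argument flips):

  /* Hedged sketch of the default change; illustrative, not the verbatim
     declaration from sys_vars.cc. */
  static Sys_var_enum Slave_parallel_mode(
         "slave_parallel_mode",
         "Controls what transactions are applied in parallel when using "
         "--slave-parallel-threads",
         GLOBAL_VAR(opt_slave_parallel_mode), CMD_LINE(REQUIRED_ARG),
         slave_parallel_mode_names,
         DEFAULT(SLAVE_PARALLEL_OPTIMISTIC)); /* was SLAVE_PARALLEL_CONSERVATIVE */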

@sql/sql_class.h
Moved the 'wait_for_prior_commit(THD *thd)' method into sql_class.cc.

@sql/sql_class.cc
Added code to check 'stop_on_error_sub_id' for event groups that are skipped
and have no preceding group to wait for.

@sql/rpl_filter.cc
Changed default parallel_mode to 'OPTIMISTIC'

@sql/mysqld.cc
Removed the initialization of the 'opt_slave_parallel_mode' variable to
'SLAVE_PARALLEL_CONSERVATIVE'.

@mysql-test/suite/rpl/t/rpl_parallel_mdev6589.test
@mysql-test/suite/rpl/t/rpl_mdev6386.test
Added an mtr suppression for 'ER_PRIOR_COMMIT_FAILED'. In OPTIMISTIC mode, a
transaction that gets killed during "wait_for_prior_commit" fails with this
error (1964), so a suppression is needed for it.

@mysql-test/suite/rpl/t/rpl_parallel_conflicts.test
The test had a 'slave.opt' that explicitly set
slave_parallel_mode='conservative'. When the test ends, this mode conflicts
with the new default, so the check-testcase step reports an error. The
'slave.opt' file is removed and the option is set and reset within the test.

@mysql-test/suite/multi_source/info_logs.result
@mysql-test/suite/multi_source/reset_slave.result
@mysql-test/suite/multi_source/simple.result
The recorded "show slave status" output changes; this is expected with the
new default slave_parallel_mode='OPTIMISTIC'.

@mysql-test/include/check-testcase.test
Updated default 'slave_parallel_mode' to 'optimistic'.

@mysql-test/suite/rpl/t/rpl_parallel_analyze_table_hang.test
@mysql-test/suite/rpl/t/rpl_parallel_deadlock_corrupt_binlog.test
@mysql-test/suite/rpl/t/rpl_parallel_domain.test
@mysql-test/suite/rpl/t/rpl_parallel_domain_slave_single_grp.test
@mysql-test/suite/rpl/t/rpl_parallel_free_deferred_event.test
@mysql-test/suite/rpl/t/rpl_parallel_gco_wait_kill.test
@mysql-test/suite/rpl/t/rpl_parallel_gtid_slave_pos_update_fail.test
@mysql-test/suite/rpl/t/rpl_parallel_ignored_errors.test
@mysql-test/suite/rpl/t/rpl_parallel_incorrect_relay_pos.test
@mysql-test/suite/rpl/t/rpl_parallel_innodb_lock_conflict.test
@mysql-test/suite/rpl/t/rpl_parallel_missed_error_handling.test
@mysql-test/suite/rpl/t/rpl_parallel_mode.test
@mysql-test/suite/rpl/t/rpl_parallel_partial_binlog_trans.test
@mysql-test/suite/rpl/t/rpl_parallel_record_gtid_wakeup.test
@mysql-test/suite/rpl/t/rpl_parallel_retry_deadlock.test
@mysql-test/suite/rpl/t/rpl_parallel_rollback_assert.test
@mysql-test/suite/rpl/t/rpl_parallel_single_grpcmt.test
@mysql-test/suite/rpl/t/rpl_parallel_slave_bgc_kill.test
@mysql-test/suite/rpl/t/rpl_parallel_stop_on_con_kill.test
@mysql-test/suite/rpl/t/rpl_parallel_stop_slave.test
@mysql-test/suite/rpl/t/rpl_parallel_wrong_binlog_order.test
@mysql-test/suite/rpl/t/rpl_parallel_wrong_exec_master_pos.test

Refactored rpl_parallel.test into the individual test cases listed above.

rpl_parallel_ignored_errors.test had sporadic failures with the following
symptoms.

Problem:
=======
CURRENT_TEST: rpl.rpl_parallel_ignored_errors
--- /test-10.5/mysql-test/suite/rpl/r/rpl_parallel_ignored_errors.result
+++ /test-10.5/mysql-test/suite/rpl/r/rpl_parallel_ignored_errors.reject
@@ -53,6 +53,7 @@
 a
 31
 32
+34
 SET sql_slave_skip_counter= 1;
 ERROR HY000: When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter can not be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position
 include/stop_slave_io.inc
@@ -62,7 +63,6 @@
 a
 31
 32
-33
 34
 connection server_2;
 include/stop_slave.inc

Analysis:
=========
In general, suppose there are three groups:
1 - Inserts 32, which fails due to a local entry '32' on the slave.
2 - Inserts 33.
3 - Inserts 34.

Each group considers itself a waiter and waits on the prior group, its
'waitee'. This is done in 'register_wait_for_prior_event_group_commit'. If no
other parallel group is being scheduled, there is no waitee.

Let us assume 3 groups are being scheduled in parallel.

3 -> waits for 2 -> waits for 1
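
Schematically, the registration amounts to something like the following
(simplified; in the server the waitee is tracked through each group's
wait_for_commit object, rpl_group_info::commit_orderer, and 'rgi'/'prev_rgi'
here are just illustrative names for the two groups):

  /* Simplified sketch: the current group registers the previous group's
     wait_for_commit object as the one it must wait on. */
  rgi->commit_orderer.register_wait_for_prior_commit(
      &prev_rgi->commit_orderer);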

Upon completion, '1' checks whether any subsequent waiter has registered. If
so, it wakes up that waiter with its own execution status. This execution
status is stored in 'wakeup_error'.

If '1' failed, it sends the corresponding wakeup_error to '2'; '2' then
aborts and propagates the error to '3', so all further commits are aborted.
This mechanism works only when all transactions reach the stage where they
wait for their prior commit to complete.

In OPTIMISTIC mode, the following scenario can occur.

1,2,3 are scheduled in parallel.

3 - Reaches the group commit code and waits for 2 to complete.
1 - Errors out and sets stop_on_error_sub_id=1.

When a group's execution results in an error, its sub_id is stored in
'stop_on_error_sub_id'. Any new group queued for execution checks whether its
sub_id is greater than 'stop_on_error_sub_id'; if so, its execution is
skipped ('skip_event_group=1' is set) because a prior group failed. Since the
SQL thread is about to stop, execution of all following event groups is
simply skipped; the normal waiting and wakeup processing between event groups
still runs, as a simple way to ensure that everything is stopped and cleaned
up correctly.
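
The scheduler-side check is roughly as follows (simplified; the exact form in
the parallel-replication scheduler code may differ):

  /* Simplified: an event group queued after a failed one is not executed. */
  if (rgi->gtid_sub_id > entry->stop_on_error_sub_id)
    skip_event_group= true;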

Upon its error, transaction '1' checks for registered waiters. Since none has
registered, it simply goes away.

2 - Starts executing and checks whether it has a waitee.

Since wait_commit_sub_id == entry->last_committed_sub_id, no waitee is set.

Secondly, 'entry->stop_on_error_sub_id' was set by the execution of '1'. The
'handle_parallel_thread' code now checks whether the current group's 'sub_id'
is greater than 'stop_on_error_sub_id'.

Since this is true, 'skip_event_group=true' is set, and 'wait_for_prior_commit'
is simply called to wake up all waiters. Group '2' had no waitee and its
execution was skipped, so its wakeup_error=0 and it sends a positive wakeup
signal to '3', which commits. The result is a missed transaction, i.e. 33 is
lost.
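
To make the failure concrete, here is a small self-contained toy model of
the wakeup chain (illustrative only; it shares no code with the server and
all names are invented). Group 1 has failed and recorded its sub_id, group 2
is skipped without ever registering a waitee, and group 3's fate depends on
what group 2 reports:

  #include <cstdint>
  #include <cstdio>

  struct Entry { uint64_t stop_on_error_sub_id= UINT64_MAX; };

  struct Group {
    uint64_t sub_id;
    int wakeup_error= 0;   /* status handed to the registered waiter */
  };

  /* Without the fix, a skipped group reports its own (clean) status. */
  static int wakeup_buggy(const Group &g, const Entry &)
  {
    return g.wakeup_error;
  }

  /* With the fix, a skipped group also consults stop_on_error_sub_id. */
  static int wakeup_fixed(const Group &g, const Entry &e)
  {
    if (g.wakeup_error)
      return g.wakeup_error;
    return g.sub_id > e.stop_on_error_sub_id ? 1 : 0;
  }

  int main()
  {
    Entry entry;
    Group g2{2};                   /* skipped: never executed, no waitee */

    entry.stop_on_error_sub_id= 1; /* group 1 errored out and recorded it */

    /* Group 3 registered on group 2; group 2's wakeup status decides
       whether group 3 commits (0) or rolls back (non-zero). */
    printf("buggy: 3 receives %d -> commits, '33' is lost\n",
           wakeup_buggy(g2, entry));
    printf("fixed: 3 receives %d -> rolls back as required\n",
           wakeup_fixed(g2, entry));
    return 0;
  }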

The fix is to also check 'stop_on_error_sub_id' during wakeup; see the
sql_class.cc changes, sketched below.
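
A hedged sketch of the idea (the member and function names are taken from
the server source, but this is not the verbatim patch): when a group has no
waitee registered, it must not report success if the shared
'stop_on_error_sub_id' shows that an earlier group already failed.

  /* Sketch only. wait_for_prior_commit() was moved into sql_class.cc so
     it can look at the slave's parallel-replication state. */
  int wait_for_commit::wait_for_prior_commit(THD *thd)
  {
    if (waitee)                            /* a prior group is registered */
      return wait_for_prior_commit2(thd);  /* block until it wakes us up */
    if (!wakeup_error && thd->rgi_slave)
    {
      rpl_parallel_entry *e= thd->rgi_slave->parallel_entry;
      if (e && thd->rgi_slave->gtid_sub_id > e->stop_on_error_sub_id)
        wakeup_error= 1;                   /* an earlier group failed */
    }
    if (wakeup_error)
      my_error(ER_PRIOR_COMMIT_FAILED, MYF(0));
    return wakeup_error;
  }

With this, a skipped group such as '2' above no longer hands wakeup_error=0
to its waiter, so '3' rolls back instead of committing ahead of the lost 33.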
