maria:bb-10.2-MDEV-19344

Last commit made on 2019-10-04
Get this branch:
git clone -b bb-10.2-MDEV-19344 https://git.launchpad.net/maria

Branch information

Name:
bb-10.2-MDEV-19344
Repository:
lp:maria

Recent commits

18dee43... by Marko Mäkelä

MDEV-19344 InnoDB purge buffering may corrupt a page

MySQL 5.5 introduced the ability of InnoDB to buffer delete-mark
and delete operations (the insert buffer was generalized to the
change buffer). These operations were only buffered on DELETE,
UPDATE (of a key) and the purge of history of committed transactions.

We never buffered anything on ROLLBACK; it could have been beneficial
for rolling back a large recovered transaction.

The delete-mark buffering appears to work fine, but there are problems
with the purge buffering.

MySQL Bug #61104 InnoDB: Failing assertion: page_get_n_recs(page) > 1
reported a problem with the purge buffering: an index page could
become empty, which essentially means that the secondary index becomes
corrupted. At MariaDB, we got closer to the root cause, by making
a failure repeatable with innodb.innodb-change-buffer-recovery-innodb
when an additional debug assertion is present.

page_header_set_field(): Enable the assertion that PAGE_N_RECS
must never be set to 0. (This function will not be invoked when
initializing an empty page. If an index page is empty, it must
be the root page, and the table must be empty. No changes to
the root page are ever buffered.)

A combination of two asynchronous, inherently nondeterministic operations
(purge and change buffering) is difficult to cover in tests or to reason
about. The purge buffering required a complex mechanism in the buffer pool,
the buffer pool watch. If we no longer buffer purge operations, we
can remove the watch as well.

We fix this by ceasing to buffer delete (purge) operations,
that is, by treating innodb_change_buffering=all (the default)
in the same way as innodb_change_buffering=changes
and treating innodb_change_buffering=purges
in the same way as innodb_change_buffering=deletes.
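
The remapping described above can be sketched as a small lookup. This is
only an illustration of the stated behaviour, not InnoDB code; the names
effective_change_buffering and DOWNGRADED are made up for the example.

```python
# Illustrative only: not InnoDB code, just the mapping described above.
VALID_MODES = {"none", "inserts", "deletes", "changes", "purges", "all"}

# Settings that used to include purge buffering, and the setting
# they are now treated like.
DOWNGRADED = {"all": "changes", "purges": "deletes"}


def effective_change_buffering(mode: str) -> str:
    """Return the setting whose behaviour the given setting now has."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown innodb_change_buffering value: {mode!r}")
    return DOWNGRADED.get(mode, mode)
```

Settings that never included purge buffering (none, inserts, deletes,
changes) map to themselves.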

MDEV-16260 will attempt to improve the performance of purge in a more
controlled fashion by scaling the effort according to the workload.

We will retain the code that merges buffered purge operations,
so that upgrades from older versions will be possible.

BTR_DELETE_OP, BTR_DELETE, BUF_BLOCK_POOL_WATCH,
BUF_GET_IF_IN_POOL_OR_WATCH, BUF_POOL_WATCH_SIZE,
ROW_NOT_DELETED_REF: Remove.

btr_cur_t::purge_node, buf_pool_t::watch: Remove.

ibuf_get_volume_buffered_hash(): Remove. It is no longer necessary
to estimate whether the page could become empty.

5e65c67... by Eugene

fix clang warning

edda2fd... by Vlad Lesin

MDEV-20703: mariabackup creates binlog files in server binlog directory on --prepare --export step

When the "--export" mariabackup option is used, mariabackup starts the
server in bootstrap mode to generate *.cfg files for certain InnoDB tables.
The started server instance reads its options from the file specified by
the "--defaults-file" mariabackup option.

If the server uses the same config file as mariabackup, and the binlog is
switched on in that config file, then "mariabackup --prepare --export"
will create binary log files in the server's binary log directory, which
can cause issues.

The fix is to add "--skip-log-bin" to the mysqld options when the server is
started to generate *.cfg files.
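
As a rough sketch of the idea, the option list for the bootstrap server
could be assembled as below. The function name and the exact list are
hypothetical; the commit message does not show the real argument list
that mariabackup builds, only that "--skip-log-bin" is added to it.

```python
from typing import List


def export_server_args(defaults_file: str) -> List[str]:
    """Build a hypothetical option list for the --export bootstrap server."""
    return [
        "mysqld",
        # --defaults-file has to be the first option on the command line.
        "--defaults-file={}".format(defaults_file),
        "--bootstrap",
        # The fix: even if the shared config file enables the binlog,
        # this instance must not write binary log files.
        "--skip-log-bin",
    ]
```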

f203245... by Alexander Barkov

Merge remote-tracking branch 'origin/10.1' into 10.2

4bcf524... by Alexander Barkov

Merge remote-tracking branch 'origin/5.5' into 10.1

d481f69... by Alexander Barkov

MDEV-20704 An index on a double column erroneously uses prefix compression

576a5f0... by Robert Bindar

MDEV-20647 Fix and enable SphinxSE tests

46b7852... by Marko Mäkelä

Fix -Wunused for CMAKE_BUILD_TYPE=RelWithDebInfo

For release builds, do not declare unused variables.

unpack_row(): Omit a debug-only variable from WSREP diagnostic message.

create_wsrep_THD(): Fix -Wmaybe-uninitialized for the PSI_thread_key.

9b80f93... by Sujatha Sivakumar

MDEV-20645: Replication consistency is broken as workers miss the error notification from an earlier failed group.

Analysis:
========
Consider three event groups:
1 - Inserts 32, which fails due to a local entry '32' on the slave.
2 - Inserts 33
3 - Inserts 34

Each group considers itself a waiter and waits for the prior group, its
'waitee'. This is done in 'register_wait_for_prior_event_group_commit'. If
no other group is being scheduled in parallel, there will be no waitee.

Let us assume 3 groups are being scheduled in parallel.

3 -> waits for -> 2 -> waits for -> 1

Upon completion, '1' checks whether there is any registered subsequent
waiter. If so, it wakes up that waiter with its execution status. This
execution status is stored in wakeup_error.

If '1' failed, it sends the corresponding wakeup_error to '2'. Then '2'
aborts and propagates the error to '3', so all further commits are aborted.
This mechanism works only when all transactions reach a stage where they
are waiting for their prior commit to complete.

In optimistic mode, the following scenario occurs.

1,2,3 are scheduled in parallel.

3 - Reaches group_commit_code and waits for 2 to complete.
1 - Errors out and sets stop_on_error_sub_id=1.

When the execution of a group results in an error, its sub_id is stored in
'stop_on_error_sub_id'. Any new group queued for execution will check
whether its sub_id is > stop_on_error_sub_id. If so, its execution will be
skipped because a prior group failed, and 'skip_event_group=1' will be set.
Since the execution of SQL thread is about to stop we just skip execution of
all the following event groups. We still do all the normal waiting and wakeup
processing between the event groups as a simple way to ensure that everything
is stopped and cleaned up correctly.

Upon error, transaction '1' checks for registered waiters. Since there are
none, it simply goes away.

2 - Starts execution. It checks whether it has a waitee.

Since wait_commit_sub_id == entry->last_committed_sub_id no waitee is set.

Secondly, 'entry->stop_on_error_sub_id' was set by the execution of '1'. Now
the 'handle_parallel_thread' code checks whether the 'sub_id' of the current
group is greater than the 'sub_id' stored in 'stop_on_error_sub_id'.

Since this is true, 'skip_event_group=true' is set. The group then simply
calls 'wait_for_prior_commit' to wake up all waiters. Group '2' did not have
any waitee and its execution is skipped, hence its wakeup_error=0. It sends
a positive wakeup signal to '3', which commits. This results in a missed
transaction, i.e. 33 is missed and 34 is committed.

Fix:
===
When a worker learns that an earlier transaction execution has failed, and it
should not proceed for further execution, it should mark its own execution
status as failed so that it alerts its followers to abort as well.
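
The interleaving from the analysis, with and without the fix, can be
replayed in a toy model. This is a sketch and not the server code: the
function replay(), the flag mark_skipped_as_failed and the error value 1
are invented for the example; only stop_on_error_sub_id, skip_event_group
and wakeup_error correspond to names mentioned above.

```python
from typing import List, Optional


def replay(mark_skipped_as_failed: bool) -> List[int]:
    """Replay the optimistic scenario; return the values that get committed."""
    committed: List[int] = []

    # Group 1 (inserts 32) fails. No waiter has registered on it yet,
    # so it only records its sub_id and goes away.
    stop_on_error_sub_id: Optional[int] = 1

    # Group 2 (inserts 33) starts afterwards. It has no waitee, so it
    # receives no error from group 1.
    wakeup_error = 0
    skip_event_group = (
        stop_on_error_sub_id is not None and 2 > stop_on_error_sub_id
    )
    if skip_event_group:
        if mark_skipped_as_failed:
            # The fix: a skipped group marks its own status as failed.
            wakeup_error = 1  # placeholder error value
    else:
        committed.append(33)

    # Group 3 (inserts 34) was already waiting for group 2 and is woken
    # up with group 2's wakeup_error.
    if wakeup_error == 0:
        committed.append(34)

    return committed


before_fix = replay(mark_skipped_as_failed=False)  # 34 committed, 33 missed
after_fix = replay(mark_skipped_as_failed=True)    # nothing committed
```

Without the fix the model commits 34 while 33 is missing; with the fix,
group 3 sees a nonzero wakeup_error and aborts as well.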

bc70862... by Julius Goryavsky

MDEV-20614: Syntax error, and option put in wrong place

A syntax error in the mysqld_multi.sh script has been fixed,
and the "--defaults-group-suffix" option has been moved to
the top of the mysqld options list.