rpl.rpl_xa_prepare_gtid_fail would sporadically fail, because an XA COMMIT
running concurrently with a failed prepare could either complete
successfully or be rolled back due to the failure of the prior commit. The
test expected the completion case; when the XA COMMIT was rolled back
instead, the XA transaction was left in a prepared state with a lock on
its table. This then caused a lock timeout during test cleanup, which
tries to drop that table.
The fix is to extend the test to check if the transaction is still
prepared, and silently commit it if so, thus releasing the locks.
The non-determinism is fine (i.e. DEBUG_SYNC isn't needed to force
one path), as the verification performed by the test has already
completed. This is just for cleanup.
MDEV-31949 parallel slave xa Round-Robin distribution
The XA-Prepare group of events
XA START xid
...
XA END xid
XA PREPARE xid
and its XA-"complete" terminator
XA COMMIT or
XA ROLLBACK
are now distributed Round-Robin across slave parallel workers.
The former hash-based policy was shown to contribute to execution
latency by creating a big queue of binlog-ordered transactions to
commit, many times larger than the size of the worker pool.
Acronyms and notations used below:
XAP := XA-Prepare event or the whole prepared XA group of events
XAC := XA-"complete", which is a solitary group of events
|W| := the size of the slave worker pool
Subscripts like `_k' denote order in a corresponding sequence
(e.g. binlog file).
KEY CHANGES:
The parallel slave
------------------
The driver thread now maintains a list of XAP:s currently
in processing. Its purpose is to avoid "wild" parallel execution of XA:s
with duplicate xids (unlikely, but that's the user's right).
The list is arranged as a sliding window of size 2*|W| to account for
the possibility of XAP_k -> XAP_k+2|W|-1 as the largest (in the
group-of-events count sense) dependency.
Say k=1, and |W| the # of Workers is 4. As transactions are distributed
Round-Robin, it's possible to have T^*_1 -> T^*_8 as the largest
dependency ('*' marks the dependents) in runtime.
This can be seen from the worker queues in the picture below.
Let the worker queues Q_i develop downward:
  Q1   ...       Q4
  1^*  2    3    4
  5    6    7    8^*
Worker #1 is assigned T_1 and T_5.
Worker #4 can take on its T_8 while T_1 is still at the
beginning of its processing, that is, even before the XA START of that XAP.
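The queue picture and the 2*|W| window can be reproduced with a small
standalone model. This is only a sketch under stated assumptions: the
names build_queues and XapWindow are hypothetical and do not exist in
the server, and the window here simply forgets the oldest entry, whereas
the real list tracks XAP:s that are actually in processing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Round-Robin: the k-th group of events (1-based) goes to queue (k-1) mod |W|.
std::vector<std::vector<std::size_t>>
build_queues(std::size_t n_groups, std::size_t n_workers)
{
  assert(n_workers > 0);
  std::vector<std::vector<std::size_t>> queues(n_workers);
  for (std::size_t k= 1; k <= n_groups; k++)
    queues[(k - 1) % n_workers].push_back(k);
  return queues;
}

// Hypothetical model of the driver's list: a sliding window that holds at
// most 2*|W| xids, the newest one included.
class XapWindow
{
  std::size_t capacity;
  std::deque<std::string> xids;
public:
  explicit XapWindow(std::size_t n_workers) : capacity(2 * n_workers) {}

  // Returns true when xid duplicates one of the previous 2*|W|-1 entries,
  // i.e. the new XAP depends on an earlier one. The xid is recorded either way.
  bool push(const std::string &xid)
  {
    while (xids.size() >= capacity)
      xids.pop_front();                 // slide: forget the oldest XAP
    bool duplicate= std::find(xids.begin(), xids.end(), xid) != xids.end();
    xids.push_back(xid);
    return duplicate;
  }
};
```

With |W| = 4, build_queues(8, 4) gives queue 1 the groups {1, 5} and
queue 4 the groups {4, 8}, and a T_8 reusing the xid of T_1 is still
caught by the window, while a T_9 reusing it would fall outside.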
XA related
----------
XID_cache_element is extended with two pointers to resolve
two types of dependencies: the duplicate xid XAP_k -> XAP_k+i
and the ordinary completion on the prepare XAP_k -> XAC_k+j.
The former is handled by a wait-for-xid protocol conducted by
xid_cache_delete() and xid_cache_insert_maybe_wait().
The latter is done analogously by xid_cache_search_maybe_wait() and
slave_applier_reset_xa_trans().
An XA-"complete" is allowed to go forward before its XAP parent
has released the xid (all recovery concerns are covered in MDEV-21496,
MDEV-21777).
Yet the XAC is going to wait for it at a critical
point of execution, which is when it "completes" the work in the Engine.
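The duplicate-xid wait (XAP_k -> XAP_k+i) can be pictured with a
single-threaded toy model. It is a sketch only: ToyXidCache and its
methods are hypothetical, they merely borrow the idea of
xid_cache_insert_maybe_wait() and xid_cache_delete(), and "waiting" is
represented by queueing a worker number instead of blocking a thread.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Toy registry of xids in use; not the server's XID cache.
class ToyXidCache
{
  struct Entry
  {
    int owner;                 // worker currently holding the xid
    std::deque<int> waiters;   // workers queued behind the owner
  };
  std::map<std::string, Entry> entries;
public:
  // Returns true if the worker now owns the xid, false if it has been
  // queued behind an earlier XAP with the same xid.
  bool insert_maybe_wait(const std::string &xid, int worker)
  {
    auto it= entries.find(xid);
    if (it == entries.end())
    {
      entries.emplace(xid, Entry{worker, {}});
      return true;
    }
    it->second.waiters.push_back(worker);
    return false;
  }

  // The owner releases the xid. Returns the worker that is handed the xid
  // next, or -1 if nobody was waiting and the entry is gone.
  int remove(const std::string &xid)
  {
    auto it= entries.find(xid);
    assert(it != entries.end());
    Entry &entry= it->second;
    if (entry.waiters.empty())
    {
      entries.erase(it);
      return -1;
    }
    entry.owner= entry.waiters.front();
    entry.waiters.pop_front();
    return entry.owner;
  }
};
```

In this model a second insert of the same xid is queued, and the delete
by the first owner hands the xid over in arrival order.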
CAVEAT: storage/innobase/trx/trx0undo.cc changes are due to possibly
fixed MDEV-32144,
TODO: to be verified.
Thanks to Brandon Nesterenko at mariadb.com for initial review and
a lot of creative efforts to advance with this work!
MDEV-28122 Optimize table crash while applying online log
- InnoDB fails to check the overflow buffer while applying
the operation to the table that was rebuilt. This is caused
by commit 3cef4f8f0fc88ae5bfae4603d8d600ec84cc70a9 (MDEV-515).
Do not create histograms for single column unique key
The intention was always to not create histograms for single column
unique keys (as histograms are not useful in this case), but because of
a bug in the code this was still done.
The changes in the test cases are mainly because hist_size is now NULL
for these kinds of columns.
MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap
The error was introduced by the MDEV-30165 fix in the following commit:
d13a57ae8181f2a8fbee86838d5476740e050d50
There is a logical error in lock_release_on_prepare_try():
if (supremum_bit) lock_rec_unlock_supremum(*cell, lock);
else lock_rec_dequeue_from_page(lock, false);
There can be other bits set in the lock's bitmap, and the lock type
can meet the release criteria, but the above logic releases only the
supremum bit of the lock.
The fix is to release the lock if it meets the release criteria, and
otherwise to unlock the supremum if the supremum is locked.
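The difference between the two branches can be shown on a toy lock. This
is a sketch under assumptions, not the InnoDB code: ToyLock, its
releasable flag and the bit position chosen for the supremum are
hypothetical stand-ins, and both functions assume they are only called
for locks that either meet the release criteria or have the supremum
bit set.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy bit position playing the role of the supremum pseudo-record.
constexpr std::size_t TOY_SUPREMUM_BIT= 1;

struct ToyLock
{
  std::bitset<8> bitmap;   // one bit per locked record
  bool releasable;         // lock type meets the release criteria
  bool queued= true;       // still present in the lock queue

  // Stand-in for lock_rec_dequeue_from_page(): drop the whole lock.
  void dequeue() { bitmap.reset(); queued= false; }
  // Stand-in for lock_rec_unlock_supremum(): clear only the supremum bit.
  void unlock_supremum() { bitmap.reset(TOY_SUPREMUM_BIT); }
};

// Logic as quoted above: a set supremum bit hides every other bit.
void release_old(ToyLock &lock)
{
  assert(lock.releasable || lock.bitmap.test(TOY_SUPREMUM_BIT));
  if (lock.bitmap.test(TOY_SUPREMUM_BIT))
    lock.unlock_supremum();
  else
    lock.dequeue();
}

// Logic as described by the fix: the release criteria are checked first,
// the supremum bit only matters otherwise.
void release_fixed(ToyLock &lock)
{
  assert(lock.releasable || lock.bitmap.test(TOY_SUPREMUM_BIT));
  if (lock.releasable)
    lock.dequeue();
  else if (lock.bitmap.test(TOY_SUPREMUM_BIT))
    lock.unlock_supremum();
}
```

For a releasable lock with both the supremum bit and a record bit set,
release_old() leaves the record bit locked, while release_fixed()
removes the whole lock.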
There is also a test for the case, which was reported by the QA team. I
placed it in a separate file, because it requires a debug build.
MDEV-31098 InnoDB Recovery doesn't display encryption message when no encryption configuration passed
- InnoDB fails to report the error when encryption configuration
wasn't passed. This patch addresses the issue by adding
the error while loading the tablespace and deferring the
tablespace creation.