- xid:s listed in the Xlle (Xid_list_log_event) are treated as if they
were XAP:s at the start of the initial (recovery) binlog file
- xid_cache_insert() needs an explicit "was binlogged" argument
when xid is recovered (into prepared state)
- Xid_list_log_event::Xid_list_log_event() now computes
the size of the list inside the ctor's body; its
List_log_event base is initialized with zero.
That is because thd_arg may also be null, in which case a THD instance
has to be created dynamically and passed to the methods dealing
with the list; this can also happen while `mysqld_server_started` is still false.
The commit adopts Xid_list_log_event from bb-10.6-MDEV-31949-xlle
branch, contributed by Brandon.
The following changes are made on top of that work:
- unlike the Gtid list, a zero-size xid list is not written
- Xid_list_log_event element size is (significantly) optimized; for that,
min_size() and max_size() had to be introduced to *the* T classes
(impossible to do that in the template because min_size() is invoked
in the ctor stack)
- the xid list is no longer broken by a new line
MDEV-33168: XA crash-recovery based on engines prepare first rule
This patch begins to add a new Xid_list_log_event to
facilitate recovery of user XA transactions, to
go alongside MDEV-31949.
The patch is rough so far, but the idea is:
* Create base class List_log_event which provides logic
to be shared between Xid and Gtid list events
* Change Gtid_list_log_event to derive from
List_log_event (likely to be split into a separate
pre-commit later on for easier readability)
* Create Xid_list_log_event which derives from
List_log_event
* Extend XID_cache_element to track if the XAP is the
latest event in the binlog for an XA transaction
* When binlogging XAP or XAC, while still holding LOCK_log,
the XID_cache_element is updated to track the binlog state
* Write the XID list log event at the start of a binary log
such that it collects the XIDs whose cache element indicates
that the XAP is the latest event in the binlog
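The bookkeeping described in the last three bullets can be sketched
roughly as follows. This is a simplified model and not the patch's code:
XidCacheEntry, the xap_is_latest_in_binlog flag, both on_*_binlogged()
hooks and collect_xid_list() are hypothetical names, XIDs are plain
strings, and the caller is assumed to hold LOCK_log.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the binlog state kept in XID_cache_element.
struct XidCacheEntry {
  bool xap_is_latest_in_binlog = false;
};

using XidCache = std::unordered_map<std::string, XidCacheEntry>;

// Assumed to be called under LOCK_log right after an XAP is binlogged.
void on_xa_prepare_binlogged(XidCache &cache, const std::string &xid) {
  assert(!xid.empty());
  cache[xid].xap_is_latest_in_binlog = true;
}

// Assumed to be called under LOCK_log right after an XAC is binlogged.
void on_xa_complete_binlogged(XidCache &cache, const std::string &xid) {
  auto it = cache.find(xid);
  if (it != cache.end())
    it->second.xap_is_latest_in_binlog = false;
}

// At the start of a new binary log: collect the XIDs whose latest
// binlogged event is still the XAP. Sorted only to make the result
// deterministic in this sketch.
std::vector<std::string> collect_xid_list(const XidCache &cache) {
  std::vector<std::string> xids;
  for (const auto &kv : cache)
    if (kv.second.xap_is_latest_in_binlog)
      xids.push_back(kv.first);
  std::sort(xids.begin(), xids.end());
  return xids;
}
```

In this model an XID that was prepared and then completed before the
rotation does not appear in the collected list.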
Many TODOs still, including:
1. Added slave capability version for Xid_list_log_event,
though it may be unnecessary, as I don't think it needs
to go to the slave as it is entirely for recovery.
2. Finish TODOs mentioned throughout the current patch
3. Add comments, and ensure existing comments from
Gtid_list_log_event are consistent
MDEV-33168 XA crash-recovery based on engines prepare first rule.
This commit addresses crash recovery of the XA transaction's prepare,
commit and rollback commands, both in the normal and in the semisync
slave mode.
Key changes include:
- xid_recovery_member is extended to keep track of the XA transaction's
"state" changes. The state is associated with its XID, which, when
reused, could exist in the binlog in multiple instances
- xarecover_handlerton is extended to register the user prepare xid
similarly to the normal transactions
- xarecover_do_commit_or_rollback now calls a new
xarecover_decide_xa() to decide on a user xid at the
end of recovery
- TC_LOG_BINLOG::recover and a few members of the Recovery_context class
are extended to scan the binlog for the user XIDs and keep track of them,
including whether each is marked as truncatable or, on the contrary, durable
Low level notable changes include fixes of semisync XA recovery:
- added member::idx_gtid_to_truncate to specify the ctx::gtid_maybe_to_trunc array index to clear through
- Preserved the original decide's in-doubt branch structure:
either call `set()` or append(gtid) to the array w/o having set g^t
- x_k+i resets on behalf of x_k, for any (sane) state of both
member::end_log_pos is set in step with ::binlog_coord.
Todo:
1. integrate with Xid_list_log_event
2. test out the semisync slave recovery mode
3. complete/test with multiple-engines
MDEV-31949 III. Innodb flush_log_later for XA commit,rollback
The previous part I and II commits of the MDEV-31949 branch implement
processing of the XA-commit and XA-rollback commands (abbreviated as
XAC and XAR here and elsewhere) in the Engine from inside binlog group
commit, similarly to how it is done for a normal transaction's commit.
With binlogging enabled, the execution path is now the same for normal
and XA transactions:
MYSQL_BIN_LOG::run_commit_ordered():
...
/* for each hton do */ hton->{commit_ordered,commit_by_xid,rollback_by_xid}
...
return rc;
The current commit implements the flush-log-later policy of durable write
at commit for the user XA. Unlike the normal transaction case, XA-rollback
needs similar treatment as well.
MDEV-31949 II.2 Slave: XAP -> XAC via wait 4 pc optimization.
This commit optimizes XAP -xid-> XAC dependency tracking
by ceasing to tag XAC events with SPECULATE_WAIT, thus
letting an XAC run in parallel with its parent and/or any duplicate
XAP:s, to the extent that it may not yet find its xid
inserted by the parent into the system cache.
In such a case, when the slave binlog is ON, the XAC proceeds anyway to
the wait-for-prior-transactions-to-commit stage, and once it gets past
that stage the xid must be available, having been released by the parent XAP.
To facilitate handling of XAP -xid-> XAC dependency a dummy prepared xid
object is initially associated with the parallel slave XAC
transaction.
That allows the XAC to pass through is-xa-prepared checks at the
initial binlog phase. The actual xid will be acquired after
the wait-for-prior-transactions-to-commit stage, that is at the time
when all preceding transactions have completed. The parent XAP in
particular must have released the xid by then.
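The dummy-xid handover can be modelled with the single-threaded sketch
below. It is an illustration under assumptions, not the patch's code:
XidCacheModel and XacModel are hypothetical classes, XIDs are plain
strings, and the real wait stage and locking are left out.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the system xid cache: holds the xids that
// their parent XAP has already released.
class XidCacheModel {
public:
  void release(const std::string &xid) { released_.insert(xid); }
  // Takes ownership of a released xid; false if it is not there (yet).
  bool acquire(const std::string &xid) { return released_.erase(xid) == 1; }

private:
  std::set<std::string> released_;
};

// Hypothetical model of a parallel slave XAC transaction.
class XacModel {
public:
  explicit XacModel(const std::string &xid) : xid_(xid) {
    assert(!xid_.empty());
  }

  // Initial binlog phase: the dummy prepared xid object is enough to
  // pass an is-xa-prepared style check.
  bool looks_prepared() const { return has_dummy_ || has_real_; }

  // Meant to be called only after all prior transactions have committed,
  // at which point the parent XAP is expected to have released the xid.
  bool after_wait_for_prior_commit(XidCacheModel &cache) {
    has_real_ = cache.acquire(xid_);
    has_dummy_ = false;
    return has_real_;
  }

  bool has_real_xid() const { return has_real_; }

private:
  std::string xid_;
  bool has_dummy_ = true;   // starts out with the dummy prepared xid
  bool has_real_ = false;
};
```

The point of the model is the ordering: the prepared check succeeds
before the real xid exists in the cache, and the swap to the real xid
happens only after the wait stage.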
As a measure against MDEV-32455, THD::lex::xid is no longer relied on
for anything other than slave XAC:s in
binlog_{commit,rollback}_flush_trx_cache(), which can be called in buggy cases.
MDEV-31949 II.1 slow parallel replication of user xa
This is the 1st commit of part II of the three-part branch.
Part I refactors the binary logging of XA to integrate
it properly with the Binlog-Group-Commit module.
Part III is concerned with an optimal durable write
policy for the user XA-PREPARE,COMMIT,ROLLBACK.
The XA-Prepare group of events, introduced by MDEV-742,
XA START xid
...
XA END xid
XA PREPARE xid
and its XA-"complete" terminator
XA COMMIT or
XA ROLLBACK
that constitute a binlog image of a user XA transaction
are now distributed round-robin across slave parallel workers.
The original hash-based policy was reasonably good at identifying
the XA-PREPARE worker for a respective XA-"COMPLETE".
Yet the hash method was proven to contribute to execution latency
by creating a big queue, many times larger than the size of the worker
pool, of binlog-ordered transactions to commit.
Acronyms used in the commit message and patch comments:
XAP := XA-Prepare event or the whole prepared XA group of events
XAC := XA-"complete", which is a solitary group of events,
sometimes specifically XA-Commit.
NOTABLE CHANGES in this commit are done mostly to
the parallel slave module.
The parallel slave driver thread now maintains a list of XAP:s currently
in scheduling. Its purpose is to avoid parallel execution of XA:s
with duplicate XIDs (unlikely, but that's the user's right).
The list is arranged as a sliding window with the size of the slave
worker pool, which is sufficient to resolve any XAP_i -> XAP_i+k
dependency. E.g. with 4 worker threads, XAP_5(xid) can be safely
distributed when its XID did not occur in the XAP:s indexed 2,3,4; it
could occur in XAP_1, but that would be safe because both would belong
to the same worker.
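A minimal sketch of such a window, assuming a stream that consists of
XAP:s only and XIDs represented as plain strings. XapWindow, must_wait()
and schedule() are hypothetical names, not the ones used in the patch.
Worker indexes are 0-based here, while the XAP indexes in the example
above are 1-based.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the driver thread's list of XAP:s in scheduling.
class XapWindow {
public:
  explicit XapWindow(std::size_t n_workers) : n_workers_(n_workers) {
    assert(n_workers_ > 0);
  }

  // True when `xid` duplicates one of the previous (n_workers - 1) XAP:s,
  // that is, one that was handed to a different worker.
  bool must_wait(const std::string &xid) const {
    for (const std::string &seen : recent_)
      if (seen == xid)
        return true;
    return false;
  }

  // Registers a newly scheduled XAP and returns its round-robin worker.
  std::size_t schedule(const std::string &xid) {
    recent_.push_back(xid);
    // Together with the next incoming XAP the window spans n_workers
    // entries, so only the last (n_workers - 1) XIDs are kept.
    if (recent_.size() > n_workers_ - 1)
      recent_.pop_front();
    return next_++ % n_workers_;
  }

private:
  std::size_t n_workers_;
  std::size_t next_ = 0;
  std::deque<std::string> recent_;
};
```

With 4 workers, a fifth XAP reusing the XID of the first one is not
flagged by must_wait() and is assigned the same worker as the first,
which matches the XAP_1/XAP_5 case above.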
The current commit also tracks XAP_i -> XAC_i+k dependency between
the prepared and the commit parts of the same transaction and does
so rather conservatively. This is optimized away in the following commit.
The XAP -xid-> XA[PC] dependency tracker works as follows.
When a new XAP (or XAC in this commit) enters the replication applier, its XID
is compared against the ones in the window. If the current one is a duplicate,
e.g. when it's XAC(xid) that comes in after XAP(xid), then
the XAC would be 'waiting for prior transaction to commit'.
In this example one of the prior transactions would necessarily be
the XAC's "parent" XA-PREPARE part of the whole XA transaction.
OTHER CHANGES,
to
ha_close_connection()
are done to keep the binlog hton connection alive until the end of
XA-prepare's binlogging on the parallel slave.
The hton:s other than the binlog one are disconnected from the slave
transaction handler in TC_LOG::run_commit_ordered(), which makes
the prepared transaction engine objects available to
binlog-commit-order followers.
UNRELATED
to
rpl_xa_prepare_gtid_fail.test
added expected warning suppression that MDEV-31038 missed out.
Thanks to Brandon Nesterenko at mariadb.com for initial review and
a lot of creative efforts to advance with this work!
Enormous kudos to Kristian Nielsen for thorough insightful reviews!
Roel Van de Paar helped to improve it through testing.
MDEV-32852 assert in xa-rollback branch of MDEV-32830 run_commit_ordered
is replaced.
An XA in either the XA_ACTIVE or the XA_IDLE state can
be ended by a connection close, which in combination with
a not-safe-to-rollback property can lead into the assert branch as well.
A proper assert is placed to check for a forceful XA ending
by connection close.