maria:10.6-MDEV-31949-xlle

Last commit made on 2024-02-15
Get this branch:
git clone -b 10.6-MDEV-31949-xlle https://git.launchpad.net/maria

Branch information

Name:
10.6-MDEV-31949-xlle
Repository:
lp:maria

Recent commits

50adf21... by Andrei <email address hidden>

Another increment to Xlle support.

- xid:s of Xlle are treated as if they are XAP:s at the start of
  the (recovery) initial binlog file
- xid_cache_insert() needs an explicit "was binlogged" argument
  when xid is recovered (into prepared state)
- Xid_list_log_event::Xid_list_log_event() made to compute
  the size of the list inside the ctor's body; its
  List_log_event base is inited with zero.
  That is because thd_arg may also be null, in which case a THD instance
  has to be created dynamically and passed to the methods dealing
  with the list, including at a time when `mysqld_server_started` is still false.

4cb5e5b... by Andrei <email address hidden>

MDEV-31949: MDEV-33168 XA crash-recovery

The commit adopts Xid_list_log_event from bb-10.6-MDEV-31949-xlle
branch, contributed by Brandon.

The following changes are done over that work:
- unlike Gtid-list don't write zero size xid list
- Xid_list_log_event element size is (significantly) optimized; for that,
  min_size() and max_size() had to be introduced to *the* T classes
  (impossible to do that in the template because min_size() is invoked
  in the ctor stack)
- xid list is not broken by a new line anymore
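
One plausible reading of the "ctor stack" remark is the C++ rule that a
virtual call made while a base-class constructor runs does not reach a
derived override. The sketch below shows that rule, and the workaround of
asking the element type T for its size bounds. All names (BaseSketch,
DerivedSketch, XidElemSketch, ListSketch) and the byte counts are
hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base whose constructor calls a virtual size hook.
struct BaseSketch {
  BaseSketch() { seen_in_ctor = min_size(); }  // resolves to BaseSketch::min_size()
  virtual ~BaseSketch() = default;
  virtual std::size_t min_size() const { return 0; }
  std::size_t seen_in_ctor;
};

// The override exists, but it is not visible while BaseSketch() is running.
struct DerivedSketch : BaseSketch {
  std::size_t min_size() const override { return 8; }
};

// Hypothetical element type that reports its own serialized size bounds.
struct XidElemSketch {
  static constexpr std::size_t min_size() { return 6; }    // assumed value
  static constexpr std::size_t max_size() { return 134; }  // assumed value
};

// The template asks T directly, so the sizes are usable inside the ctor.
template <class T>
struct ListSketch {
  explicit ListSketch(std::size_t count)
      : min_bytes(count * T::min_size()), max_bytes(count * T::max_size()) {}
  std::size_t min_bytes;
  std::size_t max_bytes;
};
```

With static members on the element type, the bounds do not depend on
virtual dispatch and so can be used at any point of construction.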

713e733... by Brandon Nesterenko

MDEV-33168: XA crash-recovery base on engines prepare first rule

This patch begins to add a new Xid_list_log_event to
facilitate recovery of user XA transactions, to
go alongside MDEV-31949.

The patch is rough so far, but the idea is:

 * Create base class List_log_event which provides logic
   to be shared between Xid and Gtid list events
 * Changed Gtid_list_log_event to derive from
   List_log_event (likely to be split into a separate
   pre-commit later on for easier readability)
 * Created Xid_list_log_event which derives from
   List_log_event
 * Extend XID_cache_element to track if the XAP is the
   latest event in the binlog for an XA transaction
 * When binlogging XAP or XAC, while still holding LOCK_log,
   the XID_cache_element is updated to track the binlog state
 * Write XID list log event at the start of a binary log
   such that it collects XIDs whose cache element indicates
   that the XAP is the latest event in binlog
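
The last three points can be sketched as follows. This is a minimal
illustration only: XidCacheSketch and its methods are hypothetical names,
XIDs are reduced to strings, and the LOCK_log locking is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the XID cache; not the server's structure.
class XidCacheSketch {
 public:
  // An XA PREPARE group was binlogged for this xid. The server would do
  // this while still holding LOCK_log; locking is left out here.
  void on_xap_binlogged(const std::string& xid) { xap_is_latest_[xid] = true; }

  // XA COMMIT or XA ROLLBACK was binlogged, so the XAP is no longer
  // the latest binlog event of that XA transaction.
  void on_xac_binlogged(const std::string& xid) { xap_is_latest_[xid] = false; }

  // XIDs to be written into the list event at the start of a new binlog.
  std::vector<std::string> collect_for_new_binlog() const {
    std::vector<std::string> out;
    for (const auto& kv : xap_is_latest_)
      if (kv.second) out.push_back(kv.first);
    return out;
  }

 private:
  std::map<std::string, bool> xap_is_latest_;
};
```

After XAP(x1), XAP(x2), XAC(x1), only x2 would be collected.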

Many TODOs still, including:
 1. Added slave capability version for Xid_list_log_event,
    though it may be unnecessary, as I don't think it needs
    to go to the slave as it is entirely for recovery.
 2. Finish TODOs mentioned throughout the current patch
 3. Add comments, and ensure existing comments from
    Gtid_list_log_event are consistent

ceab965... by Andrei <email address hidden>

MDEV-31949/33168 IV. Recovery

MDEV-33168 XA crash-recovery base on engines prepare first rule.
This commit addresses crash-recovery of XA prepare, commit and rollback
commands, both in the normal and the semisync slave mode.

Key changes include:

- xid_recovery_member is extended to keep track of XID transaction
  "state" changes. The state is associated with an XID which, when
  reused, can exist in the binlog in multiple instances
- xarecover_handlerton is extended to register the user prepare xid
  similarly to the normal transactions
- xarecover_do_commit_or_rollback now calls a new
  xarecover_decide_xa() to decide on a user xid at the
  end of recovery
- TC_LOG_BINLOG::recover and a few members of the Recovery_context class are
  extended to scan the binlog for user XIDs, keep track of them, and
  mark them as truncatable or, on the contrary, durable
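
The state tracking for a reused XID can be pictured with the sketch below.
It only shows the "latest state wins" idea when scanning in binlog order.
The types and the function name are hypothetical, and the actual decision
made by xarecover_decide_xa() is not modelled here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical states of a user XA transaction as seen in the binlog.
enum class XaStateSketch { Prepared, Committed, RolledBack };

struct XaEventSketch {
  std::string xid;
  XaStateSketch state;
};

// Scan events in binlog order. A later event for the same XID overrides
// the earlier one, so a reused XID ends up with its most recent state.
std::map<std::string, XaStateSketch> scan_last_states(
    const std::vector<XaEventSketch>& binlog) {
  std::map<std::string, XaStateSketch> last;
  for (const XaEventSketch& ev : binlog) last[ev.xid] = ev.state;
  return last;
}
```

For the sequence XAP(x), XAC(x), XAP(x) the scan leaves x as prepared,
which is the instance recovery would still have to decide on.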

Low level notable changes include fixes of sync/2 XA recovery:
- +member::idx_gtid_to_truncate to specify ctx::gtid_maybe_to_trunc
                                  array index to clear through
- Preserved the original decide's in-doubt branch structure:
   either call `set()` or append(gtid) to the array w/o having set g^t

- x_k+i resets on behalf of x_k, for any (sane) state of both
   member::end_log_pos is set in step with ::binlog_coord.

Todo:
1. integrate with Xid_list_log_event
2. test out the semisync slave recovery mode
3. complete/test with multiple-engines

a4a9b08... by Andrei <email address hidden>

MDEV-31949 III. Innodb flush_log_later for XA commit,rollback

The previous part I,II commits of the MDEV-31949 branch implement
XA-commit,rollback commands (shortcut as XAC,XAR here and elsewhere)
processing in the Engine from inside binlog group commit, similarly
to how it is done for the normal transaction's commit. With binlogging
enabled, the execution path is now the same for normal and xa
transactions:

       MYSQL_BIN_LOG::run_commit_ordered():
       ...
       /* for each hton do */
         hton->{commit_ordered,commit_by_xid,rollback_by_xid}
       ...
       return rc;

The current commit implements the flush-log-later policy of durable write
at commit for the user xa. Unlike in the normal trx case, XA-rollback
needs similar treatment as well.
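
As a rough model of the flush-log-later idea: the ordered step completes
each transaction in memory only, and a single durable write afterwards
covers the whole group. EngineSketch, run_group and the LSN counters are
hypothetical; this is not InnoDB's interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical engine model; LSNs and counters are illustrative only.
class EngineSketch {
 public:
  // Ordered, cheap step: complete the transaction in memory and return
  // the log position that must eventually become durable. No flush here.
  std::uint64_t complete_ordered() { return ++last_lsn_; }

  // Deferred step: one durable write covers every position up to `lsn`.
  void flush_up_to(std::uint64_t lsn) {
    if (lsn <= flushed_lsn_) return;  // already durable
    flushed_lsn_ = lsn;
    ++flush_calls_;
  }

  std::uint64_t flushed_lsn() const { return flushed_lsn_; }
  unsigned flush_calls() const { return flush_calls_; }

 private:
  std::uint64_t last_lsn_ = 0;
  std::uint64_t flushed_lsn_ = 0;
  unsigned flush_calls_ = 0;
};

// Complete `n` transactions (commits or rollbacks alike) in order,
// then make all of them durable with a single flush.
std::uint64_t run_group(EngineSketch& engine, unsigned n) {
  std::uint64_t max_lsn = 0;
  for (unsigned i = 0; i < n; ++i) max_lsn = engine.complete_ordered();
  engine.flush_up_to(max_lsn);
  return max_lsn;
}
```

In this model a group of three completions costs one flush rather than
three.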

833e7e1... by Andrei <email address hidden>

MDEV-31949 II.2 Slave: XAP -> XAC via wait 4 pc optimization.

This commit optimizes XAP -xid-> XAC dependency tracking
by ceasing to tag XAC events with SPECULATE_WAIT, thus
letting an XAC run in parallel with its parent and/or any duplicate
XAP:s, to the extent that it may not yet find its xid
inserted by the parent into the system cache.
In such a case, when slave binlog is ON, the XAC proceeds anyway to the
wait-for-prior-transactions-to-commit stage, and when it survives that
the xid must be available, having been released by the parent XAP.

To facilitate handling of XAP -xid-> XAC dependency a dummy prepared xid
object is initially associated with the parallel slave XAC
transaction.
That allows the XAC to pass through is-xa-prepared checks at the
initial binlog phase. The actual xid will be acquired after
the wait-for-prior-transactions-to-commit stage, that is, at a time when
all preceding transactions have been completed. The parent XAP in
particular must have released the xid.

A measure against MDEV-32455: do not rely on THD::lex::xid for anything
other than slave XAC:s in
binlog_{commit,rollback}_flush_trx_cache(), which can be called in buggy cases.

6cbb63f... by Andrei <email address hidden>

MDEV-31949 II.1 slow parallel replication of user xa

This is the 1st commit of part II of the 3-part branch.
Part I refactors the binary logging of XA to integrate
it properly with the Binlog-Group-Commit module.
Part III is concerned with an optimal durable write
policy for the user XA-PREPARE,COMMIT,ROLLBACK.

The XA-Prepare group of events introduced by MDEV-742

  XA START xid
  ...
  XA END xid
  XA PREPARE xid

and its XA-"complete" terminator

  XA COMMIT or
  XA ROLLBACK

that constitute a binlog image of a user XA transaction
are now distributed round-robin across slave parallel workers.
The original hash-based policy was reasonably good at identifying
the XA-PREPARE worker for a respective XA-"COMPLETE".
Yet the hash method proved to contribute to execution latency
by creating a big queue of binlog-ordered transactions to commit,
many times larger than the size of the worker pool.
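
The two distribution policies can be contrasted with a small sketch.
pick_by_hash and RoundRobinSketch are hypothetical names, XIDs are reduced
to strings, and std::hash merely stands in for whatever hash the original
policy used.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hash policy: the worker is a pure function of the XID, so the
// XA-"COMPLETE" part always finds the worker of its XA-PREPARE part.
std::size_t pick_by_hash(const std::string& xid, std::size_t n_workers) {
  assert(n_workers > 0);
  return std::hash<std::string>{}(xid) % n_workers;
}

// Round-robin policy: the worker depends only on arrival order, so the
// two parts of one XA transaction may land on different workers.
class RoundRobinSketch {
 public:
  explicit RoundRobinSketch(std::size_t n_workers) : n_workers_(n_workers) {
    assert(n_workers > 0);
  }
  std::size_t next() {
    std::size_t worker = next_;
    next_ = (next_ + 1) % n_workers_;
    return worker;
  }

 private:
  std::size_t n_workers_;
  std::size_t next_ = 0;
};
```

Round-robin gives up the free XAP-to-XAC affinity of the hash, which is
why the dependency has to be tracked explicitly.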

Acronyms used in the commit message and patch comments:

  XAP := XA-Prepare event or the whole prepared XA group of events
  XAC := XA-"complete", which is a solitary group of events,
         sometimes specifically XA-Commit.

NOTABLE CHANGES in this commit are done mostly to
the parallel slave module.

The parallel slave driver thread now maintains a list of XAP:s currently
in scheduling. Its purpose is to avoid parallel execution of XA:s
with duplicate XIDs (unlikely, but that's the user's right).
The list is arranged as a sliding window with the size of the slave
worker pool which is sufficient to resolve any XAP_i -> XAP_i+k
dependency. E.g., with 4 worker threads, XAP_5(xid) can be safely
distributed when its XID did not occur in XAP:s indexed 2,3,4; it
could occur in XAP_1, but this would be safe because both would belong
to the same worker.
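
A minimal sketch of such a window, assuming each entry remembers the
worker it was scheduled to. XapWindowSketch and its members are
hypothetical names, and XIDs are reduced to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical sliding window over the most recently scheduled XAP:s.
class XapWindowSketch {
 public:
  explicit XapWindowSketch(std::size_t pool_size) : pool_size_(pool_size) {
    assert(pool_size > 0);
  }

  // True when the same XID is in the window on a different worker, so
  // the two XA groups must not be executed in parallel.
  bool must_wait(const std::string& xid, std::size_t worker) const {
    for (const Entry& e : window_)
      if (e.xid == xid && e.worker != worker) return true;
    return false;
  }

  // Record a scheduled XAP; the window never exceeds the pool size.
  void schedule(const std::string& xid, std::size_t worker) {
    window_.push_back(Entry{xid, worker});
    if (window_.size() > pool_size_) window_.pop_front();
  }

 private:
  struct Entry {
    std::string xid;
    std::size_t worker;
  };
  std::size_t pool_size_;
  std::deque<Entry> window_;
};
```

With 4 workers and XAP_1..XAP_4 on workers 0..3, an XAP_5 that reuses the
XID of XAP_1 goes to worker 0 again and need not wait, whereas reusing the
XID of XAP_2 would have to wait.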

The current commit also tracks XAP_i -> XAC_i+k dependency between
the prepared and the commit parts of the same transaction and does
so rather conservatively. This is optimized away in the following commit.
The dependency XAP -xid-> XA[PC] tracker works as follows.
When a new XAP (or XAC in this commit) enters the replication applier, its XID
is compared against the ones in the window. If the current one duplicates,
e.g when it's XAC(xid) that comes in after XAP(xid), then
XAC would be 'waiting for prior transaction to commit'.
In this example among the prior transactions one would necessarily be
the XAC's "parent" XA-PREPARE part of the whole XA transaction.

OTHER CHANGES,
to
  ha_close_connection()
are done to keep the binlog hton connection alive for the end of
XA-prepare's binlogging on the parallel slave.
Hton:s other than the binlog one are disconnected from the slave
transaction handler in TC_LOG::run_commit_ordered(), which makes
the prepared transaction engine objects available to
binlog-commit-order followers.

UNRELATED
to
  rpl_xa_prepare_gtid_fail.test
added expected warning suppression that MDEV-31038 missed out.

Thanks to Brandon Nesterenko at mariadb.com for initial review and
a lot of creative efforts to advance with this work!
Enormous kudos to Kristian Nielsen for thorough insightful reviews!
Roel Van de Paar helped to improve it through testing.

1e81859... by Andrei <email address hidden>

MDEV-31949 The slave side changes to tests

This commit carries specific replication and slave side tests.

c8c1bf6... by Andrei <email address hidden>

MDEV-32857 Assertion in new branches of MYSQL_BIN_LOG::run_commit_ordered

This MDEV-32455 assert is refined to let in the actual bug load
without firing.

b4ca0b5... by Andrei <email address hidden>

MDEV-32852 assert in xa-rollback branch of MDEV-32830 run_commit_ordered

is replaced.
XA in either XA_ACTIVE or XA_IDLE state can
be ended by connection close which in combination with
a non-safe to rollback property can lead into the assert branch as well.

A proper assert is placed to check the fact of a forceful xa ending
with connection close.