maria:knielsen_mdev31949_review

Last commit made on 2024-02-01
Get this branch:
git clone -b knielsen_mdev31949_review https://git.launchpad.net/maria

Branch information

Name: knielsen_mdev31949_review
Repository: lp:maria

Recent commits

e2f0f91... by Kristian Nielsen

MDEV-31949: XA review, test cases for binlog recovery

Signed-off-by: Kristian Nielsen <email address hidden>

7a622bb... by Kristian Nielsen

MDEV-31949: Test case to demonstrate missing save of InnoDB position on XA PREPARE

Signed-off-by: Kristian Nielsen <email address hidden>

1bdc9bc... by Andrei <email address hidden>

MDEV-31949 IV. Recovery
MDEV-33168 XA crash-recovery based on the engines-prepare-first rule

This commit addresses crash recovery of the XA PREPARE, COMMIT and
ROLLBACK commands, both in normal mode and in semisync slave mode.

Key changes include:
- xid_recovery_member is extended to keep track of changes to an XID
  transaction's "state". The state is associated with its XID, which,
  when reused, can exist in the binlog in multiple instances.
- xarecover_handlerton is extended to register user prepared XIDs
  similarly to normal transactions.
- xarecover_do_commit_or_rollback now calls a new function,
  xarecover_decide_xa(), to decide on a user XID at the end of
  recovery.
- TC_LOG_BINLOG::recover and a few members of the Recovery_context class
  are extended to scan the binlog for user XIDs, keep track of them, and
  mark each of them as either truncatable or, conversely, durable.
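The end-of-recovery decision can be pictured with a deliberately simplified model. This is not MariaDB's xarecover_decide_xa(); BinlogXaState, Decision, scan_binlog() and decide_xa() are hypothetical names. The sketch assumes the engines-prepare-first rule (the XA PREPARE event is binlogged only after the engines have prepared), ignores the semisync-slave truncation case, and lets the last binlog occurrence of a possibly reused XID decide the fate of an engine-prepared XID.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model; not the server's Recovery_context or xid_recovery_member.
enum class BinlogXaState { Prepared, Committed, RolledBack };
enum class Decision { KeepPrepared, Commit, Rollback };

using BinlogEvent = std::pair<std::string, BinlogXaState>;  // (xid, event kind)

// Scan binlog events in order. A reused XID can appear several times;
// the latest occurrence overwrites earlier ones.
std::map<std::string, BinlogXaState> scan_binlog(const std::vector<BinlogEvent>& events) {
  std::map<std::string, BinlogXaState> last;
  for (const BinlogEvent& ev : events) last[ev.first] = ev.second;
  return last;
}

// Decide the fate of an XID that an engine still reports as prepared.
Decision decide_xa(const std::map<std::string, BinlogXaState>& binlog,
                   const std::string& xid) {
  auto it = binlog.find(xid);
  // Engines prepare before the XA PREPARE event is written, so a prepared
  // XID without any binlog trace was never logged: roll it back.
  if (it == binlog.end()) return Decision::Rollback;
  switch (it->second) {
    case BinlogXaState::Prepared:   return Decision::KeepPrepared;
    case BinlogXaState::Committed:  return Decision::Commit;    // XAC logged, finish it
    case BinlogXaState::RolledBack: return Decision::Rollback;  // XAR logged, finish it
  }
  return Decision::Rollback;  // not reached; silences missing-return warnings
}
```

For example, a binlog holding XA PREPARE, XA COMMIT, XA PREPARE for the same reused XID leaves that XID prepared, because only the latest instance counts.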

Todo:
1. integrate with Xid_list_log_event
2. tests with multiple engines

a4a9b08... by Andrei <email address hidden>

MDEV-31949 III. InnoDB flush_log_later for XA COMMIT, ROLLBACK

The previous commits (parts I and II) of the MDEV-31949 branch implement
processing of the XA COMMIT and XA ROLLBACK commands (abbreviated as XAC
and XAR here and elsewhere) in the engine from inside binlog group commit,
similarly to how it is done for a normal transaction's commit. With
binlogging enabled, the execution path is now the same for normal and XA
transactions:

       MYSQL_BIN_LOG::run_commit_ordered():
       ...
       /* for each hton do */
         hton->{commit_ordered,commit_by_xid,rollback_by_xid}
       ...
       return rc;
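As a rough illustration of the loop above, the sketch below models the per-engine step. Hton, TrxKind and run_commit_ordered_sketch() are hypothetical stand-ins, not the server's handlerton struct; only the three member names mirror the pseudocode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an engine handlerton; records each call made.
struct Hton {
  std::string name;
  std::vector<std::string>* calls;
  void commit_ordered() { calls->push_back(name + ":commit_ordered"); }
  void commit_by_xid(const std::string& xid) { calls->push_back(name + ":commit_by_xid:" + xid); }
  void rollback_by_xid(const std::string& xid) { calls->push_back(name + ":rollback_by_xid:" + xid); }
};

enum class TrxKind { Normal, XaCommit, XaRollback };

// One ordered loop for every transaction kind; only the per-engine call differs.
void run_commit_ordered_sketch(std::vector<Hton>& htons, TrxKind kind,
                               const std::string& xid) {
  for (Hton& h : htons) {
    switch (kind) {
      case TrxKind::Normal:     h.commit_ordered(); break;
      case TrxKind::XaCommit:   h.commit_by_xid(xid); break;
      case TrxKind::XaRollback: h.rollback_by_xid(xid); break;
    }
  }
}
```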

This commit implements the flush-log-later policy for the durable write
at commit of a user XA transaction. Unlike the normal transaction case,
XA ROLLBACK also needs similar handling.
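The benefit of flushing later can be seen in a toy redo-log model, assuming that a single flush makes every earlier log record durable. RedoLog, Trx and group_commit_flush_log_later() are hypothetical; InnoDB's actual flush_log_later handling is more involved. Commit and rollback records are treated alike here, in line with the note that XA ROLLBACK needs similar handling.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical redo-log model: write() only buffers; flush_up_to() makes
// everything written so far durable with one simulated fsync.
class RedoLog {
 public:
  std::uint64_t write(std::uint64_t bytes) {
    written_lsn_ += bytes;
    return written_lsn_;  // LSN at the end of the new record
  }
  void flush_up_to(std::uint64_t lsn) {
    if (lsn <= flushed_lsn_) return;  // already durable: no fsync
    flushed_lsn_ = written_lsn_;      // one fsync covers the whole buffer
    ++fsyncs_;
  }
  unsigned fsyncs() const { return fsyncs_; }

 private:
  std::uint64_t written_lsn_ = 0;
  std::uint64_t flushed_lsn_ = 0;
  unsigned fsyncs_ = 0;
};

struct Trx { std::uint64_t lsn = 0; };  // a commit or a rollback, alike

// Ordered phase: log records are written but not flushed (flush-log-later).
// Later phase: each trx requests durability; most requests are already covered.
void group_commit_flush_log_later(RedoLog& log, std::vector<Trx>& group) {
  for (Trx& t : group) t.lsn = log.write(100);
  for (const Trx& t : group) log.flush_up_to(t.lsn);
}
```

With three transactions in one group the sketch performs a single fsync, whereas writing and immediately flushing each record performs three.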

833e7e1... by Andrei <email address hidden>

MDEV-31949 II.2 Slave: XAP -> XAC via wait-for-prior-commit optimization.

This commit optimizes XAP -xid-> XAC dependency tracking by no longer
tagging XAC events with SPECULATE_WAIT, thereby letting an XAC run in
parallel with its parent and/or any duplicate XAP:s, to the extent that
it may not yet find its xid inserted into the system cache by the parent.
In such a case, when the slave binlog is ON, the XAC still proceeds to
the wait-for-prior-transactions-to-commit stage, and once it gets past
that stage the xid must be available, having been released by the
parent XAP.

To facilitate handling of the XAP -xid-> XAC dependency, a dummy prepared
xid object is initially associated with the parallel slave XAC
transaction.
This allows the XAC to pass the is-xa-prepared checks in the
initial binlog phase. The actual xid is acquired after the
wait-for-prior-transactions-to-commit stage, that is, at a time when
all preceding transactions have completed. In particular, the parent
XAP must have released the xid by then.
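A minimal sketch of the ordering argument, assuming the parent XAP makes its xid visible in the cache when it commits in binlog order. CommitOrder, XidCache, XacTrx, run_xap() and run_xac() are hypothetical; wait_for_prior_commit() here is a small condition-variable model of the stage named above, not the server's implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical model of in-order commit on a parallel slave.
class CommitOrder {
 public:
  // Block until every transaction with a smaller sequence number has committed.
  void wait_for_prior_commit(unsigned seq) {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return last_committed_ + 1 == seq; });
  }
  void mark_committed(unsigned seq) {
    { std::lock_guard<std::mutex> lk(m_); last_committed_ = seq; }
    cv_.notify_all();
  }
 private:
  std::mutex m_;
  std::condition_variable cv_;
  unsigned last_committed_ = 0;
};

// Hypothetical stand-in for the server's cache of released prepared xids.
struct XidCache {
  std::mutex m;
  std::set<std::string> prepared;
  void insert(const std::string& x) { std::lock_guard<std::mutex> lk(m); prepared.insert(x); }
  bool contains(const std::string& x) { std::lock_guard<std::mutex> lk(m); return prepared.count(x) != 0; }
};

struct XacTrx {
  std::string xid;
  bool uses_dummy_xid = true;  // placeholder passes the early is-prepared check
};

void run_xap(CommitOrder& order, XidCache& cache, const std::string& xid, unsigned seq) {
  order.wait_for_prior_commit(seq);
  cache.insert(xid);  // the parent releases its xid at commit time
  order.mark_committed(seq);
}

// Returns true if the XAC found its real xid once all prior trxs committed.
bool run_xac(CommitOrder& order, XidCache& cache, XacTrx& trx, unsigned seq) {
  assert(trx.uses_dummy_xid);  // early phase runs on the dummy xid
  order.wait_for_prior_commit(seq);
  bool found = cache.contains(trx.xid);  // the parent XAP has committed by now
  trx.uses_dummy_xid = false;            // switch to the actual xid
  order.mark_committed(seq);
  return found;
}
```

Even when the XAC thread starts first, it cannot look up the xid before the XAP with the lower sequence number has committed.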

As a measure against MDEV-32455, binlog_{commit,rollback}_flush_trx_cache(),
which can be called in buggy cases, no longer relies on THD::lex::xid for
anything other than slave XAC:s.

6cbb63f... by Andrei <email address hidden>

MDEV-31949 II.1 slow parallel replication of user xa

This is the first commit of part II of the three-part branch.
Part I refactors the binary logging of XA to integrate
it properly with the Binlog-Group-Commit module.
Part III is concerned with an optimal durable-write
policy for user XA PREPARE, COMMIT and ROLLBACK.

The MDEV-742 XA-Prepare group of events

  XA START xid
  ...
  XA END xid
  XA PREPARE xid

and its XA-"complete" terminator

  XA COMMIT or
  XA ROLLBACK

which together constitute the binlog image of a user XA transaction,
are now distributed round-robin across slave parallel workers.
The original hash-based policy was reasonably good at identifying
the XA-PREPARE worker from the respective XA-"COMPLETE".
Yet the hash method proved to contribute to execution latency
by creating a large queue of binlog-ordered transactions waiting to
commit, many times larger than the size of the worker pool.
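The two distribution policies can be contrasted in a few lines. Distributor and its methods are hypothetical, and std::hash stands in for whatever hash the original policy used. The hash policy keeps an XA PREPARE and its XA COMMIT/ROLLBACK on one worker; round-robin spreads load evenly but no longer guarantees that, which is why XID dependency tracking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical scheduler illustrating the two distribution policies.
class Distributor {
 public:
  explicit Distributor(std::size_t workers) : workers_(workers) {}
  // Old policy (sketch): the XID hash picks the worker, so all event groups
  // with the same XID land on the same worker.
  std::size_t by_hash(const std::string& xid) const {
    return std::hash<std::string>{}(xid) % workers_;
  }
  // New policy (sketch): each event group takes the next worker in turn.
  std::size_t round_robin() { return next_++ % workers_; }

 private:
  std::size_t workers_;
  std::size_t next_ = 0;
};
```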

Acronyms used in the commit message and patch comments:

  XAP := XA-Prepare event or the whole prepared XA group of events
  XAC := XA-"complete", which is a solitary group of events,
         sometimes specifically XA-Commit.

NOTABLE CHANGES in this commit are made mostly in
the parallel slave module.

The parallel slave driver thread now maintains a list of XAP:s currently
being scheduled. Its purpose is to avoid parallel execution of XA:s
with duplicate XIDs (unlikely, but that's the user's right).
The list is arranged as a sliding window the size of the slave
worker pool, which is sufficient to resolve any XAP_i -> XAP_i+k
dependency. E.g. with 4 worker threads, XAP_5(xid) can be safely
distributed when its XID did not occur in the XAP:s indexed 2, 3 and 4;
it may have occurred in XAP_1, but that would be safe because both
would belong to the same worker.
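The sliding-window check can be sketched as follows, assuming round-robin assignment so that XAP_i and XAP_i+n land on the same worker. XapWindow and schedule() are hypothetical names; the deque holds the previous n-1 XIDs, which together with the incoming XAP make a window the size of the worker pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of the driver thread's window of recently scheduled XAP:s.
class XapWindow {
 public:
  explicit XapWindow(std::size_t workers) : workers_(workers) { assert(workers >= 1); }

  // Returns true if `xid` duplicates one of the previous workers-1 XAP:s,
  // i.e. the new XAP must not run in parallel with that one.
  bool schedule(const std::string& xid) {
    bool conflict = std::find(window_.begin(), window_.end(), xid) != window_.end();
    window_.push_back(xid);
    // Older entries shared a worker with the incoming XAP, so they are safe to drop.
    if (window_.size() > workers_ - 1) window_.pop_front();
    return conflict;
  }

 private:
  std::size_t workers_;
  std::deque<std::string> window_;
};
```

With 4 workers, the sequence a, b, c, d, a reports no conflict for the second a (it shares a worker with XAP_1), while a following c does conflict.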

This commit also tracks the XAP_i -> XAC_i+k dependency between
the prepared and the commit parts of the same transaction, and does
so rather conservatively. This is optimized away in the following commit.
The XAP -xid-> XA[PC] dependency tracker works as follows.
When a new XAP (or, in this commit, XAC) enters the replication applier,
its XID is compared against the ones in the window. If it is a duplicate,
e.g. when an XAC(xid) comes in after XAP(xid), then the XAC
will be 'waiting for prior transaction to commit'.
In this example, one of the prior transactions is necessarily
the XAC's "parent", the XA-PREPARE part of the whole XA transaction.

OTHER CHANGES,
to
  ha_close_connection()
are made to keep the binlog hton connection alive until the end of
XA-prepare's binlogging on the parallel slave.
The hton:s other than binlog are disconnected from the slave
transaction handler in TC_LOG::run_commit_ordered(), which makes
the prepared transaction's engine objects available to
binlog-commit-order followers.

UNRELATED
to
  rpl_xa_prepare_gtid_fail.test
added an expected-warning suppression that MDEV-31038 missed.

Thanks to Brandon Nesterenko at mariadb.com for initial review and
a lot of creative efforts to advance with this work!
Enormous kudos to Kristian Nielsen for thorough insightful reviews!
Roel Van de Paar helped to improve it through testing.

1e81859... by Andrei <email address hidden>

MDEV-31949 The slave side changes to tests

This commit carries specific replication and slave side tests.

c8c1bf6... by Andrei <email address hidden>

MDEV-32857 Assertion in new branches of MYSQL_BIN_LOG::run_commit_ordered

The MDEV-32455 assert is refined so that it lets the actual bug load
through without firing.

b4ca0b5... by Andrei <email address hidden>

MDEV-32852 assert in xa-rollback branch of MDEV-32830 run_commit_ordered

The assert is replaced.
An XA transaction in either the XA_ACTIVE or XA_IDLE state can
be ended by a connection close, which, combined with the transaction
not being safe to roll back, can also lead into the assert branch.

A proper assert is placed to check for a forced XA ending
by connection close.

3cca582... by Andrei <email address hidden>

MDEV-32830 I. refactor XA binlogging for better integration with BGC/replication/recovery

This commit is the part I in the series of three that addresses MDEV-31949.

This part improves upon the MDEV-742 design of XA binlogging.
When the binlog is ON, XA-prepare is executed first in the engines, so
that the xa-prepare replication event is written last.
MYSQL_BIN_LOG::run_commit_ordered() is made to execute XA commit and
rollback in the engines, and it also completes the XA transaction state,
so that its XID becomes available for reuse, e.g. by another slave
parallel worker that could be waiting for its turn to commit a
transaction being prepared with that XID.

Notable changes:

sql/handler.cc
  - ha_prepare() is now ensured to execute binlog_hton::prepare() as
    the last branch to prepare; recovery concerns are addressed
    separately (not by MDEV-21469, but rather along the plan of bug#76233)
  - read-only cases are handled more uniformly by extending
    has_binlog_hton(), which registers the binlog branch at ha_prepare()
    when necessary
  - ha_rollback_trans() executes binlog_hton::rollback() as the first
    branch, as part of the recovery planned for bug#76233;
  - ditto for the external completion of XA via ha_commit_or_rollback_by_xid().
sql/log.cc
  - binlog_commit,rollback() are simplified/clarified regarding the condition
    of an empty statement/trx
  - MYSQL_BIN_LOG::run_commit_ordered() is extended to cover
    XA-{commit, rollback} respective actions in Engines
  - XID unlogging by XA is converted to employ the standard
    binlog-checkpoint mechanism (cache_mngr->using_xa= TRUE)
sql/xa.cc
  - trans_xa_commit,rollback() are refactored to invoke a common function
    for external (connection) completion
  - the above two and slave_applier_reset_xa_trans() are adapted to a
    possible (normal on the slave) XID release inside run_commit_ordered().
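The branch ordering described above for sql/handler.cc can be modelled as below. Branch, prepare_order() and rollback_order() are hypothetical; the model only captures the order in which branches are visited, not what each branch does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical transaction branch: an engine or the binlog pseudo-engine.
struct Branch {
  std::string name;
  bool is_binlog;
};

// Prepare: all engine branches first, the binlog branch last.
std::vector<std::string> prepare_order(const std::vector<Branch>& branches) {
  std::vector<std::string> order;
  for (const Branch& b : branches) if (!b.is_binlog) order.push_back(b.name);
  for (const Branch& b : branches) if (b.is_binlog) order.push_back(b.name);
  return order;
}

// Rollback: the binlog branch first, then the engine branches.
std::vector<std::string> rollback_order(const std::vector<Branch>& branches) {
  std::vector<std::string> order;
  for (const Branch& b : branches) if (b.is_binlog) order.push_back(b.name);
  for (const Branch& b : branches) if (!b.is_binlog) order.push_back(b.name);
  return order;
}
```

Under this ordering, a binlogged XA PREPARE event implies that every engine branch has already prepared.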