maria:bb-10.6-MDEV-30456-galera

Last commit made on 2024-03-26
Get this branch:
git clone -b bb-10.6-MDEV-30456-galera https://git.launchpad.net/maria

Branch merges

Branch information

Name:
bb-10.6-MDEV-30456-galera
Repository:
lp:maria

Recent commits

fded896... by Daniele Sciascia <email address hidden>

Re-enable MTR test galera_sr.GCF-1060

Test seems to pass reliably now.

Signed-off-by: Julius Goryavsky <email address hidden>

98cf8b5... by sjaakola <email address hidden>

MDEV-30456 ALTER TABLE algorithm clause not replicated

Fixing the regression happening with galera.galera_toi_ddl_nonconflicting
The problem, surfaced by the test, was due to multi-statement used in the test.
The test issues two queries, as multi-statement:

--send ALTER TABLE t1 ADD COLUMN f3 INTEGER; INSERT INTO t1 (f1, f2) VALUES (DEFAULT, 123);

With this, the session's THD structure has in query_string member
the full multi-statement, and this was used as base query in
TOI replication rewriting.

The fix is to pass actual length of the ALTER statement part for
TOI replication.

Signed-off-by: Julius Goryavsky <email address hidden>

8da8fa9... by Julius Goryavsky <email address hidden>

MDEV-30456: post-review changes

93d7258... by sjaakola <email address hidden>

MDEV-30456 ALTER TABLE algorithm clause not replicated

If user has specified the desired alter algorithm by using session
variable alter_algorithm, and not specifying the algorithm in the
ALTER SQL statement, then the ALTER will be processed in sending node
according the algorithm chosen by the alter_algorithm variable.
But in receiving nodes, appliers cannot see this session variable,
and will use the default algorithm when executing the ALTER.

The fix in this commit, will check if user has set alter_algorithm
variable and has not specified the algorithm clause in the ALTER
statement, and in such case will rewrite the original ALTER statement
appended by the algorithm clause. This rewritten ALTER statement will
be used in replication.

The commit has also new mtr test: galera.galera_alter_table_algorithm
which verifies that ALTER query rewrite happens when needed

Signed-off-by: Julius Goryavsky <email address hidden>

c1da568... by Daniel Black

MDEV-33726 Moving from MariaDB 10.5 to 10.6 mysql_upgrade

.. is not updating some system tables

Some schema changes from MDEV-24312 master_host has 60 character limit, increase to 255 bytes
failed to happen in the upgrade for tables in the mysql schema:
* mysql.global_priv
* mysql.procs_priv
* mysql.proxies_priv
* mysql.roles_mapping

17e59ed... by Marko Mäkelä

MDEV-33454 release row locks for non-modified rows at XA PREPARE

From the correctness point of view, it should be safe to release
all locks on index records that were not modified by the transaction.
Doing so should make the locks after XA PREPARE fully compatible
with what would happen if the server were restarted: InnoDB table
IX locks and exclusive record locks would be resurrected based on
undo log records.

Concurrently running transactions that are waiting for a lock may invoke
lock_rec_convert_impl_to_expl() to create an explicit record lock object
on behalf of the lock-owning transaction so that they can attaching
their waiting lock request on the explicit record lock object. Explicit
locks would be released by trx_t::release_locks() during commit or
rollback.

Any clustered index record whose DB_TRX_ID belongs to a transaction that
is in active or XA PREPARE state will be implicitly locked by that
transaction. On XA PREPARE, we can release explicit exclusive locks on
records whose DB_TRX_ID does not match the current transaction identifier.

lock_rec_unlock_unmodified(): Release record locks that are not implicitly
held by the current transaction.

lock_release_on_prepare_try(), lock_release_on_prepare():
Invoke lock_rec_unlock_unmodified().

row_trx_id_offset(): Declare non-static.

lock_rec_unlock(): Replaces lock_rec_unlock_supremum().

Reviewed by: Vladislav Lesin

fa8a46e... by Marko Mäkelä

MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool

By design, InnoDB has always hung when permanently running out of
buffer pool, for example when several threads are waiting to allocate
a block, and all of the buffer pool is buffer-fixed by the active threads.

The hang that we are fixing here occurs when the buffer pool is only
temporarily running out and the situation could be rescued by writing out
some dirty pages or evicting some clean pages.

buf_LRU_get_free_block(): Simplify the way how we wait for
the buf_flush_page_cleaner thread. This fixes occasional hangs
of the test encryption.innochecksum that were introduced by
commit a55b951e6082a4ce9a1f2ed5ee176ea7dbbaf1f2 (MDEV-26827).
To play it safe, we use a timed wait when waiting for the
buf_flush_page_cleaner() thread to perform its job. Should that
thread get stuck, we will invoke buf_pool.LRU_warn() in order to
display a message that pages could not be freed, and keep trying
to wake up the buf_flush_page_cleaner() thread.

The INFORMATION_SCHEMA.INNODB_METRICS counters
buffer_LRU_single_flush_failure_count and
buffer_LRU_get_free_waits will be removed.
The latter is represented by buffer_pool_wait_free.

Also removed will be the message
"InnoDB: Difficult to find free blocks in the buffer pool"
because in d34479dc664d4cbd4e9dad6b0f92d7c8e55595d1 we
introduced a more precise message
"InnoDB: Could not free any blocks in the buffer pool"
in the buf_flush_page_cleaner thread.

buf_pool_t::LRU_warn(): Issue the warning message that we could
not free any blocks in the buffer pool. This may also be invoked
by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
to be stuck.

buf_pool_t::n_flush_dec(): Remove.

buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().

buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
of temporary, discarded or dropped tablespaces.

buf_flush_LRU(): Make static, and remove the constant parameter
evict=false. The only caller will be the buf_flush_page_cleaner()
thread.

IORequest::is_LRU(): Remove. The only case of evicting pages on
write completion will be when we are writing out pages of the
temporary tablespace. Those pages are not in buf_pool.flush_list,
only in buf_pool.LRU.

buf_page_t::flush(): Remove the parameter evict.

buf_page_t::write_complete(): Change the parameter "bool temporary"
to "bool persistent" and add a parameter for an already read state().

Reviewed by: Debarun Banerjee

75c7c6d... by Brandon Nesterenko

MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency

When using semi-sync replication with
rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
primary can significantly reduce compared to AFTER_SYNC's
performance for workloads with many concurrent users executing
transactions. This is because all connections on the primary share
the same cond_wait variable/mutex pair, so any time an ACK is
received from a replica, all waiting connections are awoken to check
if the ACK was for itself, which is done in mutual exclusion.

This patch changes this such that the waiting THD will use its own
local condition variable, and the ACK receiver thread only signals
connections which have been ACKed for wakeup. That is, the
THD::LOCK_wakeup_ready condition variable is re-used for this
purpose, and the Active_tranx queue nodes are extended to hold the
waiting thread, so it can be signalled once ACKed.

Additionally:

 1) Removed part of MDEV-11853 additions, which allowed suspended
connection threads awaiting their semi-sync ACKs to live until their
ACKs had been received. This part, however, wasn't needed. That is,
all that was needed was for the Ack_thread to survive. So now the
connection threads are killed during phase 1. Thereby
THD::is_awaiting_semisync_ack, and all its related code was removed.

 2) COND_binlog_send is repurposed to signal on the condition when
Active_tranx is emptied during clear_active_tranx_nodes.

 3) At master shutdown (when waiting for slaves), instead of the
main loop individually waiting for each ACK, await_slave_reply()
(renamed await_all_slave_replies()) just waits once for the
repurposed COND_binlog_send to signal it is empty.

 4) Test rpl_semi_sync_shutdown_await_ack is updates as following:
   4.1) Added test case (adapted from Kristian Nielsen) to ensure
that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
ALL SLAVES is issued, the primary will still wait for the ACK from
the killed thread.
   4.2) As connections which by-passed phase 1 of thread killing no
longer are delayed for kill until phase 2, we can no longer query
yes/no tx after receiving an ACK/timeout. The check for these
variables is removed.
   4.3) Comment descriptions are updated which mention that the
connection is alive; and adjusted to be the Ack_thread.

Reviewed By:
============
Kristian Nielsen <email address hidden>

b8a6719... by Marko Mäkelä

MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation

https://jepsen.io/analyses/mysql-8.0.34 highlights that the
transaction isolation levels in the InnoDB storage engine do not
correspond to any widely accepted definitions, such as
"Generalized Isolation Level Definitions"
https://pmg.csail.mit.edu/papers/icde00.pdf
(PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
PL-3 = SERIALIZABLE).
Only READ UNCOMMITTED in InnoDB seems to match the above definition.

The issue is that InnoDB does not detect write/write conflicts
(Section 4.4.3, Definition 6) in the above.

It appears that as soon as we implement write/write conflict detection
(SET SESSION innodb_snapshot_isolation=ON), the default isolation level
(SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
"A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

Locking reads inside InnoDB used to read the latest committed version,
ignoring what should actually be visible to the transaction.
The added test innodb.lock_isolation illustrates this. The statement
 UPDATE t SET a=3 WHERE b=2;
is executed in a transaction that was started before a read view or
a snapshot of the current transaction was created, and committed before
the current transaction attempts to execute
 UPDATE t SET b=3;
If SET innodb_snapshot_isolation=ON is in effect when the second
transaction was started, the second transaction will be aborted with
the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
the second transaction would execute inconsistently, displaying an
incorrect SELECT COUNT(*) FROM t in its read view.

If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a
record that does not exist in the current read view is made, an error
DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will
be raised. This error will be treated in the same way as a deadlock:
the transaction will be rolled back.

lock_clust_rec_read_check_and_lock(): If the current transaction has
a read view where the record is not visible and
innodb_snapshot_isolation=ON, fail before trying to acquire the lock.

row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
disable the "semi-consistent read" logic that had been implemented by
myself on the directions of Heikki Tuuri in order to address
https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer
wanting UPDATE to skip locked rows that do not match the WHERE condition.
It looks like my changes were included in the MySQL 5.1.5
commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees
of Innobase Oy (a recent acquisition of Oracle) had lost write access to
the repository.

The only reason why we set innodb_snapshot_isolation=OFF by default is
backward compatibility with applications, such as the one that motivated
the implementation of "semi-consistent read" back in 2005. In a later
major release, we can default to innodb_snapshot_isolation=ON.

Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
and some testing of this fix.

Thanks to Vladislav Lesin for the initial test for MDEV-26643,
as well as reviewing these changes.

ca07f62... by Brandon Nesterenko

MDEV-33716: rpl.rpl_semi_sync_slave_enabled_consistent Fails with Error Condition Reached

Though the test itself doesn't create any transactions
directly, the added test suppressions are replicated,
and when the SQL thread is stopped mid-execution,
it is set into an error state because these are
non-transactional events being aborted.

This patch fixes the test by ensuring that the test
suppressions are fully replicated before continuing