maria:knielsen_faster_stop_slave2

Last commit made on 2023-07-12
Get this branch:
git clone -b knielsen_faster_stop_slave2 https://git.launchpad.net/maria

Branch merges

Branch information

Name:
knielsen_faster_stop_slave2
Repository:
lp:maria

Recent commits

b555b51... by Kristian Nielsen

MDEV-31671: implement fast STOP SLAVE with rollback ongoing transactions

This patch implements a STOP SLAVE FORCE option that tries to stop the slave
faster by rolling back active (transactional) event groups:

 - Abort any replicating row event on the next row operation.
 - In parallel replication, abort event group on the next event (as well as
   all following event groups).

Makes rpl_parallel_entry::abort_slave an std::atomic<>, as it is read
without mutex protection in some cases. And renames it as the more
appropriate name `slave_stopping`, to avoid merges silently leaving direct
accesses to the variable (these would default to std:: memory_order_seq_cst,
which is usually not desirable).

Based on original work by Brandon Nesterenko.

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

3f5cee8... by Kristian Nielsen

Fix one case that should not be marked transactional in the GTID event

The case is statement format and mixed InnoDB/MyISAM without
binlog_direct_non_trans_update. Fix due to Brandon Nesterenko.

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

08585b0... by Kristian Nielsen

MDEV-31509: Lost data with FTWRL and STOP SLAVE

The largest_started_sub_id needs to be set under LOCK_parallel_entry
together with testing stop_sub_id. However, in-between was the logic for
do_ftwrl_wait(), which temporarily releases the mutex. This could lead to
inconsistent stopping amongst worker threads and lost data.

Fix by moving all the stop-related logic out from unrelated do_gco_wait()
and do_ftwrl_wait() and into its own function do_stop_handling().

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

d4309d4... by Kristian Nielsen

MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica

Various test cases for the bugs around MDEV-31448.
Test cases due to Brandon Nesterenko, thanks!

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

5d61442... by Kristian Nielsen

MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica

The problem is that when a worker thread is (user) killed in
wait_for_prior_commit, the event group may complete out-of-order since the
wait for prior commit was aborted by the kill.

This fix ensures that event groups will always complete in-order, even
in the error case. This is done in finish_event_group() by doing an
extra wait_for_prior_commit(), if necessary, that ignores kills.

This fix supersedes the fix for MDEV-30780, so the earlier fix for
that is reverted in this patch.

Also fix that an error from wait_for_prior_commit() inside
finish_event_group() would not signal the error to
wakeup_subsequent_commits().

Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
some changes to simplify the semantics of wait_for_prior_commit() and
make the code more robust to future changes.

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

a8ea662... by Kristian Nielsen

MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica

The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO started, and the subsequent
re-marking could access de-allocated GCO.

The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

60bec1d... by Kristian Nielsen

MDEV-13915: STOP SLAVE takes very long time on a busy system

At STOP SLAVE, worker threads will continue applying event groups until the
end of the current GCO before stopping. This is a left-over from when only
conservative mode was available. In optimistic and aggressive mode, often
_all_ queued event will be in the same GCO, and slave stop will be
needlessly delayed.

This patch instead records at STOP SLAVE time the latest (highest sub_id)
event group that has started. Then worker threads will continue to apply
event groups up to that event group, but skip any following. The result is
that each worker thread will complete its currently running event group, and
then the slave will stop.

If the slave is caught up, and STOP SLAVE is run in the middle of an event
group that is already executing in a worker thread, then that event group
will be rolled back and the slave stop immediately, as normal.

Reviewed-by: Andrei Elkin <email address hidden>
Signed-off-by: Kristian Nielsen <email address hidden>

b4646c6... by Kristian Nielsen

Misc. small cleanups unrelated to any particular MDEV

Signed-off-by: Kristian Nielsen <email address hidden>

23d5391... by Daniel Black

MDEV-27038 Custom configuration file procedure does not work with Docker Desktop for Windows 10+

Docker when mounting a configuration file into a Windows exposes the
file with permission 0777. These world writable files are ignored by
by MariaDB.

Add the access check such that filesystem RO or immutable file is
counted as sufficient protection on the file.

Test:
$ mkdir /tmp/src
$ vi /tmp/src/my.cnf
$ chmod 666 /tmp/src/my.cnf
$ mkdir /tmp/dst
$ sudo mount --bind /tmp/src /tmp/dst -o ro
$ ls -la /tmp/dst
total 4
drwxr-xr-x. 2 dan dan 60 Jun 15 15:12 .
drwxrwxrwt. 25 root root 660 Jun 15 15:13 ..
-rw-rw-rw-. 1 dan dan 10 Jun 15 15:12 my.cnf
$ mount | grep dst
tmpfs on /tmp/dst type tmpfs (ro,seclabel,nr_inodes=1048576,inode64)

strace client/mariadb --defaults-file=/tmp/dst/my.cnf

newfstatat(AT_FDCWD, "/tmp/dst/my.cnf", {st_mode=S_IFREG|0666, st_size=10, ...}, 0) = 0
access("/tmp/dst/my.cnf", W_OK) = -1 EROFS (Read-only file system)
openat(AT_FDCWD, "/tmp/dst/my.cnf", O_RDONLY|O_CLOEXEC) = 3

The one failing test, but this isn't a regression, just not a total fix:

$ chmod u-w /tmp/src/my.cnf
$ ls -la /tmp/src/my.cnf
-r--rw-rw-. 1 dan dan 18 Jun 16 10:22 /tmp/src/my.cnf
$ strace -fe trace=access client/mariadb --defaults-file=/tmp/dst/my.cnf
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
access("/etc/system-fips", F_OK) = -1 ENOENT (No such file or directory)
access("/tmp/dst/my.cnf", W_OK) = -1 EACCES (Permission denied)
Warning: World-writable config file '/tmp/dst/my.cnf' is ignored

Windows test (Docker Desktop ~4.21) which was the important one to fix:

dan@LAPTOP-5B5P7RCK:~$ docker run --rm -v /mnt/c/Users/danie/Desktop/conf:/etc/mysql/conf.d/:ro -e MARIADB_ROOT_PASSWORD=bob quay.io/m
ariadb-foundation/mariadb-devel:10.4-MDEV-27038-ro-mounts-pkgtest ls -la /etc/mysql/conf.d
total 4
drwxrwxrwx 1 root root 512 Jun 15 13:57 .
drwxr-xr-x 4 root root 4096 Jun 15 07:32 ..
-rwxrwxrwx 1 root root 43 Jun 15 13:56 myapp.cnf

root@a59b38b45af1:/# strace -fe trace=access mariadb
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
access("/etc/mysql/conf.d/myapp.cnf", W_OK) = -1 EROFS (Read-only file system)

7a5c984... by Monty <email address hidden>

MDEV-20010 Equal on two RANK window functions create wrong result

The problematic query outlined a bug in window functions sorting
optimization. When multiple window functions are present in a query,
we sort the sorting key (as defined by PARTITION BY and ORDER BY) from
generic to specific.

SELECT RANK() OVER (ORDER BY const_col) as r1,
       RANK() OVER (ORDER BY const_col, a) as r2,
       RANK() OVER (PARTITION BY c) as r3,
       RANK() OVER (PARTITION BY c ORDER BY b) as r4
FROM table;

For these functions, the sorting we need to do for window function
computations are: [(const_col), (const_col, a)] and [(c), (c, b)].

Instead of doing 4 different sort order, the sorts grouped within [] are
compatible and we can use the most *specific* sort to cover both window
functions.

The bug was caused by an incorrect flagging of which sort is most
specific for a compatible group of functions. In our specific test case,
instead of picking (const_col, a) as the most specific sort, it would
only sort by (const_col), which lead to wrong results for rank function.
By ensuring that we pick the last sort key before an "incompatible sort"
flag is met in our "ordered array of sorting specifications", we
guarantee correct results.