maria:st-10.6-MDEV-10962-deadlock-deletes

Last commit made on 2023-07-06
Get this branch:
git clone -b st-10.6-MDEV-10962-deadlock-deletes https://git.launchpad.net/maria

Branch merges

Branch information

Name:
st-10.6-MDEV-10962-deadlock-deletes
Repository:
lp:maria

Recent commits

4388d79... by Vlad Lesin

MDEV-29311 Server Status Innodb_row_lock_time% is reported in seconds

Before MDEV-24671, the wait time was derived from my_interval_timer() /
1000 (nanoseconds converted to microseconds, and not microseconds to
milliseconds like I must have assumed). The lock_sys.wait_time and
lock_sys.wait_time_max are already in milliseconds; we should not divide
them by 1000.

In MDEV-24738 the millisecond counts lock_sys.wait_time and
lock_sys.wait_time_max were changed to a 32-bit type. That would
overflow in 49.7 days. Keep using a 64-bit type for those millisecond
counters.

b3b29ba... by Vlad Lesin

MDEV-10962 Deadlock with 3 concurrent DELETEs by unique key

PROBLEM:
A deadlock was possible when a transaction tried to "upgrade" an already
held Record Lock to Next Key Lock.

SOLUTION:
This patch is based on observations that:
(1) a Next Key Lock is equivalent to Record Lock combined with Gap Lock
(2) a GAP Lock never has to wait for any other lock
In case we request a Next Key Lock, we check if we already own a Record
Lock of equal or stronger mode, and if so, then we change the requested
lock type to GAP Lock, which we either already have, or can be granted
immediately, as GAP locks don't conflict with any other lock types.
(We don't consider Insert Intention Locks a Gap Lock in above statements).

The reason of why we don't upgrage Record Lock to Next Key Lock is the
following.

Imagine a transaction which does something like this:

for each row {
    request lock in LOCK_X|LOCK_REC_NOT_GAP mode
    request lock in LOCK_S mode
}

If we upgraded lock from Record Lock to Next Key lock, there would be
created only two lock_t structs for each page, one for
LOCK_X|LOCK_REC_NOT_GAP mode and one for LOCK_S mode, and then used
their bitmaps to mark all records from the same page.

The situation would look like this:

request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 1:
// -> creates new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode and sets bit for
// 1
request lock in LOCK_S mode on row 1:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 1,
// so it upgrades it to X
request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 2:
// -> creates a new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode (because we
// don't have any after we've upgraded!) and sets bit for 2
request lock in LOCK_S mode on row 2:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 2,
// so it upgrades it to X
    ...etc...etc..

Each iteration of the loop creates a new lock_t struct, and in the end we
have a lot (one for each record!) of LOCK_X locks, each with single bit
set in the bitmap. Soon we run out of space for lock_t structs.

If we create LOCK_GAP instead of lock upgrading, the above scenario works
like the following:

// -> creates new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode and sets bit for
// 1
request lock in LOCK_S mode on row 1:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 1,
// so it creates LOCK_S|LOCK_GAP only and sets bit for 1
request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 2:
// -> reuses the lock_t for LOCK_X|LOCK_REC_NOT_GAP by setting bit for 2
request lock in LOCK_S mode on row 2:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 2,
// so it reuses LOCK_S|LOCK_GAP setting bit for 2

In the end we have just two locks per page, one for each mode:
LOCK_X|LOCK_REC_NOT_GAP and LOCK_S|LOCK_GAP.
Another benefit of this solution is that it avoids not-entirely
const-correct, (and otherwise looking risky) "upgrading".

The fix was ported from
mysql/mysql-server@bfba840dfa7794b988c59c94658920dbe556075d
mysql/mysql-server@75cefdb1f73b8f8ac8e22b10dfb5073adbdfdfb0

Reviewed by: Marko Mäkelä

dc1bd18... by Marko Mäkelä

MDEV-31386 InnoDB: Failing assertion: page_type == i_s_page_type[page_type].type_value

i_s_innodb_buffer_page_get_info(): Correct a condition.
After crash recovery, there may be some buffer pool pages in FREED state,
containing garbage (invalid data page contents). Let us ignore such pages
in the INFORMATION_SCHEMA output.

The test innodb.innodb_defragment_fill_factor will be removed, because
the queries that it is invoking on information_schema.innodb_buffer_page
would start to fail. The defragmentation feature was removed in
commit 7ca89af6f8faf1f8ec6ede01a9353ac499d37711 in MariaDB Server 11.1.

Tested by: Matthias Leich

3d90143... by Marko Mäkelä

MDEV-31559 btr_search_hash_table_validate() does not check if CHECK TABLE is killed

btr_search_hash_table_validate(), btr_search_validate(): Add the
parameter THD for checking if the statement has been killed.
Any non-QUICK CHECK TABLE will validate the entire adaptive hash index
for all InnoDB tables, which may be extremely slow when running
multiple concurrent CHECK TABLE.

6d91121... by Oleg Smirnov

MDEV-30639 Upgrade to 10.8 and later does not work on Windows

During the upgrade procedure on Windows mysqld.exe is started with
the named pipe connection protocol. mysqladmin.exe then pings the
server to check if is up and running. Command line looks like:
   mysqladmin.exe --protocol=pipe --socket=mysql_upgrade_service_xxx ping
But the "socket" parameter resets the "protocol" which was previously
initialized with the "pipe" value, setting it to "socket".
As a result, connection cannot be established and the upgrade
procedure fails.
"socket" in Windows is used to pass the name of the pipe so resetting
the protocol is not valid in this case.

This commit fixes resetting of the "protocol" parameter with "socket"
parameter in the case when protocol has been previously initialized
to "pipe" value

3e89b4f... by Vlad Lesin

MDEV-31570 gap_lock_split.test hangs sporadically

The fix is in replacing the waiting for the whole purge finishing
with the the waiting for only delete-marked records purging finishing.

Reviewed by: Marko Mäkelä

687fd6b... by Vlad Lesin

MDEV-30648 btr_estimate_n_rows_in_range() accesses unfixed, unlatched page

The issue is caused by MDEV-30400 fix.

There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2, but
both share the same mtr. Each cursor contains mtr savepoint for the
previously fetched block to release it then the current block is
fetched.

Before MDEV-30400 the block was released with
mtr_t::release_block_at_savepoint(), it just unfixed a block and
released its page patch. In MDEV-30400 it was replaced with
mtr_t::rollback_to_savepoint(), which does the same as the former
mtr_t::release_block_at_savepoint(ulint begin, ulint end) but also
erases the corresponding slots from mtr memo, what invalidates any
stored mtr's memo savepoints, greater or equal to "begin".

The idea of the fix is to get rid of savepoints at all in
btr_estimate_n_rows_in_range() and
btr_estimate_n_rows_in_range_on_level(). As
mtr_t::rollback_to_savepoint() erases elements from mtr_t::m_memo, we
know what element of mtr_t::m_memo can be deleted on the certain case,
so there is no need to store savepoints.

See also the following slides for details:
https://docs.google.com/presentation/d/1RFYBo7EUhM22ab3GOYctv3j_3yC0vHtBY9auObZec8U

Reviewed by: Marko Mäkelä

694ce0d... by Marko Mäkelä

Merge 10.5 into 10.6

84dbd02... by Marko Mäkelä

MDEV-31487: Recovery or backup failure after innodb_undo_log_truncate=ON

recv_sys_t::parse(): For undo tablespace truncation mini-transactions,
remember the start_lsn instead of the end LSN. This is what we expect
after commit 461402a56432fa5cd0f6d53118ccce23081fca18 (MDEV-30479).

4930838... by Marko Mäkelä

Merge 10.5 into 10.6