maria:bb-10.6-MDEV-28800

Last commit made on 2022-08-26
Get this branch:
git clone -b bb-10.6-MDEV-28800 https://git.launchpad.net/maria

Branch information

Name:
bb-10.6-MDEV-28800
Repository:
lp:maria

Recent commits

8ab50a3... by Marko Mäkelä

MDEV-28800 WIP: Avoid crashes on memory allocation failure

FIXME: Allocate locks upfront for page split or reorganize,
so that the operation can gracefully fail before any irreversible
persistent changes are performed. This affects lock_move_reorganize_page(),
lock_move_rec_list_end(), lock_move_rec_list_start(),
btr_root_raise_and_insert(), btr_insert_into_right_sibling(),
btr_page_split_and_insert().

buf_block_alloc(): Remove. This was an alias of
buf_LRU_get_free_block(false). Let us call that function directly.

buf_LRU_get_free_block(), buf_buddy_alloc_low(), buf_buddy_alloc():
If there is no free block in the buffer pool, return nullptr.

recv_sys_t::recover_low(), recv_sys_t::recover(): Return an error code,
which may be DB_OUT_OF_MEMORY.

lock_rec_create_low(): Return nullptr if the lock table is full.
This will be the only caller of buf_pool.running_out().

btr_search_check_free_space_in_heap(): Replaced with
btr_search_lock_and_alloc().
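
The FIXME above asks that every lock be allocated before a page split or reorganize makes any persistent change. A minimal sketch of that ordering follows. LockPool, Status and split_page() are hypothetical stand-ins invented for this illustration and are not InnoDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bounded lock table; not InnoDB code.
struct LockPool {
  std::size_t free_slots;

  // Report exhaustion to the caller instead of aborting the process.
  bool reserve(std::size_t n) {
    if (n > free_slots) return false;
    free_slots -= n;
    return true;
  }
};

enum class Status { Success, OutOfMemory };

// Hypothetical page split: move the upper half of `page` to `new_page`.
Status split_page(LockPool &pool, std::vector<int> &page,
                  std::vector<int> &new_page, std::size_t locks_needed) {
  assert(new_page.empty()); // precondition: the new page starts out empty

  // Step 1: reserve every lock while the page is still untouched, so
  // that a failure here leaves nothing to undo.
  if (!pool.reserve(locks_needed)) return Status::OutOfMemory;

  // Step 2: modify the pages; no lock reservation happens past this point.
  const auto half = static_cast<std::ptrdiff_t>(page.size() / 2);
  new_page.assign(page.begin() + half, page.end());
  page.resize(static_cast<std::size_t>(half));
  return Status::Success;
}
```

When the reservation fails, the caller sees Status::OutOfMemory and both vectors are unchanged, which is the property the FIXME is after.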

0fbcb0a... by Marko Mäkelä

MDEV-29383 Assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed in mtr_t::commit()

In commit 0b47c126e31cddda1e94588799599e138400bcf8 (MDEV-13542)
a few calls to mtr_t::memo_push() were moved before a write latch
on the page was acquired. This introduced a race condition:

1. is_block_dirtied() returned false to mtr_t::memo_push()
2. buf_page_t::write_complete() was executed, the block marked clean,
and a page latch released
3. The page latch was acquired by the caller of mtr_t::memo_push(),
and mtr_t::m_made_dirty was not set even though the block is in
a clean state.

The impact of this race condition is that crash recovery and backups
may fail.

btr_cur_latch_leaves(), btr_store_big_rec_extern_fields(),
btr_free_externally_stored_field(), trx_purge_free_segment():
Acquire the page latch before invoking mtr_t::memo_push().
This fixes the regression caused by MDEV-13542.
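
The ordering problem can be replayed in a single thread. The sketch below is only a model: Block, Mtr and the two helper functions are simplified assumptions, and the `interleave` callback stands for whatever another thread does in the window before the latch is acquired.

```cpp
#include <functional>
#include <mutex>

// Simplified stand-ins; the real buf_block_t and mtr_t are far richer.
struct Block {
  std::mutex latch;   // models the page latch
  bool dirty = false; // true while the block has unflushed changes
};

struct Mtr {
  bool made_dirty = false;

  // Models the check made when a block is registered: if the block is
  // clean now, this mini-transaction will be the one that dirties it.
  void memo_push(const Block &b) {
    if (!b.dirty) made_dirty = true;
  }
};

// Models page write completion: mark the block clean under the latch.
void write_complete(Block &b) {
  std::lock_guard<std::mutex> g(b.latch);
  b.dirty = false;
}

// Racy order: sample the dirty state first, take the latch afterwards.
void push_then_latch(Mtr &m, Block &b,
                     const std::function<void()> &interleave) {
  m.memo_push(b);
  interleave(); // the block may be cleaned here, after it was sampled
  std::lock_guard<std::mutex> g(b.latch);
}

// Fixed order: the dirty state is sampled while the latch is held, so
// write_complete() cannot slip in between the check and the latch.
void latch_then_push(Mtr &m, Block &b,
                     const std::function<void()> &interleave) {
  interleave(); // anything that happens before the latch is still seen
  std::lock_guard<std::mutex> g(b.latch);
  m.memo_push(b);
}
```

Passing a callback that calls write_complete() reproduces steps 1 to 3: with push_then_latch() the block ends up clean while made_dirty stays false, whereas latch_then_push() sets made_dirty.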

Side note: It would suffice to set mtr_t::m_made_dirty at the time
we set the MTR_MEMO_MODIFY flag for a block. Currently that flag is
unnecessarily set if a mini-transaction acquires a page latch on
a page that is in a clean state, and will not actually modify the block.
This may cause unnecessary acquisitions of log_sys.flush_order_mutex
on mtr_t::commit().

mtr_t::free(): If the block had been exclusively latched in this
mini-transaction, set the m_made_dirty flag so that the flush order mutex
will be acquired during mtr_t::commit(). This should have been part of
commit 4179f93d28035ea2798cb1c16feeaaef87ab4775 (MDEV-18976).
It was necessary to change mtr_t::free() so that
WriteOPT_PAGE_CHECKSUM::operator() would be able to avoid writing
checksums for freed pages.

76bb671... by Marko Mäkelä

Merge 10.5 into 10.6

9929301... by Marko Mäkelä

Merge 10.4 into 10.5

851058a... by Marko Mäkelä

Merge 10.3 into 10.4

d1a80c4... by Marko Mäkelä

MDEV-29384 Hangs caused by innodb_adaptive_hash_index=ON

buf_defer_drop_ahi(): Remove. Ever since
commit c7f8cfc9e733517cff4aaa6f6eaca625a3afc098 (MDEV-27700)
it is safe to invoke btr_search_drop_page_hash_index(block, true)
to remove an orphan adaptive hash index.

Any attempt to upgrade page latches is prone to deadlocks. Recently,
we observed a few hangs that involved nothing more than a small table
consisting of one clustered index page, one secondary index page and
some undo pages.

2f6a728... by Sergei Golubchik

update a global_suppressions() list

followup for "remove invalid options from warning messages"

8ff1096... by Vlad Lesin

MDEV-29081 trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()

The issue is that trx_t::lock.was_chosen_as_deadlock_victim can be reset
before the transaction checks it and sets trx_t::error_state.

The fix is to reset trx_t::lock.was_chosen_as_deadlock_victim only in
trx_t::commit_in_memory(), which is invoked on full rollback. There is
also no need for a separate bit in
trx_t::lock.was_chosen_as_deadlock_victim to flag that the transaction was
chosen as a victim of Galera conflict resolution; the same variable can be
used for both cases, except in debug builds, where we need to
distinguish deadlock victims from Galera abort victims for debug checks. Also,
there is no need to check for deadlock in lock_table_enqueue_waiting() for
Galera, because the corresponding check is present in lock_wait().

The local variable "error_state" in lock_wait() was replaced with
trx->error_state. Before this replacement,
lock_sys_t::cancel<false>(trx, lock) and lock_sys.deadlock_check() could
change trx->error_state, which could then be overwritten with the value
of the local "error_state" variable.
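
The overwrite hazard is easy to show in isolation. The following is a sketch under stated assumptions: Err, Trx, deadlock_check() and both wait functions are hypothetical and only mimic the shape of the problem, not the real lock_wait() logic.

```cpp
// Hypothetical, heavily simplified stand-ins for the real types.
enum class Err { Success, LockWait, Deadlock };

struct Trx {
  Err error_state = Err::Success;
};

// Assumed callee that picks `trx` as the victim and records that
// directly in the transaction object.
void deadlock_check(Trx &trx) { trx.error_state = Err::Deadlock; }

// Old shape: a local copy is written back at the end and silently
// overwrites whatever the callee stored in trx.error_state.
Err wait_with_local_copy(Trx &trx) {
  Err error_state = Err::LockWait;
  deadlock_check(trx);           // callee reports Deadlock via trx
  error_state = Err::Success;    // the wait itself finished normally
  trx.error_state = error_state; // loses the Deadlock result
  return trx.error_state;
}

// New shape: there is a single variable, so the callee's result stays.
Err wait_with_shared_state(Trx &trx) {
  trx.error_state = Err::LockWait;
  deadlock_check(trx);
  if (trx.error_state == Err::LockWait)
    trx.error_state = Err::Success; // only when nobody reported an error
  return trx.error_state;
}
```

With the local copy the caller sees Err::Success even though the transaction was chosen as a victim; with the shared state it sees Err::Deadlock.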

The lock_wait_suspend_thread_enter DEBUG_SYNC point name was misleading,
because lock_wait_suspend_thread was eliminated in e71e613. The sync point
was renamed to lock_wait_start.

Reviewed by: Marko Mäkelä, Jan Lindström.

f2a53b6... by Marko Mäkelä

btr_search_drop_page_hash_index(): Remove a racy debug check

a3fd9e6... by Vladislav Vaintroub

MDEV-29367 Refactor tpool::cache

Removed the use of std::vector's push_back() and pop_back() to make it more
obvious that memory in the vectors won't be reallocated.

Also, "borrowed" elements can be debugged a little better now:
they are put at the start of the m_cache vector.
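
A minimal sketch of the idea, assuming a single-threaded cache that returns nullptr when empty: the vectors are sized once in the constructor and a position index replaces push_back() and pop_back(). FixedCache and its members are illustrative names, not the actual tpool::cache interface, and the sketch does not reproduce the debugging aid exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative fixed-size object cache; not the real tpool::cache.
template <typename T> class FixedCache {
  std::vector<T> m_base;    // owns all elements; sized once, never grown
  std::vector<T *> m_cache; // slots [m_pos, size) hold the free elements
  std::size_t m_pos = 0;    // number of elements currently handed out

public:
  explicit FixedCache(std::size_t n) : m_base(n), m_cache(n) {
    for (std::size_t i = 0; i < n; i++) m_cache[i] = &m_base[i];
  }

  // Hand out the next free element, or nullptr if all are in use.
  T *get() {
    if (m_pos == m_cache.size()) return nullptr;
    return m_cache[m_pos++];
  }

  // Return an element by storing it in the slot just below m_pos.
  void put(T *p) {
    assert(m_pos > 0); // something must have been handed out
    m_cache[--m_pos] = p;
  }

  std::size_t borrowed() const { return m_pos; }
};
```

Because neither vector changes size after construction, pointers into m_base stay valid for the lifetime of the cache.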