MDEV-21910 : Killing thread on Galera could cause mutex deadlock
The issues here are as follows:
Whenever a Galera BF (brute force) transaction decides to abort a conflicting
transaction, it kills that thread using thd::awake().
A user KILL [QUERY|CONNECTION] ... for a thread also calls thd::awake().
Whenever one of these actions is executed, we hold a number of InnoDB
internal mutexes and thd mutexes. Sometimes these mutexes are taken in
a different order, causing a mutex deadlock.
Let's consider an example, starting from a Galera BF transaction deciding to abort
a conflicting transaction (let's call this thread1):
thread1:
lock_rec_other_has_conflicting
Here we hold both the lock_sys mutex and the trx mutex of the conflicting lock's transaction.
wsrep_innobase_kill_one_trx
For the victim we call wsrep_thd_awake
Next, for thread2, we assume that a user executes KILL QUERY on some other executing
query (note that it cannot be a BF query).
thread2:
sql_kill()
kill_one_thread
find_thread_by_id
takes LOCK_thd_kill
thd->awake_no_mutex()
thread1:
thd->awake(KILL_QUERY)
Tries to acquire LOCK_thd_kill, but it is held by thread2, so we wait (note that
we still hold the lock_sys mutex and the trx mutex).
thread2:
ha_kill_query()
kill_handlerton
innobase_kill_query
lock_trx_handle_wait
lock_mutex_enter() must wait because thread1 is holding the lock_sys mutex
Thus thread1 (holding lock_sys and the trx mutex) waits for LOCK_thd_kill -> thread2 (holding LOCK_thd_kill) waits for lock_sys -> thread1
==> thread1 waits -> thread2 waits -> thread1 ==> mutex deadlock.
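The cycle described above can be reproduced in miniature. The following is only a sketch under stated assumptions: the two std::timed_mutex objects are stand-ins named after lock_sys and LOCK_thd_kill (the trx mutex is left out), and demonstrates_lock_cycle() is a hypothetical helper, not server code. Timed lock attempts are used so that the program reports the cycle instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Stand-ins for the real mutexes; the names only mirror the description.
struct Mutexes {
  std::timed_mutex lock_sys;       // InnoDB lock_sys mutex (trx mutex omitted)
  std::timed_mutex LOCK_thd_kill;  // THD kill mutex
};

// Returns true when both threads fail to get their second mutex, that is,
// when each one waits for a mutex that the other one holds.
inline bool demonstrates_lock_cycle() {
  Mutexes m;
  std::atomic<int> holding(0);    // threads that hold their first mutex
  std::atomic<int> attempted(0);  // threads that tried their second mutex
  bool t1_got = true;
  bool t2_got = true;

  auto worker = [&](std::timed_mutex &first, std::timed_mutex &second,
                    bool &got) {
    std::lock_guard<std::timed_mutex> hold(first);
    holding.fetch_add(1);
    while (holding.load() < 2) std::this_thread::yield();
    // With plain lock() this call would block forever.
    got = second.try_lock_for(std::chrono::milliseconds(50));
    if (got) second.unlock();
    attempted.fetch_add(1);
    // Keep holding the first mutex until both threads have tried.
    while (attempted.load() < 2) std::this_thread::yield();
  };

  // thread1: lock_sys first, then LOCK_thd_kill (BF abort path).
  std::thread t1(worker, std::ref(m.lock_sys), std::ref(m.LOCK_thd_kill),
                 std::ref(t1_got));
  // thread2: LOCK_thd_kill first, then lock_sys (user KILL path).
  std::thread t2(worker, std::ref(m.LOCK_thd_kill), std::ref(m.lock_sys),
                 std::ref(t2_got));
  t1.join();
  t2.join();
  return !t1_got && !t2_got;
}
```

Neither thread can make progress while the other keeps its first mutex, which is the situation the example call stacks describe.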
In this patch we fix the Galera BF and user kill cases so that we enqueue
the victim thread to a list while we hold InnoDB mutexes, and then release them.
A new background thread picks the victim thread from this new list and calls
thd::awake() with no InnoDB mutexes held. The idea is similar to the replication
background kill. This fix enforces that we always take LOCK_thd_data -> lock sys
mutex -> trx mutex in this order.
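The enqueue-then-kill-in-background idea can be sketched as follows. This is a minimal illustration under assumptions, not the code of this patch: BackgroundKiller, enqueue(), stop(), killed() and awake_victim() are hypothetical names, victims are represented by plain thread ids, and awake_victim() merely records the id where the real code would call thd::awake().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class BackgroundKiller {
 public:
  BackgroundKiller() : stopping_(false), worker_([this] { run(); }) {}
  ~BackgroundKiller() { stop(); }

  // May be called while engine mutexes are held: it takes only the
  // short-lived queue mutex and never waits for the victim.
  void enqueue(unsigned long victim_thread_id) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      victims_.push_back(victim_thread_id);
    }
    cond_.notify_one();
  }

  // Processes whatever is still queued, then joins the worker.
  void stop() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stopping_ = true;
    }
    cond_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

  std::vector<unsigned long> killed() {
    std::lock_guard<std::mutex> guard(killed_mutex_);
    return killed_;
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cond_.wait(lock, [this] { return stopping_ || !victims_.empty(); });
      if (victims_.empty()) return;  // stopping and nothing left to do
      unsigned long id = victims_.front();
      victims_.pop_front();
      lock.unlock();     // no queue or engine mutex held during the kill
      awake_victim(id);  // placeholder for thd::awake()
      lock.lock();
    }
  }

  void awake_victim(unsigned long id) {
    std::lock_guard<std::mutex> guard(killed_mutex_);
    killed_.push_back(id);
  }

  std::mutex mutex_;              // protects victims_ and stopping_
  std::condition_variable cond_;  // wakes up the background thread
  std::deque<unsigned long> victims_;
  bool stopping_;
  std::mutex killed_mutex_;
  std::vector<unsigned long> killed_;
  std::thread worker_;  // declared last: started after the other members
};
```

The thread that enqueues a victim never blocks on anything but the queue mutex, so it can keep holding its own mutexes without becoming part of a wait cycle.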
wsrep_mysqld.cc
Here we introduce a list where victim threads are stored,
a condition variable used to wake up the background thread,
and a mutex to protect the list.
wsrep_thd.cc
Create a new background thread to handle victim thread
abort. We may take the wsrep_thd_LOCK mutex here, but not any
InnoDB mutexes.
wsrep_innobase_kill_one_trx
Remove all the wsrep code that was moved to wsrep_thd.cc.
We just enqueue the required information to the background kill
list and cancel the victim trx lock wait if there is one.
Here we hold the InnoDB lock sys mutex and trx mutex, so
we can't take the wsrep_thd_LOCK mutex.
wsrep_abort_transaction
Cleanup only.
Fix GCC 10.0 -Wstringop-overflow
myrg_open(): Reduce the scope of the variable 'end' and
simplify the code.
For some reason, I got no warning for this code in the 10.2
branch, only 10.3 or later.
The ENGINE=MERGE is covered by the tests main.merge, main.merge_debug,
and main.merge-big.
MDEV-21933 INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES accesses SYS_DATAFILES
All tablespace metadata is buffered in fil_system. There is an LRU
mechanism, but that only controls the opening and closing of
fil_node_t::handle.
It is much more efficient and less error-prone to access data file names
by looking up the fil_space_t object rather than by essentially joining
each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser.
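The difference between the two access patterns can be illustrated with a small sketch. Everything in it is an assumption made for illustration: CachedSpace and SpaceCache are hypothetical, simplified stand-ins for the in-memory tablespace cache and do not reflect the real fil_space_t layout or any InnoDB function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a cached tablespace object.
struct CachedSpace {
  unsigned long id;
  std::vector<std::string> file_names;  // first entry is the first data file
};

// Hypothetical in-memory cache keyed by tablespace id.
class SpaceCache {
 public:
  void add(CachedSpace space) {
    const unsigned long id = space.id;
    by_id_[id] = std::move(space);
  }

  // One hash lookup per row; no dictionary table is read and no
  // internal SQL is parsed. Returns nullptr when nothing is cached.
  const std::string *first_path(unsigned long space_id) const {
    auto it = by_id_.find(space_id);
    if (it == by_id_.end() || it->second.file_names.empty()) return nullptr;
    return &it->second.file_names.front();
  }

 private:
  std::unordered_map<unsigned long, CachedSpace> by_id_;
};
```

A caller filling one output row per tablespace would call first_path() with the row's tablespace id and emit an empty name when nullptr is returned.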
dict_get_first_path(): Declare static. The function may only be needed
when loading or updating the data dictionary. Also, change a condition
in order to avoid a bogus GCC 10 -Wstringop-overflow warning for
mem_strdupl() about len==ULINT_UNDEFINED.
i_s_sys_tablespaces_fill_table(): Do not access any InnoDB internal
dictionary table other than SYS_TABLESPACES.