MDEV-27088: Server crash on ARM (WMM architecture) due to missing barriers in lf-hash
MariaDB server crashes on ARM (weak memory model architecture) while
concurrently executing l_find to load node->key and add_to_purgatory
to store node->key = NULL. l_find then passes the NULL key to a
comparison function.
The specific problem is the out-of-order execution that happens on a
weak memory model architecture. Two essential reorderings are possible,
which need to be prevented.
a) As l_find has no barriers in place between the optimistic read of
the key field lf_hash.cc#L117 and the verification of link lf_hash.cc#L124,
the processor can reorder the load to happen after the while-loop.
In that case, a concurrent thread executing add_to_purgatory on the same
node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253
before key is loaded in l_find.
b) A node is marked as deleted by a CAS in l_delete lf_hash.cc#L247 and
taken off the list with a subsequent CAS lf_hash.cc#L252. Only if both
CAS operations succeed is the key field written to by add_to_purgatory. However,
due to a missing barrier, the relaxed store of key lf_alloc-pin.c#L253
can be moved ahead of the two CAS operations, which makes the value of
the local purgatory list stored by add_to_purgatory visible to all threads
operating on the list. As the node is not marked as deleted yet, the
same error occurs in l_find.
This change makes three accesses atomic:
* optimistic read of key in l_find lf_hash.cc#L117
* read of link for verification lf_hash.cc#L124
* write of key in add_to_purgatory lf_alloc-pin.c#L253
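The ordering these three accesses need can be sketched with C++11 atomics.
This is a minimal illustration, not the MariaDB code: Node, read_key_if_live
and delete_and_clear are hypothetical names, the node layout is simplified,
and the acquire/release orderings shown are an assumption about one way to
forbid reorderings a) and b), not a statement of what the patch uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified node; not the real lf-hash node layout.
struct Node {
  std::atomic<const char*> key{nullptr};
  std::atomic<std::uintptr_t> link{0};  // low bit set = marked as deleted
};

// Reader side, in the spirit of l_find: optimistic read of key, then
// verification of link. The acquire load of key cannot be reordered
// after the later load of link, which rules out reordering a).
// Returns nullptr when the node changed or is deleted (caller retries).
inline const char* read_key_if_live(const Node& n) {
  std::uintptr_t before = n.link.load(std::memory_order_acquire);
  const char* key = n.key.load(std::memory_order_acquire);
  std::uintptr_t after = n.link.load(std::memory_order_acquire);
  if (before != after || (after & 1)) return nullptr;
  return key;
}

// Writer side, in the spirit of l_delete followed by add_to_purgatory:
// the release store of key cannot move ahead of the preceding CAS,
// which rules out reordering b).
inline bool delete_and_clear(Node& n) {
  std::uintptr_t expected = n.link.load(std::memory_order_relaxed);
  if (expected & 1) return false;  // already marked by someone else
  if (!n.link.compare_exchange_strong(expected, expected | 1)) return false;
  n.key.store(nullptr, std::memory_order_release);
  return true;
}

// Single-threaded walk-through of the intended states.
inline void demo() {
  Node n;
  n.key.store("k", std::memory_order_relaxed);
  assert(read_key_if_live(n) != nullptr);  // live node: key is usable
  assert(delete_and_clear(n));             // mark deleted, then clear key
  assert(read_key_if_live(n) == nullptr);  // reader never uses a NULL key
}
```

With this pairing, a reader that observes the NULL key is also guaranteed to
observe the deleted mark on the following load of link, so it never hands
NULL to the comparison function.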
MDEV-26553 NOT IN subquery construct crashing 10.1 and up
This bug was introduced by commit be00e279c6061134a33a8099fd69d4304735d02e.
The commit was applied for the task MDEV-6480, which allowed top-level
disjuncts to be removed from WHERE conditions if the range optimizer
evaluated them as always equal to FALSE/NULL.
If such disjuncts are removed, the WHERE condition may become an AND formula,
and if this formula contains multiple equalities, the field JOIN::item_equal
must be updated to refer to these equalities. The above-mentioned commit
did not do this, which could cause crashes for some queries.
Approved by Oleksandr Byelkin <email address hidden>
close_connections() in mysqld.cc sends a signal to all threads.
But InnoDB is too busy purging and doesn't react immediately.
close_connections() waits 20 seconds, which isn't enough in this
particular case, and then unlinks all threads from
the list and forcibly closes their vio connection.
InnoDB background threads have no vio connection to close, but
they're unlinked all the same. So when they finally notice the
shutdown request later and try to unlink themselves, the assertion
that they are still linked fails.
Fix: don't assert_linked, as another thread can unlink this THD at any time.
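The idea behind the fix can be sketched as an unlink that tolerates an
already-unlinked entry instead of asserting. This is an illustration only:
ThreadEntry and ThreadRegistry are hypothetical names and do not correspond
to the THD list code in mysqld.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a per-thread descriptor.
struct ThreadEntry {
  bool linked = false;
};

// Hypothetical stand-in for the global thread list.
class ThreadRegistry {
 public:
  void link(ThreadEntry& t) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (!t.linked) {
      t.linked = true;
      ++count_;
    }
  }

  // Tolerant unlink: returns false instead of asserting when some
  // other thread (e.g. a forced shutdown path) already unlinked t.
  bool unlink(ThreadEntry& t) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (!t.linked) return false;
    t.linked = false;
    --count_;
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return count_;
  }

 private:
  std::mutex mutex_;
  std::size_t count_ = 0;
};

// Walk-through: shutdown unlinks first, the late thread then finds
// itself already unlinked and simply carries on.
inline void demo_unlink() {
  ThreadRegistry registry;
  ThreadEntry background;
  registry.link(background);
  assert(registry.unlink(background));   // forced unlink at shutdown
  assert(!registry.unlink(background));  // late self-unlink is a no-op
  assert(registry.size() == 0);
}
```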