lp:~maria-captains/maria/10.0-custombld
- Get this branch:
- bzr branch lp:~maria-captains/maria/10.0-custombld
Recent revisions
- 4513. By Kristian Nielsen
-
MDEV-7326: Server deadlock in connection with parallel replication
Some printouts for a custom build, to try to discover the source of a curious
rare deadlock during parallel replication.

Print out the GTIDs and lock types in case of a deadlock kill (later
transaction blocking earlier transaction on an InnoDB row or table lock).

With --innodb-print_all_deadlocks=1, print out the full transaction and lock
in the error log.

Print all slave deadlock errors as warnings in the error log.
- 4511. By Kristian Nielsen
-
MDEV-7326: Server deadlock in connection with parallel replication
Previous fix was incomplete. We could still deadlock in the following
scenario:

T1 in one GCO retries and does unmark. T2/T3 are in the next GCO. T3 goes to
wait for T1 to mark again. Then T2 does mark. T2 would not signal T3 (because
T2 and T3 are in the same GCO). But then T1 later would also not signal T3,
because there was a check for count_committing_event_groups == GCO->wait_count.
But this does not hold here, because T2 has also done mark_start_commit().
So now count_committing_event_groups > GCO->wait_count.

The fix is to relax the check, and signal any GCO with
wait_count <= count_committing_event_groups.
- 4510. By Kristian Nielsen
-
MDEV-7326: Server deadlock in connection with parallel replication
Add test case, and fix stupid typo found from the test case.
- 4509. By Kristian Nielsen
-
MDEV-7326: Server deadlock in connection with parallel replication
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.

This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.

The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
- 4507. By Kristian Nielsen
-
MDEV-7326: Server deadlock in connection with parallel replication
Add some printouts to try to track down a server deadlock in parallel
replication that happens for a user.

The hang seems to happen when a worker thread is waiting for the prior group
commit batch to all start committing. It looks like the prior batch completed,
but the stuck thread somehow missed its wakeup. Also, it seems to happen
related to transaction retry.

Add a printout when the stuck thread is killed with some info about what
happened around the wait. And also add a check at transaction retry to try to
confirm a suspected possible cause of the missed wakeup.
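
The scenario in revision 4511 and the GCO lifetime rule from revision 4509 can be replayed in a small single-threaded model. This is only an illustrative sketch in Python, not the server code: the classes GCO and Entry, the wakeups counter, the last_sub_id field, finish_commit() and t3_gets_woken() are hypothetical names introduced here. Only wait_count, count_committing_event_groups, last_committed_sub_id, mark_start_commit() and unmark_start_commit() come from the commit messages. It assumes, as the messages describe, that a mark only signals GCOs later than the marking transaction's own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare GCOs by identity, not by field values
class GCO:
    """Hypothetical stand-in for one group commit batch from the master."""
    wait_count: int    # commits that must have started before this batch may run
    last_sub_id: int   # sub_id of the last transaction in this batch (assumed field)
    wakeups: int = 0   # how many times waiters on this batch were signalled


@dataclass
class Entry:
    """Hypothetical stand-in for the shared state (rpl_parallel_entry in the log)."""
    count_committing_event_groups: int = 0
    last_committed_sub_id: int = 0
    active_gcos: List[GCO] = field(default_factory=list)

    def mark_start_commit(self, own: GCO, strict: bool) -> None:
        self.count_committing_event_groups += 1
        count = self.count_committing_event_groups
        # Loop over the active GCOs after the caller's own one.
        later = self.active_gcos[self.active_gcos.index(own) + 1:]
        for gco in later:
            # strict=True models the old equality check, False the relaxed one.
            ready = (gco.wait_count == count) if strict else (gco.wait_count <= count)
            if ready:
                gco.wakeups += 1

    def unmark_start_commit(self) -> None:
        # A retrying transaction takes back its earlier mark.
        self.count_committing_event_groups -= 1

    def finish_commit(self, sub_id: int) -> None:
        # A GCO stays alive until its last transaction has committed.
        self.last_committed_sub_id = max(self.last_committed_sub_id, sub_id)
        self.active_gcos = [g for g in self.active_gcos
                            if g.last_sub_id > self.last_committed_sub_id]


def t3_gets_woken(strict: bool) -> bool:
    """Replay the T1/T2/T3 interleaving; return whether T3 gets a wakeup."""
    gco1 = GCO(wait_count=0, last_sub_id=1)  # holds T1
    gco2 = GCO(wait_count=1, last_sub_id=3)  # holds T2 and T3
    entry = Entry(active_gcos=[gco1, gco2])

    entry.mark_start_commit(gco1, strict)    # T1 marks; T2 may now run
    entry.unmark_start_commit()              # T1 retries and unmarks
    # T3 now sees count < wait_count and starts waiting for a new wakeup.
    seen = gco2.wakeups
    entry.mark_start_commit(gco2, strict)    # T2 marks; same GCO, so no signal
    entry.mark_start_commit(gco1, strict)    # T1 marks again; count is now 2
    return gco2.wakeups > seen
```

Under these assumptions, t3_gets_woken(strict=True) returns False (T3 stays stuck, as in the reported hang), while t3_gets_woken(strict=False) returns True. The model leaves out locking and condition variables entirely, so it illustrates only the counting argument, not the real thread interleaving.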
Branch metadata
- Branch format:
- Branch format 7
- Repository format:
- Bazaar repository format 2a (needs bzr 1.16 or later)
- Stacked on:
- lp:maria