lp:~maria-captains/maria/10.0-custombld

Created by Kristian Nielsen and last modified
Get this branch:
bzr branch lp:~maria-captains/maria/10.0-custombld
Members of Maria-captains can upload to this branch.

Branch information

Owner: Maria-captains
Project: MariaDB
Status: Development

Recent revisions

4514. By Kristian Nielsen

Empty commit to trigger a buildbot run

4513. By Kristian Nielsen

MDEV-7326: Server deadlock in connection with parallel replication

Add some printouts for a custom build to try to discover the source of a rare,
curious deadlock during parallel replication.

Print out the GTIDs and lock types in case of a deadlock kill (a later
transaction blocking an earlier transaction on an InnoDB row or table lock).
With --innodb-print_all_deadlocks=1, also print the full transaction and lock
to the error log.

Print all slave deadlock errors as warnings in the error log.

4512. By Kristian Nielsen

Cherry-pick MDEV-7237 fix for user testing in custom build.

4511. By Kristian Nielsen

MDEV-7326: Server deadlock in connection with parallel replication

Previous fix was incomplete. We could still deadlock in the following
scenario:

T1 in one GCO retries and does unmark_start_commit(). T2 and T3 are in the
next GCO. T3 goes to wait for T1 to mark again. Then T2 does
mark_start_commit(). T2 does not signal T3, because T2 and T3 are in the same
GCO. But T1 later does not signal T3 either, because there was a check for
count_committing_event_groups == GCO->wait_count. That check fails here,
because T2 has also done mark_start_commit(), so now
count_committing_event_groups > GCO->wait_count.

The fix is to relax the check, and signal any GCO with
wait_count <= count_committing_event_groups.
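The scenario above can be replayed in a small model. This is a hypothetical C++ sketch, not the MariaDB source. Only count_committing_event_groups, wait_count, mark_start_commit() and unmark_start_commit() come from the commit message. The Gco/Entry types, the `relaxed` flag, and the assumption that a marking transaction checks only the GCO right after its own are simplifications for illustration. The sketch shows that T3 never gets a wakeup under the old equality check, while the relaxed `<=` check delivers it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified model of the GCO wakeup check from
// revision 4511. Not the MariaDB source; Gco, Entry and `relaxed` are
// illustrative names.
struct Gco {
  std::uint64_t wait_count;  // committing event groups needed before this GCO may run
  bool woken;                // set when a wakeup is delivered to this GCO's waiters
};

struct Entry {
  std::uint64_t count_committing_event_groups = 0;
  std::vector<Gco> gcos;  // GCOs in master commit order

  // A transaction in gcos[own] reaches the commit phase.
  void mark_start_commit(std::size_t own, bool relaxed) {
    ++count_committing_event_groups;
    // Assumption: only the next GCO is checked, so waiters in the
    // marking transaction's own GCO are never signalled from here.
    std::size_t next = own + 1;
    if (next >= gcos.size()) return;
    Gco &g = gcos[next];
    bool ready = relaxed ? g.wait_count <= count_committing_event_groups   // fixed check
                         : g.wait_count == count_committing_event_groups;  // old check
    if (ready) g.woken = true;
  }

  // A transaction that already marked has to retry.
  void unmark_start_commit() { --count_committing_event_groups; }
};

// Replays the T1/T2/T3 scenario; returns true if T3 (waiting in GCO 1) is woken.
bool t3_gets_wakeup(bool relaxed) {
  Entry e;
  // GCO 0 holds T1; GCO 1 holds T2 and T3; GCO 2 waits for all three.
  e.gcos = {Gco{0, false}, Gco{1, false}, Gco{3, false}};
  e.mark_start_commit(0, relaxed);  // T1 marks, so GCO 1 may start
  e.gcos[1].woken = false;          // T2/T3 consumed that wakeup and are running
  e.unmark_start_commit();          // T1 retries and unmarks
  // T3 now goes to wait for T1 to mark again.
  e.mark_start_commit(1, relaxed);  // T2 marks; this only checks GCO 2
  e.mark_start_commit(0, relaxed);  // T1 marks again after its retry
  return e.gcos[1].woken;
}
```

With `relaxed == false`, `t3_gets_wakeup()` returns false. T2's mark pushes the counter past GCO 1's wait_count, so T1's later equality check never matches. With `relaxed == true`, it returns true.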

4510. By Kristian Nielsen

MDEV-7326: Server deadlock in connection with parallel replication

Add a test case, and fix a stupid typo found by the test case.

4509. By Kristian Nielsen

MDEV-7326: Server deadlock in connection with parallel replication

The bug occurs when a transaction does a retry after all transactions in a
group-commit batch from the master have done mark_start_commit(). In this
case, the retrying transaction can call unmark_start_commit() after the
following batch has already started running and de-allocated the GCO. After
the retry, the transaction then re-does mark_start_commit() on the
de-allocated GCO, and wakeups of later GCOs can be lost.

This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to a retry of one transaction after all
transactions in a group have reached the commit phase.

The lifetime handling of GCOs was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
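The lifetime rule and the wakeup loop can be sketched as follows. This is again a simplified, hypothetical model, not the actual parallel-replication code. Apart from last_committed_sub_id, wait_count and mark_start_commit(), the names (Gco, Entry, last_sub_id, commit()) are assumptions. The model shows two things. A GCO is released only once last_committed_sub_id has reached the sub_id of its last transaction, so a retrying transaction can still unmark and re-mark against it. And mark_start_commit() scans every active GCO instead of only the next one.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch of the GCO lifetime rule from revision 4509. Not the
// MariaDB source; Gco, Entry, last_sub_id and commit() are invented names.
struct Gco {
  std::uint64_t wait_count;   // committing event groups needed before this GCO may run
  std::uint64_t last_sub_id;  // sub_id of the last transaction queued in this GCO
  bool woken;                 // set once this GCO's waiters have been signalled
};

struct Entry {
  std::uint64_t count_committing_event_groups = 0;
  std::uint64_t last_committed_sub_id = 0;
  std::deque<Gco> active;  // still-alive GCOs, oldest first

  // Scan every active GCO rather than only the next one, so a wakeup
  // cannot be lost when the counter jumps past a GCO's wait_count.
  void mark_start_commit() {
    ++count_committing_event_groups;
    for (Gco &g : active)
      if (!g.woken && g.wait_count <= count_committing_event_groups)
        g.woken = true;
  }

  // A transaction that already marked has to retry.
  void unmark_start_commit() { --count_committing_event_groups; }

  // Transactions commit in sub_id order. A GCO is freed only once
  // last_committed_sub_id has reached its last transaction, so it is still
  // alive whenever one of its transactions calls mark_start_commit().
  void commit(std::uint64_t sub_id) {
    last_committed_sub_id = sub_id;
    while (!active.empty() && active.front().last_sub_id <= last_committed_sub_id)
      active.pop_front();
  }
};
```

As an example, take GCO A holding T1 (sub_id 1) and GCO B holding T2 and T3 (sub_ids 2 and 3). A stays in `active` through T1's unmark/re-mark cycle and is popped only by `commit(1)`. B survives `commit(2)` and is freed by `commit(3)`.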

4508. By Kristian Nielsen

Empty commit to trigger a buildbot run

4507. By Kristian Nielsen

MDEV-7326: Server deadlock in connection with parallel replication

Add some printouts to try to track down a server deadlock in parallel
replication that a user is hitting.

The hang seems to happen when a worker thread is waiting for all transactions
in the prior group-commit batch to start committing. It looks like the prior
batch completed, but the stuck thread somehow missed its wakeup. The hang also
seems to be related to transaction retry.

When the stuck thread is killed, print some information about what happened
around the wait. Also add a check at transaction retry to try to confirm a
suspected cause of the missed wakeup.

4506. By Sergei Golubchik

5.5 merge

4505. By Sergei Golubchik

Merge

Branch metadata

Branch format: Branch format 7
Repository format: Bazaar repository format 2a (needs bzr 1.16 or later)
Stacked on: lp:maria