Merge lp:~laurynas-biveinis/percona-server/buf-mutex-split-fixes-5.1 into lp:percona-server/5.1
Status: | Merged |
---|---|
Approved by: | Alexey Kopytov |
Approved revision: | no longer in the source branch. |
Merged at revision: | 585 |
Proposed branch: | lp:~laurynas-biveinis/percona-server/buf-mutex-split-fixes-5.1 |
Merge into: | lp:percona-server/5.1 |
Diff against target: |
515 lines (+102/-79) 6 files modified
Percona-Server/storage/innodb_plugin/buf/buf0buf.c (+24/-18) Percona-Server/storage/innodb_plugin/buf/buf0flu.c (+56/-47) Percona-Server/storage/innodb_plugin/buf/buf0lru.c (+11/-7) Percona-Server/storage/innodb_plugin/include/buf0buf.h (+4/-4) Percona-Server/storage/innodb_plugin/include/buf0flu.h (+4/-1) Percona-Server/storage/innodb_plugin/include/buf0lru.h (+3/-2) |
To merge this branch: | bzr merge lp:~laurynas-biveinis/percona-server/buf-mutex-split-fixes-5.1 |
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Laurynas Biveinis (community) | Approve | ||
Alexey Kopytov (community) | Approve | ||
George Ormond Lorch III (community) | g2 | Approve | |
Review via email: mp+165526@code.launchpad.net |
Description of the change
http://
Fix several buffer pool mutex split bugs:
- bug 1086680 (Valgrind: free in buf_page_get_gen (Invalid read in
buf_flush_batch / buf_flush_list) + free in buf_page_get_gen
(Invalid read in buf_flush_
- bug 1103850 (InnoDB: Failing assertion:
mutex_
- bug 1181269 (Unnecessary LRU list mutex acquisition in
buf_page_
The Valgrind errors and crashes of bug 1086680 happen because of a
race condition involving a dirty compressed page block for which there
is uncompressed page image in the buffer pool.
First, a master thread (or possible another query thread) does a flush
list flush and for that acquires the flush list mutex, gets a pointer
to a page, releases the flush list mutex.
At this point another thread starts reading the same page into the
buffer. Since it is a dirty compressed page, it allocates an
uncompressed page, relocates the page on the flush list, frees the
compressed page descriptor.
At this point the flushing thread proceeds to use the pointer which is
now dangling, causing the issue.
Fix by adjusting buf_flush_batch() to hold the flush list mutex for
all the bpage pointer dereferences. This should not introduce any new
stalls because the mutex is released after checking a few fields
only. This also allows simplifying buf_flush_batch() to become closer
to the InnoDB version, by removing prev_bpage and remaining local
vars. However, this means that buf_flush_
becomes callable with the flush list mutex both held and not held.
This requires additional handling because this function may call
buf_flush_remove() in XtraDB due to the lazy table drop feature. The
latter needs to acquire the flush list mutex, thus we introduce a
boolean have_flush_
buf_flush_
asserts.
Bug 1103850 is caused by buf_flush_remove() calling
buf_LRU_
builds, which requires the LRU list mutex, which is not always held
here. Fixed by disabling the zip_clean list maintenance for debug
builds. The alternative would be to buffer-fix the block, release and
re-acquire all the mutexes, including the LRU list one. But since
buf_flush_remove() may be called from different contexts holding
different mutexes (also due to bug 1086680 fix), this seems to be very
impractical.
Bug 1181269 is a missed performance improvement opportunity due to
upstream fixing http://
that buf_LRU_
debug builds only. But XtraDB still kept on acquiring the LRU list
mutex buf_flush_
buf_flush_remove(). With the buf_LRU_
debug-only by upstream and removed by bug 1103850 fix,
buf_flush_
the LRU list flushes.
Nice work, especially the fix for bug 1086680 and the changes in buf_flush_batch.