Merge lp:~laurynas-biveinis/percona-server/bug1086680-5.1 into lp:percona-server/5.1
Status: | Rejected |
---|---|
Rejected by: | Alexey Kopytov |
Proposed branch: | lp:~laurynas-biveinis/percona-server/bug1086680-5.1 |
Merge into: | lp:percona-server/5.1 |
Diff against target: |
237 lines (+63/-6) 5 files modified
Percona-Server/storage/innodb_plugin/buf/buf0buf.c (+29/-3)
Percona-Server/storage/innodb_plugin/buf/buf0flu.c (+29/-3)
Percona-Server/storage/innodb_plugin/include/buf0buf.h (+3/-0)
Percona-Server/storage/innodb_plugin/include/sync0sync.h (+1/-0)
Percona-Server/storage/innodb_plugin/sync/sync0sync.c (+1/-0) |
To merge this branch: | bzr merge lp:~laurynas-biveinis/percona-server/bug1086680-5.1 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Alexey Kopytov (community) | Needs Resubmitting | ||
Laurynas Biveinis | Pending | ||
Review via email: mp+144470@code.launchpad.net |
This proposal supersedes a proposal from 2013-01-18.
Description of the change
Fix bug 1086680 for 5.1
http://
Additionally tested locally with UNIV_SYNC_DEBUG, MTR --suite=
No BT or ST, but blocks BT 16274 QA.
Unmerged revisions
- 519. By Laurynas Biveinis
-
Fix bug 1086680 (Valgrind: free in buf_page_get_gen (Invalid read in
buf_flush_batch / buf_flush_list) + free in buf_page_get_gen (Invalid
read in buf_flush_page_and_try_neighbors)) in XtraDB.

The Valgrind errors and crashes happen because of a race condition
involving a dirty compressed page block for which there is an
uncompressed page image in the buffer pool.

First, a master thread (or possibly another query thread) does a flush
list flush and for that acquires the flush list mutex, gets a pointer
to a page, and releases the flush list mutex.

At this point another thread starts reading the same page into the
buffer. Since it is a dirty compressed page, it allocates an
uncompressed page, relocates the page on the flush list, and frees the
compressed page descriptor.

At this point the flushing thread proceeds to use the pointer, which is
now dangling, causing the issue.

To fix this, a new buffer pool mutex is introduced that protects flush
list relocations and flush list page reads with no flush list mutex
taken. The mutex priority is between the page latch and the LRU list
mutex priorities.

Acquire this mutex in buf_page_get_gen() when we are about to relocate
the page on the flush list. Also acquire it in buf_flush_batch()
around the flushing. Temporarily release this mutex to obey latching
order in buf_flush_page() after an I/O fix is set on the page we are
about to flush, as this also prevents the flush list relocation.
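The locking idea in the commit message above can be illustrated with a minimal sketch. This is not the XtraDB code: `pool_t`, `page_t`, `relocation_mutex`, and all the function names are hypothetical stand-ins, and the lock order used here (relocation mutex before flush list mutex) is an assumption made for the sketch. The temporary release around the I/O fix in buf_flush_page() is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; these are not the XtraDB types. */
typedef struct page {
    struct page *next;
    int id;
} page_t;

typedef struct {
    pthread_mutex_t flush_list_mutex;  /* protects the list links */
    pthread_mutex_t relocation_mutex;  /* stand-in for the new mutex */
    page_t *flush_list;
} pool_t;

void pool_init(pool_t *pool)
{
    pthread_mutex_init(&pool->flush_list_mutex, NULL);
    pthread_mutex_init(&pool->relocation_mutex, NULL);
    pool->flush_list = NULL;
}

/* Push a new dirty page descriptor onto the flush list. */
page_t *pool_add(pool_t *pool, int id)
{
    page_t *page = malloc(sizeof *page);
    if (page == NULL) return NULL;
    page->id = id;
    pthread_mutex_lock(&pool->flush_list_mutex);
    page->next = pool->flush_list;
    pool->flush_list = page;
    pthread_mutex_unlock(&pool->flush_list_mutex);
    return page;
}

/* Reader side: replace old_page with a fresh descriptor and free the old
   one. The free happens under relocation_mutex, so a flusher holding that
   mutex can never be left with a dangling pointer. */
page_t *pool_relocate(pool_t *pool, page_t *old_page)
{
    page_t *new_page = malloc(sizeof *new_page);
    page_t **link;
    int found = 0;

    if (new_page == NULL) return NULL;

    pthread_mutex_lock(&pool->relocation_mutex);
    pthread_mutex_lock(&pool->flush_list_mutex);
    for (link = &pool->flush_list; *link != NULL; link = &(*link)->next) {
        if (*link == old_page) {
            *new_page = *old_page;  /* copies id and next */
            *link = new_page;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&pool->flush_list_mutex);
    if (found) {
        free(old_page);
    } else {
        free(new_page);
        new_page = NULL;
    }
    pthread_mutex_unlock(&pool->relocation_mutex);
    return new_page;
}

/* Flusher side: the page pointer is used after the flush list mutex is
   released, which is safe only because relocation_mutex is still held. */
int pool_flush_first(pool_t *pool)
{
    page_t *page;
    int id = -1;

    pthread_mutex_lock(&pool->relocation_mutex);
    pthread_mutex_lock(&pool->flush_list_mutex);
    page = pool->flush_list;
    pthread_mutex_unlock(&pool->flush_list_mutex);
    if (page != NULL) {
        id = page->id;  /* the actual page write would happen here */
    }
    pthread_mutex_unlock(&pool->relocation_mutex);
    return id;
}
```

Without `relocation_mutex`, `pool_relocate()` could free the descriptor between the flusher's unlock of the flush list mutex and its read of `page->id`, which is the dangling pointer scenario described in the commit message.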
Left a race condition: the state of the block that is about to be relocated might change from ZIP_DIRTY to ZIP_CLEAN or back while the mutexes are released in buf_page_get_gen. This is an issue because the decision whether to take the zip dirty flush mutex is made earlier.
Easy to fix by rechecking the block state after the final mutex reacquisition.
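One possible shape of such a recheck, as a sketch only: decide early, reacquire, compare the state against the earlier decision, and retry on a mismatch. The type `block_ctx_t`, the functions, the retry loop, and the lock order (flush mutex before block mutex) are all assumptions for illustration, not the branch's code. The `hook` parameter merely simulates a concurrent state change in the window where the mutexes are released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum zip_state { ZIP_CLEAN, ZIP_DIRTY };

/* Hypothetical stand-in for a block plus the mutexes involved. */
typedef struct {
    pthread_mutex_t block_mutex;
    pthread_mutex_t flush_mutex;  /* stand-in for the zip dirty flush mutex */
    enum zip_state state;
} block_ctx_t;

typedef void (*window_hook_t)(block_ctx_t *);

void ctx_init(block_ctx_t *ctx, enum zip_state state)
{
    pthread_mutex_init(&ctx->block_mutex, NULL);
    pthread_mutex_init(&ctx->flush_mutex, NULL);
    ctx->state = state;
}

static void set_state(block_ctx_t *ctx, enum zip_state state)
{
    pthread_mutex_lock(&ctx->block_mutex);
    ctx->state = state;
    pthread_mutex_unlock(&ctx->block_mutex);
}

/* Hooks that simulate another thread changing the block state. */
void mark_dirty(block_ctx_t *ctx) { set_state(ctx, ZIP_DIRTY); }
void mark_clean(block_ctx_t *ctx) { set_state(ctx, ZIP_CLEAN); }

/* Returns 1 if the relocation ran with the flush mutex held, 0 if not. */
int relocate_with_recheck(block_ctx_t *ctx, window_hook_t hook)
{
    for (;;) {
        int take, consistent;

        /* Early decision, made before the mutexes are released. */
        pthread_mutex_lock(&ctx->block_mutex);
        take = (ctx->state == ZIP_DIRTY);
        pthread_mutex_unlock(&ctx->block_mutex);

        /* Window with no mutexes held; the state may change here. */
        if (hook != NULL) {
            hook(ctx);
            hook = NULL;  /* simulate the interference only once */
        }

        /* Final reacquisition, followed by the recheck. */
        if (take) pthread_mutex_lock(&ctx->flush_mutex);
        pthread_mutex_lock(&ctx->block_mutex);
        consistent = (take == (ctx->state == ZIP_DIRTY));
        /* The relocation itself would run here when consistent. */
        pthread_mutex_unlock(&ctx->block_mutex);
        if (take) pthread_mutex_unlock(&ctx->flush_mutex);

        if (consistent) return take;
        /* State changed in the window: redo the decision. */
    }
}
```

For example, starting from a ZIP_CLEAN block, `relocate_with_recheck(&ctx, mark_dirty)` makes a stale "no flush mutex" decision on the first pass, detects the mismatch on recheck, and completes on the second pass with the flush mutex held.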