Merge lp:~laurynas-biveinis/percona-server/bug1086680-5.1 into lp:percona-server/5.1

Proposed by Laurynas Biveinis
Status: Rejected
Rejected by: Alexey Kopytov
Proposed branch: lp:~laurynas-biveinis/percona-server/bug1086680-5.1
Merge into: lp:percona-server/5.1
Diff against target: 237 lines (+63/-6)
5 files modified
Percona-Server/storage/innodb_plugin/buf/buf0buf.c (+29/-3)
Percona-Server/storage/innodb_plugin/buf/buf0flu.c (+29/-3)
Percona-Server/storage/innodb_plugin/include/buf0buf.h (+3/-0)
Percona-Server/storage/innodb_plugin/include/sync0sync.h (+1/-0)
Percona-Server/storage/innodb_plugin/sync/sync0sync.c (+1/-0)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/bug1086680-5.1
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Needs Resubmitting
Laurynas Biveinis Pending
Review via email: mp+144470@code.launchpad.net

This proposal supersedes a proposal from 2013-01-18.

Description of the change

Fix bug 1086680 for 5.1
http://jenkins.percona.com/job/percona-server-5.1-param/508/
Additionally tested locally with UNIV_SYNC_DEBUG, MTR --suite=innodb_plugin.
No BT or ST, but blocks BT 16274 QA.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Left a race condition: the state of a block that is about to be relocated might change from ZIP_DIRTY to ZIP_CLEAN or back while the mutexes are released in buf_page_get_gen. It's an issue because the decision whether to take the zip dirty flush mutex is made earlier.

Easy to fix by rechecking the block state after the final mutex reacquisition.
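
The recheck, as folded into the existing buffer-fix/I/O-fix check in buf_page_get_gen() in the resubmitted version (condensed from the diff below; the cleanup before the retry is omitted):

if (UNIV_UNLIKELY
    (bpage->buf_fix_count
     || buf_page_get_io_fix(bpage) != BUF_IO_NONE
     || is_zip_dirty != (buf_page_get_state(bpage)
                         == BUF_BLOCK_ZIP_DIRTY))) {

        /* The state changed from zip clean to zip dirty or back
        while the mutexes were released: free the block that was
        allocated and try again. */
        goto wait_until_unfixed;
}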

review: Needs Fixing
Revision history for this message
Alexey Kopytov (akopytov) wrote :

The patch itself looks correct in the sense that it does solve the
problem under the described conditions. Though it is not clear if there
are other conditions resulting in a dirty bpage read in
buf_flush_batch(), e.g. not related to compressed page relocations, and
if those will be solved by the patch as well.

However, I don't like that we:

1) introduce another mutex. It's another potential source of regressions
and another thing to care about on merges.

2) lock a global buffer pool mutex for the entire flushing loop in
buf_flush_batch() when we in fact only want to protect individual pages
on every loop iteration. Which means a buf_page_get_gen() call in
another thread trying to relocate a completely unrelated (e.g. not even
touched by the buf_flush_batch() loop) page will have to wait for
buf_flush_batch() to complete flushing all the pages it wants to flush.

Are there any reasons to not follow the same logic in buf_flush_batch()
wrt. flush_list_mutex as we do wrt. LRU_list_mutex? After all, the root
cause of the problem is that we release flush_list_mutex too early and
then read from the block while it's not protected either by
flush_list_mutex or the block mutex. So let's just not release
flush_list_mutex until we lock the block mutex a few lines down in the
code, and that will also prevent relocations? flush_list_mutex can then
be released temporarily while buf_flush_try_neighbors() is in
progress. It will likely result in a much smaller & easier to maintain
patch, and even increase locking granularity as opposed to a separate
global mutex approach?
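
In sketch form, what this suggests for the BUF_FLUSH_LIST batch (hypothetical, never implemented in this MP, and, as discussed in the replies below, it conflicts with the established latching order):

mutex_enter(&flush_list_mutex);
bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
/* keep flush_list_mutex until the block mutex is locked, so bpage
is never read unprotected and cannot be relocated in between */
mutex_enter(buf_page_get_mutex(bpage));
mutex_exit(&flush_list_mutex);
/* flush the page; flush_list_mutex can be released temporarily
around buf_flush_try_neighbors() */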

review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

> Though it is not clear if there
> are other conditions resulting in a dirty bpage read in
> buf_flush_batch(), e.g. not related to compressed page relocations, and
> if those will be solved by the patch as well.

IIRC I reviewed the code for other such issues.

> However, I don't like that we:
>
> 1) introduce another mutex. It's another potential source of regressions
> and another thing to care about on merges.

I don't like it either, and I'd be glad to reimplement it some other way. But that was the least bad alternative I could come up with.

> 2) lock a global buffer pool mutex for the entire flushing loop in
> buf_flush_batch() when we in fact only want to protect individual pages
> on every loop iteration. Which means a buf_page_get_gen() call in
> another thread trying to relocate a completely unrelated (e.g. not even
> touched by the buf_flush_batch() loop) page will have to wait for
> buf_flush_batch() to complete flushing all the pages it wants to flush.

But in this patch the new mutex is temporarily released once the block-to-be-flushed is fixed right before the flushing itself, where we don't have any block pointers acquired from the flush list. Thus at least the blocking is not very long, and what this mutex serializes are 1) single page relocations; 2) single page flushes.
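
For reference, the release/reacquire pattern from the buf_flush_page() hunk in the diff below, condensed:

buf_page_set_io_fix(bpage, BUF_IO_WRITE);

/* I/O fix prevents the block relocation, thus OK to release
temporarily */
if (flush_type == BUF_FLUSH_LIST) {
        mutex_exit(&zip_dirty_flush_mutex);
}
/* ... write out the page ... */
buf_flush_write_block_low(bpage);

if (flush_type == BUF_FLUSH_LIST) {
        mutex_enter(&zip_dirty_flush_mutex);
}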

> Are there any reasons to not follow the same logic in buf_flush_batch()
> wrt. flush_list_mutex as we do wrt. LRU_list_mutex? After all, the root
> cause of the problem is that we release flush_list_mutex too early and
> then read from the block while it's not protected either by
> flush_list_mutex or the block mutex. So let's just not release
> flush_list_mutex until we lock the block mutex a few lines down in the
> code, and that will also prevent relocations?

Unfortunately, the block and flush list mutex priority is wrong for this solution, which would obviously be the easiest one if it were possible. Block mutexes have to be locked before the flush list mutex. The LRU list mutex is different: it has to be locked before the block mutexes.
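
For reference, the relevant sync levels in this tree after the patch (a higher level must be latched earlier; the first define is the one this MP adds):

#define SYNC_BUF_ZIP_DIRTY_FLUSH 158
#define SYNC_BUF_LRU_LIST 157
#define SYNC_BUF_PAGE_HASH 156
#define SYNC_BUF_BLOCK 155

SYNC_BUF_FLUSH_LIST sits below SYNC_BUF_BLOCK, which is why a thread already holding the flush list mutex may not take a block mutex.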

> flush_list_mutex can then
> be released temporarily while buf_flush_try_neighbors() is in
> progress. It will likely result in a much smaller & easier to maintain
> patch, and even increase locking granularity as opposed to a separate
> global mutex approach?

Yes, it would, if not for the mutex priority.

The alternatives I see:
- Rework buffer pool mutexes so that flush list mutex comes before the block mutex. Not something I'd want to maintain for the upstream merges, and I haven't looked into what kind of issues such an implementation would raise.
- Overload (rename?) the LRU list mutex to take on the job of the new mutex too since it has the right priority for the job. This comes close to unsplitting the buffer pool mutex split, and has most of the disadvantages of the current solution.

Maybe I'm missing some other alternative or my reasoning above is wrong?

Thanks,

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Hi Laurynas,

On Fri, 15 Mar 2013 13:14:20 -0000, Laurynas Biveinis wrote:
>> 2) lock a global buffer pool mutex for the entire flushing loop in
>> buf_flush_batch() when we in fact only want to protect individual pages
>> on every loop iteration. Which means a buf_page_get_gen() call in
>> another thread trying to relocate a completely unrelated (e.g. not even
>> touched by the buf_flush_batch() loop) page will have to wait for
>> buf_flush_batch() to complete flushing all the pages it wants to flush.
>
> But in this patch the new mutex is temporarily released once the block-to-be-flushed is fixed right before the flushing itself, where we don't have any block pointers acquired from the flush list. Thus at least the blocking is not very long, and what this mutex serializes are 1) single page relocations; 2) single page flushes.
>

Right, but temporarily unlocking zip_dirty_flush_mutex only occurs after
locking another two global locks: page_hash_latch and the buffer pool
mutex. That is still bad, because whether it is "not very long" or not
depends on many other factors.
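
Condensed from the buf_flush_batch() hunk below, the window in question:

if (flush_type == BUF_FLUSH_LRU) {
        mutex_enter(&LRU_list_mutex);
} else {
        ut_ad(flush_type == BUF_FLUSH_LIST);
        mutex_enter(&zip_dirty_flush_mutex);
}

for (;;) {
        mutex_enter(&flush_list_mutex);
        /* pick bpage, read bpage->oldest_modification */
        mutex_exit(&flush_list_mutex);
        /* zip_dirty_flush_mutex is only released temporarily inside
        buf_flush_page(), after page_hash_latch and the block mutex
        have been taken and the I/O fix has been set */
}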

>> Are there any reasons to not follow the same logic in buf_flush_batch()
>> wrt. flush_list_mutex as we do wrt. LRU_list_mutex? After all, the root
>> cause of the problem is that we release flush_list_mutex too early and
>> then read from the block while it's not protected either by
>> flush_list_mutex or the block mutex. So let's just not release
>> flush_list_mutex until we lock the block mutex a few lines down in the
>> code, and that will also prevent relocations?
>
> Unfortunately, the block and flush list mutex priority is wrong for this solution, which would obviously be the easiest one if it were possible. Block mutexes have to be locked before the flush list mutex. The LRU list mutex is different: it has to be locked before the block mutexes.
>

I don't really see reasons for them to be different. The order relation
between LRU_list_mutex, flush_list_mutex and block_mutex priorities was
defined by our patches. And something tells me they have either been
defined arbitrarily, or to hide some implementation "drawback" :)

Seriously, what's the reason for them to be different with respect to the
block mutex priority?

>> flush_list_mutex can then
>> be released temporarily while buf_flush_try_neighbors() is in
>> progress. It will likely result in a much smaller & easier to maintain
>> patch, and even increase locking granularity as opposed to a separate
>> global mutex approach?
>
> Yes, it would, if not for the mutex priority.
>
> The alternatives I see:
> - Rework buffer pool mutexes so that flush list mutex comes before the block mutex. Not something I'd want to maintain for the upstream merges, and I haven't looked into what kind of issues such an implementation would raise.

It doesn't apply to upstream merges. I'd first try to answer the "why is
it different?" question and then see how feasible it is to change that.

> - Overload (rename?) the LRU list mutex to take on the job of the new mutex too since it has the right priority for the job. This comes close to unsplitting the buffer pool mutex split, and has most of the disadvantages of the current solution.
>
Yep, this doesn't ...


Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Hi Alexey -

> The order relation
> between LRU_list_mutex, flush_list_mutex and block_mutex priorities was
> defined by our patches.

flush_list_mutex and block_mutex are upstream mutexes, with their order defined in the upstream:

#define SYNC_BUF_BLOCK 146 /* Block mutex */
#define SYNC_BUF_FLUSH_LIST 145 /* Buffer flush list mutex */

LRU_list_mutex is of course ours.

> And something tells me they have either been
> defined arbitrarily, or to hide some implementation "drawback" :)
>
> Seriously, what's the reason for them to be different with respect to the
> block mutex priority?

Arguably it is the LRU_list_mutex that has the right (and, with some thought, I guess the only possible) priority WRT the block mutex, even though it differs from the flush list mutex one. It's more of an upstream "drawback" IMHO :)

> > - Rework buffer pool mutexes so that flush list mutex comes before the block
> mutex. Not something I'd want to maintain for the upstream merges, and I haven't
> looked into what kind of issues such an implementation would raise.
>
> It doesn't apply to upstream merges. I'd first try to answer the "why is
> it different?" question and then see how feasible it is to change that.

Since these two are upstream mutexes, I believe it does apply to the merges. I will spend some time looking into the priority reversal and its implications. Perhaps it's not very bad code-wise, but the merges add an additional risk here.

Thanks,
Laurynas

Revision history for this message
Alexey Kopytov (akopytov) wrote :

Hi Laurynas,

On Fri, 15 Mar 2013 15:08:36 -0000, Laurynas Biveinis wrote:
> Hi Alexey -
>
>> The order relation
>> between LRU_list_mutex, flush_list_mutex and block_mutex priorities was
>> defined by our patches.
>
> flush_list_mutex and block_mutex are upstream mutexes, with their order defined in the upstream:
>
> #define SYNC_BUF_BLOCK 146 /* Block mutex */
> #define SYNC_BUF_FLUSH_LIST 145 /* Buffer flush list mutex */
>

That's from upstream 5.5. This is a 5.1 MP, and upstream 5.1 doesn't
have flush_list_mutex.

> LRU_list_mutex is of course ours.
>
>> And something tells me they have either been
>> defined arbitrarily, or to hide some implementation "drawback" :)
>>
>> Seriously, what's the reason for them to be different with respect to the
>> block mutex priority?
>
> Arguably it is the LRU_list_mutex that has the right (and, with some thought, I guess the only possible) priority WRT the block mutex, even though it differs from the flush list mutex one. It's more of an upstream "drawback" IMHO :)
>

In fact it doesn't matter. If flush_list_mutex has a higher priority
than block_mutex, that just means it can be used _instead_ of the
block_mutex. So we only want to acquire it for BUF_FLUSH_LRU, but
acquiring flush_list_mutex and holding it until all necessary bpage
fields are accessed looks possible.
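
A minimal sketch of that idea (hypothetical, never implemented in this MP; it assumes page relocation always happens under flush_list_mutex):

/* BUF_FLUSH_LIST batch: keep flush_list_mutex across the bpage
field reads instead of dropping it early */
mutex_enter(&flush_list_mutex);
bpage = UT_LIST_GET_LAST(buf_pool->flush_list);

if (bpage && bpage->oldest_modification < lsn_limit) {
        /* safe: buf_page_get_gen() must also take flush_list_mutex
        to relocate the page, so bpage cannot dangle here */
}
mutex_exit(&flush_list_mutex);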

The 5.5 implementation will be a bit more tricky, as you need a slightly
different implementation of buf_flush_page_and_try_neighbors() in
buf_flush_flush_list_batch() (or just don't use it and call
buf_flush_try_neighbors() directly from that function).

Revision history for this message
Alexey Kopytov (akopytov) :
review: Needs Resubmitting

Unmerged revisions

519. By Laurynas Biveinis

Fix bug 1086680 (Valgrind: free in buf_page_get_gen (Invalid read in
buf_flush_batch / buf_flush_list) + free in buf_page_get_gen (Invalid
read in buf_flush_page_and_try_neighbors)) in XtraDB.

The Valgrind errors and crashes happen because of a race condition
involving a dirty compressed page block for which there is no
uncompressed page image in the buffer pool.

First, a master thread (or possibly another query thread) does a flush
list flush, and for that acquires the flush list mutex, gets a pointer
to a page, and releases the flush list mutex.

At this point another thread starts reading the same page into the
buffer pool. Since it is a dirty compressed page, it allocates an
uncompressed page, relocates the page on the flush list, and frees the
compressed page descriptor.

At this point the flushing thread proceeds to use the pointer, which
is now dangling, causing the issue.

To fix this, a new buffer pool mutex is introduced that protects flush
list relocations and flush list page reads performed with no flush list
mutex taken. The mutex priority is between the page latch and the LRU
list mutex priorities.

Acquire this mutex in buf_page_get_gen() when we are about to relocate
the page on the flush list. Also acquire it in buf_flush_batch()
around the flushing. To obey the latching order, temporarily release
it in buf_flush_page() after an I/O fix is set on the page we are
about to flush, as the I/O fix also prevents the flush list relocation.
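
The racy interleaving, condensed from the unpatched code paths:

/* Flusher, buf_flush_batch(): */
mutex_enter(&flush_list_mutex);
bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
mutex_exit(&flush_list_mutex);

/* Reader, buf_page_get_gen(), runs here: allocates an uncompressed
block, relocates the page on the flush list and frees the
buf_page_t that bpage still points to */

/* Flusher continues with a dangling pointer: */
if (!bpage || bpage->oldest_modification >= lsn_limit) {
        /* invalid read reported by Valgrind */
}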

Preview Diff

=== modified file 'Percona-Server/storage/innodb_plugin/buf/buf0buf.c'
--- Percona-Server/storage/innodb_plugin/buf/buf0buf.c 2013-01-18 03:33:06 +0000
+++ Percona-Server/storage/innodb_plugin/buf/buf0buf.c 2013-01-23 10:49:24 +0000
@@ -294,6 +294,9 @@
 UNIV_INTERN mutex_t free_list_mutex;
 UNIV_INTERN mutex_t zip_free_mutex;
 UNIV_INTERN mutex_t zip_hash_mutex;
+/** mutex protecting the compressed page relocation and flush list page
+accesses after the flush list mutex release */
+UNIV_INTERN mutex_t zip_dirty_flush_mutex;
 /** mutex protecting the control blocks of compressed-only pages
 (of type buf_page_t, not buf_block_t) */
 UNIV_INTERN mutex_t buf_pool_zip_mutex;
@@ -980,6 +983,7 @@
 mutex_create(&free_list_mutex, SYNC_BUF_FREE_LIST);
 mutex_create(&zip_free_mutex, SYNC_BUF_ZIP_FREE);
 mutex_create(&zip_hash_mutex, SYNC_BUF_ZIP_HASH);
+ mutex_create(&zip_dirty_flush_mutex, SYNC_BUF_ZIP_DIRTY_FLUSH);

 mutex_create(&buf_pool_zip_mutex, SYNC_BUF_BLOCK);

@@ -1931,6 +1935,7 @@
 switch (buf_block_get_state(block)) {
 buf_page_t* bpage;
 ibool success;
+ ibool is_zip_dirty;

 case BUF_BLOCK_FILE_PAGE:
 if (block_mutex == &buf_pool_zip_mutex) {
@@ -1943,6 +1948,8 @@
 case BUF_BLOCK_ZIP_PAGE:
 case BUF_BLOCK_ZIP_DIRTY:
 ut_ad(block_mutex == &buf_pool_zip_mutex);
+ is_zip_dirty
+ = (buf_block_get_state(block) == BUF_BLOCK_ZIP_DIRTY);
 bpage = &block->page;
 /* Protect bpage->buf_fix_count. */
 /* Already proteced here. */
@@ -1974,6 +1981,13 @@
 block_mutex = &block->mutex;

 //buf_pool_mutex_enter();
+ if (is_zip_dirty) {
+ /* We have a dirty compressed block for this page, but
+ no uncompressed block. As we will replace the block on
+ the flush list, block any flushes from acquiring this
+ block. */
+ mutex_enter(&zip_dirty_flush_mutex);
+ }
 mutex_enter(&LRU_list_mutex);
 rw_lock_x_lock(&page_hash_latch);
 mutex_enter(block_mutex);
@@ -1997,6 +2011,9 @@
 }
 rw_lock_x_unlock(&page_hash_latch);
 mutex_exit(&LRU_list_mutex);
+ if (is_zip_dirty) {
+ mutex_exit(&zip_dirty_flush_mutex);
+ }
 goto loop2;
 }
 }
@@ -2005,11 +2022,14 @@

 if (UNIV_UNLIKELY
 (bpage->buf_fix_count
- || buf_page_get_io_fix(bpage) != BUF_IO_NONE)) {
+ || buf_page_get_io_fix(bpage) != BUF_IO_NONE
+ || is_zip_dirty != (buf_page_get_state(bpage)
+ == BUF_BLOCK_ZIP_DIRTY))) {

 mutex_exit(&buf_pool_zip_mutex);
- /* The block was buffer-fixed or I/O-fixed
- while buf_pool_mutex was not held by this thread.
+ /* The block was buffer-fixed or I/O-fixed or changed
+ from zip clean to zip dirty or back while the mutexes
+ were not held by this thread.
 Free the block that was allocated and try again.
 This should be extremely unlikely. */

@@ -2018,6 +2038,9 @@

 rw_lock_x_unlock(&page_hash_latch);
 mutex_exit(&LRU_list_mutex);
+ if (is_zip_dirty) {
+ mutex_exit(&zip_dirty_flush_mutex);
+ }
 goto wait_until_unfixed;
 }

@@ -2060,6 +2083,9 @@
 buf_unzip_LRU_add_block(block, FALSE);

 mutex_exit(&LRU_list_mutex);
+ if (is_zip_dirty) {
+ mutex_exit(&zip_dirty_flush_mutex);
+ }

 block->page.buf_fix_count = 1;
 buf_block_set_io_fix(block, BUF_IO_READ);

=== modified file 'Percona-Server/storage/innodb_plugin/buf/buf0flu.c'
--- Percona-Server/storage/innodb_plugin/buf/buf0flu.c 2011-11-24 02:00:47 +0000
+++ Percona-Server/storage/innodb_plugin/buf/buf0flu.c 2013-01-23 10:49:24 +0000
@@ -400,7 +400,9 @@
 //ut_a(buf_page_in_file(bpage));
 //ut_ad(buf_pool_mutex_own()); /*optimistic...*/
 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
- ut_ad(flush_type == BUF_FLUSH_LRU || BUF_FLUSH_LIST);
+ ut_ad(flush_type == BUF_FLUSH_LRU
+ || (flush_type == BUF_FLUSH_LIST
+ && mutex_own(&zip_dirty_flush_mutex)));

 if (buf_page_in_file(bpage) && bpage->oldest_modification != 0
 && buf_page_get_io_fix(bpage) == BUF_IO_NONE) {
@@ -499,6 +501,8 @@
 buf_page_t* prev_b = NULL;

 //ut_ad(buf_pool_mutex_own());
+ ut_ad((buf_page_get_state(bpage) != BUF_BLOCK_ZIP_DIRTY)
+ || mutex_own(&zip_dirty_flush_mutex));
 ut_ad(mutex_own(&flush_list_mutex));

 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
@@ -1170,7 +1174,9 @@
 mutex_t* block_mutex;
 ibool is_uncompressed;

- ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
+ ut_ad(flush_type == BUF_FLUSH_LRU
+ || (flush_type == BUF_FLUSH_LIST
+ && mutex_own(&zip_dirty_flush_mutex)));
 //ut_ad(buf_pool_mutex_own());
 #ifdef UNIV_SYNC_DEBUG
 ut_ad(rw_lock_own(&page_hash_latch, RW_LOCK_EX)
@@ -1188,6 +1194,12 @@

 buf_page_set_io_fix(bpage, BUF_IO_WRITE);

+ /* I/O fix prevents the block relocation, thus OK to release
+ temporarily */
+ if (flush_type == BUF_FLUSH_LIST) {
+ mutex_exit(&zip_dirty_flush_mutex);
+ }
+
 buf_page_set_flush_type(bpage, flush_type);

 if (buf_pool->n_flush[flush_type] == 0) {
@@ -1273,6 +1285,10 @@
 }
 #endif /* UNIV_DEBUG */
 buf_flush_write_block_low(bpage);
+
+ if (flush_type == BUF_FLUSH_LIST) {
+ mutex_enter(&zip_dirty_flush_mutex);
+ }
 }

 /***********************************************************//**
@@ -1293,7 +1309,9 @@
 ulint count = 0;
 ulint i;

- ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
+ ut_ad(flush_type == BUF_FLUSH_LRU
+ || (flush_type == BUF_FLUSH_LIST
+ && mutex_own(&zip_dirty_flush_mutex)));

 if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN || !flush_neighbors) {
 /* If there is little space, it is better not to flush any
@@ -1426,6 +1444,9 @@

 if (flush_type == BUF_FLUSH_LRU) {
 mutex_enter(&LRU_list_mutex);
+ } else {
+ ut_ad(flush_type == BUF_FLUSH_LIST);
+ mutex_enter(&zip_dirty_flush_mutex);
 }

 for (;;) {
@@ -1451,6 +1472,9 @@
 prev_bpage = UT_LIST_GET_PREV(flush_list, bpage);
 }
 mutex_exit(&flush_list_mutex);
+ /* After the flush list mutex is released, the block is
+ protected from flush list reallocation by
+ zip_dirty_flush_mutex. */
 if (!bpage
 || bpage->oldest_modification >= lsn_limit) {
 /* We have flushed enough */
@@ -1528,6 +1552,8 @@

 if (flush_type == BUF_FLUSH_LRU) {
 mutex_exit(&LRU_list_mutex);
+ } else {
+ mutex_exit(&zip_dirty_flush_mutex);
 }

 mutex_enter(&buf_pool_mutex);

=== modified file 'Percona-Server/storage/innodb_plugin/include/buf0buf.h'
--- Percona-Server/storage/innodb_plugin/include/buf0buf.h 2012-10-16 04:49:59 +0000
+++ Percona-Server/storage/innodb_plugin/include/buf0buf.h 2013-01-23 10:49:24 +0000
@@ -1606,6 +1606,9 @@
 extern mutex_t free_list_mutex;
 extern mutex_t zip_free_mutex;
 extern mutex_t zip_hash_mutex;
+/** mutex protecting the compressed page relocation and flush list page
+accesses after the flush list mutex release */
+extern mutex_t zip_dirty_flush_mutex;
 /** mutex protecting the control blocks of compressed-only pages
 (of type buf_page_t, not buf_block_t) */
 extern mutex_t buf_pool_zip_mutex;

=== modified file 'Percona-Server/storage/innodb_plugin/include/sync0sync.h'
--- Percona-Server/storage/innodb_plugin/include/sync0sync.h 2011-11-24 16:33:30 +0000
+++ Percona-Server/storage/innodb_plugin/include/sync0sync.h 2013-01-23 10:49:24 +0000
@@ -487,6 +487,7 @@
 SYNC_SEARCH_SYS, as memory allocation
 can call routines there! Otherwise
 the level is SYNC_MEM_HASH. */
+#define SYNC_BUF_ZIP_DIRTY_FLUSH 158
 #define SYNC_BUF_LRU_LIST 157
 #define SYNC_BUF_PAGE_HASH 156
 #define SYNC_BUF_BLOCK 155

=== modified file 'Percona-Server/storage/innodb_plugin/sync/sync0sync.c'
--- Percona-Server/storage/innodb_plugin/sync/sync0sync.c 2011-11-24 16:33:30 +0000
+++ Percona-Server/storage/innodb_plugin/sync/sync0sync.c 2013-01-23 10:49:24 +0000
@@ -1170,6 +1170,7 @@
 case SYNC_TRX_SYS_HEADER:
 case SYNC_FILE_FORMAT_TAG:
 case SYNC_DOUBLEWRITE:
+ case SYNC_BUF_ZIP_DIRTY_FLUSH:
 case SYNC_BUF_LRU_LIST:
 case SYNC_BUF_FLUSH_LIST:
 case SYNC_BUF_PAGE_HASH:
