Merge lp:~laurynas-biveinis/percona-server/bp-split-5.6 into lp:percona-server/5.6

Proposed by Laurynas Biveinis
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 425
Proposed branch: lp:~laurynas-biveinis/percona-server/bp-split-5.6
Merge into: lp:percona-server/5.6
Diff against target: 4913 lines (+1001/-789)
22 files modified
Percona-Server/storage/innobase/btr/btr0cur.cc (+18/-6)
Percona-Server/storage/innobase/btr/btr0sea.cc (+5/-8)
Percona-Server/storage/innobase/buf/buf0buddy.cc (+46/-18)
Percona-Server/storage/innobase/buf/buf0buf.cc (+220/-196)
Percona-Server/storage/innobase/buf/buf0dblwr.cc (+1/-0)
Percona-Server/storage/innobase/buf/buf0dump.cc (+8/-8)
Percona-Server/storage/innobase/buf/buf0flu.cc (+153/-125)
Percona-Server/storage/innobase/buf/buf0lru.cc (+280/-177)
Percona-Server/storage/innobase/buf/buf0rea.cc (+41/-26)
Percona-Server/storage/innobase/fsp/fsp0fsp.cc (+1/-1)
Percona-Server/storage/innobase/handler/ha_innodb.cc (+12/-2)
Percona-Server/storage/innobase/handler/i_s.cc (+14/-9)
Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc (+1/-1)
Percona-Server/storage/innobase/include/buf0buddy.h (+4/-4)
Percona-Server/storage/innobase/include/buf0buddy.ic (+9/-10)
Percona-Server/storage/innobase/include/buf0buf.h (+91/-108)
Percona-Server/storage/innobase/include/buf0buf.ic (+63/-63)
Percona-Server/storage/innobase/include/buf0flu.h (+6/-7)
Percona-Server/storage/innobase/include/buf0flu.ic (+0/-2)
Percona-Server/storage/innobase/include/buf0lru.h (+9/-7)
Percona-Server/storage/innobase/include/sync0sync.h (+13/-4)
Percona-Server/storage/innobase/sync/sync0sync.cc (+6/-7)
To merge this branch: bzr merge lp:~laurynas-biveinis/percona-server/bp-split-5.6
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Approve
Laurynas Biveinis Pending
Review via email: mp+186711@code.launchpad.net

This proposal supersedes a proposal from 2013-09-09.

Description of the change

Repushed and resubmitted. The only change from the 3rd push is that the atomic ops have been split off to be handled later.

http://jenkins.percona.com/job/percona-server-5.6-param/259/

No BT or ST, but a 5.6 GA prerequisite.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

This is not a complete review; I'm only about 10% into the patch, but I'm posting the comments I have so far to parallelize things a bit:

  - the code in btr_blob_free() can be simplified: just initialize
    ‘freed’ with ‘false’, then assign to it the result of
    buf_LRU_free_page() whenever it is called, and then do this at the
    end:

    if (!freed) {
     mutex_exit(&buf_pool->LRU_list_mutex);
    }

    That would result in less hairy code.

  - wrong comments for buf_LRU_free_page(): s/block
    mutex/buf_page_get_mutex() mutex/

  - the comments for buf_LRU_free_page() say that both LRU_list_mutex
    and block_mutex may be released temporarily if ‘true’ is
    returned. But:
    1) even if ‘false’ is returned, block_mutex may also be released
       temporarily
    2) the comments don’t mention that if ‘true’ is returned,
       LRU_list_mutex is always released upon return, but block_mutex is
       always locked. And callers of buf_LRU_free_page() rely on that.

  - the following code in buf_LRU_free_page() is missing a
    buf_page_free_descriptor() call if b != NULL, which is a potential
    memory leak.

  if (!buf_LRU_block_remove_hashed(bpage, zip)) {
+
+ mutex_exit(&buf_pool->LRU_list_mutex);
+
+ mutex_enter(block_mutex);
+
   return(true);
  }

  - the patch removes buf_pool_mutex_enter_all() from
    btr_search_validate_one_table(), but then does a number of dirty
    reads from ‘block’ before it locks block->mutex. Any reasons to not
    lock block->mutex earlier?

  - the following checks for mutex != NULL in buf_buddy_relocate() seem
    to be redundant, since they are made after mutex_enter(mutex), so we
    are guaranteed mutex != NULL if we reach that code:

@@ -584,7 +604,11 @@ buf_buddy_relocate(

  mutex_enter(mutex);

- if (buf_page_can_relocate(bpage)) {
+ rw_lock_s_unlock(hash_lock);
+
+ mutex_enter(&buf_pool->zip_free_mutex);
+
+ if (mutex && buf_page_can_relocate(bpage)) {

and

- mutex_exit(mutex);
+ if (mutex)
+ mutex_exit(mutex);
+
+ ut_ad(mutex_own(&buf_pool->zip_free_mutex));

    and the last hunk is also missing braces (in case you decide to keep
    it).

  - asserting that zip_free_mutex is locked also looks redundant to me,
    because it is locked just a few lines above, and there’s nothing in
    the code path that could release it.

  - os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
    need that stuff. Their names are misleading as they don’t enforce
    any atomicity. They should be named os_ordered_load_ulint() /
    os_ordered_store_ulint(), but... what specific order are you trying
    to enforce with those constructs?

  - I don’t see a point in maintaining multiple list nodes in buf_page_t
    (i.e. ‘free’, ‘flush_list’ and ‘zip_list’). As I understand, each
    page may only be in a single list at any point in time, so splitting
    the list node is purely cosmetic.

    On the other hand, we are looking at a non-trivial buf_page_t size
    increase (112 bytes before the patch, 144 bytes after). Leaving all
    cache and memory locality questions aside, that’s 64 MB of memory
    just for list node pointers on a system with a 32 GB buffer pool....
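
    The arithmetic behind that estimate (a back-of-the-envelope
    sketch, assuming the default 16 KB page size):

      32 GB / 16 KB    = 2,097,152 buffer pool pages
      144 B - 112 B    = 32 B extra per buf_page_t
      2,097,152 * 32 B = 64 MB of additional descriptor memory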


review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> - the code in btr_blob_free() can be simplified

Simplified.

> - wrong comments for buf_LRU_free_page(): s/block
> mutex/buf_page_get_mutex() mutex/

Edited. But there are numerous other places in this patch (and upstream) that would need this editing too, and "block mutex" is already an established shorthand for "really a block mutex or buf_pool->zip_mutex". Not to mention the pointer-to-mutex variables named block_mutex.

Do you want me to edit the other places too?

> - the comments for buf_LRU_free_page() say that both LRU_list_mutex
> and block_mutex may be released temporarily if ‘true’ is
> returned. But:
> 1) even if ‘false’ is returned, block_mutex may also be released
> temporarily
> 2) the comments don’t mention that if ‘true’ is returned,
> LRU_list_mutex is always released upon return, but block_mutex is
> always locked. And callers of buf_LRU_free_page() rely on that.

Indeed callers rely on the current, arguably messy, buf_LRU_free_page() locking. This is how I edited the header comment for this and the previous review comment:

/******************************************************************//**
Try to free a block. If bpage is a descriptor of a compressed-only
page, the descriptor object will be freed as well.

NOTE: If this function returns true, it will release the LRU list mutex,
and temporarily release and relock the buf_page_get_mutex() mutex.
Furthermore, the page frame will no longer be accessible via bpage. If this
function returns false, the buf_page_get_mutex() might be temporarily
released and relocked too.

The caller must hold the LRU list and buf_page_get_mutex() mutexes.

@return true if freed, false otherwise. */
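
A minimal caller sketch matching that contract (illustrative only, assuming the buf_LRU_free_page(bpage, zip) signature; not code from the patch):

  mutex_enter(&buf_pool->LRU_list_mutex);
  mutex_enter(block_mutex);

  if (buf_LRU_free_page(bpage, true)) {
   /* The LRU list mutex has been released; block_mutex is
   held again, but the page frame is no longer accessible. */
   mutex_exit(block_mutex);
  } else {
   /* Both mutexes are still held (block_mutex may have been
   released and relocked in between). */
   mutex_exit(block_mutex);
   mutex_exit(&buf_pool->LRU_list_mutex);
  }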

> - the following code in buf_LRU_free_page() is missing a
> buf_page_free_descriptor() call if b != NULL. Which is a potential
> memory leak.

Fixed.

> - the patch removes buf_pool_mutex_enter_all() from
> btr_search_validate_one_table(), but then does a number of dirty
> reads from ‘block’ before it locks block->mutex. Any reasons to not
> lock block->mutex earlier?

I *think* there were actual reasons, but I cannot remember them, due to the number of things going on with this patch. And I don't see why locking block->mutex earlier would not be possible now. I will look further.

> - the following checks for mutex != NULL in buf_buddy_relocate() seem
> to be redundant, since they are made after mutex_enter(mutex), so we
> are guaranteed mutex != NULL if we reach that code:

Fixed. This looks like a missed cleanup after removing 5.5's buf_page_get_mutex_enter().

> - asserting that zip_free_mutex is locked also looks redundant to me,
> because it is locked just a few lines above, and there’s nothing in
> the code path that could release it.

Removed. Added a comment to the function header instead: "The caller must hold zip_free_mutex, and this function will release and lock it again."

> - os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
> need that stuff. Their names are misleading as they don’t enforce
> any atomicity.

The ops being load and store, their atom...


Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Wed, 11 Sep 2013 09:45:13 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>> - the code in btr_blob_free() can be simplified
>
> Simplified.
>
>> - wrong comments for buf_LRU_free_page(): s/block
>> mutex/buf_page_get_mutex() mutex/
>
> Edited. But there are numerous other places in this patch (and upstream) that would need this editing too, and "block mutex" is already an established shorthand for "really a block mutex or buf_pool->zip_mutex". Not to mention pointer to mutex variables named block_mutex.
>

I think editing existing comments is worth the efforts (and potential
extra maintenance cost in the future). I would be OK if this specific
comment was left intact too. It only caught my eye because the comment
was edited and I spent some time verifying it.

> Do you want me to edit the other places too?
>
>> - the comments for buf_LRU_free_page() say that both LRU_list_mutex
>> and block_mutex may be released temporarily if ‘true’ is
>> returned. But:
>> 1) even if ‘false’ is returned, block_mutex may also be released
>> temporarily
>> 2) the comments don’t mention that if ‘true’ is returned,
>> LRU_list_mutex is always released upon return, but block_mutex is
>> always locked. And callers of buf_LRU_free_page() rely on that.
>
> Indeed callers rely on the current, arguably messy, buf_LRU_free_page() locking. This is how I edited the header comment for this and the previous review comment:
>
> /******************************************************************//**
> Try to free a block. If bpage is a descriptor of a compressed-only
> page, the descriptor object will be freed as well.
>
> NOTE: If this function returns true, it will release the LRU list mutex,
> and temporarily release and relock the buf_page_get_mutex() mutex.
> Furthermore, the page frame will no longer be accessible via bpage. If this
> function returns false, the buf_page_get_mutex() might be temporarily
> released and relocked too.
>
> The caller must hold the LRU list and buf_page_get_mutex() mutexes.
>
> @return true if freed, false otherwise. */
>
>

Looks good.

>> - the patch removes buf_pool_mutex_enter_all() from
>> btr_search_validate_one_table(), but then does a number of dirty
>> reads from ‘block’ before it locks block->mutex. Any reasons to not
>> lock block->mutex earlier?
>
> I *think* there were actual reasons, but I cannot remember them, due to the number of things going on with this patch. And I don't see why locking block->mutex earlier would not be possible now. I will look further.
>

OK.

>> - os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
>> need that stuff. Their names are misleading as they don’t enforce
>> any atomicity.
>
> The ops being load and store, their atomicity is enforced by the data type width.
>

Right, the atomicity is enforced by the data type width on those
architectures that provide it. And even those that do provide it have a
number of prerequisites. Neither of those 2 facts is taken care of in
os_atomic_load_ulint() / os_atomic_store_ulint(). So they are not any
different with respect to atomi...

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

On Wed, 11 Sep 2013 16:06:21 +0400, Alexey Kopytov wrote:
> I think editing existing comments is worth the efforts (and potential
> extra maintenance cost in the future). I would be OK if this specific

Grr, that was supposed to be "NOT worth the efforts" of course.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

> > - the patch removes buf_pool_mutex_enter_all() from
> > btr_search_validate_one_table(), but then does a number of dirty
> > reads from ‘block’ before it locks block->mutex. Any reasons to not
> > lock block->mutex earlier?
>
> I *think* there were actual reasons, but I cannot remember them, due to the
> number of things going on with this patch. And I don't see why locking
> block->mutex earlier would not be possible now. I will look further.

A test run helped to recover my memories. So the problem is the buf_block_hash_get() call, which locks page_hash, which violates the locking order. Keeping the initial reads is OK, because 1) the block pointer will not be invalidated, because we are reading AHI-indexed pages only, which can only be BUF_BLOCK_FILE_PAGE or BUF_BLOCK_REMOVE_HASH (which is a kind of BUF_BLOCK_FILE_PAGE for our purposes), and 2) the block space and page_id cannot change either while we hold the corresponding AHI X latch. Thus the initial dirty reads are not actually dirty.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Wed, 11 Sep 2013 12:26:57 -0000, Laurynas Biveinis wrote:
>>> - the patch removes buf_pool_mutex_enter_all() from
>>> btr_search_validate_one_table(), but then does a number of dirty
>>> reads from ‘block’ before it locks block->mutex. Any reasons to not
>>> lock block->mutex earlier?
>>
>> I *think* there were actual reasons, but I cannot remember them, due to the
>> number of things going on with this patch. And I don't see why locking
>> block->mutex earlier would not be possible now. I will look further.
>
> A test run helped to recover my memories. So the problem is the buf_block_hash_get() call, which locks page_hash, which violates the locking order. Keeping the initial reads is OK, because 1) the block pointer will not be invalidated, because we are reading AHI-indexed pages only, which can only be BUF_BLOCK_FILE_PAGE or BUF_BLOCK_REMOVE_HASH (which is a kind of BUF_BLOCK_FILE_PAGE for our purposes), and 2) the block space and page_id cannot change either while we hold the corresponding AHI X latch. Thus the initial dirty reads are not actually dirty.
>

Right, makes sense.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

> >> - wrong comments for buf_LRU_free_page(): s/block
> >> mutex/buf_page_get_mutex() mutex/
> >
> > Edited. But there are numerous other places in this patch (and upstream)
> that would need this editing too, and "block mutex" is already an established
> shorthand for "really a block mutex or buf_pool->zip_mutex". Not to mention
> pointer to mutex variables named block_mutex.
> >
>
> I think editing existing comments is not worth the efforts (and potential
> extra maintenance cost in the future). I would be OK if this specific
> comment was left intact too. It only caught my eye because the comment
> was edited and I spent some time verifying it.

Fair enough. I applied the same edit to the other comments changed by this patch.

> >> - os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
> >> need that stuff. Their names are misleading as they don’t enforce
> >> any atomicity.
> >
> > The ops being load and store, their atomicity is enforced by the data type
> width.
> >
>
> Right, the atomicity is enforced by the data type width on those
> architectures that provide it.

I forgot to mention that they also must not be misaligned, so that one access does not translate into two accesses.

> And even those that do provide it have a
> number of prerequisites. Neither of those 2 facts is taken care of in
> os_atomic_load_ulint() / os_atomic_store_ulint(). So they are not any
> different with respect to atomicity as plain load/store of a ulint and
> thus, have misleading names.
>
> So to justify "atomic" in their names those functions should:
>
> - (if we want to be portable) protect those load/stores with a mutex

Why? I guess this question boils down to: what would the mutex implementation code additionally ensure here, let's say, on x86_64? Or is this referring to the 5.6 mutex fallbacks when no atomic ops are implemented for a platform?

> - (if we only care about x86/x86_64) make sure that values being
> loaded/stored do not cross cache lines or page boundaries. Which is of
> course impossible to guarantee in a generic function.

Why? We are talking about ulints here only, and I was not able to find such requirements in the x86_64 memory model descriptions. There is a requirement to be aligned, and misaligned stores/loads might indeed cross cache line or page boundaries, and anything that crosses them is indeed non-atomic. But alignment is possible to guarantee in a generic function (which doesn't even have to be generic: the x86_64 implementation is for x86_64 only, obviously).

Intel® 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, section 8.1.1, http://download.intel.com/products/processor/manual/253668.pdf:

"The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
always be carried out atomically:
(...)
• Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (...):
• Reading or writing a quadword aligned on a 64-bit boundary
"

My understanding of the above is that os_atomic_load_ulint()/os_atomic_store_ulint() fit the above description, modulo alignment ...

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Wed, 11 Sep 2013 15:29:33 -0000, Laurynas Biveinis wrote:
>>>> - os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
>>>> need that stuff. Their names are misleading as they don’t enforce
>>>> any atomicity.
>>>
>>> The ops being load and store, their atomicity is enforced by the data type
>> width.
>>>
>>
>> Right, the atomicity is enforced by the data type width on those
>> architectures that provide it.
>
>
> I forgot to mention that they also must not be misaligned, so that one access does not translate into two accesses.
>

Yes, but alignment does not guarantee atomicity, see below.

>
>> And even those that do provide it have a
>> number of prerequisites. Neither of those 2 facts is taken care of in
>> os_atomic_load_ulint() / os_atomic_store_ulint(). So they are not any
>> different with respect to atomicity as plain load/store of a ulint and
>> thus, have misleading names.
>>
>> So to justify "atomic" in their names those functions should:
>>
>> - (if we want to be portable) protect those load/stores with a mutex
>
>
> Why? I guess this question boils down to, what would the mutex implementation code additionally ensure here, let's say, on x86_64? Or is this referring to the 5.6 mutex fallbacks when no atomic ops are implemented for a platform?
>

A mutex is the only portable way to ensure atomicity. You can use atomic
primitives provided by specific architectures, but then you either limit
support to those architectures or, yes, provide a mutex fallback.

>
>> - (if we only care about x86/x86_64) make sure that values being
>> loaded/stored do not cross cache lines or page boundaries. Which is of
>> course impossible to guarantee in a generic function.
>
>
> Why? We are talking about ulints here only, and I was not able to find such requirements in the x86_64 memory model descriptions. There is a requirement to be aligned, and misaligned stores/loads might indeed cross cache line or page boundaries, and anything that crosses them is indeed non-atomic. But alignment is possible to guarantee in a generic function (which doesn't even have to be generic: the x86_64 implementation is for x86_64 only, obviously).
>
> Intel® 64 and IA-32 Architectures Software Developer's Manual
> Volume 3A: System Programming Guide, Part 1, section 8.1.1, http://download.intel.com/products/processor/manual/253668.pdf:
>
> "The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
> always be carried out atomically:
> (...)
> • Reading or writing a doubleword aligned on a 32-bit boundary
>
> The Pentium processor (...):
> • Reading or writing a quadword aligned on a 64-bit boundary
> "

Why didn't you quote it further?

"
Accesses to cacheable memory that are split across cache lines and page
boundaries are not guaranteed to be atomic by <all processors>. <all
processors> provide bus control signals that permit external memory
subsystems to make split accesses atomic;
"

Which means even aligned accesses are not guaranteed to be atomic and
it's up to the implementation of "external memory subsystems" (that
probably means chipsets, motherboards, NUMA archi...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> >> - (if we only care about x86/x86_64) make sure that values being
> >> loaded/stored do not cross cache lines or page boundaries. Which is of
> >> course impossible to guarantee in a generic function.
> >
> >
> > Why? We are talking about ulints here only, and I was not able to find such
> requirements in the x86_64 memory model descriptions. There is a requirement
> to be aligned, and misaligned stores/loads might indeed cross cache line or
> page boundaries, and anything that crosses them is indeed non-atomic. But
> alignment is possible to guarantee in a generic function (which doesn't even
> have to be generic: the x86_64 implementation is for x86_64 only, obviously).
> >
> > Intel® 64 and IA-32 Architectures Software Developer's Manual
> > Volume 3A: System Programming Guide, Part 1, section 8.1.1,
> http://download.intel.com/products/processor/manual/253668.pdf:
> >
> > "The Intel486 processor (and newer processors since) guarantees that the
> following basic memory operations will
> > always be carried out atomically:
> > (...)
> > • Reading or writing a doubleword aligned on a 32-bit boundary
> >
> > The Pentium processor (...):
> > • Reading or writing a quadword aligned on a 64-bit boundary
> > "
>
> Why didn't you quote it further?
>
> "
> Accesses to cacheable memory that are split across cache lines and page
> boundaries are not guaranteed to be atomic by <all processors>. <all
> processors> provide bus control signals that permit external memory
> subsystems to make split accesses atomic;
> "
>
> Which means even aligned accesses are not guaranteed to be atomic and
> it's up to the implementation of "external memory subsystems" (that
> probably means chipsets, motherboards, NUMA architectures and the like).

I didn't quote because we both have already acknowledged that cache line- or page boundary-crossing accesses are non-atomic, and because I don't see how it's relevant here. I don't see how a properly-aligned ulint can possibly cross a cache line boundary, when cache lines are 64-byte wide and 64-byte aligned. Or even 32 for older architectures.

> > My understanding of the above is that
> os_atomic_load_ulint()/os_atomic_store_ulint() fit the above description,
> modulo alignment issues, if any. These are easy to ensure by ut_ad().
> >
>
> Modulo alignment, cache line boundary and page boundary issues.

Alignment only unless my reasoning above is wrong.

> I don't see how ut_ad() is going to help here. So a buf_pool_stat_t
> structure happens to be allocated in memory so that n_pages_written
> happens to be misaligned, or cross a cache line or a page boundary. How
> exactly ut_ad() is going to ensure that never happens at runtime?

A debug build would hit this assert and we'd fix the structure layout/allocation. Unless I'm mistaken, to get a misaligned ulint, we'd have to ask for this explicitly, by packing a struct, fetching a pointer to it from a byte array, etc. Thus ut_ad() seems reasonable to me.
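
For concreteness, a minimal sketch of such a debug check (the exact form in the patch may differ):

  /* Assert the address is machine-word aligned, so that an
  aligned word-sized access cannot straddle a cache line or a
  page boundary. */
  ut_ad(((ulint) ptr) % sizeof(ulint) == 0);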

> >>>> They should be named os_ordered_load_ulint() /
> >>>> os_ordered_store_ulint(),
> >>>
> >>> That's an option, but I needed atomicity, visibility, and ordering, and
> >> chose atomic for ...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

buf_LRU_remove_all_pages() was ported incorrectly from 5.5, dropping the space id and I/O-fix re-checks after the block mutex acquisition. This has been caught by Roel as bug 1224282.

review: Needs Fixing
Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Thu, 12 Sep 2013 06:16:52 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>> - (if we only care about x86/x86_64) make sure that values being
>>>> loaded/stored do not cross cache lines or page boundaries. Which is of
>>>> course impossible to guarantee in a generic function.
>>>
>>>
>>> Why? We are talking about ulints here only, and I was not able to find such
>> requirements in the x86_64 memory model descriptions. There is a requirement
>> to be aligned, and misaligned stores/loads might indeed cross cache line or
>> page boundaries, and anything that crosses them is indeed non-atomic. But
>> alignment is possible to guarantee in a generic function (which doesn't even
>> have to be generic: the x86_64 implementation is for x86_64 only, obviously).
>>>
>>> Intel® 64 and IA-32 Architectures Software Developer's Manual
>>> Volume 3A: System Programming Guide, Part 1, section 8.1.1,
>> http://download.intel.com/products/processor/manual/253668.pdf:
>>>
>>> "The Intel486 processor (and newer processors since) guarantees that the
>> following basic memory operations will
>>> always be carried out atomically:
>>> (...)
>>> • Reading or writing a doubleword aligned on a 32-bit boundary
>>>
>>> The Pentium processor (...):
>>> • Reading or writing a quadword aligned on a 64-bit boundary
>>> "
>>
>> Why didn't you quote it further?
>>
>> "
>> Accesses to cacheable memory that are split across cache lines and page
>> boundaries are not guaranteed to be atomic by <all processors>. <all
>> processors> provide bus control signals that permit external memory
>> subsystems to make split accesses atomic;
>> "
>>
>> Which means even aligned accesses are not guaranteed to be atomic and
>> it's up to the implementation of "external memory subsystems" (that
>> probably means chipsets, motherboards, NUMA architectures and the like).
>
>
> I didn't quote because we both have already acknowledged that cache line- or page boundary-crossing accesses are non-atomic, and because I don't see how it's relevant here. I don't see how a properly-aligned ulint can possibly cross a cache line boundary, when cache lines are 64-byte wide and 64-byte aligned. Or even 32 for older architectures.
>

The array of buffer pool descriptors is allocated as follows:

 buf_pool_ptr = (buf_pool_t*) mem_zalloc(
  n_instances * sizeof *buf_pool_ptr);

so individual buf_pool_t instances are not guaranteed to have any
specific alignment, neither to cache line nor to page boundaries, right?

Now, the 'stat' member of buf_pool_t has an offset of 736 bytes into
buf_pool_t so nothing prevents it from crossing a cache line or a page
boundary?

Now, offsets of the buf_pool_stat_t members vary from 0 to 88. Again,
nothing prevents them from crossing a cache line or a page boundary, right?

>
>>> My understanding of the above is that
>> os_atomic_load_ulint()/os_atomic_store_ulint() fit the above description,
>> modulo alignment issues, if any. These are easy to ensure by ut_ad().
>>>
>>
>> Modulo alignment, cache line boundary and page boundary issues.
>
>
> Alignment only unless my reasoning above is wrong.
>

Yes.

>
>> I don't see how ut_ad() is going to help here. So a ...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Repushed the branch with fixes for the 1st partial review comments. Not a resubmission due to the partial review and ongoing discussions.

    Changes from the 1st MP:
    - Simplified btr_blob_free().
    - Added a note about mutexes to the header comment of
      buf_buddy_relocate().
    - Removed redundant mutex == NULL checks and mutex own assertions
      from buf_buddy_relocate().
    - Fixed locking notes in buf_LRU_free_page() header comment.
    - Removed a memory leak in one of the early exits in
      buf_LRU_free_page().
    - Clarified locking in a comment for buf_page_t::zip.
    - Added debug build checks to os_atomic_load_ulint() and
      os_atomic_store_ulint() x86_64 implementation that the accessed
      variable is properly aligned.
    - Re-added the buffer page space id and I/O fix 2nd checks,
      dropped by mistake, after buf_page_get_mutex() has been locked
      in buf_LRU_remove_all_pages().

Please ignore the "Added debug build checks to os_atomic_load_ulint() and os_atomic_store_ulint() x86_64 implementation that the accessed variable is properly aligned." bit for now.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

A Jenkins run of the latest branch turned up bug 1224432. Logged a separate bug because I am not sure I'll manage to debug it during this MP cycle.

"...
2013-09-12 12:15:38 7ff450082700 InnoDB: Assertion failure in thread 140687291459328 in file buf0buf.cc line 3694
InnoDB: Failing assertion: buf_fix_count > 0
...

Which appears to be a race condition on buf_fix_count on a page that is a sentinel for buffer pool watch. How exactly this can happen is not clear to me currently. All the watch sentinel buf_fix_count changes happen under zip_mutex and a corresponding page_hash X lock. Further, buf_page_init_for_read() asserts buf_fix_count > 0 through buf_pool_watch_is_sentinel() at line 3647, thus this should have changed between 3647 and 3694, but the hash is X-latched throughout, even though the zip mutex is only acquired at 3670."

review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> I don't see how a properly-aligned ulint can possibly
> cross a cache line boundary, when cache lines are 64-byte wide and 64-byte
> aligned. Or even 32 for older architectures.
> >
>
> The array of buffer pool descriptors is allocated as follows:

You are right, I failed to consider the base addresses returned by dynamic memory allocation. I also failed to notice your hint in that direction in one of the previous mails.

> buf_pool_ptr = (buf_pool_t*) mem_zalloc(
> n_instances * sizeof *buf_pool_ptr);
>
> so individual buf_pool_t instances are not guaranteed to have any
> specific alignment, neither to cache line nor to page boundaries, right?

Right.

> Now, the 'stat' member of buf_pool_t has the offset of 736 bytes into
> buf_pool_t so nothing prevents it from crossing a cache line or a page
> boundary?

Right.

> Now, offsets of the buf_pool_stat_t members vary from 0 to 88. Again,
> nothing prevents them from crossing a cache line or a page boundary, right?

Right, nothing prevents an object of buf_pool_stat_t from crossing it. But that's OK. We only need the individual fields not to cross it.

> >> I don't see how ut_ad() is going to help here. So a buf_pool_stat_t
> >> structure happens to be allocated in memory so that n_pages_written
> >> happens to be misaligned, or cross a cache line or a page boundary. How
> >> exactly ut_ad() is going to ensure that never happens at runtime?
> >
> >
> > A debug build would hit this assert and we'd fix the structure
> layout/allocation. Unless I'm mistaken, to get a misaligned ulint, we'd have
> to ask for this explicitly, by packing a struct, fetching a pointer to it from
> a byte array, etc. Thus ut_ad() seems reasonable to me.
> >
>
> The only thing you can assume about dynamically allocated objects is
> that their addresses (and thus, the first member of a structure, if an
> object is a structure) is aligned to machine word size. Which is always
> lower than the cache line size. There are no guarantees on alignment of
> other structure members, no matter what compiler hints were used (those
> only matter for statically allocated objects).

Right. So, to conclude... no individual ulint is going to cross a cache line or a page boundary and we are good? We start with a machine-word aligned address returned from heap, and add a multiple of machine-word width to arrive at the address of an individual field, which is machine-word aligned and thus the individual field cannot cross anything?
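
A sketch of that arithmetic, assuming sizeof(ulint) == 8 and 64-byte cache lines:

  /* base % 8 == 0 (heap guarantee), offset % 8 == 0 (all members
  are word-sized), so addr = base + offset is 8-byte aligned.
  The access spans addr .. addr + 7, and since 64 is a multiple
  of 8, addr / 64 == (addr + 7) / 64: the access stays within a
  single cache line, and therefore within a single page. */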

> >>>>>> They should be named os_ordered_load_ulint() /
> >>>>>> os_ordered_store_ulint(),
> >>>>>
> >>>>> That's an option, but I needed atomicity, visibility, and ordering, and
> >>>> chose atomic for function name to match the existing CAS and atomic add
> >>>> operations, which also need all three.
> >>>>>
> >>>>
> >>>> I'm not sure you need all of those 3 in every occurrence of those
> >>>> functions, but see below.
> >>>
> >>>
> >>> That's right. And ordering probably is not needed anywhere, sorry about
> >> that, my understanding of atomics is far from fluent. But visibility
> should
> >> be needed in every occurrence if this is ever ported to a...

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

You are right that os_atomic_{load,store}_ulint() will work as the
name suggests (i.e. be atomic) as long as:

1. it is used to access machine word sized members of structures
2. we are on x86/x86_64

However, the patch implements them as generic primitives that do
nothing to enforce those restrictions, and that's why their names are
misleading. This is where this discussion started, and it is a
design flaw. I don't see any arguments in this discussion that would
dispel those concerns. You also acknowledged them in one of the comments.

In contrast, other atomic primitives in the existing code keep their
promise of being atomic, i.e. do not impose any implicit requirements.
But they also have mutex-guarded fallback implementations for those
architectures that do not provide atomics.

I also agree that this discussion may be endless and time is precious.
So I think we should implement whatever we both agree does work. That
is: instead of implementing generic atomic primitives that are only
atomic under implicit requirements that are not enforceable at compile
time, the patch must either use separate mutex(es) to protect them, or
use true atomic primitives provided by the existing code if they are
available on the target architecture and fall back to mutex-guarded
access. The latter is how it is implemented in the rest of InnoDB.

Thanks.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> Hi Laurynas,
>
> You are right that os_atomic_{load,store}_ulint() will work as the
> name suggests (i.e. be atomic) as long as:
>
> 1. it is used to access machine word sized members of structures
> 2. we are on x86/x86_64

Right.

> However, the patch implements them as generic primitives that do
> nothing to enforce those restrictions, and that's why their names are
> misleading.

1. is enforced through "ulint" in the name and args. ulint is commented in univ.i as "unsigned long integer which should be equal to the word size of the machine".
2. is enforced by platform #ifdefs not providing any other implementation except one for x86/x86_64 with GCC or a GCC-like compiler.

Thus I provide generic primitives, whose current implementations will work as designed. However, the 1. above also seems to be missing "properly-aligned", and that's where the design is debatable. On one hand it is possible to implement misaligned access atomically by LOCK MOV, and document that the primitives may be used with args of any alignment. But a better alternative to me seems to be to accept that misaligned accesses are bugs and document/allow aligned accesses only. That's enforceable in debug builds only, so it's not ideally perfect, but IMHO acceptable.
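
For illustration, a minimal sketch of what such an #ifdef-guarded x86_64 implementation might look like (the exact names, UNIV_INLINE usage and the compiler barrier are assumptions, not the actual patch):

  #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  /* Sketch: an aligned word-sized load is atomic on x86/x86_64;
  the volatile access plus compiler barrier keeps the compiler
  from caching or reordering the read. */
  UNIV_INLINE
  ulint
  os_atomic_load_ulint(const volatile ulint* ptr)
  {
   ulint val;

   ut_ad(((ulint) ptr) % sizeof(ulint) == 0);

   val = *ptr;
   __asm__ __volatile__ ("" ::: "memory");
   return(val);
  }
  #endif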

> This is where this discussion has started, and it is a
> design flaw. I don't see any arguments in this discussion that would
> dispel those concerns. You also acknowledged them in one of the comments.

Addressed above.

> I also agree that this discussion may be endless and time is precious.

But it also cannot end prematurely.

> So I think we should implement whatever we both agree does work.

I suggest that the above either works already or requires some improvements re. alignment only.

> That
> is: instead of implementing generic atomic primitives that are only
> atomic under implicit requirements that are not enforceable at compile
> time, it must either use separate mutex(es) to protect them, or use true
> atomic primitives provided by the existing code if they are available on
> the target architecture

If you either show that how I address 1. and 2. above is incorrect, or show that the alignment issue is major and insurmountable, then I'll implement load as an atomic add of zero that returns the value, and store as a dirty read + CAS in a loop, using the existing primitives.
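
A sketch of that fallback using the existing 5.6-style primitives (the names os_atomic_increment_ulint() and os_compare_and_swap_ulint() are assumed here):

  /* Load: atomically add zero; the value returned is the
  current value, read atomically. */
  ulint
  os_atomic_load_ulint(volatile ulint* ptr)
  {
   return(os_atomic_increment_ulint(ptr, 0));
  }

  /* Store: dirty-read the old value, then CAS until the swap
  succeeds. */
  void
  os_atomic_store_ulint(volatile ulint* ptr, ulint new_val)
  {
   ulint old_val;

   do {
    old_val = *ptr;
   } while (!os_compare_and_swap_ulint(ptr, old_val, new_val));
  }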

Thanks,
Laurynas

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Fri, 13 Sep 2013 07:40:10 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>> Hi Laurynas,
>>
>> You are right that os_atomic_{load,store}_ulint() will work as the
>> name suggests (i.e. be atomic) as long as:
>>
>> 1. it is used to access machine word sized members of structures
>> 2. we are on x86/x86_64
>
>
> Right.
>
>
>> However, the patch implements them as generic primitives that do
>> nothing to enforce those restrictions, and that's why their names are
>> misleading.
>
>
> 1. is enforced through "ulint" in the name and args. ulint is commented in univ.i as "unsigned long integer which should be equal to the word size of the machine".

It is not enforced, because nothing prevents me from passing a
misaligned address to those functions and expecting them to be atomic
as the name implies.

For example, os_atomic_inc_ulint() is guaranteed to be atomic for any
arguments on any platform. But os_atomic_load_ulint() is not. That is
the problem.

> 2. is enforced by platform #ifdefs not providing any other implementation except one for x86/x86_64 with GCC or a GCC-like compiler.
>

That's correct. I only mentioned #2 for completeness.

> Thus I provide generic primitives, whose current implementations will work as designed. However the 1. above also seems to be missing "properly-aligned" and that's where the design is debatable. On one hand it is possible to implement misaligned access atomically by LOCK MOV, and document that the primitives may be used with args of any alignment. But a better alternative to me seems to accept that misaligned accesses are bugs and document/allow aligned accesses only. Even though that's enforceable in debug builds only, so that's not ideally perfect, but IMHO acceptable.
>

You don't.

>
> If you either show that how I address 1. and 2. above is incorrect, either show that the alignment issue is major and unsurmountable, then I'll implement load as inc by zero, return old value, and store as dirty read + CAS in a loop using the existing primitives.
>

Yes, please do.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> >> You are right that os_atomic_{load,store}_ulint() will work as the
> >> name suggests (i.e. be atomic) as long as:
> >>
> >> 1. it is used to access machine word sized members of structures
> >> 2. we are on x86/x86_64
> >
> >
> > Right.
> >
> >
> >> However, the patch implements them as generic primitives that do
> >> nothing to enforce those restrictions, and that's why their names are
> >> misleading.
> >
> >
> > 1. is enforced through "ulint" in the name and args. ulint is commented in
> univ.i as "unsigned long integer which should be equal to the word size of the
> machine".
>
> It is not enforced, because nothing prevents me from passing a
> misaligned address to those functions and expect them to be atomic as
> the name implies.

This is exactly what I discussed below.

> For example, os_atomic_inc_ulint() is guaranteed to be atomic for any
> arguments on any platform. But os_atomic_load_ulint() is not. That is
> the problem.

Right. os_atomic_load_ulint() has additional restrictions on its arg over os_atomic_inc_ulint(). I believe that these restrictions are reasonable. It is a performance bug in any case to perform misaligned atomic ops, even with those ops that make it technically possible. I have added ut_ad()s to catch this. I can rename the os_atomic_ prefix to os_atomic_aligned_ too, although that looks like overkill to me.

> > 2. is enforced by platform #ifdefs not providing any other implementation
> except one for x86/x86_64 with GCC or a GCC-like compiler.
> >
>
> That's correct. I only mentioned #2 for completeness.

OK, but I am not sure what #2 completes then.

> > Thus I provide generic primitives, whose current implementations will work
> as designed. However the 1. above also seems to be missing "properly-aligned"
> and that's where the design is debatable. On one hand it is possible to
> implement misaligned access atomically by LOCK MOV, and document that the
> primitives may be used with args of any alignment. But a better alternative
> to me seems to accept that misaligned accesses are bugs and document/allow
> aligned accesses only. Even though that's enforceable in debug builds only,
> so that's not ideally perfect, but IMHO acceptable.
> >
>
> You don't.

Will you reply to the rest of that paragraph too, please? I am acknowledging that alignment is an issue, so let's see how to resolve it.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Wed, 11 Sep 2013 15:29:33 -0000, Laurynas Biveinis wrote:
>>>> - the following hunk simply removes reference to buf_pool->mutex. As I
>>>> understand, it should just replace buf_pool->mutex with
>> zip_free_mutex?
>>>> + page_zip_des_t zip; /*!< compressed page; state
>>>> + == BUF_BLOCK_ZIP_PAGE and zip.data
>>>> + == NULL means an active
>>>
>>> Hm, it looked to me that it's protected not with zip_free_mutex but with
>> zip_mutex in its page mutex capacity. I will check.
>>>
>>
>> There was a place in the code that asserted zip_free_mutex locked when
>> bpage->zip.data is modified. But I'm not sure if that is correct.
>
>
> I have checked, I believe it's indeed zip_mutex. Re. zip_free_mutex, you must be referring to this bit in buf_buddy_relocate():
>
> mutex_enter(&buf_pool->zip_free_mutex);
>
> if (buf_page_can_relocate(bpage)) {
> ...
> bpage->zip.data = (page_zip_t*) dst;
> mutex_exit(mutex);
>
> buf_buddy_stat_t* buddy_stat = &buf_pool->buddy_stat[i];
> buddy_stat->relocated++;
> buddy_stat->relocated_usec += ut_time_us(NULL) - usec;
> ...
>
> Here zip_free_mutex happens to protect buddy_stat, and pushing it down into the if clause would require an else clause that locks the same mutex.
>

No, I was referring to buf_pool_contains_zip(). It traverses the buffer
pool and examines (but does not modify) block->page.zip.data for each
block. However, the patch changes the assertion in
buf_pool_contains_zip() to make sure that zip_free_mutex is locked,
rather than zip_mutex. In fact, in one of the code paths calling
buf_pool_contains_zip() we assert that zip_mutex is NOT locked. Don't we
have a bug here?

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

On Fri, 13 Sep 2013 11:10:36 -0000, Laurynas Biveinis wrote:
>
>
> Right. os_atomic_load_ulint() has additional restrictions on its arg over os_atomic_inc_ulint(). I believe that these restrictions are reasonable. It is a performance bug in any case to perform misaligned atomic ops even with those ops that make it technically possible. I have added ut_ad()s to catch this. I can rename os_atomic_ prefix to os_atomic_aligned_ prefix too, although that one looks like an overkill to me.
>

The same restrictions would apply even if os_atomic_load_ulint() didn't
exist, right? I.e. the same restrictions would apply if we simply
accessed those variables without any helper functions?

Let me ask you a few simple questions and this time around I demand
"yes/no" answers.

- Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
not do what they promise to do?

- Do you agree that naming them os_ordered_load_ulint() /
os_ordered_store_ulint() would better reflect what they do?

- Do you agree that naming them that way also makes it obvious that
using them in most places is simply unnecessary (e.g. in
buf_get_total_stat(), buf_mark_space_corrupt(), buf_print_instance(),
buf_get_n_pending_read_ios(), etc.)?

>
>>> 2. is enforced by platform #ifdefs not providing any other implementation
>> except one for x86/x86_64 with GCC or a GCC-like compiler.
>>>
>>
>> That's correct. I only mentioned #2 for completeness.
>
>
> OK, but I am not sure what does the #2 complete then.
>
>
>>> Thus I provide generic primitives, whose current implementations will work
>> as designed. However the 1. above also seems to be missing "properly-aligned"
>> and that's where the design is debatable. On one hand it is possible to
>> implement misaligned access atomically by LOCK MOV, and document that the
>> primitives may be used with args of any alignment. But a better alternative
>> to me seems to accept that misaligned accesses are bugs and document/allow
>> aligned accesses only. Even though that's enforceable in debug builds only,
>> so that's not ideally perfect, but IMHO acceptable.
>>>
>>
>> You don't.
>
>
> Will you reply to the rest of that paragraph too please? I am acknowledging that alignment is an issue, so let's see how to resolve it.
>

I don't think enforcing requirements in debug builds only is acceptable.
It must be a compile-time assertion, not a run-time one.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

> > Right. os_atomic_load_ulint() has additional restrictions on its arg over
> os_atomic_inc_ulint(). I believe that these restrictions are reasonable. It
> is a performance bug in any case to perform misaligned atomic ops even with
> those ops that make it technically possible. I have added ut_ad()s to catch
> this. I can rename os_atomic_ prefix to os_atomic_aligned_ prefix too,
> although that one looks like an overkill to me.
> >
>
> The same restrictions would apply even if os_atomic_load_ulint() didn't
> exist, right? I.e. the same restrictions would apply if we simply
> accessed those variables without any helper functions?

What would be the desired access semantics in that case? If "anything goes", then no restrictions would apply.

> Let me ask you a few simple questions and this time around I demand
> "yes/no" answers.
>
> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
> not do what they promise to do?

Yes.

> - Do you agree that naming them os_ordered_load_ulint() /
> os_ordered_store_ulint() would better reflect what they do?

No.

> - Do you agree that naming them that way also makes it obvious that
> using them in most places is simply unnecessary (e.g. in
> buf_get_total_stat(), buf_mark_space_corrupt(), buf_print_instance(),
> buf_get_n_pending_read_ios(), etc.)?

No.

The first answer would be No if not for the alignment issue.

> >>> Thus I provide generic primitives, whose current implementations will work
> >> as designed. However the 1. above also seems to be missing "properly-
> aligned"
> >> and that's where the design is debatable. On one hand it is possible to
> >> implement misaligned access atomically by LOCK MOV, and document that the
> >> primitives may be used with args of any alignment. But a better
> alternative
> >> to me seems to accept that misaligned accesses are bugs and document/allow
> >> aligned accesses only. Even though that's enforceable in debug builds
> only,
> >> so that's not ideally perfect, but IMHO acceptable.
> >>>
> >>
> >> You don't.
> >
> >
> > Will you reply to the rest of that paragraph too please? I am acknowledging
> that alignment is an issue, so let's see how to resolve it.
> >
>
> I don't think enforcing requirements in debug builds only is acceptable.
> It must be a compile-time assertion, not a run-time one.

And as we both know, this is not enforceable at compile time. I think that requesting extra protections on top of the already provided ones, when the only way to get a misaligned ulint is to ask for one explicitly, is overkill. But that's my hand-waving against your hand-waving. Thus let's say that yes, "the alignment issue is major and insurmountable", and I will proceed to do what was offered previously: implement load as an atomic add of zero that returns the value, and store as a dirty read and CAS until success. The reason I didn't like these implementations is that they are pessimized. But that's OK.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

On Fri, 13 Sep 2013 12:54:44 -0000, Laurynas Biveinis wrote:
>>> Right. os_atomic_load_ulint() has additional restrictions on its arg over
>> os_atomic_inc_ulint(). I believe that these restrictions are reasonable. It
>> is a performance bug in any case to perform misaligned atomic ops even with
>> those ops that make it technically possible. I have added ut_ad()s to catch
>> this. I can rename os_atomic_ prefix to os_atomic_aligned_ prefix too,
>> although that one looks like an overkill to me.
>>>
>>
>> The same restrictions would apply even if os_atomic_load_ulint() didn't
>> exist, right? I.e. the same restrictions would apply if we simply
>> accessed those variables without any helper functions?
>
>
> What would be the desired access semantics in that case? If "anything goes", then no restrictions would apply.
>
>
>> Let me ask you a few simple questions and this time around I demand
>> "yes/no" answers.
>>
>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
>> not do what they promise to do?
>
>
> Yes.

OK, so we agree that naming is unfortunate.

>
>
>> - Do you agree that naming them os_ordered_load_ulint() /
>> os_ordered_store_ulint() would better reflect what they do?
>
>
> No.

What would be a better naming then?

>
>
>> - Do you agree that naming them that way also makes it obvious that
>> using them in most places is simply unnecessary (e.g. in
>> buf_get_total_stat(), buf_mark_space_corrupt(), buf_print_instance(),
>> buf_get_n_pending_read_ios(), etc.)?
>
>
> No.

OK, then 2 followup questions:

1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
example? Here's an example of a valid answer: "Because that will result
in incorrect values being used in case ...". And some examples of
invalid answers: "non-cache-coherent architectures, visibility, memory
model, sunspots, crop circles, global warming, ...".

2. Why are only 2 out of 9 values being loaded with
os_atomic_load_ulint() in buf_get_total_stat()? Why don't the remaining
ones need it?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> >> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
> >> not do what they promise to do?
> >
> >
> > Yes.
>
> OK, so we agree that naming is unfortunate.

Vanishingly slightly so, due to the alignment issue, which I believe is mostly theoretical but am nevertheless ready to address now.

> >> - Do you agree that naming them os_ordered_load_ulint() /
> >> os_ordered_store_ulint() would better reflect what they do?
> >
> >
> > No.
>
> What would be a better naming then?

os_atomic_load_aligned_ulint().

> OK, then 2 followup questions:
>
> 1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
> example? Here's an example of a valid answer: "Because that will result
> in incorrect values being used in case ...". And some examples of
> invalid answers: "non-cache-coherent architectures, visibility, memory
> model, sunspots, crop circles, global warming, ...".

We have gone through this with a disassembly example already, haven't we? We need os_atomic_load_ulint() because we don't want a dirty read. We may well decide that a dirty read there is fine and then replace it. But that's orthogonal to what this primitive is and why it exists.

Are you also objecting to mutex protection here? If not, why? Note that the three n_flush values here are completely independent.

  mutex_enter(&buf_pool->flush_state_mutex);

  pending_io += buf_pool->n_flush[BUF_FLUSH_LRU];
  pending_io += buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE];
  pending_io += buf_pool->n_flush[BUF_FLUSH_LIST];

  mutex_exit(&buf_pool->flush_state_mutex);

And please don't group some of the valid answers with loony stuff.

> 2. Why are only 2 out of 9 values being loaded with
> os_atomic_load_ulint() in buf_get_total_stat()? Why don't the remaining
> ones need it?

Upstream reads all 9 dirtily. I replaced two of them to be clean instead. Maybe I need to replace all 9. Maybe 0. But that's again orthogonal.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> No, I was referring to buf_pool_contains_zip(). It traverses the buffer
> pool and examines (but not modifies) block->page.zip.data for each
> block. However, the patch changes the assertion in
> buf_pool_contains_zip() to make sure that zip_free_mutex is locked,
> rather than zip_mutex.

Thanks for the pointer, I have reviewed buf_pool_contains_zip() further. Here, as you say, we iterate through the buffer pool uncompressed pages trying to find the uncompressed page of a given compressed page, in order to assert that no uncompressed page points to it; the compressed page has either just been taken from the buddy allocator or just been removed from the buffer pool. In both cases it's an invalid pointer value, and it seems to me that we could iterate through the buffer pool without any locking at all. I have tried to think of possible race conditions in the case of no locking, i.e. whether the pointer could become valid again somehow, by some uncompressed page starting to point to this zip page. And the transition of the given zip page pointer from unallocated to allocated is protected by zip_free_mutex. Thus, as long as all its callers hold zip_free_mutex and it's only called with invalid pointer values to assert that FALSE is returned, buf_pool_contains_zip() itself does not need any locking. If my reasoning is correct, I can remove the assert from this function. But maybe this should be documented better somehow?

> In fact, in one of the code paths calling
> buf_pool_contains_zip() we assert that zip_mutex is NOT locked. Don't we
> have a bug here?

buf_buddy_free_low(), right? This function is called for a compressed page that has already been removed from the buffer pool, that is, no pointer to it should be live in any other thread (nor in the buffer pool, as asserted by !buf_pool_contains_zip()). Thus it's OK not to hold zip_mutex.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Fri, 13 Sep 2013 14:21:58 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>>>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
>>>> not do what they promise to do?
>>>
>>>
>>> Yes.
>>
>> OK, so we agree that naming is unfortunate.
>
>
> Vanishingly slightly so, due to the alignment issue, which I believe is mostly theoretical, but nevertheless am ready to address now.
>

It doesn't do anything to enforce atomicity, does it? I.e. the following
implementation would be equally "atomic":

ulint
os_atomic_load_ulint(ulint *ptr)
{
 printf("Hello, world!\n");
 return(*ptr);
}

>
>>>> - Do you agree that naming them os_ordered_load_ulint() /
>>>> os_ordered_store_ulint() would better reflect what they do?
>>>
>>>
>>> No.
>>
>> What would be a better naming then?
>
>
> os_atomic_load_aligned_ulint().
>

No, it doesn't do anything to enforce atomicity. That is a caller's
responsibility.

>
>> OK, then 2 followup questions:
>>
>> 1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
>> example? Here's an example of a valid answer: "Because that will result
>> in incorrect values being used in case ...". And some examples of
>> invalid answers: "non-cache-coherent architectures, visibility, memory
>> model, sunspots, crop circles, global warming, ...".
>
>
> We have gone through this with a disassembly example already, haven't we? We need os_atomic_load_ulint() because we don't want a dirty read. We may well decide that a dirty read there is fine and then replace it. But that's orthogonal to what this primitive is and why it exists.
>

We need an atomic read rather than os_atomic_load_ulint() for the above
reasons. And it will be atomic without using any helper functions. Since
I see no answer in the form "Because that will result in incorrect
values being used in case ...", I assume you don't have an answer to
that question.

> Are you also objecting to mutex protection here? If not, why? Note that the three n_flush values here are completely independent.
>
> mutex_enter(&buf_pool->flush_state_mutex);
>
> pending_io += buf_pool->n_flush[BUF_FLUSH_LRU];
> pending_io += buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE];
> pending_io += buf_pool->n_flush[BUF_FLUSH_LIST];
>
> mutex_exit(&buf_pool->flush_state_mutex);
>

I'm not objecting to mutex protection in that code. Why would I?

> And please don't group some of the valid answers with looney stuff.
>
>
>> 2. Why are only 2 out of 9 values being loaded with
>> os_atomic_load_ulint() in buf_get_total_stat()? Why don't the remaining
>> ones need them?
>
>
> Upstream reads all 9 dirtily. I replaced two of them to be clean instead. Maybe I need to replace all 9. Maybe 0. But that's again orthogonal.
>

All 9 reads are atomic. But 7 of them don't use compiler barriers
because they don't need them. Neither do the remaining 2, but you are
quite creative in avoiding this simple fact.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

On Sun, 15 Sep 2013 07:47:59 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>> No, I was referring to buf_pool_contains_zip(). It traverses the buffer
>> pool and examines (but not modifies) block->page.zip.data for each
>> block. However, the patch changes the assertion in
>> buf_pool_contains_zip() to make sure that zip_free_mutex is locked,
>> rather than zip_mutex.
>
>
> Thanks for the pointer, I have reviewed buf_pool_contains_zip() further. Here, as you say, we iterate through the buffer pool's uncompressed pages trying to find the uncompressed page of a given compressed page, in order to assert that no uncompressed page points to it; the compressed page has either just been taken from the buddy allocator or just been removed from the buffer pool. In both cases it's an invalid pointer value, and it seems to me that we could iterate through the buffer pool without any locking at all. I have tried to think of possible race conditions in the case of no locking, i.e. whether the pointer can become valid again somehow, by some uncompressed page starting to point to this zip page. The transition of the given zip page pointer from unallocated to allocated is protected by zip_free_mutex. Thus, as long as all its callers hold zip_free_mutex and it's only called with invalid pointer values to assert that FALSE is returned, buf_pool_contains_zip() itself does not need any locking. If my reasoning is correct, I can remove the assert from this function. But maybe this should be documented better somehow?
>

Looks correct to me. Let's remove the zip_free_mutex assertion then?

>
>> In fact, in one of the code paths calling
>> buf_pool_contains_zip() we assert that zip_mutex is NOT locked. Don't we
>> have a bug here?
>
>
> buf_buddy_free_low(), right? This function is called for a compressed page that has already been removed from the buffer pool, that is, no pointer to it should be live in any other thread (nor in the buffer pool, as asserted by !buf_pool_contains_zip()). Thus it's OK not to hold zip_mutex.
>

OK, thanks for clarification.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> > I have reviewed buf_pool_contains_zip() further.
...
> Looks correct to me. Let's remove the zip_free_mutex assertion then?

Yes.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> >>>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
> >>>> not do what they promise to do?
> >>>
> >>>
> >>> Yes.
> >>
> >> OK, so we agree that naming is unfortunate.
> >
> >
> > Vanishingly slightly so, due to the alignment issue, which I believe is
> mostly theoretical, but nevertheless am ready to address now.
> >
>
> It doesn't do anything to enforce atomicity, does it? I.e. the following
> implementation would be equally "atomic":
>
> ulint
> os_atomic_load_ulint(ulint *ptr)
> {
> printf("Hello, world!\n");
> return(*ptr);
> }

Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64 as long as the pointer is aligned.

> >>>> - Do you agree that naming them os_ordered_load_ulint() /
> >>>> os_ordered_store_ulint() would better reflect what they do?
> >>>
> >>>
> >>> No.
> >>
> >> What would be a better naming then?
> >
> >
> > os_atomic_load_aligned_ulint().
> >
>
> No, it doesn't do anything to enforce atomicity. That is a caller's
> responsibility.

As in, don't pass misaligned values? In that case, yes, it is the caller's responsibility not to pass misaligned values. But where would InnoDB get a misaligned pointer to ulint that we'd wish to access atomically? Hence enforcing alignment in debug builds only seemed like a reasonable compromise, but OK, that's debatable.

> >> 1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
> >> example? Here's an example of a valid answer: "Because that will result
> >> in incorrect values being used in case ...". And some examples of
> >> invalid answers: "non-cache-coherent architectures, visibility, memory
> >> model, sunspots, crop circles, global warming, ...".
> >
> >
> > We have gone through this with a disassembly example already, haven't we?
> We need os_atomic_load_ulint() because we don't want a dirty read. We may
> well decide that a dirty read there is fine and then replace it. But that's
> orthogonal to what this primitive is and why it exists.
> >
>
> We need an atomic read rather than os_atomic_load_ulint() for the above
> reasons. And it will be atomic without using any helper functions.

OK, so is the problem that I wanted to introduce the primitives for such access, which would also document how the variable is accessed, and that they don't have to do much besides a compiler barrier on x86_64?

> Since
> I see no answer in the form "Because that will result in incorrect
> values being used in case ...", I assume you don't have an answer to
> that question.

I know that they are not resulting in incorrect values currently, and that the worst that can happen on x86_64 with most possible future code changes is that the value loads could be moved earlier, resulting in more out-of-date values being used. That, and the fact that accessing the variable through the primitive serves as self-documentation, seem like good enough reasons to me.
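
To make "the value loads could be moved earlier" concrete, here is a made-up, self-contained example (not code from the patch): without a compiler barrier the compiler may legally load a shared variable once and keep reusing the stale register copy, for example by hoisting the load out of a loop; the barrier forces a fresh load on every access and emits no CPU fence.

  typedef unsigned long ulint;

  #define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

  ulint	n_flush_pending;	/* imagine another thread decrements this */

  void
  wait_for_flushes(void)
  {
  	/* Without the barrier this loop may be compiled into a spin
  	on a single cached value of n_flush_pending; with it, every
  	iteration re-reads memory. */
  	while (n_flush_pending > 0) {
  		COMPILER_BARRIER();
  	}
  }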

> > Are you also objecting to mutex protection here? If not, why? Note that the
> three n_flush values here are completely independent.
> >
> > mutex_enter(&buf_pool->flush_state_mutex);
> >
> > pending_io += buf_pool->n_flush[BUF_FLUSH_...


Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

On Mon, 16 Sep 2013 09:05:39 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>>>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
>>>>>> not do what they promise to do?
>>>>>
>>>>>
>>>>> Yes.
>>>>
>>>> OK, so we agree that naming is unfortunate.
>>>
>>>
>>> Vanishingly slightly so, due to the alignment issue, which I believe is
>> mostly theoretical, but nevertheless am ready to address now.
>>>
>>
>> It doesn't do anything to enforce atomicity, does it? I.e. the following
>> implementation would be equally "atomic":
>>
>> ulint
>> os_atomic_load_ulint(ulint *ptr)
>> {
>> printf("Hello, world!\n");
>> return(*ptr);
>> }
>
>
> Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64 as long as the pointer is aligned.
>

So... we don't need it after all?

>
>>>>>> - Do you agree that naming them os_ordered_load_ulint() /
>>>>>> os_ordered_store_ulint() would better reflect what they do?
>>>>>
>>>>>
>>>>> No.
>>>>
>>>> What would be a better naming then?
>>>
>>>
>>> os_atomic_load_aligned_ulint().
>>>
>>
>> No, it doesn't do anything to enforce atomicity. That is a caller's
>> responsibility.
>
>
> As in, don't pass misaligned values? In that case, yes, it is a caller's responsibility not to pass misaligned values. But where would InnoDB get a misaligned pointer to ulint from that we'd wish to access atomically? Hence enforcing alignment on debug build only seemed like a reasonable compromise, but OK, that's debatable.
>
>
>>>> 1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
>>>> example? Here's an example of a valid answer: "Because that will result
>>>> in incorrect values being used in case ...". And some examples of
>>>> invalid answers: "non-cache-coherent architectures, visibility, memory
>>>> model, sunspots, crop circles, global warming, ...".
>>>
>>>
>>> We have gone through this with a disassembly example already, haven't we?
>> We need os_atomic_load_ulint() because we don't want a dirty read. We may
> >> well decide that a dirty read there is fine and then replace it. But that's
> >> orthogonal to what this primitive is and why it exists.
>>
>> We need an atomic read rather than os_atomic_load_ulint() for the above
> >> reasons. And it will be atomic without using any helper functions.
>
>
> OK, so is the problem that I wanted to introduce the primitives for such access, which would also document how the variable is accessed, and that they don't have to do much besides a compiler barrier on x86_64?
>

Yes, the problem is that you introduce primitives that basically do
nothing, and then use those primitives unnecessarily and inconsistently.
That in turn blows up the patch size, increases the maintenance
burden, and opens the door for wrong assumptions when reading the
existing code and implementing new code.

>
>> Since
>> I see no answer in the form "Because that will result in incorrect
>> values being used in case ...", I assume you don't have an answer to
>> that question.
>
>
> I know that they are not resulting in incorrect values currently, and that the worst that can happen on x86_64 with most possible future code changes is that the va...


Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

More comments on the patch:

  - typo (double “get”) in the updated buf_LRU_free_page() comments:
   “function returns false, the buf_page_get_get_mutex() might be temporarily”

  - what’s the reason for changing buf_page_t::in_LRU_list to be present
    in release builds? Unless I’m missing something in the current code,
    it is only assigned, but never read in release builds?

  - spurious blank line changes in buf_buddy_relocate(),
    buf_buddy_free_low(), buf_pool_watch_set(), buf_page_get_gen(),
    buf_page_init_for_read(), buf_pool_validate_instance(),
    buf_flush_check_neighbor(), buf_flush_LRU_list_batch(), buf_LRU_drop_page_hash_for_tablespace()
    and innodb_buffer_pool_evict_uncompressed().

  - the following change in buf_block_try_discard_uncompressed() does
    not release block_mutex if buf_LRU_free_page() returns false.

  bpage = buf_page_hash_get(buf_pool, space, offset);

  if (bpage) {
- buf_LRU_free_page(bpage, false);
+
+ ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
+
+ mutex_enter(block_mutex);
+
+ if (buf_LRU_free_page(bpage, false)) {
+
+ mutex_exit(block_mutex);
+ return;
+ }
  }

  - do you really need this change?

@@ -2114,8 +2098,8 @@ buf_page_get_zip(
   break;
  case BUF_BLOCK_ZIP_PAGE:
  case BUF_BLOCK_ZIP_DIRTY:
+ buf_enter_zip_mutex_for_page(bpage);
   block_mutex = &buf_pool->zip_mutex;
- mutex_enter(block_mutex);
   bpage->buf_fix_count++;
   goto got_block;
  case BUF_BLOCK_FILE_PAGE:

  - in the following change:

@@ -2721,13 +2707,14 @@ buf_page_get_gen(
   }

   bpage = &block->page;
+ ut_ad(buf_own_zip_mutex_for_page(bpage));

   if (bpage->buf_fix_count
       || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
    /* This condition often occurs when the buffer
    is not buffer-fixed, but I/O-fixed by
    buf_page_init_for_read(). */
- mutex_exit(block_mutex);
+ mutex_exit(&buf_pool->zip_mutex);
 wait_until_unfixed:
    /* The block is buffer-fixed or I/O-fixed.
    Try again later. */

     is there a reason for replacing block_mutex with zip_mutex? you
     could just assert that block_mutex is zip_mutex next to, or even
     instead of the buf_own_zip_mutex_for_page() call.

  - same comments for this change:

@@ -2737,11 +2724,11 @@ buf_page_get_gen(
   }

   /* Allocate an uncompressed page. */
- mutex_exit(block_mutex);
+ mutex_exit(&buf_pool->zip_mutex);
   block = buf_LRU_get_free_block(buf_pool);
   ut_a(block);

- buf_pool_mutex_enter(buf_pool);
+ mutex_enter(&buf_pool->LRU_list_mutex);

   /* As we have released the page_hash lock and the
   block_mutex to allocate an uncompressed page it is

  - in buf_mark_space_corrupt() I see LRU_list_mutex, hash_lock and
    buf_page_get_mutex() being acquired, but only LRU_list_mutex and
    buf_page_get_mutex() being released, i.e. it returns with hash_lock
    acquired?

  - it looks like with the changes in buf_page_io_complete() for io_type
    == BUF_IO_WRITE we don’t set io_fix to BUF_IO_NONE, though we did
    before the changes?

  - more spurious changes:

@@ -475,8 +473,8 @@ buf_flush_insert_sorted_into_flush_list(
  if (prev_b == NULL) {
   UT_LIST_ADD_FIRST(list, buf_pool->flush_list, &block->page);
...


review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

Replying to this bit separately as it may need further discussion while I am addressing the rest of comments.

> - what’s the reason for changing buf_page_t::in_LRU_list to be present
> in release builds? Unless I’m missing something in the current code,
> it is only assigned, but never read in release builds?

This is another "automerge" from 5.5 and indeed serves no release
build purpose in the current 5.6 code. But it points out a
non-trivial thing. In 5.5 it is used as follows:
-) for checking whether a given page is still on the LRU list if both
   block and LRU mutexes were temporarily released:
   buf_page_get_zip(), buf_LRU_free_block() (in both, the code differs
   from 5.6).
-) for iterating through the LRU list without holding the LRU list
   mutex at all: buf_LRU_free_from_common_LRU_list(),
   buf_LRU_free_block(),
   buf_flush_LRU_recommendation()/buf_flush_ready_for_replace(). I
   think this is unsafe and a bug in 5.5 due to page relocations
   potentially resulting in wild pointers, even if it does wonders for
   the LRU list contention. 5.6 holds the LRU list mutex in the corresponding code.
-) redundant checks, ie. on LRU list iteration where the mutex is not
   released: buf_LRU_insert_zip_clean(),
   buf_LRU_free_from_unzip_LRU_list().

Thus I think 1) the in_LRU_list changes should be reverted now; 2) 5.5
might need fixing; 3) the LRU list mutex is hot in 5.6. If there is
a safe way not to hold it in 5.6 (for example, for
BUF_BLOCK_FILE_PAGE, but it's hard to tell the page type without
dereferencing the page pointer - maybe by comparing the page address
against the buffer pool chunk address range?), then it's worth looking
into it.
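
A rough sketch of that last idea against the 5.6 structures (untested, the function name is made up, and it would have to live in buf0buf.cc where buf_chunk_t is visible):

  /* Return TRUE iff bpage points inside one of the buffer pool
  chunks, i.e. iff it can only be the page member of a buf_block_t,
  without dereferencing the possibly stale pointer. */
  static ibool
  buf_pool_pointer_in_chunks(
  	const buf_pool_t*	buf_pool,
  	const buf_page_t*	bpage)
  {
  	const buf_chunk_t*	chunk = buf_pool->chunks;
  	ulint			n;

  	for (n = buf_pool->n_chunks; n--; chunk++) {
  		if ((const byte*) bpage >= (const byte*) chunk->blocks
  		    && (const byte*) bpage
  			< (const byte*) (chunk->blocks + chunk->size)) {
  			return(TRUE);
  		}
  	}

  	return(FALSE);
  }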

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> - typo (double “get”) in the updated buf_LRU_free_page() comments:
> “function returns false, the buf_page_get_get_mutex() might be temporarily”

Fixed.

> - what’s the reason for changing buf_page_t::in_LRU_list to be present
> in release builds? Unless I’m missing something in the current code,
> it is only assigned, but never read in release builds?

in_LRU_list changes have been reverted with the exception of an extra
assert in buf_page_set_sticky(). I also found that in_unzip_LRU_list
was converted to a release build flag but no uses were converted.
Reverted that too.

> - spurious blank line changes in buf_buddy_relocate(),
> buf_buddy_free_low(), buf_pool_watch_set(),

Fixed.

> buf_page_get_gen(),

I didn't find this one. Will re-check before the final push.

> buf_page_init_for_read(), buf_pool_validate_instance(),
> buf_flush_check_neighbor(), buf_flush_LRU_list_batch(),
> buf_LRU_drop_page_hash_for_tablespace()
> and innodb_buffer_pool_evict_uncompressed().

Fixed. Also removed a diagnostic printf from
buf_pool_validate_instance().

> - the following change in buf_block_try_discard_uncompressed() does
> not release block_mutex if buf_LRU_free_page() returns false.

Fixed.
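
For reference, the corrected shape is roughly the following (a sketch of the fix, not the exact pushed code); the point is that block_mutex is now also released when buf_LRU_free_page() returns false:

  bpage = buf_page_hash_get(buf_pool, space, offset);

  if (bpage) {

  	ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);

  	mutex_enter(block_mutex);

  	if (buf_LRU_free_page(bpage, false)) {

  		mutex_exit(block_mutex);
  		return;
  	}

  	/* the previously missing release on the failure path */
  	mutex_exit(block_mutex);
  }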

> - do you really need this change?
>
> @@ -2114,8 +2098,8 @@ buf_page_get_zip(
> break;
> case BUF_BLOCK_ZIP_PAGE:
> case BUF_BLOCK_ZIP_DIRTY:
> + buf_enter_zip_mutex_for_page(bpage);
> block_mutex = &buf_pool->zip_mutex;
> - mutex_enter(block_mutex);
> bpage->buf_fix_count++;
> goto got_block;
> case BUF_BLOCK_FILE_PAGE:

No, I don't. A debugging leftover, reverted.

> - in the following change:
>
> @@ -2721,13 +2707,14 @@ buf_page_get_gen(
> }
>
> bpage = &block->page;
> + ut_ad(buf_own_zip_mutex_for_page(bpage));
>
> if (bpage->buf_fix_count
> || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
> /* This condition often occurs when the buffer
> is not buffer-fixed, but I/O-fixed by
> buf_page_init_for_read(). */
> - mutex_exit(block_mutex);
> + mutex_exit(&buf_pool->zip_mutex);
> wait_until_unfixed:
> /* The block is buffer-fixed or I/O-fixed.
> Try again later. */
>
> is there a reason for replacing block_mutex with zip_mutex? you
> could just assert that block_mutex is zip_mutex next to, or even
> instead of the buf_own_zip_mutex_for_page() call.

Yes. Replaced buf_own_zip_mutex_for_page() with ut_ad(block_mutex ==
&buf_pool->zip_mutex), which also happens to be symmetric with the "!="
assert above for BUF_BLOCK_FILE_PAGE. Reverted the mutex_exit change
too.

> - same comments for this change:
>
> @@ -2737,11 +2724,11 @@ buf_page_get_gen(
> }
>
> /* Allocate an uncompressed page. */
> - mutex_exit(block_mutex);
> + mutex_exit(&buf_pool->zip_mutex);...


Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

buf_enter_zip_mutex_for_page(buf_page_t *) is a bad idea, unless a buffer pool pointer is passed too, which would then make it overkill: it dereferences an unprotected bpage pointer, resulting in the crash below. I will remove it and replace it back with mutex_enter(&buf_pool->zip_mutex).

http://jenkins.percona.com/job/percona-server-5.6-param/273/BUILD_TYPE=debug,Host=debian-wheezy-x64/testReport/junit/%28root%29/innodb/innodb_wl5522_zip/

Thread 1 (Thread 0x7f3acebfe700 (LWP 321)):
#0 __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63
#1 0x0000000000ae2f45 in my_write_core (sig=6) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/mysys/stacktrace.c:422
#2 0x000000000075eaa9 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/sql/signal_handler.cc:251
#3 <signal handler called>
#4 0x00007f3ae835c475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5 0x00007f3ae835f6f0 in *__GI_abort () at abort.c:92
#6 0x0000000000d9b44c in buf_pool_from_bpage (bpage=0x33912c0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/include/buf0buf.ic:88
#7 0x0000000000d9cb3c in buf_enter_zip_mutex_for_page (bpage=0x33912c0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/include/buf0buf.ic:1412
#8 0x0000000000da2e5b in buf_page_get_gen (space=431, zip_size=8192, offset=9, rw_latch=1, guess=0x0, mode=10, file=0x1109668 "/mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/btr/btr0pcur.cc", line=438, mtr=0x7f3acebfd5e0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/buf/buf0buf.cc:2743
#9 0x0000000000d8c45a in btr_block_get_func (space=431, zip_size=8192, page_no=9, mode=1, file=0x1109668 "/mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/btr/btr0pcur.cc", line=438, index=0x3393d18, mtr=0x7f3acebfd5e0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/include/btr0btr.ic:60
#10 0x0000000000d8df82 in btr_pcur_move_to_next_page (cursor=0x7f3acebfd420, mtr=0x7f3acebfd5e0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/btr/btr0pcur.cc:438
#11 0x0000000000df3a47 in btr_pcur_move_to_next_user_rec (cursor=0x7f3acebfd420, mtr=0x7f3acebfd5e0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/include/btr0pcur.ic:323
#12 0x0000000000df5a6e in dict_stats_analyze_index_level (index=0x3393d18, level=0, n_diff=0x3496418, total_recs=0x7f3acebfdac0, total_pages=0x7f3acebfdab8, n_diff_boundaries=0x0, mtr=0x7f3acebfd5e0) at /mnt/workspace/percona-server-5.6-param/BUILD_TYPE/debug/Host/debian-wheezy-x64/Percona-Server/storage/innobase/dict/dict0s...


review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> >>>>>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
> >>>>>> not do what they promise to do?
> >>>>>
> >>>>>
> >>>>> Yes.
> >>>>
> >>>> OK, so we agree that naming is unfortunate.
> >>>
> >>>
> >>> Vanishingly slightly so, due to the alignment issue, which I believe is
> >> mostly theoretical, but nevertheless am ready to address now.
> >>>
> >>
> >> It doesn't do anything to enforce atomicity, does it? I.e. the following
> >> implementation would be equally "atomic":
> >>
> >> ulint
> >> os_atomic_load_ulint(ulint *ptr)
> >> {
> >> printf("Hello, world!\n");
> >> return(*ptr);
> >> }
> >
> >
> > Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64
> as long as the pointer is aligned.
> >
>
> So... we don't need it after all?

But we don't want to ignore visibility and ordering.

> >>>> 1. Why do we need os_atomic_load_ulint() in buf_get_total_stat(), for
> >>>> example? Here's an example of a valid answer: "Because that will result
> >>>> in incorrect values being used in case ...". And some examples of
> >>>> invalid answers: "non-cache-coherent architectures, visibility, memory
> >>>> model, sunspots, crop circles, global warming, ...".
> >>>
> >>>
> >>> We have gone through this with a disassembly example already, haven't we?
> >> We need os_atomic_load_ulint() because we don't want a dirty read. We may
> >> well decide that a dirty read there is fine and then replace it. But that's
> >> orthogonal to what this primitive is and why it exists.
> >>>
> >>
> >> We need an atomic read rather than os_atomic_load_ulint() for the above
> >> reasons. And it will be atomic without using any helper functions.
> >
> >
> > OK, so is the problem that I wanted to introduce the primitives for such
> access, which would also document how the variable is accessed, and that they
> don't have to do much besides a compiler barrier on x86_64?
> >
>
> Yes, the problem is that you introduce primitives that basically do
> nothing, and then use those primitives unnecessarily and inconsistently.
> That in turn blows up the patch size, increases the maintenance
> burden, and opens the door for wrong assumptions when reading the
> existing code and implementing new code.

OK, now I see your concerns, and I understand a big part of them, but not all. What wrong assumptions are encouraged by the primitives?

> >> Since
> >> I see no answer in the form "Because that will result in incorrect
> >> values being used in case ...", I assume you don't have an answer to
> >> that question.
> >
> >
> > I know that they are not resulting in incorrect values currently, and that
> the worst that can happen on x86_64 with most possible future code changes
> is that the value loads could be moved earlier, resulting in more out-of-date
> values being used. That, and the fact that accessing the variable through the
> primitive serves as self-documentation, seem like good enough reasons to me.
> >
>
> Whether they are more "out-of-date" or less "out-of-date" depends on the
> definition of "date". By defining "date" as the "point in time when the
> function is called", "more out-of-date" can be easily...


Revision history for this message
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Hi Laurynas,

On Tue, 17 Sep 2013 12:41:18 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>>>>>> - Do you agree that os_atomic_load_ulint() / os_atomic_store_ulint() do
>>>>>>>> not do what they promise to do?
>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>
>>>>>> OK, so we agree that naming is unfortunate.
>>>>>
>>>>>
>>>>> Vanishingly slightly so, due to the alignment issue, which I believe is
>>>> mostly theoretical, but nevertheless am ready to address now.
>>>>>
>>>>
>>>> It doesn't do anything to enforce atomicity, does it? I.e. the following
>>>> implementation would be equally "atomic":
>>>>
>>>> ulint
>>>> os_atomic_load_ulint(ulint *ptr)
>>>> {
>>>> printf("Hello, world!\n");
>>>> return(*ptr);
>>>> }
>>>
>>>
>>> Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64
> >> as long as the pointer is aligned.
>>>
>>
>> So... we don't need it after all?
>
>
> But we don't want to ignore visibility and ordering.
>

Yeah, you forgot non-cache-coherent architectures.

This discussion has been running in circles for almost a week now, and I
have a feeling you are deliberately keeping it this way. Since I have not been
presented with any technical arguments for keeping that code, and have better
things to do, I'm going to wrap up the democrazy and stop it forcefully.

I will not approve this MP with "atomic" primitives present in the code.
The discussion is over.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Alexey -

> This discussion has been running in circles for almost a week now, and I
> have a feeling you are deliberately keeping it this way. Since I have not been
> presented with any technical arguments for keeping that code, and have better
> things to do, I'm going to wrap up the democrazy and stop it forcefully.
>
> I will not approve this MP with "atomic" primitives present in the code.
> The discussion is over.

I did not do anything to deserve this kind of treatment. Your feeling that I am deliberately keeping it this way is plain wrong (what do I have to gain? Minus one week of my copious time? A smug feeling of being right?). I have addressed every single one of your comments without hesitation, coming from an assumption that you are right, in this MP and in tens of MPs before.

I have tried my best to explain why the code is correct. You seem to disagree, but I have trouble understanding why. I am well within my rights to ask you to explain further, and the burden is on you to show why the code is wrong. Hence your refusal to continue the review is stalling it right now. Please continue the technical discussion.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal

Repushed. Changes from the 2nd push. Not a resubmission yet.

    - Removed ut_ad(mutex_own(&buf_pool->zip_free_mutex)) from
      buf_pool_contains_zip().
    - Fixed comment typos.
    - Reverted in_LRU_list and in_unzip_LRU_list changes, with the
      exception of an extra assert in buf_page_set_sticky().
    - Reverted spurious whitespace changes.
    - Removed spurious diagnostic printf from
      buf_pool_validate_instance().
    - Fixed locking in buf_block_try_discard_uncompressed().
    - Reverted redundant locking changes in buf_page_get_zip() and
      buf_page_get_gen().
    - Removed buf_enter_zip_mutex_for_page().
    - Removed os_atomic_load_ulint() uses from
      buf_get_total_stat(), buf_print_instance(),
      buf_stat_get_pool_info(), buf_LRU_evict_from_unzip_LRU(), the
      last instance in buf_read_recv_pages(), and
      srv_mon_process_existing_counter().

Revision history for this message
Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

=== modified file 'Percona-Server/storage/innobase/btr/btr0cur.cc'
--- Percona-Server/storage/innobase/btr/btr0cur.cc 2013-09-06 13:40:39 +0000
+++ Percona-Server/storage/innobase/btr/btr0cur.cc 2013-09-20 05:29:11 +0000
@@ -4503,12 +4503,14 @@
 	buf_pool_t*	buf_pool = buf_pool_from_block(block);
 	ulint		space	= buf_block_get_space(block);
 	ulint		page_no	= buf_block_get_page_no(block);
+	bool		freed	= false;

 	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

 	mtr_commit(mtr);

-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
+	mutex_enter(&block->mutex);

 	/* Only free the block if it is still allocated to
 	the same file page. */
@@ -4518,16 +4520,26 @@
 	    && buf_block_get_space(block) == space
 	    && buf_block_get_page_no(block) == page_no) {

-		if (!buf_LRU_free_page(&block->page, all)
-		    && all && block->page.zip.data) {
+		freed = buf_LRU_free_page(&block->page, all);
+
+		if (!freed && all && block->page.zip.data
+		    /* Now, buf_LRU_free_page() may release mutexes
+		    temporarily */
+		    && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE
+		    && buf_block_get_space(block) == space
+		    && buf_block_get_page_no(block) == page_no) {
+
 			/* Attempt to deallocate the uncompressed page
 			if the whole block cannot be deallocted. */
-
-			buf_LRU_free_page(&block->page, false);
+			freed = buf_LRU_free_page(&block->page, false);
 		}
 	}

-	buf_pool_mutex_exit(buf_pool);
+	if (!freed) {
+		mutex_exit(&buf_pool->LRU_list_mutex);
+	}
+
+	mutex_exit(&block->mutex);
 }

 /*******************************************************************//**

=== modified file 'Percona-Server/storage/innobase/btr/btr0sea.cc'
--- Percona-Server/storage/innobase/btr/btr0sea.cc 2013-09-06 13:40:39 +0000
+++ Percona-Server/storage/innobase/btr/btr0sea.cc 2013-09-20 05:29:11 +0000
@@ -1922,19 +1922,15 @@

 	rec_offs_init(offsets_);

-	buf_pool_mutex_enter_all();
-
 	cell_count = hash_get_n_cells(btr_search_sys->hash_tables[t]);

 	for (i = 0; i < cell_count; i++) {
 		/* We release btr_search_latch every once in a while to
 		give other queries a chance to run. */
 		if ((i != 0) && ((i % chunk_size) == 0)) {
-			buf_pool_mutex_exit_all();
 			btr_search_x_unlock_all();
 			os_thread_yield();
 			btr_search_x_lock_all();
-			buf_pool_mutex_enter_all();
 		}

 		node = (ha_node_t*)
@@ -1942,7 +1938,7 @@
 			i)->node;

 		for (; node != NULL; node = node->next) {
-			const buf_block_t*	block
+			buf_block_t*		block
 				= buf_block_align((byte*) node->data);
 			const buf_block_t*	hash_block;
 			buf_pool_t*		buf_pool;
@@ -1983,6 +1979,8 @@
 					== BUF_BLOCK_REMOVE_HASH);
 			}

+			mutex_enter(&block->mutex);
+
 			ut_a(!dict_index_is_ibuf(block->index));

 			page_index_id = btr_page_get_index_id(block->frame);
@@ -2038,6 +2036,8 @@
 					n_page_dumps++;
 				}
 			}
+
+			mutex_exit(&block->mutex);
 		}
 	}

@@ -2047,11 +2047,9 @@
 		/* We release btr_search_latch every once in a while to
 		give other queries a chance to run. */
 		if (i != 0) {
-			buf_pool_mutex_exit_all();
 			btr_search_x_unlock_all();
 			os_thread_yield();
 			btr_search_x_lock_all();
-			buf_pool_mutex_enter_all();
 		}

 		if (!ha_validate(btr_search_sys->hash_tables[t], i,
@@ -2060,7 +2058,6 @@
 		}
 	}

-	buf_pool_mutex_exit_all();
 	if (UNIV_LIKELY_NULL(heap)) {
 		mem_heap_free(heap);
 	}

=== modified file 'Percona-Server/storage/innobase/buf/buf0buddy.cc'
--- Percona-Server/storage/innobase/buf/buf0buddy.cc 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/buf/buf0buddy.cc 2013-09-20 05:29:11 +0000
@@ -205,7 +205,7 @@
 {
 	const ulint	size	= BUF_BUDDY_LOW << i;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_ad(!ut_align_offset(buf, size));
 	ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN));

@@ -278,7 +278,7 @@
 	ulint		i)	/*!< in: index of
 				buf_pool->zip_free[] */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_ad(buf_pool->zip_free[i].start != buf);

 	buf_buddy_stamp_free(buf, i);
@@ -297,7 +297,7 @@
 	ulint		i)	/*!< in: index of
 				buf_pool->zip_free[] */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_ad(buf_buddy_check_free(buf_pool, buf, i));

 	UT_LIST_REMOVE(list, buf_pool->zip_free[i], buf);
@@ -316,7 +316,7 @@
 {
 	buf_buddy_free_t*	buf;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_a(i < BUF_BUDDY_SIZES);
 	ut_a(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN));

@@ -369,10 +369,11 @@
 	buf_page_t*	bpage;
 	buf_block_t*	block;

-	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(!mutex_own(&buf_pool->zip_mutex));
 	ut_a(!ut_align_offset(buf, UNIV_PAGE_SIZE));

+	mutex_enter(&buf_pool->zip_hash_mutex);
+
 	HASH_SEARCH(hash, buf_pool->zip_hash, fold, buf_page_t*, bpage,
 		    ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_MEMORY
 			  && bpage->in_zip_hash && !bpage->in_page_hash),
@@ -384,6 +385,8 @@
 	ut_d(bpage->in_zip_hash = FALSE);
 	HASH_DELETE(buf_page_t, hash, buf_pool->zip_hash, fold, bpage);

+	mutex_exit(&buf_pool->zip_hash_mutex);
+
 	ut_d(memset(buf, 0, UNIV_PAGE_SIZE));
 	UNIV_MEM_INVALID(buf, UNIV_PAGE_SIZE);

@@ -406,7 +409,6 @@
 {
 	buf_pool_t*	buf_pool = buf_pool_from_block(block);
 	const ulint	fold = BUF_POOL_ZIP_FOLD(block);
-	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(!mutex_own(&buf_pool->zip_mutex));
 	ut_ad(buf_block_get_state(block) == BUF_BLOCK_READY_FOR_USE);

@@ -418,7 +420,10 @@
 	ut_ad(!block->page.in_page_hash);
 	ut_ad(!block->page.in_zip_hash);
 	ut_d(block->page.in_zip_hash = TRUE);
+
+	mutex_enter(&buf_pool->zip_hash_mutex);
 	HASH_INSERT(buf_page_t, hash, buf_pool->zip_hash, fold, &block->page);
+	mutex_exit(&buf_pool->zip_hash_mutex);

 	ut_d(buf_pool->buddy_n_frames++);
 }
@@ -438,6 +443,7 @@
 				of buf_pool->zip_free[] */
 {
 	ulint	offs	= BUF_BUDDY_LOW << j;
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_ad(j <= BUF_BUDDY_SIZES);
 	ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN));
 	ut_ad(j >= i);
@@ -461,8 +467,8 @@

 /**********************************************************************//**
 Allocate a block. The thread calling this function must hold
-buf_pool->mutex and must not hold buf_pool->zip_mutex or any block->mutex.
-The buf_pool_mutex may be released and reacquired.
+buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any
+block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired.
 @return allocated block, never NULL */
 UNIV_INTERN
 void*
@@ -474,23 +480,25 @@
 	ibool*		lru)	/*!< in: pointer to a variable that
 				will be assigned TRUE if storage was
 				allocated from the LRU list and
-				buf_pool->mutex was temporarily
-				released */
+				buf_pool->LRU_list_mutex was
+				temporarily released */
 {
 	buf_block_t*	block;

 	ut_ad(lru);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_ad(!mutex_own(&buf_pool->zip_mutex));
 	ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN));

 	if (i < BUF_BUDDY_SIZES) {
 		/* Try to allocate from the buddy system. */
+		mutex_enter(&buf_pool->zip_free_mutex);
 		block = (buf_block_t*) buf_buddy_alloc_zip(buf_pool, i);

 		if (block) {
 			goto func_exit;
 		}
+		mutex_exit(&buf_pool->zip_free_mutex);
 	}

 	/* Try allocating from the buf_pool->free list. */
@@ -502,24 +510,28 @@
 	}

 	/* Try replacing an uncompressed page in the buffer pool. */
-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->LRU_list_mutex);
 	block = buf_LRU_get_free_block(buf_pool);
 	*lru = TRUE;
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);

 alloc_big:
 	buf_buddy_block_register(block);

+	mutex_enter(&buf_pool->zip_free_mutex);
 	block = (buf_block_t*) buf_buddy_alloc_from(
 		buf_pool, block->frame, i, BUF_BUDDY_SIZES);

 func_exit:
 	buf_pool->buddy_stat[i].used++;
+	mutex_exit(&buf_pool->zip_free_mutex);
+
 	return(block);
 }

 /**********************************************************************//**
-Try to relocate a block.
+Try to relocate a block. The caller must hold zip_free_mutex, and this
+function will release and lock it again.
 @return true if relocated */
 static
 bool
@@ -536,8 +548,9 @@
 	ib_mutex_t*	mutex;
 	ulint		space;
 	ulint		offset;
+	rw_lock_t*	hash_lock;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));
 	ut_ad(!mutex_own(&buf_pool->zip_mutex));
 	ut_ad(!ut_align_offset(src, size));
 	ut_ad(!ut_align_offset(dst, size));
@@ -556,7 +569,9 @@

 	ut_ad(space != BUF_BUDDY_STAMP_FREE);

-	bpage = buf_page_hash_get(buf_pool, space, offset);
+	mutex_exit(&buf_pool->zip_free_mutex);
+	/* Lock page hash to prevent a relocation for the target page */
+	bpage = buf_page_hash_get_s_locked(buf_pool, space, offset, &hash_lock);

 	if (!bpage || bpage->zip.data != src) {
 		/* The block has probably been freshly
@@ -564,6 +579,10 @@
 		added to buf_pool->page_hash yet. Obviously,
 		it cannot be relocated. */

+		if (bpage) {
+			rw_lock_s_unlock(hash_lock);
+		}
+		mutex_enter(&buf_pool->zip_free_mutex);
 		return(false);
 	}

@@ -573,6 +592,8 @@
 		For the sake of simplicity, give up. */
 		ut_ad(page_zip_get_size(&bpage->zip) < size);

+		rw_lock_s_unlock(hash_lock);
+		mutex_enter(&buf_pool->zip_free_mutex);
 		return(false);
 	}

@@ -584,6 +605,10 @@

 	mutex_enter(mutex);

+	rw_lock_s_unlock(hash_lock);
+
+	mutex_enter(&buf_pool->zip_free_mutex);
+
 	if (buf_page_can_relocate(bpage)) {
 		/* Relocate the compressed page. */
 		ullint	usec	= ut_time_us(NULL);
@@ -618,17 +643,19 @@
 {
 	buf_buddy_free_t*	buddy;

-	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(!mutex_own(&buf_pool->zip_mutex));
 	ut_ad(i <= BUF_BUDDY_SIZES);
 	ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN));
+
+	mutex_enter(&buf_pool->zip_free_mutex);
+
 	ut_ad(buf_pool->buddy_stat[i].used > 0);
-
 	buf_pool->buddy_stat[i].used--;
 recombine:
 	UNIV_MEM_ASSERT_AND_ALLOC(buf, BUF_BUDDY_LOW << i);

 	if (i == BUF_BUDDY_SIZES) {
+		mutex_exit(&buf_pool->zip_free_mutex);
 		buf_buddy_block_free(buf_pool, buf);
 		return;
 	}
@@ -695,4 +722,5 @@
 	buf_buddy_add_to_free(buf_pool,
 			      reinterpret_cast<buf_buddy_free_t*>(buf),
 			      i);
+	mutex_exit(&buf_pool->zip_free_mutex);
 }

=== modified file 'Percona-Server/storage/innobase/buf/buf0buf.cc'
--- Percona-Server/storage/innobase/buf/buf0buf.cc 2013-09-02 10:01:38 +0000
+++ Percona-Server/storage/innobase/buf/buf0buf.cc 2013-09-20 05:29:11 +0000
@@ -119,24 +119,9 @@

 		Buffer pool struct
 		------------------
-The buffer buf_pool contains a single mutex which protects all the
+The buffer buf_pool contains several mutexes which protect all the
 control data structures of the buf_pool. The content of a buffer frame is
 protected by a separate read-write lock in its control block, though.
-These locks can be locked and unlocked without owning the buf_pool->mutex.
-The OS events in the buf_pool struct can be waited for without owning the
-buf_pool->mutex.
-
-The buf_pool->mutex is a hot-spot in main memory, causing a lot of
-memory bus traffic on multiprocessor systems when processors
-alternately access the mutex. On our Pentium, the mutex is accessed
-maybe every 10 microseconds. We gave up the solution to have mutexes
-for each control block, for instance, because it seemed to be
-complicated.
-
-A solution to reduce mutex contention of the buf_pool->mutex is to
-create a separate mutex for the page hash table. On Pentium,
-accessing the hash table takes 2 microseconds, about half
-of the total buf_pool->mutex hold time.

 		Control blocks
 		--------------
@@ -311,6 +296,11 @@
 UNIV_INTERN mysql_pfs_key_t	buffer_block_mutex_key;
 UNIV_INTERN mysql_pfs_key_t	buf_pool_mutex_key;
 UNIV_INTERN mysql_pfs_key_t	buf_pool_zip_mutex_key;
+UNIV_INTERN mysql_pfs_key_t	buf_pool_flush_state_mutex_key;
+UNIV_INTERN mysql_pfs_key_t	buf_pool_LRU_list_mutex_key;
+UNIV_INTERN mysql_pfs_key_t	buf_pool_free_list_mutex_key;
+UNIV_INTERN mysql_pfs_key_t	buf_pool_zip_free_mutex_key;
+UNIV_INTERN mysql_pfs_key_t	buf_pool_zip_hash_mutex_key;
 UNIV_INTERN mysql_pfs_key_t	flush_list_mutex_key;
 #endif /* UNIV_PFS_MUTEX */

@@ -1186,7 +1176,6 @@
 	buf_chunk_t*	chunk = buf_pool->chunks;

 	ut_ad(buf_pool);
-	ut_ad(buf_pool_mutex_own(buf_pool));
 	for (n = buf_pool->n_chunks; n--; chunk++) {

 		buf_block_t* block = buf_chunk_contains_zip(chunk, data);
@@ -1265,8 +1254,6 @@
 	ulint	i;
 	ulint	curr_size = 0;

-	buf_pool_mutex_enter_all();
-
 	for (i = 0; i < srv_buf_pool_instances; i++) {
 		buf_pool_t*	buf_pool;

@@ -1276,8 +1263,6 @@

 	srv_buf_pool_curr_size = curr_size;
 	srv_buf_pool_old_size = srv_buf_pool_size;
-
-	buf_pool_mutex_exit_all();
 }

 /********************************************************************//**
@@ -1297,12 +1282,18 @@

 	/* 1. Initialize general fields
 	------------------------------- */
-	mutex_create(buf_pool_mutex_key,
-		     &buf_pool->mutex, SYNC_BUF_POOL);
+	mutex_create(buf_pool_LRU_list_mutex_key,
+		     &buf_pool->LRU_list_mutex, SYNC_BUF_LRU_LIST);
+	mutex_create(buf_pool_free_list_mutex_key,
+		     &buf_pool->free_list_mutex, SYNC_BUF_FREE_LIST);
+	mutex_create(buf_pool_zip_free_mutex_key,
+		     &buf_pool->zip_free_mutex, SYNC_BUF_ZIP_FREE);
+	mutex_create(buf_pool_zip_hash_mutex_key,
+		     &buf_pool->zip_hash_mutex, SYNC_BUF_ZIP_HASH);
 	mutex_create(buf_pool_zip_mutex_key,
 		     &buf_pool->zip_mutex, SYNC_BUF_BLOCK);
-
-	buf_pool_mutex_enter(buf_pool);
+	mutex_create(buf_pool_flush_state_mutex_key,
+		     &buf_pool->flush_state_mutex, SYNC_BUF_FLUSH_STATE);

 	if (buf_pool_size > 0) {
 		buf_pool->n_chunks = 1;
@@ -1316,8 +1307,6 @@
 			mem_free(chunk);
 			mem_free(buf_pool);

-			buf_pool_mutex_exit(buf_pool);
-
 			return(DB_ERROR);
 		}

@@ -1361,8 +1350,6 @@

 	buf_pool->try_LRU_scan = TRUE;

-	buf_pool_mutex_exit(buf_pool);
-
 	return(DB_SUCCESS);
 }

@@ -1537,7 +1524,7 @@

 	fold = buf_page_address_fold(bpage->space, bpage->offset);

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_ad(buf_page_hash_lock_held_x(buf_pool, bpage));
 	ut_ad(mutex_own(buf_page_get_mutex(bpage)));
 	ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
@@ -1670,14 +1657,14 @@
 			return(bpage);
 		}
 		/* Add to an existing watch. */
+		mutex_enter(&buf_pool->zip_mutex);
 		bpage->buf_fix_count++;
+		mutex_exit(&buf_pool->zip_mutex);
 		return(NULL);
 	}

 	/* From this point this function becomes fairly heavy in terms
-	of latching. We acquire the buf_pool mutex as well as all the
-	hash_locks. buf_pool mutex is needed because any changes to
-	the page_hash must be covered by it and hash_locks are needed
+	of latching. We acquire all the hash_locks. They are needed
 	because we don't want to read any stale information in
 	buf_pool->watch[]. However, it is not in the critical code path
 	as this function will be called only by the purge thread. */
@@ -1686,18 +1673,16 @@
 	/* To obey latching order first release the hash_lock. */
 	rw_lock_x_unlock(hash_lock);

-	buf_pool_mutex_enter(buf_pool);
 	hash_lock_x_all(buf_pool->page_hash);

 	/* We have to recheck that the page
 	was not loaded or a watch set by some other
 	purge thread. This is because of the small
 	time window between when we release the
-	hash_lock to acquire buf_pool mutex above. */
+	hash_lock to acquire all the hash locks above. */

 	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
 	if (UNIV_LIKELY_NULL(bpage)) {
-		buf_pool_mutex_exit(buf_pool);
 		hash_unlock_x_all_but(buf_pool->page_hash, hash_lock);
 		goto page_found;
 	}
@@ -1716,21 +1701,19 @@
 	ut_ad(!bpage->in_page_hash);
 	ut_ad(bpage->buf_fix_count == 0);

-	/* bpage is pointing to buf_pool->watch[],
-	which is protected by buf_pool->mutex.
-	Normally, buf_page_t objects are protected by
-	buf_block_t::mutex or buf_pool->zip_mutex or both. */
+	mutex_enter(&buf_pool->zip_mutex);

 	bpage->state = BUF_BLOCK_ZIP_PAGE;
 	bpage->space = space;
 	bpage->offset = offset;
 	bpage->buf_fix_count = 1;

+	mutex_exit(&buf_pool->zip_mutex);
+
 	ut_d(bpage->in_page_hash = TRUE);
 	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
 		    fold, bpage);

-	buf_pool_mutex_exit(buf_pool);
 	/* Once the sentinel is in the page_hash we can
 	safely release all locks except just the
 	relevant hash_lock */
@@ -1777,7 +1760,8 @@
 	ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX));
 #endif /* UNIV_SYNC_DEBUG */

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(buf_page_get_state(watch) == BUF_BLOCK_ZIP_PAGE);
+	ut_ad(buf_own_zip_mutex_for_page(watch));

 	HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, watch);
 	ut_d(watch->in_page_hash = FALSE);
@@ -1801,13 +1785,6 @@
 	rw_lock_t*	hash_lock = buf_page_hash_lock_get(buf_pool,
 							   fold);

-	/* We only need to have buf_pool mutex in case where we end
-	up calling buf_pool_watch_remove but to obey latching order
-	we acquire it here before acquiring hash_lock. This should
-	not cause too much grief as this function is only ever
-	called from the purge thread. */
-	buf_pool_mutex_enter(buf_pool);
-
 	rw_lock_x_lock(hash_lock);

 	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
@@ -1825,12 +1802,13 @@
 	} else {
 		ut_a(bpage->buf_fix_count > 0);

+		mutex_enter(&buf_pool->zip_mutex);
 		if (UNIV_LIKELY(!--bpage->buf_fix_count)) {
 			buf_pool_watch_remove(buf_pool, fold, bpage);
 		}
+		mutex_exit(&buf_pool->zip_mutex);
 	}

-	buf_pool_mutex_exit(buf_pool);
 	rw_lock_x_unlock(hash_lock);
 }

@@ -1877,13 +1855,13 @@
 {
 	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);

-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);

 	ut_a(buf_page_in_file(bpage));

 	buf_LRU_make_block_young(bpage);

-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->LRU_list_mutex);
 }

 /********************************************************************//**
@@ -1897,10 +1875,6 @@
 	buf_page_t*	bpage)	/*!< in/out: buffer block of a
 				file page */
 {
-#ifdef UNIV_DEBUG
-	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
-	ut_ad(!buf_pool_mutex_own(buf_pool));
-#endif /* UNIV_DEBUG */
 	ut_a(buf_page_in_file(bpage));

 	if (buf_page_peek_if_too_old(bpage)) {
@@ -1921,16 +1895,12 @@
 	buf_block_t*	block;
 	buf_pool_t*	buf_pool = buf_pool_get(space, offset);

-	buf_pool_mutex_enter(buf_pool);
-
 	block = (buf_block_t*) buf_page_hash_get(buf_pool, space, offset);

 	if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) {
 		ut_ad(!buf_pool_watch_is_sentinel(buf_pool, &block->page));
 		block->check_index_page_at_flush = FALSE;
 	}
-
-	buf_pool_mutex_exit(buf_pool);
 }

 #if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
@@ -2014,21 +1984,32 @@
 	buf_page_t*	bpage;
 	buf_pool_t*	buf_pool = buf_pool_get(space, offset);

2017 /* Since we need to acquire buf_pool mutex to discard1987 /* Since we need to acquire buf_pool->LRU_list_mutex to discard
2018 the uncompressed frame and because page_hash mutex resides1988 the uncompressed frame and because page_hash mutex resides below
2019 below buf_pool mutex in sync ordering therefore we must1989 buf_pool->LRU_list_mutex in sync ordering therefore we must first
2020 first release the page_hash mutex. This means that the1990 release the page_hash mutex. This means that the block in question
2021 block in question can move out of page_hash. Therefore1991 can move out of page_hash. Therefore we need to check again if the
2022 we need to check again if the block is still in page_hash. */1992 block is still in page_hash. */
2023 buf_pool_mutex_enter(buf_pool);1993
1994 mutex_enter(&buf_pool->LRU_list_mutex);
20241995
2025 bpage = buf_page_hash_get(buf_pool, space, offset);1996 bpage = buf_page_hash_get(buf_pool, space, offset);
20261997
2027 if (bpage) {1998 if (bpage) {
2028 buf_LRU_free_page(bpage, false);1999
2000 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
2001
2002 mutex_enter(block_mutex);
2003
2004 if (buf_LRU_free_page(bpage, false)) {
2005
2006 mutex_exit(block_mutex);
2007 return;
2008 }
2009 mutex_exit(block_mutex);
2029 }2010 }
20302011
2031 buf_pool_mutex_exit(buf_pool);2012 mutex_exit(&buf_pool->LRU_list_mutex);
2032}2013}
20332014
2034/********************************************************************//**2015/********************************************************************//**
@@ -2324,7 +2305,8 @@
2324 ut_ad(block->frame == page_align(ptr));2305 ut_ad(block->frame == page_align(ptr));
2325#ifdef UNIV_DEBUG2306#ifdef UNIV_DEBUG
2326 /* A thread that updates these fields must2307 /* A thread that updates these fields must
2327 hold buf_pool->mutex and block->mutex. Acquire2308 hold one of the buf_pool mutexes, depending on the
2309 page state, and block->mutex. Acquire
2328 only the latter. */2310 only the latter. */
2329 mutex_enter(&block->mutex);2311 mutex_enter(&block->mutex);
23302312
@@ -2708,6 +2690,7 @@
2708 buf_page_t* bpage;2690 buf_page_t* bpage;
27092691
2710 case BUF_BLOCK_FILE_PAGE:2692 case BUF_BLOCK_FILE_PAGE:
2693 ut_ad(block_mutex != &buf_pool->zip_mutex);
2711 break;2694 break;
27122695
2713 case BUF_BLOCK_ZIP_PAGE:2696 case BUF_BLOCK_ZIP_PAGE:
@@ -2721,13 +2704,14 @@
2721 }2704 }
27222705
2723 bpage = &block->page;2706 bpage = &block->page;
2707 ut_ad(block_mutex == &buf_pool->zip_mutex);
27242708
2725 if (bpage->buf_fix_count2709 if (bpage->buf_fix_count
2726 || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {2710 || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
2727 /* This condition often occurs when the buffer2711 /* This condition often occurs when the buffer
2728 is not buffer-fixed, but I/O-fixed by2712 is not buffer-fixed, but I/O-fixed by
2729 buf_page_init_for_read(). */2713 buf_page_init_for_read(). */
2730 mutex_exit(block_mutex);2714 mutex_exit(&buf_pool->zip_mutex);
2731wait_until_unfixed:2715wait_until_unfixed:
2732 /* The block is buffer-fixed or I/O-fixed.2716 /* The block is buffer-fixed or I/O-fixed.
2733 Try again later. */2717 Try again later. */
@@ -2737,11 +2721,11 @@
2737 }2721 }
27382722
2739 /* Allocate an uncompressed page. */2723 /* Allocate an uncompressed page. */
2740 mutex_exit(block_mutex);2724 mutex_exit(&buf_pool->zip_mutex);
2741 block = buf_LRU_get_free_block(buf_pool);2725 block = buf_LRU_get_free_block(buf_pool);
2742 ut_a(block);2726 ut_a(block);
27432727
2744 buf_pool_mutex_enter(buf_pool);2728 mutex_enter(&buf_pool->LRU_list_mutex);
27452729
2746 /* As we have released the page_hash lock and the2730 /* As we have released the page_hash lock and the
2747 block_mutex to allocate an uncompressed page it is2731 block_mutex to allocate an uncompressed page it is
@@ -2754,22 +2738,25 @@
2754 offset, fold);2738 offset, fold);
27552739
2756 mutex_enter(&block->mutex);2740 mutex_enter(&block->mutex);
2741 mutex_enter(&buf_pool->zip_mutex);
2742
2757 if (bpage != hash_bpage2743 if (bpage != hash_bpage
2758 || bpage->buf_fix_count2744 || bpage->buf_fix_count
2759 || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {2745 || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
2760 buf_LRU_block_free_non_file_page(block);2746 buf_LRU_block_free_non_file_page(block);
2761 buf_pool_mutex_exit(buf_pool);2747 mutex_exit(&buf_pool->LRU_list_mutex);
2748 mutex_exit(&buf_pool->zip_mutex);
2762 rw_lock_x_unlock(hash_lock);2749 rw_lock_x_unlock(hash_lock);
2763 mutex_exit(&block->mutex);2750 mutex_exit(&block->mutex);
27642751
2765 if (bpage != hash_bpage) {2752 if (bpage != hash_bpage) {
2766 /* The buf_pool->page_hash was modified2753 /* The buf_pool->page_hash was modified
2767 while buf_pool->mutex was not held2754 while buf_pool->LRU_list_mutex was not held
2768 by this thread. */2755 by this thread. */
2769 goto loop;2756 goto loop;
2770 } else {2757 } else {
2771 /* The block was buffer-fixed or2758 /* The block was buffer-fixed or
2772 I/O-fixed while buf_pool->mutex was2759 I/O-fixed while buf_pool->LRU_list_mutex was
2773 not held by this thread. */2760 not held by this thread. */
2774 goto wait_until_unfixed;2761 goto wait_until_unfixed;
2775 }2762 }
@@ -2778,8 +2765,6 @@
2778 /* Move the compressed page from bpage to block,2765 /* Move the compressed page from bpage to block,
2779 and uncompress it. */2766 and uncompress it. */
27802767
2781 mutex_enter(&buf_pool->zip_mutex);
2782
2783 buf_relocate(bpage, &block->page);2768 buf_relocate(bpage, &block->page);
2784 buf_block_init_low(block);2769 buf_block_init_low(block);
2785 block->lock_hash_val = lock_rec_hash(space, offset);2770 block->lock_hash_val = lock_rec_hash(space, offset);
@@ -2808,6 +2793,8 @@
2808 /* Insert at the front of unzip_LRU list */2793 /* Insert at the front of unzip_LRU list */
2809 buf_unzip_LRU_add_block(block, FALSE);2794 buf_unzip_LRU_add_block(block, FALSE);
28102795
2796 mutex_exit(&buf_pool->LRU_list_mutex);
2797
2811 block->page.buf_fix_count = 1;2798 block->page.buf_fix_count = 1;
2812 buf_block_set_io_fix(block, BUF_IO_READ);2799 buf_block_set_io_fix(block, BUF_IO_READ);
2813 rw_lock_x_lock_inline(&block->lock, 0, file, line);2800 rw_lock_x_lock_inline(&block->lock, 0, file, line);
@@ -2816,8 +2803,7 @@
28162803
2817 rw_lock_x_unlock(hash_lock);2804 rw_lock_x_unlock(hash_lock);
28182805
2819 buf_pool->n_pend_unzip++;2806 os_atomic_increment_ulint(&buf_pool->n_pend_unzip, 1);
2820 buf_pool_mutex_exit(buf_pool);
28212807
2822 access_time = buf_page_is_accessed(&block->page);2808 access_time = buf_page_is_accessed(&block->page);
2823 mutex_exit(&block->mutex);2809 mutex_exit(&block->mutex);
@@ -2826,7 +2812,7 @@
2826 buf_page_free_descriptor(bpage);2812 buf_page_free_descriptor(bpage);
28272813
2828 /* Decompress the page while not holding2814 /* Decompress the page while not holding
2829 buf_pool->mutex or block->mutex. */2815 any buf_pool or block->mutex. */
28302816
2831 /* Page checksum verification is already done when2817 /* Page checksum verification is already done when
2832 the page is read from disk. Hence page checksum2818 the page is read from disk. Hence page checksum
@@ -2845,12 +2831,10 @@
2845 }2831 }
28462832
2847 /* Unfix and unlatch the block. */2833 /* Unfix and unlatch the block. */
2848 buf_pool_mutex_enter(buf_pool);
2849 mutex_enter(&block->mutex);2834 mutex_enter(&block->mutex);
2850 block->page.buf_fix_count--;2835 block->page.buf_fix_count--;
2851 buf_block_set_io_fix(block, BUF_IO_NONE);2836 buf_block_set_io_fix(block, BUF_IO_NONE);
2852 buf_pool->n_pend_unzip--;2837 os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1);
2853 buf_pool_mutex_exit(buf_pool);
2854 rw_lock_x_unlock(&block->lock);2838 rw_lock_x_unlock(&block->lock);
28552839
2856 break;2840 break;
@@ -2885,23 +2869,18 @@
2885 insert buffer (change buffer) as much as possible. */2869 insert buffer (change buffer) as much as possible. */
28862870
2887 /* To obey the latching order, release the2871 /* To obey the latching order, release the
2888 block->mutex before acquiring buf_pool->mutex. Protect2872 block->mutex before acquiring buf_pool->LRU_list_mutex. Protect
2889 the block from changes by temporarily buffer-fixing it2873 the block from changes by temporarily buffer-fixing it
2890 for the time we are not holding block->mutex. */2874 for the time we are not holding block->mutex. */
2875
2891 buf_block_buf_fix_inc(block, file, line);2876 buf_block_buf_fix_inc(block, file, line);
2892 mutex_exit(&block->mutex);2877 mutex_exit(&block->mutex);
2893 buf_pool_mutex_enter(buf_pool);2878 mutex_enter(&buf_pool->LRU_list_mutex);
2894 mutex_enter(&block->mutex);2879 mutex_enter(&block->mutex);
2895 buf_block_buf_fix_dec(block);2880 buf_block_buf_fix_dec(block);
2896 mutex_exit(&block->mutex);
2897
2898 /* Now we are only holding the buf_pool->mutex,
2899 not block->mutex or hash_lock. Blocks cannot be
2900 relocated or enter or exit the buf_pool while we
2901 are holding the buf_pool->mutex. */
29022881
2903 if (buf_LRU_free_page(&block->page, true)) {2882 if (buf_LRU_free_page(&block->page, true)) {
2904 buf_pool_mutex_exit(buf_pool);2883 mutex_exit(&block->mutex);
2905 rw_lock_x_lock(hash_lock);2884 rw_lock_x_lock(hash_lock);
29062885
2907 if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) {2886 if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) {
@@ -2931,10 +2910,11 @@
2931 "innodb_change_buffering_debug evict %u %u\n",2910 "innodb_change_buffering_debug evict %u %u\n",
2932 (unsigned) space, (unsigned) offset);2911 (unsigned) space, (unsigned) offset);
2933 return(NULL);2912 return(NULL);
2913 } else {
2914
2915 mutex_exit(&buf_pool->LRU_list_mutex);
2934 }2916 }
29352917
2936 mutex_enter(&block->mutex);
2937
2938 if (buf_flush_page_try(buf_pool, block)) {2918 if (buf_flush_page_try(buf_pool, block)) {
2939 fprintf(stderr,2919 fprintf(stderr,
2940 "innodb_change_buffering_debug flush %u %u\n",2920 "innodb_change_buffering_debug flush %u %u\n",
@@ -2944,8 +2924,6 @@
2944 }2924 }
29452925
2946 /* Failed to evict the page; change it directly */2926 /* Failed to evict the page; change it directly */
2947
2948 buf_pool_mutex_exit(buf_pool);
2949 }2927 }
2950#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */2928#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
29512929
@@ -3420,7 +3398,6 @@
3420 buf_page_t* hash_page;3398 buf_page_t* hash_page;
34213399
3422 ut_ad(buf_pool == buf_pool_get(space, offset));3400 ut_ad(buf_pool == buf_pool_get(space, offset));
3423 ut_ad(buf_pool_mutex_own(buf_pool));
34243401
3425 ut_ad(mutex_own(&(block->mutex)));3402 ut_ad(mutex_own(&(block->mutex)));
3426 ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);3403 ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);
@@ -3455,11 +3432,17 @@
3455 if (UNIV_LIKELY(!hash_page)) {3432 if (UNIV_LIKELY(!hash_page)) {
3456 } else if (buf_pool_watch_is_sentinel(buf_pool, hash_page)) {3433 } else if (buf_pool_watch_is_sentinel(buf_pool, hash_page)) {
3457 /* Preserve the reference count. */3434 /* Preserve the reference count. */
3435
3436 mutex_enter(&buf_pool->zip_mutex);
3437
3458 ulint buf_fix_count = hash_page->buf_fix_count;3438 ulint buf_fix_count = hash_page->buf_fix_count;
34593439
3460 ut_a(buf_fix_count > 0);3440 ut_a(buf_fix_count > 0);
3461 block->page.buf_fix_count += buf_fix_count;3441 block->page.buf_fix_count += buf_fix_count;
3462 buf_pool_watch_remove(buf_pool, fold, hash_page);3442 buf_pool_watch_remove(buf_pool, fold, hash_page);
3443
3444 mutex_exit(&buf_pool->zip_mutex);
3445
3463 } else {3446 } else {
3464 fprintf(stderr,3447 fprintf(stderr,
3465 "InnoDB: Error: page %lu %lu already found"3448 "InnoDB: Error: page %lu %lu already found"
@@ -3469,7 +3452,6 @@
3469 (const void*) hash_page, (const void*) block);3452 (const void*) hash_page, (const void*) block);
3470#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG3453#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
3471 mutex_exit(&block->mutex);3454 mutex_exit(&block->mutex);
3472 buf_pool_mutex_exit(buf_pool);
3473 buf_print();3455 buf_print();
3474 buf_LRU_print();3456 buf_LRU_print();
3475 buf_validate();3457 buf_validate();
@@ -3556,7 +3538,7 @@
3556 fold = buf_page_address_fold(space, offset);3538 fold = buf_page_address_fold(space, offset);
3557 hash_lock = buf_page_hash_lock_get(buf_pool, fold);3539 hash_lock = buf_page_hash_lock_get(buf_pool, fold);
35583540
3559 buf_pool_mutex_enter(buf_pool);3541 mutex_enter(&buf_pool->LRU_list_mutex);
3560 rw_lock_x_lock(hash_lock);3542 rw_lock_x_lock(hash_lock);
35613543
3562 watch_page = buf_page_hash_get_low(buf_pool, space, offset, fold);3544 watch_page = buf_page_hash_get_low(buf_pool, space, offset, fold);
@@ -3564,6 +3546,7 @@
3564 /* The page is already in the buffer pool. */3546 /* The page is already in the buffer pool. */
3565 watch_page = NULL;3547 watch_page = NULL;
3566err_exit:3548err_exit:
3549 mutex_exit(&buf_pool->LRU_list_mutex);
3567 rw_lock_x_unlock(hash_lock);3550 rw_lock_x_unlock(hash_lock);
3568 if (block) {3551 if (block) {
3569 mutex_enter(&block->mutex);3552 mutex_enter(&block->mutex);
@@ -3596,6 +3579,7 @@
35963579
3597 /* The block must be put to the LRU list, to the old blocks */3580 /* The block must be put to the LRU list, to the old blocks */
3598 buf_LRU_add_block(bpage, TRUE/* to old blocks */);3581 buf_LRU_add_block(bpage, TRUE/* to old blocks */);
3582 mutex_exit(&buf_pool->LRU_list_mutex);
35993583
3600 /* We set a pass-type x-lock on the frame because then3584 /* We set a pass-type x-lock on the frame because then
3601 the same thread which called for the read operation3585 the same thread which called for the read operation
@@ -3610,15 +3594,16 @@
3610 buf_page_set_io_fix(bpage, BUF_IO_READ);3594 buf_page_set_io_fix(bpage, BUF_IO_READ);
36113595
3612 if (zip_size) {3596 if (zip_size) {
3613 /* buf_pool->mutex may be released and3597 /* buf_pool->LRU_list_mutex may be released and
3614 reacquired by buf_buddy_alloc(). Thus, we3598 reacquired by buf_buddy_alloc(). Thus, we
3615 must release block->mutex in order not to3599 must release block->mutex in order not to
3616 break the latching order in the reacquisition3600 break the latching order in the reacquisition
3617 of buf_pool->mutex. We also must defer this3601 of buf_pool->LRU_list_mutex. We also must defer this
3618 operation until after the block descriptor has3602 operation until after the block descriptor has
3619 been added to buf_pool->LRU and3603 been added to buf_pool->LRU and
3620 buf_pool->page_hash. */3604 buf_pool->page_hash. */
3621 mutex_exit(&block->mutex);3605 mutex_exit(&block->mutex);
3606 mutex_enter(&buf_pool->LRU_list_mutex);
3622 data = buf_buddy_alloc(buf_pool, zip_size, &lru);3607 data = buf_buddy_alloc(buf_pool, zip_size, &lru);
3623 mutex_enter(&block->mutex);3608 mutex_enter(&block->mutex);
3624 block->page.zip.data = (page_zip_t*) data;3609 block->page.zip.data = (page_zip_t*) data;
@@ -3630,6 +3615,7 @@
3630 after block->page.zip.data is set. */3615 after block->page.zip.data is set. */
3631 ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));3616 ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
3632 buf_unzip_LRU_add_block(block, TRUE);3617 buf_unzip_LRU_add_block(block, TRUE);
3618 mutex_exit(&buf_pool->LRU_list_mutex);
3633 }3619 }
36343620
3635 mutex_exit(&block->mutex);3621 mutex_exit(&block->mutex);
@@ -3645,8 +3631,9 @@
3645 rw_lock_x_lock(hash_lock);3631 rw_lock_x_lock(hash_lock);
36463632
3647 /* If buf_buddy_alloc() allocated storage from the LRU list,3633 /* If buf_buddy_alloc() allocated storage from the LRU list,
3648 it released and reacquired buf_pool->mutex. Thus, we must3634 it released and reacquired buf_pool->LRU_list_mutex. Thus, we
3649 check the page_hash again, as it may have been modified. */3635 must check the page_hash again, as it may have been
3636 modified. */
3650 if (UNIV_UNLIKELY(lru)) {3637 if (UNIV_UNLIKELY(lru)) {
36513638
3652 watch_page = buf_page_hash_get_low(3639 watch_page = buf_page_hash_get_low(
@@ -3657,6 +3644,7 @@
3657 watch_page))) {3644 watch_page))) {
36583645
3659 /* The block was added by some other thread. */3646 /* The block was added by some other thread. */
3647 mutex_exit(&buf_pool->LRU_list_mutex);
3660 rw_lock_x_unlock(hash_lock);3648 rw_lock_x_unlock(hash_lock);
3661 watch_page = NULL;3649 watch_page = NULL;
3662 buf_buddy_free(buf_pool, data, zip_size);3650 buf_buddy_free(buf_pool, data, zip_size);
@@ -3700,6 +3688,7 @@
3700 /* Preserve the reference count. */3688 /* Preserve the reference count. */
3701 ulint buf_fix_count = watch_page->buf_fix_count;3689 ulint buf_fix_count = watch_page->buf_fix_count;
3702 ut_a(buf_fix_count > 0);3690 ut_a(buf_fix_count > 0);
3691 ut_ad(buf_own_zip_mutex_for_page(bpage));
3703 bpage->buf_fix_count += buf_fix_count;3692 bpage->buf_fix_count += buf_fix_count;
3704 ut_ad(buf_pool_watch_is_sentinel(buf_pool, watch_page));3693 ut_ad(buf_pool_watch_is_sentinel(buf_pool, watch_page));
3705 buf_pool_watch_remove(buf_pool, fold, watch_page);3694 buf_pool_watch_remove(buf_pool, fold, watch_page);
@@ -3716,15 +3705,15 @@
3716#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG3705#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
3717 buf_LRU_insert_zip_clean(bpage);3706 buf_LRU_insert_zip_clean(bpage);
3718#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */3707#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
3708 mutex_exit(&buf_pool->LRU_list_mutex);
37193709
3720 buf_page_set_io_fix(bpage, BUF_IO_READ);3710 buf_page_set_io_fix(bpage, BUF_IO_READ);
37213711
3722 mutex_exit(&buf_pool->zip_mutex);3712 mutex_exit(&buf_pool->zip_mutex);
3723 }3713 }
37243714
3725 buf_pool->n_pend_reads++;3715 os_atomic_increment_ulint(&buf_pool->n_pend_reads, 1);
3726func_exit:3716func_exit:
3727 buf_pool_mutex_exit(buf_pool);
37283717
3729 if (mode == BUF_READ_IBUF_PAGES_ONLY) {3718 if (mode == BUF_READ_IBUF_PAGES_ONLY) {
37303719
@@ -3773,7 +3762,7 @@
3773 fold = buf_page_address_fold(space, offset);3762 fold = buf_page_address_fold(space, offset);
3774 hash_lock = buf_page_hash_lock_get(buf_pool, fold);3763 hash_lock = buf_page_hash_lock_get(buf_pool, fold);
37753764
3776 buf_pool_mutex_enter(buf_pool);3765 mutex_enter(&buf_pool->LRU_list_mutex);
3777 rw_lock_x_lock(hash_lock);3766 rw_lock_x_lock(hash_lock);
37783767
3779 block = (buf_block_t*) buf_page_hash_get_low(3768 block = (buf_block_t*) buf_page_hash_get_low(
@@ -3790,8 +3779,8 @@
3790#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */3779#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */
37913780
3792 /* Page can be found in buf_pool */3781 /* Page can be found in buf_pool */
3793 buf_pool_mutex_exit(buf_pool);
3794 rw_lock_x_unlock(hash_lock);3782 rw_lock_x_unlock(hash_lock);
3783 mutex_exit(&buf_pool->LRU_list_mutex);
37953784
3796 buf_block_free(free_block);3785 buf_block_free(free_block);
37973786
@@ -3827,17 +3816,17 @@
3827 ibool lru;3816 ibool lru;
38283817
3829 /* Prevent race conditions during buf_buddy_alloc(),3818 /* Prevent race conditions during buf_buddy_alloc(),
3830 which may release and reacquire buf_pool->mutex,3819 which may release and reacquire buf_pool->LRU_list_mutex,
3831 by IO-fixing and X-latching the block. */3820 by IO-fixing and X-latching the block. */
38323821
3833 buf_page_set_io_fix(&block->page, BUF_IO_READ);3822 buf_page_set_io_fix(&block->page, BUF_IO_READ);
3834 rw_lock_x_lock(&block->lock);3823 rw_lock_x_lock(&block->lock);
38353824
3836 mutex_exit(&block->mutex);3825 mutex_exit(&block->mutex);
3837 /* buf_pool->mutex may be released and reacquired by3826 /* buf_pool->LRU_list_mutex may be released and reacquired by
3838 buf_buddy_alloc(). Thus, we must release block->mutex3827 buf_buddy_alloc(). Thus, we must release block->mutex
3839 in order not to break the latching order in3828 in order not to break the latching order in
3840 the reacquisition of buf_pool->mutex. We also must3829 the reacquisition of buf_pool->LRU_list_mutex. We also must
3841 defer this operation until after the block descriptor3830 defer this operation until after the block descriptor
3842 has been added to buf_pool->LRU and buf_pool->page_hash. */3831 has been added to buf_pool->LRU and buf_pool->page_hash. */
3843 data = buf_buddy_alloc(buf_pool, zip_size, &lru);3832 data = buf_buddy_alloc(buf_pool, zip_size, &lru);
@@ -3856,7 +3845,7 @@
3856 rw_lock_x_unlock(&block->lock);3845 rw_lock_x_unlock(&block->lock);
3857 }3846 }
38583847
3859 buf_pool_mutex_exit(buf_pool);3848 mutex_exit(&buf_pool->LRU_list_mutex);
38603849
3861 mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);3850 mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);
38623851
@@ -3907,6 +3896,8 @@
3907 const byte* frame;3896 const byte* frame;
3908 monitor_id_t counter;3897 monitor_id_t counter;
39093898
3899 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
3900
3910 /* If the counter module is not turned on, just return */3901 /* If the counter module is not turned on, just return */
3911 if (!MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)) {3902 if (!MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)) {
3912 return;3903 return;
@@ -4014,9 +4005,13 @@
4014 == BUF_BLOCK_FILE_PAGE);4005 == BUF_BLOCK_FILE_PAGE);
4015 ulint space = bpage->space;4006 ulint space = bpage->space;
4016 ibool ret = TRUE;4007 ibool ret = TRUE;
4008 const ulint fold = buf_page_address_fold(bpage->space,
4009 bpage->offset);
4010 rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold);
40174011
4018 /* First unfix and release lock on the bpage */4012 /* First unfix and release lock on the bpage */
4019 buf_pool_mutex_enter(buf_pool);4013 mutex_enter(&buf_pool->LRU_list_mutex);
4014 rw_lock_x_lock(hash_lock);
4020 mutex_enter(buf_page_get_mutex(bpage));4015 mutex_enter(buf_page_get_mutex(bpage));
4021 ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ);4016 ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ);
4022 ut_ad(bpage->buf_fix_count == 0);4017 ut_ad(bpage->buf_fix_count == 0);
@@ -4030,19 +4025,18 @@
4030 BUF_IO_READ);4025 BUF_IO_READ);
4031 }4026 }
40324027
4033 mutex_exit(buf_page_get_mutex(bpage));
4034
4035 /* Find the table with specified space id, and mark it corrupted */4028 /* Find the table with specified space id, and mark it corrupted */
4036 if (dict_set_corrupted_by_space(space)) {4029 if (dict_set_corrupted_by_space(space)) {
4037 buf_LRU_free_one_page(bpage);4030 buf_LRU_free_one_page(bpage);
4038 } else {4031 } else {
4032 mutex_exit(buf_page_get_mutex(bpage));
4039 ret = FALSE;4033 ret = FALSE;
4040 }4034 }
40414035
4036 mutex_exit(&buf_pool->LRU_list_mutex);
4037
4042 ut_ad(buf_pool->n_pend_reads > 0);4038 ut_ad(buf_pool->n_pend_reads > 0);
4043 buf_pool->n_pend_reads--;4039 os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1);
4044
4045 buf_pool_mutex_exit(buf_pool);
40464040
4047 return(ret);4041 return(ret);
4048}4042}
@@ -4061,6 +4055,7 @@
4061 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);4055 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
4062 const ibool uncompressed = (buf_page_get_state(bpage)4056 const ibool uncompressed = (buf_page_get_state(bpage)
4063 == BUF_BLOCK_FILE_PAGE);4057 == BUF_BLOCK_FILE_PAGE);
4058 bool have_LRU_mutex = false;
40644059
4065 ut_a(buf_page_in_file(bpage));4060 ut_a(buf_page_in_file(bpage));
40664061
@@ -4070,7 +4065,7 @@
4070 ensures that this is the only thread that handles the i/o for this4065 ensures that this is the only thread that handles the i/o for this
4071 block. */4066 block. */
40724067
4073 io_type = buf_page_get_io_fix(bpage);4068 io_type = buf_page_get_io_fix_unlocked(bpage);
4074 ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);4069 ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);
40754070
4076 if (io_type == BUF_IO_READ) {4071 if (io_type == BUF_IO_READ) {
@@ -4080,15 +4075,16 @@
40804075
4081 if (buf_page_get_zip_size(bpage)) {4076 if (buf_page_get_zip_size(bpage)) {
4082 frame = bpage->zip.data;4077 frame = bpage->zip.data;
4083 buf_pool->n_pend_unzip++;4078 os_atomic_increment_ulint(&buf_pool->n_pend_unzip, 1);
4084 if (uncompressed4079 if (uncompressed
4085 && !buf_zip_decompress((buf_block_t*) bpage,4080 && !buf_zip_decompress((buf_block_t*) bpage,
4086 FALSE)) {4081 FALSE)) {
40874082
4088 buf_pool->n_pend_unzip--;4083 os_atomic_decrement_ulint(
4084 &buf_pool->n_pend_unzip, 1);
4089 goto corrupt;4085 goto corrupt;
4090 }4086 }
4091 buf_pool->n_pend_unzip--;4087 os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1);
4092 } else {4088 } else {
4093 ut_a(uncompressed);4089 ut_a(uncompressed);
4094 frame = ((buf_block_t*) bpage)->frame;4090 frame = ((buf_block_t*) bpage)->frame;
@@ -4255,8 +4251,37 @@
4255 }4251 }
4256 }4252 }
42574253
4258 buf_pool_mutex_enter(buf_pool);4254 if (io_type == BUF_IO_WRITE
4259 mutex_enter(buf_page_get_mutex(bpage));4255 && (
4256#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
4257 /* to keep consistency at buf_LRU_insert_zip_clean() */
4258 buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY ||
4259#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
4260 buf_page_get_flush_type(bpage) == BUF_FLUSH_LRU)) {
4261
4262 have_LRU_mutex = TRUE; /* optimistic */
4263 }
4264retry_mutex:
4265 if (have_LRU_mutex) {
4266 mutex_enter(&buf_pool->LRU_list_mutex);
4267 }
4268
4269 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
4270 mutex_enter(block_mutex);
4271
4272 if (UNIV_UNLIKELY(io_type == BUF_IO_WRITE
4273 && (
4274#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
4275 buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY
4276 ||
4277#endif
4278 buf_page_get_flush_type(bpage) == BUF_FLUSH_LRU)
4279 && !have_LRU_mutex)) {
4280
4281 mutex_exit(block_mutex);
4282 have_LRU_mutex = TRUE;
4283 goto retry_mutex;
4284 }
42604285
4261#ifdef UNIV_IBUF_COUNT_DEBUG4286#ifdef UNIV_IBUF_COUNT_DEBUG
4262 if (io_type == BUF_IO_WRITE || uncompressed) {4287 if (io_type == BUF_IO_WRITE || uncompressed) {
@@ -4271,17 +4296,20 @@
4271 removes the newest lock debug record, without checking the thread4296 removes the newest lock debug record, without checking the thread
4272 id. */4297 id. */
42734298
4274 buf_page_set_io_fix(bpage, BUF_IO_NONE);
4275
4276 switch (io_type) {4299 switch (io_type) {
4277 case BUF_IO_READ:4300 case BUF_IO_READ:
4301
4302 buf_page_set_io_fix(bpage, BUF_IO_NONE);
4303
4278 /* NOTE that the call to ibuf may have moved the ownership of4304 /* NOTE that the call to ibuf may have moved the ownership of
4279 the x-latch to this OS thread: do not let this confuse you in4305 the x-latch to this OS thread: do not let this confuse you in
4280 debugging! */4306 debugging! */
42814307
4282 ut_ad(buf_pool->n_pend_reads > 0);4308 ut_ad(buf_pool->n_pend_reads > 0);
4283 buf_pool->n_pend_reads--;4309 os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1);
4284 buf_pool->stat.n_pages_read++;4310 os_atomic_increment_ulint(&buf_pool->stat.n_pages_read, 1);
4311
4312 ut_ad(!have_LRU_mutex);
42854313
4286 if (uncompressed) {4314 if (uncompressed) {
4287 rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,4315 rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
@@ -4296,13 +4324,17 @@
42964324
4297 buf_flush_write_complete(bpage);4325 buf_flush_write_complete(bpage);
42984326
4327 os_atomic_increment_ulint(&buf_pool->stat.n_pages_written, 1);
4328
4329 if (have_LRU_mutex) {
4330 mutex_exit(&buf_pool->LRU_list_mutex);
4331 }
4332
4299 if (uncompressed) {4333 if (uncompressed) {
4300 rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,4334 rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,
4301 BUF_IO_WRITE);4335 BUF_IO_WRITE);
4302 }4336 }
43034337
4304 buf_pool->stat.n_pages_written++;
4305
4306 break;4338 break;
43074339
4308 default:4340 default:
@@ -4320,8 +4352,7 @@
4320 }4352 }
4321#endif /* UNIV_DEBUG */4353#endif /* UNIV_DEBUG */
43224354
4323 mutex_exit(buf_page_get_mutex(bpage));4355 mutex_exit(block_mutex);
4324 buf_pool_mutex_exit(buf_pool);
43254356
4326 return(true);4357 return(true);
4327}4358}
@@ -4340,14 +4371,16 @@
43404371
4341 ut_ad(buf_pool);4372 ut_ad(buf_pool);
43424373
4343 buf_pool_mutex_enter(buf_pool);
4344
4345 chunk = buf_pool->chunks;4374 chunk = buf_pool->chunks;
43464375
4347 for (i = buf_pool->n_chunks; i--; chunk++) {4376 for (i = buf_pool->n_chunks; i--; chunk++) {
43484377
4378 mutex_enter(&buf_pool->LRU_list_mutex);
4379
4349 const buf_block_t* block = buf_chunk_not_freed(chunk);4380 const buf_block_t* block = buf_chunk_not_freed(chunk);
43504381
4382 mutex_exit(&buf_pool->LRU_list_mutex);
4383
4351 if (UNIV_LIKELY_NULL(block)) {4384 if (UNIV_LIKELY_NULL(block)) {
4352 fprintf(stderr,4385 fprintf(stderr,
4353 "Page %lu %lu still fixed or dirty\n",4386 "Page %lu %lu still fixed or dirty\n",
@@ -4357,8 +4390,6 @@
4357 }4390 }
4358 }4391 }
43594392
4360 buf_pool_mutex_exit(buf_pool);
4361
4362 return(TRUE);4393 return(TRUE);
4363}4394}
43644395
@@ -4372,7 +4403,9 @@
4372{4403{
4373 ulint i;4404 ulint i;
43744405
4375 buf_pool_mutex_enter(buf_pool);4406 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
4407
4408 mutex_enter(&buf_pool->flush_state_mutex);
43764409
4377 for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {4410 for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {
43784411
@@ -4390,21 +4423,20 @@
4390 if (buf_pool->n_flush[i] > 0) {4423 if (buf_pool->n_flush[i] > 0) {
4391 buf_flush_t type = static_cast<buf_flush_t>(i);4424 buf_flush_t type = static_cast<buf_flush_t>(i);
43924425
4393 buf_pool_mutex_exit(buf_pool);4426 mutex_exit(&buf_pool->flush_state_mutex);
4394 buf_flush_wait_batch_end(buf_pool, type);4427 buf_flush_wait_batch_end(buf_pool, type);
4395 buf_pool_mutex_enter(buf_pool);4428 mutex_enter(&buf_pool->flush_state_mutex);
4396 }4429 }
4397 }4430 }
43984431 mutex_exit(&buf_pool->flush_state_mutex);
4399 buf_pool_mutex_exit(buf_pool);
44004432
4401 ut_ad(buf_all_freed_instance(buf_pool));4433 ut_ad(buf_all_freed_instance(buf_pool));
44024434
4403 buf_pool_mutex_enter(buf_pool);
4404
4405 while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) {4435 while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) {
4406 }4436 }
44074437
4438 mutex_enter(&buf_pool->LRU_list_mutex);
4439
4408 ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);4440 ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
4409 ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);4441 ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);
44104442
@@ -4412,10 +4444,10 @@
4412 buf_pool->LRU_old = NULL;4444 buf_pool->LRU_old = NULL;
4413 buf_pool->LRU_old_len = 0;4445 buf_pool->LRU_old_len = 0;
44144446
4447 mutex_exit(&buf_pool->LRU_list_mutex);
4448
4415 memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat));4449 memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat));
4416 buf_refresh_io_stats(buf_pool);4450 buf_refresh_io_stats(buf_pool);
4417
4418 buf_pool_mutex_exit(buf_pool);
4419}4451}
44204452
4421/*********************************************************************//**4453/*********************************************************************//**
@@ -4460,8 +4492,11 @@
44604492
4461 ut_ad(buf_pool);4493 ut_ad(buf_pool);
44624494
4463 buf_pool_mutex_enter(buf_pool);4495 mutex_enter(&buf_pool->LRU_list_mutex);
4464 hash_lock_x_all(buf_pool->page_hash);4496 hash_lock_x_all(buf_pool->page_hash);
4497 mutex_enter(&buf_pool->zip_mutex);
4498 mutex_enter(&buf_pool->free_list_mutex);
4499 mutex_enter(&buf_pool->flush_state_mutex);
44654500
4466 chunk = buf_pool->chunks;4501 chunk = buf_pool->chunks;
44674502
@@ -4474,8 +4509,6 @@
44744509
4475 for (j = chunk->size; j--; block++) {4510 for (j = chunk->size; j--; block++) {
44764511
4477 mutex_enter(&block->mutex);
4478
4479 switch (buf_block_get_state(block)) {4512 switch (buf_block_get_state(block)) {
4480 case BUF_BLOCK_POOL_WATCH:4513 case BUF_BLOCK_POOL_WATCH:
4481 case BUF_BLOCK_ZIP_PAGE:4514 case BUF_BLOCK_ZIP_PAGE:
@@ -4486,6 +4519,7 @@
4486 break;4519 break;
44874520
4488 case BUF_BLOCK_FILE_PAGE:4521 case BUF_BLOCK_FILE_PAGE:
4522
4489 space = buf_block_get_space(block);4523 space = buf_block_get_space(block);
4490 offset = buf_block_get_page_no(block);4524 offset = buf_block_get_page_no(block);
4491 fold = buf_page_address_fold(space, offset);4525 fold = buf_page_address_fold(space, offset);
@@ -4496,14 +4530,15 @@
4496 == &block->page);4530 == &block->page);
44974531
4498#ifdef UNIV_IBUF_COUNT_DEBUG4532#ifdef UNIV_IBUF_COUNT_DEBUG
4499 ut_a(buf_page_get_io_fix(&block->page)4533 ut_a(buf_page_get_io_fix_unlocked(&block->page)
4500 == BUF_IO_READ4534 == BUF_IO_READ
4501 || !ibuf_count_get(buf_block_get_space(4535 || !ibuf_count_get(buf_block_get_space(
4502 block),4536 block),
4503 buf_block_get_page_no(4537 buf_block_get_page_no(
4504 block)));4538 block)));
4505#endif4539#endif
4506 switch (buf_page_get_io_fix(&block->page)) {4540 switch (buf_page_get_io_fix_unlocked(
4541 &block->page)) {
4507 case BUF_IO_NONE:4542 case BUF_IO_NONE:
4508 break;4543 break;
45094544
@@ -4511,17 +4546,8 @@
4511 switch (buf_page_get_flush_type(4546 switch (buf_page_get_flush_type(
4512 &block->page)) {4547 &block->page)) {
4513 case BUF_FLUSH_LRU:4548 case BUF_FLUSH_LRU:
4514 n_lru_flush++;
4515 goto assert_s_latched;
4516 case BUF_FLUSH_SINGLE_PAGE:4549 case BUF_FLUSH_SINGLE_PAGE:
4517 n_page_flush++;
4518assert_s_latched:
4519 ut_a(rw_lock_is_locked(
4520 &block->lock,
4521 RW_LOCK_SHARED));
4522 break;
4523 case BUF_FLUSH_LIST:4550 case BUF_FLUSH_LIST:
4524 n_list_flush++;
4525 break;4551 break;
4526 default:4552 default:
4527 ut_error;4553 ut_error;
@@ -4552,13 +4578,9 @@
4552 /* do nothing */4578 /* do nothing */
4553 break;4579 break;
4554 }4580 }
4555
4556 mutex_exit(&block->mutex);
4557 }4581 }
4558 }4582 }
45594583
4560 mutex_enter(&buf_pool->zip_mutex);
4561
4562 /* Check clean compressed-only blocks. */4584 /* Check clean compressed-only blocks. */
45634585
4564 for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;4586 for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
@@ -4604,7 +4626,9 @@
4604 case BUF_BLOCK_ZIP_DIRTY:4626 case BUF_BLOCK_ZIP_DIRTY:
4605 n_lru++;4627 n_lru++;
4606 n_zip++;4628 n_zip++;
4607 switch (buf_page_get_io_fix(b)) {4629 /* fallthrough */
4630 case BUF_BLOCK_FILE_PAGE:
4631 switch (buf_page_get_io_fix_unlocked(b)) {
4608 case BUF_IO_NONE:4632 case BUF_IO_NONE:
4609 case BUF_IO_READ:4633 case BUF_IO_READ:
4610 case BUF_IO_PIN:4634 case BUF_IO_PIN:
@@ -4624,11 +4648,10 @@
4624 ut_error;4648 ut_error;
4625 }4649 }
4626 break;4650 break;
4651 default:
4652 ut_error;
4627 }4653 }
4628 break;4654 break;
4629 case BUF_BLOCK_FILE_PAGE:
4630 /* uncompressed page */
4631 break;
4632 case BUF_BLOCK_POOL_WATCH:4655 case BUF_BLOCK_POOL_WATCH:
4633 case BUF_BLOCK_ZIP_PAGE:4656 case BUF_BLOCK_ZIP_PAGE:
4634 case BUF_BLOCK_NOT_USED:4657 case BUF_BLOCK_NOT_USED:
@@ -4658,6 +4681,9 @@
4658 }4681 }
46594682
4660 ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);4683 ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
4684
4685 mutex_exit(&buf_pool->LRU_list_mutex);
4686
4661 if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {4687 if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
4662 fprintf(stderr, "Free list len %lu, free blocks %lu\n",4688 fprintf(stderr, "Free list len %lu, free blocks %lu\n",
4663 (ulong) UT_LIST_GET_LEN(buf_pool->free),4689 (ulong) UT_LIST_GET_LEN(buf_pool->free),
@@ -4665,11 +4691,13 @@
4665 ut_error;4691 ut_error;
4666 }4692 }
46674693
4694 mutex_exit(&buf_pool->free_list_mutex);
4695
4668 ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);4696 ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
4669 ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);4697 ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
4670 ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush);4698 ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush);
46714699
4672 buf_pool_mutex_exit(buf_pool);4700 mutex_exit(&buf_pool->flush_state_mutex);
46734701
4674 ut_a(buf_LRU_validate());4702 ut_a(buf_LRU_validate());
4675 ut_a(buf_flush_validate(buf_pool));4703 ut_a(buf_flush_validate(buf_pool));
@@ -4727,8 +4755,7 @@
47274755
4728 counts = static_cast<ulint*>(mem_alloc(sizeof(ulint) * size));4756 counts = static_cast<ulint*>(mem_alloc(sizeof(ulint) * size));
47294757
4730 buf_pool_mutex_enter(buf_pool);4758 /* Dirty reads below */
4731 buf_flush_list_mutex_enter(buf_pool);
47324759
4733 fprintf(stderr,4760 fprintf(stderr,
4734 "buf_pool size %lu\n"4761 "buf_pool size %lu\n"
@@ -4755,12 +4782,12 @@
4755 (ulong) buf_pool->stat.n_pages_created,4782 (ulong) buf_pool->stat.n_pages_created,
4756 (ulong) buf_pool->stat.n_pages_written);4783 (ulong) buf_pool->stat.n_pages_written);
47574784
4758 buf_flush_list_mutex_exit(buf_pool);
4759
4760 /* Count the number of blocks belonging to each index in the buffer */4785 /* Count the number of blocks belonging to each index in the buffer */
47614786
4762 n_found = 0;4787 n_found = 0;
47634788
4789 mutex_enter(&buf_pool->LRU_list_mutex);
4790
4764 chunk = buf_pool->chunks;4791 chunk = buf_pool->chunks;
47654792
4766 for (i = buf_pool->n_chunks; i--; chunk++) {4793 for (i = buf_pool->n_chunks; i--; chunk++) {
@@ -4796,7 +4823,7 @@
4796 }4823 }
4797 }4824 }
47984825
4799 buf_pool_mutex_exit(buf_pool);4826 mutex_exit(&buf_pool->LRU_list_mutex);
48004827
4801 for (i = 0; i < n_found; i++) {4828 for (i = 0; i < n_found; i++) {
4802 index = dict_index_get_if_in_cache(index_ids[i]);4829 index = dict_index_get_if_in_cache(index_ids[i]);
@@ -4853,7 +4880,8 @@
4853 buf_chunk_t* chunk;4880 buf_chunk_t* chunk;
4854 ulint fixed_pages_number = 0;4881 ulint fixed_pages_number = 0;
48554882
4856 buf_pool_mutex_enter(buf_pool);4883 /* The LRU list mutex is enough to protect the required fields below */
4884 mutex_enter(&buf_pool->LRU_list_mutex);
48574885
4858 chunk = buf_pool->chunks;4886 chunk = buf_pool->chunks;
48594887
@@ -4870,18 +4898,17 @@
4870 continue;4898 continue;
4871 }4899 }
48724900
4873 mutex_enter(&block->mutex);
4874
4875 if (block->page.buf_fix_count != 04901 if (block->page.buf_fix_count != 0
4876 || buf_page_get_io_fix(&block->page)4902 || buf_page_get_io_fix_unlocked(&block->page)
4877 != BUF_IO_NONE) {4903 != BUF_IO_NONE) {
4878 fixed_pages_number++;4904 fixed_pages_number++;
4879 }4905 }
48804906
4881 mutex_exit(&block->mutex);
4882 }4907 }
4883 }4908 }
48844909
4910 mutex_exit(&buf_pool->LRU_list_mutex);
4911
4885 mutex_enter(&buf_pool->zip_mutex);4912 mutex_enter(&buf_pool->zip_mutex);
48864913
4887 /* Traverse the lists of clean and dirty compressed-only blocks. */4914 /* Traverse the lists of clean and dirty compressed-only blocks. */
@@ -4925,7 +4952,6 @@
49254952
4926 buf_flush_list_mutex_exit(buf_pool);4953 buf_flush_list_mutex_exit(buf_pool);
4927 mutex_exit(&buf_pool->zip_mutex);4954 mutex_exit(&buf_pool->zip_mutex);
4928 buf_pool_mutex_exit(buf_pool);
49294955
4930 return(fixed_pages_number);4956 return(fixed_pages_number);
4931}4957}
@@ -5073,9 +5099,6 @@
5073 /* Find appropriate pool_info to store stats for this buffer pool */5099 /* Find appropriate pool_info to store stats for this buffer pool */
5074 pool_info = &all_pool_info[pool_id];5100 pool_info = &all_pool_info[pool_id];
50755101
5076 buf_pool_mutex_enter(buf_pool);
5077 buf_flush_list_mutex_enter(buf_pool);
5078
5079 pool_info->pool_unique_id = pool_id;5102 pool_info->pool_unique_id = pool_id;
50805103
5081 pool_info->pool_size = buf_pool->curr_size;5104 pool_info->pool_size = buf_pool->curr_size;
@@ -5094,6 +5117,8 @@
50945117
5095 pool_info->n_pend_reads = buf_pool->n_pend_reads;5118 pool_info->n_pend_reads = buf_pool->n_pend_reads;
50965119
5120 mutex_enter(&buf_pool->flush_state_mutex);
5121
5097 pool_info->n_pending_flush_lru =5122 pool_info->n_pending_flush_lru =
5098 (buf_pool->n_flush[BUF_FLUSH_LRU]5123 (buf_pool->n_flush[BUF_FLUSH_LRU]
5099 + buf_pool->init_flush[BUF_FLUSH_LRU]);5124 + buf_pool->init_flush[BUF_FLUSH_LRU]);
@@ -5106,7 +5131,7 @@
5106 (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]5131 (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
5107 + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]);5132 + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]);
51085133
5109 buf_flush_list_mutex_exit(buf_pool);5134 mutex_exit(&buf_pool->flush_state_mutex);
51105135
5111 current_time = time(NULL);5136 current_time = time(NULL);
5112 time_elapsed = 0.001 + difftime(current_time,5137 time_elapsed = 0.001 + difftime(current_time,
@@ -5189,7 +5214,6 @@
5189 pool_info->unzip_cur = buf_LRU_stat_cur.unzip;5214 pool_info->unzip_cur = buf_LRU_stat_cur.unzip;
51905215
5191 buf_refresh_io_stats(buf_pool);5216 buf_refresh_io_stats(buf_pool);
5192 buf_pool_mutex_exit(buf_pool);
5193}5217}
51945218
5195/*********************************************************************//**5219/*********************************************************************//**
@@ -5398,22 +5422,22 @@
5398 ulint i;5422 ulint i;
5399 ulint pending_io = 0;5423 ulint pending_io = 0;
54005424
5401 buf_pool_mutex_enter_all();
5402
5403 for (i = 0; i < srv_buf_pool_instances; i++) {5425 for (i = 0; i < srv_buf_pool_instances; i++) {
5404 const buf_pool_t* buf_pool;5426 buf_pool_t* buf_pool;
54055427
5406 buf_pool = buf_pool_from_array(i);5428 buf_pool = buf_pool_from_array(i);
54075429
5408 pending_io += buf_pool->n_pend_reads5430 pending_io += buf_pool->n_pend_reads;
5409 + buf_pool->n_flush[BUF_FLUSH_LRU]5431
5410 + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]5432 mutex_enter(&buf_pool->flush_state_mutex);
5411 + buf_pool->n_flush[BUF_FLUSH_LIST];5433
54125434 pending_io += buf_pool->n_flush[BUF_FLUSH_LRU];
5435 pending_io += buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE];
5436 pending_io += buf_pool->n_flush[BUF_FLUSH_LIST];
5437
5438 mutex_exit(&buf_pool->flush_state_mutex);
5413 }5439 }
54145440
5415 buf_pool_mutex_exit_all();
5416
5417 return(pending_io);5441 return(pending_io);
5418}5442}
54195443
@@ -5429,11 +5453,11 @@
5429{5453{
5430 ulint len;5454 ulint len;
54315455
5432 buf_pool_mutex_enter(buf_pool);5456 mutex_enter(&buf_pool->free_list_mutex);
54335457
5434 len = UT_LIST_GET_LEN(buf_pool->free);5458 len = UT_LIST_GET_LEN(buf_pool->free);
54355459
5436 buf_pool_mutex_exit(buf_pool);5460 mutex_exit(&buf_pool->free_list_mutex);
54375461
5438 return(len);5462 return(len);
5439}5463}
54405464
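The recurring shape of the buf0buf.cc changes above is the split of the single buf_pool->mutex into narrower mutexes (LRU_list_mutex, free_list_mutex, zip_mutex, flush_state_mutex), and every call site that needs two of them must now respect the new latching order. Because buf_pool->LRU_list_mutex ranks above block->mutex, a thread holding only block->mutex first buffer-fixes the block, drops block->mutex, takes the list mutex, and re-takes block->mutex; that is the dance around buf_block_buf_fix_inc()/buf_block_buf_fix_dec() in buf_page_get_gen() above. A minimal sketch of the pattern follows; the function name is hypothetical and the real code additionally revalidates page_hash and io-fix state after the unlocked gap:

	/* Sketch only: acquire a higher-ranked mutex while holding a
	lower-ranked one, without violating the latching order. */
	static void
	lock_lru_and_block(buf_pool_t* buf_pool, buf_block_t* block)
	{
		mutex_enter(&block->mutex);
		buf_block_buf_fix_inc(block, __FILE__, __LINE__);
		mutex_exit(&block->mutex);	/* must not be held here */

		mutex_enter(&buf_pool->LRU_list_mutex);
		mutex_enter(&block->mutex);	/* re-taken in order */
		buf_block_buf_fix_dec(block);

		/* The buffer fix kept the block from being evicted or
		relocated while no mutex was held; any state read before
		the gap must still be rechecked at this point. */

		mutex_exit(&block->mutex);
		mutex_exit(&buf_pool->LRU_list_mutex);
	}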
=== modified file 'Percona-Server/storage/innobase/buf/buf0dblwr.cc'
--- Percona-Server/storage/innobase/buf/buf0dblwr.cc 2013-06-20 15:16:00 +0000
+++ Percona-Server/storage/innobase/buf/buf0dblwr.cc 2013-09-20 05:29:11 +0000
@@ -936,6 +936,7 @@
 	ulint	zip_size;
 
 	ut_a(buf_page_in_file(bpage));
+	ut_ad(!mutex_own(&buf_pool_from_bpage(bpage)->LRU_list_mutex));
 
try_again:
 	mutex_enter(&buf_dblwr->mutex);
 
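The one-line buf0dblwr.cc change is the flip side of the same discipline: a function that acquires a mutex asserts, in debug builds, that the caller does not already hold a mutex that would invert the latching order, so an ordering bug fails fast as an assertion rather than as a rare deadlock. An illustration under stated assumptions; the helper name below is made up:

	/* Sketch: document a latching precondition with a debug assert.
	ut_ad() compiles to nothing in release builds. */
	static void
	my_dblwr_stage_page(buf_page_t* bpage)	/* hypothetical helper */
	{
		/* Taking buf_dblwr->mutex while holding the LRU list
		mutex would invert the order assumed elsewhere. */
		ut_ad(!mutex_own(&buf_pool_from_bpage(bpage)->LRU_list_mutex));

		mutex_enter(&buf_dblwr->mutex);
		/* ... stage the page into the doublewrite buffer ... */
		mutex_exit(&buf_dblwr->mutex);
	}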
=== modified file 'Percona-Server/storage/innobase/buf/buf0dump.cc'
--- Percona-Server/storage/innobase/buf/buf0dump.cc 2012-12-04 08:24:59 +0000
+++ Percona-Server/storage/innobase/buf/buf0dump.cc 2013-09-20 05:29:11 +0000
@@ -28,7 +28,7 @@
 #include <stdarg.h> /* va_* */
 #include <string.h> /* strerror() */
 
-#include "buf0buf.h" /* buf_pool_mutex_enter(), srv_buf_pool_instances */
+#include "buf0buf.h" /* srv_buf_pool_instances */
 #include "buf0dump.h"
 #include "db0err.h"
 #include "dict0dict.h" /* dict_operation_lock */
@@ -58,8 +58,8 @@
 static ibool	buf_load_abort_flag = FALSE;
 
 /* Used to temporary store dump info in order to avoid IO while holding
-buffer pool mutex during dump and also to sort the contents of the dump
-before reading the pages from disk during load.
+buffer pool LRU list mutex during dump and also to sort the contents of the
+dump before reading the pages from disk during load.
 We store the space id in the high 32 bits and page no in low 32 bits. */
 typedef ib_uint64_t	buf_dump_t;
 
@@ -218,15 +218,15 @@
 
 		buf_pool = buf_pool_from_array(i);
 
-		/* obtain buf_pool mutex before allocate, since
+		/* obtain buf_pool LRU list mutex before allocate, since
 		UT_LIST_GET_LEN(buf_pool->LRU) could change */
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->LRU_list_mutex);
 
 		n_pages = UT_LIST_GET_LEN(buf_pool->LRU);
 
 		/* skip empty buffer pools */
 		if (n_pages == 0) {
-			buf_pool_mutex_exit(buf_pool);
+			mutex_exit(&buf_pool->LRU_list_mutex);
 			continue;
 		}
 
@@ -234,7 +234,7 @@
			ut_malloc(n_pages * sizeof(*dump))) ;
 
 		if (dump == NULL) {
-			buf_pool_mutex_exit(buf_pool);
+			mutex_exit(&buf_pool->LRU_list_mutex);
 			fclose(f);
 			buf_dump_status(STATUS_ERR,
					"Cannot allocate " ULINTPF " bytes: %s",
@@ -256,7 +256,7 @@
 
 		ut_a(j == n_pages);
 
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->LRU_list_mutex);
 
 		for (j = 0; j < n_pages && !SHOULD_QUIT(); j++) {
			ret = fprintf(f, ULINTPF "," ULINTPF "\n",
 
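The buf0dump.cc hunks keep the pre-existing structure and only narrow the lock: the LRU list mutex is held just long enough to snapshot the (space, page) ids of the LRU into a heap array, and all file I/O happens after it is released. A condensed sketch of that pattern, with error handling omitted; BUF_DUMP_CREATE/BUF_DUMP_SPACE/BUF_DUMP_PAGE stand in for the real space/page packing macros:

	/* Sketch: snapshot the LRU list under its mutex, then do the
	slow I/O with no buffer pool mutex held at all. */
	mutex_enter(&buf_pool->LRU_list_mutex);

	n_pages = UT_LIST_GET_LEN(buf_pool->LRU);
	dump = static_cast<buf_dump_t*>(ut_malloc(n_pages * sizeof(*dump)));

	for (bpage = UT_LIST_GET_FIRST(buf_pool->LRU), j = 0;
	     bpage != NULL;
	     bpage = UT_LIST_GET_NEXT(LRU, bpage), j++) {
		dump[j] = BUF_DUMP_CREATE(bpage->space, bpage->offset);
	}

	mutex_exit(&buf_pool->LRU_list_mutex);

	for (j = 0; j < n_pages; j++) {	/* I/O runs without the mutex */
		fprintf(f, ULINTPF "," ULINTPF "\n",
			BUF_DUMP_SPACE(dump[j]), BUF_DUMP_PAGE(dump[j]));
	}
	ut_free(dump);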
=== modified file 'Percona-Server/storage/innobase/buf/buf0flu.cc'
--- Percona-Server/storage/innobase/buf/buf0flu.cc 2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/buf/buf0flu.cc 2013-09-20 05:29:11 +0000
@@ -350,7 +350,6 @@
350 buf_block_t* block, /*!< in/out: block which is modified */350 buf_block_t* block, /*!< in/out: block which is modified */
351 lsn_t lsn) /*!< in: oldest modification */351 lsn_t lsn) /*!< in: oldest modification */
352{352{
353 ut_ad(!buf_pool_mutex_own(buf_pool));
354 ut_ad(log_flush_order_mutex_own());353 ut_ad(log_flush_order_mutex_own());
355 ut_ad(mutex_own(&block->mutex));354 ut_ad(mutex_own(&block->mutex));
356355
@@ -409,15 +408,14 @@
409 buf_page_t* prev_b;408 buf_page_t* prev_b;
410 buf_page_t* b;409 buf_page_t* b;
411410
412 ut_ad(!buf_pool_mutex_own(buf_pool));
413 ut_ad(log_flush_order_mutex_own());411 ut_ad(log_flush_order_mutex_own());
414 ut_ad(mutex_own(&block->mutex));412 ut_ad(mutex_own(&block->mutex));
415 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);413 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
416414
417 buf_flush_list_mutex_enter(buf_pool);415 buf_flush_list_mutex_enter(buf_pool);
418416
419 /* The field in_LRU_list is protected by buf_pool->mutex, which417 /* The field in_LRU_list is protected by buf_pool->LRU_list_mutex,
420 we are not holding. However, while a block is in the flush418 which we are not holding. However, while a block is in the flush
421 list, it is dirty and cannot be discarded, not from the419 list, it is dirty and cannot be discarded, not from the
422 page_hash or from the LRU list. At most, the uncompressed420 page_hash or from the LRU list. At most, the uncompressed
423 page frame of a compressed block may be discarded or created421 page frame of a compressed block may be discarded or created
@@ -501,7 +499,7 @@
501{499{
502#ifdef UNIV_DEBUG500#ifdef UNIV_DEBUG
503 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);501 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
504 ut_ad(buf_pool_mutex_own(buf_pool));502 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
505#endif503#endif
506 ut_ad(mutex_own(buf_page_get_mutex(bpage)));504 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
507 ut_ad(bpage->in_LRU_list);505 ut_ad(bpage->in_LRU_list);
@@ -535,17 +533,13 @@
535 buf_page_in_file(bpage) */533 buf_page_in_file(bpage) */
536 buf_flush_t flush_type)/*!< in: type of flush */534 buf_flush_t flush_type)/*!< in: type of flush */
537{535{
538#ifdef UNIV_DEBUG536 ut_ad(flush_type < BUF_FLUSH_N_TYPES);
539 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);537 ut_ad(mutex_own(buf_page_get_mutex(bpage))
540 ut_ad(buf_pool_mutex_own(buf_pool));538 || flush_type == BUF_FLUSH_LIST);
541#endif /* UNIV_DEBUG */
542
543 ut_a(buf_page_in_file(bpage));539 ut_a(buf_page_in_file(bpage));
544 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
545 ut_ad(flush_type < BUF_FLUSH_N_TYPES);
546540
547 if (bpage->oldest_modification == 0541 if (bpage->oldest_modification == 0
548 || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {542 || buf_page_get_io_fix_unlocked(bpage) != BUF_IO_NONE) {
549 return(false);543 return(false);
550 }544 }
551545
@@ -583,8 +577,11 @@
583 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);577 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
584 ulint zip_size;578 ulint zip_size;
585579
586 ut_ad(buf_pool_mutex_own(buf_pool));
587 ut_ad(mutex_own(buf_page_get_mutex(bpage)));580 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
581#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
582 ut_ad(buf_page_get_state(bpage) != BUF_BLOCK_ZIP_DIRTY
583 || mutex_own(&buf_pool->LRU_list_mutex));
584#endif
588 ut_ad(bpage->in_flush_list);585 ut_ad(bpage->in_flush_list);
589586
590 buf_flush_list_mutex_enter(buf_pool);587 buf_flush_list_mutex_enter(buf_pool);
@@ -655,7 +652,6 @@
655 buf_page_t* prev_b = NULL;652 buf_page_t* prev_b = NULL;
656 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);653 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
657654
658 ut_ad(buf_pool_mutex_own(buf_pool));
659 /* Must reside in the same buffer pool. */655 /* Must reside in the same buffer pool. */
660 ut_ad(buf_pool == buf_pool_from_bpage(dpage));656 ut_ad(buf_pool == buf_pool_from_bpage(dpage));
661657
@@ -663,13 +659,6 @@
663659
664 buf_flush_list_mutex_enter(buf_pool);660 buf_flush_list_mutex_enter(buf_pool);
665661
666 /* FIXME: At this point we have both buf_pool and flush_list
667 mutexes. Theoretically removal of a block from flush list is
668 only covered by flush_list mutex but currently we do
669 have buf_pool mutex in buf_flush_remove() therefore this block
670 is guaranteed to be in the flush list. We need to check if
671 this will work without the assumption of block removing code
672 having the buf_pool mutex. */
673 ut_ad(bpage->in_flush_list);662 ut_ad(bpage->in_flush_list);
674 ut_ad(dpage->in_flush_list);663 ut_ad(dpage->in_flush_list);
675664
@@ -720,14 +709,15 @@
720/*=====================*/709/*=====================*/
721 buf_page_t* bpage) /*!< in: pointer to the block in question */710 buf_page_t* bpage) /*!< in: pointer to the block in question */
722{711{
723 buf_flush_t flush_type;712 buf_flush_t flush_type = buf_page_get_flush_type(bpage);
724 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);713 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
725714
726 ut_ad(bpage);715 mutex_enter(&buf_pool->flush_state_mutex);
727716
728 buf_flush_remove(bpage);717 buf_flush_remove(bpage);
729718
730 flush_type = buf_page_get_flush_type(bpage);719 buf_page_set_io_fix(bpage, BUF_IO_NONE);
720
731 buf_pool->n_flush[flush_type]--;721 buf_pool->n_flush[flush_type]--;
732722
733 /* fprintf(stderr, "n pending flush %lu\n",723 /* fprintf(stderr, "n pending flush %lu\n",
@@ -742,6 +732,8 @@
742 }732 }
743733
744 buf_dblwr_update(bpage, flush_type);734 buf_dblwr_update(bpage, flush_type);
735
736 mutex_exit(&buf_pool->flush_state_mutex);
745}737}
746#endif /* !UNIV_HOTBACKUP */738#endif /* !UNIV_HOTBACKUP */
747739
@@ -890,7 +882,7 @@
890882
891#ifdef UNIV_DEBUG883#ifdef UNIV_DEBUG
892 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);884 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
893 ut_ad(!buf_pool_mutex_own(buf_pool));885 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
894#endif886#endif
895887
896#ifdef UNIV_LOG_DEBUG888#ifdef UNIV_LOG_DEBUG
@@ -899,15 +891,14 @@
899891
900 ut_ad(buf_page_in_file(bpage));892 ut_ad(buf_page_in_file(bpage));
901893
902 /* We are not holding buf_pool->mutex or block_mutex here.894 /* We are not holding block_mutex here.
903 Nevertheless, it is safe to access bpage, because it is895 Nevertheless, it is safe to access bpage, because it is
904 io_fixed and oldest_modification != 0. Thus, it cannot be896 io_fixed and oldest_modification != 0. Thus, it cannot be
905 relocated in the buffer pool or removed from flush_list or897 relocated in the buffer pool or removed from flush_list or
906 LRU_list. */898 LRU_list. */
907 ut_ad(!buf_pool_mutex_own(buf_pool));
908 ut_ad(!buf_flush_list_mutex_own(buf_pool));899 ut_ad(!buf_flush_list_mutex_own(buf_pool));
909 ut_ad(!mutex_own(buf_page_get_mutex(bpage)));900 ut_ad(!mutex_own(buf_page_get_mutex(bpage)));
910 ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_WRITE);901 ut_ad(buf_page_get_io_fix_unlocked(bpage) == BUF_IO_WRITE);
911 ut_ad(bpage->oldest_modification != 0);902 ut_ad(bpage->oldest_modification != 0);
912903
913#ifdef UNIV_IBUF_COUNT_DEBUG904#ifdef UNIV_IBUF_COUNT_DEBUG
@@ -989,9 +980,8 @@
989Writes a flushable page asynchronously from the buffer pool to a file.980Writes a flushable page asynchronously from the buffer pool to a file.
990NOTE: in simulated aio we must call981NOTE: in simulated aio we must call
991os_aio_simulated_wake_handler_threads after we have posted a batch of982os_aio_simulated_wake_handler_threads after we have posted a batch of
992writes! NOTE: buf_pool->mutex and buf_page_get_mutex(bpage) must be983writes! NOTE: buf_page_get_mutex(bpage) must be held upon entering this
993held upon entering this function, and they will be released by this984function, and it will be released by this function. */
994function. */
995UNIV_INTERN985UNIV_INTERN
996void986void
997buf_flush_page(987buf_flush_page(
@@ -1005,7 +995,7 @@
1005 ibool is_uncompressed;995 ibool is_uncompressed;
1006996
1007 ut_ad(flush_type < BUF_FLUSH_N_TYPES);997 ut_ad(flush_type < BUF_FLUSH_N_TYPES);
1008 ut_ad(buf_pool_mutex_own(buf_pool));998 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
1009 ut_ad(buf_page_in_file(bpage));999 ut_ad(buf_page_in_file(bpage));
1010 ut_ad(!sync || flush_type == BUF_FLUSH_SINGLE_PAGE);1000 ut_ad(!sync || flush_type == BUF_FLUSH_SINGLE_PAGE);
10111001
@@ -1014,6 +1004,8 @@
10141004
1015 ut_ad(buf_flush_ready_for_flush(bpage, flush_type));1005 ut_ad(buf_flush_ready_for_flush(bpage, flush_type));
10161006
1007 mutex_enter(&buf_pool->flush_state_mutex);
1008
1017 buf_page_set_io_fix(bpage, BUF_IO_WRITE);1009 buf_page_set_io_fix(bpage, BUF_IO_WRITE);
10181010
1019 buf_page_set_flush_type(bpage, flush_type);1011 buf_page_set_flush_type(bpage, flush_type);
@@ -1025,6 +1017,8 @@
10251017
1026 buf_pool->n_flush[flush_type]++;1018 buf_pool->n_flush[flush_type]++;
10271019
1020 mutex_exit(&buf_pool->flush_state_mutex);
1021
1028 is_uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);1022 is_uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
1029 ut_ad(is_uncompressed == (block_mutex != &buf_pool->zip_mutex));1023 ut_ad(is_uncompressed == (block_mutex != &buf_pool->zip_mutex));
10301024
@@ -1042,7 +1036,6 @@
1042 }1036 }
10431037
1044 mutex_exit(block_mutex);1038 mutex_exit(block_mutex);
1045 buf_pool_mutex_exit(buf_pool);
10461039
1047 /* Even though bpage is not protected by any mutex at1040 /* Even though bpage is not protected by any mutex at
1048 this point, it is safe to access bpage, because it is1041 this point, it is safe to access bpage, because it is
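
With the pool mutex gone, buf_flush_page() now serializes only the flush bookkeeping on the new flush_state_mutex and drops the block mutex before posting the write. A condensed sketch of that contract, using std::mutex stand-ins rather than the real InnoDB types:

    #include <mutex>

    struct pool_model {
        std::mutex flush_state_mutex;   // guards n_flush[] and init_flush[]
        int n_flush[3] = {0, 0, 0};
    };

    // Caller holds block_mutex on entry; this function releases it, as the
    // rewritten header comment in the patch states. flush_type is 0..2.
    void flush_page_model(pool_model& pool, std::mutex& block_mutex,
                          int flush_type) {
        {
            std::lock_guard<std::mutex> guard(pool.flush_state_mutex);
            // set BUF_IO_WRITE and record the flush type here ...
            ++pool.n_flush[flush_type];
        }
        block_mutex.unlock();  // the io-fix keeps the page pinned from here on
        // ... post the asynchronous write with no pool-level mutex held ...
    }
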
@@ -1080,11 +1073,10 @@
1080 }1073 }
10811074
1082 /* Note that the s-latch is acquired before releasing the1075 /* Note that the s-latch is acquired before releasing the
1083 buf_pool mutex: this ensures that the latch is acquired1076 buf_page_get_mutex() mutex: this ensures that the latch is
1084 immediately. */1077 acquired immediately. */
10851078
1086 mutex_exit(block_mutex);1079 mutex_exit(block_mutex);
1087 buf_pool_mutex_exit(buf_pool);
1088 break;1080 break;
10891081
1090 default:1082 default:
@@ -1109,9 +1101,9 @@
1109# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG1101# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
1110/********************************************************************//**1102/********************************************************************//**
1111Writes a flushable page asynchronously from the buffer pool to a file.1103Writes a flushable page asynchronously from the buffer pool to a file.
1112NOTE: buf_pool->mutex and block->mutex must be held upon entering this1104NOTE: block->mutex must be held upon entering this function, and it will be
1113function, and they will be released by this function after flushing.1105released by this function after flushing. This is loosely based on
1114This is loosely based on buf_flush_batch() and buf_flush_page().1106buf_flush_batch() and buf_flush_page().
1115@return TRUE if the page was flushed and the mutexes released */1107@return TRUE if the page was flushed and the mutex released */
1116UNIV_INTERN1108UNIV_INTERN
1117ibool1109ibool
@@ -1120,7 +1112,6 @@
1120 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */1112 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
1121 buf_block_t* block) /*!< in/out: buffer control block */1113 buf_block_t* block) /*!< in/out: buffer control block */
1122{1114{
1123 ut_ad(buf_pool_mutex_own(buf_pool));
1124 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);1115 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
1125 ut_ad(mutex_own(&block->mutex));1116 ut_ad(mutex_own(&block->mutex));
11261117
@@ -1149,21 +1140,27 @@
1149 buf_page_t* bpage;1140 buf_page_t* bpage;
1150 buf_pool_t* buf_pool = buf_pool_get(space, offset);1141 buf_pool_t* buf_pool = buf_pool_get(space, offset);
1151 bool ret;1142 bool ret;
1143 rw_lock_t* hash_lock;
1144 ib_mutex_t* block_mutex;
11521145
1153 ut_ad(flush_type == BUF_FLUSH_LRU1146 ut_ad(flush_type == BUF_FLUSH_LRU
1154 || flush_type == BUF_FLUSH_LIST);1147 || flush_type == BUF_FLUSH_LIST);
11551148
1156 buf_pool_mutex_enter(buf_pool);
1157
1158 /* We only want to flush pages from this buffer pool. */1149 /* We only want to flush pages from this buffer pool. */
1159 bpage = buf_page_hash_get(buf_pool, space, offset);1150 bpage = buf_page_hash_get_s_locked(buf_pool, space, offset,
1151 &hash_lock);
11601152
1161 if (!bpage) {1153 if (!bpage) {
11621154
1163 buf_pool_mutex_exit(buf_pool);
1164 return(false);1155 return(false);
1165 }1156 }
11661157
1158 block_mutex = buf_page_get_mutex(bpage);
1159
1160 mutex_enter(block_mutex);
1161
1162 rw_lock_s_unlock(hash_lock);
1163
1167 ut_a(buf_page_in_file(bpage));1164 ut_a(buf_page_in_file(bpage));
11681165
1169 /* We avoid flushing 'non-old' blocks in an LRU flush,1166 /* We avoid flushing 'non-old' blocks in an LRU flush,
@@ -1171,15 +1168,13 @@
11711168
1172 ret = false;1169 ret = false;
1173 if (flush_type != BUF_FLUSH_LRU || buf_page_is_old(bpage)) {1170 if (flush_type != BUF_FLUSH_LRU || buf_page_is_old(bpage)) {
1174 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
11751171
1176 mutex_enter(block_mutex);
1177 if (buf_flush_ready_for_flush(bpage, flush_type)) {1172 if (buf_flush_ready_for_flush(bpage, flush_type)) {
1178 ret = true;1173 ret = true;
1179 }1174 }
1180 mutex_exit(block_mutex);
1181 }1175 }
1182 buf_pool_mutex_exit(buf_pool);1176
1177 mutex_exit(block_mutex);
11831178
1184 return(ret);1179 return(ret);
1185}1180}
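
The lookup pattern introduced here — S-latch the page_hash, pin the block through its own mutex, then drop the hash latch before doing any real work — is the core hand-off of the split. A small model under the assumption of a readers-writer hash latch (all names are illustrative stand-ins):

    #include <mutex>
    #include <shared_mutex>

    std::shared_timed_mutex hash_latch;  // stands in for the page_hash lock
    std::mutex block_mutex;              // stands in for buf_page_get_mutex()
    bool page_found = true;              // stands in for the hash lookup

    bool ready_to_flush_model() {
        std::shared_lock<std::shared_timed_mutex> s(hash_latch);
        if (!page_found) {
            return false;                // nothing to pin; latch auto-released
        }
        block_mutex.lock();              // pin the block first ...
        s.unlock();                      // ... then release the hash latch
        bool ret = true;                 // buf_flush_ready_for_flush() check
        block_mutex.unlock();
        return ret;
    }
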
@@ -1207,6 +1202,8 @@
1207 buf_pool_t* buf_pool = buf_pool_get(space, offset);1202 buf_pool_t* buf_pool = buf_pool_get(space, offset);
12081203
1209 ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);1204 ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST);
1205 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
1206 ut_ad(!buf_flush_list_mutex_own(buf_pool));
12101207
1211 if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN1208 if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN
1212 || srv_flush_neighbors == 0) {1209 || srv_flush_neighbors == 0) {
@@ -1262,6 +1259,8 @@
1262 for (i = low; i < high; i++) {1259 for (i = low; i < high; i++) {
12631260
1264 buf_page_t* bpage;1261 buf_page_t* bpage;
1262 rw_lock_t* hash_lock;
1263 ib_mutex_t* block_mutex;
12651264
1266 if ((count + n_flushed) >= n_to_flush) {1265 if ((count + n_flushed) >= n_to_flush) {
12671266
@@ -1280,17 +1279,21 @@
12801279
1281 buf_pool = buf_pool_get(space, i);1280 buf_pool = buf_pool_get(space, i);
12821281
1283 buf_pool_mutex_enter(buf_pool);
1284
1285 /* We only want to flush pages from this buffer pool. */1282 /* We only want to flush pages from this buffer pool. */
1286 bpage = buf_page_hash_get(buf_pool, space, i);1283 bpage = buf_page_hash_get_s_locked(buf_pool, space, i,
1284 &hash_lock);
12871285
1288 if (!bpage) {1286 if (!bpage) {
12891287
1290 buf_pool_mutex_exit(buf_pool);
1291 continue;1288 continue;
1292 }1289 }
12931290
1291 block_mutex = buf_page_get_mutex(bpage);
1292
1293 mutex_enter(block_mutex);
1294
1295 rw_lock_s_unlock(hash_lock);
1296
1294 ut_a(buf_page_in_file(bpage));1297 ut_a(buf_page_in_file(bpage));
12951298
1296 /* We avoid flushing 'non-old' blocks in an LRU flush,1299 /* We avoid flushing 'non-old' blocks in an LRU flush,
@@ -1299,9 +1302,6 @@
1299 if (flush_type != BUF_FLUSH_LRU1302 if (flush_type != BUF_FLUSH_LRU
1300 || i == offset1303 || i == offset
1301 || buf_page_is_old(bpage)) {1304 || buf_page_is_old(bpage)) {
1302 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
1303
1304 mutex_enter(block_mutex);
13051305
1306 if (buf_flush_ready_for_flush(bpage, flush_type)1306 if (buf_flush_ready_for_flush(bpage, flush_type)
1307 && (i == offset || !bpage->buf_fix_count)) {1307 && (i == offset || !bpage->buf_fix_count)) {
@@ -1316,14 +1316,12 @@
13161316
1317 buf_flush_page(buf_pool, bpage, flush_type, false);1317 buf_flush_page(buf_pool, bpage, flush_type, false);
1318 ut_ad(!mutex_own(block_mutex));1318 ut_ad(!mutex_own(block_mutex));
1319 ut_ad(!buf_pool_mutex_own(buf_pool));
1320 count++;1319 count++;
1321 continue;1320 continue;
1322 } else {
1323 mutex_exit(block_mutex);
1324 }1321 }
1325 }1322 }
1326 buf_pool_mutex_exit(buf_pool);1323
1324 mutex_exit(block_mutex);
1327 }1325 }
13281326
1329 if (count > 0) {1327 if (count > 0) {
@@ -1341,8 +1339,9 @@
1341Check if the block is modified and ready for flushing. If the block1339Check if the block is modified and ready for flushing. If the block
1342is ready to flush then flush the page and try to flush its neighbors.1340is ready to flush then flush the page and try to flush its neighbors.
13431341
1344@return TRUE if buf_pool mutex was released during this function.1342@return TRUE if, depending on the flush type, either the LRU or the flush
1345This does not guarantee that some pages were written as well.1343list mutex was released during this function. This does not guarantee that
1344some pages were written as well.
1346The number of pages written is added to the count. */1345The number of pages written is added to the count. */
1347static1346static
1348ibool1347ibool
@@ -1358,16 +1357,21 @@
1358 ulint* count) /*!< in/out: number of pages1357 ulint* count) /*!< in/out: number of pages
1359 flushed */1358 flushed */
1360{1359{
1361 ib_mutex_t* block_mutex;1360 ib_mutex_t* block_mutex = NULL;
1362 ibool flushed = FALSE;1361 ibool flushed = FALSE;
1363#ifdef UNIV_DEBUG1362#ifdef UNIV_DEBUG
1364 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);1363 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
1365#endif /* UNIV_DEBUG */1364#endif /* UNIV_DEBUG */
13661365
1367 ut_ad(buf_pool_mutex_own(buf_pool));1366 ut_ad((flush_type == BUF_FLUSH_LRU
1367 && mutex_own(&buf_pool->LRU_list_mutex))
1368 || (flush_type == BUF_FLUSH_LIST
1369 && buf_flush_list_mutex_own(buf_pool)));
13681370
1369 block_mutex = buf_page_get_mutex(bpage);1371 if (flush_type == BUF_FLUSH_LRU) {
1370 mutex_enter(block_mutex);1372 block_mutex = buf_page_get_mutex(bpage);
1373 mutex_enter(block_mutex);
1374 }
13711375
1372 ut_a(buf_page_in_file(bpage));1376 ut_a(buf_page_in_file(bpage));
13731377
@@ -1378,14 +1382,20 @@
13781382
1379 buf_pool = buf_pool_from_bpage(bpage);1383 buf_pool = buf_pool_from_bpage(bpage);
13801384
1381 buf_pool_mutex_exit(buf_pool);1385 if (flush_type == BUF_FLUSH_LRU) {
1386 mutex_exit(&buf_pool->LRU_list_mutex);
1387 }
13821388
1383 /* These fields are protected by both the1389 /* These fields are protected by the buf_page_get_mutex()
1384 buffer pool mutex and block mutex. */1390 mutex. */
1385 space = buf_page_get_space(bpage);1391 space = buf_page_get_space(bpage);
1386 offset = buf_page_get_page_no(bpage);1392 offset = buf_page_get_page_no(bpage);
13871393
1388 mutex_exit(block_mutex);1394 if (flush_type == BUF_FLUSH_LRU) {
1395 mutex_exit(block_mutex);
1396 } else {
1397 buf_flush_list_mutex_exit(buf_pool);
1398 }
13891399
1390 /* Try to flush also all the neighbors */1400 /* Try to flush also all the neighbors */
1391 *count += buf_flush_try_neighbors(space,1401 *count += buf_flush_try_neighbors(space,
@@ -1394,13 +1404,20 @@
1394 *count,1404 *count,
1395 n_to_flush);1405 n_to_flush);
13961406
1397 buf_pool_mutex_enter(buf_pool);1407 if (flush_type == BUF_FLUSH_LRU) {
1408 mutex_enter(&buf_pool->LRU_list_mutex);
1409 } else {
1410 buf_flush_list_mutex_enter(buf_pool);
1411 }
1398 flushed = TRUE;1412 flushed = TRUE;
1399 } else {1413 } else if (flush_type == BUF_FLUSH_LRU) {
1400 mutex_exit(block_mutex);1414 mutex_exit(block_mutex);
1401 }1415 }
14021416
1403 ut_ad(buf_pool_mutex_own(buf_pool));1417 ut_ad((flush_type == BUF_FLUSH_LRU
1418 && mutex_own(&buf_pool->LRU_list_mutex))
1419 || (flush_type == BUF_FLUSH_LIST
1420 && buf_flush_list_mutex_own(buf_pool)));
14041421
1405 return(flushed);1422 return(flushed);
1406}1423}
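
buf_flush_page_and_try_neighbors() now enters and exits a different mutex depending on the flush type, which the paired assertions at the top and bottom of the body enforce. A compressed model of the branching (std::mutex stand-ins, illustrative names; the real function does the ready-for-flush check between the pin and the release):

    #include <mutex>

    enum flush_type_t { FLUSH_LRU, FLUSH_LIST };

    // On entry the caller holds lru_mutex (LRU flush) or list_mutex
    // (flush-list flush); the same mutex is held again on return.
    void flush_and_try_neighbors_model(flush_type_t type,
                                       std::mutex& lru_mutex,
                                       std::mutex& list_mutex,
                                       std::mutex& block_mutex) {
        if (type == FLUSH_LRU) {
            block_mutex.lock();          // only the LRU path pins the block
            lru_mutex.unlock();
            // read space/page_no under block_mutex ...
            block_mutex.unlock();
        } else {
            list_mutex.unlock();
        }
        // ... buf_flush_try_neighbors() runs with no list mutex held ...
        if (type == FLUSH_LRU) lru_mutex.lock(); else list_mutex.lock();
    }
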
@@ -1428,22 +1445,31 @@
1428 ulint free_len = UT_LIST_GET_LEN(buf_pool->free);1445 ulint free_len = UT_LIST_GET_LEN(buf_pool->free);
1429 ulint lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);1446 ulint lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);
14301447
1431 ut_ad(buf_pool_mutex_own(buf_pool));1448 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
14321449
1433 block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);1450 block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
1434 while (block != NULL && count < max1451 while (block != NULL && count < max
1435 && free_len < srv_LRU_scan_depth1452 && free_len < srv_LRU_scan_depth
1436 && lru_len > UT_LIST_GET_LEN(buf_pool->LRU) / 10) {1453 && lru_len > UT_LIST_GET_LEN(buf_pool->LRU) / 10) {
14371454
1455 ib_mutex_t* block_mutex = buf_page_get_mutex(&block->page);
1456
1438 ++scanned;1457 ++scanned;
1458
1459 mutex_enter(block_mutex);
1460
1439 if (buf_LRU_free_page(&block->page, false)) {1461 if (buf_LRU_free_page(&block->page, false)) {
1440 /* Block was freed. buf_pool->mutex potentially1462
1463 mutex_exit(block_mutex);
1464 /* Block was freed. LRU list mutex potentially
1441 released and reacquired */1465 released and reacquired */
1442 ++count;1466 ++count;
1467 mutex_enter(&buf_pool->LRU_list_mutex);
1443 block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);1468 block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
14441469
1445 } else {1470 } else {
14461471
1472 mutex_exit(block_mutex);
1447 block = UT_LIST_GET_PREV(unzip_LRU, block);1473 block = UT_LIST_GET_PREV(unzip_LRU, block);
1448 }1474 }
14491475
@@ -1451,7 +1477,7 @@
1451 lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);1477 lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);
1452 }1478 }
14531479
1454 ut_ad(buf_pool_mutex_own(buf_pool));1480 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
14551481
1456 if (scanned) {1482 if (scanned) {
1457 MONITOR_INC_VALUE_CUMULATIVE(1483 MONITOR_INC_VALUE_CUMULATIVE(
@@ -1485,7 +1511,7 @@
1485 ulint free_len = UT_LIST_GET_LEN(buf_pool->free);1511 ulint free_len = UT_LIST_GET_LEN(buf_pool->free);
1486 ulint lru_len = UT_LIST_GET_LEN(buf_pool->LRU);1512 ulint lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
14871513
1488 ut_ad(buf_pool_mutex_own(buf_pool));1514 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
14891515
1490 bpage = UT_LIST_GET_LAST(buf_pool->LRU);1516 bpage = UT_LIST_GET_LAST(buf_pool->LRU);
1491 while (bpage != NULL && count < max1517 while (bpage != NULL && count < max
@@ -1494,13 +1520,20 @@
14941520
1495 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);1521 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
1496 ibool evict;1522 ibool evict;
14971523 ulint failed_acquire;
1498 mutex_enter(block_mutex);
1499 evict = buf_flush_ready_for_replace(bpage);
1500 mutex_exit(block_mutex);
15011524
1502 ++scanned;1525 ++scanned;
15031526
1527 failed_acquire = mutex_enter_nowait(block_mutex);
1528
1529 evict = UNIV_LIKELY(!failed_acquire)
1530 && buf_flush_ready_for_replace(bpage);
1531
1532 if (UNIV_LIKELY(!failed_acquire) && !evict) {
1533
1534 mutex_exit(block_mutex);
1535 }
1536
1504 /* If the block is ready to be replaced we try to1537 /* If the block is ready to be replaced we try to
1505 free it i.e.: put it on the free list.1538 free it i.e.: put it on the free list.
1506 Otherwise we try to flush the block and its1539 Otherwise we try to flush the block and its
@@ -1514,28 +1547,35 @@
1514 O(n*n). */1547 O(n*n). */
1515 if (evict) {1548 if (evict) {
1516 if (buf_LRU_free_page(bpage, true)) {1549 if (buf_LRU_free_page(bpage, true)) {
1517 /* buf_pool->mutex was potentially1550
1518 released and reacquired. */1551 mutex_exit(block_mutex);
1552 mutex_enter(&buf_pool->LRU_list_mutex);
1519 bpage = UT_LIST_GET_LAST(buf_pool->LRU);1553 bpage = UT_LIST_GET_LAST(buf_pool->LRU);
1520 } else {1554 } else {
1555
1521 bpage = UT_LIST_GET_PREV(LRU, bpage);1556 bpage = UT_LIST_GET_PREV(LRU, bpage);
1557 mutex_exit(block_mutex);
1522 }1558 }
1523 } else if (buf_flush_page_and_try_neighbors(1559 } else if (UNIV_LIKELY(!failed_acquire)) {
1560
1561 if (buf_flush_page_and_try_neighbors(
1524 bpage,1562 bpage,
1525 BUF_FLUSH_LRU, max, &count)) {1563 BUF_FLUSH_LRU, max, &count)) {
15261564
1527 /* buf_pool->mutex was released.1565 /* LRU list mutex was released.
1528 Restart the scan. */1566 Restart the scan. */
1529 bpage = UT_LIST_GET_LAST(buf_pool->LRU);1567 bpage = UT_LIST_GET_LAST(buf_pool->LRU);
1530 } else {1568 } else {
1531 bpage = UT_LIST_GET_PREV(LRU, bpage);1569
1570 bpage = UT_LIST_GET_PREV(LRU, bpage);
1571 }
1532 }1572 }
15331573
1534 free_len = UT_LIST_GET_LEN(buf_pool->free);1574 free_len = UT_LIST_GET_LEN(buf_pool->free);
1535 lru_len = UT_LIST_GET_LEN(buf_pool->LRU);1575 lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
1536 }1576 }
15371577
1538 ut_ad(buf_pool_mutex_own(buf_pool));1578 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
15391579
1540 /* We keep track of all flushes happening as part of LRU1580 /* We keep track of all flushes happening as part of LRU
1541 flush. When estimating the desired rate at which flush_list1581 flush. When estimating the desired rate at which flush_list
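
The LRU batch above also switches from an unconditional mutex_enter() to mutex_enter_nowait(), so a page whose block mutex is contended is simply skipped instead of stalling the whole scan. The shape of that decision, modelled with std::mutex::try_lock() (the callback name is a placeholder):

    #include <mutex>

    // Returns true when the caller may evict the page; on true the caller
    // still owns block_mutex and must release it itself after freeing the
    // page. On false the mutex is not held and the scan moves on.
    bool try_pin_for_evict(std::mutex& block_mutex,
                           bool (*ready_for_replace)()) {
        if (!block_mutex.try_lock()) {
            return false;            // contended: skip to the previous page
        }
        if (!ready_for_replace()) {
            block_mutex.unlock();    // pinned but not evictable: the real
            return false;            // code takes the flush path instead
        }
        return true;
    }
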
@@ -1604,8 +1644,6 @@
1604 ulint count = 0;1644 ulint count = 0;
1605 ulint scanned = 0;1645 ulint scanned = 0;
16061646
1607 ut_ad(buf_pool_mutex_own(buf_pool));
1608
1609 /* Start from the end of the list looking for a suitable1647 /* Start from the end of the list looking for a suitable
1610 block to be flushed. */1648 block to be flushed. */
1611 buf_flush_list_mutex_enter(buf_pool);1649 buf_flush_list_mutex_enter(buf_pool);
@@ -1629,16 +1667,12 @@
1629 prev = UT_LIST_GET_PREV(list, bpage);1667 prev = UT_LIST_GET_PREV(list, bpage);
1630 buf_flush_set_hp(buf_pool, prev);1668 buf_flush_set_hp(buf_pool, prev);
16311669
1632 buf_flush_list_mutex_exit(buf_pool);
1633
1634#ifdef UNIV_DEBUG1670#ifdef UNIV_DEBUG
1635 bool flushed =1671 bool flushed =
1636#endif /* UNIV_DEBUG */1672#endif /* UNIV_DEBUG */
1637 buf_flush_page_and_try_neighbors(1673 buf_flush_page_and_try_neighbors(
1638 bpage, BUF_FLUSH_LIST, min_n, &count);1674 bpage, BUF_FLUSH_LIST, min_n, &count);
16391675
1640 buf_flush_list_mutex_enter(buf_pool);
1641
1642 ut_ad(flushed || buf_flush_is_hp(buf_pool, prev));1676 ut_ad(flushed || buf_flush_is_hp(buf_pool, prev));
16431677
1644 if (!buf_flush_is_hp(buf_pool, prev)) {1678 if (!buf_flush_is_hp(buf_pool, prev)) {
@@ -1663,8 +1697,6 @@
1663 MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,1697 MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,
1664 scanned);1698 scanned);
16651699
1666 ut_ad(buf_pool_mutex_own(buf_pool));
1667
1668 return(count);1700 return(count);
1669}1701}
16701702
@@ -1701,13 +1733,13 @@
1701 || sync_thread_levels_empty_except_dict());1733 || sync_thread_levels_empty_except_dict());
1702#endif /* UNIV_SYNC_DEBUG */1734#endif /* UNIV_SYNC_DEBUG */
17031735
1704 buf_pool_mutex_enter(buf_pool);1736 /* Note: The buffer pool mutexes are released and reacquired within
1705
1706 /* Note: The buffer pool mutex is released and reacquired within
1707 the flush functions. */1737 the flush functions. */
1708 switch (flush_type) {1738 switch (flush_type) {
1709 case BUF_FLUSH_LRU:1739 case BUF_FLUSH_LRU:
1740 mutex_enter(&buf_pool->LRU_list_mutex);
1710 count = buf_do_LRU_batch(buf_pool, min_n);1741 count = buf_do_LRU_batch(buf_pool, min_n);
1742 mutex_exit(&buf_pool->LRU_list_mutex);
1711 break;1743 break;
1712 case BUF_FLUSH_LIST:1744 case BUF_FLUSH_LIST:
1713 count = buf_do_flush_list_batch(buf_pool, min_n, lsn_limit);1745 count = buf_do_flush_list_batch(buf_pool, min_n, lsn_limit);
@@ -1716,8 +1748,6 @@
1716 ut_error;1748 ut_error;
1717 }1749 }
17181750
1719 buf_pool_mutex_exit(buf_pool);
1720
1721#ifdef UNIV_DEBUG1751#ifdef UNIV_DEBUG
1722 if (buf_debug_prints && count > 0) {1752 if (buf_debug_prints && count > 0) {
1723 fprintf(stderr, flush_type == BUF_FLUSH_LRU1753 fprintf(stderr, flush_type == BUF_FLUSH_LRU
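
In buf_flush_batch() only the LRU branch now needs an outer list mutex; the flush-list branch acquires its own mutex internally. A compact model of the dispatch (the function pointers stand in for the two batch routines):

    #include <mutex>

    int flush_batch_model(bool lru_flush, std::mutex& lru_mutex,
                          int (*do_lru_batch)(), int (*do_list_batch)()) {
        int count;
        if (lru_flush) {
            lru_mutex.lock();        // held around the batch; the batch may
            count = do_lru_batch();  // drop and retake it, as noted above
            lru_mutex.unlock();
        } else {
            count = do_list_batch(); // manages the flush list mutex itself
        }
        return count;
    }
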
@@ -1765,21 +1795,21 @@
1765 buf_flush_t flush_type) /*!< in: BUF_FLUSH_LRU1795 buf_flush_t flush_type) /*!< in: BUF_FLUSH_LRU
1766 or BUF_FLUSH_LIST */1796 or BUF_FLUSH_LIST */
1767{1797{
1768 buf_pool_mutex_enter(buf_pool);1798 mutex_enter(&buf_pool->flush_state_mutex);
17691799
1770 if (buf_pool->n_flush[flush_type] > 01800 if (buf_pool->n_flush[flush_type] > 0
1771 || buf_pool->init_flush[flush_type] == TRUE) {1801 || buf_pool->init_flush[flush_type] == TRUE) {
17721802
1773 /* There is already a flush batch of the same type running */1803 /* There is already a flush batch of the same type running */
17741804
1775 buf_pool_mutex_exit(buf_pool);1805 mutex_exit(&buf_pool->flush_state_mutex);
17761806
1777 return(FALSE);1807 return(FALSE);
1778 }1808 }
17791809
1780 buf_pool->init_flush[flush_type] = TRUE;1810 buf_pool->init_flush[flush_type] = TRUE;
17811811
1782 buf_pool_mutex_exit(buf_pool);1812 mutex_exit(&buf_pool->flush_state_mutex);
17831813
1784 return(TRUE);1814 return(TRUE);
1785}1815}
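
buf_flush_start() keeps its check-then-set shape but now runs under the narrow flush_state_mutex instead of the pool mutex. The guard, reduced to its essentials (illustrative types; the two array slots model the LRU and flush-list batch types):

    #include <mutex>

    struct flush_state_model {
        std::mutex m;                     // the new flush_state_mutex
        bool init_flush[2] = {false, false};
        int  n_flush[2]    = {0, 0};
    };

    bool flush_start_model(flush_state_model& s, int type) {
        std::lock_guard<std::mutex> guard(s.m);
        if (s.n_flush[type] > 0 || s.init_flush[type]) {
            return false;                 // a batch of this type already runs
        }
        s.init_flush[type] = true;
        return true;
    }
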
@@ -1794,7 +1824,7 @@
1794 buf_flush_t flush_type) /*!< in: BUF_FLUSH_LRU1824 buf_flush_t flush_type) /*!< in: BUF_FLUSH_LRU
1795 or BUF_FLUSH_LIST */1825 or BUF_FLUSH_LIST */
1796{1826{
1797 buf_pool_mutex_enter(buf_pool);1827 mutex_enter(&buf_pool->flush_state_mutex);
17981828
1799 buf_pool->init_flush[flush_type] = FALSE;1829 buf_pool->init_flush[flush_type] = FALSE;
18001830
@@ -1807,7 +1837,7 @@
1807 os_event_set(buf_pool->no_flush[flush_type]);1837 os_event_set(buf_pool->no_flush[flush_type]);
1808 }1838 }
18091839
1810 buf_pool_mutex_exit(buf_pool);1840 mutex_exit(&buf_pool->flush_state_mutex);
1811}1841}
18121842
1813/******************************************************************//**1843/******************************************************************//**
@@ -1989,7 +2019,7 @@
1989 ibool freed;2019 ibool freed;
1990 bool evict_zip;2020 bool evict_zip;
19912021
1992 buf_pool_mutex_enter(buf_pool);2022 mutex_enter(&buf_pool->LRU_list_mutex);
19932023
1994 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU), scanned = 1;2024 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU), scanned = 1;
1995 bpage != NULL;2025 bpage != NULL;
@@ -2006,6 +2036,8 @@
2006 mutex_exit(block_mutex);2036 mutex_exit(block_mutex);
2007 }2037 }
20082038
2039 mutex_exit(&buf_pool->LRU_list_mutex);
2040
2009 MONITOR_INC_VALUE_CUMULATIVE(2041 MONITOR_INC_VALUE_CUMULATIVE(
2010 MONITOR_LRU_SINGLE_FLUSH_SCANNED,2042 MONITOR_LRU_SINGLE_FLUSH_SCANNED,
2011 MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,2043 MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
@@ -2014,22 +2046,20 @@
20142046
2015 if (!bpage) {2047 if (!bpage) {
2016 /* Can't find a single flushable page. */2048 /* Can't find a single flushable page. */
2017 buf_pool_mutex_exit(buf_pool);
2018 return(FALSE);2049 return(FALSE);
2019 }2050 }
20202051
2021 /* The following call will release the buffer pool and2052 /* The following call will release the buf_page_get_mutex() mutex. */
2022 block mutex. */
2023 buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, true);2053 buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, true);
20242054
2025 /* At this point the page has been written to the disk.2055 /* At this point the page has been written to the disk.
2026 As we are not holding the buffer pool or block mutex,2056 As we are not holding the LRU list or buf_page_get_mutex() mutex,
2027 we cannot use the bpage safely. It may have been plucked out2057 we cannot use the bpage safely. It may have been plucked out
2028 of the LRU list by some other thread or it may even have2058 of the LRU list by some other thread or it may even have
2029 relocated in case of a compressed page. We need to start2059 relocated in case of a compressed page. We need to start
2030 the scan of LRU list again to remove the block from the LRU2060 the scan of LRU list again to remove the block from the LRU
2031 list and put it on the free list. */2061 list and put it on the free list. */
2032 buf_pool_mutex_enter(buf_pool);2062 mutex_enter(&buf_pool->LRU_list_mutex);
20332063
2034 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);2064 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
2035 bpage != NULL;2065 bpage != NULL;
@@ -2040,23 +2070,25 @@
2040 block_mutex = buf_page_get_mutex(bpage);2070 block_mutex = buf_page_get_mutex(bpage);
2041 mutex_enter(block_mutex);2071 mutex_enter(block_mutex);
2042 ready = buf_flush_ready_for_replace(bpage);2072 ready = buf_flush_ready_for_replace(bpage);
2043 mutex_exit(block_mutex);
2044 if (ready) {2073 if (ready) {
2045 break;2074 break;
2046 }2075 }
2076 mutex_exit(block_mutex);
20472077
2048 }2078 }
20492079
2050 if (!bpage) {2080 if (!bpage) {
2051 /* Can't find a single replaceable page. */2081 /* Can't find a single replaceable page. */
2052 buf_pool_mutex_exit(buf_pool);2082 mutex_exit(&buf_pool->LRU_list_mutex);
2053 return(FALSE);2083 return(FALSE);
2054 }2084 }
20552085
2056 evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);2086 evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);
20572087
2058 freed = buf_LRU_free_page(bpage, evict_zip);2088 freed = buf_LRU_free_page(bpage, evict_zip);
2059 buf_pool_mutex_exit(buf_pool);2089 if (!freed)
2090 mutex_exit(&buf_pool->LRU_list_mutex);
2091 mutex_exit(block_mutex);
20602092
2061 return(freed);2093 return(freed);
2062}2094}
@@ -2082,9 +2114,7 @@
20822114
2083 /* srv_LRU_scan_depth can be arbitrarily large value.2115 /* srv_LRU_scan_depth can be arbitrarily large value.
2084 We cap it with current LRU size. */2116 We cap it with current LRU size. */
2085 buf_pool_mutex_enter(buf_pool);
2086 scan_depth = UT_LIST_GET_LEN(buf_pool->LRU);2117 scan_depth = UT_LIST_GET_LEN(buf_pool->LRU);
2087 buf_pool_mutex_exit(buf_pool);
20882118
2089 scan_depth = ut_min(srv_LRU_scan_depth, scan_depth);2119 scan_depth = ut_min(srv_LRU_scan_depth, scan_depth);
20902120
@@ -2143,15 +2173,15 @@
21432173
2144 buf_pool = buf_pool_from_array(i);2174 buf_pool = buf_pool_from_array(i);
21452175
2146 buf_pool_mutex_enter(buf_pool);2176 mutex_enter(&buf_pool->flush_state_mutex);
21472177
2148 if (buf_pool->n_flush[BUF_FLUSH_LRU] > 02178 if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
2149 || buf_pool->init_flush[BUF_FLUSH_LRU]) {2179 || buf_pool->init_flush[BUF_FLUSH_LRU]) {
21502180
2151 buf_pool_mutex_exit(buf_pool);2181 mutex_exit(&buf_pool->flush_state_mutex);
2152 buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);2182 buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
2153 } else {2183 } else {
2154 buf_pool_mutex_exit(buf_pool);2184 mutex_exit(&buf_pool->flush_state_mutex);
2155 }2185 }
2156 }2186 }
2157}2187}
@@ -2629,7 +2659,6 @@
2629{2659{
2630 ulint count = 0;2660 ulint count = 0;
26312661
2632 buf_pool_mutex_enter(buf_pool);
2633 buf_flush_list_mutex_enter(buf_pool);2662 buf_flush_list_mutex_enter(buf_pool);
26342663
2635 buf_page_t* bpage;2664 buf_page_t* bpage;
@@ -2648,7 +2677,6 @@
2648 }2677 }
26492678
2650 buf_flush_list_mutex_exit(buf_pool);2679 buf_flush_list_mutex_exit(buf_pool);
2651 buf_pool_mutex_exit(buf_pool);
26522680
2653 return(count);2681 return(count);
2654}2682}
26552683
=== modified file 'Percona-Server/storage/innobase/buf/buf0lru.cc'
--- Percona-Server/storage/innobase/buf/buf0lru.cc 2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/buf/buf0lru.cc 2013-09-20 05:29:11 +0000
@@ -75,7 +75,7 @@
75/** When dropping the search hash index entries before deleting an ibd75/** When dropping the search hash index entries before deleting an ibd
76file, we build a local array of pages belonging to that tablespace76file, we build a local array of pages belonging to that tablespace
77in the buffer pool. Following is the size of that array.77in the buffer pool. Following is the size of that array.
78We also release buf_pool->mutex after scanning this many pages of the78We also release buf_pool->LRU_list_mutex after scanning this many pages of the
79flush_list when dropping a table. This is to ensure that other threads79flush_list when dropping a table. This is to ensure that other threads
80are not blocked for extended period of time when using very large80are not blocked for extended period of time when using very large
81buffer pools. */81buffer pools. */
@@ -133,7 +133,7 @@
133If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),133If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),
134the object will be freed.134the object will be freed.
135135
136The caller must hold buf_pool->mutex, the buf_page_get_mutex() mutex136The caller must hold buf_pool->LRU_list_mutex, the buf_page_get_mutex() mutex
137and the appropriate hash_lock. This function will release the137and the appropriate hash_lock. This function will release the
138buf_page_get_mutex() and the hash_lock.138buf_page_get_mutex() and the hash_lock.
139139
@@ -170,7 +170,7 @@
170 buf_page_t* bpage, /*!< in: control block */170 buf_page_t* bpage, /*!< in: control block */
171 buf_pool_t* buf_pool) /*!< in: buffer pool instance */171 buf_pool_t* buf_pool) /*!< in: buffer pool instance */
172{172{
173 ut_ad(buf_pool_mutex_own(buf_pool));173 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
174 ulint zip_size = page_zip_get_size(&bpage->zip);174 ulint zip_size = page_zip_get_size(&bpage->zip);
175 buf_pool->stat.LRU_bytes += zip_size ? zip_size : UNIV_PAGE_SIZE;175 buf_pool->stat.LRU_bytes += zip_size ? zip_size : UNIV_PAGE_SIZE;
176 ut_ad(buf_pool->stat.LRU_bytes <= buf_pool->curr_pool_size);176 ut_ad(buf_pool->stat.LRU_bytes <= buf_pool->curr_pool_size);
@@ -189,7 +189,7 @@
189 ulint io_avg;189 ulint io_avg;
190 ulint unzip_avg;190 ulint unzip_avg;
191191
192 ut_ad(buf_pool_mutex_own(buf_pool));192 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
193193
194 /* If the unzip_LRU list is empty, we can only use the LRU. */194 /* If the unzip_LRU list is empty, we can only use the LRU. */
195 if (UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0) {195 if (UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0) {
@@ -276,7 +276,7 @@
276 page_arr = static_cast<ulint*>(ut_malloc(276 page_arr = static_cast<ulint*>(ut_malloc(
277 sizeof(ulint) * BUF_LRU_DROP_SEARCH_SIZE));277 sizeof(ulint) * BUF_LRU_DROP_SEARCH_SIZE));
278278
279 buf_pool_mutex_enter(buf_pool);279 mutex_enter(&buf_pool->LRU_list_mutex);
280 num_entries = 0;280 num_entries = 0;
281281
282scan_again:282scan_again:
@@ -285,6 +285,7 @@
285 while (bpage != NULL) {285 while (bpage != NULL) {
286 buf_page_t* prev_bpage;286 buf_page_t* prev_bpage;
287 ibool is_fixed;287 ibool is_fixed;
288 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
288289
289 prev_bpage = UT_LIST_GET_PREV(LRU, bpage);290 prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
290291
@@ -301,10 +302,10 @@
301 continue;302 continue;
302 }303 }
303304
304 mutex_enter(&((buf_block_t*) bpage)->mutex);305 mutex_enter(block_mutex);
305 is_fixed = bpage->buf_fix_count > 0306 is_fixed = bpage->buf_fix_count > 0
306 || !((buf_block_t*) bpage)->index;307 || !((buf_block_t*) bpage)->index;
307 mutex_exit(&((buf_block_t*) bpage)->mutex);308 mutex_exit(block_mutex);
308309
309 if (is_fixed) {310 if (is_fixed) {
310 goto next_page;311 goto next_page;
@@ -320,18 +321,18 @@
320 goto next_page;321 goto next_page;
321 }322 }
322323
323 /* Array full. We release the buf_pool->mutex to obey324 /* Array full. We release the buf_pool->LRU_list_mutex to obey
324 the latching order. */325 the latching order. */
325 buf_pool_mutex_exit(buf_pool);326 mutex_exit(&buf_pool->LRU_list_mutex);
326327
327 buf_LRU_drop_page_hash_batch(328 buf_LRU_drop_page_hash_batch(
328 id, zip_size, page_arr, num_entries);329 id, zip_size, page_arr, num_entries);
329330
330 num_entries = 0;331 num_entries = 0;
331332
332 buf_pool_mutex_enter(buf_pool);333 mutex_enter(&buf_pool->LRU_list_mutex);
333334
334 /* Note that we released the buf_pool mutex above335 /* Note that we released the buf_pool->LRU_list_mutex above
335 after reading the prev_bpage during processing of a336 after reading the prev_bpage during processing of a
336 page_hash_batch (i.e.: when the array was full).337 page_hash_batch (i.e.: when the array was full).
337 Because prev_bpage could belong to a compressed-only338 Because prev_bpage could belong to a compressed-only
@@ -345,15 +346,15 @@
345 guarantee that ALL such entries will be dropped. */346 guarantee that ALL such entries will be dropped. */
346347
347 /* If, however, bpage has been removed from LRU list348 /* If, however, bpage has been removed from LRU list
348 to the free list then we should restart the scan.349 to the free list then we should restart the scan. */
349 bpage->state is protected by buf_pool mutex. */350
350 if (bpage351 if (bpage
351 && buf_page_get_state(bpage) != BUF_BLOCK_FILE_PAGE) {352 && buf_page_get_state(bpage) != BUF_BLOCK_FILE_PAGE) {
352 goto scan_again;353 goto scan_again;
353 }354 }
354 }355 }
355356
356 buf_pool_mutex_exit(buf_pool);357 mutex_exit(&buf_pool->LRU_list_mutex);
357358
358 /* Drop any remaining batch of search hashed pages. */359 /* Drop any remaining batch of search hashed pages. */
359 buf_LRU_drop_page_hash_batch(id, zip_size, page_arr, num_entries);360 buf_LRU_drop_page_hash_batch(id, zip_size, page_arr, num_entries);
@@ -373,27 +374,25 @@
373 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */374 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
374 buf_page_t* bpage) /*!< in/out: current page */375 buf_page_t* bpage) /*!< in/out: current page */
375{376{
376 ib_mutex_t* block_mutex;377 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
377378
378 ut_ad(buf_pool_mutex_own(buf_pool));379 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
380 ut_ad(mutex_own(block_mutex));
379 ut_ad(buf_page_in_file(bpage));381 ut_ad(buf_page_in_file(bpage));
380382
381 block_mutex = buf_page_get_mutex(bpage);
382
383 mutex_enter(block_mutex);
384 /* "Fix" the block so that the position cannot be383 /* "Fix" the block so that the position cannot be
385 changed after we release the buffer pool and384 changed after we release the buffer pool and
386 block mutexes. */385 block mutexes. */
387 buf_page_set_sticky(bpage);386 buf_page_set_sticky(bpage);
388387
389 /* Now it is safe to release the buf_pool->mutex. */388 /* Now it is safe to release the LRU list mutex */
390 buf_pool_mutex_exit(buf_pool);389 mutex_exit(&buf_pool->LRU_list_mutex);
391390
392 mutex_exit(block_mutex);391 mutex_exit(block_mutex);
393 /* Try and force a context switch. */392 /* Try and force a context switch. */
394 os_thread_yield();393 os_thread_yield();
395394
396 buf_pool_mutex_enter(buf_pool);395 mutex_enter(&buf_pool->LRU_list_mutex);
397396
398 mutex_enter(block_mutex);397 mutex_enter(block_mutex);
399 /* "Unfix" the block now that we have both the398 /* "Unfix" the block now that we have both the
@@ -413,21 +412,47 @@
413/*================*/412/*================*/
414 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */413 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
415 buf_page_t* bpage, /*!< in/out: bpage to remove */414 buf_page_t* bpage, /*!< in/out: bpage to remove */
416 ulint processed) /*!< in: number of pages processed */415 ulint processed, /*!< in: number of pages processed */
416 bool* must_restart) /*!< in/out: if true, we have to
417 restart the flush list scan */
417{418{
418 /* Every BUF_LRU_DROP_SEARCH_SIZE iterations in the419 /* Every BUF_LRU_DROP_SEARCH_SIZE iterations in the
419 loop we release buf_pool->mutex to let other threads420 loop we release buf_pool->LRU_list_mutex to let other threads
420 do their job but only if the block is not IO fixed. This421 do their job but only if the block is not IO fixed. This
421 ensures that the block stays in its position in the422 ensures that the block stays in its position in the
422 flush_list. */423 flush_list. */
423424
424 if (bpage != NULL425 if (bpage != NULL
425 && processed >= BUF_LRU_DROP_SEARCH_SIZE426 && processed >= BUF_LRU_DROP_SEARCH_SIZE
426 && buf_page_get_io_fix(bpage) == BUF_IO_NONE) {427 && buf_page_get_io_fix_unlocked(bpage) == BUF_IO_NONE) {
428
429 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
427430
428 buf_flush_list_mutex_exit(buf_pool);431 buf_flush_list_mutex_exit(buf_pool);
429432
430 /* Release the buffer pool and block mutex433 /* We don't have to worry about bpage becoming a dangling
434 pointer by a compressed page flush list relocation because
435 buf_page_get_gen() won't be called for pages from this
436 tablespace. */
437
438 mutex_enter(block_mutex);
439 /* Recheck the I/O fix and the flush list presence now that we
440 hold the right mutex */
441 if (UNIV_UNLIKELY(buf_page_get_io_fix(bpage) != BUF_IO_NONE
442 || bpage->oldest_modification == 0)) {
443
444 mutex_exit(block_mutex);
445
446 *must_restart = true;
447
448 buf_flush_list_mutex_enter(buf_pool);
449
450 return(false);
451 }
452
453 *must_restart = false;
454
455 /* Release the LRU list and buf_page_get_mutex() mutex
431 to give the other threads a go. */456 to give the other threads a go. */
432457
433 buf_flush_yield(buf_pool, bpage);458 buf_flush_yield(buf_pool, bpage);
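
The yield path now has to re-validate the page after taking the block mutex: between dropping the flush list mutex and pinning the block, another thread may have io-fixed the page or removed it from the flush list, and must_restart tells the caller that its list pointers are stale. Reduced to the pattern (placeholder callbacks, not the patch itself):

    #include <mutex>

    bool yield_if_safe(std::mutex& block_mutex, bool (*io_fixed)(),
                       bool (*on_flush_list)(), bool* must_restart) {
        block_mutex.lock();
        if (io_fixed() || !on_flush_list()) {
            block_mutex.unlock();
            *must_restart = true;    // flush-list position no longer trusted
            return false;
        }
        *must_restart = false;
        // ... release list mutexes, yield, reacquire (buf_flush_yield) ...
        block_mutex.unlock();
        return true;
    }
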
@@ -456,18 +481,22 @@
456/*=====================*/481/*=====================*/
457 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */482 buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */
458 buf_page_t* bpage, /*!< in/out: bpage to remove */483 buf_page_t* bpage, /*!< in/out: bpage to remove */
459 bool flush) /*!< in: flush to disk if true but484 bool flush, /*!< in: flush to disk if true but
460 don't remove else remove without485 don't remove else remove without
461 flushing to disk */486 flushing to disk */
487 bool* must_restart) /*!< in/out: if true, must restart the
488 flush list scan */
462{489{
463 ut_ad(buf_pool_mutex_own(buf_pool));490 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
491
492 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
464 ut_ad(buf_flush_list_mutex_own(buf_pool));493 ut_ad(buf_flush_list_mutex_own(buf_pool));
465494
466 /* bpage->space and bpage->io_fix are protected by495 /* It is safe to check bpage->space and bpage->io_fix while holding
467 buf_pool->mutex and block_mutex. It is safe to check496 buf_pool->LRU_list_mutex only. */
468 them while holding buf_pool->mutex only. */
469497
470 if (buf_page_get_io_fix(bpage) != BUF_IO_NONE) {498 if (UNIV_UNLIKELY(buf_page_get_io_fix_unlocked(bpage)
499 != BUF_IO_NONE)) {
471500
472 /* We cannot remove this page during this scan501 /* We cannot remove this page during this scan
473 yet; maybe the system is currently reading it502 yet; maybe the system is currently reading it
@@ -476,22 +505,31 @@
476505
477 }506 }
478507
479 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
480 bool processed = false;508 bool processed = false;
481509
482 /* We have to release the flush_list_mutex to obey the
483 latching order. We are however guaranteed that the page
484 will stay in the flush_list and won't be relocated because
485 buf_flush_remove() and buf_flush_relocate_on_flush_list()
486 need buf_pool->mutex as well. */
487
488 buf_flush_list_mutex_exit(buf_pool);510 buf_flush_list_mutex_exit(buf_pool);
489511
512 /* We don't have to worry about bpage becoming a dangling
513 pointer by a compressed page flush list relocation because
514 buf_page_get_gen() won't be called for pages from this
515 tablespace. */
516
490 mutex_enter(block_mutex);517 mutex_enter(block_mutex);
491518
492 ut_ad(bpage->oldest_modification != 0);519 /* Recheck the page I/O fix and the flush list presence now
493520 that we hold the right mutex. */
494 if (!flush) {521 if (UNIV_UNLIKELY(buf_page_get_io_fix(bpage) != BUF_IO_NONE
522 || bpage->oldest_modification == 0)) {
523
524 /* The page became I/O-fixed or is not on the flush
525 list anymore, this invalidates any flush-list-page
526 pointers we have. */
527
528 mutex_exit(block_mutex);
529
530 *must_restart = true;
531
532 } else if (!flush) {
495533
496 buf_flush_remove(bpage);534 buf_flush_remove(bpage);
497535
@@ -502,8 +540,10 @@
502 } else if (buf_flush_ready_for_flush(bpage,540 } else if (buf_flush_ready_for_flush(bpage,
503 BUF_FLUSH_SINGLE_PAGE)) {541 BUF_FLUSH_SINGLE_PAGE)) {
504542
505 /* The following call will release the buffer pool543 mutex_exit(&buf_pool->LRU_list_mutex);
506 and block mutex. */544
545 /* The following call will release the buf_page_get_mutex()
546 mutex. */
507 buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, false);547 buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, false);
508 ut_ad(!mutex_own(block_mutex));548 ut_ad(!mutex_own(block_mutex));
509549
@@ -511,7 +551,7 @@
511 post the writes to the operating system */551 post the writes to the operating system */
512 os_aio_simulated_wake_handler_threads();552 os_aio_simulated_wake_handler_threads();
513553
514 buf_pool_mutex_enter(buf_pool);554 mutex_enter(&buf_pool->LRU_list_mutex);
515555
516 processed = true;556 processed = true;
517 } else {557 } else {
@@ -525,7 +565,7 @@
525 buf_flush_list_mutex_enter(buf_pool);565 buf_flush_list_mutex_enter(buf_pool);
526566
527 ut_ad(!mutex_own(block_mutex));567 ut_ad(!mutex_own(block_mutex));
528 ut_ad(buf_pool_mutex_own(buf_pool));568 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
529569
530 return(processed);570 return(processed);
531}571}
@@ -558,6 +598,7 @@
558 buf_flush_list_mutex_enter(buf_pool);598 buf_flush_list_mutex_enter(buf_pool);
559599
560rescan:600rescan:
601 bool must_restart = false;
561 bool all_freed = true;602 bool all_freed = true;
562603
563 for (bpage = UT_LIST_GET_LAST(buf_pool->flush_list);604 for (bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
@@ -576,15 +617,16 @@
576 /* Skip this block, as it does not belong to617 /* Skip this block, as it does not belong to
577 the target space. */618 the target space. */
578619
579 } else if (!buf_flush_or_remove_page(buf_pool, bpage, flush)) {620 } else if (!buf_flush_or_remove_page(buf_pool, bpage, flush,
621 &must_restart)) {
580622
581 /* Remove was unsuccessful, we have to try again623 /* Remove was unsuccessful, we have to try again
582 by scanning the entire list from the end.624 by scanning the entire list from the end.
583 This also means that we never released the625 This also means that we never released the
584 buf_pool mutex. Therefore we can trust the prev626 flush list mutex. Therefore we can trust the prev
585 pointer.627 pointer.
586 buf_flush_or_remove_page() released the628 buf_flush_or_remove_page() released the
587 flush list mutex but not the buf_pool mutex.629 flush list mutex but not the LRU list mutex.
588 Therefore it is possible that a new page was630 Therefore it is possible that a new page was
589 added to the flush list. For example, in case631 added to the flush list. For example, in case
590 where we are at the head of the flush list and632 where we are at the head of the flush list and
@@ -602,17 +644,23 @@
602 } else if (flush) {644 } else if (flush) {
603645
604 /* The processing was successful. And during the646 /* The processing was successful. And during the
605 processing we have released the buf_pool mutex647 processing we have released all the buf_pool mutexes
606 when calling buf_page_flush(). We cannot trust648 when calling buf_page_flush(). We cannot trust
607 prev pointer. */649 prev pointer. */
608 goto rescan;650 goto rescan;
651 } else if (UNIV_UNLIKELY(must_restart)) {
652
653 ut_ad(!all_freed);
654 break;
609 }655 }
610656
611 ++processed;657 ++processed;
612658
613 /* Yield if we have hogged the CPU and mutexes for too long. */659 /* Yield if we have hogged the CPU and mutexes for too long. */
614 if (buf_flush_try_yield(buf_pool, prev, processed)) {660 if (buf_flush_try_yield(buf_pool, prev, processed,
661 &must_restart)) {
615662
663 ut_ad(!must_restart);
616 /* Reset the batch size counter if we had to yield. */664 /* Reset the batch size counter if we had to yield. */
617665
618 processed = 0;666 processed = 0;
@@ -658,11 +706,11 @@
658 dberr_t err;706 dberr_t err;
659707
660 do {708 do {
661 buf_pool_mutex_enter(buf_pool);709 mutex_enter(&buf_pool->LRU_list_mutex);
662710
663 err = buf_flush_or_remove_pages(buf_pool, id, flush, trx);711 err = buf_flush_or_remove_pages(buf_pool, id, flush, trx);
664712
665 buf_pool_mutex_exit(buf_pool);713 mutex_exit(&buf_pool->LRU_list_mutex);
666714
667 ut_ad(buf_flush_validate(buf_pool));715 ut_ad(buf_flush_validate(buf_pool));
668716
@@ -695,7 +743,7 @@
695 ibool all_freed;743 ibool all_freed;
696744
697scan_again:745scan_again:
698 buf_pool_mutex_enter(buf_pool);746 mutex_enter(&buf_pool->LRU_list_mutex);
699747
700 all_freed = TRUE;748 all_freed = TRUE;
701749
@@ -712,15 +760,16 @@
712760
713 prev_bpage = UT_LIST_GET_PREV(LRU, bpage);761 prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
714762
715 /* bpage->space and bpage->io_fix are protected by763 /* It is safe to check bpage->space and bpage->io_fix while
716 buf_pool->mutex and the block_mutex. It is safe to check764 holding buf_pool->LRU_list_mutex only and later recheck
717 them while holding buf_pool->mutex only. */765 while holding the buf_page_get_mutex() mutex. */
718766
719 if (buf_page_get_space(bpage) != id) {767 if (buf_page_get_space(bpage) != id) {
720 /* Skip this block, as it does not belong to768 /* Skip this block, as it does not belong to
721 the space that is being invalidated. */769 the space that is being invalidated. */
722 goto next_page;770 goto next_page;
723 } else if (buf_page_get_io_fix(bpage) != BUF_IO_NONE) {771 } else if (UNIV_UNLIKELY(buf_page_get_io_fix_unlocked(bpage)
772 != BUF_IO_NONE)) {
724 /* We cannot remove this page during this scan773 /* We cannot remove this page during this scan
725 yet; maybe the system is currently reading it774 yet; maybe the system is currently reading it
726 in, or flushing the modifications to the file */775 in, or flushing the modifications to the file */
@@ -738,7 +787,11 @@
738 block_mutex = buf_page_get_mutex(bpage);787 block_mutex = buf_page_get_mutex(bpage);
739 mutex_enter(block_mutex);788 mutex_enter(block_mutex);
740789
741 if (bpage->buf_fix_count > 0) {790 if (UNIV_UNLIKELY(
791 buf_page_get_space(bpage) != id
792 || bpage->buf_fix_count > 0
793 || (buf_page_get_io_fix(bpage)
794 != BUF_IO_NONE))) {
742795
743 mutex_exit(block_mutex);796 mutex_exit(block_mutex);
744797
@@ -772,15 +825,15 @@
772 ulint page_no;825 ulint page_no;
773 ulint zip_size;826 ulint zip_size;
774827
775 buf_pool_mutex_exit(buf_pool);828 mutex_exit(&buf_pool->LRU_list_mutex);
776829
777 zip_size = buf_page_get_zip_size(bpage);830 zip_size = buf_page_get_zip_size(bpage);
778 page_no = buf_page_get_page_no(bpage);831 page_no = buf_page_get_page_no(bpage);
779832
833 mutex_exit(block_mutex);
834
780 rw_lock_x_unlock(hash_lock);835 rw_lock_x_unlock(hash_lock);
781836
782 mutex_exit(block_mutex);
783
784 /* Note that the following call will acquire837 /* Note that the following call will acquire
785 and release block->lock X-latch. */838 and release block->lock X-latch. */
786839
@@ -800,7 +853,10 @@
800 /* Remove from the LRU list. */853 /* Remove from the LRU list. */
801854
802 if (buf_LRU_block_remove_hashed(bpage, true)) {855 if (buf_LRU_block_remove_hashed(bpage, true)) {
856
857 mutex_enter(block_mutex);
803 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);858 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
859 mutex_exit(block_mutex);
804 } else {860 } else {
805 ut_ad(block_mutex == &buf_pool->zip_mutex);861 ut_ad(block_mutex == &buf_pool->zip_mutex);
806 }862 }
@@ -817,7 +873,7 @@
817 bpage = prev_bpage;873 bpage = prev_bpage;
818 }874 }
819875
820 buf_pool_mutex_exit(buf_pool);876 mutex_exit(&buf_pool->LRU_list_mutex);
821877
822 if (!all_freed) {878 if (!all_freed) {
823 os_thread_sleep(20000);879 os_thread_sleep(20000);
@@ -921,7 +977,8 @@
921 buf_page_t* b;977 buf_page_t* b;
922 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);978 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
923979
924 ut_ad(buf_pool_mutex_own(buf_pool));980 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
981 ut_ad(mutex_own(&buf_pool->zip_mutex));
925 ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);982 ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);
926983
927 /* Find the first successor of bpage in the LRU list984 /* Find the first successor of bpage in the LRU list
@@ -961,7 +1018,7 @@
961 ibool freed;1018 ibool freed;
962 ulint scanned;1019 ulint scanned;
9631020
964 ut_ad(buf_pool_mutex_own(buf_pool));1021 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
9651022
966 if (!buf_LRU_evict_from_unzip_LRU(buf_pool)) {1023 if (!buf_LRU_evict_from_unzip_LRU(buf_pool)) {
967 return(FALSE);1024 return(FALSE);
@@ -976,12 +1033,16 @@
976 buf_block_t* prev_block = UT_LIST_GET_PREV(unzip_LRU,1033 buf_block_t* prev_block = UT_LIST_GET_PREV(unzip_LRU,
977 block);1034 block);
9781035
1036 mutex_enter(&block->mutex);
1037
979 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);1038 ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
980 ut_ad(block->in_unzip_LRU_list);1039 ut_ad(block->in_unzip_LRU_list);
981 ut_ad(block->page.in_LRU_list);1040 ut_ad(block->page.in_LRU_list);
9821041
983 freed = buf_LRU_free_page(&block->page, false);1042 freed = buf_LRU_free_page(&block->page, false);
9841043
1044 mutex_exit(&block->mutex);
1045
985 block = prev_block;1046 block = prev_block;
986 }1047 }
9871048
@@ -1009,7 +1070,7 @@
1009 ibool freed;1070 ibool freed;
1010 ulint scanned;1071 ulint scanned;
10111072
1012 ut_ad(buf_pool_mutex_own(buf_pool));1073 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
10131074
1014 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU),1075 for (bpage = UT_LIST_GET_LAST(buf_pool->LRU),
1015 scanned = 1, freed = FALSE;1076 scanned = 1, freed = FALSE;
@@ -1020,12 +1081,19 @@
1020 unsigned accessed;1081 unsigned accessed;
1021 buf_page_t* prev_bpage = UT_LIST_GET_PREV(LRU,1082 buf_page_t* prev_bpage = UT_LIST_GET_PREV(LRU,
1022 bpage);1083 bpage);
1084 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
10231085
1024 ut_ad(buf_page_in_file(bpage));1086 ut_ad(buf_page_in_file(bpage));
1025 ut_ad(bpage->in_LRU_list);1087 ut_ad(bpage->in_LRU_list);
10261088
1027 accessed = buf_page_is_accessed(bpage);1089 accessed = buf_page_is_accessed(bpage);
1090
1091 mutex_enter(block_mutex);
1092
1028 freed = buf_LRU_free_page(bpage, true);1093 freed = buf_LRU_free_page(bpage, true);
1094
1095 mutex_exit(block_mutex);
1096
1029 if (freed && !accessed) {1097 if (freed && !accessed) {
1030 /* Keep track of pages that are evicted without1098 /* Keep track of pages that are evicted without
1031 ever being accessed. This gives us a measure of1099 ever being accessed. This gives us a measure of
@@ -1057,11 +1125,24 @@
1057 if TRUE, otherwise scan only1125 if TRUE, otherwise scan only
1058 'old' blocks. */1126 'old' blocks. */
1059{1127{
1060 ut_ad(buf_pool_mutex_own(buf_pool));1128 ibool freed = FALSE;
10611129 bool use_unzip_list = UT_LIST_GET_LEN(buf_pool->unzip_LRU) > 0;
1062 return(buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all)1130
1063 || buf_LRU_free_from_common_LRU_list(1131 mutex_enter(&buf_pool->LRU_list_mutex);
1064 buf_pool, scan_all));1132
1133 if (use_unzip_list) {
1134 freed = buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all);
1135 }
1136
1137 if (!freed) {
1138 freed = buf_LRU_free_from_common_LRU_list(buf_pool, scan_all);
1139 }
1140
1141 if (!freed) {
1142 mutex_exit(&buf_pool->LRU_list_mutex);
1143 }
1144
1145 return(freed);
1065}1146}
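
The rewritten buf_LRU_scan_and_free_block() now owns the LRU_list_mutex acquisition and releases it only on failure; on success buf_LRU_free_page() has already released it, a contract every caller now relies on. The asymmetric release, as a model (callback names are placeholders):

    #include <mutex>

    bool scan_and_free_model(std::mutex& lru_mutex, bool unzip_nonempty,
                             bool (*free_from_unzip)(),
                             bool (*free_from_lru)()) {
        bool freed = false;
        lru_mutex.lock();
        if (unzip_nonempty) {
            freed = free_from_unzip();
        }
        if (!freed) {
            freed = free_from_lru();
        }
        if (!freed) {
            lru_mutex.unlock();      // on success the callee dropped it
        }
        return freed;
    }
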
10661147
1067/******************************************************************//**1148/******************************************************************//**
@@ -1082,8 +1163,6 @@
10821163
1083 buf_pool = buf_pool_from_array(i);1164 buf_pool = buf_pool_from_array(i);
10841165
1085 buf_pool_mutex_enter(buf_pool);
1086
1087 if (!recv_recovery_on1166 if (!recv_recovery_on
1088 && UT_LIST_GET_LEN(buf_pool->free)1167 && UT_LIST_GET_LEN(buf_pool->free)
1089 + UT_LIST_GET_LEN(buf_pool->LRU)1168 + UT_LIST_GET_LEN(buf_pool->LRU)
@@ -1091,8 +1170,6 @@
10911170
1092 ret = TRUE;1171 ret = TRUE;
1093 }1172 }
1094
1095 buf_pool_mutex_exit(buf_pool);
1096 }1173 }
10971174
1098 return(ret);1175 return(ret);
@@ -1110,9 +1187,9 @@
1110{1187{
1111 buf_block_t* block;1188 buf_block_t* block;
11121189
1113 ut_ad(buf_pool_mutex_own(buf_pool));1190 mutex_enter(&buf_pool->free_list_mutex);
11141191
1115 block = (buf_block_t*) UT_LIST_GET_FIRST(buf_pool->free);1192 block = (buf_block_t*) UT_LIST_GET_LAST(buf_pool->free);
11161193
1117 if (block) {1194 if (block) {
11181195
@@ -1122,18 +1199,23 @@
1122 ut_ad(!block->page.in_LRU_list);1199 ut_ad(!block->page.in_LRU_list);
1123 ut_a(!buf_page_in_file(&block->page));1200 ut_a(!buf_page_in_file(&block->page));
1124 UT_LIST_REMOVE(list, buf_pool->free, (&block->page));1201 UT_LIST_REMOVE(list, buf_pool->free, (&block->page));
1125
1126 mutex_enter(&block->mutex);
1127
1128 buf_block_set_state(block, BUF_BLOCK_READY_FOR_USE);1202 buf_block_set_state(block, BUF_BLOCK_READY_FOR_USE);
1203
1204 mutex_exit(&buf_pool->free_list_mutex);
1205
1206 mutex_enter(&block->mutex);
1207
1129 UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);1208 UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);
11301209
1131 ut_ad(buf_pool_from_block(block) == buf_pool);1210 ut_ad(buf_pool_from_block(block) == buf_pool);
11321211
1133 mutex_exit(&block->mutex);1212 mutex_exit(&block->mutex);
1213 return(block);
1134 }1214 }
11351215
1136 return(block);1216 mutex_exit(&buf_pool->free_list_mutex);
1217
1218 return(NULL);
1137}1219}
11381220
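
buf_LRU_get_free_only() moves to the dedicated free_list_mutex and now pops from the tail of the free list (UT_LIST_GET_LAST instead of UT_LIST_GET_FIRST), which, under the assumption that freed blocks are returned at the head, keeps consumers and producers at opposite ends of the list. A sketch with a std::deque standing in for the UT_LIST:

    #include <deque>
    #include <mutex>

    std::mutex free_list_mutex;
    std::deque<int> free_list;       // stand-in for the block free list

    bool get_free_only_model(int* block) {
        std::unique_lock<std::mutex> guard(free_list_mutex);
        if (free_list.empty()) {
            return false;
        }
        *block = free_list.back();   // take from the tail, as in the patch
        free_list.pop_back();
        guard.unlock();              // the block is private now; its own
        return true;                 // mutex suffices for initialization
    }
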
1139/******************************************************************//**1221/******************************************************************//**
@@ -1147,8 +1229,6 @@
1147/*===================================*/1229/*===================================*/
1148 const buf_pool_t* buf_pool) /*!< in: buffer pool instance */1230 const buf_pool_t* buf_pool) /*!< in: buffer pool instance */
1149{1231{
1150 ut_ad(buf_pool_mutex_own(buf_pool));
1151
1152 if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)1232 if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
1153 + UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {1233 + UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {
1154 ut_print_timestamp(stderr);1234 ut_print_timestamp(stderr);
@@ -1253,10 +1333,10 @@
1253 ibool mon_value_was = FALSE;1333 ibool mon_value_was = FALSE;
1254 ibool started_monitor = FALSE;1334 ibool started_monitor = FALSE;
12551335
1336 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
1337
1256 MONITOR_INC(MONITOR_LRU_GET_FREE_SEARCH);1338 MONITOR_INC(MONITOR_LRU_GET_FREE_SEARCH);
1257loop:1339loop:
1258 buf_pool_mutex_enter(buf_pool);
1259
1260 buf_LRU_check_size_of_non_data_objects(buf_pool);1340 buf_LRU_check_size_of_non_data_objects(buf_pool);
12611341
1262 /* If there is a block in the free list, take it */1342 /* If there is a block in the free list, take it */
@@ -1264,7 +1344,6 @@
12641344
1265 if (block) {1345 if (block) {
12661346
1267 buf_pool_mutex_exit(buf_pool);
1268 ut_ad(buf_pool_from_block(block) == buf_pool);1347 ut_ad(buf_pool_from_block(block) == buf_pool);
1269 memset(&block->page.zip, 0, sizeof block->page.zip);1348 memset(&block->page.zip, 0, sizeof block->page.zip);
12701349
@@ -1275,22 +1354,28 @@
1275 return(block);1354 return(block);
1276 }1355 }
12771356
1357 mutex_enter(&buf_pool->flush_state_mutex);
1358
1278 if (buf_pool->init_flush[BUF_FLUSH_LRU]1359 if (buf_pool->init_flush[BUF_FLUSH_LRU]
1279 && srv_use_doublewrite_buf1360 && srv_use_doublewrite_buf
1280 && buf_dblwr != NULL) {1361 && buf_dblwr != NULL) {
12811362
1363 mutex_exit(&buf_pool->flush_state_mutex);
1364
1282 /* If there is an LRU flush happening in the background1365 /* If there is an LRU flush happening in the background
1283 then we wait for it to end instead of trying a single1366 then we wait for it to end instead of trying a single
1284 page flush. If, however, we are not using doublewrite1367 page flush. If, however, we are not using doublewrite
1285 buffer then it is better to do our own single page1368 buffer then it is better to do our own single page
1286 flush instead of waiting for LRU flush to end. */1369 flush instead of waiting for LRU flush to end. */
1287 buf_pool_mutex_exit(buf_pool);
1288 buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);1370 buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
1289 goto loop;1371 goto loop;
1290 }1372 }
12911373
1374 mutex_exit(&buf_pool->flush_state_mutex);
1375
1292 freed = FALSE;1376 freed = FALSE;
1293 if (buf_pool->try_LRU_scan || n_iterations > 0) {1377 if (buf_pool->try_LRU_scan || n_iterations > 0) {
1378
1294 /* If no block was in the free list, search from the1379 /* If no block was in the free list, search from the
1295 end of the LRU list and try to free a block there.1380 end of the LRU list and try to free a block there.
1296 If we are doing it for the first time we'll scan only1381 If we are doing it for the first time we'll scan only
@@ -1308,8 +1393,6 @@
1308 }1393 }
1309 }1394 }
13101395
1311 buf_pool_mutex_exit(buf_pool);
1312
1313 if (freed) {1396 if (freed) {
1314 goto loop;1397 goto loop;
13151398
@@ -1395,7 +1478,7 @@
1395 ulint new_len;1478 ulint new_len;
13961479
1397 ut_a(buf_pool->LRU_old);1480 ut_a(buf_pool->LRU_old);
1398 ut_ad(buf_pool_mutex_own(buf_pool));1481 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
1399 ut_ad(buf_pool->LRU_old_ratio >= BUF_LRU_OLD_RATIO_MIN);1482 ut_ad(buf_pool->LRU_old_ratio >= BUF_LRU_OLD_RATIO_MIN);
1400 ut_ad(buf_pool->LRU_old_ratio <= BUF_LRU_OLD_RATIO_MAX);1483 ut_ad(buf_pool->LRU_old_ratio <= BUF_LRU_OLD_RATIO_MAX);
1401#if BUF_LRU_OLD_RATIO_MIN * BUF_LRU_OLD_MIN_LEN <= BUF_LRU_OLD_RATIO_DIV * (BUF_LRU_OLD_TOLERANCE + 5)1484#if BUF_LRU_OLD_RATIO_MIN * BUF_LRU_OLD_MIN_LEN <= BUF_LRU_OLD_RATIO_DIV * (BUF_LRU_OLD_TOLERANCE + 5)
@@ -1461,7 +1544,7 @@
1461{1544{
1462 buf_page_t* bpage;1545 buf_page_t* bpage;
14631546
1464 ut_ad(buf_pool_mutex_own(buf_pool));1547 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
1465 ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN);1548 ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN);
14661549
1467 /* We first initialize all blocks in the LRU list as old and then use1550 /* We first initialize all blocks in the LRU list as old and then use
@@ -1496,7 +1579,7 @@
1496 ut_ad(buf_pool);1579 ut_ad(buf_pool);
1497 ut_ad(bpage);1580 ut_ad(bpage);
1498 ut_ad(buf_page_in_file(bpage));1581 ut_ad(buf_page_in_file(bpage));
1499 ut_ad(buf_pool_mutex_own(buf_pool));1582 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
15001583
1501 if (buf_page_belongs_to_unzip_LRU(bpage)) {1584 if (buf_page_belongs_to_unzip_LRU(bpage)) {
1502 buf_block_t* block = (buf_block_t*) bpage;1585 buf_block_t* block = (buf_block_t*) bpage;
@@ -1521,7 +1604,7 @@
15211604
1522 ut_ad(buf_pool);1605 ut_ad(buf_pool);
1523 ut_ad(bpage);1606 ut_ad(bpage);
1524 ut_ad(buf_pool_mutex_own(buf_pool));1607 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
15251608
1526 ut_a(buf_page_in_file(bpage));1609 ut_a(buf_page_in_file(bpage));
15271610
@@ -1601,7 +1684,7 @@
16011684
1602 ut_ad(buf_pool);1685 ut_ad(buf_pool);
1603 ut_ad(block);1686 ut_ad(block);
1604 ut_ad(buf_pool_mutex_own(buf_pool));1687 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
16051688
1606 ut_a(buf_page_belongs_to_unzip_LRU(&block->page));1689 ut_a(buf_page_belongs_to_unzip_LRU(&block->page));
16071690
@@ -1630,7 +1713,7 @@
16301713
1631 ut_ad(buf_pool);1714 ut_ad(buf_pool);
1632 ut_ad(bpage);1715 ut_ad(bpage);
1633 ut_ad(buf_pool_mutex_own(buf_pool));1716 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
16341717
1635 ut_a(buf_page_in_file(bpage));1718 ut_a(buf_page_in_file(bpage));
16361719
@@ -1686,7 +1769,7 @@
16861769
1687 ut_ad(buf_pool);1770 ut_ad(buf_pool);
1688 ut_ad(bpage);1771 ut_ad(bpage);
1689 ut_ad(buf_pool_mutex_own(buf_pool));1772 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
16901773
1691 ut_a(buf_page_in_file(bpage));1774 ut_a(buf_page_in_file(bpage));
1692 ut_ad(!bpage->in_LRU_list);1775 ut_ad(!bpage->in_LRU_list);
@@ -1770,7 +1853,7 @@
1770{1853{
1771 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);1854 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
17721855
1773 ut_ad(buf_pool_mutex_own(buf_pool));1856 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
17741857
1775 if (bpage->old) {1858 if (bpage->old) {
1776 buf_pool->stat.n_pages_made_young++;1859 buf_pool->stat.n_pages_made_young++;
@@ -1796,12 +1879,14 @@
1796Try to free a block. If bpage is a descriptor of a compressed-only1879Try to free a block. If bpage is a descriptor of a compressed-only
1797page, the descriptor object will be freed as well.1880page, the descriptor object will be freed as well.
17981881
1799NOTE: If this function returns true, it will temporarily1882NOTE: If this function returns true, it will release the LRU list mutex,
1800release buf_pool->mutex. Furthermore, the page frame will no longer be1883and temporarily release and relock the buf_page_get_mutex() mutex.
1801accessible via bpage.1884Furthermore, the page frame will no longer be accessible via bpage. If this
18021885function returns false, the buf_page_get_mutex() might be temporarily released
1803The caller must hold buf_pool->mutex and must not hold any1886and relocked too.
1804buf_page_get_mutex() when calling this function.1887
1888The caller must hold the LRU list and buf_page_get_mutex() mutexes.
1889
1805@return true if freed, false otherwise. */1890@return true if freed, false otherwise. */
1806UNIV_INTERN1891UNIV_INTERN
1807bool1892bool
@@ -1819,13 +1904,11 @@
18191904
1820 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);1905 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
18211906
1822 ut_ad(buf_pool_mutex_own(buf_pool));1907 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
1908 ut_ad(mutex_own(block_mutex));
1823 ut_ad(buf_page_in_file(bpage));1909 ut_ad(buf_page_in_file(bpage));
1824 ut_ad(bpage->in_LRU_list);1910 ut_ad(bpage->in_LRU_list);
18251911
1826 rw_lock_x_lock(hash_lock);
1827 mutex_enter(block_mutex);
1828
1829#if UNIV_WORD_SIZE == 41912#if UNIV_WORD_SIZE == 4
1830 /* On 32-bit systems, there is no padding in buf_page_t. On1913 /* On 32-bit systems, there is no padding in buf_page_t. On
1831 other systems, Valgrind could complain about uninitialized pad1914 other systems, Valgrind could complain about uninitialized pad
@@ -1836,7 +1919,7 @@
1836 if (!buf_page_can_relocate(bpage)) {1919 if (!buf_page_can_relocate(bpage)) {
18371920
1838 /* Do not free buffer-fixed or I/O-fixed blocks. */1921 /* Do not free buffer-fixed or I/O-fixed blocks. */
1839 goto func_exit;1922 return(false);
1840 }1923 }
18411924
1842#ifdef UNIV_IBUF_COUNT_DEBUG1925#ifdef UNIV_IBUF_COUNT_DEBUG
@@ -1848,7 +1931,7 @@
1848 /* Do not completely free dirty blocks. */1931 /* Do not completely free dirty blocks. */
18491932
1850 if (bpage->oldest_modification) {1933 if (bpage->oldest_modification) {
1851 goto func_exit;1934 return(false);
1852 }1935 }
1853 } else if ((bpage->oldest_modification)1936 } else if ((bpage->oldest_modification)
1854 && (buf_page_get_state(bpage)1937 && (buf_page_get_state(bpage)
@@ -1857,18 +1940,13 @@
1857 ut_ad(buf_page_get_state(bpage)1940 ut_ad(buf_page_get_state(bpage)
1858 == BUF_BLOCK_ZIP_DIRTY);1941 == BUF_BLOCK_ZIP_DIRTY);
18591942
1860func_exit:
1861 rw_lock_x_unlock(hash_lock);
1862 mutex_exit(block_mutex);
1863 return(false);1943 return(false);
18641944
1865 } else if (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE) {1945 } else if (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE) {
1866 b = buf_page_alloc_descriptor();1946 b = buf_page_alloc_descriptor();
1867 ut_a(b);1947 ut_a(b);
1868 memcpy(b, bpage, sizeof *b);
1869 }1948 }
18701949
1871 ut_ad(buf_pool_mutex_own(buf_pool));
1872 ut_ad(buf_page_in_file(bpage));1950 ut_ad(buf_page_in_file(bpage));
1873 ut_ad(bpage->in_LRU_list);1951 ut_ad(bpage->in_LRU_list);
1874 ut_ad(!bpage->in_flush_list == !bpage->oldest_modification);1952 ut_ad(!bpage->in_flush_list == !bpage->oldest_modification);
@@ -1887,12 +1965,45 @@
1887 }1965 }
1888#endif /* UNIV_DEBUG */1966#endif /* UNIV_DEBUG */
18891967
1890#ifdef UNIV_SYNC_DEBUG1968 mutex_exit(block_mutex);
1891 ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX));1969
1892#endif /* UNIV_SYNC_DEBUG */1970 rw_lock_x_lock(hash_lock);
1893 ut_ad(buf_page_can_relocate(bpage));1971 mutex_enter(block_mutex);
1972
1973 if (UNIV_UNLIKELY(!buf_page_can_relocate(bpage)
1974 || ((zip || !bpage->zip.data)
1975 && bpage->oldest_modification))) {
1976
1977not_freed:
1978 rw_lock_x_unlock(hash_lock);
1979 if (b) {
1980 buf_page_free_descriptor(b);
1981 }
1982
1983 return(false);
1984 } else if (UNIV_UNLIKELY(bpage->oldest_modification
1985 && (buf_page_get_state(bpage)
1986 != BUF_BLOCK_FILE_PAGE))) {
1987
1988 ut_ad(buf_page_get_state(bpage)
1989 == BUF_BLOCK_ZIP_DIRTY);
1990 goto not_freed;
1991 }
1992
1993 if (b) {
1994 memcpy(b, bpage, sizeof *b);
1995 }
18941996
1895 if (!buf_LRU_block_remove_hashed(bpage, zip)) {1997 if (!buf_LRU_block_remove_hashed(bpage, zip)) {
1998
1999 mutex_exit(&buf_pool->LRU_list_mutex);
2000
2001 if (b) {
2002 buf_page_free_descriptor(b);
2003 }
2004
2005 mutex_enter(block_mutex);
2006
1896 return(true);2007 return(true);
1897 }2008 }
18982009
@@ -1997,6 +2108,8 @@
1997 buf_LRU_add_block_low(b, buf_page_is_old(b));2108 buf_LRU_add_block_low(b, buf_page_is_old(b));
1998 }2109 }
19992110
2111 mutex_enter(&buf_pool->zip_mutex);
2112 rw_lock_x_unlock(hash_lock);
2000 if (b->state == BUF_BLOCK_ZIP_PAGE) {2113 if (b->state == BUF_BLOCK_ZIP_PAGE) {
2001#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG2114#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
2002 buf_LRU_insert_zip_clean(b);2115 buf_LRU_insert_zip_clean(b);
@@ -2008,35 +2121,16 @@
20082121
2009 bpage->zip.data = NULL;2122 bpage->zip.data = NULL;
2010 page_zip_set_size(&bpage->zip, 0);2123 page_zip_set_size(&bpage->zip, 0);
2011 mutex_exit(block_mutex);
20122124
2013 /* Prevent buf_page_get_gen() from2125 /* Prevent buf_page_get_gen() from
2014 decompressing the block while we release2126 decompressing the block while we release block_mutex. */
2015 buf_pool->mutex and block_mutex. */
2016 block_mutex = buf_page_get_mutex(b);
2017 mutex_enter(block_mutex);
2018 buf_page_set_sticky(b);2127 buf_page_set_sticky(b);
2019 mutex_exit(block_mutex);2128 mutex_exit(&buf_pool->zip_mutex);
20202129 mutex_exit(block_mutex);
2021 rw_lock_x_unlock(hash_lock);2130
2022
2023 } else {
2024
2025 /* There can be multiple threads doing an LRU scan to
2026 free a block. The page_cleaner thread can be doing an
2027 LRU batch whereas user threads can potentially be doing
2028 multiple single page flushes. As we release
2029 buf_pool->mutex below we need to make sure that no one
2030 else considers this block as a victim for page
2031 replacement. This block is already out of page_hash
2032 and we are about to remove it from the LRU list and put
2033 it on the free list. */
2034 mutex_enter(block_mutex);
2035 buf_page_set_sticky(bpage);
2036 mutex_exit(block_mutex);
2037 }2131 }
20382132
2039 buf_pool_mutex_exit(buf_pool);2133 mutex_exit(&buf_pool->LRU_list_mutex);
20402134
2041 /* Remove possible adaptive hash index on the page.2135 /* Remove possible adaptive hash index on the page.
2042 The page was declared uninitialized by2136 The page was declared uninitialized by
@@ -2069,13 +2163,17 @@
2069 checksum);2163 checksum);
2070 }2164 }
20712165
2072 buf_pool_mutex_enter(buf_pool);
2073
2074 mutex_enter(block_mutex);2166 mutex_enter(block_mutex);
2075 buf_page_unset_sticky(b != NULL ? b : bpage);2167
2076 mutex_exit(block_mutex);2168 if (b) {
2169 mutex_enter(&buf_pool->zip_mutex);
2170 buf_page_unset_sticky(b);
2171 mutex_exit(&buf_pool->zip_mutex);
2172 }
20772173
2078 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);2174 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
2175 ut_ad(mutex_own(block_mutex));
2176 ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
2079 return(true);2177 return(true);
2080}2178}
20812179
@@ -2091,7 +2189,6 @@
2091 buf_pool_t* buf_pool = buf_pool_from_block(block);2189 buf_pool_t* buf_pool = buf_pool_from_block(block);
20922190
2093 ut_ad(block);2191 ut_ad(block);
2094 ut_ad(buf_pool_mutex_own(buf_pool));
2095 ut_ad(mutex_own(&block->mutex));2192 ut_ad(mutex_own(&block->mutex));
20962193
2097 switch (buf_block_get_state(block)) {2194 switch (buf_block_get_state(block)) {
@@ -2109,8 +2206,6 @@
2109 ut_ad(!block->page.in_flush_list);2206 ut_ad(!block->page.in_flush_list);
2110 ut_ad(!block->page.in_LRU_list);2207 ut_ad(!block->page.in_LRU_list);
21112208
2112 buf_block_set_state(block, BUF_BLOCK_NOT_USED);
2113
2114 UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);2209 UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);
2115#ifdef UNIV_DEBUG2210#ifdef UNIV_DEBUG
2116 /* Wipe contents of page to reveal possible stale pointers to it */2211 /* Wipe contents of page to reveal possible stale pointers to it */
@@ -2125,18 +2220,19 @@
2125 if (data) {2220 if (data) {
2126 block->page.zip.data = NULL;2221 block->page.zip.data = NULL;
2127 mutex_exit(&block->mutex);2222 mutex_exit(&block->mutex);
2128 buf_pool_mutex_exit_forbid(buf_pool);
21292223
2130 buf_buddy_free(2224 buf_buddy_free(
2131 buf_pool, data, page_zip_get_size(&block->page.zip));2225 buf_pool, data, page_zip_get_size(&block->page.zip));
21322226
2133 buf_pool_mutex_exit_allow(buf_pool);
2134 mutex_enter(&block->mutex);2227 mutex_enter(&block->mutex);
2135 page_zip_set_size(&block->page.zip, 0);2228 page_zip_set_size(&block->page.zip, 0);
2136 }2229 }
21372230
2231 mutex_enter(&buf_pool->free_list_mutex);
2232 buf_block_set_state(block, BUF_BLOCK_NOT_USED);
2138 UT_LIST_ADD_FIRST(list, buf_pool->free, (&block->page));2233 UT_LIST_ADD_FIRST(list, buf_pool->free, (&block->page));
2139 ut_d(block->page.in_free_list = TRUE);2234 ut_d(block->page.in_free_list = TRUE);
2235 mutex_exit(&buf_pool->free_list_mutex);
21402236
2141 UNIV_MEM_ASSERT_AND_FREE(block->frame, UNIV_PAGE_SIZE);2237 UNIV_MEM_ASSERT_AND_FREE(block->frame, UNIV_PAGE_SIZE);
2142}2238}
@@ -2146,7 +2242,7 @@
2146If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),2242If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),
2147the object will be freed.2243the object will be freed.
21482244
2149The caller must hold buf_pool->mutex, the buf_page_get_mutex() mutex2245The caller must hold buf_pool->LRU_list_mutex, the buf_page_get_mutex() mutex
2150and the appropriate hash_lock. This function will release the2246and the appropriate hash_lock. This function will release the
2151buf_page_get_mutex() and the hash_lock.2247buf_page_get_mutex() and the hash_lock.
21522248
@@ -2171,7 +2267,7 @@
2171 rw_lock_t* hash_lock;2267 rw_lock_t* hash_lock;
21722268
2173 ut_ad(bpage);2269 ut_ad(bpage);
2174 ut_ad(buf_pool_mutex_own(buf_pool));2270 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
2175 ut_ad(mutex_own(buf_page_get_mutex(bpage)));2271 ut_ad(mutex_own(buf_page_get_mutex(bpage)));
21762272
2177 fold = buf_page_address_fold(bpage->space, bpage->offset);2273 fold = buf_page_address_fold(bpage->space, bpage->offset);
@@ -2287,7 +2383,7 @@
2287#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG2383#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
2288 mutex_exit(buf_page_get_mutex(bpage));2384 mutex_exit(buf_page_get_mutex(bpage));
2289 rw_lock_x_unlock(hash_lock);2385 rw_lock_x_unlock(hash_lock);
2290 buf_pool_mutex_exit(buf_pool);2386 mutex_exit(&buf_pool->LRU_list_mutex);
2291 buf_print();2387 buf_print();
2292 buf_LRU_print();2388 buf_LRU_print();
2293 buf_validate();2389 buf_validate();
@@ -2314,13 +2410,11 @@
23142410
2315 mutex_exit(&buf_pool->zip_mutex);2411 mutex_exit(&buf_pool->zip_mutex);
2316 rw_lock_x_unlock(hash_lock);2412 rw_lock_x_unlock(hash_lock);
2317 buf_pool_mutex_exit_forbid(buf_pool);
23182413
2319 buf_buddy_free(2414 buf_buddy_free(
2320 buf_pool, bpage->zip.data,2415 buf_pool, bpage->zip.data,
2321 page_zip_get_size(&bpage->zip));2416 page_zip_get_size(&bpage->zip));
23222417
2323 buf_pool_mutex_exit_allow(buf_pool);
2324 buf_page_free_descriptor(bpage);2418 buf_page_free_descriptor(bpage);
2325 return(false);2419 return(false);
23262420
@@ -2344,14 +2438,15 @@
2344 page_hash. Only possibility is when while invalidating2438 page_hash. Only possibility is when while invalidating
2345 a tablespace we buffer fix the prev_page in LRU to2439 a tablespace we buffer fix the prev_page in LRU to
2346 avoid relocation during the scan. But that is not2440 avoid relocation during the scan. But that is not
2347 possible because we are holding buf_pool mutex.2441 possible because we are holding LRU list mutex.
23482442
2349 2) Not possible because in buf_page_init_for_read()2443 2) Not possible because in buf_page_init_for_read()
2350 we do a look up of page_hash while holding buf_pool2444 we do a look up of page_hash while holding LRU list
2351 mutex and since we are holding buf_pool mutex here2445 mutex and since we are holding LRU list mutex here
2352 and by the time we'll release it in the caller we'd2446 and by the time we'll release it in the caller we'd
2353 have inserted the compressed only descriptor in the2447 have inserted the compressed only descriptor in the
2354 page_hash. */2448 page_hash. */
2449 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
2355 rw_lock_x_unlock(hash_lock);2450 rw_lock_x_unlock(hash_lock);
2356 mutex_exit(&((buf_block_t*) bpage)->mutex);2451 mutex_exit(&((buf_block_t*) bpage)->mutex);
23572452
@@ -2363,13 +2458,11 @@
2363 ut_ad(!bpage->in_free_list);2458 ut_ad(!bpage->in_free_list);
2364 ut_ad(!bpage->in_flush_list);2459 ut_ad(!bpage->in_flush_list);
2365 ut_ad(!bpage->in_LRU_list);2460 ut_ad(!bpage->in_LRU_list);
2366 buf_pool_mutex_exit_forbid(buf_pool);
23672461
2368 buf_buddy_free(2462 buf_buddy_free(
2369 buf_pool, data,2463 buf_pool, data,
2370 page_zip_get_size(&bpage->zip));2464 page_zip_get_size(&bpage->zip));
23712465
2372 buf_pool_mutex_exit_allow(buf_pool);
2373 page_zip_set_size(&bpage->zip, 0);2466 page_zip_set_size(&bpage->zip, 0);
2374 }2467 }
23752468
@@ -2397,16 +2490,11 @@
2397 buf_block_t* block) /*!< in: block, must contain a file page and2490 buf_block_t* block) /*!< in: block, must contain a file page and
2398 be in a state where it can be freed */2491 be in a state where it can be freed */
2399{2492{
2400#ifdef UNIV_DEBUG2493 ut_ad(mutex_own(&block->mutex));
2401 buf_pool_t* buf_pool = buf_pool_from_block(block);
2402 ut_ad(buf_pool_mutex_own(buf_pool));
2403#endif
24042494
2405 mutex_enter(&block->mutex);
2406 buf_block_set_state(block, BUF_BLOCK_MEMORY);2495 buf_block_set_state(block, BUF_BLOCK_MEMORY);
24072496
2408 buf_LRU_block_free_non_file_page(block);2497 buf_LRU_block_free_non_file_page(block);
2409 mutex_exit(&block->mutex);
2410}2498}
24112499
2412/******************************************************************//**2500/******************************************************************//**
@@ -2419,19 +2507,26 @@
2419 be in a state where it can be freed; there2507 be in a state where it can be freed; there
2420 may or may not be a hash index to the page */2508 may or may not be a hash index to the page */
2421{2509{
2510#if defined(UNIV_DEBUG) || defined(UNIV_SYNC_DEBUG)
2422 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);2511 buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
2512#endif
2513#ifdef UNIV_SYNC_DEBUG
2423 const ulint fold = buf_page_address_fold(bpage->space,2514 const ulint fold = buf_page_address_fold(bpage->space,
2424 bpage->offset);2515 bpage->offset);
2425 rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold);2516 rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold);
2517#endif
2426 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);2518 ib_mutex_t* block_mutex = buf_page_get_mutex(bpage);
24272519
2428 ut_ad(buf_pool_mutex_own(buf_pool));2520 ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
24292521 ut_ad(mutex_own(block_mutex));
2430 rw_lock_x_lock(hash_lock);2522#ifdef UNIV_SYNC_DEBUG
2431 mutex_enter(block_mutex);2523 ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX));
2524#endif
24322525
2433 if (buf_LRU_block_remove_hashed(bpage, true)) {2526 if (buf_LRU_block_remove_hashed(bpage, true)) {
2527 mutex_enter(block_mutex);
2434 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);2528 buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
2529 mutex_exit(block_mutex);
2435 }2530 }
24362531
2437 /* buf_LRU_block_remove_hashed() releases hash_lock and block_mutex */2532 /* buf_LRU_block_remove_hashed() releases hash_lock and block_mutex */
@@ -2466,7 +2561,7 @@
2466 }2561 }
24672562
2468 if (adjust) {2563 if (adjust) {
2469 buf_pool_mutex_enter(buf_pool);2564 mutex_enter(&buf_pool->LRU_list_mutex);
24702565
2471 if (ratio != buf_pool->LRU_old_ratio) {2566 if (ratio != buf_pool->LRU_old_ratio) {
2472 buf_pool->LRU_old_ratio = ratio;2567 buf_pool->LRU_old_ratio = ratio;
@@ -2478,7 +2573,7 @@
2478 }2573 }
2479 }2574 }
24802575
2481 buf_pool_mutex_exit(buf_pool);2576 mutex_exit(&buf_pool->LRU_list_mutex);
2482 } else {2577 } else {
2483 buf_pool->LRU_old_ratio = ratio;2578 buf_pool->LRU_old_ratio = ratio;
2484 }2579 }
@@ -2583,7 +2678,7 @@
2583 ulint new_len;2678 ulint new_len;
25842679
2585 ut_ad(buf_pool);2680 ut_ad(buf_pool);
2586 buf_pool_mutex_enter(buf_pool);2681 mutex_enter(&buf_pool->LRU_list_mutex);
25872682
2588 if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) {2683 if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) {
25892684
@@ -2641,6 +2736,10 @@
26412736
2642 ut_a(buf_pool->LRU_old_len == old_len);2737 ut_a(buf_pool->LRU_old_len == old_len);
26432738
2739 mutex_exit(&buf_pool->LRU_list_mutex);
2740
2741 mutex_enter(&buf_pool->free_list_mutex);
2742
2644 UT_LIST_VALIDATE(list, buf_page_t, buf_pool->free, CheckInFreeList());2743 UT_LIST_VALIDATE(list, buf_page_t, buf_pool->free, CheckInFreeList());
26452744
2646 for (bpage = UT_LIST_GET_FIRST(buf_pool->free);2745 for (bpage = UT_LIST_GET_FIRST(buf_pool->free);
@@ -2650,6 +2749,10 @@
2650 ut_a(buf_page_get_state(bpage) == BUF_BLOCK_NOT_USED);2749 ut_a(buf_page_get_state(bpage) == BUF_BLOCK_NOT_USED);
2651 }2750 }
26522751
2752 mutex_exit(&buf_pool->free_list_mutex);
2753
2754 mutex_enter(&buf_pool->LRU_list_mutex);
2755
2653 UT_LIST_VALIDATE(2756 UT_LIST_VALIDATE(
2654 unzip_LRU, buf_block_t, buf_pool->unzip_LRU,2757 unzip_LRU, buf_block_t, buf_pool->unzip_LRU,
2655 CheckUnzipLRUAndLRUList());2758 CheckUnzipLRUAndLRUList());
@@ -2663,7 +2766,7 @@
2663 ut_a(buf_page_belongs_to_unzip_LRU(&block->page));2766 ut_a(buf_page_belongs_to_unzip_LRU(&block->page));
2664 }2767 }
26652768
2666 buf_pool_mutex_exit(buf_pool);2769 mutex_exit(&buf_pool->LRU_list_mutex);
2667}2770}
26682771
2669/**********************************************************************//**2772/**********************************************************************//**
@@ -2699,7 +2802,7 @@
2699 const buf_page_t* bpage;2802 const buf_page_t* bpage;
27002803
2701 ut_ad(buf_pool);2804 ut_ad(buf_pool);
2702 buf_pool_mutex_enter(buf_pool);2805 mutex_enter(&buf_pool->LRU_list_mutex);
27032806
2704 bpage = UT_LIST_GET_FIRST(buf_pool->LRU);2807 bpage = UT_LIST_GET_FIRST(buf_pool->LRU);
27052808
@@ -2756,7 +2859,7 @@
2756 bpage = UT_LIST_GET_NEXT(LRU, bpage);2859 bpage = UT_LIST_GET_NEXT(LRU, bpage);
2757 }2860 }
27582861
2759 buf_pool_mutex_exit(buf_pool);2862 mutex_exit(&buf_pool->LRU_list_mutex);
2760}2863}
27612864
2762/**********************************************************************//**2865/**********************************************************************//**
27632866
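To make the new buf_LRU_free_page() contract concrete, a caller now has to follow roughly this pattern (a sketch derived from the updated function comment above, not code from the branch): enter with both the LRU list mutex and the block mutex held; on a true return the LRU list mutex has already been released while block_mutex is still held; on a false return both mutexes are still held.

    mutex_enter(&buf_pool->LRU_list_mutex);
    mutex_enter(block_mutex);

    if (buf_LRU_free_page(bpage, true)) {
            /* Freed: the LRU list mutex was released inside
            buf_LRU_free_page(); block_mutex is still held. */
            mutex_exit(block_mutex);
    } else {
            /* Not freed: both mutexes are still held, although
            block_mutex may have been released and relocked in
            between. */
            mutex_exit(block_mutex);
            mutex_exit(&buf_pool->LRU_list_mutex);
    }
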
=== modified file 'Percona-Server/storage/innobase/buf/buf0rea.cc'
--- Percona-Server/storage/innobase/buf/buf0rea.cc 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/buf/buf0rea.cc 2013-09-20 05:29:11 +0000
@@ -63,10 +63,15 @@
 	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
 	const bool	uncompressed = (buf_page_get_state(bpage)
 					== BUF_BLOCK_FILE_PAGE);
+	const ulint	fold = buf_page_address_fold(bpage->space,
+						     bpage->offset);
+	rw_lock_t*	hash_lock = buf_page_hash_lock_get(buf_pool, fold);
+
+	mutex_enter(&buf_pool->LRU_list_mutex);
+	rw_lock_x_lock(hash_lock);
+	mutex_enter(buf_page_get_mutex(bpage));
 
 	/* First unfix and release lock on the bpage */
-	buf_pool_mutex_enter(buf_pool);
-	mutex_enter(buf_page_get_mutex(bpage));
 	ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ);
 	ut_ad(bpage->buf_fix_count == 0);
 
@@ -79,15 +84,13 @@
 			       BUF_IO_READ);
 	}
 
-	mutex_exit(buf_page_get_mutex(bpage));
-
 	/* remove the block from LRU list */
 	buf_LRU_free_one_page(bpage);
 
+	mutex_exit(&buf_pool->LRU_list_mutex);
+
 	ut_ad(buf_pool->n_pend_reads > 0);
-	buf_pool->n_pend_reads--;
-
-	buf_pool_mutex_exit(buf_pool);
+	os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1);
 }
 
 /********************************************************************//**
@@ -216,6 +219,7 @@
 #endif
 
 	ut_ad(buf_page_in_file(bpage));
+	ut_ad(!mutex_own(&buf_pool_from_bpage(bpage)->LRU_list_mutex));
 
 	if (sync) {
 		thd_wait_begin(NULL, THD_WAIT_DISKIO);
@@ -332,11 +336,8 @@
 		high = fil_space_get_size(space);
 	}
 
-	buf_pool_mutex_enter(buf_pool);
-
 	if (buf_pool->n_pend_reads
 	    > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
-		buf_pool_mutex_exit(buf_pool);
 
 		return(0);
 	}
@@ -345,8 +346,12 @@
 	that is, reside near the start of the LRU list. */
 
 	for (i = low; i < high; i++) {
+
+		rw_lock_t*	hash_lock;
+
 		const buf_page_t*	bpage =
-			buf_page_hash_get(buf_pool, space, i);
+			buf_page_hash_get_s_locked(buf_pool, space, i,
+						   &hash_lock);
 
 		if (bpage
 		    && buf_page_is_accessed(bpage)
@@ -357,13 +362,16 @@
 			if (recent_blocks
 			    >= BUF_READ_AHEAD_RANDOM_THRESHOLD(buf_pool)) {
 
-				buf_pool_mutex_exit(buf_pool);
+				rw_lock_s_unlock(hash_lock);
 				goto read_ahead;
 			}
 		}
+
+		if (bpage) {
+			rw_lock_s_unlock(hash_lock);
+		}
 	}
 
-	buf_pool_mutex_exit(buf_pool);
 	/* Do nothing */
 	return(0);
 
@@ -551,6 +559,7 @@
 	buf_page_t*	bpage;
 	buf_frame_t*	frame;
 	buf_page_t*	pred_bpage = NULL;
+	unsigned	pred_bpage_is_accessed = 0;
 	ulint		pred_offset;
 	ulint		succ_offset;
 	ulint		count;
@@ -602,10 +611,7 @@
 
 	tablespace_version = fil_space_get_version(space);
 
-	buf_pool_mutex_enter(buf_pool);
-
 	if (high > fil_space_get_size(space)) {
-		buf_pool_mutex_exit(buf_pool);
 		/* The area is not whole, return */
 
 		return(0);
@@ -613,7 +619,6 @@
 
 	if (buf_pool->n_pend_reads
 	    > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
-		buf_pool_mutex_exit(buf_pool);
 
 		return(0);
 	}
@@ -636,7 +641,11 @@
 	fail_count = 0;
 
 	for (i = low; i < high; i++) {
-		bpage = buf_page_hash_get(buf_pool, space, i);
+
+		rw_lock_t*	hash_lock;
+
+		bpage = buf_page_hash_get_s_locked(buf_pool, space, i,
+						   &hash_lock);
 
 		if (bpage == NULL || !buf_page_is_accessed(bpage)) {
 			/* Not accessed */
@@ -653,7 +662,7 @@
 			a little against this. */
 			int res = ut_ulint_cmp(
 				buf_page_is_accessed(bpage),
-				buf_page_is_accessed(pred_bpage));
+				pred_bpage_is_accessed);
 			/* Accesses not in the right order */
 			if (res != 0 && res != asc_or_desc) {
 				fail_count++;
@@ -662,12 +671,20 @@
 
 		if (fail_count > threshold) {
 			/* Too many failures: return */
-			buf_pool_mutex_exit(buf_pool);
+			if (bpage) {
+				rw_lock_s_unlock(hash_lock);
+			}
 			return(0);
 		}
 
-		if (bpage && buf_page_is_accessed(bpage)) {
-			pred_bpage = bpage;
+		if (bpage) {
+			if (buf_page_is_accessed(bpage)) {
+				pred_bpage = bpage;
+				pred_bpage_is_accessed
+					= buf_page_is_accessed(bpage);
+			}
+
+			rw_lock_s_unlock(hash_lock);
 		}
 	}
 
@@ -677,7 +694,6 @@
 	bpage = buf_page_hash_get(buf_pool, space, offset);
 
 	if (bpage == NULL) {
-		buf_pool_mutex_exit(buf_pool);
 
 		return(0);
 	}
@@ -703,8 +719,6 @@
 	pred_offset = fil_page_get_prev(frame);
 	succ_offset = fil_page_get_next(frame);
 
-	buf_pool_mutex_exit(buf_pool);
-
 	if ((offset == low) && (succ_offset == offset + 1)) {
 
 		/* This is ok, we can continue */
@@ -961,7 +975,8 @@
 
 		os_aio_print_debug = FALSE;
 		buf_pool = buf_pool_get(space, page_nos[i]);
-		while (buf_pool->n_pend_reads >= recv_n_pool_free_frames / 2) {
+		while (buf_pool->n_pend_reads
+		       >= recv_n_pool_free_frames / 2) {
 
 			os_aio_simulated_wake_handler_threads();
 			os_thread_sleep(10000);
 
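The read-ahead loops above share one pattern under the new page_hash locking: buf_page_hash_get_s_locked() hands back the page with its hash lock s-latched only when a page was found, and the caller is responsible for releasing that latch on every path. Roughly (a sketch, not code from the branch):

    rw_lock_t*              hash_lock;
    const buf_page_t*       bpage
            = buf_page_hash_get_s_locked(buf_pool, space, i, &hash_lock);

    if (bpage) {
            /* bpage fields may be read here under the s-latched
            hash lock. */
            rw_lock_s_unlock(hash_lock);
    }
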
=== modified file 'Percona-Server/storage/innobase/fsp/fsp0fsp.cc'
--- Percona-Server/storage/innobase/fsp/fsp0fsp.cc 2013-05-30 12:47:19 +0000
+++ Percona-Server/storage/innobase/fsp/fsp0fsp.cc 2013-09-20 05:29:11 +0000
@@ -2870,7 +2870,7 @@
 
 	/* The convoluted mutex acquire is to overcome latching order
 	issues: The problem is that the fil_mutex is at a lower level
-	than the tablespace latch and the buffer pool mutex. We have to
+	than the tablespace latch and the buffer pool mutexes. We have to
 	first prevent any operations on the file system by acquiring the
 	dictionary mutex. Then acquire the tablespace latch to obey the
 	latching order and then release the dictionary mutex. That way we
 
=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-08-30 13:23:53 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-09-20 05:29:11 +0000
@@ -294,6 +294,11 @@
 # endif /* !PFS_SKIP_BUFFER_MUTEX_RWLOCK */
 	{&buf_pool_mutex_key, "buf_pool_mutex", 0},
 	{&buf_pool_zip_mutex_key, "buf_pool_zip_mutex", 0},
+	{&buf_pool_LRU_list_mutex_key, "buf_pool_LRU_list_mutex", 0},
+	{&buf_pool_free_list_mutex_key, "buf_pool_free_list_mutex", 0},
+	{&buf_pool_zip_free_mutex_key, "buf_pool_zip_free_mutex", 0},
+	{&buf_pool_zip_hash_mutex_key, "buf_pool_zip_hash_mutex", 0},
+	{&buf_pool_flush_state_mutex_key, "buf_pool_flush_state_mutex", 0},
 	{&cache_last_read_mutex_key, "cache_last_read_mutex", 0},
 	{&dict_foreign_err_mutex_key, "dict_foreign_err_mutex", 0},
 	{&dict_sys_mutex_key, "dict_sys_mutex", 0},
@@ -15485,7 +15490,7 @@
 	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
 		buf_pool_t*	buf_pool = &buf_pool_ptr[i];
 
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->LRU_list_mutex);
 
 		for (buf_block_t* block = UT_LIST_GET_LAST(
			     buf_pool->unzip_LRU);
@@ -15497,14 +15502,19 @@
 			ut_ad(block->in_unzip_LRU_list);
 			ut_ad(block->page.in_LRU_list);
 
+			mutex_enter(&block->mutex);
 			if (!buf_LRU_free_page(&block->page, false)) {
+				mutex_exit(&block->mutex);
 				all_evicted = false;
+			} else {
+				mutex_exit(&block->mutex);
+				mutex_enter(&buf_pool->LRU_list_mutex);
 			}
 
 			block = prev_block;
 		}
 
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->LRU_list_mutex);
 	}
 
 	return(all_evicted);
 
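Note the asymmetry in the unzip_LRU eviction loop above: when buf_LRU_free_page() succeeds it has already dropped the LRU list mutex, so the loop must re-acquire it before following prev_block; when it fails the mutex never left the caller's hands. A condensed sketch of the loop body (not code from the branch):

    mutex_enter(&block->mutex);
    if (!buf_LRU_free_page(&block->page, false)) {
            mutex_exit(&block->mutex);
            all_evicted = false;    /* LRU list mutex still held */
    } else {
            mutex_exit(&block->mutex);
            mutex_enter(&buf_pool->LRU_list_mutex);    /* re-acquire */
    }
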
=== modified file 'Percona-Server/storage/innobase/handler/i_s.cc'
--- Percona-Server/storage/innobase/handler/i_s.cc 2013-08-14 03:57:21 +0000
+++ Percona-Server/storage/innobase/handler/i_s.cc 2013-09-20 05:29:11 +0000
@@ -2103,7 +2103,7 @@
 
 		buf_pool = buf_pool_from_array(i);
 
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->zip_free_mutex);
 
 		for (uint x = 0; x <= BUF_BUDDY_SIZES; x++) {
 			buf_buddy_stat_t*	buddy_stat;
@@ -2122,7 +2122,8 @@
 				(ulong) (buddy_stat->relocated_usec / 1000000));
 
 			if (reset) {
-				/* This is protected by buf_pool->mutex. */
+				/* This is protected by
+				buf_pool->zip_free_mutex. */
 				buddy_stat->relocated = 0;
 				buddy_stat->relocated_usec = 0;
 			}
@@ -2133,7 +2134,7 @@
 			}
 		}
 
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->zip_free_mutex);
 
 		if (status) {
 			break;
@@ -4954,12 +4955,16 @@
 					out: structure filled with scanned
 					info */
 {
+	ib_mutex_t*	mutex = buf_page_get_mutex(bpage);
+
 	ut_ad(pool_id < MAX_BUFFER_POOLS);
 
 	page_info->pool_id = pool_id;
 
 	page_info->block_id = pos;
 
+	mutex_enter(mutex);
+
 	page_info->page_state = buf_page_get_state(bpage);
 
 	/* Only fetch information for buffers that map to a tablespace,
@@ -4998,6 +5003,7 @@
 		break;
 	case BUF_IO_READ:
 		page_info->page_type = I_S_PAGE_TYPE_UNKNOWN;
+		mutex_exit(mutex);
 		return;
 	}
 
@@ -5018,6 +5024,8 @@
 	} else {
 		page_info->page_type = I_S_PAGE_TYPE_UNKNOWN;
 	}
+
+	mutex_exit(mutex);
 }
 
 /*******************************************************************//**
@@ -5075,7 +5083,6 @@
 		buffer pool info printout, we are not required to
 		preserve the overall consistency, so we can
 		release mutex periodically */
-		buf_pool_mutex_enter(buf_pool);
 
 		/* GO through each block in the chunk */
 		for (n_blocks = num_to_process; n_blocks--; block++) {
@@ -5086,8 +5093,6 @@
 			num_page++;
 		}
 
-		buf_pool_mutex_exit(buf_pool);
-
 		/* Fill in information schema table with information
 		just collected from the buffer chunk scan */
 		status = i_s_innodb_buffer_page_fill(
@@ -5609,9 +5614,9 @@
 	DBUG_ENTER("i_s_innodb_fill_buffer_lru");
 	RETURN_IF_INNODB_NOT_STARTED(tables->schema_table_name);
 
-	/* Obtain buf_pool mutex before allocate info_buffer, since
+	/* Obtain buf_pool->LRU_list_mutex before allocate info_buffer, since
 	UT_LIST_GET_LEN(buf_pool->LRU) could change */
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
 
 	lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
 
@@ -5645,7 +5650,7 @@
 	ut_ad(lru_pos == UT_LIST_GET_LEN(buf_pool->LRU));
 
 exit:
-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->LRU_list_mutex);
 
 	if (info_buffer) {
 		status = i_s_innodb_buf_page_lru_fill(
 
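With the single buffer pool mutex gone, the INFORMATION_SCHEMA page scan above pins each page individually: i_s_innodb_buffer_page_get_info() now holds buf_page_get_mutex(bpage) for the duration of one page's snapshot and must release it on every exit path, including the early BUF_IO_READ return. Schematically (a sketch, not code from the branch):

    ib_mutex_t*     mutex = buf_page_get_mutex(bpage);

    mutex_enter(mutex);

    page_info->page_state = buf_page_get_state(bpage);
    /* ... fill in the rest of page_info under the block mutex ... */

    mutex_exit(mutex);      /* mirrored on the early-return path */
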
=== modified file 'Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc'
--- Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc 2013-09-02 10:01:38 +0000
+++ Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc 2013-09-20 05:29:11 +0000
@@ -4611,7 +4611,7 @@
 	ut_ad(!block || buf_block_get_space(block) == space);
 	ut_ad(!block || buf_block_get_page_no(block) == page_no);
 	ut_ad(!block || buf_block_get_zip_size(block) == zip_size);
-	ut_ad(!block || buf_block_get_io_fix(block) == BUF_IO_READ);
+	ut_ad(!block || buf_block_get_io_fix_unlocked(block) == BUF_IO_READ);
 
 	if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE
 	    || trx_sys_hdr_page(space, page_no)) {
 
=== modified file 'Percona-Server/storage/innobase/include/buf0buddy.h'
--- Percona-Server/storage/innobase/include/buf0buddy.h 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/buf0buddy.h 2013-09-20 05:29:11 +0000
@@ -36,8 +36,8 @@
 
 /**********************************************************************//**
 Allocate a block. The thread calling this function must hold
-buf_pool->mutex and must not hold buf_pool->zip_mutex or any
-block->mutex. The buf_pool->mutex may be released and reacquired.
+buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any
+block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired.
 This function should only be used for allocating compressed page frames.
 @return allocated block, never NULL */
 UNIV_INLINE
@@ -52,8 +52,8 @@
 	ibool*		lru)		/*!< in: pointer to a variable
 					that will be assigned TRUE if
 					storage was allocated from the
-					LRU list and buf_pool->mutex was
-					temporarily released */
+					LRU list and buf_pool->LRU_list_mutex
+					was temporarily released */
 	__attribute__((malloc, nonnull));
 
 /**********************************************************************//**
 
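The buf0buddy comment changes spell out the new allocation contract: the caller enters buf_buddy_alloc() with buf_pool->LRU_list_mutex held, and if the allocator had to take a block from the LRU list, that mutex may have been released and reacquired, which *lru reports. A sketch of a conforming caller (assumptions mine, not code from the branch):

    ibool   lru = FALSE;

    ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

    void*   frame = buf_buddy_alloc(buf_pool, zip_size, &lru);

    if (lru) {
            /* The LRU list mutex was temporarily released inside
            buf_buddy_alloc(); any LRU list pointers taken before
            the call may be stale and should be refreshed. */
    }
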
=== modified file 'Percona-Server/storage/innobase/include/buf0buddy.ic'
--- Percona-Server/storage/innobase/include/buf0buddy.ic 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/buf0buddy.ic 2013-09-20 05:29:11 +0000
@@ -35,8 +35,8 @@
 
 /**********************************************************************//**
 Allocate a block. The thread calling this function must hold
-buf_pool->mutex and must not hold buf_pool->zip_mutex or any block->mutex.
-The buf_pool_mutex may be released and reacquired.
+buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any
+block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired.
 @return allocated block, never NULL */
 UNIV_INTERN
 void*
@@ -48,8 +48,8 @@
 	ibool*		lru)		/*!< in: pointer to a variable that
 					will be assigned TRUE if storage was
 					allocated from the LRU list and
-					buf_pool->mutex was temporarily
-					released */
+					buf_pool->LRU_list_mutex was
+					temporarily released */
 	__attribute__((malloc, nonnull));
 
 /**********************************************************************//**
@@ -88,8 +88,8 @@
 
 /**********************************************************************//**
 Allocate a block. The thread calling this function must hold
-buf_pool->mutex and must not hold buf_pool->zip_mutex or any
-block->mutex. The buf_pool->mutex may be released and reacquired.
+buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any
+block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired.
 This function should only be used for allocating compressed page frames.
 @return allocated block, never NULL */
 UNIV_INLINE
@@ -104,10 +104,10 @@
 	ibool*		lru)		/*!< in: pointer to a variable
 					that will be assigned TRUE if
 					storage was allocated from the
-					LRU list and buf_pool->mutex was
-					temporarily released */
+					LRU list and buf_pool->LRU_list_mutex
+					was temporarily released */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_ad(ut_is_2pow(size));
 	ut_ad(size >= UNIV_ZIP_SIZE_MIN);
 	ut_ad(size <= UNIV_PAGE_SIZE);
@@ -129,7 +129,6 @@
 	ulint		size)		/*!< in: block size,
 					up to UNIV_PAGE_SIZE */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(ut_is_2pow(size));
 	ut_ad(size >= UNIV_ZIP_SIZE_MIN);
 	ut_ad(size <= UNIV_PAGE_SIZE);
 
=== modified file 'Percona-Server/storage/innobase/include/buf0buf.h'
--- Percona-Server/storage/innobase/include/buf0buf.h 2013-09-02 10:01:38 +0000
+++ Percona-Server/storage/innobase/include/buf0buf.h 2013-09-20 05:29:11 +0000
@@ -208,19 +208,6 @@
 };
 
 #ifndef UNIV_HOTBACKUP
-/********************************************************************//**
-Acquire mutex on all buffer pool instances */
-UNIV_INLINE
-void
-buf_pool_mutex_enter_all(void);
-/*===========================*/
-
-/********************************************************************//**
-Release mutex on all buffer pool instances */
-UNIV_INLINE
-void
-buf_pool_mutex_exit_all(void);
-/*==========================*/
 
 /********************************************************************//**
 Creates the buffer pool.
@@ -581,7 +568,7 @@
 	page frame */
 /********************************************************************//**
 Increments the modify clock of a frame by 1. The caller must (1) own the
-buf_pool->mutex and block bufferfix count has to be zero, (2) or own an x-lock
+LRU list mutex and block bufferfix count has to be zero, (2) or own an x-lock
 on the block. */
 UNIV_INLINE
 void
@@ -938,6 +925,17 @@
 	const buf_block_t*	block)	/*!< in: pointer to the control block */
 	__attribute__((pure));
 /*********************************************************************//**
+Gets the io_fix state of a block. Does not assert that the
+buf_page_get_mutex() mutex is held, to be used in the cases where it is safe
+not to hold it.
+@return io_fix state */
+UNIV_INLINE
+enum buf_io_fix
+buf_page_get_io_fix_unlocked(
+/*=========================*/
+	const buf_page_t*	bpage)	/*!< in: pointer to the control block */
+	__attribute__((pure));
+/*********************************************************************//**
 Sets the io_fix state of a block. */
 UNIV_INLINE
 void
@@ -955,7 +953,7 @@
 	enum buf_io_fix	io_fix);/*!< in: io_fix state */
 /*********************************************************************//**
 Makes a block sticky. A sticky block implies that even after we release
-the buf_pool->mutex and the block->mutex:
+the buf_pool->LRU_list_mutex and the block->mutex:
 * it cannot be removed from the flush_list
 * the block descriptor cannot be relocated
 * it cannot be removed from the LRU list
@@ -1410,6 +1408,19 @@
 
 #endif /* !UNIV_HOTBACKUP */
 
+#ifdef UNIV_DEBUG
+/********************************************************************//**
+Checks if buf_pool->zip_mutex is owned and is serving for a given page as its
+block mutex.
+@return true if buf_pool->zip_mutex is owned. */
+UNIV_INLINE
+bool
+buf_own_zip_mutex_for_page(
+/*=======================*/
+	const buf_page_t*	bpage)
+	__attribute__((nonnull,warn_unused_result));
+#endif /* UNIV_DEBUG */
+
 /** The common buffer control block structure
 for compressed and uncompressed frames */
 
@@ -1421,18 +1432,14 @@
 	None of these bit-fields must be modified without holding
 	buf_page_get_mutex() [buf_block_t::mutex or
 	buf_pool->zip_mutex], since they can be stored in the same
-	machine word. Some of these fields are additionally protected
-	by buf_pool->mutex. */
+	machine word. */
 	/* @{ */
 
-	unsigned	space:32;	/*!< tablespace id; also protected
-					by buf_pool->mutex. */
-	unsigned	offset:32;	/*!< page number; also protected
-					by buf_pool->mutex. */
+	unsigned	space:32;	/*!< tablespace id. */
+	unsigned	offset:32;	/*!< page number. */
 
 	unsigned	state:BUF_PAGE_STATE_BITS;
-					/*!< state of the control block; also
-					protected by buf_pool->mutex.
+					/*!< state of the control block.
 					State transitions from
 					BUF_BLOCK_READY_FOR_USE to
 					BUF_BLOCK_MEMORY need not be
@@ -1450,11 +1457,21 @@
 #ifndef UNIV_HOTBACKUP
 	unsigned	flush_type:2;	/*!< if this block is currently being
 					flushed to disk, this tells the
-					flush_type.
+					flush_type. Writes during flushing
+					protected by buf_page_get_mutex_enter()
+					mutex and the corresponding flush state
+					mutex.
 					@see buf_flush_t */
-	unsigned	io_fix:2;	/*!< type of pending I/O operation;
-					also protected by buf_pool->mutex
-					@see enum buf_io_fix */
+	unsigned	io_fix:2;	/*!< type of pending I/O operation.
+					Transitions from BUF_IO_NONE to
+					BUF_IO_WRITE and back are protected by
+					the buf_page_get_mutex() mutex and the
+					corresponding flush state mutex. The
+					flush state mutex protection for io_fix
+					and flush_type is not strictly
+					required, but it ensures consistent
+					buffer pool instance state snapshots in
+					buf_pool_validate_instance(). */
 	unsigned	buf_fix_count:19;/*!< count of how manyfold this block
 					is currently bufferfixed */
 	unsigned	buf_pool_index:6;/*!< index number of the buffer pool
@@ -1466,7 +1483,7 @@
 #endif /* !UNIV_HOTBACKUP */
 	page_zip_des_t	zip;		/*!< compressed page; zip.data
 					(but not the data it points to) is
-					also protected by buf_pool->mutex;
+					protected by buf_pool->zip_mutex;
 					state == BUF_BLOCK_ZIP_PAGE and
 					zip.data == NULL means an active
 					buf_pool->watch */
@@ -1479,15 +1496,13 @@
 	ibool		in_zip_hash;	/*!< TRUE if in buf_pool->zip_hash */
 #endif /* UNIV_DEBUG */
 
-	/** @name Page flushing fields
-	All these are protected by buf_pool->mutex. */
+	/** @name Page flushing fields */
 	/* @{ */
 
 	UT_LIST_NODE_T(buf_page_t) list;
 					/*!< based on state, this is a
 					list node, protected either by
-					buf_pool->mutex or by
-					buf_pool->flush_list_mutex,
+					a corresponding list mutex,
 					in one of the following lists in
 					buf_pool:
 
@@ -1500,7 +1515,8 @@
 					then the node pointers are
 					covered by buf_pool->flush_list_mutex.
 					Otherwise these pointers are
-					protected by buf_pool->mutex.
+					protected by a corresponding list
+					mutex.
 
 					The contents of the list node
 					is undefined if !in_flush_list
@@ -1523,8 +1539,8 @@
 					reads can happen while holding
 					any one of the two mutexes */
 	ibool		in_free_list;	/*!< TRUE if in buf_pool->free; when
-					buf_pool->mutex is free, the following
-					should hold: in_free_list
+					buf_pool->free_list_mutex is free, the
+					following should hold: in_free_list
 					== (state == BUF_BLOCK_NOT_USED) */
 #endif /* UNIV_DEBUG */
 	lsn_t		newest_modification;
@@ -1547,9 +1563,7 @@
 					reads can happen while holding
 					any one of the two mutexes */
 	/* @} */
-	/** @name LRU replacement algorithm fields
-	These fields are protected by buf_pool->mutex only (not
-	buf_pool->zip_mutex or buf_block_t::mutex). */
+	/** @name LRU replacement algorithm fields */
 	/* @{ */
 
 	UT_LIST_NODE_T(buf_page_t) LRU;
@@ -1560,7 +1574,10 @@
 					debugging */
 #endif /* UNIV_DEBUG */
 	unsigned	old:1;		/*!< TRUE if the block is in the old
-					blocks in buf_pool->LRU_old */
+					blocks in buf_pool->LRU_old. Protected
+					by the LRU list mutex. May be read for
+					heuristics purposes under the block
+					mutex instead. */
 	unsigned	freed_page_clock:31;/*!< the value of
 					buf_pool->freed_page_clock
 					when this block was the last
@@ -1612,8 +1629,7 @@
 					used in debugging */
 #endif /* UNIV_DEBUG */
 	ib_mutex_t	mutex;		/*!< mutex protecting this block:
-					state (also protected by the buffer
-					pool mutex), io_fix, buf_fix_count,
+					state, io_fix, buf_fix_count,
 					and accessed; we introduce this new
 					mutex in InnoDB-5.1 to relieve
 					contention on the buffer pool mutex */
@@ -1622,8 +1638,8 @@
 	unsigned	lock_hash_val:32;/*!< hashed value of the page address
 					in the record lock hash table;
 					protected by buf_block_t::lock
-					(or buf_block_t::mutex, buf_pool->mutex
-					in buf_page_get_gen(),
+					(or buf_block_t::mutex in
+					buf_page_get_gen(),
 					buf_page_init_for_read()
 					and buf_page_create()) */
 	ibool		check_index_page_at_flush;
@@ -1646,8 +1662,8 @@
 					positioning: if the modify clock has
 					not changed, we know that the pointer
 					is still valid; this field may be
-					changed if the thread (1) owns the
-					pool mutex and the page is not
+					changed if the thread (1) owns the LRU
+					list mutex and the page is not
 					bufferfixed, or (2) the thread has an
 					x-latch on the block */
 	/* @} */
@@ -1754,11 +1770,11 @@
 	ulint	n_page_gets;	/*!< number of page gets performed;
 				also successful searches through
 				the adaptive hash index are
-				counted as page gets; this field
-				is NOT protected by the buffer
-				pool mutex */
-	ulint	n_pages_read;	/*!< number read operations */
-	ulint	n_pages_written;/*!< number write operations */
+				counted as page gets. */
+	ulint	n_pages_read;	/*!< number read operations. Accessed
+				atomically. */
+	ulint	n_pages_written;/*!< number write operations. Accessed
+				atomically.*/
 	ulint	n_pages_created;/*!< number of pages created
 				in the pool with no read */
 	ulint	n_ra_pages_read_rnd;/*!< number of pages read in
@@ -1798,12 +1814,16 @@
 
 	/** @name General fields */
 	/* @{ */
-	ib_mutex_t	mutex;		/*!< Buffer pool mutex of this
-					instance */
 	ib_mutex_t	zip_mutex;	/*!< Zip mutex of this buffer
 					pool instance, protects compressed
 					only pages (of type buf_page_t, not
 					buf_block_t */
+	ib_mutex_t	LRU_list_mutex;
+	ib_mutex_t	free_list_mutex;
+	ib_mutex_t	zip_free_mutex;
+	ib_mutex_t	zip_hash_mutex;
+	ib_mutex_t	flush_state_mutex;	/*!< Flush state protection
+						mutex */
 	ulint		instance_no;	/*!< Array index of this buffer
 					pool instance */
 	ulint		old_pool_size;	/*!< Old pool size in bytes */
@@ -1814,9 +1834,6 @@
 	ulint		buddy_n_frames;	/*!< Number of frames allocated from
 					the buffer pool to the buddy system */
 #endif
-#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
-	ulint		mutex_exit_forbidden; /*!< Forbid release mutex */
-#endif
 	ulint		n_chunks;	/*!< number of buffer pool chunks */
 	buf_chunk_t*	chunks;		/*!< buffer pool chunks */
 	ulint		curr_size;	/*!< current pool size in pages */
@@ -1828,26 +1845,23 @@
 					buf_page_in_file() == TRUE,
 					indexed by (space_id, offset).
 					page_hash is protected by an
-					array of mutexes.
-					Changes in page_hash are protected
-					by buf_pool->mutex and the relevant
-					page_hash mutex. Lookups can happen
-					while holding the buf_pool->mutex or
-					the relevant page_hash mutex. */
+					array of mutexes. */
 	hash_table_t*	zip_hash;	/*!< hash table of buf_block_t blocks
 					whose frames are allocated to the
 					zip buddy system,
 					indexed by block->frame */
 	ulint		n_pend_reads;	/*!< number of pending read
-					operations */
-	ulint		n_pend_unzip;	/*!< number of pending decompressions */
+					operations. Accessed atomically */
+	ulint		n_pend_unzip;	/*!< number of pending decompressions.
+					Accesssed atomically */
 
 	time_t		last_printout_time;
 					/*!< when buf_print_io was last time
-					called */
+					called. Accesses not protected */
 	buf_buddy_stat_t buddy_stat[BUF_BUDDY_SIZES_MAX + 1];
 					/*!< Statistics of buddy system,
-					indexed by block size */
+					indexed by block size. Protected by
1864 zip_free_mutex. */
1851 buf_pool_stat_t stat; /*!< current statistics */1865 buf_pool_stat_t stat; /*!< current statistics */
1852 buf_pool_stat_t old_stat; /*!< old statistics */1866 buf_pool_stat_t old_stat; /*!< old statistics */
18531867
@@ -1874,10 +1888,12 @@
1874 list */1888 list */
1875 ibool init_flush[BUF_FLUSH_N_TYPES];1889 ibool init_flush[BUF_FLUSH_N_TYPES];
1876 /*!< this is TRUE when a flush of the1890 /*!< this is TRUE when a flush of the
1877 given type is being initialized */1891 given type is being initialized.
1892 Protected by flush_state_mutex. */
1878 ulint n_flush[BUF_FLUSH_N_TYPES];1893 ulint n_flush[BUF_FLUSH_N_TYPES];
1879 /*!< this is the number of pending1894 /*!< this is the number of pending
1880 writes in the given flush type */1895 writes in the given flush type.
1896 Protected by flush_state_mutex. */
1881 os_event_t no_flush[BUF_FLUSH_N_TYPES];1897 os_event_t no_flush[BUF_FLUSH_N_TYPES];
1882 /*!< this is in the set state1898 /*!< this is in the set state
1883 when there is no flush batch1899 when there is no flush batch
@@ -1904,7 +1920,8 @@
1904 billion! A thread is allowed1920 billion! A thread is allowed
1905 to read this for heuristic1921 to read this for heuristic
1906 purposes without holding any1922 purposes without holding any
1907 mutex or latch */1923 mutex or latch. For non-heuristic
1924 purposes protected by LRU_list_mutex */
1908 ibool try_LRU_scan; /*!< Set to FALSE when an LRU1925 ibool try_LRU_scan; /*!< Set to FALSE when an LRU
1909 scan for free block fails. This1926 scan for free block fails. This
1910 flag is used to avoid repeated1927 flag is used to avoid repeated
@@ -1913,8 +1930,7 @@
1913 available in the scan depth for1930 available in the scan depth for
1914 eviction. Set to TRUE whenever1931 eviction. Set to TRUE whenever
1915 we flush a batch from the1932 we flush a batch from the
1916 buffer pool. Protected by the1933 buffer pool. Accessed atomically. */
1917 buf_pool->mutex */
1918 /* @} */1934 /* @} */
19191935
1920 /** @name LRU replacement algorithm fields */1936 /** @name LRU replacement algorithm fields */
@@ -1942,7 +1958,8 @@
19421958
1943 UT_LIST_BASE_NODE_T(buf_block_t) unzip_LRU;1959 UT_LIST_BASE_NODE_T(buf_block_t) unzip_LRU;
1944 /*!< base node of the1960 /*!< base node of the
1945 unzip_LRU list */1961 unzip_LRU list. The list is protected
1962 by LRU list mutex. */
19461963
1947 /* @} */1964 /* @} */
1948 /** @name Buddy allocator fields1965 /** @name Buddy allocator fields
@@ -1959,8 +1976,7 @@
19591976
1960 buf_page_t* watch;1977 buf_page_t* watch;
1961 /*!< Sentinel records for buffer1978 /*!< Sentinel records for buffer
1962 pool watches. Protected by1979 pool watches. */
1963 buf_pool->mutex. */
19641980
1965#if BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN1981#if BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN
1966# error "BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN"1982# error "BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN"
@@ -1968,18 +1984,10 @@
1968 /* @} */1984 /* @} */
1969};1985};
19701986
1971/** @name Accessors for buf_pool->mutex.1987/** @name Accessors for buffer pool mutexes
1972Use these instead of accessing buf_pool->mutex directly. */1988Use these instead of accessing buffer pool mutexes directly. */
1973/* @{ */1989/* @{ */
19741990
1975/** Test if a buffer pool mutex is owned. */
1976#define buf_pool_mutex_own(b) mutex_own(&b->mutex)
1977/** Acquire a buffer pool mutex. */
1978#define buf_pool_mutex_enter(b) do { \
1979 ut_ad(!mutex_own(&b->zip_mutex)); \
1980 mutex_enter(&b->mutex); \
1981} while (0)
1982
1983/** Test if flush list mutex is owned. */1991/** Test if flush list mutex is owned. */
1984#define buf_flush_list_mutex_own(b) mutex_own(&b->flush_list_mutex)1992#define buf_flush_list_mutex_own(b) mutex_own(&b->flush_list_mutex)
19851993
@@ -2035,31 +2043,6 @@
2035# define buf_block_hash_lock_held_s_or_x(b, p) (TRUE)2043# define buf_block_hash_lock_held_s_or_x(b, p) (TRUE)
2036#endif /* UNIV_SYNC_DEBUG */2044#endif /* UNIV_SYNC_DEBUG */
20372045
2038#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
2039/** Forbid the release of the buffer pool mutex. */
2040# define buf_pool_mutex_exit_forbid(b) do { \
2041 ut_ad(buf_pool_mutex_own(b)); \
2042 b->mutex_exit_forbidden++; \
2043} while (0)
2044/** Allow the release of the buffer pool mutex. */
2045# define buf_pool_mutex_exit_allow(b) do { \
2046 ut_ad(buf_pool_mutex_own(b)); \
2047 ut_a(b->mutex_exit_forbidden); \
2048 b->mutex_exit_forbidden--; \
2049} while (0)
2050/** Release the buffer pool mutex. */
2051# define buf_pool_mutex_exit(b) do { \
2052 ut_a(!b->mutex_exit_forbidden); \
2053 mutex_exit(&b->mutex); \
2054} while (0)
2055#else
2056/** Forbid the release of the buffer pool mutex. */
2057# define buf_pool_mutex_exit_forbid(b) ((void) 0)
2058/** Allow the release of the buffer pool mutex. */
2059# define buf_pool_mutex_exit_allow(b) ((void) 0)
2060/** Release the buffer pool mutex. */
2061# define buf_pool_mutex_exit(b) mutex_exit(&b->mutex)
2062#endif
2063#endif /* !UNIV_HOTBACKUP */2046#endif /* !UNIV_HOTBACKUP */
2064/* @} */2047/* @} */
20652048
20662049
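Note: the net effect of the buf0buf.h hunks is that buf_pool_t no longer has a single instance-wide mutex; LRU_list_mutex, free_list_mutex, zip_free_mutex, zip_hash_mutex and flush_state_mutex each guard one subset of the fields. A rough, compilable illustration only — std::mutex stands in for ib_mutex_t and every name except the five mutex names is invented, not the actual InnoDB types:

    #include <list>
    #include <mutex>

    struct page;                           // stand-in for buf_page_t

    struct pool {                          // stand-in for buf_pool_t
            std::mutex LRU_list_mutex;     // LRU + unzip_LRU lists
            std::mutex free_list_mutex;    // the free list
            std::mutex zip_free_mutex;     // buddy allocator free lists
            std::mutex zip_hash_mutex;     // the zip_hash table
            std::mutex flush_state_mutex;  // init_flush[] / n_flush[]

            std::list<page*> LRU;
            std::list<page*> free_list;
    };

    // With the split, an LRU eviction and a free-list allocation no
    // longer serialize on one pool-wide mutex:
    void evict_oldest(pool* p)
    {
            std::lock_guard<std::mutex> g(p->LRU_list_mutex);
            if (!p->LRU.empty()) p->LRU.pop_back();
    }

    page* take_free(pool* p)
    {
            std::lock_guard<std::mutex> g(p->free_list_mutex);
            if (p->free_list.empty()) return nullptr;
            page* pg = p->free_list.front();
            p->free_list.pop_front();
            return pg;
    }
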
=== modified file 'Percona-Server/storage/innobase/include/buf0buf.ic'
--- Percona-Server/storage/innobase/include/buf0buf.ic 2013-06-25 13:13:06 +0000
+++ Percona-Server/storage/innobase/include/buf0buf.ic 2013-09-20 05:29:11 +0000
@@ -121,7 +121,7 @@
 /*==========================*/
        const buf_page_t*       bpage)  /*!< in: block */
 {
-       /* This is sometimes read without holding buf_pool->mutex. */
+       /* This is sometimes read without holding any buffer pool mutex. */
        return(bpage->freed_page_clock);
 }
 
@@ -420,8 +420,21 @@
 /*================*/
        const buf_page_t*       bpage)  /*!< in: pointer to the control block */
 {
-       ut_ad(bpage != NULL);
+       ut_ad(mutex_own(buf_page_get_mutex(bpage)));
+       return buf_page_get_io_fix_unlocked(bpage);
+}
 
+/*********************************************************************//**
+Gets the io_fix state of a block. Does not assert that the
+buf_page_get_mutex() mutex is held, to be used in the cases where it is safe
+not to hold it.
+@return io_fix state */
+UNIV_INLINE
+enum buf_io_fix
+buf_page_get_io_fix_unlocked(
+/*=========================*/
+       const buf_page_t*       bpage)  /*!< in: pointer to the control block */
+{
        enum buf_io_fix io_fix  = (enum buf_io_fix) bpage->io_fix;
 #ifdef UNIV_DEBUG
        switch (io_fix) {
@@ -449,6 +462,21 @@
 }
 
 /*********************************************************************//**
+Gets the io_fix state of a block. Does not assert that the
+buf_page_get_mutex() mutex is held, to be used in the cases where it is safe
+not to hold it.
+@return io_fix state */
+UNIV_INLINE
+enum buf_io_fix
+buf_block_get_io_fix_unlocked(
+/*==========================*/
+       const buf_block_t*      block)  /*!< in: pointer to the control block */
+{
+       return(buf_page_get_io_fix_unlocked(&block->page));
+}
+
+
+/*********************************************************************//**
 Sets the io_fix state of a block. */
 UNIV_INLINE
 void
@@ -457,10 +485,6 @@
        buf_page_t*     bpage,  /*!< in/out: control block */
        enum buf_io_fix io_fix) /*!< in: io_fix state */
 {
-#ifdef UNIV_DEBUG
-       buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(buf_pool_mutex_own(buf_pool));
-#endif
        ut_ad(mutex_own(buf_page_get_mutex(bpage)));
 
        bpage->io_fix = io_fix;
@@ -481,7 +505,7 @@
 
 /*********************************************************************//**
 Makes a block sticky. A sticky block implies that even after we release
-the buf_pool->mutex and the block->mutex:
+the buf_pool->LRU_list_mutex and the block->mutex:
 * it cannot be removed from the flush_list
 * the block descriptor cannot be relocated
 * it cannot be removed from the LRU list
@@ -496,10 +520,11 @@
 {
 #ifdef UNIV_DEBUG
        buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(buf_pool_mutex_own(buf_pool));
+       ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 #endif
        ut_ad(mutex_own(buf_page_get_mutex(bpage)));
        ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
+       ut_ad(bpage->in_LRU_list);
 
        bpage->io_fix = BUF_IO_PIN;
 }
@@ -512,10 +537,6 @@
 /*==================*/
        buf_page_t*     bpage)  /*!< in/out: control block */
 {
-#ifdef UNIV_DEBUG
-       buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(buf_pool_mutex_own(buf_pool));
-#endif
        ut_ad(mutex_own(buf_page_get_mutex(bpage)));
        ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_PIN);
 
@@ -531,10 +552,6 @@
 /*==================*/
        const buf_page_t*       bpage)  /*!< control block being relocated */
 {
-#ifdef UNIV_DEBUG
-       buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(buf_pool_mutex_own(buf_pool));
-#endif
        ut_ad(mutex_own(buf_page_get_mutex(bpage)));
        ut_ad(buf_page_in_file(bpage));
        ut_ad(bpage->in_LRU_list);
@@ -554,8 +571,12 @@
 {
 #ifdef UNIV_DEBUG
        buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(buf_pool_mutex_own(buf_pool));
 #endif
+       /* Buffer page mutex is not strictly required here for heuristic
+       purposes even if LRU mutex is not being held. Keep the assertion
+       for now since all the callers hold it. */
+       ut_ad(mutex_own(buf_page_get_mutex(bpage))
+             || mutex_own(&buf_pool->LRU_list_mutex));
        ut_ad(buf_page_in_file(bpage));
 
        return(bpage->old);
@@ -574,7 +595,7 @@
        buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
 #endif /* UNIV_DEBUG */
        ut_a(buf_page_in_file(bpage));
-       ut_ad(buf_pool_mutex_own(buf_pool));
+       ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
        ut_ad(bpage->in_LRU_list);
 
 #ifdef UNIV_LRU_DEBUG
@@ -619,11 +640,7 @@
 /*==================*/
        buf_page_t*     bpage)  /*!< in/out: control block */
 {
-#ifdef UNIV_DEBUG
-       buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
-       ut_ad(!buf_pool_mutex_own(buf_pool));
        ut_ad(mutex_own(buf_page_get_mutex(bpage)));
-#endif
        ut_a(buf_page_in_file(bpage));
 
        if (!bpage->access_time) {
@@ -885,10 +902,6 @@
 /*===========*/
        buf_block_t*    block)  /*!< in, own: block to be freed */
 {
-       buf_pool_t*     buf_pool = buf_pool_from_bpage((buf_page_t*) block);
-
-       buf_pool_mutex_enter(buf_pool);
-
        mutex_enter(&block->mutex);
 
        ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);
@@ -896,8 +909,6 @@
        buf_LRU_block_free_non_file_page(block);
 
        mutex_exit(&block->mutex);
-
-       buf_pool_mutex_exit(buf_pool);
 }
 #endif /* !UNIV_HOTBACKUP */
 
@@ -962,7 +973,7 @@
 
 /********************************************************************//**
 Increments the modify clock of a frame by 1. The caller must (1) own the
-buf_pool mutex and block bufferfix count has to be zero, (2) or own an x-lock
+LRU list mutex and block bufferfix count has to be zero, (2) or own an x-lock
 on the block. */
 UNIV_INLINE
 void
@@ -973,7 +984,7 @@
 #ifdef UNIV_SYNC_DEBUG
        buf_pool_t*     buf_pool = buf_pool_from_bpage((buf_page_t*) block);
 
-       ut_ad((buf_pool_mutex_own(buf_pool)
+       ut_ad((mutex_own(&buf_pool->LRU_list_mutex)
               && (block->page.buf_fix_count == 0))
              || rw_lock_own(&(block->lock), RW_LOCK_EXCLUSIVE));
 #endif /* UNIV_SYNC_DEBUG */
@@ -1371,39 +1382,6 @@
        sync_thread_add_level(&block->lock, level, FALSE);
 }
 #endif /* UNIV_SYNC_DEBUG */
-/********************************************************************//**
-Acquire mutex on all buffer pool instances. */
-UNIV_INLINE
-void
-buf_pool_mutex_enter_all(void)
-/*==========================*/
-{
-       ulint   i;
-
-       for (i = 0; i < srv_buf_pool_instances; i++) {
-               buf_pool_t*     buf_pool;
-
-               buf_pool = buf_pool_from_array(i);
-               buf_pool_mutex_enter(buf_pool);
-       }
-}
-
-/********************************************************************//**
-Release mutex on all buffer pool instances. */
-UNIV_INLINE
-void
-buf_pool_mutex_exit_all(void)
-/*=========================*/
-{
-       ulint   i;
-
-       for (i = 0; i < srv_buf_pool_instances; i++) {
-               buf_pool_t*     buf_pool;
-
-               buf_pool = buf_pool_from_array(i);
-               buf_pool_mutex_exit(buf_pool);
-       }
-}
 /*********************************************************************//**
 Get the nth chunk's buffer block in the specified buffer pool.
 @return the nth chunk's buffer block. */
@@ -1421,4 +1399,26 @@
        *chunk_size = chunk->size;
        return(chunk->blocks);
 }
+
+#ifdef UNIV_DEBUG
+/********************************************************************//**
+Checks if buf_pool->zip_mutex is owned and is serving for a given page as its
+block mutex.
+@return true if buf_pool->zip_mutex is owned. */
+UNIV_INLINE
+bool
+buf_own_zip_mutex_for_page(
+/*=======================*/
+       const buf_page_t*       bpage)
+{
+       buf_pool_t*     buf_pool = buf_pool_from_bpage(bpage);
+
+       ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE
+             || buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY);
+       ut_ad(buf_page_get_mutex(bpage) == &buf_pool->zip_mutex);
+
+       return(mutex_own(&buf_pool->zip_mutex));
+}
+#endif /* UNIV_DEBUG */
+
 #endif /* !UNIV_HOTBACKUP */
 
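Note: the locked/unlocked accessor split above can be mimicked in miniature. A sketch in standard C++, where assert() stands in for ut_ad() and every name (page_desc, get_io_fix, the guard parameter) is invented; only the locked-delegates-to-unlocked shape mirrors the patch:

    #include <cassert>
    #include <mutex>

    enum io_fix { IO_NONE, IO_READ, IO_WRITE, IO_PIN };

    struct page_desc {
            std::mutex mutex;          // the block mutex
            io_fix     fix = IO_NONE;
    };

    // Unlocked variant: for call sites that can prove a racy read is
    // harmless, as buf_page_get_io_fix_unlocked() allows.
    io_fix get_io_fix_unlocked(const page_desc* p)
    {
            return p->fix;
    }

    // Locked variant: asserts ownership of the block mutex, then
    // delegates, the same shape as buf_page_get_io_fix() after this hunk.
    io_fix get_io_fix(const page_desc* p,
                      const std::unique_lock<std::mutex>& g)
    {
            assert(g.owns_lock() && g.mutex() == &p->mutex);
            return get_io_fix_unlocked(p);
    }
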
=== modified file 'Percona-Server/storage/innobase/include/buf0flu.h'
--- Percona-Server/storage/innobase/include/buf0flu.h 2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/include/buf0flu.h 2013-09-20 05:29:11 +0000
@@ -37,7 +37,7 @@
37extern ibool buf_page_cleaner_is_active;37extern ibool buf_page_cleaner_is_active;
3838
39/********************************************************************//**39/********************************************************************//**
40Remove a block from the flush list of modified blocks. */40Remove a block from the flush list of modified blocks. */
41UNIV_INTERN41UNIV_INTERN
42void42void
43buf_flush_remove(43buf_flush_remove(
@@ -75,9 +75,9 @@
75# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG75# if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
76/********************************************************************//**76/********************************************************************//**
77Writes a flushable page asynchronously from the buffer pool to a file.77Writes a flushable page asynchronously from the buffer pool to a file.
78NOTE: buf_pool->mutex and block->mutex must be held upon entering this78NOTE: block->mutex must be held upon entering this function, and they will be
79function, and they will be released by this function after flushing.79released by this function after flushing. This is loosely based on
80This is loosely based on buf_flush_batch() and buf_flush_page().80buf_flush_batch() and buf_flush_page().
81@return TRUE if the page was flushed and the mutexes released */81@return TRUE if the page was flushed and the mutexes released */
82UNIV_INTERN82UNIV_INTERN
83ibool83ibool
@@ -232,9 +232,8 @@
232Writes a flushable page asynchronously from the buffer pool to a file.232Writes a flushable page asynchronously from the buffer pool to a file.
233NOTE: in simulated aio we must call233NOTE: in simulated aio we must call
234os_aio_simulated_wake_handler_threads after we have posted a batch of234os_aio_simulated_wake_handler_threads after we have posted a batch of
235writes! NOTE: buf_pool->mutex and buf_page_get_mutex(bpage) must be235writes! NOTE: buf_page_get_mutex(bpage) must be held upon entering this
236held upon entering this function, and they will be released by this236function, and they will be released by this function. */
237function. */
238UNIV_INTERN237UNIV_INTERN
239void238void
240buf_flush_page(239buf_flush_page(
241240
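Note: these comment fixes narrow the precondition — only the block mutex, not any pool-wide mutex, is held on entry, and the callee releases it. A toy model of that calling convention (all names invented, std::mutex in place of ib_mutex_t, no real I/O):

    #include <mutex>

    struct block {
            std::mutex mutex;
            bool       dirty = false;
    };

    // Caller enters with b->mutex held; the callee releases it before
    // doing the slow work, which is the contract the comments document.
    void flush_page_model(block* b)
    {
            bool need_write = b->dirty;
            b->dirty = false;
            b->mutex.unlock();         // released by the callee

            if (need_write) {
                    // ... issue the asynchronous write here ...
            }
    }

    // Usage: b.mutex.lock(); flush_page_model(&b);  // returns unlocked
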
=== modified file 'Percona-Server/storage/innobase/include/buf0flu.ic'
--- Percona-Server/storage/innobase/include/buf0flu.ic 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/buf0flu.ic 2013-09-20 05:29:11 +0000
@@ -69,7 +69,6 @@
        ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
 #endif /* UNIV_SYNC_DEBUG */
 
-       ut_ad(!buf_pool_mutex_own(buf_pool));
        ut_ad(!buf_flush_list_mutex_own(buf_pool));
        ut_ad(!mtr->made_dirty || log_flush_order_mutex_own());
 
@@ -116,7 +115,6 @@
        ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
 #endif /* UNIV_SYNC_DEBUG */
 
-       ut_ad(!buf_pool_mutex_own(buf_pool));
        ut_ad(!buf_flush_list_mutex_own(buf_pool));
        ut_ad(log_flush_order_mutex_own());
 
=== modified file 'Percona-Server/storage/innobase/include/buf0lru.h'
--- Percona-Server/storage/innobase/include/buf0lru.h 2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/include/buf0lru.h 2013-09-20 05:29:11 +0000
@@ -79,12 +79,14 @@
 Try to free a block. If bpage is a descriptor of a compressed-only
 page, the descriptor object will be freed as well.
 
-NOTE: If this function returns true, it will temporarily
-release buf_pool->mutex. Furthermore, the page frame will no longer be
-accessible via bpage.
-
-The caller must hold buf_pool->mutex and must not hold any
-buf_page_get_mutex() when calling this function.
+NOTE: If this function returns true, it will release the LRU list mutex,
+and temporarily release and relock the buf_page_get_mutex() mutex.
+Furthermore, the page frame will no longer be accessible via bpage. If this
+function returns false, the buf_page_get_mutex() might be temporarily released
+and relocked too.
+
+The caller must hold the LRU list and buf_page_get_mutex() mutexes.
+
 @return true if freed, false otherwise. */
 UNIV_INTERN
 bool
@@ -291,7 +293,7 @@
 extern buf_LRU_stat_t  buf_LRU_stat_cur;
 
 /** Running sum of past values of buf_LRU_stat_cur.
-Updated by buf_LRU_stat_update(). Protected by buf_pool->mutex. */
+Updated by buf_LRU_stat_update(). */
 extern buf_LRU_stat_t  buf_LRU_stat_sum;
 
 /********************************************************************//**
 
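Note: the rewritten contract implies a specific caller pattern — take the LRU list mutex and the block mutex, call, and release the LRU list mutex only on failure, since on success the callee has already dropped it. A compilable sketch of that pattern with invented names and std::mutex stand-ins (the stub body just fakes an always-successful free):

    #include <mutex>

    struct pool  { std::mutex LRU_list_mutex; };
    struct bpage { std::mutex mutex; };

    // Toy stand-in for buf_LRU_free_page(): on success it releases the
    // LRU list mutex and drops/re-takes the block mutex, per the comment.
    bool try_free_page(pool* p, bpage* bp)
    {
            bp->mutex.unlock();
            p->LRU_list_mutex.unlock();
            bp->mutex.lock();
            return true;
    }

    void evict_one(pool* p, bpage* bp)
    {
            p->LRU_list_mutex.lock();
            bp->mutex.lock();

            if (!try_free_page(p, bp)) {
                    p->LRU_list_mutex.unlock();   // still held on failure
            }
            bp->mutex.unlock();                   // held again either way
    }
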
=== modified file 'Percona-Server/storage/innobase/include/sync0sync.h'
--- Percona-Server/storage/innobase/include/sync0sync.h 2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/sync0sync.h 2013-09-20 05:29:11 +0000
@@ -71,6 +71,11 @@
 extern mysql_pfs_key_t buffer_block_mutex_key;
 extern mysql_pfs_key_t buf_pool_mutex_key;
 extern mysql_pfs_key_t buf_pool_zip_mutex_key;
+extern mysql_pfs_key_t buf_pool_LRU_list_mutex_key;
+extern mysql_pfs_key_t buf_pool_free_list_mutex_key;
+extern mysql_pfs_key_t buf_pool_zip_free_mutex_key;
+extern mysql_pfs_key_t buf_pool_zip_hash_mutex_key;
+extern mysql_pfs_key_t buf_pool_flush_state_mutex_key;
 extern mysql_pfs_key_t cache_last_read_mutex_key;
 extern mysql_pfs_key_t dict_foreign_err_mutex_key;
 extern mysql_pfs_key_t dict_sys_mutex_key;
@@ -632,7 +637,7 @@
 Search system mutex
 |
 V
-Buffer pool mutex
+Buffer pool mutexes
 |
 V
 Log mutex
@@ -723,11 +728,15 @@
                        SYNC_SEARCH_SYS, as memory allocation
                        can call routines there! Otherwise
                        the level is SYNC_MEM_HASH. */
-#define SYNC_BUF_POOL          150     /* Buffer pool mutex */
+#define SYNC_BUF_LRU_LIST      151
 #define SYNC_BUF_PAGE_HASH     149     /* buf_pool->page_hash rw_lock */
 #define SYNC_BUF_BLOCK         146     /* Block mutex */
-#define SYNC_BUF_FLUSH_LIST    145     /* Buffer flush list mutex */
-#define SYNC_DOUBLEWRITE       140
+#define SYNC_BUF_FREE_LIST     145
+#define SYNC_BUF_ZIP_FREE      144
+#define SYNC_BUF_ZIP_HASH      143
+#define SYNC_BUF_FLUSH_STATE   142
+#define SYNC_BUF_FLUSH_LIST    141     /* Buffer flush list mutex */
+#define SYNC_DOUBLEWRITE       139
 #define SYNC_ANY_LATCH         135
 #define SYNC_MEM_HASH          131
 #define SYNC_MEM_POOL          130
 
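Note: the numeric levels encode the latch acquisition order — a thread may normally only take a latch at a strictly lower level than everything it already holds. The relative order of the new levels, restated as a self-checking snippet (values copied from the hunk above; nothing else is from the patch):

    #include <cassert>

    enum {
            SYNC_BUF_LRU_LIST    = 151,
            SYNC_BUF_PAGE_HASH   = 149,
            SYNC_BUF_BLOCK       = 146,
            SYNC_BUF_FREE_LIST   = 145,
            SYNC_BUF_ZIP_FREE    = 144,
            SYNC_BUF_ZIP_HASH    = 143,
            SYNC_BUF_FLUSH_STATE = 142,
            SYNC_BUF_FLUSH_LIST  = 141,
            SYNC_DOUBLEWRITE     = 139
    };

    int main()
    {
            // e.g. the LRU list mutex before page_hash before a block
            // mutex, and any of them before the flush list mutex:
            assert(SYNC_BUF_LRU_LIST > SYNC_BUF_PAGE_HASH);
            assert(SYNC_BUF_PAGE_HASH > SYNC_BUF_BLOCK);
            assert(SYNC_BUF_BLOCK > SYNC_BUF_FLUSH_LIST);
            assert(SYNC_BUF_FLUSH_LIST > SYNC_DOUBLEWRITE);
            return 0;
    }
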
=== modified file 'Percona-Server/storage/innobase/sync/sync0sync.cc'
--- Percona-Server/storage/innobase/sync/sync0sync.cc 2013-09-02 10:01:38 +0000
+++ Percona-Server/storage/innobase/sync/sync0sync.cc 2013-09-20 05:29:11 +0000
@@ -1201,7 +1201,11 @@
                /* fallthrough */
        }
        case SYNC_BUF_FLUSH_LIST:
-       case SYNC_BUF_POOL:
+       case SYNC_BUF_LRU_LIST:
+       case SYNC_BUF_FREE_LIST:
+       case SYNC_BUF_ZIP_FREE:
+       case SYNC_BUF_ZIP_HASH:
+       case SYNC_BUF_FLUSH_STATE:
                /* We can have multiple mutexes of this type therefore we
                can only check whether the greater than condition holds. */
                if (!sync_thread_levels_g(array, level-1, TRUE)) {
@@ -1215,17 +1219,12 @@
 
        case SYNC_BUF_PAGE_HASH:
                /* Multiple page_hash locks are only allowed during
-               buf_validate and that is where buf_pool mutex is already
-               held. */
+               buf_validate. */
                /* Fall through */
 
        case SYNC_BUF_BLOCK:
-               /* Either the thread must own the buffer pool mutex
-               (buf_pool->mutex), or it is allowed to latch only ONE
-               buffer block (block->mutex or buf_pool->zip_mutex). */
                if (!sync_thread_levels_g(array, level, FALSE)) {
                        ut_a(sync_thread_levels_g(array, level - 1, TRUE));
-                       ut_a(sync_thread_levels_contain(array, SYNC_BUF_POOL));
                }
                break;
        case SYNC_REC_LOCK:
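Note: the check that replaces the old SYNC_BUF_POOL escape hatch can be modeled in a few lines; levels_g() below plays the role of sync_thread_levels_g() (true iff every held latch is above the given limit), and everything else is an invented simplification:

    #include <vector>

    static bool levels_g(const std::vector<int>& held, int limit)
    {
            for (int l : held)
                    if (l <= limit) return false;
            return true;
    }

    // After this hunk, taking another latch at SYNC_BUF_BLOCK (146) only
    // requires that every held latch is at level >= 146; holding the
    // now-removed buf_pool->mutex is no longer an alternative.
    static bool may_take_buf_block(const std::vector<int>& held)
    {
            return levels_g(held, 146 - 1);
    }
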
