Merge lp:~laurynas-biveinis/percona-server/bp-split-5.6 into lp:percona-server/5.6
| Status | Merged |
|---|---|
| Approved by | Alexey Kopytov |
| Approved revision | no longer in the source branch |
| Merged at revision | 425 |
| Proposed branch | lp:~laurynas-biveinis/percona-server/bp-split-5.6 |
| Merge into | lp:percona-server/5.6 |
| To merge this branch | bzr merge lp:~laurynas-biveinis/percona-server/bp-split-5.6 |
| Related bugs | |
| Related blueprints | Buffer pool mutex split for 5.6 (Essential) |

Diff against target: 4913 lines (+1001/-789), 22 files modified:

- Percona-Server/storage/innobase/btr/btr0cur.cc (+18/-6)
- Percona-Server/storage/innobase/btr/btr0sea.cc (+5/-8)
- Percona-Server/storage/innobase/buf/buf0buddy.cc (+46/-18)
- Percona-Server/storage/innobase/buf/buf0buf.cc (+220/-196)
- Percona-Server/storage/innobase/buf/buf0dblwr.cc (+1/-0)
- Percona-Server/storage/innobase/buf/buf0dump.cc (+8/-8)
- Percona-Server/storage/innobase/buf/buf0flu.cc (+153/-125)
- Percona-Server/storage/innobase/buf/buf0lru.cc (+280/-177)
- Percona-Server/storage/innobase/buf/buf0rea.cc (+41/-26)
- Percona-Server/storage/innobase/fsp/fsp0fsp.cc (+1/-1)
- Percona-Server/storage/innobase/handler/ha_innodb.cc (+12/-2)
- Percona-Server/storage/innobase/handler/i_s.cc (+14/-9)
- Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc (+1/-1)
- Percona-Server/storage/innobase/include/buf0buddy.h (+4/-4)
- Percona-Server/storage/innobase/include/buf0buddy.ic (+9/-10)
- Percona-Server/storage/innobase/include/buf0buf.h (+91/-108)
- Percona-Server/storage/innobase/include/buf0buf.ic (+63/-63)
- Percona-Server/storage/innobase/include/buf0flu.h (+6/-7)
- Percona-Server/storage/innobase/include/buf0flu.ic (+0/-2)
- Percona-Server/storage/innobase/include/buf0lru.h (+9/-7)
- Percona-Server/storage/innobase/include/sync0sync.h (+13/-4)
- Percona-Server/storage/innobase/sync/sync0sync.cc (+6/-7)

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Alexey Kopytov (community) | | | Approve |
| Laurynas Biveinis | | | Pending |

Review via email: mp+186711@code.launchpad.net
This proposal supersedes a proposal from 2013-09-09.
Commit message
Description of the change
Repushed and resubmitted. The only change from the 3rd push is that the atomic ops were split off to be handled later.
http://
No BT or ST, but 5.6 GA prerequisite.
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
Alexey -
> - the code in btr_blob_free() can be simplified
Simplified.
> - wrong comments for buf_LRU_
> mutex/buf_
Edited. But there are numerous other places in this patch (and upstream) that would need this editing too, and "block mutex" is already an established shorthand for "really a block mutex or buf_pool-
Do you want me to edit the other places too?
> - the comments for buf_LRU_free_page() say that both LRU_list_mutex
> and block_mutex may be released temporarily if ‘true’ is
> returned. But:
> 1) even if ‘false’ is returned, block_mutex may also be released
> temporarily
> 2) the comments don’t mention that if ‘true’ is returned,
> LRU_list_mutex is always released upon return, but block_mutex is
> always locked. And callers of buf_LRU_free_page() rely on that.
Indeed callers rely on the current, arguably messy, buf_LRU_free_page() locking. This is how I edited the header comment for this and the previous review comment:
/******
Try to free a block. If bpage is a descriptor of a compressed-only
page, the descriptor object will be freed as well.
NOTE: If this function returns true, it will release the LRU list mutex,
and temporarily release and relock the buf_page_
Furthermore, the page frame will no longer be accessible via bpage. If this
function returns false, the buf_page_
released and relocked too.
The caller must hold the LRU list and buf_page_
@return true if freed, false otherwise. */
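The contract in that header comment can be sketched as a toy model (not InnoDB code; two booleans stand in for the LRU list mutex and the block mutex, and all names here are illustrative):

```c
#include <stdbool.h>

/* Toy model of the buf_LRU_free_page() locking contract described
   above: caller holds both mutexes on entry; on 'true' the LRU list
   mutex has been released and the block mutex is held again; on
   'false' both are held on return, even though the block mutex may
   have been released and relocked internally. */
typedef struct {
    bool lru_list_locked;
    bool block_locked;
} lock_state_t;

static bool
lru_free_page_model(lock_state_t *s, bool can_free)
{
    /* the block mutex may be temporarily released and relocked
       regardless of the outcome */
    s->block_locked = false;
    s->block_locked = true;

    if (!can_free)
        return false;            /* both mutexes still held */

    s->lru_list_locked = false;  /* released only on success */
    return true;                 /* block mutex remains held */
}

/* helpers exercising both paths of the contract */
static bool
model_true_path_holds(void)
{
    lock_state_t s = { true, true };
    return lru_free_page_model(&s, true)
        && !s.lru_list_locked && s.block_locked;
}

static bool
model_false_path_holds(void)
{
    lock_state_t s = { true, true };
    return !lru_free_page_model(&s, false)
        && s.lru_list_locked && s.block_locked;
}
```

The asymmetry between the two return paths is exactly what the callers mentioned in the review rely on.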
> - the following code in buf_LRU_free_page() is missing a
> buf_page_
> memory leak.
Fixed.
> - the patch removes buf_pool_
> btr_search_
> reads from ‘block’ before it locks block->mutex. Any reasons to not
> lock block->mutex earlier?
I *think* there were actual reasons, but I cannot remember them, due to the number of things going on with this patch. And I don't see why locking block->mutex earlier is not possible now. I will look further.
> - the following checks for mutex != NULL in buf_buddy_
> to be redundant, since they are made after mutex_enter(mutex), so we
> are guaranteed mutex != NULL if we reach that code:
Fixed. This looks like a missed cleanup after removing 5.5's buf_page_
> - asserting that zip_free_mutex is locked also looks redundant to me,
> because it is locked just a few lines above, and there’s nothing in
> the code path that could release it.
Removed. Added a comment to the function header "The caller must hold zip_free_mutex, and this
function will release and lock it again." instead.
> - os_atomic_
> need that stuff. Their names are misleading as they don’t enforce
> any atomicity.
The ops being load and store, their atom...
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
On Wed, 11 Sep 2013 09:45:13 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>> - the code in btr_blob_free() can be simplified
>
> Simplified.
>
>> - wrong comments for buf_LRU_
>> mutex/buf_
>
> Edited. But there are numerous other places in this patch (and upstream) that would need this editing too, and "block mutex" is already an established shorthand for "really a block mutex or buf_pool-
>
I think editing existing comments is worth the efforts (and potential
extra maintenance cost in the future). I would be OK if this specific
comment was left intact too. It only caught my eye because the comment
was edited and I spent some time verifying it.
> Do you want me to edit the other places too?
>
>> - the comments for buf_LRU_free_page() say that both LRU_list_mutex
>> and block_mutex may be released temporarily if ‘true’ is
>> returned. But:
>> 1) even if ‘false’ is returned, block_mutex may also be released
>> temporarily
>> 2) the comments don’t mention that if ‘true’ is returned,
>> LRU_list_mutex is always released upon return, but block_mutex is
>> always locked. And callers of buf_LRU_free_page() rely on that.
>
> Indeed callers rely on the current, arguably messy, buf_LRU_free_page() locking. This is how I edited the header comment for this and the previous review comment:
>
> /******
> Try to free a block. If bpage is a descriptor of a compressed-only
> page, the descriptor object will be freed as well.
>
> NOTE: If this function returns true, it will release the LRU list mutex,
> and temporarily release and relock the buf_page_
> Furthermore, the page frame will no longer be accessible via bpage. If this
> function returns false, the buf_page_
> released and relocked too.
>
> The caller must hold the LRU list and buf_page_
>
> @return true if freed, false otherwise. */
>
>
Looks good.
>> - the patch removes buf_pool_
>> btr_search_
>> reads from ‘block’ before it locks block->mutex. Any reasons to not
>> lock block->mutex earlier?
>
> I *think* there were actual reasons, but I cannot remember them, due to the number of things going on with this patch. And I don't see why locking block->mutex earlier is not possible now. I will look further.
>
OK.
>> - os_atomic_
>> need that stuff. Their names are misleading as they don’t enforce
>> any atomicity.
>
> The ops being load and store, their atomicity is enforced by the data type width.
>
Right, the atomicity is enforced by the data type width on those
architectures that provide it. And even those that do provide it have a
number of prerequisites. Neither of those 2 facts is taken care of in
os_atomic_
different with respect to atomi...
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
On Wed, 11 Sep 2013 16:06:21 +0400, Alexey Kopytov wrote:
> I think editing existing comments is worth the efforts (and potential
> extra maintenance cost in the future). I would be OK if this specific
Grr, that was supposed to be "NOT worth the efforts" of course.
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
> > - the patch removes buf_pool_
> > btr_search_
> > reads from ‘block’ before it locks block->mutex. Any reasons to not
> > lock block->mutex earlier?
>
> I *think* there were actual reasons, but I cannot remember them, due to the
> number of things going on with this patch. And I don't see why locking
> block->mutex earlier is not possible now. I will look further.
A test run helped to recover my memories. So the problem is buf_block_
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
On Wed, 11 Sep 2013 12:26:57 -0000, Laurynas Biveinis wrote:
>>> - the patch removes buf_pool_
>>> btr_search_
>>> reads from ‘block’ before it locks block->mutex. Any reasons to not
>>> lock block->mutex earlier?
>>
>> I *think* there were actual reasons, but I cannot remember them, due to the
>> number of things going on with this patch. And I don't see why locking
>> block->mutex earlier is not possible now. I will look further.
>
> A test run helped to recover my memories. So the problem is buf_block_
>
Right, makes sense.
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
> >> - wrong comments for buf_LRU_
> >> mutex/buf_
> >
> > Edited. But there are numerous other places in this patch (and upstream)
> that would need this editing too, and "block mutex" is already an established
> shorthand for "really a block mutex or buf_pool-
> pointer to mutex variables named block_mutex.
> >
>
> I think editing existing comments is not worth the efforts (and potential
> extra maintenance cost in the future). I would be OK if this specific
> comment was left intact too. It only caught my eye because the comment
> was edited and I spent some time verifying it.
Fair enough. I applied the same edit to the other comments changed by this patch.
> >> - os_atomic_
> >> need that stuff. Their names are misleading as they don’t enforce
> >> any atomicity.
> >
> > The ops being load and store, their atomicity is enforced by the data type
> width.
> >
>
> Right, the atomicity is enforced by the data type width on those
> architectures that provide it.
I forgot to mention that they also must not be misaligned, so that one access does not translate into two accesses.
> And even those that do provide it have a
> number of prerequisites. Neither of those 2 facts is taken care of in
> os_atomic_
> different with respect to atomicity as plain load/store of a ulint and
> thus, have misleading names.
>
> So to justify "atomic" in their names those functions should:
>
> - (if we want to be portable) protect those load/stores with a mutex
Why? I guess this question boils down to, what would the mutex implementation code additionally ensure here, let's say, on x86_64? Or is this referring to the 5.6 mutex fallbacks when no atomic ops are implemented for a platform?
> - (if we only care about x86/x86_64) make sure that values being
> loaded/stored do not cross cache lines or page boundaries. Which is of
> course impossible to guarantee in a generic function.
Why? We are talking about ulints here only, and I was not able to find such requirements in the x86_64 memory model descriptions. There is a requirement to be aligned, and misaligned stores/loads might indeed cross cache line or page boundaries, and anything that crosses them is indeed non-atomic. But alignment is possible to guarantee in a generic function (which doesn't even have to be generic: the x86_64 implementation is for x86_64 only, obviously).
Intel® 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, section 8.1.1, http://
"The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
always be carried out atomically:
(...)
• Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (...):
• Reading or writing a quadword aligned on a 64-bit boundary
"
My understanding of the above is that os_atomic_
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
On Wed, 11 Sep 2013 15:29:33 -0000, Laurynas Biveinis wrote:
>>>> - os_atomic_
>>>> need that stuff. Their names are misleading as they don’t enforce
>>>> any atomicity.
>>>
>>> The ops being load and store, their atomicity is enforced by the data type
>> width.
>>>
>>
>> Right, the atomicity is enforced by the data type width on those
>> architectures that provide it.
>
>
> I forgot to mention that they also have to be not misaligned so that one access does not translate to two accesses.
>
Yes, but alignment does not guarantee atomicity, see below.
>
>> And even those that do provide it have a
>> number of prerequisites. Neither of those 2 facts is taken care of in
>> os_atomic_
>> different with respect to atomicity as plain load/store of a ulint and
>> thus, have misleading names.
>>
>> So to justify "atomic" in their names those functions should:
>>
>> - (if we want to be portable) protect those load/stores with a mutex
>
>
> Why? I guess this question boils down to, what would the mutex implementation code additionally ensure here, let's say, on x86_64? Or is this referring to the 5.6 mutex fallbacks when no atomic ops are implemented for a platform?
>
A mutex is the only portable way to ensure atomicity. You can use atomic
primitives provided by specific architectures, but then you either limit
support for those architectures or yes, provide a mutex fallback.
>
>> - (if we only care about x86/x86_64) make sure that values being
>> loaded/stored do not cross cache lines or page boundaries. Which is of
>> course impossible to guarantee in a generic function.
>
>
> Why? We are talking about ulints here only, and I was not able to find such requirements in the x86_64 memory model descriptions. There is a requirement to be aligned, and misaligned stores/loads might indeed cross cache line or page boundaries, and anything that crosses them is indeed non-atomic. But alignment is possible to guarantee in a generic function (which doesn't even has to be generic: the x86_64 implementation is for x86_64 only, obviously).
>
> Intel® 64 and IA-32 Architectures Software Developer's Manual
> Volume 3A: System Programming Guide, Part 1, section 8.1.1, http://
>
> "The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
> always be carried out atomically:
> (...)
> • Reading or writing a doubleword aligned on a 32-bit boundary
>
> The Pentium processor (...):
> • Reading or writing a quadword aligned on a 64-bit boundary
> "
Why didn't you quote it further?
"
Accesses to cacheable memory that are split across cache lines and page
boundaries are not guaranteed to be atomic by <all processors>. <all
processors> provide bus control signals that permit external memory
subsystems to make split accesses atomic;
"
Which means even aligned accesses are not guaranteed to be atomic and
it's up to the implementation of "external memory subsystems" (that
probably means chipsets, motherboards, NUMA archi...
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
Alexey -
> >> - (if we only care about x86/x86_64) make sure that values being
> >> loaded/stored do not cross cache lines or page boundaries. Which is of
> >> course impossible to guarantee in a generic function.
> >
> >
> > Why? We are talking about ulints here only, and I was not able to find such
> requirements in the x86_64 memory model descriptions. There is a requirement
> to be aligned, and misaligned stores/loads might indeed cross cache line or
> page boundaries, and anything that crosses them is indeed non-atomic. But
> alignment is possible to guarantee in a generic function (which doesn't even
> have to be generic: the x86_64 implementation is for x86_64 only, obviously).
> >
> > Intel® 64 and IA-32 Architectures Software Developer's Manual
> > Volume 3A: System Programming Guide, Part 1, section 8.1.1,
> http://
> >
> > "The Intel486 processor (and newer processors since) guarantees that the
> following basic memory operations will
> > always be carried out atomically:
> > (...)
> > • Reading or writing a doubleword aligned on a 32-bit boundary
> >
> > The Pentium processor (...):
> > • Reading or writing a quadword aligned on a 64-bit boundary
> > "
>
> Why didn't you quote it further?
>
> "
> Accesses to cacheable memory that are split across cache lines and page
> boundaries are not guaranteed to be atomic by <all processors>. <all
> processors> provide bus control signals that permit external memory
> subsystems to make split accesses atomic;
> "
>
> Which means even aligned accesses are not guaranteed to be atomic and
> it's up to the implementation of "external memory subsystems" (that
> probably means chipsets, motherboards, NUMA architectures and the like).
I didn't quote because we both have already acknowledged that cache line- or page boundary-crossing accesses are non-atomic, and because I don't see how it's relevant here. I don't see how a properly aligned ulint can possibly cross a cache line boundary, when cache lines are 64 bytes wide and 64-byte aligned. Or even 32 for older architectures.
> > My understanding of the above is that
> os_atomic_
> modulo alignment issues, if any. These are easy to ensure by ut_ad().
> >
>
> Modulo alignment, cache line boundary and page boundary issues.
Alignment only unless my reasoning above is wrong.
> I don't see how ut_ad() is going to help here. So a buf_pool_stat_t
> structure happens to be allocated in memory so that n_pages_written
> happens to be misaligned, or cross a cache line or a page boundary. How
> exactly ut_ad() is going to ensure that never happens at runtime?
A debug build would hit this assert and we'd fix the structure layout/allocation. Unless I'm mistaken, to get a misaligned ulint, we'd have to ask for this explicitly, by packing a struct, fetching a pointer to it from a byte array, etc. Thus ut_ad() seems reasonable to me.
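The debug check being proposed might look like the following sketch; the function name is hypothetical, and plain `assert()` stands in for InnoDB's debug-only `ut_ad()`:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a debug-build alignment assertion for a word-sized
   atomic load.  A word-sized load is only atomic on x86_64 when the
   address is aligned to the operand size, so a debug build asserts
   that; a misaligned caller (packed struct, pointer carved out of a
   byte array) would hit this assert immediately. */
static unsigned long
os_atomic_load_ulint_sketch(const unsigned long *ptr)
{
    assert(((uintptr_t) ptr % sizeof(unsigned long)) == 0);
    return *ptr;  /* a single aligned word-sized load */
}

/* exercise the sketch with an ordinary (hence aligned) local */
static unsigned long
sketch_demo(void)
{
    unsigned long v = 42;
    return os_atomic_load_ulint_sketch(&v);
}
```

This only catches misalignment in debug builds, which is exactly the limitation discussed below.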
> >>>> They should be named os_ordered_
> >>>> os_ordered_
> >>>
> >>> That's an option, but I needed atomicity, visibility, and ordering, and
> >> chose atomic for ...
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
buf_LRU_
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
On Thu, 12 Sep 2013 06:16:52 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>> - (if we only care about x86/x86_64) make sure that values being
>>>> loaded/stored do not cross cache lines or page boundaries. Which is of
>>>> course impossible to guarantee in a generic function.
>>>
>>>
>>> Why? We are talking about ulints here only, and I was not able to find such
>> requirements in the x86_64 memory model descriptions. There is a requirement
>> to be aligned, and misaligned stores/loads might indeed cross cache line or
>> page boundaries, and anything that crosses them is indeed non-atomic. But
>> alignment is possible to guarantee in a generic function (which doesn't even
>> have to be generic: the x86_64 implementation is for x86_64 only, obviously).
>>>
>>> Intel® 64 and IA-32 Architectures Software Developer's Manual
>>> Volume 3A: System Programming Guide, Part 1, section 8.1.1,
>> http://
>>>
>>> "The Intel486 processor (and newer processors since) guarantees that the
>> following basic memory operations will
>>> always be carried out atomically:
>>> (...)
>>> • Reading or writing a doubleword aligned on a 32-bit boundary
>>>
>>> The Pentium processor (...):
>>> • Reading or writing a quadword aligned on a 64-bit boundary
>>> "
>>
>> Why didn't you quote it further?
>>
>> "
>> Accesses to cacheable memory that are split across cache lines and page
>> boundaries are not guaranteed to be atomic by <all processors>. <all
>> processors> provide bus control signals that permit external memory
>> subsystems to make split accesses atomic;
>> "
>>
>> Which means even aligned accesses are not guaranteed to be atomic and
>> it's up to the implementation of "external memory subsystems" (that
>> probably means chipsets, motherboards, NUMA architectures and the like).
>
>
> I didn't quote because we both have already acknowledged that cache line- or page boundary-crossing accesses are non-atomic, and because I don't see how it's relevant here. I don't see how a properly-aligned ulint can possibly cross a cache line boundary, when cache lines are 64-byte wide and 64-byte aligned. Or even 32 for older architectures.
>
The array of buffer pool descriptors is allocated as follows:
buf_pool_ptr = (buf_pool_t*) mem_zalloc(
n_instances * sizeof *buf_pool_ptr);
so individual buf_pool_t instances are not guaranteed to have any
specific alignment, neither to cache line nor to page boundaries, right?
Now, the 'stat' member of buf_pool_t has the offset of 736 bytes into
buf_pool_t so nothing prevents it from crossing a cache line or a page
boundary?
Now, offsets of the buf_pool_stat_t members vary from 0 to 88. Again,
nothing prevents them from crossing a cache line or a page boundary, right?
>
>>> My understanding of the above is that
>> os_atomic_
>> modulo alignment issues, if any. These are easy to ensure by ut_ad().
>>>
>>
>> Modulo alignment, cache line boundary and page boundary issues.
>
>
> Alignment only unless my reasoning above is wrong.
>
Yes.
>
>> I don't see how ut_ad() is going to help here. So a ...
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
Repushed branch with the 1st partial review comments. Not a resubmission due to partial review and ongoing discussions.
Changes from the 1st MP:
- Simplified btr_blob_free().
- Added a note about mutexes to the header comment of
buf_
- Removed redundant mutex == NULL checks and mutex own assertions
from buf_buddy_
- Fixed locking notes in buf_LRU_free_page() header comment.
- Removed a memory leak in one of the early exits in
buf_
- Clarified locking in a comment for buf_page_t::zip.
- Added debug build checks to os_atomic_
os_
variable is properly aligned.
- Re-added the buffer page space id and I/O fix 2nd checks
(dropped by mistake) after the buf_page_
buf_
Please ignore the "Added debug build checks to os_atomic_
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
A Jenkins run of the latest branch turned up bug 1224432. Logged a separate bug because I am not sure I'll manage to debug it during this MP cycle.
"...
2013-09-12 12:15:38 7ff450082700 InnoDB: Assertion failure in thread 140687291459328 in file buf0buf.cc line 3694
InnoDB: Failing assertion: buf_fix_count > 0
...
Which appears to be a race condition on buf_fix_count on a page that is a sentinel for buffer pool watch. How exactly this can happen is not clear to me currently. All the watch sentinel buf_fix_count changes happen under zip_mutex and a corresponding page_hash X lock. Further, buf_page_
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
Alexey -
> I don't see how a properly-aligned ulint can possibly
> cross a cache line boundary, when cache lines are 64-byte wide and 64-byte
> aligned. Or even 32 for older architectures.
> >
>
> The array of buffer pool descriptors is allocated as follows:
You are right, I failed to consider the base addresses returned by dynamic memory allocation. I also failed to notice your hint to that direction in one of the previous mails.
> buf_pool_ptr = (buf_pool_t*) mem_zalloc(
> n_instances * sizeof *buf_pool_ptr);
>
> so individual buf_pool_t instances are not guaranteed to have any
> specific alignment, neither to cache line nor to page boundaries, right?
Right.
> Now, the 'stat' member of buf_pool_t has the offset of 736 bytes into
> buf_pool_t so nothing prevents it from crossing a cache line or a page
> boundary?
Right.
> Now, offsets of the buf_pool_stat_t members vary from 0 to 88. Again,
> nothing prevents them from crossing a cache line or a page boundary, right?
Right, nothing prevents an object of buf_pool_stat_t from crossing it. But that's OK. We only need the individual fields not to cross it.
> >> I don't see how ut_ad() is going to help here. So a buf_pool_stat_t
> >> structure happens to be allocated in memory so that n_pages_written
> >> happens to be misaligned, or cross a cache line or a page boundary. How
> >> exactly ut_ad() is going to ensure that never happens at runtime?
> >
> >
> > A debug build would hit this assert and we'd fix the structure
> layout/allocation. Unless I'm mistaken, to get a misaligned ulint, we'd have
> to ask for this explicitly, by packing a struct, fetching a pointer to it from
> a byte array, etc. Thus ut_ad() seems reasonable to me.
> >
>
> The only thing you can assume about dynamically allocated objects is
> that their addresses (and thus, the first member of a structure, if an
> object is a structure) is aligned to machine word size. Which is always
> lower than the cache line size. There are no guarantees on alignment of
> other structure members, no matter what compiler hints were used (those
> only matter for statically allocated objects).
Right. So, to conclude... no individual ulint is going to cross a cache line or a page boundary and we are good? We start with a machine-word aligned address returned from heap, and add a multiple of machine-word width to arrive at the address of an individual field, which is machine-word aligned and thus the individual field cannot cross anything?
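That reasoning can be checked with a few lines of arithmetic; a 64-byte cache line is assumed here, and the names are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A word-sized field at a word-aligned address can never straddle a
   cache line (or page) boundary, because those boundaries are
   multiples of the word size; a misaligned field can. */
enum { CACHE_LINE = 64 };

static bool
crosses_cache_line(uintptr_t addr, size_t size)
{
    /* the access straddles a boundary iff its first and last bytes
       fall into different cache lines */
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```

An 8-byte field at address 56 ends at byte 63, staying inside one line; the same field at the misaligned address 60 spans bytes 60..67 and crosses the boundary at 64.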
> >>>>>> They should be named os_ordered_
> >>>>>> os_ordered_
> >>>>>
> >>>>> That's an option, but I needed atomicity, visibility, and ordering, and
> >>>> chose atomic for function name to match the existing CAS and atomic add
> >>>> operations, which also need all three.
> >>>>>
> >>>>
> >>>> I'm not sure you need all of those 3 in every occurrence of those
> >>>> functions, but see below.
> >>>
> >>>
> >>> That's right. And ordering probably is not needed anywhere, sorry about
> >> that, my understanding of atomics is far from fluent. But visibility
> should
> >> be needed in every occurrence if this is ever ported to a...
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
You are right that that os_atomic_
name suggests (i.e. be atomic) as long as:
1. it is used to access machine word sized members of structures
2. we are on x86/x86_64
However, the patch implements them as generic primitives that do
nothing to enforce those restrictions, and that's why their names are
misleading. This is where this discussion has started, and it is a
design flaw. I don't see any arguments in this discussion that would
dispel those concerns. You also acknowledged them in one of the comments.
In contrast, other atomic primitives in existing code keep up to their
promise of being atomic, i.e. do not enforce any implicit requirements.
But they also have mutex-guarded fall back implementations for those
architectures that do not provide atomics.
I also agree that this discussion may be endless and time is precious.
So I think we should implement whatever we both agree does work. That
is: instead of implementing generic atomic primitives that are only
atomic under implicit requirements that are not enforceable at compile
time, it must either use separate mutex(es) to protect them, or use true
atomic primitives provided by the existing code if they are available on
the target architecture and fall back to mutex-guarded access. The
latter is how it is implemented in the rest of InnoDB.
Thanks.
Laurynas Biveinis (laurynas-biveinis) wrote: Posted in a previous version of this proposal
Alexey -
> Hi Laurynas,
>
> You are right that that os_atomic_
> name suggests (i.e. be atomic) as long as:
>
> 1. it is used to access machine word sized members of structures
> 2. we are on x86/x86_64
Right.
> However, the patch implements them as generic primitives that do
> nothing to enforce those restrictions, and that's why their names are
> misleading.
1. is enforced through "ulint" in the name and args. ulint is commented in univ.i as "unsigned long integer which should be equal to the word size of the machine".
2. is enforced by platform #ifdefs not providing any other implementation except one for x86/x86_64 with GCC or a GCC-like compiler.
Thus I provide generic primitives whose current implementations will work as designed. However, 1. above also seems to be missing "properly aligned", and that's where the design is debatable. On one hand, it is possible to implement misaligned access atomically by LOCK MOV, and document that the primitives may be used with args of any alignment. But a better alternative to me seems to be to accept that misaligned accesses are bugs and document/allow aligned accesses only. Even though that's enforceable in debug builds only, which is not ideally perfect, IMHO it is acceptable.
> This is where this discussion has started, and it is a
> design flaw. I don't see any arguments in this discussion that would
> dispel those concerns. You also acknowledged them in one of the comments.
Addressed above.
> I also agree that this discussion may be endless and time is precious.
But it also cannot end prematurely.
> So I think we should implement whatever we both agree does work.
I suggest that the above either works already or requires some improvements regarding alignment only.
> That
> is: instead of implementing generic atomic primitives that are only
> atomic under implicit requirements that are not enforceable at compile
> time, it must either use separate mutex(es) to protect them, or use true
> atomic primitives provided by the existing code if they are available on
> the target architecture
If you either show that how I address 1. and 2. above is incorrect, or show that the alignment issue is major and insurmountable, then I'll implement load as inc by zero returning the old value, and store as dirty read + CAS in a loop using the existing primitives.
Thanks,
Laurynas
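The fallback proposed in that message can be sketched with GCC's `__sync` builtins standing in for InnoDB's existing CAS and atomic-add wrappers (the function names here are illustrative, not the patch's API):

```c
/* Load implemented as "inc by zero, return old value": a
   fetch-and-add of 0 is a genuinely atomic read on any platform
   that provides the builtin. */
static unsigned long
atomic_load_via_add(unsigned long *ptr)
{
    return __sync_fetch_and_add(ptr, 0UL);
}

/* Store implemented as dirty read + CAS in a retry loop: the plain
   read is only a starting guess, and the CAS retries until the
   swap from the observed value succeeds. */
static void
atomic_store_via_cas(unsigned long *ptr, unsigned long new_val)
{
    unsigned long old;
    do {
        old = *ptr;
    } while (!__sync_bool_compare_and_swap(ptr, old, new_val));
}

/* round-trip: store a value and read it back atomically */
static unsigned long
round_trip_demo(unsigned long v)
{
    unsigned long x = 0;
    atomic_store_via_cas(&x, v);
    return atomic_load_via_add(&x);
}
```

The cost of this approach is a locked bus operation on every load and store, which is exactly why the discussion centers on whether plain aligned accesses are already atomic.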
Alexey Kopytov (akopytov) wrote: Posted in a previous version of this proposal
Hi Laurynas,
On Fri, 13 Sep 2013 07:40:10 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>> Hi Laurynas,
>>
>> You are right that that os_atomic_
>> name suggests (i.e. be atomic) as long as:
>>
>> 1. it is used to access machine word sized members of structures
>> 2. we are on x86/x86_64
>
>
> Right.
>
>
>> However, the patch implements them as generic primitives that do
>> nothing to enforce those restrictions, and that's why their names are
>> misleading.
>
>
> 1. is enforced through "ulint" in the name and args. ulint is commented in univ.i as "unsigned long integer which should be equal to the word size of the machine".
It is not enforced, because nothing prevents me from passing a
misaligned address to those functions and expect them to be atomic as
the name implies.
For example, os_atomic_
arguments on any platform. But os_atomic_
the problem.
> 2. is enforced by platform #ifdefs not providing any other implementation except one for x86/x86_64 with GCC or a GCC-like compiler.
>
That's correct. I only mentioned #2 for completeness.
> Thus I provide generic primitives, whose current implementations will work as designed. However the 1. above also seems to be missing "properly-aligned" and that's where the design is debatable. On one hand it is possible to implement misaligned access atomically by LOCK MOV, and document that the primitives may be used with args of any alignment. But a better alternative to me seems to accept that misaligned accesses are bugs and document/allow aligned accesses only. Even though that's enforceable in debug builds only, so that's not ideally perfect, but IMHO acceptable.
>
You don't.
>
> If you either show that how I address 1. and 2. above is incorrect, or show that the alignment issue is major and insurmountable, then I'll implement load as an increment by zero that returns the old value, and store as a dirty read plus CAS in a loop, using the existing primitives.
>
Yes, please do.
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> >> You are right that that os_atomic_
> >> name suggests (i.e. be atomic) as long as:
> >>
> >> 1. it is used to access machine word sized members of structures
> >> 2. we are on x86/x86_64
> >
> >
> > Right.
> >
> >
> >> However, the patch implements them as a generic primitives that do
> >> nothing to enforce those restrictions, and that's why their names are
> >> misleading.
> >
> >
> > 1. is enforced through "ulint" in the name and args. ulint is commented in
> univ.i as "unsigned long integer which should be equal to the word size of the
> machine".
>
> It is not enforced, because nothing prevents me from passing a
> misaligned address to those functions and expect them to be atomic as
> the name implies.
This is exactly what I discussed below.
> For example, os_atomic_
> arguments on any platform. But os_atomic_
> the problem.
Right. os_atomic_
> > 2. is enforced by platform #ifdefs not providing any other implementation
> except one for x86/x86_64 with GCC or a GCC-like compiler.
> >
>
> That's correct. I only mentioned #2 for completeness.
OK, but I am not sure what #2 completes then.
> > Thus I provide generic primitives, whose current implementations will work
> as designed. However the 1. above also seems to be missing "properly-aligned"
> and that's where the design is debatable. On one hand it is possible to
> implement misaligned access atomically by LOCK MOV, and document that the
> primitives may be used with args of any alignment. But a better alternative
> to me seems to accept that misaligned accesses are bugs and document/allow
> aligned accesses only. Even though that's enforceable in debug builds only,
> so that's not ideally perfect, but IMHO acceptable.
> >
>
> You don't.
Will you reply to the rest of that paragraph too please? I am acknowledging that alignment is an issue, so let's see how to resolve it.
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
Hi Laurynas,
On Wed, 11 Sep 2013 15:29:33 -0000, Laurynas Biveinis wrote:
>>>> - the following hunk simply removes reference to buf_pool->mutex. As I
>>>> understand, it should just replace buf_pool->mutex with
>> zip_free_mutex?
>>>> + page_zip_des_t zip; /*!< compressed page; state
>>>> + == BUF_BLOCK_ZIP_PAGE and zip.data
>>>> + == NULL means an active
>>>
>>> Hm, it looked to me that it's protected not with zip_free_mutex but with
>> zip_mutex in its page mutex capacity. I will check.
>>>
>>
>> There was a place in the code that asserted zip_free_mutex locked when
>> bpage->zip.data is modified. But I'm not sure if that is correct.
>
>
> I have checked, I believe it's indeed zip_mutex. Re. zip_free_mutex, you must be referring to this bit in buf_buddy_
>
> mutex_enter(
>
> if (buf_page_
> ...
> bpage->zip.data = (page_zip_t*) dst;
> mutex_exit(mutex);
>
> buf_buddy_stat_t* buddy_stat = &buf_pool-
> buddy_stat-
> buddy_stat-
> ...
>
> Here zip_free_mutex happens to protect buddy_stat and pushing it down to if clause would require an else clause to appear that locks the same mutex.
>
No, I was referring to buf_pool_
pool and examines (but not modifies) block->
block. However, the patch changes the assertion in
buf_pool_
rather than zip_mutex. In fact, in one of the code paths calling
buf_pool_
have a bug here?
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
On Fri, 13 Sep 2013 11:10:36 -0000, Laurynas Biveinis wrote:
>
>
> Right. os_atomic_
>
The same restrictions would apply even if os_atomic_
exist, right? I.e. the same restrictions would apply if we simply
accessed those variables without any helper functions?
Let me ask you a few simple questions and this time around I demand
"yes/no" answers.
- Do you agree that os_atomic_
not do what they promise to do?
- Do you agree that naming them os_ordered_
os_ordered_
- Do you agree that naming them that way also makes it obvious that
using them in most places is simply unnecessary (e.g. in
buf_get_
buf_get_
>
>>> 2. is enforced by platform #ifdefs not providing any other implementation
>> except one for x86/x86_64 with GCC or a GCC-like compiler.
>>>
>>
>> That's correct. I only mentioned #2 for completeness.
>
>
> OK, but I am not sure what does the #2 complete then.
>
>
>>> Thus I provide generic primitives, whose current implementations will work
>> as designed. However the 1. above also seems to be missing "properly-aligned"
>> and that's where the design is debatable. On one hand it is possible to
>> implement misaligned access atomically by LOCK MOV, and document that the
>> primitives may be used with args of any alignment. But a better alternative
>> to me seems to accept that misaligned accesses are bugs and document/allow
>> aligned accesses only. Even though that's enforceable in debug builds only,
>> so that's not ideally perfect, but IMHO acceptable.
>>>
>>
>> You don't.
>
>
> Will you reply to the rest of that paragraph too please? I am acknowledging that alignment is an issue, so let's see how to resolve it.
>
I don't think enforcing requirements in debug builds only is acceptable.
It must be a compile-time assertion, not a run-time one.
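Part of that requirement can indeed be expressed at compile time. A sketch in the pre-C11 style this code base would have used (the macro name is illustrative, not an existing InnoDB macro): the word-size assumption is statically checkable, although the alignment of an arbitrary run-time pointer is not.

```c
/* Negative-array-size trick: the typedef compiles only when COND is
   non-zero, so a violation fails the build.  This can enforce "ulint
   is machine-word sized" at compile time; pointer alignment for an
   arbitrary run-time pointer still needs a run-time check. */
#define COMPILE_TIME_ASSERT(COND, NAME) \
	typedef char NAME[(COND) ? 1 : -1]

COMPILE_TIME_ASSERT(sizeof(unsigned long) == sizeof(void *),
		    ulint_must_be_word_sized);
```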
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
> > Right. os_atomic_
> os_atomic_
> is a performance bug in any case to perform misaligned atomic ops even with
> those ops that make it technically possible. I have added ut_ad()s to catch
> this. I can rename os_atomic_ prefix to os_atomic_aligned_ prefix too,
> although that one looks like an overkill to me.
> >
>
> The same restrictions would apply even if os_atomic_
> exist, right? I.e. the same restrictions would apply if we simply
> accessed those variables without any helper functions?
What would be the desired access semantics in that case? If "anything goes", then no restrictions would apply.
> Let me ask you a few simple questions and this time around I demand
> "yes/no" answers.
>
> - Do you agree that os_atomic_
> not do what they promise to do?
Yes.
> - Do you agree that naming them os_ordered_
> os_ordered_
No.
> - Do you agree that naming them that way also makes it obvious that
> using them in most places is simply unnecessary (e.g. in
> buf_get_
> buf_get_
No.
The first answer would be No if not for the alignment issue.
> >>> Thus I provide generic primitives, whose current implementations will work
> >> as designed. However the 1. above also seems to be missing "properly-
> aligned"
> >> and that's where the design is debatable. On one hand it is possible to
> >> implement misaligned access atomically by LOCK MOV, and document that the
> >> primitives may be used with args of any alignment. But a better
> alternative
> >> to me seems to accept that misaligned accesses are bugs and document/allow
> >> aligned accesses only. Even though that's enforceable in debug builds
> only,
> >> so that's not ideally perfect, but IMHO acceptable.
> >>>
> >>
> >> You don't.
> >
> >
> > Will you reply to the rest of that paragraph too please? I am acknowledging
> that alignment is an issue, so let's see how to resolve it.
> >
>
> I don't think enforcing requirements in debug builds only is acceptable.
> It must be a compile-time assertion, not a run-time one.
And as we both know, this is not enforceable at compile time. I think that requesting extra protections on top of the ones already provided, when the only way to get a misaligned ulint is to ask for one explicitly, is overkill. But that's my hand-waving against your hand-waving. So let's say that yes, "the alignment issue is major and insurmountable", and I'll proceed to do what was offered previously: implement load as an atomic add of zero that returns the value, and store as a dirty read plus CAS until success. The reason I didn't like these implementations is that they are pessimized. But that's OK.
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
On Fri, 13 Sep 2013 12:54:44 -0000, Laurynas Biveinis wrote:
>>> Right. os_atomic_
>> os_atomic_
>> is a performance bug in any case to perform misaligned atomic ops even with
>> those ops that make it technically possible. I have added ut_ad()s to catch
>> this. I can rename os_atomic_ prefix to os_atomic_aligned_ prefix too,
>> although that one looks like an overkill to me.
>>>
>>
>> The same restrictions would apply even if os_atomic_
>> exist, right? I.e. the same restrictions would apply if we simply
>> accessed those variables without any helper functions?
>
>
> What would be the desired access semantics in that case? If "anything goes", then no restrictions would apply.
>
>
>> Let me ask you a few simple questions and this time around I demand
>> "yes/no" answers.
>>
>> - Do you agree that os_atomic_
>> not do what they promise to do?
>
>
> Yes.
OK, so we agree that naming is unfortunate.
>
>
>> - Do you agree that naming them os_ordered_
>> os_ordered_
>
>
> No.
What would be a better naming then?
>
>
>> - Do you agree that naming them that way also makes it obvious that
>> using them in most places is simply unnecessary (e.g. in
>> buf_get_
>> buf_get_
>
>
> No.
OK, then 2 followup questions:
1. Why do we need os_atomic_
example? Here's an example of a valid answer: "Because that will result
in incorrect values being used in case ...". And some examples of
invalid answers: "non-cache-coherent architectures, visibility, memory
model, sunspots, crop circles, global warming, ...".
2. Why are only 2 out of 9 values being loaded with
os_atomic_
ones need them?
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> >> - Do you agree that os_atomic_
> >> not do what they promise to do?
> >
> >
> > Yes.
>
> OK, so we agree that naming is unfortunate.
Vanishingly slightly so, due to the alignment issue, which I believe is mostly theoretical, but nevertheless am ready to address now.
> >> - Do you agree that naming them os_ordered_
> >> os_ordered_
> >
> >
> > No.
>
> What would be a better naming then?
os_atomic_
> OK, then 2 followup questions:
>
> 1. Why do we need os_atomic_
> example? Here's an example of a valid answer: "Because that will result
> in incorrect values being used in case ...". And some examples of
> invalid answers: "non-cache-coherent architectures, visibility, memory
> model, sunspots, crop circles, global warming, ...".
We have gone through this with a disassembly example already, haven't we? We need os_atomic_
Are you also objecting to mutex protection here? If not, why? Note that the three n_flush values here are completely independent.
mutex_
pending_io += buf_pool-
pending_io += buf_pool-
pending_io += buf_pool-
mutex_
And please don't group some of the valid answers with loony stuff.
> 2. Why do only 2 out of 9 values are being loaded with
> os_atomic_
> ones need them?
Upstream reads all 9 dirtily. I replaced two of them to be clean instead. Maybe I need to replace all 9. Maybe 0. But that's again orthogonal.
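The mutex-protected summation quoted above can be illustrated like this. The struct and field names here are invented for the sketch (they are not the actual InnoDB identifiers); the point is that a single mutex makes the three reads one consistent snapshot, which three independent atomic loads would not.

```c
#include <pthread.h>

/* Hypothetical flush state: three independent pending-flush counters
   protected by one mutex. */
typedef struct {
	pthread_mutex_t	mutex;
	unsigned long	n_flush_lru;
	unsigned long	n_flush_list;
	unsigned long	n_flush_single_page;
} flush_state_t;

static unsigned long
pending_io_total(flush_state_t *fs)
{
	unsigned long	pending_io;

	/* Holding the mutex across all three reads yields a mutually
	   consistent snapshot; separate atomic reads would each be
	   atomic but could interleave with concurrent updates. */
	pthread_mutex_lock(&fs->mutex);
	pending_io  = fs->n_flush_lru;
	pending_io += fs->n_flush_list;
	pending_io += fs->n_flush_single_page;
	pthread_mutex_unlock(&fs->mutex);

	return pending_io;
}
```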
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> No, I was referring to buf_pool_
> pool and examines (but not modifies) block->
> block. However, the patch changes the assertion in
> buf_pool_
> rather than zip_mutex.
Thanks for the pointer, I have reviewed buf_pool_
> In fact, in one of the code paths calling
> buf_pool_
> have a bug here?
buf_buddy_
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
Hi Laurynas,
On Fri, 13 Sep 2013 14:21:58 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>>>> - Do you agree that os_atomic_
>>>> not do what they promise to do?
>>>
>>>
>>> Yes.
>>
>> OK, so we agree that naming is unfortunate.
>
>
> Vanishingly slightly so, due to the alignment issue, which I believe is mostly theoretical, but nevertheless am ready to address now.
>
It doesn't do anything to enforce atomicity, does it? I.e. the following
implementation would be equally "atomic":
ulint
os_atomic_
{
printf("Hello, world!\n");
return(*ptr);
}
>
>>>> - Do you agree that naming them os_ordered_
>>>> os_ordered_
>>>
>>>
>>> No.
>>
>> What would be a better naming then?
>
>
> os_atomic_
>
No, it doesn't do anything to enforce atomicity. That is a caller's
responsibility.
>
>> OK, then 2 followup questions:
>>
>> 1. Why do we need os_atomic_
>> example? Here's an example of a valid answer: "Because that will result
>> in incorrect values being used in case ...". And some examples of
>> invalid answers: "non-cache-coherent architectures, visibility, memory
>> model, sunspots, crop circles, global warming, ...".
>
>
> We have gone through this with a disassembly example already, haven't we? We need os_atomic_
>
We need an atomic read rather than os_atomic_
reasons. And it will be atomic without using any helper functions. Since
I see no answer in the form "Because that will result in incorrect
values being used in case ...", I assume you don't have an answer to
that question.
> Are you also objecting to mutex protection here? If not, why? Note that the three n_flush values here are completely independent.
>
> mutex_enter(
>
> pending_io += buf_pool-
> pending_io += buf_pool-
> pending_io += buf_pool-
>
> mutex_exit(
>
I'm not objecting to mutex protection in that code. Why would I?
> And please don't group some of the valid answers with looney stuff.
>
>
>> 2. Why do only 2 out of 9 values are being loaded with
>> os_atomic_
>> ones need them?
>
>
> Upstream reads all 9 dirtily. I replaced two of them to be clean instead. Maybe I need to replace all 9. Maybe 0. But that's again orthogonal.
>
All 9 reads are atomic. But 7 of them don't use compiler barriers because they don't need them. Neither do the remaining 2, but you are being quite creative in avoiding this simple fact.
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
On Sun, 15 Sep 2013 07:47:59 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>> No, I was referring to buf_pool_
>> pool and examines (but not modifies) block->
>> block. However, the patch changes the assertion in
>> buf_pool_
>> rather than zip_mutex.
>
>
> Thanks for the pointer, I have reviewed buf_pool_
any locking. If my reasoning is correct, I can remove the assert from this function. But maybe this should be documented better somehow?
>
Looks correct to me. Let's remove the zip_free_mutex assertion then?
>
>> In fact, in one of the code paths calling
>> buf_pool_
>> have a bug here?
>
>
> buf_buddy_
>
OK, thanks for clarification.
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> > I have reviewed buf_pool_
...
> Looks correct to me. Let's remove the zip_free_mutex assertion then?
Yes.
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> >>>> - Do you agree that os_atomic_
> >>>> not do what they promise to do?
> >>>
> >>>
> >>> Yes.
> >>
> >> OK, so we agree that naming is unfortunate.
> >
> >
> > Vanishingly slightly so, due to the alignment issue, which I believe is
> mostly theoretical, but nevertheless am ready to address now.
> >
>
> It doesn't do anything to enforce atomicity, does it? I.e. the following
> implementation would be equally "atomic":
>
> ulint
> os_atomic_
> {
> printf("Hello, world!\n");
> return(*ptr);
> }
Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64 as long as the pointer is aligned.
> >>>> - Do you agree that naming them os_ordered_
> >>>> os_ordered_
> >>>
> >>>
> >>> No.
> >>
> >> What would be a better naming then?
> >
> >
> > os_atomic_
> >
>
> No, it doesn't do anything to enforce atomicity. That is a caller's
> responsibility.
As in, don't pass misaligned values? In that case, yes, it is a caller's responsibility not to pass misaligned values. But where would InnoDB get a misaligned pointer to ulint that we'd wish to access atomically? Hence enforcing alignment in debug builds only seemed like a reasonable compromise, but OK, that's debatable.
> >> 1. Why do we need os_atomic_
> >> example? Here's an example of a valid answer: "Because that will result
> >> in incorrect values being used in case ...". And some examples of
> >> invalid answers: "non-cache-coherent architectures, visibility, memory
> >> model, sunspots, crop circles, global warming, ...".
> >
> >
> > We have gone through this with a disassembly example already, haven't we?
> We need os_atomic_
> well decide that a dirty read there is fine and then replace it. But that's
> orthogonal to what is this primitive and why.
> >
>
> We need an atomic read rather than os_atomic_
> reasons. An it will be atomic without using any helper functions.
OK, so is the problem that I wanted to introduce primitives for such access, which would also document how the variable is accessed, and which don't have to do much besides a compiler barrier on x86_64?
> Since
> I see no answer in the form "Because that will result in incorrect
> values being used in case ...", I assume you don't have an answer to
> that question.
I know that they are not producing incorrect values currently, and that the worst that can happen on x86_64 under most possible future code changes is that the value loads could be moved earlier, resulting in more out-of-date values being used. This, and the fact that accessing the variable through the primitive serves as self-documentation, seem like good enough reasons to me.
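The compiler-barrier point can be sketched as follows (a hypothetical illustration, not the actual InnoDB primitive): on x86_64 an aligned word-sized load is already atomic, so a helper of this kind only needs to stop the compiler from hoisting the load or caching the value in a register.

```c
/* Sketch: a plain aligned load plus a compiler-only barrier.  The
   empty asm statement emits no instructions; it merely tells the
   compiler not to reorder or cache memory accesses across it. */
static inline unsigned long
load_ulint_with_barrier(const volatile unsigned long *ptr)
{
	unsigned long val = *ptr;  /* atomic on x86_64 when aligned */

	__asm__ __volatile__("" ::: "memory");  /* compiler barrier, no CPU fence */
	return val;
}
```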
> > Are you also objecting to mutex protection here? If not, why? Note that the
> three n_flush values here are completely independent.
> >
> > mutex_enter(
> >
> > pending_io += buf_pool-
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
On Mon, 16 Sep 2013 09:05:39 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>>>> - Do you agree that os_atomic_
>>>>>> not do what they promise to do?
>>>>>
>>>>>
>>>>> Yes.
>>>>
>>>> OK, so we agree that naming is unfortunate.
>>>
>>>
>>> Vanishingly slightly so, due to the alignment issue, which I believe is
>> mostly theoretical, but nevertheless am ready to address now.
>>>
>>
>> It doesn't do anything to enforce atomicity, does it? I.e. the following
>> implementation would be equally "atomic":
>>
>> ulint
>> os_atomic_
>> {
>> printf("Hello, world!\n");
>> return(*ptr);
>> }
>
>
> Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64 as long as pointer is aligned.
>
So... we don't need it after all?
>
>>>>>> - Do you agree that naming them os_ordered_
>>>>>> os_ordered_
>>>>>
>>>>>
>>>>> No.
>>>>
>>>> What would be a better naming then?
>>>
>>>
>>> os_atomic_
>>>
>>
>> No, it doesn't do anything to enforce atomicity. That is a caller's
>> responsibility.
>
>
> As in, don't pass misaligned values? In that case, yes, it is a caller's responsibility not to pass misaligned values. But where would InnoDB get a misaligned pointer to ulint from that we'd wish to access atomically? Hence enforcing alignment on debug build only seemed like a reasonable compromise, but OK, that's debatable.
>
>
>>>> 1. Why do we need os_atomic_
>>>> example? Here's an example of a valid answer: "Because that will result
>>>> in incorrect values being used in case ...". And some examples of
>>>> invalid answers: "non-cache-coherent architectures, visibility, memory
>>>> model, sunspots, crop circles, global warming, ...".
>>>
>>>
>>> We have gone through this with a disassembly example already, haven't we?
>> We need os_atomic_
>> well decide that a dirty read there is fine and then replace it. But that's
>> orthogonal to what is this primitive and why.
>>>
>>
>> We need an atomic read rather than os_atomic_
>> reasons. An it will be atomic without using any helper functions.
>
>
> OK, so is the problem that I wanted to introduce the primitives for such access, that would also document how is the variable accessed, and that they don't have to do much besides a compiler barrier on x86_64?
>
Yes, the problem is that you introduce primitives that basically do nothing, and then use those primitives unnecessarily and inconsistently. This in turn blows up the patch size, increases the maintenance burden, and opens the door to wrong assumptions when reading the existing code and implementing new code.
>
>> Since
>> I see no answer in the form "Because that will result in incorrect
>> values being used in case ...", I assume you don't have an answer to
>> that question.
>
>
> I know that they are not resulting in incorrect values currently, and that the worst can happen in x86_64 with the most of possible future code changes is that the va...
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
More comments on the patch:
- typo (double “get”) in the updated buf_LRU_free_page() comments:
“function returns false, the buf_page_
- what’s the reason for changing buf_page_
in release builds? Unless I’m missing something in the current code,
it is only assigned, but never read in release builds?
- spurious blank line changes in buf_buddy_
buf_
buf_
buf_
and innodb_
- the following change in buf_block_
not release block_mutex if buf_LRU_free_page() returns false.
bpage = buf_page_
if (bpage) {
- buf_LRU_
+
+ ib_mutex_t* block_mutex = buf_page_
+
+ mutex_enter(
+
+ if (buf_LRU_
+
+ mutex_exit(
+ return;
+ }
}
- do you really need this change?
@@ -2114,8 +2098,8 @@ buf_page_get_zip(
break;
case BUF_BLOCK_ZIP_PAGE:
case BUF_BLOCK_
+ buf_enter_
block_mutex = &buf_pool-
- mutex_enter(
bpage-
goto got_block;
case BUF_BLOCK_
- in the following change:
@@ -2721,13 +2707,14 @@ buf_page_get_gen(
}
bpage = &block->page;
+ ut_ad(buf_
if (bpage-
|| buf_page_
/* This condition often occurs when the buffer
is not buffer-fixed, but I/O-fixed by
buf_
- mutex_exit(
+ mutex_exit(
wait_until_
/* The block is buffer-fixed or I/O-fixed.
Try again later. */
is there a reason for replacing block_mutex with zip_mutex? you
could just assert that block_mutex is zip_mutex next to, or even
instead of the buf_own_
- same comments for this change:
@@ -2737,11 +2724,11 @@ buf_page_get_gen(
}
/* Allocate an uncompressed page. */
- mutex_exit(
+ mutex_exit(
block = buf_LRU_
ut_a(block);
- buf_pool_
+ mutex_enter(
/* As we have released the page_hash lock and the
block_mutex to allocate an uncompressed page it is
- in buf_mark_
buf_
buf_
acquired?
- it looks like with the changes in buf_page_
== BUF_IO_WRITE we don’t set io_fix to BUF_IO_NONE, though we did
before the changes?
- more spurious changes:
@@ -475,8 +473,8 @@ buf_flush_
if (prev_b == NULL) {
UT_LIST_
...
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
Replying to this bit separately as it may need further discussion while I am addressing the rest of comments.
> - what’s the reason for changing buf_page_
> in release builds? Unless I’m missing something in the current code,
> it is only assigned, but never read in release builds?
This is another "automerge" from 5.5 and indeed serves no release
build purpose in the current 5.6 code. But it points out a
non-trivial thing. In 5.5 it is used as follows:
-) for checking whether a given page is still on the LRU list if both
block and LRU mutexes were temporarily released:
buf_
from 5.6).
-) for iterating through the LRU list without holding the LRU list
mutex at all: buf_LRU_
buf_
buf_
think this is unsafe and a bug in 5.5 due to page relocations
potentially resulting in wild pointers, even if it does wonders for
the LRU list contention. 5.6 holds the LRU list mutex in the corresponding code.
-) redundant checks, ie. on LRU list iteration where the mutex is not
released: buf_LRU_
buf_
Thus I think 1) in_LRU_list changes should be reverted now. 2) 5.5
might need fixing. 3) The LRU list mutex is hot in 5.6. If there is
a safe way not to hold it in 5.6 (for example, for
BUF_BLOCK_
dereferencing page pointer - maybe by comparing the page address
against buffer pool chunk address range?), then it's worth looking
into it.
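The chunk-address-range idea mentioned above could be sketched like this (purely hypothetical C; the struct and function names are invented for illustration, not InnoDB's): before dereferencing a page pointer without holding the LRU list mutex, check whether the address falls inside any buffer pool chunk's frame memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one buffer pool chunk's address range. */
typedef struct {
	const char	*base;	/* start of the chunk's frame memory */
	size_t		size;	/* size of the chunk in bytes */
} chunk_t;

/* Return true iff ptr lies inside one of the given chunks.  A cheap
   sanity test; it cannot prove the page was not relocated, only that
   the pointer still points into buffer pool memory. */
static bool
ptr_in_chunks(const void *ptr, const chunk_t *chunks, size_t n_chunks)
{
	const char	*p = (const char *) ptr;

	for (size_t i = 0; i < n_chunks; i++) {
		if (p >= chunks[i].base
		    && p < chunks[i].base + chunks[i].size) {
			return true;
		}
	}
	return false;
}
```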
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> - typo (double “get”) in the updated buf_LRU_free_page() comments:
> “function returns false, the buf_page_
Fixed.
> - what’s the reason for changing buf_page_
> in release builds? Unless I’m missing something in the current code,
> it is only assigned, but never read in release builds?
in_LRU_list changes have been reverted with the exception of an extra
assert in buf_page_
was converted to a release build flag but no uses were converted.
Reverted that too.
> - spurious blank line changes in buf_buddy_
> buf_buddy_
Fixed.
> buf_page_get_gen(),
I didn't find this one. Will re-check before the final push.
> buf_page_
> buf_flush_
> buf_LRU_
> and innodb_
Fixed. Also removed a diagnostic printf from
buf_pool_
> - the following change in buf_block_
> not release block_mutex if buf_LRU_free_page() returns false.
Fixed.
> - do you really need this change?
>
> @@ -2114,8 +2098,8 @@ buf_page_get_zip(
> break;
> case BUF_BLOCK_ZIP_PAGE:
> case BUF_BLOCK_
> + buf_enter_
> block_mutex = &buf_pool-
> - mutex_enter(
> bpage->
> goto got_block;
> case BUF_BLOCK_
No, I don't. A debugging leftover, reverted.
> - in the following change:
>
> @@ -2721,13 +2707,14 @@ buf_page_get_gen(
> }
>
> bpage = &block->page;
> + ut_ad(buf_
>
> if (bpage-
> || buf_page_
> /* This condition often occurs when the buffer
> is not buffer-fixed, but I/O-fixed by
> buf_page_
> - mutex_exit(
> + mutex_exit(
> wait_until_unfixed:
> /* The block is buffer-fixed or I/O-fixed.
> Try again later. */
>
> is there a reason for replacing block_mutex with zip_mutex? you
> could just assert that block_mutex is zip_mutex next to, or even
> instead of the buf_own_
Yes. Replaced buf_own_
&buf_pool-
assert above for BUF_BLOCK_
too.
> - same comments for this change:
>
> @@ -2737,11 +2724,11 @@ buf_page_get_gen(
> }
>
> /* Allocate an uncompressed page. */
> - mutex_exit(
> + mutex_exit(
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
buf_enter_
Thread 1 (Thread 0x7f3acebfe700 (LWP 321)):
#0 __pthread_kill (threadid=
#1 0x0000000000ae2f45 in my_write_core (sig=6) at /mnt/workspace/
#2 0x000000000075eaa9 in handle_fatal_signal (sig=6) at /mnt/workspace/
#3 <signal handler called>
#4 0x00007f3ae835c475 in *__GI_raise (sig=<optimized out>) at ../nptl/
#5 0x00007f3ae835f6f0 in *__GI_abort () at abort.c:92
#6 0x0000000000d9b44c in buf_pool_from_bpage (bpage=0x33912c0) at /mnt/workspace/
#7 0x0000000000d9cb3c in buf_enter_
#8 0x0000000000da2e5b in buf_page_get_gen (space=431, zip_size=8192, offset=9, rw_latch=1, guess=0x0, mode=10, file=0x1109668 "/mnt/workspace
#9 0x0000000000d8c45a in btr_block_get_func (space=431, zip_size=8192, page_no=9, mode=1, file=0x1109668 "/mnt/workspace
#10 0x0000000000d8df82 in btr_pcur_
#11 0x0000000000df3a47 in btr_pcur_
#12 0x0000000000df5a6e in dict_stats_
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> >>>>>> - Do you agree that os_atomic_
> >>>>>> not do what they promise to do?
> >>>>>
> >>>>>
> >>>>> Yes.
> >>>>
> >>>> OK, so we agree that naming is unfortunate.
> >>>
> >>>
> >>> Vanishingly slightly so, due to the alignment issue, which I believe is
> >> mostly theoretical, but nevertheless am ready to address now.
> >>>
> >>
> >> It doesn't do anything to enforce atomicity, does it? I.e. the following
> >> implementation would be equally "atomic":
> >>
> >> ulint
> >> os_atomic_
> >> {
> >> printf("Hello, world!\n");
> >> return(*ptr);
> >> }
> >
> >
> > Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64
> as long as pointer is aligned.
> >
>
> So... we don't need it after all?
But we don't want to ignore visibility and ordering.
> >>>> 1. Why do we need os_atomic_
> >>>> example? Here's an example of a valid answer: "Because that will result
> >>>> in incorrect values being used in case ...". And some examples of
> >>>> invalid answers: "non-cache-coherent architectures, visibility, memory
> >>>> model, sunspots, crop circles, global warming, ...".
> >>>
> >>>
> >>> We have gone through this with a disassembly example already, haven't we?
> >> We need os_atomic_
> >> well decide that a dirty read there is fine and then replace it. But
> that's
> >> orthogonal to what is this primitive and why.
> >>>
> >>
> >> We need an atomic read rather than os_atomic_
> >> reasons. An it will be atomic without using any helper functions.
> >
> >
> > OK, so is the problem that I wanted to introduce the primitives for such
> access, that would also document how is the variable accessed, and that they
> don't have to do much besides a compiler barrier on x86_64?
> >
>
> Yes, the problem is that you introduce primitives that basically do
> nothing, and then use those primitives unnecessarily and inconsistently.
> Which in turn leads to blowing up the patch size, increased maintenance
> burden, and opens the door for wrong assumptions made when reading the
> existing code and implementing new code.
OK, now I see your concerns, and understand a big part of them, but not all. What wrong assumptions are encouraged by the primitives?
> >> Since
> >> I see no answer in the form "Because that will result in incorrect
> >> values being used in case ...", I assume you don't have an answer to
> >> that question.
> >
> >
> > I know that they are not resulting in incorrect values currently, and that
> the worst can happen in x86_64 with the most of possible future code changes
> is that the value loads could be moved earlier, resulting in more out-of-date
> values used. This and the fact that accessing the variable through the
> primitive serves as self-documentation seem good enough reasons to me.
> >
>
> Whether they are more "out-of-date" or less "out-of-date" depends on the
> definition of "date". By defining "date" as the "point in time when the
> function is called", "more out-of-date" can be easily...
Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal
Hi Laurynas,
On Tue, 17 Sep 2013 12:41:18 -0000, Laurynas Biveinis wrote:
> Alexey -
>
>
>>>>>>>> - Do you agree that os_atomic_
>>>>>>>> not do what they promise to do?
>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>
>>>>>> OK, so we agree that naming is unfortunate.
>>>>>
>>>>>
>>>>> Vanishingly slightly so, due to the alignment issue, which I believe is
>>>> mostly theoretical, but nevertheless am ready to address now.
>>>>>
>>>>
>>>> It doesn't do anything to enforce atomicity, does it? I.e. the following
>>>> implementation would be equally "atomic":
>>>>
>>>> ulint
>>>> os_atomic_
>>>> {
>>>> printf("Hello, world!\n");
>>>> return(*ptr);
>>>> }
>>>
>>>
>>> Yes, it would be equally atomic (ignoring visibility and ordering) on x86_64
>> as long as pointer is aligned.
>>>
>>
>> So... we don't need it after all?
>
>
> But we don't want to ignore visibility and ordering.
>
Yeah, you forgot non-cache-coherent architectures.
This discussion has been running in circles for almost a week now, and I have a feeling you are deliberately keeping it this way. Since I have not been presented with any technical arguments for keeping that code, and have better things to do, I'm going to wrap up the democrazy and stop it forcefully.
I will not approve this MP with "atomic" primitives present in the code.
The discussion is over.
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal
Alexey -
> This discussion has been running in circles for almost a week now, and I
> have a feeling you deliberately keep it this way. Since I have not been
> presented any technical arguments for keeping that code, and have better
> things to do, I'm going to wrap up the democrazy and stop it forcefully.
>
> I will not approve this MP with "atomic" primitives present in the code.
> The discussion is over.
I did not do anything to deserve this kind of treatment. Your feeling that I am deliberately keeping this going is plainly wrong (what would I have to gain? Minus one week of my copious time? The smug feeling of being right?). I have addressed every single one of your comments without hesitation, starting from the assumption that you are right, in this MP and in tens of MPs before it.
I have tried my best to explain why the code is correct. You seem to disagree, but I have trouble understanding why. I am well within my rights to ask you to explain further, and the burden is on you to show why the code is wrong. Hence your refusal to continue is what is stalling the review right now. Please continue the technical discussion.
Laurynas Biveinis (laurynas-biveinis) wrote : Posted in a previous version of this proposal | # |
Repushed. Changes since the 2nd push (not a resubmission yet):
- Removed ut_ad(mutex_
buf_
- Fixed comment typos.
- Reverted in_LRU_list and in_unzip_LRU_list changes, with the
exception of an extra assert in buf_page_
- Reverted spurious whitespace changes.
- Removed spurious diagnostic printf from
buf_
- Fixed locking in buf_block_
- Reverted redundant locking changes in buf_page_get_zip() and
buf_
- Removed buf_enter_
- Removed os_atomic_
buf_
buf_
last instance in buf_read_
srv_
Alexey Kopytov (akopytov) : | # |
Preview Diff
1 | === modified file 'Percona-Server/storage/innobase/btr/btr0cur.cc' | |||
2 | --- Percona-Server/storage/innobase/btr/btr0cur.cc 2013-09-06 13:40:39 +0000 | |||
3 | +++ Percona-Server/storage/innobase/btr/btr0cur.cc 2013-09-20 05:29:11 +0000 | |||
4 | @@ -4503,12 +4503,14 @@ | |||
5 | 4503 | buf_pool_t* buf_pool = buf_pool_from_block(block); | 4503 | buf_pool_t* buf_pool = buf_pool_from_block(block); |
6 | 4504 | ulint space = buf_block_get_space(block); | 4504 | ulint space = buf_block_get_space(block); |
7 | 4505 | ulint page_no = buf_block_get_page_no(block); | 4505 | ulint page_no = buf_block_get_page_no(block); |
8 | 4506 | bool freed = false; | ||
9 | 4506 | 4507 | ||
10 | 4507 | ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX)); | 4508 | ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX)); |
11 | 4508 | 4509 | ||
12 | 4509 | mtr_commit(mtr); | 4510 | mtr_commit(mtr); |
13 | 4510 | 4511 | ||
15 | 4511 | buf_pool_mutex_enter(buf_pool); | 4512 | mutex_enter(&buf_pool->LRU_list_mutex); |
16 | 4513 | mutex_enter(&block->mutex); | ||
17 | 4512 | 4514 | ||
18 | 4513 | /* Only free the block if it is still allocated to | 4515 | /* Only free the block if it is still allocated to |
19 | 4514 | the same file page. */ | 4516 | the same file page. */ |
20 | @@ -4518,16 +4520,26 @@ | |||
21 | 4518 | && buf_block_get_space(block) == space | 4520 | && buf_block_get_space(block) == space |
22 | 4519 | && buf_block_get_page_no(block) == page_no) { | 4521 | && buf_block_get_page_no(block) == page_no) { |
23 | 4520 | 4522 | ||
26 | 4521 | if (!buf_LRU_free_page(&block->page, all) | 4523 | freed = buf_LRU_free_page(&block->page, all); |
27 | 4522 | && all && block->page.zip.data) { | 4524 | |
28 | 4525 | if (!freed && all && block->page.zip.data | ||
29 | 4526 | /* Now, buf_LRU_free_page() may release mutexes | ||
30 | 4527 | temporarily */ | ||
31 | 4528 | && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE | ||
32 | 4529 | && buf_block_get_space(block) == space | ||
33 | 4530 | && buf_block_get_page_no(block) == page_no) { | ||
34 | 4531 | |||
35 | 4523 | /* Attempt to deallocate the uncompressed page | 4532 | /* Attempt to deallocate the uncompressed page |
36 | 4524 | if the whole block cannot be deallocted. */ | 4533 | if the whole block cannot be deallocted. */ |
39 | 4525 | 4534 | freed = buf_LRU_free_page(&block->page, false); | |
38 | 4526 | buf_LRU_free_page(&block->page, false); | ||
40 | 4527 | } | 4535 | } |
41 | 4528 | } | 4536 | } |
42 | 4529 | 4537 | ||
44 | 4530 | buf_pool_mutex_exit(buf_pool); | 4538 | if (!freed) { |
45 | 4539 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
46 | 4540 | } | ||
47 | 4541 | |||
48 | 4542 | mutex_exit(&block->mutex); | ||
49 | 4531 | } | 4543 | } |
50 | 4532 | 4544 | ||
51 | 4533 | /*******************************************************************//** | 4545 | /*******************************************************************//** |
52 | 4534 | 4546 | ||
53 | === modified file 'Percona-Server/storage/innobase/btr/btr0sea.cc' | |||
54 | --- Percona-Server/storage/innobase/btr/btr0sea.cc 2013-09-06 13:40:39 +0000 | |||
55 | +++ Percona-Server/storage/innobase/btr/btr0sea.cc 2013-09-20 05:29:11 +0000 | |||
56 | @@ -1922,19 +1922,15 @@ | |||
57 | 1922 | 1922 | ||
58 | 1923 | rec_offs_init(offsets_); | 1923 | rec_offs_init(offsets_); |
59 | 1924 | 1924 | ||
60 | 1925 | buf_pool_mutex_enter_all(); | ||
61 | 1926 | |||
62 | 1927 | cell_count = hash_get_n_cells(btr_search_sys->hash_tables[t]); | 1925 | cell_count = hash_get_n_cells(btr_search_sys->hash_tables[t]); |
63 | 1928 | 1926 | ||
64 | 1929 | for (i = 0; i < cell_count; i++) { | 1927 | for (i = 0; i < cell_count; i++) { |
65 | 1930 | /* We release btr_search_latch every once in a while to | 1928 | /* We release btr_search_latch every once in a while to |
66 | 1931 | give other queries a chance to run. */ | 1929 | give other queries a chance to run. */ |
67 | 1932 | if ((i != 0) && ((i % chunk_size) == 0)) { | 1930 | if ((i != 0) && ((i % chunk_size) == 0)) { |
68 | 1933 | buf_pool_mutex_exit_all(); | ||
69 | 1934 | btr_search_x_unlock_all(); | 1931 | btr_search_x_unlock_all(); |
70 | 1935 | os_thread_yield(); | 1932 | os_thread_yield(); |
71 | 1936 | btr_search_x_lock_all(); | 1933 | btr_search_x_lock_all(); |
72 | 1937 | buf_pool_mutex_enter_all(); | ||
73 | 1938 | } | 1934 | } |
74 | 1939 | 1935 | ||
75 | 1940 | node = (ha_node_t*) | 1936 | node = (ha_node_t*) |
76 | @@ -1942,7 +1938,7 @@ | |||
77 | 1942 | i)->node; | 1938 | i)->node; |
78 | 1943 | 1939 | ||
79 | 1944 | for (; node != NULL; node = node->next) { | 1940 | for (; node != NULL; node = node->next) { |
81 | 1945 | const buf_block_t* block | 1941 | buf_block_t* block |
82 | 1946 | = buf_block_align((byte*) node->data); | 1942 | = buf_block_align((byte*) node->data); |
83 | 1947 | const buf_block_t* hash_block; | 1943 | const buf_block_t* hash_block; |
84 | 1948 | buf_pool_t* buf_pool; | 1944 | buf_pool_t* buf_pool; |
85 | @@ -1983,6 +1979,8 @@ | |||
86 | 1983 | == BUF_BLOCK_REMOVE_HASH); | 1979 | == BUF_BLOCK_REMOVE_HASH); |
87 | 1984 | } | 1980 | } |
88 | 1985 | 1981 | ||
89 | 1982 | mutex_enter(&block->mutex); | ||
90 | 1983 | |||
91 | 1986 | ut_a(!dict_index_is_ibuf(block->index)); | 1984 | ut_a(!dict_index_is_ibuf(block->index)); |
92 | 1987 | 1985 | ||
93 | 1988 | page_index_id = btr_page_get_index_id(block->frame); | 1986 | page_index_id = btr_page_get_index_id(block->frame); |
94 | @@ -2038,6 +2036,8 @@ | |||
95 | 2038 | n_page_dumps++; | 2036 | n_page_dumps++; |
96 | 2039 | } | 2037 | } |
97 | 2040 | } | 2038 | } |
98 | 2039 | |||
99 | 2040 | mutex_exit(&block->mutex); | ||
100 | 2041 | } | 2041 | } |
101 | 2042 | } | 2042 | } |
102 | 2043 | 2043 | ||
103 | @@ -2047,11 +2047,9 @@ | |||
104 | 2047 | /* We release btr_search_latch every once in a while to | 2047 | /* We release btr_search_latch every once in a while to |
105 | 2048 | give other queries a chance to run. */ | 2048 | give other queries a chance to run. */ |
106 | 2049 | if (i != 0) { | 2049 | if (i != 0) { |
107 | 2050 | buf_pool_mutex_exit_all(); | ||
108 | 2051 | btr_search_x_unlock_all(); | 2050 | btr_search_x_unlock_all(); |
109 | 2052 | os_thread_yield(); | 2051 | os_thread_yield(); |
110 | 2053 | btr_search_x_lock_all(); | 2052 | btr_search_x_lock_all(); |
111 | 2054 | buf_pool_mutex_enter_all(); | ||
112 | 2055 | } | 2053 | } |
113 | 2056 | 2054 | ||
114 | 2057 | if (!ha_validate(btr_search_sys->hash_tables[t], i, | 2055 | if (!ha_validate(btr_search_sys->hash_tables[t], i, |
115 | @@ -2060,7 +2058,6 @@ | |||
116 | 2060 | } | 2058 | } |
117 | 2061 | } | 2059 | } |
118 | 2062 | 2060 | ||
119 | 2063 | buf_pool_mutex_exit_all(); | ||
120 | 2064 | if (UNIV_LIKELY_NULL(heap)) { | 2061 | if (UNIV_LIKELY_NULL(heap)) { |
121 | 2065 | mem_heap_free(heap); | 2062 | mem_heap_free(heap); |
122 | 2066 | } | 2063 | } |
123 | 2067 | 2064 | ||
124 | === modified file 'Percona-Server/storage/innobase/buf/buf0buddy.cc' | |||
125 | --- Percona-Server/storage/innobase/buf/buf0buddy.cc 2013-08-06 15:16:34 +0000 | |||
126 | +++ Percona-Server/storage/innobase/buf/buf0buddy.cc 2013-09-20 05:29:11 +0000 | |||
127 | @@ -205,7 +205,7 @@ | |||
128 | 205 | { | 205 | { |
129 | 206 | const ulint size = BUF_BUDDY_LOW << i; | 206 | const ulint size = BUF_BUDDY_LOW << i; |
130 | 207 | 207 | ||
132 | 208 | ut_ad(buf_pool_mutex_own(buf_pool)); | 208 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); |
133 | 209 | ut_ad(!ut_align_offset(buf, size)); | 209 | ut_ad(!ut_align_offset(buf, size)); |
134 | 210 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); | 210 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); |
135 | 211 | 211 | ||
136 | @@ -278,7 +278,7 @@ | |||
137 | 278 | ulint i) /*!< in: index of | 278 | ulint i) /*!< in: index of |
138 | 279 | buf_pool->zip_free[] */ | 279 | buf_pool->zip_free[] */ |
139 | 280 | { | 280 | { |
141 | 281 | ut_ad(buf_pool_mutex_own(buf_pool)); | 281 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); |
142 | 282 | ut_ad(buf_pool->zip_free[i].start != buf); | 282 | ut_ad(buf_pool->zip_free[i].start != buf); |
143 | 283 | 283 | ||
144 | 284 | buf_buddy_stamp_free(buf, i); | 284 | buf_buddy_stamp_free(buf, i); |
145 | @@ -297,7 +297,7 @@ | |||
146 | 297 | ulint i) /*!< in: index of | 297 | ulint i) /*!< in: index of |
147 | 298 | buf_pool->zip_free[] */ | 298 | buf_pool->zip_free[] */ |
148 | 299 | { | 299 | { |
150 | 300 | ut_ad(buf_pool_mutex_own(buf_pool)); | 300 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); |
151 | 301 | ut_ad(buf_buddy_check_free(buf_pool, buf, i)); | 301 | ut_ad(buf_buddy_check_free(buf_pool, buf, i)); |
152 | 302 | 302 | ||
153 | 303 | UT_LIST_REMOVE(list, buf_pool->zip_free[i], buf); | 303 | UT_LIST_REMOVE(list, buf_pool->zip_free[i], buf); |
154 | @@ -316,7 +316,7 @@ | |||
155 | 316 | { | 316 | { |
156 | 317 | buf_buddy_free_t* buf; | 317 | buf_buddy_free_t* buf; |
157 | 318 | 318 | ||
159 | 319 | ut_ad(buf_pool_mutex_own(buf_pool)); | 319 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); |
160 | 320 | ut_a(i < BUF_BUDDY_SIZES); | 320 | ut_a(i < BUF_BUDDY_SIZES); |
161 | 321 | ut_a(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); | 321 | ut_a(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); |
162 | 322 | 322 | ||
163 | @@ -369,10 +369,11 @@ | |||
164 | 369 | buf_page_t* bpage; | 369 | buf_page_t* bpage; |
165 | 370 | buf_block_t* block; | 370 | buf_block_t* block; |
166 | 371 | 371 | ||
167 | 372 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
168 | 373 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); | 372 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); |
169 | 374 | ut_a(!ut_align_offset(buf, UNIV_PAGE_SIZE)); | 373 | ut_a(!ut_align_offset(buf, UNIV_PAGE_SIZE)); |
170 | 375 | 374 | ||
171 | 375 | mutex_enter(&buf_pool->zip_hash_mutex); | ||
172 | 376 | |||
173 | 376 | HASH_SEARCH(hash, buf_pool->zip_hash, fold, buf_page_t*, bpage, | 377 | HASH_SEARCH(hash, buf_pool->zip_hash, fold, buf_page_t*, bpage, |
174 | 377 | ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_MEMORY | 378 | ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_MEMORY |
175 | 378 | && bpage->in_zip_hash && !bpage->in_page_hash), | 379 | && bpage->in_zip_hash && !bpage->in_page_hash), |
176 | @@ -384,6 +385,8 @@ | |||
177 | 384 | ut_d(bpage->in_zip_hash = FALSE); | 385 | ut_d(bpage->in_zip_hash = FALSE); |
178 | 385 | HASH_DELETE(buf_page_t, hash, buf_pool->zip_hash, fold, bpage); | 386 | HASH_DELETE(buf_page_t, hash, buf_pool->zip_hash, fold, bpage); |
179 | 386 | 387 | ||
180 | 388 | mutex_exit(&buf_pool->zip_hash_mutex); | ||
181 | 389 | |||
182 | 387 | ut_d(memset(buf, 0, UNIV_PAGE_SIZE)); | 390 | ut_d(memset(buf, 0, UNIV_PAGE_SIZE)); |
183 | 388 | UNIV_MEM_INVALID(buf, UNIV_PAGE_SIZE); | 391 | UNIV_MEM_INVALID(buf, UNIV_PAGE_SIZE); |
184 | 389 | 392 | ||
185 | @@ -406,7 +409,6 @@ | |||
186 | 406 | { | 409 | { |
187 | 407 | buf_pool_t* buf_pool = buf_pool_from_block(block); | 410 | buf_pool_t* buf_pool = buf_pool_from_block(block); |
188 | 408 | const ulint fold = BUF_POOL_ZIP_FOLD(block); | 411 | const ulint fold = BUF_POOL_ZIP_FOLD(block); |
189 | 409 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
190 | 410 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); | 412 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); |
191 | 411 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_READY_FOR_USE); | 413 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_READY_FOR_USE); |
192 | 412 | 414 | ||
193 | @@ -418,7 +420,10 @@ | |||
194 | 418 | ut_ad(!block->page.in_page_hash); | 420 | ut_ad(!block->page.in_page_hash); |
195 | 419 | ut_ad(!block->page.in_zip_hash); | 421 | ut_ad(!block->page.in_zip_hash); |
196 | 420 | ut_d(block->page.in_zip_hash = TRUE); | 422 | ut_d(block->page.in_zip_hash = TRUE); |
197 | 423 | |||
198 | 424 | mutex_enter(&buf_pool->zip_hash_mutex); | ||
199 | 421 | HASH_INSERT(buf_page_t, hash, buf_pool->zip_hash, fold, &block->page); | 425 | HASH_INSERT(buf_page_t, hash, buf_pool->zip_hash, fold, &block->page); |
200 | 426 | mutex_exit(&buf_pool->zip_hash_mutex); | ||
201 | 422 | 427 | ||
202 | 423 | ut_d(buf_pool->buddy_n_frames++); | 428 | ut_d(buf_pool->buddy_n_frames++); |
203 | 424 | } | 429 | } |
204 | @@ -438,6 +443,7 @@ | |||
205 | 438 | of buf_pool->zip_free[] */ | 443 | of buf_pool->zip_free[] */ |
206 | 439 | { | 444 | { |
207 | 440 | ulint offs = BUF_BUDDY_LOW << j; | 445 | ulint offs = BUF_BUDDY_LOW << j; |
208 | 446 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); | ||
209 | 441 | ut_ad(j <= BUF_BUDDY_SIZES); | 447 | ut_ad(j <= BUF_BUDDY_SIZES); |
210 | 442 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); | 448 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); |
211 | 443 | ut_ad(j >= i); | 449 | ut_ad(j >= i); |
212 | @@ -461,8 +467,8 @@ | |||
213 | 461 | 467 | ||
214 | 462 | /**********************************************************************//** | 468 | /**********************************************************************//** |
215 | 463 | Allocate a block. The thread calling this function must hold | 469 | Allocate a block. The thread calling this function must hold |
218 | 464 | buf_pool->mutex and must not hold buf_pool->zip_mutex or any block->mutex. | 470 | buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any |
219 | 465 | The buf_pool_mutex may be released and reacquired. | 471 | block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired. |
220 | 466 | @return allocated block, never NULL */ | 472 | @return allocated block, never NULL */ |
221 | 467 | UNIV_INTERN | 473 | UNIV_INTERN |
222 | 468 | void* | 474 | void* |
223 | @@ -474,23 +480,25 @@ | |||
224 | 474 | ibool* lru) /*!< in: pointer to a variable that | 480 | ibool* lru) /*!< in: pointer to a variable that |
225 | 475 | will be assigned TRUE if storage was | 481 | will be assigned TRUE if storage was |
226 | 476 | allocated from the LRU list and | 482 | allocated from the LRU list and |
229 | 477 | buf_pool->mutex was temporarily | 483 | buf_pool->LRU_list_mutex was |
230 | 478 | released */ | 484 | temporarily released */ |
231 | 479 | { | 485 | { |
232 | 480 | buf_block_t* block; | 486 | buf_block_t* block; |
233 | 481 | 487 | ||
234 | 482 | ut_ad(lru); | 488 | ut_ad(lru); |
236 | 483 | ut_ad(buf_pool_mutex_own(buf_pool)); | 489 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
237 | 484 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); | 490 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); |
238 | 485 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); | 491 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); |
239 | 486 | 492 | ||
240 | 487 | if (i < BUF_BUDDY_SIZES) { | 493 | if (i < BUF_BUDDY_SIZES) { |
241 | 488 | /* Try to allocate from the buddy system. */ | 494 | /* Try to allocate from the buddy system. */ |
242 | 495 | mutex_enter(&buf_pool->zip_free_mutex); | ||
243 | 489 | block = (buf_block_t*) buf_buddy_alloc_zip(buf_pool, i); | 496 | block = (buf_block_t*) buf_buddy_alloc_zip(buf_pool, i); |
244 | 490 | 497 | ||
245 | 491 | if (block) { | 498 | if (block) { |
246 | 492 | goto func_exit; | 499 | goto func_exit; |
247 | 493 | } | 500 | } |
248 | 501 | mutex_exit(&buf_pool->zip_free_mutex); | ||
249 | 494 | } | 502 | } |
250 | 495 | 503 | ||
251 | 496 | /* Try allocating from the buf_pool->free list. */ | 504 | /* Try allocating from the buf_pool->free list. */ |
252 | @@ -502,24 +510,28 @@ | |||
253 | 502 | } | 510 | } |
254 | 503 | 511 | ||
255 | 504 | /* Try replacing an uncompressed page in the buffer pool. */ | 512 | /* Try replacing an uncompressed page in the buffer pool. */ |
257 | 505 | buf_pool_mutex_exit(buf_pool); | 513 | mutex_exit(&buf_pool->LRU_list_mutex); |
258 | 506 | block = buf_LRU_get_free_block(buf_pool); | 514 | block = buf_LRU_get_free_block(buf_pool); |
259 | 507 | *lru = TRUE; | 515 | *lru = TRUE; |
261 | 508 | buf_pool_mutex_enter(buf_pool); | 516 | mutex_enter(&buf_pool->LRU_list_mutex); |
262 | 509 | 517 | ||
263 | 510 | alloc_big: | 518 | alloc_big: |
264 | 511 | buf_buddy_block_register(block); | 519 | buf_buddy_block_register(block); |
265 | 512 | 520 | ||
266 | 521 | mutex_enter(&buf_pool->zip_free_mutex); | ||
267 | 513 | block = (buf_block_t*) buf_buddy_alloc_from( | 522 | block = (buf_block_t*) buf_buddy_alloc_from( |
268 | 514 | buf_pool, block->frame, i, BUF_BUDDY_SIZES); | 523 | buf_pool, block->frame, i, BUF_BUDDY_SIZES); |
269 | 515 | 524 | ||
270 | 516 | func_exit: | 525 | func_exit: |
271 | 517 | buf_pool->buddy_stat[i].used++; | 526 | buf_pool->buddy_stat[i].used++; |
272 | 527 | mutex_exit(&buf_pool->zip_free_mutex); | ||
273 | 528 | |||
274 | 518 | return(block); | 529 | return(block); |
275 | 519 | } | 530 | } |
276 | 520 | 531 | ||
277 | 521 | /**********************************************************************//** | 532 | /**********************************************************************//** |
279 | 522 | Try to relocate a block. | 533 | Try to relocate a block. The caller must hold zip_free_mutex, and this |
280 | 534 | function will release and lock it again. | ||
281 | 523 | @return true if relocated */ | 535 | @return true if relocated */ |
282 | 524 | static | 536 | static |
283 | 525 | bool | 537 | bool |
284 | @@ -536,8 +548,9 @@ | |||
285 | 536 | ib_mutex_t* mutex; | 548 | ib_mutex_t* mutex; |
286 | 537 | ulint space; | 549 | ulint space; |
287 | 538 | ulint offset; | 550 | ulint offset; |
288 | 551 | rw_lock_t* hash_lock; | ||
289 | 539 | 552 | ||
291 | 540 | ut_ad(buf_pool_mutex_own(buf_pool)); | 553 | ut_ad(mutex_own(&buf_pool->zip_free_mutex)); |
292 | 541 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); | 554 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); |
293 | 542 | ut_ad(!ut_align_offset(src, size)); | 555 | ut_ad(!ut_align_offset(src, size)); |
294 | 543 | ut_ad(!ut_align_offset(dst, size)); | 556 | ut_ad(!ut_align_offset(dst, size)); |
295 | @@ -556,7 +569,9 @@ | |||
296 | 556 | 569 | ||
297 | 557 | ut_ad(space != BUF_BUDDY_STAMP_FREE); | 570 | ut_ad(space != BUF_BUDDY_STAMP_FREE); |
298 | 558 | 571 | ||
300 | 559 | bpage = buf_page_hash_get(buf_pool, space, offset); | 572 | mutex_exit(&buf_pool->zip_free_mutex); |
301 | 573 | /* Lock page hash to prevent a relocation for the target page */ | ||
302 | 574 | bpage = buf_page_hash_get_s_locked(buf_pool, space, offset, &hash_lock); | ||
303 | 560 | 575 | ||
304 | 561 | if (!bpage || bpage->zip.data != src) { | 576 | if (!bpage || bpage->zip.data != src) { |
305 | 562 | /* The block has probably been freshly | 577 | /* The block has probably been freshly |
306 | @@ -564,6 +579,10 @@ | |||
307 | 564 | added to buf_pool->page_hash yet. Obviously, | 579 | added to buf_pool->page_hash yet. Obviously, |
308 | 565 | it cannot be relocated. */ | 580 | it cannot be relocated. */ |
309 | 566 | 581 | ||
310 | 582 | if (bpage) { | ||
311 | 583 | rw_lock_s_unlock(hash_lock); | ||
312 | 584 | } | ||
313 | 585 | mutex_enter(&buf_pool->zip_free_mutex); | ||
314 | 567 | return(false); | 586 | return(false); |
315 | 568 | } | 587 | } |
316 | 569 | 588 | ||
317 | @@ -573,6 +592,8 @@ | |||
318 | 573 | For the sake of simplicity, give up. */ | 592 | For the sake of simplicity, give up. */ |
319 | 574 | ut_ad(page_zip_get_size(&bpage->zip) < size); | 593 | ut_ad(page_zip_get_size(&bpage->zip) < size); |
320 | 575 | 594 | ||
321 | 595 | rw_lock_s_unlock(hash_lock); | ||
322 | 596 | mutex_enter(&buf_pool->zip_free_mutex); | ||
323 | 576 | return(false); | 597 | return(false); |
324 | 577 | } | 598 | } |
325 | 578 | 599 | ||
326 | @@ -584,6 +605,10 @@ | |||
327 | 584 | 605 | ||
328 | 585 | mutex_enter(mutex); | 606 | mutex_enter(mutex); |
329 | 586 | 607 | ||
330 | 608 | rw_lock_s_unlock(hash_lock); | ||
331 | 609 | |||
332 | 610 | mutex_enter(&buf_pool->zip_free_mutex); | ||
333 | 611 | |||
334 | 587 | if (buf_page_can_relocate(bpage)) { | 612 | if (buf_page_can_relocate(bpage)) { |
335 | 588 | /* Relocate the compressed page. */ | 613 | /* Relocate the compressed page. */ |
336 | 589 | ullint usec = ut_time_us(NULL); | 614 | ullint usec = ut_time_us(NULL); |
337 | @@ -618,17 +643,19 @@ | |||
338 | 618 | { | 643 | { |
339 | 619 | buf_buddy_free_t* buddy; | 644 | buf_buddy_free_t* buddy; |
340 | 620 | 645 | ||
341 | 621 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
342 | 622 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); | 646 | ut_ad(!mutex_own(&buf_pool->zip_mutex)); |
343 | 623 | ut_ad(i <= BUF_BUDDY_SIZES); | 647 | ut_ad(i <= BUF_BUDDY_SIZES); |
344 | 624 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); | 648 | ut_ad(i >= buf_buddy_get_slot(UNIV_ZIP_SIZE_MIN)); |
345 | 649 | |||
346 | 650 | mutex_enter(&buf_pool->zip_free_mutex); | ||
347 | 651 | |||
348 | 625 | ut_ad(buf_pool->buddy_stat[i].used > 0); | 652 | ut_ad(buf_pool->buddy_stat[i].used > 0); |
349 | 626 | |||
350 | 627 | buf_pool->buddy_stat[i].used--; | 653 | buf_pool->buddy_stat[i].used--; |
351 | 628 | recombine: | 654 | recombine: |
352 | 629 | UNIV_MEM_ASSERT_AND_ALLOC(buf, BUF_BUDDY_LOW << i); | 655 | UNIV_MEM_ASSERT_AND_ALLOC(buf, BUF_BUDDY_LOW << i); |
353 | 630 | 656 | ||
354 | 631 | if (i == BUF_BUDDY_SIZES) { | 657 | if (i == BUF_BUDDY_SIZES) { |
355 | 658 | mutex_exit(&buf_pool->zip_free_mutex); | ||
356 | 632 | buf_buddy_block_free(buf_pool, buf); | 659 | buf_buddy_block_free(buf_pool, buf); |
357 | 633 | return; | 660 | return; |
358 | 634 | } | 661 | } |
359 | @@ -695,4 +722,5 @@ | |||
360 | 695 | buf_buddy_add_to_free(buf_pool, | 722 | buf_buddy_add_to_free(buf_pool, |
361 | 696 | reinterpret_cast<buf_buddy_free_t*>(buf), | 723 | reinterpret_cast<buf_buddy_free_t*>(buf), |
362 | 697 | i); | 724 | i); |
363 | 725 | mutex_exit(&buf_pool->zip_free_mutex); | ||
364 | 698 | } | 726 | } |
365 | 699 | 727 | ||
366 | === modified file 'Percona-Server/storage/innobase/buf/buf0buf.cc' | |||
367 | --- Percona-Server/storage/innobase/buf/buf0buf.cc 2013-09-02 10:01:38 +0000 | |||
368 | +++ Percona-Server/storage/innobase/buf/buf0buf.cc 2013-09-20 05:29:11 +0000 | |||
369 | @@ -119,24 +119,9 @@ | |||
370 | 119 | 119 | ||
371 | 120 | Buffer pool struct | 120 | Buffer pool struct |
372 | 121 | ------------------ | 121 | ------------------ |
374 | 122 | The buffer buf_pool contains a single mutex which protects all the | 122 | The buffer buf_pool contains several mutexes which protect all the |
375 | 123 | control data structures of the buf_pool. The content of a buffer frame is | 123 | control data structures of the buf_pool. The content of a buffer frame is |
376 | 124 | protected by a separate read-write lock in its control block, though. | 124 | protected by a separate read-write lock in its control block, though. |
377 | 125 | These locks can be locked and unlocked without owning the buf_pool->mutex. | ||
378 | 126 | The OS events in the buf_pool struct can be waited for without owning the | ||
379 | 127 | buf_pool->mutex. | ||
380 | 128 | |||
381 | 129 | The buf_pool->mutex is a hot-spot in main memory, causing a lot of | ||
382 | 130 | memory bus traffic on multiprocessor systems when processors | ||
383 | 131 | alternately access the mutex. On our Pentium, the mutex is accessed | ||
384 | 132 | maybe every 10 microseconds. We gave up the solution to have mutexes | ||
385 | 133 | for each control block, for instance, because it seemed to be | ||
386 | 134 | complicated. | ||
387 | 135 | |||
388 | 136 | A solution to reduce mutex contention of the buf_pool->mutex is to | ||
389 | 137 | create a separate mutex for the page hash table. On Pentium, | ||
390 | 138 | accessing the hash table takes 2 microseconds, about half | ||
391 | 139 | of the total buf_pool->mutex hold time. | ||
392 | 140 | 125 | ||
393 | 141 | Control blocks | 126 | Control blocks |
394 | 142 | -------------- | 127 | -------------- |
395 | @@ -311,6 +296,11 @@ | |||
396 | 311 | UNIV_INTERN mysql_pfs_key_t buffer_block_mutex_key; | 296 | UNIV_INTERN mysql_pfs_key_t buffer_block_mutex_key; |
397 | 312 | UNIV_INTERN mysql_pfs_key_t buf_pool_mutex_key; | 297 | UNIV_INTERN mysql_pfs_key_t buf_pool_mutex_key; |
398 | 313 | UNIV_INTERN mysql_pfs_key_t buf_pool_zip_mutex_key; | 298 | UNIV_INTERN mysql_pfs_key_t buf_pool_zip_mutex_key; |
399 | 299 | UNIV_INTERN mysql_pfs_key_t buf_pool_flush_state_mutex_key; | ||
400 | 300 | UNIV_INTERN mysql_pfs_key_t buf_pool_LRU_list_mutex_key; | ||
401 | 301 | UNIV_INTERN mysql_pfs_key_t buf_pool_free_list_mutex_key; | ||
402 | 302 | UNIV_INTERN mysql_pfs_key_t buf_pool_zip_free_mutex_key; | ||
403 | 303 | UNIV_INTERN mysql_pfs_key_t buf_pool_zip_hash_mutex_key; | ||
404 | 314 | UNIV_INTERN mysql_pfs_key_t flush_list_mutex_key; | 304 | UNIV_INTERN mysql_pfs_key_t flush_list_mutex_key; |
405 | 315 | #endif /* UNIV_PFS_MUTEX */ | 305 | #endif /* UNIV_PFS_MUTEX */ |
406 | 316 | 306 | ||
407 | @@ -1186,7 +1176,6 @@ | |||
408 | 1186 | buf_chunk_t* chunk = buf_pool->chunks; | 1176 | buf_chunk_t* chunk = buf_pool->chunks; |
409 | 1187 | 1177 | ||
410 | 1188 | ut_ad(buf_pool); | 1178 | ut_ad(buf_pool); |
411 | 1189 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
412 | 1190 | for (n = buf_pool->n_chunks; n--; chunk++) { | 1179 | for (n = buf_pool->n_chunks; n--; chunk++) { |
413 | 1191 | 1180 | ||
414 | 1192 | buf_block_t* block = buf_chunk_contains_zip(chunk, data); | 1181 | buf_block_t* block = buf_chunk_contains_zip(chunk, data); |
415 | @@ -1265,8 +1254,6 @@ | |||
416 | 1265 | ulint i; | 1254 | ulint i; |
417 | 1266 | ulint curr_size = 0; | 1255 | ulint curr_size = 0; |
418 | 1267 | 1256 | ||
419 | 1268 | buf_pool_mutex_enter_all(); | ||
420 | 1269 | |||
421 | 1270 | for (i = 0; i < srv_buf_pool_instances; i++) { | 1257 | for (i = 0; i < srv_buf_pool_instances; i++) { |
422 | 1271 | buf_pool_t* buf_pool; | 1258 | buf_pool_t* buf_pool; |
423 | 1272 | 1259 | ||
424 | @@ -1276,8 +1263,6 @@ | |||
425 | 1276 | 1263 | ||
426 | 1277 | srv_buf_pool_curr_size = curr_size; | 1264 | srv_buf_pool_curr_size = curr_size; |
427 | 1278 | srv_buf_pool_old_size = srv_buf_pool_size; | 1265 | srv_buf_pool_old_size = srv_buf_pool_size; |
428 | 1279 | |||
429 | 1280 | buf_pool_mutex_exit_all(); | ||
430 | 1281 | } | 1266 | } |
431 | 1282 | 1267 | ||
432 | 1283 | /********************************************************************//** | 1268 | /********************************************************************//** |
433 | @@ -1297,12 +1282,18 @@ | |||
434 | 1297 | 1282 | ||
435 | 1298 | /* 1. Initialize general fields | 1283 | /* 1. Initialize general fields |
436 | 1299 | ------------------------------- */ | 1284 | ------------------------------- */ |
439 | 1300 | mutex_create(buf_pool_mutex_key, | 1285 | mutex_create(buf_pool_LRU_list_mutex_key, |
440 | 1301 | &buf_pool->mutex, SYNC_BUF_POOL); | 1286 | &buf_pool->LRU_list_mutex, SYNC_BUF_LRU_LIST); |
441 | 1287 | mutex_create(buf_pool_free_list_mutex_key, | ||
442 | 1288 | &buf_pool->free_list_mutex, SYNC_BUF_FREE_LIST); | ||
443 | 1289 | mutex_create(buf_pool_zip_free_mutex_key, | ||
444 | 1290 | &buf_pool->zip_free_mutex, SYNC_BUF_ZIP_FREE); | ||
445 | 1291 | mutex_create(buf_pool_zip_hash_mutex_key, | ||
446 | 1292 | &buf_pool->zip_hash_mutex, SYNC_BUF_ZIP_HASH); | ||
447 | 1302 | mutex_create(buf_pool_zip_mutex_key, | 1293 | mutex_create(buf_pool_zip_mutex_key, |
448 | 1303 | &buf_pool->zip_mutex, SYNC_BUF_BLOCK); | 1294 | &buf_pool->zip_mutex, SYNC_BUF_BLOCK); |
451 | 1304 | 1295 | mutex_create(buf_pool_flush_state_mutex_key, | |
452 | 1305 | buf_pool_mutex_enter(buf_pool); | 1296 | &buf_pool->flush_state_mutex, SYNC_BUF_FLUSH_STATE); |
453 | 1306 | 1297 | ||
454 | 1307 | if (buf_pool_size > 0) { | 1298 | if (buf_pool_size > 0) { |
455 | 1308 | buf_pool->n_chunks = 1; | 1299 | buf_pool->n_chunks = 1; |
456 | @@ -1316,8 +1307,6 @@ | |||
457 | 1316 | mem_free(chunk); | 1307 | mem_free(chunk); |
458 | 1317 | mem_free(buf_pool); | 1308 | mem_free(buf_pool); |
459 | 1318 | 1309 | ||
460 | 1319 | buf_pool_mutex_exit(buf_pool); | ||
461 | 1320 | |||
462 | 1321 | return(DB_ERROR); | 1310 | return(DB_ERROR); |
463 | 1322 | } | 1311 | } |
464 | 1323 | 1312 | ||
465 | @@ -1361,8 +1350,6 @@ | |||
466 | 1361 | 1350 | ||
467 | 1362 | buf_pool->try_LRU_scan = TRUE; | 1351 | buf_pool->try_LRU_scan = TRUE; |
468 | 1363 | 1352 | ||
469 | 1364 | buf_pool_mutex_exit(buf_pool); | ||
470 | 1365 | |||
471 | 1366 | return(DB_SUCCESS); | 1353 | return(DB_SUCCESS); |
472 | 1367 | } | 1354 | } |
473 | 1368 | 1355 | ||
474 | @@ -1537,7 +1524,7 @@ | |||
475 | 1537 | 1524 | ||
476 | 1538 | fold = buf_page_address_fold(bpage->space, bpage->offset); | 1525 | fold = buf_page_address_fold(bpage->space, bpage->offset); |
477 | 1539 | 1526 | ||
479 | 1540 | ut_ad(buf_pool_mutex_own(buf_pool)); | 1527 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
480 | 1541 | ut_ad(buf_page_hash_lock_held_x(buf_pool, bpage)); | 1528 | ut_ad(buf_page_hash_lock_held_x(buf_pool, bpage)); |
481 | 1542 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 1529 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
482 | 1543 | ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE); | 1530 | ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE); |
483 | @@ -1670,14 +1657,14 @@ | |||
484 | 1670 | return(bpage); | 1657 | return(bpage); |
485 | 1671 | } | 1658 | } |
486 | 1672 | /* Add to an existing watch. */ | 1659 | /* Add to an existing watch. */ |
487 | 1660 | mutex_enter(&buf_pool->zip_mutex); | ||
488 | 1673 | bpage->buf_fix_count++; | 1661 | bpage->buf_fix_count++; |
489 | 1662 | mutex_exit(&buf_pool->zip_mutex); | ||
490 | 1674 | return(NULL); | 1663 | return(NULL); |
491 | 1675 | } | 1664 | } |
492 | 1676 | 1665 | ||
493 | 1677 | /* From this point this function becomes fairly heavy in terms | 1666 | /* From this point this function becomes fairly heavy in terms |
497 | 1678 | of latching. We acquire the buf_pool mutex as well as all the | 1667 | of latching. We acquire all the hash_locks. They are needed |
495 | 1679 | hash_locks. buf_pool mutex is needed because any changes to | ||
496 | 1680 | the page_hash must be covered by it and hash_locks are needed | ||
498 | 1681 | because we don't want to read any stale information in | 1668 | because we don't want to read any stale information in |
499 | 1682 | buf_pool->watch[]. However, it is not in the critical code path | 1669 | buf_pool->watch[]. However, it is not in the critical code path |
500 | 1683 | as this function will be called only by the purge thread. */ | 1670 | as this function will be called only by the purge thread. */ |
501 | @@ -1686,18 +1673,16 @@ | |||
502 | 1686 | /* To obey latching order first release the hash_lock. */ | 1673 | /* To obey latching order first release the hash_lock. */ |
503 | 1687 | rw_lock_x_unlock(hash_lock); | 1674 | rw_lock_x_unlock(hash_lock); |
504 | 1688 | 1675 | ||
505 | 1689 | buf_pool_mutex_enter(buf_pool); | ||
506 | 1690 | hash_lock_x_all(buf_pool->page_hash); | 1676 | hash_lock_x_all(buf_pool->page_hash); |
507 | 1691 | 1677 | ||
508 | 1692 | /* We have to recheck that the page | 1678 | /* We have to recheck that the page |
509 | 1693 | was not loaded or a watch set by some other | 1679 | was not loaded or a watch set by some other |
510 | 1694 | purge thread. This is because of the small | 1680 | purge thread. This is because of the small |
511 | 1695 | time window between when we release the | 1681 | time window between when we release the |
513 | 1696 | hash_lock to acquire buf_pool mutex above. */ | 1682 | hash_lock to acquire all the hash locks above. */ |
514 | 1697 | 1683 | ||
515 | 1698 | bpage = buf_page_hash_get_low(buf_pool, space, offset, fold); | 1684 | bpage = buf_page_hash_get_low(buf_pool, space, offset, fold); |
516 | 1699 | if (UNIV_LIKELY_NULL(bpage)) { | 1685 | if (UNIV_LIKELY_NULL(bpage)) { |
517 | 1700 | buf_pool_mutex_exit(buf_pool); | ||
518 | 1701 | hash_unlock_x_all_but(buf_pool->page_hash, hash_lock); | 1686 | hash_unlock_x_all_but(buf_pool->page_hash, hash_lock); |
519 | 1702 | goto page_found; | 1687 | goto page_found; |
520 | 1703 | } | 1688 | } |
521 | @@ -1716,21 +1701,19 @@ | |||
522 | 1716 | ut_ad(!bpage->in_page_hash); | 1701 | ut_ad(!bpage->in_page_hash); |
523 | 1717 | ut_ad(bpage->buf_fix_count == 0); | 1702 | ut_ad(bpage->buf_fix_count == 0); |
524 | 1718 | 1703 | ||
526 | 1719 | /* bpage is pointing to buf_pool->watch[], | 1704 | mutex_enter(&buf_pool->zip_mutex); |
527 | 1720 | which is protected by buf_pool->mutex. | ||
528 | 1721 | Normally, buf_page_t objects are protected by | ||
529 | 1722 | buf_block_t::mutex or buf_pool->zip_mutex or both. */ | ||
530 | 1723 | 1705 | ||
531 | 1724 | bpage->state = BUF_BLOCK_ZIP_PAGE; | 1706 | bpage->state = BUF_BLOCK_ZIP_PAGE; |
532 | 1725 | bpage->space = space; | 1707 | bpage->space = space; |
533 | 1726 | bpage->offset = offset; | 1708 | bpage->offset = offset; |
534 | 1727 | bpage->buf_fix_count = 1; | 1709 | bpage->buf_fix_count = 1; |
535 | 1728 | 1710 | ||
536 | 1711 | mutex_exit(&buf_pool->zip_mutex); | ||
537 | 1712 | |||
538 | 1729 | ut_d(bpage->in_page_hash = TRUE); | 1713 | ut_d(bpage->in_page_hash = TRUE); |
539 | 1730 | HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, | 1714 | HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, |
540 | 1731 | fold, bpage); | 1715 | fold, bpage); |
541 | 1732 | 1716 | ||
542 | 1733 | buf_pool_mutex_exit(buf_pool); | ||
543 | 1734 | /* Once the sentinel is in the page_hash we can | 1717 | /* Once the sentinel is in the page_hash we can |
544 | 1735 | safely release all locks except just the | 1718 | safely release all locks except just the |
545 | 1736 | relevant hash_lock */ | 1719 | relevant hash_lock */ |
546 | @@ -1777,7 +1760,8 @@ | |||
547 | 1777 | ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX)); | 1760 | ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX)); |
548 | 1778 | #endif /* UNIV_SYNC_DEBUG */ | 1761 | #endif /* UNIV_SYNC_DEBUG */ |
549 | 1779 | 1762 | ||
551 | 1780 | ut_ad(buf_pool_mutex_own(buf_pool)); | 1763 | ut_ad(buf_page_get_state(watch) == BUF_BLOCK_ZIP_PAGE); |
552 | 1764 | ut_ad(buf_own_zip_mutex_for_page(watch)); | ||
553 | 1781 | 1765 | ||
554 | 1782 | HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, watch); | 1766 | HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, watch); |
555 | 1783 | ut_d(watch->in_page_hash = FALSE); | 1767 | ut_d(watch->in_page_hash = FALSE); |
556 | @@ -1801,13 +1785,6 @@ | |||
557 | 1801 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, | 1785 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, |
558 | 1802 | fold); | 1786 | fold); |
559 | 1803 | 1787 | ||
560 | 1804 | /* We only need to have buf_pool mutex in case where we end | ||
561 | 1805 | up calling buf_pool_watch_remove but to obey latching order | ||
562 | 1806 | we acquire it here before acquiring hash_lock. This should | ||
563 | 1807 | not cause too much grief as this function is only ever | ||
564 | 1808 | called from the purge thread. */ | ||
565 | 1809 | buf_pool_mutex_enter(buf_pool); | ||
566 | 1810 | |||
567 | 1811 | rw_lock_x_lock(hash_lock); | 1788 | rw_lock_x_lock(hash_lock); |
568 | 1812 | 1789 | ||
569 | 1813 | bpage = buf_page_hash_get_low(buf_pool, space, offset, fold); | 1790 | bpage = buf_page_hash_get_low(buf_pool, space, offset, fold); |
570 | @@ -1825,12 +1802,13 @@ | |||
571 | 1825 | } else { | 1802 | } else { |
572 | 1826 | ut_a(bpage->buf_fix_count > 0); | 1803 | ut_a(bpage->buf_fix_count > 0); |
573 | 1827 | 1804 | ||
574 | 1805 | mutex_enter(&buf_pool->zip_mutex); | ||
575 | 1828 | if (UNIV_LIKELY(!--bpage->buf_fix_count)) { | 1806 | if (UNIV_LIKELY(!--bpage->buf_fix_count)) { |
576 | 1829 | buf_pool_watch_remove(buf_pool, fold, bpage); | 1807 | buf_pool_watch_remove(buf_pool, fold, bpage); |
577 | 1830 | } | 1808 | } |
578 | 1809 | mutex_exit(&buf_pool->zip_mutex); | ||
579 | 1831 | } | 1810 | } |
580 | 1832 | 1811 | ||
581 | 1833 | buf_pool_mutex_exit(buf_pool); | ||
582 | 1834 | rw_lock_x_unlock(hash_lock); | 1812 | rw_lock_x_unlock(hash_lock); |
583 | 1835 | } | 1813 | } |
584 | 1836 | 1814 | ||
585 | @@ -1877,13 +1855,13 @@ | |||
586 | 1877 | { | 1855 | { |
587 | 1878 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 1856 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
588 | 1879 | 1857 | ||
590 | 1880 | buf_pool_mutex_enter(buf_pool); | 1858 | mutex_enter(&buf_pool->LRU_list_mutex); |
591 | 1881 | 1859 | ||
592 | 1882 | ut_a(buf_page_in_file(bpage)); | 1860 | ut_a(buf_page_in_file(bpage)); |
593 | 1883 | 1861 | ||
594 | 1884 | buf_LRU_make_block_young(bpage); | 1862 | buf_LRU_make_block_young(bpage); |
595 | 1885 | 1863 | ||
597 | 1886 | buf_pool_mutex_exit(buf_pool); | 1864 | mutex_exit(&buf_pool->LRU_list_mutex); |
598 | 1887 | } | 1865 | } |
599 | 1888 | 1866 | ||
600 | 1889 | /********************************************************************//** | 1867 | /********************************************************************//** |
601 | @@ -1897,10 +1875,6 @@ | |||
602 | 1897 | buf_page_t* bpage) /*!< in/out: buffer block of a | 1875 | buf_page_t* bpage) /*!< in/out: buffer block of a |
603 | 1898 | file page */ | 1876 | file page */ |
604 | 1899 | { | 1877 | { |
605 | 1900 | #ifdef UNIV_DEBUG | ||
606 | 1901 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
607 | 1902 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
608 | 1903 | #endif /* UNIV_DEBUG */ | ||
609 | 1904 | ut_a(buf_page_in_file(bpage)); | 1878 | ut_a(buf_page_in_file(bpage)); |
610 | 1905 | 1879 | ||
611 | 1906 | if (buf_page_peek_if_too_old(bpage)) { | 1880 | if (buf_page_peek_if_too_old(bpage)) { |
612 | @@ -1921,16 +1895,12 @@ | |||
613 | 1921 | buf_block_t* block; | 1895 | buf_block_t* block; |
614 | 1922 | buf_pool_t* buf_pool = buf_pool_get(space, offset); | 1896 | buf_pool_t* buf_pool = buf_pool_get(space, offset); |
615 | 1923 | 1897 | ||
616 | 1924 | buf_pool_mutex_enter(buf_pool); | ||
617 | 1925 | |||
618 | 1926 | block = (buf_block_t*) buf_page_hash_get(buf_pool, space, offset); | 1898 | block = (buf_block_t*) buf_page_hash_get(buf_pool, space, offset); |
619 | 1927 | 1899 | ||
620 | 1928 | if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) { | 1900 | if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) { |
621 | 1929 | ut_ad(!buf_pool_watch_is_sentinel(buf_pool, &block->page)); | 1901 | ut_ad(!buf_pool_watch_is_sentinel(buf_pool, &block->page)); |
622 | 1930 | block->check_index_page_at_flush = FALSE; | 1902 | block->check_index_page_at_flush = FALSE; |
623 | 1931 | } | 1903 | } |
624 | 1932 | |||
625 | 1933 | buf_pool_mutex_exit(buf_pool); | ||
626 | 1934 | } | 1904 | } |
627 | 1935 | 1905 | ||
628 | 1936 | #if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG | 1906 | #if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG |
629 | @@ -2014,21 +1984,32 @@ | |||
630 | 2014 | buf_page_t* bpage; | 1984 | buf_page_t* bpage; |
631 | 2015 | buf_pool_t* buf_pool = buf_pool_get(space, offset); | 1985 | buf_pool_t* buf_pool = buf_pool_get(space, offset); |
632 | 2016 | 1986 | ||
640 | 2017 | /* Since we need to acquire buf_pool mutex to discard | 1987 | /* Since we need to acquire buf_pool->LRU_list_mutex to discard |
641 | 2018 | the uncompressed frame and because page_hash mutex resides | 1988 | the uncompressed frame and because page_hash mutex resides below |
642 | 2019 | below buf_pool mutex in sync ordering therefore we must | 1989 | buf_pool->LRU_list_mutex in sync ordering therefore we must first |
643 | 2020 | first release the page_hash mutex. This means that the | 1990 | release the page_hash mutex. This means that the block in question |
644 | 2021 | block in question can move out of page_hash. Therefore | 1991 | can move out of page_hash. Therefore we need to check again if the |
645 | 2022 | we need to check again if the block is still in page_hash. */ | 1992 | block is still in page_hash. */ |
646 | 2023 | buf_pool_mutex_enter(buf_pool); | 1993 | |
647 | 1994 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
648 | 2024 | 1995 | ||
649 | 2025 | bpage = buf_page_hash_get(buf_pool, space, offset); | 1996 | bpage = buf_page_hash_get(buf_pool, space, offset); |
650 | 2026 | 1997 | ||
651 | 2027 | if (bpage) { | 1998 | if (bpage) { |
653 | 2028 | buf_LRU_free_page(bpage, false); | 1999 | |
654 | 2000 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); | ||
655 | 2001 | |||
656 | 2002 | mutex_enter(block_mutex); | ||
657 | 2003 | |||
658 | 2004 | if (buf_LRU_free_page(bpage, false)) { | ||
659 | 2005 | |||
660 | 2006 | mutex_exit(block_mutex); | ||
661 | 2007 | return; | ||
662 | 2008 | } | ||
663 | 2009 | mutex_exit(block_mutex); | ||
664 | 2029 | } | 2010 | } |
665 | 2030 | 2011 | ||
667 | 2031 | buf_pool_mutex_exit(buf_pool); | 2012 | mutex_exit(&buf_pool->LRU_list_mutex); |
668 | 2032 | } | 2013 | } |
669 | 2033 | 2014 | ||
670 | 2034 | /********************************************************************//** | 2015 | /********************************************************************//** |
671 | @@ -2324,7 +2305,8 @@ | |||
672 | 2324 | ut_ad(block->frame == page_align(ptr)); | 2305 | ut_ad(block->frame == page_align(ptr)); |
673 | 2325 | #ifdef UNIV_DEBUG | 2306 | #ifdef UNIV_DEBUG |
674 | 2326 | /* A thread that updates these fields must | 2307 | /* A thread that updates these fields must |
676 | 2327 | hold buf_pool->mutex and block->mutex. Acquire | 2308 | hold one of the buf_pool mutexes, depending on the |
677 | 2309 | page state, and block->mutex. Acquire | ||
678 | 2328 | only the latter. */ | 2310 | only the latter. */ |
679 | 2329 | mutex_enter(&block->mutex); | 2311 | mutex_enter(&block->mutex); |
680 | 2330 | 2312 | ||
681 | @@ -2708,6 +2690,7 @@ | |||
682 | 2708 | buf_page_t* bpage; | 2690 | buf_page_t* bpage; |
683 | 2709 | 2691 | ||
684 | 2710 | case BUF_BLOCK_FILE_PAGE: | 2692 | case BUF_BLOCK_FILE_PAGE: |
685 | 2693 | ut_ad(block_mutex != &buf_pool->zip_mutex); | ||
686 | 2711 | break; | 2694 | break; |
687 | 2712 | 2695 | ||
688 | 2713 | case BUF_BLOCK_ZIP_PAGE: | 2696 | case BUF_BLOCK_ZIP_PAGE: |
689 | @@ -2721,13 +2704,14 @@ | |||
690 | 2721 | } | 2704 | } |
691 | 2722 | 2705 | ||
692 | 2723 | bpage = &block->page; | 2706 | bpage = &block->page; |
693 | 2707 | ut_ad(block_mutex == &buf_pool->zip_mutex); | ||
694 | 2724 | 2708 | ||
695 | 2725 | if (bpage->buf_fix_count | 2709 | if (bpage->buf_fix_count |
696 | 2726 | || buf_page_get_io_fix(bpage) != BUF_IO_NONE) { | 2710 | || buf_page_get_io_fix(bpage) != BUF_IO_NONE) { |
697 | 2727 | /* This condition often occurs when the buffer | 2711 | /* This condition often occurs when the buffer |
698 | 2728 | is not buffer-fixed, but I/O-fixed by | 2712 | is not buffer-fixed, but I/O-fixed by |
699 | 2729 | buf_page_init_for_read(). */ | 2713 | buf_page_init_for_read(). */ |
701 | 2730 | mutex_exit(block_mutex); | 2714 | mutex_exit(&buf_pool->zip_mutex); |
702 | 2731 | wait_until_unfixed: | 2715 | wait_until_unfixed: |
703 | 2732 | /* The block is buffer-fixed or I/O-fixed. | 2716 | /* The block is buffer-fixed or I/O-fixed. |
704 | 2733 | Try again later. */ | 2717 | Try again later. */ |
705 | @@ -2737,11 +2721,11 @@ | |||
706 | 2737 | } | 2721 | } |
707 | 2738 | 2722 | ||
708 | 2739 | /* Allocate an uncompressed page. */ | 2723 | /* Allocate an uncompressed page. */ |
710 | 2740 | mutex_exit(block_mutex); | 2724 | mutex_exit(&buf_pool->zip_mutex); |
711 | 2741 | block = buf_LRU_get_free_block(buf_pool); | 2725 | block = buf_LRU_get_free_block(buf_pool); |
712 | 2742 | ut_a(block); | 2726 | ut_a(block); |
713 | 2743 | 2727 | ||
715 | 2744 | buf_pool_mutex_enter(buf_pool); | 2728 | mutex_enter(&buf_pool->LRU_list_mutex); |
716 | 2745 | 2729 | ||
717 | 2746 | /* As we have released the page_hash lock and the | 2730 | /* As we have released the page_hash lock and the |
718 | 2747 | block_mutex to allocate an uncompressed page it is | 2731 | block_mutex to allocate an uncompressed page it is |
719 | @@ -2754,22 +2738,25 @@ | |||
720 | 2754 | offset, fold); | 2738 | offset, fold); |
721 | 2755 | 2739 | ||
722 | 2756 | mutex_enter(&block->mutex); | 2740 | mutex_enter(&block->mutex); |
723 | 2741 | mutex_enter(&buf_pool->zip_mutex); | ||
724 | 2742 | |||
725 | 2757 | if (bpage != hash_bpage | 2743 | if (bpage != hash_bpage |
726 | 2758 | || bpage->buf_fix_count | 2744 | || bpage->buf_fix_count |
727 | 2759 | || buf_page_get_io_fix(bpage) != BUF_IO_NONE) { | 2745 | || buf_page_get_io_fix(bpage) != BUF_IO_NONE) { |
728 | 2760 | buf_LRU_block_free_non_file_page(block); | 2746 | buf_LRU_block_free_non_file_page(block); |
730 | 2761 | buf_pool_mutex_exit(buf_pool); | 2747 | mutex_exit(&buf_pool->LRU_list_mutex); |
731 | 2748 | mutex_exit(&buf_pool->zip_mutex); | ||
732 | 2762 | rw_lock_x_unlock(hash_lock); | 2749 | rw_lock_x_unlock(hash_lock); |
733 | 2763 | mutex_exit(&block->mutex); | 2750 | mutex_exit(&block->mutex); |
734 | 2764 | 2751 | ||
735 | 2765 | if (bpage != hash_bpage) { | 2752 | if (bpage != hash_bpage) { |
736 | 2766 | /* The buf_pool->page_hash was modified | 2753 | /* The buf_pool->page_hash was modified |
738 | 2767 | while buf_pool->mutex was not held | 2754 | while buf_pool->LRU_list_mutex was not held |
739 | 2768 | by this thread. */ | 2755 | by this thread. */ |
740 | 2769 | goto loop; | 2756 | goto loop; |
741 | 2770 | } else { | 2757 | } else { |
742 | 2771 | /* The block was buffer-fixed or | 2758 | /* The block was buffer-fixed or |
744 | 2772 | I/O-fixed while buf_pool->mutex was | 2759 | I/O-fixed while buf_pool->LRU_list_mutex was |
745 | 2773 | not held by this thread. */ | 2760 | not held by this thread. */ |
746 | 2774 | goto wait_until_unfixed; | 2761 | goto wait_until_unfixed; |
747 | 2775 | } | 2762 | } |
748 | @@ -2778,8 +2765,6 @@ | |||
749 | 2778 | /* Move the compressed page from bpage to block, | 2765 | /* Move the compressed page from bpage to block, |
750 | 2779 | and uncompress it. */ | 2766 | and uncompress it. */ |
751 | 2780 | 2767 | ||
752 | 2781 | mutex_enter(&buf_pool->zip_mutex); | ||
753 | 2782 | |||
754 | 2783 | buf_relocate(bpage, &block->page); | 2768 | buf_relocate(bpage, &block->page); |
755 | 2784 | buf_block_init_low(block); | 2769 | buf_block_init_low(block); |
756 | 2785 | block->lock_hash_val = lock_rec_hash(space, offset); | 2770 | block->lock_hash_val = lock_rec_hash(space, offset); |
757 | @@ -2808,6 +2793,8 @@ | |||
758 | 2808 | /* Insert at the front of unzip_LRU list */ | 2793 | /* Insert at the front of unzip_LRU list */ |
759 | 2809 | buf_unzip_LRU_add_block(block, FALSE); | 2794 | buf_unzip_LRU_add_block(block, FALSE); |
760 | 2810 | 2795 | ||
761 | 2796 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
762 | 2797 | |||
763 | 2811 | block->page.buf_fix_count = 1; | 2798 | block->page.buf_fix_count = 1; |
764 | 2812 | buf_block_set_io_fix(block, BUF_IO_READ); | 2799 | buf_block_set_io_fix(block, BUF_IO_READ); |
765 | 2813 | rw_lock_x_lock_inline(&block->lock, 0, file, line); | 2800 | rw_lock_x_lock_inline(&block->lock, 0, file, line); |
766 | @@ -2816,8 +2803,7 @@ | |||
767 | 2816 | 2803 | ||
768 | 2817 | rw_lock_x_unlock(hash_lock); | 2804 | rw_lock_x_unlock(hash_lock); |
769 | 2818 | 2805 | ||
771 | 2819 | buf_pool->n_pend_unzip++; | 2806 | os_atomic_increment_ulint(&buf_pool->n_pend_unzip, 1); |
772 | 2820 | buf_pool_mutex_exit(buf_pool); | ||
773 | 2821 | 2807 | ||
774 | 2822 | access_time = buf_page_is_accessed(&block->page); | 2808 | access_time = buf_page_is_accessed(&block->page); |
775 | 2823 | mutex_exit(&block->mutex); | 2809 | mutex_exit(&block->mutex); |
776 | @@ -2826,7 +2812,7 @@ | |||
777 | 2826 | buf_page_free_descriptor(bpage); | 2812 | buf_page_free_descriptor(bpage); |
778 | 2827 | 2813 | ||
779 | 2828 | /* Decompress the page while not holding | 2814 | /* Decompress the page while not holding |
781 | 2829 | buf_pool->mutex or block->mutex. */ | 2815 | any buf_pool or block->mutex. */ |
782 | 2830 | 2816 | ||
783 | 2831 | /* Page checksum verification is already done when | 2817 | /* Page checksum verification is already done when |
784 | 2832 | the page is read from disk. Hence page checksum | 2818 | the page is read from disk. Hence page checksum |
785 | @@ -2845,12 +2831,10 @@ | |||
786 | 2845 | } | 2831 | } |
787 | 2846 | 2832 | ||
788 | 2847 | /* Unfix and unlatch the block. */ | 2833 | /* Unfix and unlatch the block. */ |
789 | 2848 | buf_pool_mutex_enter(buf_pool); | ||
790 | 2849 | mutex_enter(&block->mutex); | 2834 | mutex_enter(&block->mutex); |
791 | 2850 | block->page.buf_fix_count--; | 2835 | block->page.buf_fix_count--; |
792 | 2851 | buf_block_set_io_fix(block, BUF_IO_NONE); | 2836 | buf_block_set_io_fix(block, BUF_IO_NONE); |
794 | 2852 | buf_pool->n_pend_unzip--; | 2837 | os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1); |
795 | 2853 | buf_pool_mutex_exit(buf_pool); | ||
796 | 2854 | rw_lock_x_unlock(&block->lock); | 2838 | rw_lock_x_unlock(&block->lock); |
797 | 2855 | 2839 | ||
798 | 2856 | break; | 2840 | break; |
799 | @@ -2885,23 +2869,18 @@ | |||
800 | 2885 | insert buffer (change buffer) as much as possible. */ | 2869 | insert buffer (change buffer) as much as possible. */ |
801 | 2886 | 2870 | ||
802 | 2887 | /* To obey the latching order, release the | 2871 | /* To obey the latching order, release the |
804 | 2888 | block->mutex before acquiring buf_pool->mutex. Protect | 2872 | block->mutex before acquiring buf_pool->LRU_list_mutex. Protect |
805 | 2889 | the block from changes by temporarily buffer-fixing it | 2873 | the block from changes by temporarily buffer-fixing it |
806 | 2890 | for the time we are not holding block->mutex. */ | 2874 | for the time we are not holding block->mutex. */ |
807 | 2875 | |||
808 | 2891 | buf_block_buf_fix_inc(block, file, line); | 2876 | buf_block_buf_fix_inc(block, file, line); |
809 | 2892 | mutex_exit(&block->mutex); | 2877 | mutex_exit(&block->mutex); |
811 | 2893 | buf_pool_mutex_enter(buf_pool); | 2878 | mutex_enter(&buf_pool->LRU_list_mutex); |
812 | 2894 | mutex_enter(&block->mutex); | 2879 | mutex_enter(&block->mutex); |
813 | 2895 | buf_block_buf_fix_dec(block); | 2880 | buf_block_buf_fix_dec(block); |
814 | 2896 | mutex_exit(&block->mutex); | ||
815 | 2897 | |||
816 | 2898 | /* Now we are only holding the buf_pool->mutex, | ||
817 | 2899 | not block->mutex or hash_lock. Blocks cannot be | ||
818 | 2900 | relocated or enter or exit the buf_pool while we | ||
819 | 2901 | are holding the buf_pool->mutex. */ | ||
820 | 2902 | 2881 | ||
821 | 2903 | if (buf_LRU_free_page(&block->page, true)) { | 2882 | if (buf_LRU_free_page(&block->page, true)) { |
823 | 2904 | buf_pool_mutex_exit(buf_pool); | 2883 | mutex_exit(&block->mutex); |
824 | 2905 | rw_lock_x_lock(hash_lock); | 2884 | rw_lock_x_lock(hash_lock); |
825 | 2906 | 2885 | ||
826 | 2907 | if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) { | 2886 | if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) { |
827 | @@ -2931,10 +2910,11 @@ | |||
828 | 2931 | "innodb_change_buffering_debug evict %u %u\n", | 2910 | "innodb_change_buffering_debug evict %u %u\n", |
829 | 2932 | (unsigned) space, (unsigned) offset); | 2911 | (unsigned) space, (unsigned) offset); |
830 | 2933 | return(NULL); | 2912 | return(NULL); |
831 | 2913 | } else { | ||
832 | 2914 | |||
833 | 2915 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
834 | 2934 | } | 2916 | } |
835 | 2935 | 2917 | ||
836 | 2936 | mutex_enter(&block->mutex); | ||
837 | 2937 | |||
838 | 2938 | if (buf_flush_page_try(buf_pool, block)) { | 2918 | if (buf_flush_page_try(buf_pool, block)) { |
839 | 2939 | fprintf(stderr, | 2919 | fprintf(stderr, |
840 | 2940 | "innodb_change_buffering_debug flush %u %u\n", | 2920 | "innodb_change_buffering_debug flush %u %u\n", |
841 | @@ -2944,8 +2924,6 @@ | |||
842 | 2944 | } | 2924 | } |
843 | 2945 | 2925 | ||
844 | 2946 | /* Failed to evict the page; change it directly */ | 2926 | /* Failed to evict the page; change it directly */ |
845 | 2947 | |||
846 | 2948 | buf_pool_mutex_exit(buf_pool); | ||
847 | 2949 | } | 2927 | } |
848 | 2950 | #endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */ | 2928 | #endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */ |
849 | 2951 | 2929 | ||
850 | @@ -3420,7 +3398,6 @@ | |||
851 | 3420 | buf_page_t* hash_page; | 3398 | buf_page_t* hash_page; |
852 | 3421 | 3399 | ||
853 | 3422 | ut_ad(buf_pool == buf_pool_get(space, offset)); | 3400 | ut_ad(buf_pool == buf_pool_get(space, offset)); |
854 | 3423 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
855 | 3424 | 3401 | ||
856 | 3425 | ut_ad(mutex_own(&(block->mutex))); | 3402 | ut_ad(mutex_own(&(block->mutex))); |
857 | 3426 | ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE); | 3403 | ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE); |
858 | @@ -3455,11 +3432,17 @@ | |||
859 | 3455 | if (UNIV_LIKELY(!hash_page)) { | 3432 | if (UNIV_LIKELY(!hash_page)) { |
860 | 3456 | } else if (buf_pool_watch_is_sentinel(buf_pool, hash_page)) { | 3433 | } else if (buf_pool_watch_is_sentinel(buf_pool, hash_page)) { |
861 | 3457 | /* Preserve the reference count. */ | 3434 | /* Preserve the reference count. */ |
862 | 3435 | |||
863 | 3436 | mutex_enter(&buf_pool->zip_mutex); | ||
864 | 3437 | |||
865 | 3458 | ulint buf_fix_count = hash_page->buf_fix_count; | 3438 | ulint buf_fix_count = hash_page->buf_fix_count; |
866 | 3459 | 3439 | ||
867 | 3460 | ut_a(buf_fix_count > 0); | 3440 | ut_a(buf_fix_count > 0); |
868 | 3461 | block->page.buf_fix_count += buf_fix_count; | 3441 | block->page.buf_fix_count += buf_fix_count; |
869 | 3462 | buf_pool_watch_remove(buf_pool, fold, hash_page); | 3442 | buf_pool_watch_remove(buf_pool, fold, hash_page); |
870 | 3443 | |||
871 | 3444 | mutex_exit(&buf_pool->zip_mutex); | ||
872 | 3445 | |||
873 | 3463 | } else { | 3446 | } else { |
874 | 3464 | fprintf(stderr, | 3447 | fprintf(stderr, |
875 | 3465 | "InnoDB: Error: page %lu %lu already found" | 3448 | "InnoDB: Error: page %lu %lu already found" |
876 | @@ -3469,7 +3452,6 @@ | |||
877 | 3469 | (const void*) hash_page, (const void*) block); | 3452 | (const void*) hash_page, (const void*) block); |
878 | 3470 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | 3453 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG |
879 | 3471 | mutex_exit(&block->mutex); | 3454 | mutex_exit(&block->mutex); |
880 | 3472 | buf_pool_mutex_exit(buf_pool); | ||
881 | 3473 | buf_print(); | 3455 | buf_print(); |
882 | 3474 | buf_LRU_print(); | 3456 | buf_LRU_print(); |
883 | 3475 | buf_validate(); | 3457 | buf_validate(); |
884 | @@ -3556,7 +3538,7 @@ | |||
885 | 3556 | fold = buf_page_address_fold(space, offset); | 3538 | fold = buf_page_address_fold(space, offset); |
886 | 3557 | hash_lock = buf_page_hash_lock_get(buf_pool, fold); | 3539 | hash_lock = buf_page_hash_lock_get(buf_pool, fold); |
887 | 3558 | 3540 | ||
889 | 3559 | buf_pool_mutex_enter(buf_pool); | 3541 | mutex_enter(&buf_pool->LRU_list_mutex); |
890 | 3560 | rw_lock_x_lock(hash_lock); | 3542 | rw_lock_x_lock(hash_lock); |
891 | 3561 | 3543 | ||
892 | 3562 | watch_page = buf_page_hash_get_low(buf_pool, space, offset, fold); | 3544 | watch_page = buf_page_hash_get_low(buf_pool, space, offset, fold); |
893 | @@ -3564,6 +3546,7 @@ | |||
894 | 3564 | /* The page is already in the buffer pool. */ | 3546 | /* The page is already in the buffer pool. */ |
895 | 3565 | watch_page = NULL; | 3547 | watch_page = NULL; |
896 | 3566 | err_exit: | 3548 | err_exit: |
897 | 3549 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
898 | 3567 | rw_lock_x_unlock(hash_lock); | 3550 | rw_lock_x_unlock(hash_lock); |
899 | 3568 | if (block) { | 3551 | if (block) { |
900 | 3569 | mutex_enter(&block->mutex); | 3552 | mutex_enter(&block->mutex); |
901 | @@ -3596,6 +3579,7 @@ | |||
902 | 3596 | 3579 | ||
903 | 3597 | /* The block must be put to the LRU list, to the old blocks */ | 3580 | /* The block must be put to the LRU list, to the old blocks */ |
904 | 3598 | buf_LRU_add_block(bpage, TRUE/* to old blocks */); | 3581 | buf_LRU_add_block(bpage, TRUE/* to old blocks */); |
905 | 3582 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
906 | 3599 | 3583 | ||
907 | 3600 | /* We set a pass-type x-lock on the frame because then | 3584 | /* We set a pass-type x-lock on the frame because then |
908 | 3601 | the same thread which called for the read operation | 3585 | the same thread which called for the read operation |
909 | @@ -3610,15 +3594,16 @@ | |||
910 | 3610 | buf_page_set_io_fix(bpage, BUF_IO_READ); | 3594 | buf_page_set_io_fix(bpage, BUF_IO_READ); |
911 | 3611 | 3595 | ||
912 | 3612 | if (zip_size) { | 3596 | if (zip_size) { |
914 | 3613 | /* buf_pool->mutex may be released and | 3597 | /* buf_pool->LRU_list_mutex may be released and |
915 | 3614 | reacquired by buf_buddy_alloc(). Thus, we | 3598 | reacquired by buf_buddy_alloc(). Thus, we |
916 | 3615 | must release block->mutex in order not to | 3599 | must release block->mutex in order not to |
917 | 3616 | break the latching order in the reacquisition | 3600 | break the latching order in the reacquisition |
919 | 3617 | of buf_pool->mutex. We also must defer this | 3601 | of buf_pool->LRU_list_mutex. We also must defer this |
920 | 3618 | operation until after the block descriptor has | 3602 | operation until after the block descriptor has |
921 | 3619 | been added to buf_pool->LRU and | 3603 | been added to buf_pool->LRU and |
922 | 3620 | buf_pool->page_hash. */ | 3604 | buf_pool->page_hash. */ |
923 | 3621 | mutex_exit(&block->mutex); | 3605 | mutex_exit(&block->mutex); |
924 | 3606 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
925 | 3622 | data = buf_buddy_alloc(buf_pool, zip_size, &lru); | 3607 | data = buf_buddy_alloc(buf_pool, zip_size, &lru); |
926 | 3623 | mutex_enter(&block->mutex); | 3608 | mutex_enter(&block->mutex); |
927 | 3624 | block->page.zip.data = (page_zip_t*) data; | 3609 | block->page.zip.data = (page_zip_t*) data; |
928 | @@ -3630,6 +3615,7 @@ | |||
929 | 3630 | after block->page.zip.data is set. */ | 3615 | after block->page.zip.data is set. */ |
930 | 3631 | ut_ad(buf_page_belongs_to_unzip_LRU(&block->page)); | 3616 | ut_ad(buf_page_belongs_to_unzip_LRU(&block->page)); |
931 | 3632 | buf_unzip_LRU_add_block(block, TRUE); | 3617 | buf_unzip_LRU_add_block(block, TRUE); |
932 | 3618 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
933 | 3633 | } | 3619 | } |
934 | 3634 | 3620 | ||
935 | 3635 | mutex_exit(&block->mutex); | 3621 | mutex_exit(&block->mutex); |
936 | @@ -3645,8 +3631,9 @@ | |||
937 | 3645 | rw_lock_x_lock(hash_lock); | 3631 | rw_lock_x_lock(hash_lock); |
938 | 3646 | 3632 | ||
939 | 3647 | /* If buf_buddy_alloc() allocated storage from the LRU list, | 3633 | /* If buf_buddy_alloc() allocated storage from the LRU list, |
942 | 3648 | it released and reacquired buf_pool->mutex. Thus, we must | 3634 | it released and reacquired buf_pool->LRU_list_mutex. Thus, we |
943 | 3649 | check the page_hash again, as it may have been modified. */ | 3635 | must check the page_hash again, as it may have been |
944 | 3636 | modified. */ | ||
945 | 3650 | if (UNIV_UNLIKELY(lru)) { | 3637 | if (UNIV_UNLIKELY(lru)) { |
946 | 3651 | 3638 | ||
947 | 3652 | watch_page = buf_page_hash_get_low( | 3639 | watch_page = buf_page_hash_get_low( |
948 | @@ -3657,6 +3644,7 @@ | |||
949 | 3657 | watch_page))) { | 3644 | watch_page))) { |
950 | 3658 | 3645 | ||
951 | 3659 | /* The block was added by some other thread. */ | 3646 | /* The block was added by some other thread. */ |
952 | 3647 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
953 | 3660 | rw_lock_x_unlock(hash_lock); | 3648 | rw_lock_x_unlock(hash_lock); |
954 | 3661 | watch_page = NULL; | 3649 | watch_page = NULL; |
955 | 3662 | buf_buddy_free(buf_pool, data, zip_size); | 3650 | buf_buddy_free(buf_pool, data, zip_size); |
956 | @@ -3700,6 +3688,7 @@ | |||
957 | 3700 | /* Preserve the reference count. */ | 3688 | /* Preserve the reference count. */ |
958 | 3701 | ulint buf_fix_count = watch_page->buf_fix_count; | 3689 | ulint buf_fix_count = watch_page->buf_fix_count; |
959 | 3702 | ut_a(buf_fix_count > 0); | 3690 | ut_a(buf_fix_count > 0); |
960 | 3691 | ut_ad(buf_own_zip_mutex_for_page(bpage)); | ||
961 | 3703 | bpage->buf_fix_count += buf_fix_count; | 3692 | bpage->buf_fix_count += buf_fix_count; |
962 | 3704 | ut_ad(buf_pool_watch_is_sentinel(buf_pool, watch_page)); | 3693 | ut_ad(buf_pool_watch_is_sentinel(buf_pool, watch_page)); |
963 | 3705 | buf_pool_watch_remove(buf_pool, fold, watch_page); | 3694 | buf_pool_watch_remove(buf_pool, fold, watch_page); |
964 | @@ -3716,15 +3705,15 @@ | |||
965 | 3716 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | 3705 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG |
966 | 3717 | buf_LRU_insert_zip_clean(bpage); | 3706 | buf_LRU_insert_zip_clean(bpage); |
967 | 3718 | #endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */ | 3707 | #endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */ |
968 | 3708 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
969 | 3719 | 3709 | ||
970 | 3720 | buf_page_set_io_fix(bpage, BUF_IO_READ); | 3710 | buf_page_set_io_fix(bpage, BUF_IO_READ); |
971 | 3721 | 3711 | ||
972 | 3722 | mutex_exit(&buf_pool->zip_mutex); | 3712 | mutex_exit(&buf_pool->zip_mutex); |
973 | 3723 | } | 3713 | } |
974 | 3724 | 3714 | ||
976 | 3725 | buf_pool->n_pend_reads++; | 3715 | os_atomic_increment_ulint(&buf_pool->n_pend_reads, 1); |
977 | 3726 | func_exit: | 3716 | func_exit: |
978 | 3727 | buf_pool_mutex_exit(buf_pool); | ||
979 | 3728 | 3717 | ||
980 | 3729 | if (mode == BUF_READ_IBUF_PAGES_ONLY) { | 3718 | if (mode == BUF_READ_IBUF_PAGES_ONLY) { |
981 | 3730 | 3719 | ||
982 | @@ -3773,7 +3762,7 @@ | |||
983 | 3773 | fold = buf_page_address_fold(space, offset); | 3762 | fold = buf_page_address_fold(space, offset); |
984 | 3774 | hash_lock = buf_page_hash_lock_get(buf_pool, fold); | 3763 | hash_lock = buf_page_hash_lock_get(buf_pool, fold); |
985 | 3775 | 3764 | ||
987 | 3776 | buf_pool_mutex_enter(buf_pool); | 3765 | mutex_enter(&buf_pool->LRU_list_mutex); |
988 | 3777 | rw_lock_x_lock(hash_lock); | 3766 | rw_lock_x_lock(hash_lock); |
989 | 3778 | 3767 | ||
990 | 3779 | block = (buf_block_t*) buf_page_hash_get_low( | 3768 | block = (buf_block_t*) buf_page_hash_get_low( |
991 | @@ -3790,8 +3779,8 @@ | |||
992 | 3790 | #endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */ | 3779 | #endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */ |
993 | 3791 | 3780 | ||
994 | 3792 | /* Page can be found in buf_pool */ | 3781 | /* Page can be found in buf_pool */ |
995 | 3793 | buf_pool_mutex_exit(buf_pool); | ||
996 | 3794 | rw_lock_x_unlock(hash_lock); | 3782 | rw_lock_x_unlock(hash_lock); |
997 | 3783 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
998 | 3795 | 3784 | ||
999 | 3796 | buf_block_free(free_block); | 3785 | buf_block_free(free_block); |
1000 | 3797 | 3786 | ||
1001 | @@ -3827,17 +3816,17 @@ | |||
1002 | 3827 | ibool lru; | 3816 | ibool lru; |
1003 | 3828 | 3817 | ||
1004 | 3829 | /* Prevent race conditions during buf_buddy_alloc(), | 3818 | /* Prevent race conditions during buf_buddy_alloc(), |
1006 | 3830 | which may release and reacquire buf_pool->mutex, | 3819 | which may release and reacquire buf_pool->LRU_list_mutex, |
1007 | 3831 | by IO-fixing and X-latching the block. */ | 3820 | by IO-fixing and X-latching the block. */ |
1008 | 3832 | 3821 | ||
1009 | 3833 | buf_page_set_io_fix(&block->page, BUF_IO_READ); | 3822 | buf_page_set_io_fix(&block->page, BUF_IO_READ); |
1010 | 3834 | rw_lock_x_lock(&block->lock); | 3823 | rw_lock_x_lock(&block->lock); |
1011 | 3835 | 3824 | ||
1012 | 3836 | mutex_exit(&block->mutex); | 3825 | mutex_exit(&block->mutex); |
1014 | 3837 | /* buf_pool->mutex may be released and reacquired by | 3826 | /* buf_pool->LRU_list_mutex may be released and reacquired by |
1015 | 3838 | buf_buddy_alloc(). Thus, we must release block->mutex | 3827 | buf_buddy_alloc(). Thus, we must release block->mutex |
1016 | 3839 | in order not to break the latching order in | 3828 | in order not to break the latching order in |
1018 | 3840 | the reacquisition of buf_pool->mutex. We also must | 3829 | the reacquisition of buf_pool->LRU_list_mutex. We also must |
1019 | 3841 | defer this operation until after the block descriptor | 3830 | defer this operation until after the block descriptor |
1020 | 3842 | has been added to buf_pool->LRU and buf_pool->page_hash. */ | 3831 | has been added to buf_pool->LRU and buf_pool->page_hash. */ |
1021 | 3843 | data = buf_buddy_alloc(buf_pool, zip_size, &lru); | 3832 | data = buf_buddy_alloc(buf_pool, zip_size, &lru); |
1022 | @@ -3856,7 +3845,7 @@ | |||
1023 | 3856 | rw_lock_x_unlock(&block->lock); | 3845 | rw_lock_x_unlock(&block->lock); |
1024 | 3857 | } | 3846 | } |
1025 | 3858 | 3847 | ||
1027 | 3859 | buf_pool_mutex_exit(buf_pool); | 3848 | mutex_exit(&buf_pool->LRU_list_mutex); |
1028 | 3860 | 3849 | ||
1029 | 3861 | mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX); | 3850 | mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX); |
1030 | 3862 | 3851 | ||
1031 | @@ -3907,6 +3896,8 @@ | |||
1032 | 3907 | const byte* frame; | 3896 | const byte* frame; |
1033 | 3908 | monitor_id_t counter; | 3897 | monitor_id_t counter; |
1034 | 3909 | 3898 | ||
1035 | 3899 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | ||
1036 | 3900 | |||
1037 | 3910 | /* If the counter module is not turned on, just return */ | 3901 | /* If the counter module is not turned on, just return */ |
1038 | 3911 | if (!MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)) { | 3902 | if (!MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)) { |
1039 | 3912 | return; | 3903 | return; |
1040 | @@ -4014,9 +4005,13 @@ | |||
1041 | 4014 | == BUF_BLOCK_FILE_PAGE); | 4005 | == BUF_BLOCK_FILE_PAGE); |
1042 | 4015 | ulint space = bpage->space; | 4006 | ulint space = bpage->space; |
1043 | 4016 | ibool ret = TRUE; | 4007 | ibool ret = TRUE; |
1044 | 4008 | const ulint fold = buf_page_address_fold(bpage->space, | ||
1045 | 4009 | bpage->offset); | ||
1046 | 4010 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold); | ||
1047 | 4017 | 4011 | ||
1048 | 4018 | /* First unfix and release lock on the bpage */ | 4012 | /* First unfix and release lock on the bpage */ |
1050 | 4019 | buf_pool_mutex_enter(buf_pool); | 4013 | mutex_enter(&buf_pool->LRU_list_mutex); |
1051 | 4014 | rw_lock_x_lock(hash_lock); | ||
1052 | 4020 | mutex_enter(buf_page_get_mutex(bpage)); | 4015 | mutex_enter(buf_page_get_mutex(bpage)); |
1053 | 4021 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ); | 4016 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ); |
1054 | 4022 | ut_ad(bpage->buf_fix_count == 0); | 4017 | ut_ad(bpage->buf_fix_count == 0); |
1055 | @@ -4030,19 +4025,18 @@ | |||
1056 | 4030 | BUF_IO_READ); | 4025 | BUF_IO_READ); |
1057 | 4031 | } | 4026 | } |
1058 | 4032 | 4027 | ||
1059 | 4033 | mutex_exit(buf_page_get_mutex(bpage)); | ||
1060 | 4034 | |||
1061 | 4035 | /* Find the table with specified space id, and mark it corrupted */ | 4028 | /* Find the table with specified space id, and mark it corrupted */ |
1062 | 4036 | if (dict_set_corrupted_by_space(space)) { | 4029 | if (dict_set_corrupted_by_space(space)) { |
1063 | 4037 | buf_LRU_free_one_page(bpage); | 4030 | buf_LRU_free_one_page(bpage); |
1064 | 4038 | } else { | 4031 | } else { |
1065 | 4032 | mutex_exit(buf_page_get_mutex(bpage)); | ||
1066 | 4039 | ret = FALSE; | 4033 | ret = FALSE; |
1067 | 4040 | } | 4034 | } |
1068 | 4041 | 4035 | ||
1069 | 4036 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1070 | 4037 | |||
1071 | 4042 | ut_ad(buf_pool->n_pend_reads > 0); | 4038 | ut_ad(buf_pool->n_pend_reads > 0); |
1075 | 4043 | buf_pool->n_pend_reads--; | 4039 | os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1); |
1073 | 4044 | |||
1074 | 4045 | buf_pool_mutex_exit(buf_pool); | ||
1076 | 4046 | 4040 | ||
1077 | 4047 | return(ret); | 4041 | return(ret); |
1078 | 4048 | } | 4042 | } |
1079 | @@ -4061,6 +4055,7 @@ | |||
1080 | 4061 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 4055 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1081 | 4062 | const ibool uncompressed = (buf_page_get_state(bpage) | 4056 | const ibool uncompressed = (buf_page_get_state(bpage) |
1082 | 4063 | == BUF_BLOCK_FILE_PAGE); | 4057 | == BUF_BLOCK_FILE_PAGE); |
1083 | 4058 | bool have_LRU_mutex = false; | ||
1084 | 4064 | 4059 | ||
1085 | 4065 | ut_a(buf_page_in_file(bpage)); | 4060 | ut_a(buf_page_in_file(bpage)); |
1086 | 4066 | 4061 | ||
1087 | @@ -4070,7 +4065,7 @@ | |||
1088 | 4070 | ensures that this is the only thread that handles the i/o for this | 4065 | ensures that this is the only thread that handles the i/o for this |
1089 | 4071 | block. */ | 4066 | block. */ |
1090 | 4072 | 4067 | ||
1092 | 4073 | io_type = buf_page_get_io_fix(bpage); | 4068 | io_type = buf_page_get_io_fix_unlocked(bpage); |
1093 | 4074 | ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE); | 4069 | ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE); |
1094 | 4075 | 4070 | ||
1095 | 4076 | if (io_type == BUF_IO_READ) { | 4071 | if (io_type == BUF_IO_READ) { |
1096 | @@ -4080,15 +4075,16 @@ | |||
1097 | 4080 | 4075 | ||
1098 | 4081 | if (buf_page_get_zip_size(bpage)) { | 4076 | if (buf_page_get_zip_size(bpage)) { |
1099 | 4082 | frame = bpage->zip.data; | 4077 | frame = bpage->zip.data; |
1101 | 4083 | buf_pool->n_pend_unzip++; | 4078 | os_atomic_increment_ulint(&buf_pool->n_pend_unzip, 1); |
1102 | 4084 | if (uncompressed | 4079 | if (uncompressed |
1103 | 4085 | && !buf_zip_decompress((buf_block_t*) bpage, | 4080 | && !buf_zip_decompress((buf_block_t*) bpage, |
1104 | 4086 | FALSE)) { | 4081 | FALSE)) { |
1105 | 4087 | 4082 | ||
1107 | 4088 | buf_pool->n_pend_unzip--; | 4083 | os_atomic_decrement_ulint( |
1108 | 4084 | &buf_pool->n_pend_unzip, 1); | ||
1109 | 4089 | goto corrupt; | 4085 | goto corrupt; |
1110 | 4090 | } | 4086 | } |
1112 | 4091 | buf_pool->n_pend_unzip--; | 4087 | os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1); |
1113 | 4092 | } else { | 4088 | } else { |
1114 | 4093 | ut_a(uncompressed); | 4089 | ut_a(uncompressed); |
1115 | 4094 | frame = ((buf_block_t*) bpage)->frame; | 4090 | frame = ((buf_block_t*) bpage)->frame; |
1116 | @@ -4255,8 +4251,37 @@ | |||
1117 | 4255 | } | 4251 | } |
1118 | 4256 | } | 4252 | } |
1119 | 4257 | 4253 | ||
1122 | 4258 | buf_pool_mutex_enter(buf_pool); | 4254 | if (io_type == BUF_IO_WRITE |
1123 | 4259 | mutex_enter(buf_page_get_mutex(bpage)); | 4255 | && ( |
1124 | 4256 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | ||
1125 | 4257 | /* to keep consistency at buf_LRU_insert_zip_clean() */ | ||
1126 | 4258 | buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY || | ||
1127 | 4259 | #endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */ | ||
1128 | 4260 | buf_page_get_flush_type(bpage) == BUF_FLUSH_LRU)) { | ||
1129 | 4261 | |||
1130 | 4262 | have_LRU_mutex = TRUE; /* optimistic */ | ||
1131 | 4263 | } | ||
1132 | 4264 | retry_mutex: | ||
1133 | 4265 | if (have_LRU_mutex) { | ||
1134 | 4266 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
1135 | 4267 | } | ||
1136 | 4268 | |||
1137 | 4269 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); | ||
1138 | 4270 | mutex_enter(block_mutex); | ||
1139 | 4271 | |||
1140 | 4272 | if (UNIV_UNLIKELY(io_type == BUF_IO_WRITE | ||
1141 | 4273 | && ( | ||
1142 | 4274 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | ||
1143 | 4275 | buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY | ||
1144 | 4276 | || | ||
1145 | 4277 | #endif | ||
1146 | 4278 | buf_page_get_flush_type(bpage) == BUF_FLUSH_LRU) | ||
1147 | 4279 | && !have_LRU_mutex)) { | ||
1148 | 4280 | |||
1149 | 4281 | mutex_exit(block_mutex); | ||
1150 | 4282 | have_LRU_mutex = TRUE; | ||
1151 | 4283 | goto retry_mutex; | ||
1152 | 4284 | } | ||
1153 | 4260 | 4285 | ||
1154 | 4261 | #ifdef UNIV_IBUF_COUNT_DEBUG | 4286 | #ifdef UNIV_IBUF_COUNT_DEBUG |
1155 | 4262 | if (io_type == BUF_IO_WRITE || uncompressed) { | 4287 | if (io_type == BUF_IO_WRITE || uncompressed) { |
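The `retry_mutex` hunk above implements an optimistic acquisition pattern: guess from the flush type whether the LRU list mutex will be needed, take the block mutex, re-check under it, and if the guess turns out wrong, back out and retry with the LRU list mutex held. A simplified sketch of the same control flow with standard mutexes (all names here — `page_t`, `complete_io`, `list_mutex` — are hypothetical stand-ins, not the Percona code):

```cpp
#include <mutex>

struct page_t {
    bool needs_list_mutex;  // stands in for the flush-type/state checks
};

std::mutex list_mutex;   // stands in for buf_pool->LRU_list_mutex
std::mutex block_mutex;  // stands in for buf_page_get_mutex(bpage)

// Returns the number of retries taken (0 when the optimistic guess held).
int complete_io(page_t* page, bool optimistic_guess) {
    bool have_list_mutex = optimistic_guess;
    int retries = 0;
retry:
    if (have_list_mutex) list_mutex.lock();
    block_mutex.lock();
    // Re-check under block_mutex: the unlocked guess may have been stale.
    if (page->needs_list_mutex && !have_list_mutex) {
        block_mutex.unlock();  // keep latching order: list before block
        have_list_mutex = true;
        ++retries;
        goto retry;
    }
    // ... perform the I/O completion work ...
    block_mutex.unlock();
    if (have_list_mutex) list_mutex.unlock();
    return retries;
}
```

The point of the design is the common case: most write completions are not LRU flushes, so the heavily contended list mutex is skipped entirely unless the re-check proves it necessary.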
1156 | @@ -4271,17 +4296,20 @@ | |||
1157 | 4271 | removes the newest lock debug record, without checking the thread | 4296 | removes the newest lock debug record, without checking the thread |
1158 | 4272 | id. */ | 4297 | id. */ |
1159 | 4273 | 4298 | ||
1160 | 4274 | buf_page_set_io_fix(bpage, BUF_IO_NONE); | ||
1161 | 4275 | |||
1162 | 4276 | switch (io_type) { | 4299 | switch (io_type) { |
1163 | 4277 | case BUF_IO_READ: | 4300 | case BUF_IO_READ: |
1164 | 4301 | |||
1165 | 4302 | buf_page_set_io_fix(bpage, BUF_IO_NONE); | ||
1166 | 4303 | |||
1167 | 4278 | /* NOTE that the call to ibuf may have moved the ownership of | 4304 | /* NOTE that the call to ibuf may have moved the ownership of |
1168 | 4279 | the x-latch to this OS thread: do not let this confuse you in | 4305 | the x-latch to this OS thread: do not let this confuse you in |
1169 | 4280 | debugging! */ | 4306 | debugging! */ |
1170 | 4281 | 4307 | ||
1171 | 4282 | ut_ad(buf_pool->n_pend_reads > 0); | 4308 | ut_ad(buf_pool->n_pend_reads > 0); |
1174 | 4283 | buf_pool->n_pend_reads--; | 4309 | os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1); |
1175 | 4284 | buf_pool->stat.n_pages_read++; | 4310 | os_atomic_increment_ulint(&buf_pool->stat.n_pages_read, 1); |
1176 | 4311 | |||
1177 | 4312 | ut_ad(!have_LRU_mutex); | ||
1178 | 4285 | 4313 | ||
1179 | 4286 | if (uncompressed) { | 4314 | if (uncompressed) { |
1180 | 4287 | rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock, | 4315 | rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock, |
1181 | @@ -4296,13 +4324,17 @@ | |||
1182 | 4296 | 4324 | ||
1183 | 4297 | buf_flush_write_complete(bpage); | 4325 | buf_flush_write_complete(bpage); |
1184 | 4298 | 4326 | ||
1185 | 4327 | os_atomic_increment_ulint(&buf_pool->stat.n_pages_written, 1); | ||
1186 | 4328 | |||
1187 | 4329 | if (have_LRU_mutex) { | ||
1188 | 4330 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1189 | 4331 | } | ||
1190 | 4332 | |||
1191 | 4299 | if (uncompressed) { | 4333 | if (uncompressed) { |
1192 | 4300 | rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock, | 4334 | rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock, |
1193 | 4301 | BUF_IO_WRITE); | 4335 | BUF_IO_WRITE); |
1194 | 4302 | } | 4336 | } |
1195 | 4303 | 4337 | ||
1196 | 4304 | buf_pool->stat.n_pages_written++; | ||
1197 | 4305 | |||
1198 | 4306 | break; | 4338 | break; |
1199 | 4307 | 4339 | ||
1200 | 4308 | default: | 4340 | default: |
1201 | @@ -4320,8 +4352,7 @@ | |||
1202 | 4320 | } | 4352 | } |
1203 | 4321 | #endif /* UNIV_DEBUG */ | 4353 | #endif /* UNIV_DEBUG */ |
1204 | 4322 | 4354 | ||
1207 | 4323 | mutex_exit(buf_page_get_mutex(bpage)); | 4355 | mutex_exit(block_mutex); |
1206 | 4324 | buf_pool_mutex_exit(buf_pool); | ||
1208 | 4325 | 4356 | ||
1209 | 4326 | return(true); | 4357 | return(true); |
1210 | 4327 | } | 4358 | } |
1211 | @@ -4340,14 +4371,16 @@ | |||
1212 | 4340 | 4371 | ||
1213 | 4341 | ut_ad(buf_pool); | 4372 | ut_ad(buf_pool); |
1214 | 4342 | 4373 | ||
1215 | 4343 | buf_pool_mutex_enter(buf_pool); | ||
1216 | 4344 | |||
1217 | 4345 | chunk = buf_pool->chunks; | 4374 | chunk = buf_pool->chunks; |
1218 | 4346 | 4375 | ||
1219 | 4347 | for (i = buf_pool->n_chunks; i--; chunk++) { | 4376 | for (i = buf_pool->n_chunks; i--; chunk++) { |
1220 | 4348 | 4377 | ||
1221 | 4378 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
1222 | 4379 | |||
1223 | 4349 | const buf_block_t* block = buf_chunk_not_freed(chunk); | 4380 | const buf_block_t* block = buf_chunk_not_freed(chunk); |
1224 | 4350 | 4381 | ||
1225 | 4382 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1226 | 4383 | |||
1227 | 4351 | if (UNIV_LIKELY_NULL(block)) { | 4384 | if (UNIV_LIKELY_NULL(block)) { |
1228 | 4352 | fprintf(stderr, | 4385 | fprintf(stderr, |
1229 | 4353 | "Page %lu %lu still fixed or dirty\n", | 4386 | "Page %lu %lu still fixed or dirty\n", |
1230 | @@ -4357,8 +4390,6 @@ | |||
1231 | 4357 | } | 4390 | } |
1232 | 4358 | } | 4391 | } |
1233 | 4359 | 4392 | ||
1234 | 4360 | buf_pool_mutex_exit(buf_pool); | ||
1235 | 4361 | |||
1236 | 4362 | return(TRUE); | 4393 | return(TRUE); |
1237 | 4363 | } | 4394 | } |
1238 | 4364 | 4395 | ||
1239 | @@ -4372,7 +4403,9 @@ | |||
1240 | 4372 | { | 4403 | { |
1241 | 4373 | ulint i; | 4404 | ulint i; |
1242 | 4374 | 4405 | ||
1244 | 4375 | buf_pool_mutex_enter(buf_pool); | 4406 | ut_ad(!mutex_own(&buf_pool->LRU_list_mutex)); |
1245 | 4407 | |||
1246 | 4408 | mutex_enter(&buf_pool->flush_state_mutex); | ||
1247 | 4376 | 4409 | ||
1248 | 4377 | for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) { | 4410 | for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) { |
1249 | 4378 | 4411 | ||
1250 | @@ -4390,21 +4423,20 @@ | |||
1251 | 4390 | if (buf_pool->n_flush[i] > 0) { | 4423 | if (buf_pool->n_flush[i] > 0) { |
1252 | 4391 | buf_flush_t type = static_cast<buf_flush_t>(i); | 4424 | buf_flush_t type = static_cast<buf_flush_t>(i); |
1253 | 4392 | 4425 | ||
1255 | 4393 | buf_pool_mutex_exit(buf_pool); | 4426 | mutex_exit(&buf_pool->flush_state_mutex); |
1256 | 4394 | buf_flush_wait_batch_end(buf_pool, type); | 4427 | buf_flush_wait_batch_end(buf_pool, type); |
1258 | 4395 | buf_pool_mutex_enter(buf_pool); | 4428 | mutex_enter(&buf_pool->flush_state_mutex); |
1259 | 4396 | } | 4429 | } |
1260 | 4397 | } | 4430 | } |
1263 | 4398 | 4431 | mutex_exit(&buf_pool->flush_state_mutex); | |
1262 | 4399 | buf_pool_mutex_exit(buf_pool); | ||
1264 | 4400 | 4432 | ||
1265 | 4401 | ut_ad(buf_all_freed_instance(buf_pool)); | 4433 | ut_ad(buf_all_freed_instance(buf_pool)); |
1266 | 4402 | 4434 | ||
1267 | 4403 | buf_pool_mutex_enter(buf_pool); | ||
1268 | 4404 | |||
1269 | 4405 | while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) { | 4435 | while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) { |
1270 | 4406 | } | 4436 | } |
1271 | 4407 | 4437 | ||
1272 | 4438 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
1273 | 4439 | |||
1274 | 4408 | ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0); | 4440 | ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0); |
1275 | 4409 | ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0); | 4441 | ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0); |
1276 | 4410 | 4442 | ||
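The hunk above drops `flush_state_mutex` around `buf_flush_wait_batch_end()` and reacquires it before re-checking `n_flush[]`. That release-around-a-blocking-wait shape is what a condition variable expresses directly; a hedged sketch under assumed names (`n_flush`, `batch_done`, `wait_for_flushes` are illustrative, not the InnoDB API):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch: flush_state_mutex protects the pending-flush count, but the
// blocking wait must happen with the mutex released, and the count must
// be re-checked after reacquisition -- exactly the loop in the diff.
std::mutex flush_state_mutex;
std::condition_variable batch_done;
int n_flush = 0;  // stands in for buf_pool->n_flush[i]

void wait_for_flushes() {
    std::unique_lock<std::mutex> lock(flush_state_mutex);
    while (n_flush > 0) {
        // equivalent of: mutex_exit(); buf_flush_wait_batch_end(); mutex_enter();
        batch_done.wait(lock);  // releases the mutex while blocked
    }
}
```

The `while` loop (rather than a single check) is essential in both versions: another batch may have started, or the wakeup may be spurious, so the predicate is re-evaluated under the mutex every time.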
1277 | @@ -4412,10 +4444,10 @@ | |||
1278 | 4412 | buf_pool->LRU_old = NULL; | 4444 | buf_pool->LRU_old = NULL; |
1279 | 4413 | buf_pool->LRU_old_len = 0; | 4445 | buf_pool->LRU_old_len = 0; |
1280 | 4414 | 4446 | ||
1281 | 4447 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1282 | 4448 | |||
1283 | 4415 | memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat)); | 4449 | memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat)); |
1284 | 4416 | buf_refresh_io_stats(buf_pool); | 4450 | buf_refresh_io_stats(buf_pool); |
1285 | 4417 | |||
1286 | 4418 | buf_pool_mutex_exit(buf_pool); | ||
1287 | 4419 | } | 4451 | } |
1288 | 4420 | 4452 | ||
1289 | 4421 | /*********************************************************************//** | 4453 | /*********************************************************************//** |
1290 | @@ -4460,8 +4492,11 @@ | |||
1291 | 4460 | 4492 | ||
1292 | 4461 | ut_ad(buf_pool); | 4493 | ut_ad(buf_pool); |
1293 | 4462 | 4494 | ||
1295 | 4463 | buf_pool_mutex_enter(buf_pool); | 4495 | mutex_enter(&buf_pool->LRU_list_mutex); |
1296 | 4464 | hash_lock_x_all(buf_pool->page_hash); | 4496 | hash_lock_x_all(buf_pool->page_hash); |
1297 | 4497 | mutex_enter(&buf_pool->zip_mutex); | ||
1298 | 4498 | mutex_enter(&buf_pool->free_list_mutex); | ||
1299 | 4499 | mutex_enter(&buf_pool->flush_state_mutex); | ||
1300 | 4465 | 4500 | ||
1301 | 4466 | chunk = buf_pool->chunks; | 4501 | chunk = buf_pool->chunks; |
1302 | 4467 | 4502 | ||
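The hunk above shows the validate path taking the newly split mutexes together: LRU list mutex, then all page-hash locks, then zip, free-list, and flush-state mutexes. Once one mutex becomes five, a single fixed global order is what prevents deadlock between threads that need several at once. A minimal sketch of that discipline (mutex names mirror the diff; `validate_pool` and the counter are hypothetical):

```cpp
#include <mutex>

std::mutex lru_list_mutex;
std::mutex zip_mutex;
std::mutex free_list_mutex;
std::mutex flush_state_mutex;

int validations = 0;

void validate_pool() {
    // Always LRU list -> zip -> free list -> flush state, never reversed.
    // Any thread needing a subset must take them in this same order.
    std::lock_guard<std::mutex> a(lru_list_mutex);
    std::lock_guard<std::mutex> b(zip_mutex);
    std::lock_guard<std::mutex> c(free_list_mutex);
    std::lock_guard<std::mutex> d(flush_state_mutex);
    ++validations;  // stands in for walking the chunks and lists
}
```

The `lock_guard`s release in reverse construction order on scope exit, matching the paired `mutex_exit()` calls at the end of the validate function in the diff.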
1303 | @@ -4474,8 +4509,6 @@ | |||
1304 | 4474 | 4509 | ||
1305 | 4475 | for (j = chunk->size; j--; block++) { | 4510 | for (j = chunk->size; j--; block++) { |
1306 | 4476 | 4511 | ||
1307 | 4477 | mutex_enter(&block->mutex); | ||
1308 | 4478 | |||
1309 | 4479 | switch (buf_block_get_state(block)) { | 4512 | switch (buf_block_get_state(block)) { |
1310 | 4480 | case BUF_BLOCK_POOL_WATCH: | 4513 | case BUF_BLOCK_POOL_WATCH: |
1311 | 4481 | case BUF_BLOCK_ZIP_PAGE: | 4514 | case BUF_BLOCK_ZIP_PAGE: |
1312 | @@ -4486,6 +4519,7 @@ | |||
1313 | 4486 | break; | 4519 | break; |
1314 | 4487 | 4520 | ||
1315 | 4488 | case BUF_BLOCK_FILE_PAGE: | 4521 | case BUF_BLOCK_FILE_PAGE: |
1316 | 4522 | |||
1317 | 4489 | space = buf_block_get_space(block); | 4523 | space = buf_block_get_space(block); |
1318 | 4490 | offset = buf_block_get_page_no(block); | 4524 | offset = buf_block_get_page_no(block); |
1319 | 4491 | fold = buf_page_address_fold(space, offset); | 4525 | fold = buf_page_address_fold(space, offset); |
1320 | @@ -4496,14 +4530,15 @@ | |||
1321 | 4496 | == &block->page); | 4530 | == &block->page); |
1322 | 4497 | 4531 | ||
1323 | 4498 | #ifdef UNIV_IBUF_COUNT_DEBUG | 4532 | #ifdef UNIV_IBUF_COUNT_DEBUG |
1325 | 4499 | ut_a(buf_page_get_io_fix(&block->page) | 4533 | ut_a(buf_page_get_io_fix_unlocked(&block->page) |
1326 | 4500 | == BUF_IO_READ | 4534 | == BUF_IO_READ |
1327 | 4501 | || !ibuf_count_get(buf_block_get_space( | 4535 | || !ibuf_count_get(buf_block_get_space( |
1328 | 4502 | block), | 4536 | block), |
1329 | 4503 | buf_block_get_page_no( | 4537 | buf_block_get_page_no( |
1330 | 4504 | block))); | 4538 | block))); |
1331 | 4505 | #endif | 4539 | #endif |
1333 | 4506 | switch (buf_page_get_io_fix(&block->page)) { | 4540 | switch (buf_page_get_io_fix_unlocked( |
1334 | 4541 | &block->page)) { | ||
1335 | 4507 | case BUF_IO_NONE: | 4542 | case BUF_IO_NONE: |
1336 | 4508 | break; | 4543 | break; |
1337 | 4509 | 4544 | ||
1338 | @@ -4511,17 +4546,8 @@ | |||
1339 | 4511 | switch (buf_page_get_flush_type( | 4546 | switch (buf_page_get_flush_type( |
1340 | 4512 | &block->page)) { | 4547 | &block->page)) { |
1341 | 4513 | case BUF_FLUSH_LRU: | 4548 | case BUF_FLUSH_LRU: |
1342 | 4514 | n_lru_flush++; | ||
1343 | 4515 | goto assert_s_latched; | ||
1344 | 4516 | case BUF_FLUSH_SINGLE_PAGE: | 4549 | case BUF_FLUSH_SINGLE_PAGE: |
1345 | 4517 | n_page_flush++; | ||
1346 | 4518 | assert_s_latched: | ||
1347 | 4519 | ut_a(rw_lock_is_locked( | ||
1348 | 4520 | &block->lock, | ||
1349 | 4521 | RW_LOCK_SHARED)); | ||
1350 | 4522 | break; | ||
1351 | 4523 | case BUF_FLUSH_LIST: | 4550 | case BUF_FLUSH_LIST: |
1352 | 4524 | n_list_flush++; | ||
1353 | 4525 | break; | 4551 | break; |
1354 | 4526 | default: | 4552 | default: |
1355 | 4527 | ut_error; | 4553 | ut_error; |
1356 | @@ -4552,13 +4578,9 @@ | |||
1357 | 4552 | /* do nothing */ | 4578 | /* do nothing */ |
1358 | 4553 | break; | 4579 | break; |
1359 | 4554 | } | 4580 | } |
1360 | 4555 | |||
1361 | 4556 | mutex_exit(&block->mutex); | ||
1362 | 4557 | } | 4581 | } |
1363 | 4558 | } | 4582 | } |
1364 | 4559 | 4583 | ||
1365 | 4560 | mutex_enter(&buf_pool->zip_mutex); | ||
1366 | 4561 | |||
1367 | 4562 | /* Check clean compressed-only blocks. */ | 4584 | /* Check clean compressed-only blocks. */ |
1368 | 4563 | 4585 | ||
1369 | 4564 | for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b; | 4586 | for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b; |
1370 | @@ -4604,7 +4626,9 @@ | |||
1371 | 4604 | case BUF_BLOCK_ZIP_DIRTY: | 4626 | case BUF_BLOCK_ZIP_DIRTY: |
1372 | 4605 | n_lru++; | 4627 | n_lru++; |
1373 | 4606 | n_zip++; | 4628 | n_zip++; |
1375 | 4607 | switch (buf_page_get_io_fix(b)) { | 4629 | /* fallthrough */ |
1376 | 4630 | case BUF_BLOCK_FILE_PAGE: | ||
1377 | 4631 | switch (buf_page_get_io_fix_unlocked(b)) { | ||
1378 | 4608 | case BUF_IO_NONE: | 4632 | case BUF_IO_NONE: |
1379 | 4609 | case BUF_IO_READ: | 4633 | case BUF_IO_READ: |
1380 | 4610 | case BUF_IO_PIN: | 4634 | case BUF_IO_PIN: |
1381 | @@ -4624,11 +4648,10 @@ | |||
1382 | 4624 | ut_error; | 4648 | ut_error; |
1383 | 4625 | } | 4649 | } |
1384 | 4626 | break; | 4650 | break; |
1385 | 4651 | default: | ||
1386 | 4652 | ut_error; | ||
1387 | 4627 | } | 4653 | } |
1388 | 4628 | break; | 4654 | break; |
1389 | 4629 | case BUF_BLOCK_FILE_PAGE: | ||
1390 | 4630 | /* uncompressed page */ | ||
1391 | 4631 | break; | ||
1392 | 4632 | case BUF_BLOCK_POOL_WATCH: | 4655 | case BUF_BLOCK_POOL_WATCH: |
1393 | 4633 | case BUF_BLOCK_ZIP_PAGE: | 4656 | case BUF_BLOCK_ZIP_PAGE: |
1394 | 4634 | case BUF_BLOCK_NOT_USED: | 4657 | case BUF_BLOCK_NOT_USED: |
1395 | @@ -4658,6 +4681,9 @@ | |||
1396 | 4658 | } | 4681 | } |
1397 | 4659 | 4682 | ||
1398 | 4660 | ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru); | 4683 | ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru); |
1399 | 4684 | |||
1400 | 4685 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1401 | 4686 | |||
1402 | 4661 | if (UT_LIST_GET_LEN(buf_pool->free) != n_free) { | 4687 | if (UT_LIST_GET_LEN(buf_pool->free) != n_free) { |
1403 | 4662 | fprintf(stderr, "Free list len %lu, free blocks %lu\n", | 4688 | fprintf(stderr, "Free list len %lu, free blocks %lu\n", |
1404 | 4663 | (ulong) UT_LIST_GET_LEN(buf_pool->free), | 4689 | (ulong) UT_LIST_GET_LEN(buf_pool->free), |
1405 | @@ -4665,11 +4691,13 @@ | |||
1406 | 4665 | ut_error; | 4691 | ut_error; |
1407 | 4666 | } | 4692 | } |
1408 | 4667 | 4693 | ||
1409 | 4694 | mutex_exit(&buf_pool->free_list_mutex); | ||
1410 | 4695 | |||
1411 | 4668 | ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush); | 4696 | ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush); |
1412 | 4669 | ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush); | 4697 | ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush); |
1413 | 4670 | ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush); | 4698 | ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush); |
1414 | 4671 | 4699 | ||
1416 | 4672 | buf_pool_mutex_exit(buf_pool); | 4700 | mutex_exit(&buf_pool->flush_state_mutex); |
1417 | 4673 | 4701 | ||
1418 | 4674 | ut_a(buf_LRU_validate()); | 4702 | ut_a(buf_LRU_validate()); |
1419 | 4675 | ut_a(buf_flush_validate(buf_pool)); | 4703 | ut_a(buf_flush_validate(buf_pool)); |
1420 | @@ -4727,8 +4755,7 @@ | |||
1421 | 4727 | 4755 | ||
1422 | 4728 | counts = static_cast<ulint*>(mem_alloc(sizeof(ulint) * size)); | 4756 | counts = static_cast<ulint*>(mem_alloc(sizeof(ulint) * size)); |
1423 | 4729 | 4757 | ||
1426 | 4730 | buf_pool_mutex_enter(buf_pool); | 4758 | /* Dirty reads below */ |
1425 | 4731 | buf_flush_list_mutex_enter(buf_pool); | ||
1427 | 4732 | 4759 | ||
1428 | 4733 | fprintf(stderr, | 4760 | fprintf(stderr, |
1429 | 4734 | "buf_pool size %lu\n" | 4761 | "buf_pool size %lu\n" |
1430 | @@ -4755,12 +4782,12 @@ | |||
1431 | 4755 | (ulong) buf_pool->stat.n_pages_created, | 4782 | (ulong) buf_pool->stat.n_pages_created, |
1432 | 4756 | (ulong) buf_pool->stat.n_pages_written); | 4783 | (ulong) buf_pool->stat.n_pages_written); |
1433 | 4757 | 4784 | ||
1434 | 4758 | buf_flush_list_mutex_exit(buf_pool); | ||
1435 | 4759 | |||
1436 | 4760 | /* Count the number of blocks belonging to each index in the buffer */ | 4785 | /* Count the number of blocks belonging to each index in the buffer */ |
1437 | 4761 | 4786 | ||
1438 | 4762 | n_found = 0; | 4787 | n_found = 0; |
1439 | 4763 | 4788 | ||
1440 | 4789 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
1441 | 4790 | |||
1442 | 4764 | chunk = buf_pool->chunks; | 4791 | chunk = buf_pool->chunks; |
1443 | 4765 | 4792 | ||
1444 | 4766 | for (i = buf_pool->n_chunks; i--; chunk++) { | 4793 | for (i = buf_pool->n_chunks; i--; chunk++) { |
1445 | @@ -4796,7 +4823,7 @@ | |||
1446 | 4796 | } | 4823 | } |
1447 | 4797 | } | 4824 | } |
1448 | 4798 | 4825 | ||
1450 | 4799 | buf_pool_mutex_exit(buf_pool); | 4826 | mutex_exit(&buf_pool->LRU_list_mutex); |
1451 | 4800 | 4827 | ||
1452 | 4801 | for (i = 0; i < n_found; i++) { | 4828 | for (i = 0; i < n_found; i++) { |
1453 | 4802 | index = dict_index_get_if_in_cache(index_ids[i]); | 4829 | index = dict_index_get_if_in_cache(index_ids[i]); |
1454 | @@ -4853,7 +4880,8 @@ | |||
1455 | 4853 | buf_chunk_t* chunk; | 4880 | buf_chunk_t* chunk; |
1456 | 4854 | ulint fixed_pages_number = 0; | 4881 | ulint fixed_pages_number = 0; |
1457 | 4855 | 4882 | ||
1459 | 4856 | buf_pool_mutex_enter(buf_pool); | 4883 | /* The LRU list mutex is enough to protect the required fields below */ |
1460 | 4884 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
1461 | 4857 | 4885 | ||
1462 | 4858 | chunk = buf_pool->chunks; | 4886 | chunk = buf_pool->chunks; |
1463 | 4859 | 4887 | ||
1464 | @@ -4870,18 +4898,17 @@ | |||
1465 | 4870 | continue; | 4898 | continue; |
1466 | 4871 | } | 4899 | } |
1467 | 4872 | 4900 | ||
1468 | 4873 | mutex_enter(&block->mutex); | ||
1469 | 4874 | |||
1470 | 4875 | if (block->page.buf_fix_count != 0 | 4901 | if (block->page.buf_fix_count != 0 |
1472 | 4876 | || buf_page_get_io_fix(&block->page) | 4902 | || buf_page_get_io_fix_unlocked(&block->page) |
1473 | 4877 | != BUF_IO_NONE) { | 4903 | != BUF_IO_NONE) { |
1474 | 4878 | fixed_pages_number++; | 4904 | fixed_pages_number++; |
1475 | 4879 | } | 4905 | } |
1476 | 4880 | 4906 | ||
1477 | 4881 | mutex_exit(&block->mutex); | ||
1478 | 4882 | } | 4907 | } |
1479 | 4883 | } | 4908 | } |
1480 | 4884 | 4909 | ||
1481 | 4910 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
1482 | 4911 | |||
1483 | 4885 | mutex_enter(&buf_pool->zip_mutex); | 4912 | mutex_enter(&buf_pool->zip_mutex); |
1484 | 4886 | 4913 | ||
1485 | 4887 | /* Traverse the lists of clean and dirty compressed-only blocks. */ | 4914 | /* Traverse the lists of clean and dirty compressed-only blocks. */ |
1486 | @@ -4925,7 +4952,6 @@ | |||
1487 | 4925 | 4952 | ||
1488 | 4926 | buf_flush_list_mutex_exit(buf_pool); | 4953 | buf_flush_list_mutex_exit(buf_pool); |
1489 | 4927 | mutex_exit(&buf_pool->zip_mutex); | 4954 | mutex_exit(&buf_pool->zip_mutex); |
1490 | 4928 | buf_pool_mutex_exit(buf_pool); | ||
1491 | 4929 | 4955 | ||
1492 | 4930 | return(fixed_pages_number); | 4956 | return(fixed_pages_number); |
1493 | 4931 | } | 4957 | } |
1494 | @@ -5073,9 +5099,6 @@ | |||
1495 | 5073 | /* Find appropriate pool_info to store stats for this buffer pool */ | 5099 | /* Find appropriate pool_info to store stats for this buffer pool */ |
1496 | 5074 | pool_info = &all_pool_info[pool_id]; | 5100 | pool_info = &all_pool_info[pool_id]; |
1497 | 5075 | 5101 | ||
1498 | 5076 | buf_pool_mutex_enter(buf_pool); | ||
1499 | 5077 | buf_flush_list_mutex_enter(buf_pool); | ||
1500 | 5078 | |||
1501 | 5079 | pool_info->pool_unique_id = pool_id; | 5102 | pool_info->pool_unique_id = pool_id; |
1502 | 5080 | 5103 | ||
1503 | 5081 | pool_info->pool_size = buf_pool->curr_size; | 5104 | pool_info->pool_size = buf_pool->curr_size; |
1504 | @@ -5094,6 +5117,8 @@ | |||
1505 | 5094 | 5117 | ||
1506 | 5095 | pool_info->n_pend_reads = buf_pool->n_pend_reads; | 5118 | pool_info->n_pend_reads = buf_pool->n_pend_reads; |
1507 | 5096 | 5119 | ||
1508 | 5120 | mutex_enter(&buf_pool->flush_state_mutex); | ||
1509 | 5121 | |||
1510 | 5097 | pool_info->n_pending_flush_lru = | 5122 | pool_info->n_pending_flush_lru = |
1511 | 5098 | (buf_pool->n_flush[BUF_FLUSH_LRU] | 5123 | (buf_pool->n_flush[BUF_FLUSH_LRU] |
1512 | 5099 | + buf_pool->init_flush[BUF_FLUSH_LRU]); | 5124 | + buf_pool->init_flush[BUF_FLUSH_LRU]); |
1513 | @@ -5106,7 +5131,7 @@ | |||
1514 | 5106 | (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] | 5131 | (buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] |
1515 | 5107 | + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]); | 5132 | + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]); |
1516 | 5108 | 5133 | ||
1518 | 5109 | buf_flush_list_mutex_exit(buf_pool); | 5134 | mutex_exit(&buf_pool->flush_state_mutex); |
1519 | 5110 | 5135 | ||
1520 | 5111 | current_time = time(NULL); | 5136 | current_time = time(NULL); |
1521 | 5112 | time_elapsed = 0.001 + difftime(current_time, | 5137 | time_elapsed = 0.001 + difftime(current_time, |
1522 | @@ -5189,7 +5214,6 @@ | |||
1523 | 5189 | pool_info->unzip_cur = buf_LRU_stat_cur.unzip; | 5214 | pool_info->unzip_cur = buf_LRU_stat_cur.unzip; |
1524 | 5190 | 5215 | ||
1525 | 5191 | buf_refresh_io_stats(buf_pool); | 5216 | buf_refresh_io_stats(buf_pool); |
1526 | 5192 | buf_pool_mutex_exit(buf_pool); | ||
1527 | 5193 | } | 5217 | } |
1528 | 5194 | 5218 | ||
1529 | 5195 | /*********************************************************************//** | 5219 | /*********************************************************************//** |
1530 | @@ -5398,22 +5422,22 @@ | |||
1531 | 5398 | ulint i; | 5422 | ulint i; |
1532 | 5399 | ulint pending_io = 0; | 5423 | ulint pending_io = 0; |
1533 | 5400 | 5424 | ||
1534 | 5401 | buf_pool_mutex_enter_all(); | ||
1535 | 5402 | |||
1536 | 5403 | for (i = 0; i < srv_buf_pool_instances; i++) { | 5425 | for (i = 0; i < srv_buf_pool_instances; i++) { |
1538 | 5404 | const buf_pool_t* buf_pool; | 5426 | buf_pool_t* buf_pool; |
1539 | 5405 | 5427 | ||
1540 | 5406 | buf_pool = buf_pool_from_array(i); | 5428 | buf_pool = buf_pool_from_array(i); |
1541 | 5407 | 5429 | ||
1547 | 5408 | pending_io += buf_pool->n_pend_reads | 5430 | pending_io += buf_pool->n_pend_reads; |
1548 | 5409 | + buf_pool->n_flush[BUF_FLUSH_LRU] | 5431 | |
1549 | 5410 | + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] | 5432 | mutex_enter(&buf_pool->flush_state_mutex); |
1550 | 5411 | + buf_pool->n_flush[BUF_FLUSH_LIST]; | 5433 | |
1551 | 5412 | 5434 | pending_io += buf_pool->n_flush[BUF_FLUSH_LRU]; | |
1552 | 5435 | pending_io += buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]; | ||
1553 | 5436 | pending_io += buf_pool->n_flush[BUF_FLUSH_LIST]; | ||
1554 | 5437 | |||
1555 | 5438 | mutex_exit(&buf_pool->flush_state_mutex); | ||
1556 | 5413 | } | 5439 | } |
1557 | 5414 | 5440 | ||
1558 | 5415 | buf_pool_mutex_exit_all(); | ||
1559 | 5416 | |||
1560 | 5417 | return(pending_io); | 5441 | return(pending_io); |
1561 | 5418 | } | 5442 | } |
1562 | 5419 | 5443 | ||
1563 | @@ -5429,11 +5453,11 @@ | |||
1564 | 5429 | { | 5453 | { |
1565 | 5430 | ulint len; | 5454 | ulint len; |
1566 | 5431 | 5455 | ||
1568 | 5432 | buf_pool_mutex_enter(buf_pool); | 5456 | mutex_enter(&buf_pool->free_list_mutex); |
1569 | 5433 | 5457 | ||
1570 | 5434 | len = UT_LIST_GET_LEN(buf_pool->free); | 5458 | len = UT_LIST_GET_LEN(buf_pool->free); |
1571 | 5435 | 5459 | ||
1573 | 5436 | buf_pool_mutex_exit(buf_pool); | 5460 | mutex_exit(&buf_pool->free_list_mutex); |
1574 | 5437 | 5461 | ||
1575 | 5438 | return(len); | 5462 | return(len); |
1576 | 5439 | } | 5463 | } |
1577 | 5440 | 5464 | ||
1578 | === modified file 'Percona-Server/storage/innobase/buf/buf0dblwr.cc' | |||
1579 | --- Percona-Server/storage/innobase/buf/buf0dblwr.cc 2013-06-20 15:16:00 +0000 | |||
1580 | +++ Percona-Server/storage/innobase/buf/buf0dblwr.cc 2013-09-20 05:29:11 +0000 | |||
1581 | @@ -936,6 +936,7 @@ | |||
1582 | 936 | ulint zip_size; | 936 | ulint zip_size; |
1583 | 937 | 937 | ||
1584 | 938 | ut_a(buf_page_in_file(bpage)); | 938 | ut_a(buf_page_in_file(bpage)); |
1585 | 939 | ut_ad(!mutex_own(&buf_pool_from_bpage(bpage)->LRU_list_mutex)); | ||
1586 | 939 | 940 | ||
1587 | 940 | try_again: | 941 | try_again: |
1588 | 941 | mutex_enter(&buf_dblwr->mutex); | 942 | mutex_enter(&buf_dblwr->mutex); |
1589 | 942 | 943 | ||
1590 | === modified file 'Percona-Server/storage/innobase/buf/buf0dump.cc' | |||
1591 | --- Percona-Server/storage/innobase/buf/buf0dump.cc 2012-12-04 08:24:59 +0000 | |||
1592 | +++ Percona-Server/storage/innobase/buf/buf0dump.cc 2013-09-20 05:29:11 +0000 | |||
1593 | @@ -28,7 +28,7 @@ | |||
1594 | 28 | #include <stdarg.h> /* va_* */ | 28 | #include <stdarg.h> /* va_* */ |
1595 | 29 | #include <string.h> /* strerror() */ | 29 | #include <string.h> /* strerror() */ |
1596 | 30 | 30 | ||
1598 | 31 | #include "buf0buf.h" /* buf_pool_mutex_enter(), srv_buf_pool_instances */ | 31 | #include "buf0buf.h" /* srv_buf_pool_instances */ |
1599 | 32 | #include "buf0dump.h" | 32 | #include "buf0dump.h" |
1600 | 33 | #include "db0err.h" | 33 | #include "db0err.h" |
1601 | 34 | #include "dict0dict.h" /* dict_operation_lock */ | 34 | #include "dict0dict.h" /* dict_operation_lock */ |
1602 | @@ -58,8 +58,8 @@ | |||
1603 | 58 | static ibool buf_load_abort_flag = FALSE; | 58 | static ibool buf_load_abort_flag = FALSE; |
1604 | 59 | 59 | ||
1605 | 60 | /* Used to temporarily store dump info in order to avoid IO while holding | 60 |
1608 | 61 | buffer pool mutex during dump and also to sort the contents of the dump | 61 | buffer pool LRU list mutex during dump and also to sort the contents of the |
1609 | 62 | before reading the pages from disk during load. | 62 | dump before reading the pages from disk during load. |
1610 | 63 | We store the space id in the high 32 bits and page no in low 32 bits. */ | 63 | We store the space id in the high 32 bits and page no in low 32 bits. */ |
1611 | 64 | typedef ib_uint64_t buf_dump_t; | 64 | typedef ib_uint64_t buf_dump_t; |
1612 | 65 | 65 | ||
1613 | @@ -218,15 +218,15 @@ | |||
1614 | 218 | 218 | ||
1615 | 219 | buf_pool = buf_pool_from_array(i); | 219 | buf_pool = buf_pool_from_array(i); |
1616 | 220 | 220 | ||
1618 | 221 | /* obtain buf_pool mutex before allocate, since | 221 | /* obtain buf_pool LRU list mutex before allocate, since |
1619 | 222 | UT_LIST_GET_LEN(buf_pool->LRU) could change */ | 222 | UT_LIST_GET_LEN(buf_pool->LRU) could change */ |
1621 | 223 | buf_pool_mutex_enter(buf_pool); | 223 | mutex_enter(&buf_pool->LRU_list_mutex); |
1622 | 224 | 224 | ||
1623 | 225 | n_pages = UT_LIST_GET_LEN(buf_pool->LRU); | 225 | n_pages = UT_LIST_GET_LEN(buf_pool->LRU); |
1624 | 226 | 226 | ||
1625 | 227 | /* skip empty buffer pools */ | 227 | /* skip empty buffer pools */ |
1626 | 228 | if (n_pages == 0) { | 228 | if (n_pages == 0) { |
1628 | 229 | buf_pool_mutex_exit(buf_pool); | 229 | mutex_exit(&buf_pool->LRU_list_mutex); |
1629 | 230 | continue; | 230 | continue; |
1630 | 231 | } | 231 | } |
1631 | 232 | 232 | ||
1632 | @@ -234,7 +234,7 @@ | |||
1633 | 234 | ut_malloc(n_pages * sizeof(*dump))) ; | 234 | ut_malloc(n_pages * sizeof(*dump))) ; |
1634 | 235 | 235 | ||
1635 | 236 | if (dump == NULL) { | 236 | if (dump == NULL) { |
1637 | 237 | buf_pool_mutex_exit(buf_pool); | 237 | mutex_exit(&buf_pool->LRU_list_mutex); |
1638 | 238 | fclose(f); | 238 | fclose(f); |
1639 | 239 | buf_dump_status(STATUS_ERR, | 239 | buf_dump_status(STATUS_ERR, |
1640 | 240 | "Cannot allocate " ULINTPF " bytes: %s", | 240 | "Cannot allocate " ULINTPF " bytes: %s", |
1641 | @@ -256,7 +256,7 @@ | |||
1642 | 256 | 256 | ||
1643 | 257 | ut_a(j == n_pages); | 257 | ut_a(j == n_pages); |
1644 | 258 | 258 | ||
1646 | 259 | buf_pool_mutex_exit(buf_pool); | 259 | mutex_exit(&buf_pool->LRU_list_mutex); |
1647 | 260 | 260 | ||
1648 | 261 | for (j = 0; j < n_pages && !SHOULD_QUIT(); j++) { | 261 | for (j = 0; j < n_pages && !SHOULD_QUIT(); j++) { |
1649 | 262 | ret = fprintf(f, ULINTPF "," ULINTPF "\n", | 262 | ret = fprintf(f, ULINTPF "," ULINTPF "\n", |
1650 | 263 | 263 | ||
1651 | === modified file 'Percona-Server/storage/innobase/buf/buf0flu.cc' | |||
1652 | --- Percona-Server/storage/innobase/buf/buf0flu.cc 2013-08-16 09:11:51 +0000 | |||
1653 | +++ Percona-Server/storage/innobase/buf/buf0flu.cc 2013-09-20 05:29:11 +0000 | |||
1654 | @@ -350,7 +350,6 @@ | |||
1655 | 350 | buf_block_t* block, /*!< in/out: block which is modified */ | 350 | buf_block_t* block, /*!< in/out: block which is modified */ |
1656 | 351 | lsn_t lsn) /*!< in: oldest modification */ | 351 | lsn_t lsn) /*!< in: oldest modification */ |
1657 | 352 | { | 352 | { |
1658 | 353 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
1659 | 354 | ut_ad(log_flush_order_mutex_own()); | 353 | ut_ad(log_flush_order_mutex_own()); |
1660 | 355 | ut_ad(mutex_own(&block->mutex)); | 354 | ut_ad(mutex_own(&block->mutex)); |
1661 | 356 | 355 | ||
1662 | @@ -409,15 +408,14 @@ | |||
1663 | 409 | buf_page_t* prev_b; | 408 | buf_page_t* prev_b; |
1664 | 410 | buf_page_t* b; | 409 | buf_page_t* b; |
1665 | 411 | 410 | ||
1666 | 412 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
1667 | 413 | ut_ad(log_flush_order_mutex_own()); | 411 | ut_ad(log_flush_order_mutex_own()); |
1668 | 414 | ut_ad(mutex_own(&block->mutex)); | 412 | ut_ad(mutex_own(&block->mutex)); |
1669 | 415 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE); | 413 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE); |
1670 | 416 | 414 | ||
1671 | 417 | buf_flush_list_mutex_enter(buf_pool); | 415 | buf_flush_list_mutex_enter(buf_pool); |
1672 | 418 | 416 | ||
1675 | 419 | /* The field in_LRU_list is protected by buf_pool->mutex, which | 417 | /* The field in_LRU_list is protected by buf_pool->LRU_list_mutex, |
1676 | 420 | we are not holding. However, while a block is in the flush | 418 | which we are not holding. However, while a block is in the flush |
1677 | 421 | list, it is dirty and cannot be discarded, not from the | 419 | list, it is dirty and cannot be discarded, not from the |
1678 | 422 | page_hash or from the LRU list. At most, the uncompressed | 420 | page_hash or from the LRU list. At most, the uncompressed |
1679 | 423 | page frame of a compressed block may be discarded or created | 421 | page frame of a compressed block may be discarded or created |
1680 | @@ -501,7 +499,7 @@ | |||
1681 | 501 | { | 499 | { |
1682 | 502 | #ifdef UNIV_DEBUG | 500 | #ifdef UNIV_DEBUG |
1683 | 503 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 501 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1685 | 504 | ut_ad(buf_pool_mutex_own(buf_pool)); | 502 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
1686 | 505 | #endif | 503 | #endif |
1687 | 506 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 504 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
1688 | 507 | ut_ad(bpage->in_LRU_list); | 505 | ut_ad(bpage->in_LRU_list); |
1689 | @@ -535,17 +533,13 @@ | |||
1690 | 535 | buf_page_in_file(bpage) */ | 533 | buf_page_in_file(bpage) */ |
1691 | 536 | buf_flush_t flush_type)/*!< in: type of flush */ | 534 | buf_flush_t flush_type)/*!< in: type of flush */ |
1692 | 537 | { | 535 | { |
1698 | 538 | #ifdef UNIV_DEBUG | 536 | ut_ad(flush_type < BUF_FLUSH_N_TYPES); |
1699 | 539 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 537 | ut_ad(mutex_own(buf_page_get_mutex(bpage)) |
1700 | 540 | ut_ad(buf_pool_mutex_own(buf_pool)); | 538 | || flush_type == BUF_FLUSH_LIST); |
1696 | 541 | #endif /* UNIV_DEBUG */ | ||
1697 | 542 | |||
1701 | 543 | ut_a(buf_page_in_file(bpage)); | 539 | ut_a(buf_page_in_file(bpage)); |
1702 | 544 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | ||
1703 | 545 | ut_ad(flush_type < BUF_FLUSH_N_TYPES); | ||
1704 | 546 | 540 | ||
1705 | 547 | if (bpage->oldest_modification == 0 | 541 | if (bpage->oldest_modification == 0 |
1707 | 548 | || buf_page_get_io_fix(bpage) != BUF_IO_NONE) { | 542 | || buf_page_get_io_fix_unlocked(bpage) != BUF_IO_NONE) { |
1708 | 549 | return(false); | 543 | return(false); |
1709 | 550 | } | 544 | } |
1710 | 551 | 545 | ||
1711 | @@ -583,8 +577,11 @@ | |||
1712 | 583 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 577 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1713 | 584 | ulint zip_size; | 578 | ulint zip_size; |
1714 | 585 | 579 | ||
1715 | 586 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
1716 | 587 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 580 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
1717 | 581 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | ||
1718 | 582 | ut_ad(buf_page_get_state(bpage) != BUF_BLOCK_ZIP_DIRTY | ||
1719 | 583 | || mutex_own(&buf_pool->LRU_list_mutex)); | ||
1720 | 584 | #endif | ||
1721 | 588 | ut_ad(bpage->in_flush_list); | 585 | ut_ad(bpage->in_flush_list); |
1722 | 589 | 586 | ||
1723 | 590 | buf_flush_list_mutex_enter(buf_pool); | 587 | buf_flush_list_mutex_enter(buf_pool); |
1724 | @@ -655,7 +652,6 @@ | |||
1725 | 655 | buf_page_t* prev_b = NULL; | 652 | buf_page_t* prev_b = NULL; |
1726 | 656 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 653 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1727 | 657 | 654 | ||
1728 | 658 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
1729 | 659 | /* Must reside in the same buffer pool. */ | 655 | /* Must reside in the same buffer pool. */ |
1730 | 660 | ut_ad(buf_pool == buf_pool_from_bpage(dpage)); | 656 | ut_ad(buf_pool == buf_pool_from_bpage(dpage)); |
1731 | 661 | 657 | ||
1732 | @@ -663,13 +659,6 @@ | |||
1733 | 663 | 659 | ||
1734 | 664 | buf_flush_list_mutex_enter(buf_pool); | 660 | buf_flush_list_mutex_enter(buf_pool); |
1735 | 665 | 661 | ||
1736 | 666 | /* FIXME: At this point we have both buf_pool and flush_list | ||
1737 | 667 | mutexes. Theoretically removal of a block from flush list is | ||
1738 | 668 | only covered by flush_list mutex but currently we do | ||
1739 | 669 | have buf_pool mutex in buf_flush_remove() therefore this block | ||
1740 | 670 | is guaranteed to be in the flush list. We need to check if | ||
1741 | 671 | this will work without the assumption of block removing code | ||
1742 | 672 | having the buf_pool mutex. */ | ||
1743 | 673 | ut_ad(bpage->in_flush_list); | 662 | ut_ad(bpage->in_flush_list); |
1744 | 674 | ut_ad(dpage->in_flush_list); | 663 | ut_ad(dpage->in_flush_list); |
1745 | 675 | 664 | ||
1746 | @@ -720,14 +709,15 @@ | |||
1747 | 720 | /*=====================*/ | 709 | /*=====================*/ |
1748 | 721 | buf_page_t* bpage) /*!< in: pointer to the block in question */ | 710 | buf_page_t* bpage) /*!< in: pointer to the block in question */ |
1749 | 722 | { | 711 | { |
1751 | 723 | buf_flush_t flush_type; | 712 | buf_flush_t flush_type = buf_page_get_flush_type(bpage); |
1752 | 724 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 713 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1753 | 725 | 714 | ||
1755 | 726 | ut_ad(bpage); | 715 | mutex_enter(&buf_pool->flush_state_mutex); |
1756 | 727 | 716 | ||
1757 | 728 | buf_flush_remove(bpage); | 717 | buf_flush_remove(bpage); |
1758 | 729 | 718 | ||
1760 | 730 | flush_type = buf_page_get_flush_type(bpage); | 719 | buf_page_set_io_fix(bpage, BUF_IO_NONE); |
1761 | 720 | |||
1762 | 731 | buf_pool->n_flush[flush_type]--; | 721 | buf_pool->n_flush[flush_type]--; |
1763 | 732 | 722 | ||
1764 | 733 | /* fprintf(stderr, "n pending flush %lu\n", | 723 | /* fprintf(stderr, "n pending flush %lu\n", |
1765 | @@ -742,6 +732,8 @@ | |||
1766 | 742 | } | 732 | } |
1767 | 743 | 733 | ||
1768 | 744 | buf_dblwr_update(bpage, flush_type); | 734 | buf_dblwr_update(bpage, flush_type); |
1769 | 735 | |||
1770 | 736 | mutex_exit(&buf_pool->flush_state_mutex); | ||
1771 | 745 | } | 737 | } |
1772 | 746 | #endif /* !UNIV_HOTBACKUP */ | 738 | #endif /* !UNIV_HOTBACKUP */ |
1773 | 747 | 739 | ||
1774 | @@ -890,7 +882,7 @@ | |||
1775 | 890 | 882 | ||
1776 | 891 | #ifdef UNIV_DEBUG | 883 | #ifdef UNIV_DEBUG |
1777 | 892 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 884 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
1779 | 893 | ut_ad(!buf_pool_mutex_own(buf_pool)); | 885 | ut_ad(!mutex_own(&buf_pool->LRU_list_mutex)); |
1780 | 894 | #endif | 886 | #endif |
1781 | 895 | 887 | ||
1782 | 896 | #ifdef UNIV_LOG_DEBUG | 888 | #ifdef UNIV_LOG_DEBUG |
1783 | @@ -899,15 +891,14 @@ | |||
1784 | 899 | 891 | ||
1785 | 900 | ut_ad(buf_page_in_file(bpage)); | 892 | ut_ad(buf_page_in_file(bpage)); |
1786 | 901 | 893 | ||
1788 | 902 | /* We are not holding buf_pool->mutex or block_mutex here. | 894 | /* We are not holding block_mutex here. |
1789 | 903 | Nevertheless, it is safe to access bpage, because it is | 895 | Nevertheless, it is safe to access bpage, because it is |
1790 | 904 | io_fixed and oldest_modification != 0. Thus, it cannot be | 896 | io_fixed and oldest_modification != 0. Thus, it cannot be |
1791 | 905 | relocated in the buffer pool or removed from flush_list or | 897 | relocated in the buffer pool or removed from flush_list or |
1792 | 906 | LRU_list. */ | 898 | LRU_list. */ |
1793 | 907 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
1794 | 908 | ut_ad(!buf_flush_list_mutex_own(buf_pool)); | 899 | ut_ad(!buf_flush_list_mutex_own(buf_pool)); |
1795 | 909 | ut_ad(!mutex_own(buf_page_get_mutex(bpage))); | 900 | ut_ad(!mutex_own(buf_page_get_mutex(bpage))); |
1797 | 910 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_WRITE); | 901 | ut_ad(buf_page_get_io_fix_unlocked(bpage) == BUF_IO_WRITE); |
1798 | 911 | ut_ad(bpage->oldest_modification != 0); | 902 | ut_ad(bpage->oldest_modification != 0); |
1799 | 912 | 903 | ||
1800 | 913 | #ifdef UNIV_IBUF_COUNT_DEBUG | 904 | #ifdef UNIV_IBUF_COUNT_DEBUG |
1801 | @@ -989,9 +980,8 @@ | |||
1802 | 989 | Writes a flushable page asynchronously from the buffer pool to a file. | 980 | Writes a flushable page asynchronously from the buffer pool to a file. |
1803 | 990 | NOTE: in simulated aio we must call | 981 | NOTE: in simulated aio we must call |
1804 | 991 | os_aio_simulated_wake_handler_threads after we have posted a batch of | 982 | os_aio_simulated_wake_handler_threads after we have posted a batch of |
1808 | 992 | writes! NOTE: buf_pool->mutex and buf_page_get_mutex(bpage) must be | 983 | writes! NOTE: buf_page_get_mutex(bpage) must be held upon entering this |
1809 | 993 | held upon entering this function, and they will be released by this | 984 | function, and it will be released by this function. */ |
1807 | 994 | function. */ | ||
1810 | 995 | UNIV_INTERN | 985 | UNIV_INTERN |
1811 | 996 | void | 986 | void |
1812 | 997 | buf_flush_page( | 987 | buf_flush_page( |
1813 | @@ -1005,7 +995,7 @@ | |||
1814 | 1005 | ibool is_uncompressed; | 995 | ibool is_uncompressed; |
1815 | 1006 | 996 | ||
1816 | 1007 | ut_ad(flush_type < BUF_FLUSH_N_TYPES); | 997 | ut_ad(flush_type < BUF_FLUSH_N_TYPES); |
1818 | 1008 | ut_ad(buf_pool_mutex_own(buf_pool)); | 998 | ut_ad(!mutex_own(&buf_pool->LRU_list_mutex)); |
1819 | 1009 | ut_ad(buf_page_in_file(bpage)); | 999 | ut_ad(buf_page_in_file(bpage)); |
1820 | 1010 | ut_ad(!sync || flush_type == BUF_FLUSH_SINGLE_PAGE); | 1000 | ut_ad(!sync || flush_type == BUF_FLUSH_SINGLE_PAGE); |
1821 | 1011 | 1001 | ||
1822 | @@ -1014,6 +1004,8 @@ | |||
1823 | 1014 | 1004 | ||
1824 | 1015 | ut_ad(buf_flush_ready_for_flush(bpage, flush_type)); | 1005 | ut_ad(buf_flush_ready_for_flush(bpage, flush_type)); |
1825 | 1016 | 1006 | ||
1826 | 1007 | mutex_enter(&buf_pool->flush_state_mutex); | ||
1827 | 1008 | |||
1828 | 1017 | buf_page_set_io_fix(bpage, BUF_IO_WRITE); | 1009 | buf_page_set_io_fix(bpage, BUF_IO_WRITE); |
1829 | 1018 | 1010 | ||
1830 | 1019 | buf_page_set_flush_type(bpage, flush_type); | 1011 | buf_page_set_flush_type(bpage, flush_type); |
1831 | @@ -1025,6 +1017,8 @@ | |||
1832 | 1025 | 1017 | ||
1833 | 1026 | buf_pool->n_flush[flush_type]++; | 1018 | buf_pool->n_flush[flush_type]++; |
1834 | 1027 | 1019 | ||
1835 | 1020 | mutex_exit(&buf_pool->flush_state_mutex); | ||
1836 | 1021 | |||
1837 | 1028 | is_uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE); | 1022 | is_uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE); |
1838 | 1029 | ut_ad(is_uncompressed == (block_mutex != &buf_pool->zip_mutex)); | 1023 | ut_ad(is_uncompressed == (block_mutex != &buf_pool->zip_mutex)); |
1839 | 1030 | 1024 | ||
1840 | @@ -1042,7 +1036,6 @@ | |||
1841 | 1042 | } | 1036 | } |
1842 | 1043 | 1037 | ||
1843 | 1044 | mutex_exit(block_mutex); | 1038 | mutex_exit(block_mutex); |
1844 | 1045 | buf_pool_mutex_exit(buf_pool); | ||
1845 | 1046 | 1039 | ||
1846 | 1047 | /* Even though bpage is not protected by any mutex at | 1040 | /* Even though bpage is not protected by any mutex at |
1847 | 1048 | this point, it is safe to access bpage, because it is | 1041 | this point, it is safe to access bpage, because it is |
1848 | @@ -1080,11 +1073,10 @@ | |||
1849 | 1080 | } | 1073 | } |
1850 | 1081 | 1074 | ||
1851 | 1082 | /* Note that the s-latch is acquired before releasing the | 1075 | /* Note that the s-latch is acquired before releasing the |
1854 | 1083 | buf_pool mutex: this ensures that the latch is acquired | 1076 | buf_page_get_mutex() mutex: this ensures that the latch is |
1855 | 1084 | immediately. */ | 1077 | acquired immediately. */ |
1856 | 1085 | 1078 | ||
1857 | 1086 | mutex_exit(block_mutex); | 1079 | mutex_exit(block_mutex); |
1858 | 1087 | buf_pool_mutex_exit(buf_pool); | ||
1859 | 1088 | break; | 1080 | break; |
1860 | 1089 | 1081 | ||
1861 | 1090 | default: | 1082 | default: |
1862 | @@ -1109,9 +1101,9 @@ | |||
1863 | 1109 | # if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG | 1101 | # if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG |
1864 | 1110 | /********************************************************************//** | 1102 | /********************************************************************//** |
1865 | 1111 | Writes a flushable page asynchronously from the buffer pool to a file. | 1103 | Writes a flushable page asynchronously from the buffer pool to a file. |
1869 | 1112 | NOTE: buf_pool->mutex and block->mutex must be held upon entering this | 1104 | NOTE: block->mutex must be held upon entering this function, and it will be |
1870 | 1113 | function, and they will be released by this function after flushing. | 1105 | released by this function after flushing. This is loosely based on |
1871 | 1114 | This is loosely based on buf_flush_batch() and buf_flush_page(). | 1106 | buf_flush_batch() and buf_flush_page(). |
1872 | 1115 | @return TRUE if the page was flushed and the mutexes released */ | 1107 | @return TRUE if the page was flushed and the mutexes released */ |
1873 | 1116 | UNIV_INTERN | 1108 | UNIV_INTERN |
1874 | 1117 | ibool | 1109 | ibool |
1875 | @@ -1120,7 +1112,6 @@ | |||
1876 | 1120 | buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */ | 1112 | buf_pool_t* buf_pool, /*!< in/out: buffer pool instance */ |
1877 | 1121 | buf_block_t* block) /*!< in/out: buffer control block */ | 1113 | buf_block_t* block) /*!< in/out: buffer control block */ |
1878 | 1122 | { | 1114 | { |
1879 | 1123 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
1880 | 1124 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE); | 1115 | ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE); |
1881 | 1125 | ut_ad(mutex_own(&block->mutex)); | 1116 | ut_ad(mutex_own(&block->mutex)); |
1882 | 1126 | 1117 | ||
1883 | @@ -1149,21 +1140,27 @@ | |||
1884 | 1149 | buf_page_t* bpage; | 1140 | buf_page_t* bpage; |
1885 | 1150 | buf_pool_t* buf_pool = buf_pool_get(space, offset); | 1141 | buf_pool_t* buf_pool = buf_pool_get(space, offset); |
1886 | 1151 | bool ret; | 1142 | bool ret; |
1887 | 1143 | rw_lock_t* hash_lock; | ||
1888 | 1144 | ib_mutex_t* block_mutex; | ||
1889 | 1152 | 1145 | ||
1890 | 1153 | ut_ad(flush_type == BUF_FLUSH_LRU | 1146 | ut_ad(flush_type == BUF_FLUSH_LRU |
1891 | 1154 | || flush_type == BUF_FLUSH_LIST); | 1147 | || flush_type == BUF_FLUSH_LIST); |
1892 | 1155 | 1148 | ||
1893 | 1156 | buf_pool_mutex_enter(buf_pool); | ||
1894 | 1157 | |||
1895 | 1158 | /* We only want to flush pages from this buffer pool. */ | 1149 | /* We only want to flush pages from this buffer pool. */ |
1897 | 1159 | bpage = buf_page_hash_get(buf_pool, space, offset); | 1150 | bpage = buf_page_hash_get_s_locked(buf_pool, space, offset, |
1898 | 1151 | &hash_lock); | ||
1899 | 1160 | 1152 | ||
1900 | 1161 | if (!bpage) { | 1153 | if (!bpage) { |
1901 | 1162 | 1154 | ||
1902 | 1163 | buf_pool_mutex_exit(buf_pool); | ||
1903 | 1164 | return(false); | 1155 | return(false); |
1904 | 1165 | } | 1156 | } |
1905 | 1166 | 1157 | ||
1906 | 1158 | block_mutex = buf_page_get_mutex(bpage); | ||
1907 | 1159 | |||
1908 | 1160 | mutex_enter(block_mutex); | ||
1909 | 1161 | |||
1910 | 1162 | rw_lock_s_unlock(hash_lock); | ||
1911 | 1163 | |||
1912 | 1167 | ut_a(buf_page_in_file(bpage)); | 1164 | ut_a(buf_page_in_file(bpage)); |
1913 | 1168 | 1165 | ||
1914 | 1169 | /* We avoid flushing 'non-old' blocks in an LRU flush, | 1166 | /* We avoid flushing 'non-old' blocks in an LRU flush, |
1915 | @@ -1171,15 +1168,13 @@ | |||
1916 | 1171 | 1168 | ||
1917 | 1172 | ret = false; | 1169 | ret = false; |
1918 | 1173 | if (flush_type != BUF_FLUSH_LRU || buf_page_is_old(bpage)) { | 1170 | if (flush_type != BUF_FLUSH_LRU || buf_page_is_old(bpage)) { |
1919 | 1174 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); | ||
1920 | 1175 | 1171 | ||
1921 | 1176 | mutex_enter(block_mutex); | ||
1922 | 1177 | if (buf_flush_ready_for_flush(bpage, flush_type)) { | 1172 | if (buf_flush_ready_for_flush(bpage, flush_type)) { |
1923 | 1178 | ret = true; | 1173 | ret = true; |
1924 | 1179 | } | 1174 | } |
1925 | 1180 | mutex_exit(block_mutex); | ||
1926 | 1181 | } | 1175 | } |
1928 | 1182 | buf_pool_mutex_exit(buf_pool); | 1176 | |
1929 | 1177 | mutex_exit(block_mutex); | ||
1930 | 1183 | 1178 | ||
1931 | 1184 | return(ret); | 1179 | return(ret); |
1932 | 1185 | } | 1180 | } |
1933 | @@ -1207,6 +1202,8 @@ | |||
1934 | 1207 | buf_pool_t* buf_pool = buf_pool_get(space, offset); | 1202 | buf_pool_t* buf_pool = buf_pool_get(space, offset); |
1935 | 1208 | 1203 | ||
1936 | 1209 | ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST); | 1204 | ut_ad(flush_type == BUF_FLUSH_LRU || flush_type == BUF_FLUSH_LIST); |
1937 | 1205 | ut_ad(!mutex_own(&buf_pool->LRU_list_mutex)); | ||
1938 | 1206 | ut_ad(!buf_flush_list_mutex_own(buf_pool)); | ||
1939 | 1210 | 1207 | ||
1940 | 1211 | if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN | 1208 | if (UT_LIST_GET_LEN(buf_pool->LRU) < BUF_LRU_OLD_MIN_LEN |
1941 | 1212 | || srv_flush_neighbors == 0) { | 1209 | || srv_flush_neighbors == 0) { |
1942 | @@ -1262,6 +1259,8 @@ | |||
1943 | 1262 | for (i = low; i < high; i++) { | 1259 | for (i = low; i < high; i++) { |
1944 | 1263 | 1260 | ||
1945 | 1264 | buf_page_t* bpage; | 1261 | buf_page_t* bpage; |
1946 | 1262 | rw_lock_t* hash_lock; | ||
1947 | 1263 | ib_mutex_t* block_mutex; | ||
1948 | 1265 | 1264 | ||
1949 | 1266 | if ((count + n_flushed) >= n_to_flush) { | 1265 | if ((count + n_flushed) >= n_to_flush) { |
1950 | 1267 | 1266 | ||
1951 | @@ -1280,17 +1279,21 @@ | |||
1952 | 1280 | 1279 | ||
1953 | 1281 | buf_pool = buf_pool_get(space, i); | 1280 | buf_pool = buf_pool_get(space, i); |
1954 | 1282 | 1281 | ||
1955 | 1283 | buf_pool_mutex_enter(buf_pool); | ||
1956 | 1284 | |||
1957 | 1285 | /* We only want to flush pages from this buffer pool. */ | 1282 | /* We only want to flush pages from this buffer pool. */ |
1959 | 1286 | bpage = buf_page_hash_get(buf_pool, space, i); | 1283 | bpage = buf_page_hash_get_s_locked(buf_pool, space, i, |
1960 | 1284 | &hash_lock); | ||
1961 | 1287 | 1285 | ||
1962 | 1288 | if (!bpage) { | 1286 | if (!bpage) { |
1963 | 1289 | 1287 | ||
1964 | 1290 | buf_pool_mutex_exit(buf_pool); | ||
1965 | 1291 | continue; | 1288 | continue; |
1966 | 1292 | } | 1289 | } |
1967 | 1293 | 1290 | ||
1968 | 1291 | block_mutex = buf_page_get_mutex(bpage); | ||
1969 | 1292 | |||
1970 | 1293 | mutex_enter(block_mutex); | ||
1971 | 1294 | |||
1972 | 1295 | rw_lock_s_unlock(hash_lock); | ||
1973 | 1296 | |||
1974 | 1294 | ut_a(buf_page_in_file(bpage)); | 1297 | ut_a(buf_page_in_file(bpage)); |
1975 | 1295 | 1298 | ||
1976 | 1296 | /* We avoid flushing 'non-old' blocks in an LRU flush, | 1299 | /* We avoid flushing 'non-old' blocks in an LRU flush, |
1977 | @@ -1299,9 +1302,6 @@ | |||
1978 | 1299 | if (flush_type != BUF_FLUSH_LRU | 1302 | if (flush_type != BUF_FLUSH_LRU |
1979 | 1300 | || i == offset | 1303 | || i == offset |
1980 | 1301 | || buf_page_is_old(bpage)) { | 1304 | || buf_page_is_old(bpage)) { |
1981 | 1302 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); | ||
1982 | 1303 | |||
1983 | 1304 | mutex_enter(block_mutex); | ||
1984 | 1305 | 1305 | ||
1985 | 1306 | if (buf_flush_ready_for_flush(bpage, flush_type) | 1306 | if (buf_flush_ready_for_flush(bpage, flush_type) |
1986 | 1307 | && (i == offset || !bpage->buf_fix_count)) { | 1307 | && (i == offset || !bpage->buf_fix_count)) { |
1987 | @@ -1316,14 +1316,12 @@ | |||
1988 | 1316 | 1316 | ||
1989 | 1317 | buf_flush_page(buf_pool, bpage, flush_type, false); | 1317 | buf_flush_page(buf_pool, bpage, flush_type, false); |
1990 | 1318 | ut_ad(!mutex_own(block_mutex)); | 1318 | ut_ad(!mutex_own(block_mutex)); |
1991 | 1319 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
1992 | 1320 | count++; | 1319 | count++; |
1993 | 1321 | continue; | 1320 | continue; |
1994 | 1322 | } else { | ||
1995 | 1323 | mutex_exit(block_mutex); | ||
1996 | 1324 | } | 1321 | } |
1997 | 1325 | } | 1322 | } |
1999 | 1326 | buf_pool_mutex_exit(buf_pool); | 1323 | |
2000 | 1324 | mutex_exit(block_mutex); | ||
2001 | 1327 | } | 1325 | } |
2002 | 1328 | 1326 | ||
2003 | 1329 | if (count > 0) { | 1327 | if (count > 0) { |
2004 | @@ -1341,8 +1339,9 @@ | |||
2005 | 1341 | Check if the block is modified and ready for flushing. If the block | 1339 |
2006 | 1342 | is ready to flush then flush the page and try to flush its neighbors. | 1340 |
2007 | 1343 | 1341 | ||
2010 | 1344 | @return TRUE if buf_pool mutex was released during this function. | 1342 | @return TRUE if, depending on the flush type, either LRU or flush list |
2011 | 1345 | This does not guarantee that some pages were written as well. | 1343 | mutex was released during this function. This does not guarantee that some |
2012 | 1344 | pages were written as well. | ||
2013 | 1346 | Number of pages written are incremented to the count. */ | 1345 | Number of pages written are incremented to the count. */ |
2014 | 1347 | static | 1346 | static |
2015 | 1348 | ibool | 1347 | ibool |
2016 | @@ -1358,16 +1357,21 @@ | |||
2017 | 1358 | ulint* count) /*!< in/out: number of pages | 1357 | ulint* count) /*!< in/out: number of pages |
2018 | 1359 | flushed */ | 1358 | flushed */ |
2019 | 1360 | { | 1359 | { |
2021 | 1361 | ib_mutex_t* block_mutex; | 1360 | ib_mutex_t* block_mutex = NULL; |
2022 | 1362 | ibool flushed = FALSE; | 1361 | ibool flushed = FALSE; |
2023 | 1363 | #ifdef UNIV_DEBUG | 1362 | #ifdef UNIV_DEBUG |
2024 | 1364 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 1363 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
2025 | 1365 | #endif /* UNIV_DEBUG */ | 1364 | #endif /* UNIV_DEBUG */ |
2026 | 1366 | 1365 | ||
2028 | 1367 | ut_ad(buf_pool_mutex_own(buf_pool)); | 1366 | ut_ad((flush_type == BUF_FLUSH_LRU |
2029 | 1367 | && mutex_own(&buf_pool->LRU_list_mutex)) | ||
2030 | 1368 | || (flush_type == BUF_FLUSH_LIST | ||
2031 | 1369 | && buf_flush_list_mutex_own(buf_pool))); | ||
2032 | 1368 | 1370 | ||
2035 | 1369 | block_mutex = buf_page_get_mutex(bpage); | 1371 | if (flush_type == BUF_FLUSH_LRU) { |
2036 | 1370 | mutex_enter(block_mutex); | 1372 | block_mutex = buf_page_get_mutex(bpage); |
2037 | 1373 | mutex_enter(block_mutex); | ||
2038 | 1374 | } | ||
2039 | 1371 | 1375 | ||
2040 | 1372 | ut_a(buf_page_in_file(bpage)); | 1376 | ut_a(buf_page_in_file(bpage)); |
2041 | 1373 | 1377 | ||
2042 | @@ -1378,14 +1382,20 @@ | |||
2043 | 1378 | 1382 | ||
2044 | 1379 | buf_pool = buf_pool_from_bpage(bpage); | 1383 | buf_pool = buf_pool_from_bpage(bpage); |
2045 | 1380 | 1384 | ||
2047 | 1381 | buf_pool_mutex_exit(buf_pool); | 1385 | if (flush_type == BUF_FLUSH_LRU) { |
2048 | 1386 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
2049 | 1387 | } | ||
2050 | 1382 | 1388 | ||
2053 | 1383 | /* These fields are protected by both the | 1389 | /* These fields are protected by the buf_page_get_mutex() |
2054 | 1384 | buffer pool mutex and block mutex. */ | 1390 | mutex. */ |
2055 | 1385 | space = buf_page_get_space(bpage); | 1391 | space = buf_page_get_space(bpage); |
2056 | 1386 | offset = buf_page_get_page_no(bpage); | 1392 | offset = buf_page_get_page_no(bpage); |
2057 | 1387 | 1393 | ||
2059 | 1388 | mutex_exit(block_mutex); | 1394 | if (flush_type == BUF_FLUSH_LRU) { |
2060 | 1395 | mutex_exit(block_mutex); | ||
2061 | 1396 | } else { | ||
2062 | 1397 | buf_flush_list_mutex_exit(buf_pool); | ||
2063 | 1398 | } | ||
2064 | 1389 | 1399 | ||
2065 | 1390 | /* Try to flush also all the neighbors */ | 1400 | /* Try to flush also all the neighbors */ |
2066 | 1391 | *count += buf_flush_try_neighbors(space, | 1401 | *count += buf_flush_try_neighbors(space, |
2067 | @@ -1394,13 +1404,20 @@ | |||
2068 | 1394 | *count, | 1404 | *count, |
2069 | 1395 | n_to_flush); | 1405 | n_to_flush); |
2070 | 1396 | 1406 | ||
2072 | 1397 | buf_pool_mutex_enter(buf_pool); | 1407 | if (flush_type == BUF_FLUSH_LRU) { |
2073 | 1408 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
2074 | 1409 | } else { | ||
2075 | 1410 | buf_flush_list_mutex_enter(buf_pool); | ||
2076 | 1411 | } | ||
2077 | 1398 | flushed = TRUE; | 1412 | flushed = TRUE; |
2079 | 1399 | } else { | 1413 | } else if (flush_type == BUF_FLUSH_LRU) { |
2080 | 1400 | mutex_exit(block_mutex); | 1414 | mutex_exit(block_mutex); |
2081 | 1401 | } | 1415 | } |
2082 | 1402 | 1416 | ||
2084 | 1403 | ut_ad(buf_pool_mutex_own(buf_pool)); | 1417 | ut_ad((flush_type == BUF_FLUSH_LRU |
2085 | 1418 | && mutex_own(&buf_pool->LRU_list_mutex)) | ||
2086 | 1419 | || (flush_type == BUF_FLUSH_LIST | ||
2087 | 1420 | && buf_flush_list_mutex_own(buf_pool))); | ||
2088 | 1404 | 1421 | ||
2089 | 1405 | return(flushed); | 1422 | return(flushed); |
 	}
@@ -1428,22 +1445,31 @@
 	ulint	free_len = UT_LIST_GET_LEN(buf_pool->free);
 	ulint	lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 
 	block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
 	while (block != NULL && count < max
 	       && free_len < srv_LRU_scan_depth
 	       && lru_len > UT_LIST_GET_LEN(buf_pool->LRU) / 10) {
 
+		ib_mutex_t*	block_mutex = buf_page_get_mutex(&block->page);
+
 		++scanned;
+
+		mutex_enter(block_mutex);
+
 		if (buf_LRU_free_page(&block->page, false)) {
-			/* Block was freed. buf_pool->mutex potentially
+
+			mutex_exit(block_mutex);
+			/* Block was freed. LRU list mutex potentially
 			released and reacquired */
 			++count;
+			mutex_enter(&buf_pool->LRU_list_mutex);
 			block = UT_LIST_GET_LAST(buf_pool->unzip_LRU);
 
 		} else {
 
+			mutex_exit(block_mutex);
			block = UT_LIST_GET_PREV(unzip_LRU, block);
 		}
 
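The hunk above makes the unzip_LRU scan hold the per-block mutex across `buf_LRU_free_page()` and re-acquire the LRU list mutex after a successful free (the callee drops it). A minimal stand-alone model of that discipline, with hypothetical names (`FakeMutex`, `scan_unzip_lru`) that are not InnoDB APIs:

```cpp
#include <cassert>
#include <list>

// Toy lock that counts nesting so a test can verify lock balance.
struct FakeMutex {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { --depth; }
};

struct Block {
    FakeMutex mutex;        // models buf_page_get_mutex()
    bool      freeable = false;
};

// Models the rewritten loop body: take the block mutex, attempt the
// free; on success the callee dropped the list mutex, so the caller
// re-acquires it and restarts from the list tail. Returns frees done.
int scan_unzip_lru(std::list<Block*>& lru, FakeMutex& lru_list_mutex)
{
    int count = 0;
    lru_list_mutex.lock();
    while (!lru.empty()) {
        Block* block = lru.back();
        block->mutex.lock();
        if (block->freeable) {
            // buf_LRU_free_page() analogue: releases the list mutex.
            lru.pop_back();
            lru_list_mutex.unlock();
            block->mutex.unlock();
            ++count;
            lru_list_mutex.lock();  // reacquire before touching the list
        } else {
            block->mutex.unlock();
            break;                  // stop at first unfreeable block
        }
    }
    lru_list_mutex.unlock();
    return count;
}
```

The key invariant the test below checks is that the list mutex ends up balanced even though ownership bounces between caller and callee.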
@@ -1451,7 +1477,7 @@
 		lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);
 	}
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 
 	if (scanned) {
 		MONITOR_INC_VALUE_CUMULATIVE(
@@ -1485,7 +1511,7 @@
 	ulint	free_len = UT_LIST_GET_LEN(buf_pool->free);
 	ulint	lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 
 	bpage = UT_LIST_GET_LAST(buf_pool->LRU);
 	while (bpage != NULL && count < max
@@ -1494,13 +1520,20 @@
 
 		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
 		ibool		evict;
-
-		mutex_enter(block_mutex);
-		evict = buf_flush_ready_for_replace(bpage);
-		mutex_exit(block_mutex);
+		ulint		failed_acquire;
 
 		++scanned;
 
+		failed_acquire = mutex_enter_nowait(block_mutex);
+
+		evict = UNIV_LIKELY(!failed_acquire)
+			&& buf_flush_ready_for_replace(bpage);
+
+		if (UNIV_LIKELY(!failed_acquire) && !evict) {
+
+			mutex_exit(block_mutex);
+		}
+
 		/* If the block is ready to be replaced we try to
 		free it i.e.: put it on the free list.
 		Otherwise we try to flush the block and its
@@ -1514,28 +1547,35 @@
 		O(n*n). */
 		if (evict) {
 			if (buf_LRU_free_page(bpage, true)) {
-				/* buf_pool->mutex was potentially
-				released and reacquired. */
+
+				mutex_exit(block_mutex);
+				mutex_enter(&buf_pool->LRU_list_mutex);
 				bpage = UT_LIST_GET_LAST(buf_pool->LRU);
 			} else {
+
 				bpage = UT_LIST_GET_PREV(LRU, bpage);
+				mutex_exit(block_mutex);
 			}
-		} else if (buf_flush_page_and_try_neighbors(
+		} else if (UNIV_LIKELY(!failed_acquire)) {
+
+			if (buf_flush_page_and_try_neighbors(
 				bpage,
 				BUF_FLUSH_LRU, max, &count)) {
 
-			/* buf_pool->mutex was released.
+			/* LRU list mutex was released.
 			Restart the scan. */
 			bpage = UT_LIST_GET_LAST(buf_pool->LRU);
 		} else {
-			bpage = UT_LIST_GET_PREV(LRU, bpage);
+
+			bpage = UT_LIST_GET_PREV(LRU, bpage);
+			}
 		}
 
 		free_len = UT_LIST_GET_LEN(buf_pool->free);
 		lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
 	}
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 
 	/* We keep track of all flushes happening as part of LRU
 	flush. When estimating the desired rate at which flush_list
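The `mutex_enter_nowait()` change above means a thread holding the LRU list mutex never blocks on a contended block mutex; a contended page is simply treated as not evictable this round. A stand-alone sketch of that try-lock decision, modeling the nowait acquire with `std::atomic_flag` (the names `NowaitMutex` and `try_mark_for_eviction` are illustrative, not the InnoDB primitives):

```cpp
#include <atomic>
#include <cassert>

// Minimal try-lock, standing in for mutex_enter_nowait(): returns 0 on
// success (lock taken), nonzero on failure, like the InnoDB call.
struct NowaitMutex {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
    int  enter_nowait() { return locked.test_and_set() ? 1 : 0; }
    void exit()         { locked.clear(); }
};

struct Page {
    NowaitMutex mutex;
    bool        clean = false;   // ready for replacement when clean
};

// Models the decision in the rewritten loop: evict only when the block
// mutex was acquired without waiting AND the page is replaceable; if we
// got the mutex but will not evict, release it at once.
bool try_mark_for_eviction(Page& page)
{
    int  failed_acquire = page.mutex.enter_nowait();
    bool evict = !failed_acquire && page.clean;
    if (!failed_acquire && !evict) {
        page.mutex.exit();       // got the lock but nothing to do
    }
    return evict;                // caller exits the mutex after evicting
}
```

On success the caller still owns the block mutex, mirroring how the patch keeps it held into `buf_LRU_free_page()`.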
@@ -1604,8 +1644,6 @@
 	ulint	count = 0;
 	ulint	scanned = 0;
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
-
 	/* Start from the end of the list looking for a suitable
 	block to be flushed. */
 	buf_flush_list_mutex_enter(buf_pool);
@@ -1629,16 +1667,12 @@
 		prev = UT_LIST_GET_PREV(list, bpage);
 		buf_flush_set_hp(buf_pool, prev);
 
-		buf_flush_list_mutex_exit(buf_pool);
-
 #ifdef UNIV_DEBUG
 		bool flushed =
 #endif /* UNIV_DEBUG */
 			buf_flush_page_and_try_neighbors(
 				bpage, BUF_FLUSH_LIST, min_n, &count);
 
-		buf_flush_list_mutex_enter(buf_pool);
-
 		ut_ad(flushed || buf_flush_is_hp(buf_pool, prev));
 
 		if (!buf_flush_is_hp(buf_pool, prev)) {
@@ -1663,8 +1697,6 @@
 		MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,
 		scanned);
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
-
 	return(count);
 }
 
@@ -1701,13 +1733,13 @@
 	       || sync_thread_levels_empty_except_dict());
 #endif /* UNIV_SYNC_DEBUG */
 
-	buf_pool_mutex_enter(buf_pool);
-
-	/* Note: The buffer pool mutex is released and reacquired within
+	/* Note: The buffer pool mutexes are released and reacquired within
 	the flush functions. */
 	switch (flush_type) {
 	case BUF_FLUSH_LRU:
+		mutex_enter(&buf_pool->LRU_list_mutex);
 		count = buf_do_LRU_batch(buf_pool, min_n);
+		mutex_exit(&buf_pool->LRU_list_mutex);
 		break;
 	case BUF_FLUSH_LIST:
 		count = buf_do_flush_list_batch(buf_pool, min_n, lsn_limit);
@@ -1716,8 +1748,6 @@
 		ut_error;
 	}
 
-	buf_pool_mutex_exit(buf_pool);
-
 #ifdef UNIV_DEBUG
 	if (buf_debug_prints && count > 0) {
 		fprintf(stderr, flush_type == BUF_FLUSH_LRU
@@ -1765,21 +1795,21 @@
 	buf_flush_t	flush_type)	/*!< in: BUF_FLUSH_LRU
 					or BUF_FLUSH_LIST */
 {
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->flush_state_mutex);
 
 	if (buf_pool->n_flush[flush_type] > 0
 	    || buf_pool->init_flush[flush_type] == TRUE) {
 
 		/* There is already a flush batch of the same type running */
 
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->flush_state_mutex);
 
 		return(FALSE);
 	}
 
 	buf_pool->init_flush[flush_type] = TRUE;
 
-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->flush_state_mutex);
 
 	return(TRUE);
 }
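The `buf_flush_start()`/`buf_flush_end()` hunks above move the batch bookkeeping (`init_flush[]`, `n_flush[]`) under the new, narrow `flush_state_mutex`, so starting or finishing a batch no longer contends with page-list traffic. A minimal sketch of that guard (hypothetical `FlushState` class, not the InnoDB type):

```cpp
#include <cassert>
#include <mutex>

// Sketch of the buf_flush_start()/buf_flush_end() pairing after the
// split: a small flush_state_mutex only serializes batch bookkeeping,
// never any page list.
enum flush_type { FLUSH_LRU = 0, FLUSH_LIST = 1 };

class FlushState {
    std::mutex flush_state_mutex;
    bool       init_flush[2] = {false, false};
public:
    // Returns false if a batch of this type is already running.
    bool start(flush_type t) {
        std::lock_guard<std::mutex> g(flush_state_mutex);
        if (init_flush[t]) {
            return false;
        }
        init_flush[t] = true;
        return true;
    }
    void end(flush_type t) {
        std::lock_guard<std::mutex> g(flush_state_mutex);
        init_flush[t] = false;
    }
};
```

Two batch types stay independent: an LRU batch in flight does not block a flush-list batch from starting.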
@@ -1794,7 +1824,7 @@
 	buf_flush_t	flush_type)	/*!< in: BUF_FLUSH_LRU
 					or BUF_FLUSH_LIST */
 {
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->flush_state_mutex);
 
 	buf_pool->init_flush[flush_type] = FALSE;
 
@@ -1807,7 +1837,7 @@
 		os_event_set(buf_pool->no_flush[flush_type]);
 	}
 
-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->flush_state_mutex);
 }
 
 /******************************************************************//**
@@ -1989,7 +2019,7 @@
 	ibool		freed;
 	bool		evict_zip;
 
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
 
 	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU), scanned = 1;
 	     bpage != NULL;
@@ -2006,6 +2036,8 @@
 		mutex_exit(block_mutex);
 	}
 
+	mutex_exit(&buf_pool->LRU_list_mutex);
+
 	MONITOR_INC_VALUE_CUMULATIVE(
 		MONITOR_LRU_SINGLE_FLUSH_SCANNED,
 		MONITOR_LRU_SINGLE_FLUSH_SCANNED_NUM_CALL,
@@ -2014,22 +2046,20 @@
 
 	if (!bpage) {
 		/* Can't find a single flushable page. */
-		buf_pool_mutex_exit(buf_pool);
 		return(FALSE);
 	}
 
-	/* The following call will release the buffer pool and
-	block mutex. */
+	/* The following call will release the buf_page_get_mutex() mutex. */
 	buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, true);
 
 	/* At this point the page has been written to the disk.
-	As we are not holding buffer pool or block mutex therefore
+	As we are not holding LRU list or buf_page_get_mutex() mutex therefore
 	we cannot use the bpage safely. It may have been plucked out
 	of the LRU list by some other thread or it may even have
 	relocated in case of a compressed page. We need to start
 	the scan of LRU list again to remove the block from the LRU
 	list and put it on the free list. */
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
 
 	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU);
 	     bpage != NULL;
@@ -2040,23 +2070,25 @@
 		block_mutex = buf_page_get_mutex(bpage);
 		mutex_enter(block_mutex);
 		ready = buf_flush_ready_for_replace(bpage);
-		mutex_exit(block_mutex);
 		if (ready) {
 			break;
 		}
+		mutex_exit(block_mutex);
 
 	}
 
 	if (!bpage) {
 		/* Can't find a single replaceable page. */
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->LRU_list_mutex);
 		return(FALSE);
 	}
 
 	evict_zip = !buf_LRU_evict_from_unzip_LRU(buf_pool);;
 
 	freed = buf_LRU_free_page(bpage, evict_zip);
-	buf_pool_mutex_exit(buf_pool);
+	if (!freed)
+		mutex_exit(&buf_pool->LRU_list_mutex);
+	mutex_exit(block_mutex);
 
 	return(freed);
 }
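The tail of the hunk above shows an asymmetric unlock: `buf_LRU_free_page()` releases the LRU list mutex itself when it frees the page, so the caller releases it only on failure, while the block mutex is always the caller's to release. A stand-alone model of that conditional-unlock contract, with hypothetical names (`CountingMutex`, `free_page`, `evict_one`):

```cpp
#include <cassert>

// Toy lock that counts nesting so a test can verify lock balance.
struct CountingMutex {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { --depth; }
};

// Analogue of buf_LRU_free_page(): on success it releases the list
// mutex itself (the real function may relocate pages and drop latches).
bool free_page(CountingMutex& lru_list_mutex, bool can_free)
{
    if (can_free) {
        lru_list_mutex.unlock();
        return true;
    }
    return false;
}

// Caller-side pattern from the hunk above: unlock the list mutex only
// when the callee did not already do so; always unlock the block mutex.
bool evict_one(CountingMutex& lru_list_mutex, CountingMutex& block_mutex,
               bool can_free)
{
    lru_list_mutex.lock();
    block_mutex.lock();
    bool freed = free_page(lru_list_mutex, can_free);
    if (!freed) {
        lru_list_mutex.unlock();
    }
    block_mutex.unlock();
    return freed;
}
```

Either way, both locks end up released exactly once; misreading this contract is the classic source of double-unlock or leaked-latch bugs in such splits.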
@@ -2082,9 +2114,7 @@
 
 	/* srv_LRU_scan_depth can be arbitrarily large value.
 	We cap it with current LRU size. */
-	buf_pool_mutex_enter(buf_pool);
 	scan_depth = UT_LIST_GET_LEN(buf_pool->LRU);
-	buf_pool_mutex_exit(buf_pool);
 
 	scan_depth = ut_min(srv_LRU_scan_depth, scan_depth);
 
@@ -2143,15 +2173,15 @@
 
 		buf_pool = buf_pool_from_array(i);
 
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->flush_state_mutex);
 
 		if (buf_pool->n_flush[BUF_FLUSH_LRU] > 0
 		    || buf_pool->init_flush[BUF_FLUSH_LRU]) {
 
-			buf_pool_mutex_exit(buf_pool);
+			mutex_exit(&buf_pool->flush_state_mutex);
 			buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
 		} else {
-			buf_pool_mutex_exit(buf_pool);
+			mutex_exit(&buf_pool->flush_state_mutex);
 		}
 	}
 }
@@ -2629,7 +2659,6 @@
 {
 	ulint	count = 0;
 
-	buf_pool_mutex_enter(buf_pool);
 	buf_flush_list_mutex_enter(buf_pool);
 
 	buf_page_t*	bpage;
@@ -2648,7 +2677,6 @@
 	}
 
 	buf_flush_list_mutex_exit(buf_pool);
-	buf_pool_mutex_exit(buf_pool);
 
 	return(count);
 }
 
=== modified file 'Percona-Server/storage/innobase/buf/buf0lru.cc'
--- Percona-Server/storage/innobase/buf/buf0lru.cc	2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/buf/buf0lru.cc	2013-09-20 05:29:11 +0000
@@ -75,7 +75,7 @@
 /** When dropping the search hash index entries before deleting an ibd
 file, we build a local array of pages belonging to that tablespace
 in the buffer pool. Following is the size of that array.
-We also release buf_pool->mutex after scanning this many pages of the
+We also release buf_pool->LRU_list_mutex after scanning this many pages of the
 flush_list when dropping a table. This is to ensure that other threads
 are not blocked for extended period of time when using very large
 buffer pools. */
@@ -133,7 +133,7 @@
 If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),
 the object will be freed.
 
-The caller must hold buf_pool->mutex, the buf_page_get_mutex() mutex
+The caller must hold buf_pool->LRU_list_mutex, the buf_page_get_mutex() mutex
 and the appropriate hash_lock. This function will release the
 buf_page_get_mutex() and the hash_lock.
 
@@ -170,7 +170,7 @@
 	buf_page_t*	bpage,		/*!< in: control block */
 	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ulint zip_size = page_zip_get_size(&bpage->zip);
 	buf_pool->stat.LRU_bytes += zip_size ? zip_size : UNIV_PAGE_SIZE;
 	ut_ad(buf_pool->stat.LRU_bytes <= buf_pool->curr_pool_size);
@@ -189,7 +189,7 @@
 	ulint	io_avg;
 	ulint	unzip_avg;
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 
 	/* If the unzip_LRU list is empty, we can only use the LRU. */
 	if (UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0) {
@@ -276,7 +276,7 @@
 	page_arr = static_cast<ulint*>(ut_malloc(
 		sizeof(ulint) * BUF_LRU_DROP_SEARCH_SIZE));
 
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
 	num_entries = 0;
 
 scan_again:
@@ -285,6 +285,7 @@
 	while (bpage != NULL) {
 		buf_page_t*	prev_bpage;
 		ibool		is_fixed;
+		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
 
 		prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
 
@@ -301,10 +302,10 @@
 			continue;
 		}
 
-		mutex_enter(&((buf_block_t*) bpage)->mutex);
+		mutex_enter(block_mutex);
 		is_fixed = bpage->buf_fix_count > 0
 			|| !((buf_block_t*) bpage)->index;
-		mutex_exit(&((buf_block_t*) bpage)->mutex);
+		mutex_exit(block_mutex);
 
 		if (is_fixed) {
 			goto next_page;
@@ -320,18 +321,18 @@
 			goto next_page;
 		}
 
-		/* Array full. We release the buf_pool->mutex to obey
+		/* Array full. We release the buf_pool->LRU_list_mutex to obey
 		the latching order. */
-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->LRU_list_mutex);
 
 		buf_LRU_drop_page_hash_batch(
 			id, zip_size, page_arr, num_entries);
 
 		num_entries = 0;
 
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->LRU_list_mutex);
 
-		/* Note that we released the buf_pool mutex above
+		/* Note that we released the buf_pool->LRU_list_mutex above
 		after reading the prev_bpage during processing of a
 		page_hash_batch (i.e.: when the array was full).
 		Because prev_bpage could belong to a compressed-only
@@ -345,15 +346,15 @@
 		guarantee that ALL such entries will be dropped. */
 
 		/* If, however, bpage has been removed from LRU list
-		to the free list then we should restart the scan.
-		bpage->state is protected by buf_pool mutex. */
+		to the free list then we should restart the scan. */
+
 		if (bpage
 		    && buf_page_get_state(bpage) != BUF_BLOCK_FILE_PAGE) {
 			goto scan_again;
 		}
 	}
 
-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->LRU_list_mutex);
 
 	/* Drop any remaining batch of search hashed pages. */
 	buf_LRU_drop_page_hash_batch(id, zip_size, page_arr, num_entries);
@@ -373,27 +374,25 @@
 	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
 	buf_page_t*	bpage)		/*!< in/out: current page */
 {
-	ib_mutex_t*	block_mutex;
+	ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
 
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
+	ut_ad(mutex_own(block_mutex));
 	ut_ad(buf_page_in_file(bpage));
 
-	block_mutex = buf_page_get_mutex(bpage);
-
-	mutex_enter(block_mutex);
 	/* "Fix" the block so that the position cannot be
 	changed after we release the buffer pool and
 	block mutexes. */
 	buf_page_set_sticky(bpage);
 
-	/* Now it is safe to release the buf_pool->mutex. */
-	buf_pool_mutex_exit(buf_pool);
+	/* Now it is safe to release the LRU list mutex */
+	mutex_exit(&buf_pool->LRU_list_mutex);
 
 	mutex_exit(block_mutex);
 	/* Try and force a context switch. */
 	os_thread_yield();
 
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);
 
 	mutex_enter(block_mutex);
 	/* "Unfix" the block now that we have both the
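The `buf_flush_yield()` hunk above pins the page sticky so it cannot be relocated, drops both mutexes, yields, then reacquires them in LRU-list-then-block order before unpinning. A stand-alone model of that pin/yield/reacquire sequence (illustrative `CountingMutex`/`Page` types, not InnoDB's):

```cpp
#include <cassert>
#include <thread>

// Toy lock that counts nesting so a test can verify lock balance.
struct CountingMutex {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { --depth; }
};

struct Page {
    bool sticky = false;   // models buf_page_set_sticky()
};

// Sketch of the rewritten buf_flush_yield(): pin the page so it cannot
// move, drop both mutexes, yield the CPU, then take them back in the
// LRU-list-then-block order and unpin.
void flush_yield(Page& page, CountingMutex& lru_list_mutex,
                 CountingMutex& block_mutex)
{
    // Caller holds both mutexes on entry.
    page.sticky = true;            // position now cannot change
    lru_list_mutex.unlock();
    block_mutex.unlock();
    std::this_thread::yield();     // let waiting threads run
    lru_list_mutex.lock();
    block_mutex.lock();
    page.sticky = false;           // "unfix" with both mutexes held
}
```

The sticky flag is what makes it safe to drop the list mutex mid-scan: no other thread may move or free a pinned page.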
@@ -413,21 +412,47 @@
 /*================*/
 	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
 	buf_page_t*	bpage,		/*!< in/out: bpage to remove */
-	ulint		processed)	/*!< in: number of pages processed */
+	ulint		processed,	/*!< in: number of pages processed */
+	bool*		must_restart)	/*!< in/out: if true, we have to
+					restart the flush list scan */
 {
 	/* Every BUF_LRU_DROP_SEARCH_SIZE iterations in the
-	loop we release buf_pool->mutex to let other threads
+	loop we release buf_pool->LRU_list_mutex to let other threads
 	do their job but only if the block is not IO fixed. This
 	ensures that the block stays in its position in the
 	flush_list. */
 
 	if (bpage != NULL
 	    && processed >= BUF_LRU_DROP_SEARCH_SIZE
-	    && buf_page_get_io_fix(bpage) == BUF_IO_NONE) {
+	    && buf_page_get_io_fix_unlocked(bpage) == BUF_IO_NONE) {
+
+		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
 
 		buf_flush_list_mutex_exit(buf_pool);
 
-		/* Release the buffer pool and block mutex
+		/* We don't have to worry about bpage becoming a dangling
+		pointer by a compressed page flush list relocation because
+		buf_page_get_gen() won't be called for pages from this
+		tablespace. */
+
+		mutex_enter(block_mutex);
+		/* Recheck the I/O fix and the flush list presence now that we
+		hold the right mutex */
+		if (UNIV_UNLIKELY(buf_page_get_io_fix(bpage) != BUF_IO_NONE
+				  || bpage->oldest_modification == 0)) {
+
+			mutex_exit(block_mutex);
+
+			*must_restart = true;
+
+			buf_flush_list_mutex_enter(buf_pool);
+
+			return false;
+		}
+
+		*must_restart = false;
+
+		/* Release the LRU list and buf_page_get_mutex() mutex
 		to give the other threads a go. */
 
 		buf_flush_yield(buf_pool, bpage);
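The recheck added above is a validate-after-relock: the unlocked `io_fix` read is only a hint, so after taking the block mutex the code must confirm the page is still not I/O-fixed and still on the flush list, otherwise any saved flush-list pointers are stale and the scan restarts via `must_restart`. A stand-alone sketch of that predicate (hypothetical `Page`/`still_safe_to_yield`, not the InnoDB API):

```cpp
#include <cassert>

enum io_fix { IO_NONE, IO_READ, IO_WRITE };

struct Page {
    io_fix        fix = IO_NONE;
    unsigned long oldest_modification = 0;   // 0 = not on flush list
};

// Sketch of the recheck the hunk adds: after reacquiring the block
// mutex, the page may have been I/O-fixed or removed from the flush
// list by another thread, so any saved flush-list pointers are stale
// and the scan must restart.
bool still_safe_to_yield(const Page& page, bool* must_restart)
{
    if (page.fix != IO_NONE || page.oldest_modification == 0) {
        *must_restart = true;    // pointers invalidated, restart scan
        return false;
    }
    *must_restart = false;
    return true;
}
```

This double-check is what lets the fast path use the cheaper unlocked read without ever acting on a stale result.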
@@ -456,18 +481,22 @@
 /*=====================*/
 	buf_pool_t*	buf_pool,	/*!< in/out: buffer pool instance */
 	buf_page_t*	bpage,		/*!< in/out: bpage to remove */
-	bool		flush)		/*!< in: flush to disk if true but
+	bool		flush,		/*!< in: flush to disk if true but
 					don't remove else remove without
 					flushing to disk */
+	bool*		must_restart)	/*!< in/out: if true, must restart the
+					flush list scan */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
+
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_ad(buf_flush_list_mutex_own(buf_pool));

-	/* bpage->space and bpage->io_fix are protected by
-	buf_pool->mutex and block_mutex. It is safe to check
-	them while holding buf_pool->mutex only. */
+	/* It is safe to check bpage->space and bpage->io_fix while holding
+	buf_pool->LRU_list_mutex only. */

-	if (buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
+	if (UNIV_UNLIKELY(buf_page_get_io_fix_unlocked(bpage)
+			  != BUF_IO_NONE)) {

 		/* We cannot remove this page during this scan
 		yet; maybe the system is currently reading it
@@ -476,22 +505,31 @@

 	}

-	ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
 	bool		processed = false;

-	/* We have to release the flush_list_mutex to obey the
-	latching order. We are however guaranteed that the page
-	will stay in the flush_list and won't be relocated because
-	buf_flush_remove() and buf_flush_relocate_on_flush_list()
-	need buf_pool->mutex as well. */
-
 	buf_flush_list_mutex_exit(buf_pool);

+	/* We don't have to worry about bpage becoming a dangling
+	pointer by a compressed page flush list relocation because
+	buf_page_get_gen() won't be called for pages from this
+	tablespace. */
+
 	mutex_enter(block_mutex);

-	ut_ad(bpage->oldest_modification != 0);
-
-	if (!flush) {
+	/* Recheck the page I/O fix and the flush list presence now
+	that we hold the right mutex. */
+	if (UNIV_UNLIKELY(buf_page_get_io_fix(bpage) != BUF_IO_NONE
+			  || bpage->oldest_modification == 0)) {
+
+		/* The page became I/O-fixed or is not on the flush
+		list anymore, this invalidates any flush-list-page
+		pointers we have. */
+
+		mutex_exit(block_mutex);
+
+		*must_restart = TRUE;
+
+	} else if (!flush) {

 		buf_flush_remove(bpage);

@@ -502,8 +540,10 @@
 	} else if (buf_flush_ready_for_flush(bpage,
 					     BUF_FLUSH_SINGLE_PAGE)) {

-		/* The following call will release the buffer pool
-		and block mutex. */
+		mutex_exit(&buf_pool->LRU_list_mutex);
+
+		/* The following call will release the buf_page_get_mutex()
+		mutex. */
 		buf_flush_page(buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, false);
 		ut_ad(!mutex_own(block_mutex));

@@ -511,7 +551,7 @@
 		post the writes to the operating system */
 		os_aio_simulated_wake_handler_threads();

-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->LRU_list_mutex);

 		processed = true;
 	} else {
@@ -525,7 +565,7 @@
 	buf_flush_list_mutex_enter(buf_pool);

 	ut_ad(!mutex_own(block_mutex));
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	return(processed);
 }
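The rewritten `buf_flush_or_remove_page()` above replaces a single check under the big buffer pool mutex with an optimistic check made without the block mutex, a recheck once the right mutex is held, and a `must_restart` flag telling the caller its list pointers went stale. A minimal generic sketch of that pattern, using illustrative names only (`page_t`, `flush_or_remove` are not the InnoDB API):

```cpp
// Sketch of "unlocked optimistic check, then recheck under the right
// mutex, signal restart on a stale snapshot". Hypothetical types.
#include <atomic>
#include <cassert>
#include <mutex>

struct page_t {
	std::atomic<int>  io_fix{0};               // 0 == no I/O in progress
	std::atomic<long> oldest_modification{1};  // != 0 == on the flush list
	std::mutex        block_mutex;             // per-block mutex
};

// Returns true if the page was processed; sets *must_restart when the
// optimistic snapshot turned out to be stale under the block mutex.
bool flush_or_remove(page_t* page, bool* must_restart)
{
	*must_restart = false;

	// Optimistic check without the block mutex: cheap, may be stale.
	if (page->io_fix.load() != 0) {
		return false;  // skip for now; a later scan will retry
	}

	std::lock_guard<std::mutex> guard(page->block_mutex);

	// Recheck now that we hold the right mutex: the state may have
	// changed between the unlocked check and the lock acquisition.
	if (page->io_fix.load() != 0
	    || page->oldest_modification.load() == 0) {
		*must_restart = true;  // our flush-list pointers are invalid
		return false;
	}

	page->oldest_modification.store(0);  // "remove from flush list"
	return true;
}
```

The unlocked read is only a filter; correctness rests entirely on the recheck under the mutex, which is why the recheck repeats both conditions.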
@@ -558,6 +598,7 @@
 	buf_flush_list_mutex_enter(buf_pool);

 rescan:
+	bool	must_restart = false;
 	bool	all_freed = true;

 	for (bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
@@ -576,15 +617,16 @@
 			/* Skip this block, as it does not belong to
 			the target space. */

-		} else if (!buf_flush_or_remove_page(buf_pool, bpage, flush)) {
+		} else if (!buf_flush_or_remove_page(buf_pool, bpage, flush,
+						     &must_restart)) {

 			/* Remove was unsuccessful, we have to try again
 			by scanning the entire list from the end.
 			This also means that we never released the
-			buf_pool mutex. Therefore we can trust the prev
+			flush list mutex. Therefore we can trust the prev
 			pointer.
 			buf_flush_or_remove_page() released the
-			flush list mutex but not the buf_pool mutex.
+			flush list mutex but not the LRU list mutex.
 			Therefore it is possible that a new page was
 			added to the flush list. For example, in case
 			where we are at the head of the flush list and
@@ -602,17 +644,23 @@
 		} else if (flush) {

 			/* The processing was successful. And during the
-			processing we have released the buf_pool mutex
+			processing we have released all the buf_pool mutexes
 			when calling buf_page_flush(). We cannot trust
 			prev pointer. */
 			goto rescan;
+		} else if (UNIV_UNLIKELY(must_restart)) {
+
+			ut_ad(!all_freed);
+			break;
 		}

 		++processed;

 		/* Yield if we have hogged the CPU and mutexes for too long. */
-		if (buf_flush_try_yield(buf_pool, prev, processed)) {
+		if (buf_flush_try_yield(buf_pool, prev, processed,
+					&must_restart)) {

+			ut_ad(!must_restart);
 			/* Reset the batch size counter if we had to yield. */

 			processed = 0;
@@ -658,11 +706,11 @@
 	dberr_t	err;

 	do {
-		buf_pool_mutex_enter(buf_pool);
+		mutex_enter(&buf_pool->LRU_list_mutex);

 		err = buf_flush_or_remove_pages(buf_pool, id, flush, trx);

-		buf_pool_mutex_exit(buf_pool);
+		mutex_exit(&buf_pool->LRU_list_mutex);

 		ut_ad(buf_flush_validate(buf_pool));

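The driver loop above now takes and drops `LRU_list_mutex` around each scan pass and repeats until the pass completes without interruption. Stripped of InnoDB specifics, the retry shape is (names here are illustrative, not the real API):

```cpp
// Sketch of a "lock per pass, retry until a pass runs to completion"
// driver loop, as in buf_flush_dirty_pages(). Hypothetical names.
#include <cassert>
#include <mutex>

enum scan_result { SCAN_DONE, SCAN_INTERRUPTED };

// Runs scan_pass() under the list mutex, releasing it between passes so
// other threads can interleave; returns how many passes were needed.
template <typename Scan>
int flush_dirty_pages(std::mutex& lru_list_mutex, Scan scan_pass)
{
	int passes = 0;
	scan_result res;
	do {
		std::unique_lock<std::mutex> guard(lru_list_mutex);
		res = scan_pass();  // a pass may bail out early (restart)
		++passes;
	} while (res == SCAN_INTERRUPTED);
	return passes;
}
```

Releasing the mutex between passes is what makes the `must_restart` machinery in the scan itself necessary: list positions cannot survive the gap.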
@@ -695,7 +743,7 @@
 	ibool		all_freed;

 scan_again:
-	buf_pool_mutex_enter(buf_pool);
+	mutex_enter(&buf_pool->LRU_list_mutex);

 	all_freed = TRUE;

@@ -712,15 +760,16 @@

 		prev_bpage = UT_LIST_GET_PREV(LRU, bpage);

-		/* bpage->space and bpage->io_fix are protected by
-		buf_pool->mutex and the block_mutex. It is safe to check
-		them while holding buf_pool->mutex only. */
+		/* It is safe to check bpage->space and bpage->io_fix while
+		holding buf_pool->LRU_list_mutex only and later recheck
+		while holding the buf_page_get_mutex() mutex. */

 		if (buf_page_get_space(bpage) != id) {
 			/* Skip this block, as it does not belong to
 			the space that is being invalidated. */
 			goto next_page;
-		} else if (buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
+		} else if (UNIV_UNLIKELY(buf_page_get_io_fix_unlocked(bpage)
+					 != BUF_IO_NONE)) {
 			/* We cannot remove this page during this scan
 			yet; maybe the system is currently reading it
 			in, or flushing the modifications to the file */
@@ -738,7 +787,11 @@
 		block_mutex = buf_page_get_mutex(bpage);
 		mutex_enter(block_mutex);

-		if (bpage->buf_fix_count > 0) {
+		if (UNIV_UNLIKELY(
+			    buf_page_get_space(bpage) != id
+			    || bpage->buf_fix_count > 0
+			    || (buf_page_get_io_fix(bpage)
+				!= BUF_IO_NONE))) {

 			mutex_exit(block_mutex);

@@ -772,15 +825,15 @@
 			ulint	page_no;
 			ulint	zip_size;

-			buf_pool_mutex_exit(buf_pool);
+			mutex_exit(&buf_pool->LRU_list_mutex);

 			zip_size = buf_page_get_zip_size(bpage);
 			page_no = buf_page_get_page_no(bpage);

+			mutex_exit(block_mutex);
+
 			rw_lock_x_unlock(hash_lock);

-			mutex_exit(block_mutex);
-
 			/* Note that the following call will acquire
 			and release block->lock X-latch. */

@@ -800,7 +853,10 @@
 		/* Remove from the LRU list. */

 		if (buf_LRU_block_remove_hashed(bpage, true)) {
+
+			mutex_enter(block_mutex);
 			buf_LRU_block_free_hashed_page((buf_block_t*) bpage);
+			mutex_exit(block_mutex);
 		} else {
 			ut_ad(block_mutex == &buf_pool->zip_mutex);
 		}
@@ -817,7 +873,7 @@
 		bpage = prev_bpage;
 	}

-	buf_pool_mutex_exit(buf_pool);
+	mutex_exit(&buf_pool->LRU_list_mutex);

 	if (!all_freed) {
 		os_thread_sleep(20000);
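The invalidation loop above drops `LRU_list_mutex` around the blocking work and then jumps back to `scan_again`, because once the list mutex was released the saved list position can no longer be trusted. The same rescan-after-unlock shape in isolation (illustrative names, `std::list<int>` standing in for the LRU list):

```cpp
// Sketch of "drop the list mutex around blocking work, then rescan from
// the start" as in the LRU scan_again loop. Hypothetical names.
#include <cassert>
#include <list>
#include <mutex>

// Removes all elements equal to id; releases the mutex around the
// (possibly blocking) removal and restarts the scan afterwards, since
// iterators cannot be trusted once the lock was released.
int invalidate_matching(std::mutex& lru_mutex, std::list<int>& lru, int id)
{
	int removed = 0;
scan_again:
	lru_mutex.lock();
	for (auto it = lru.begin(); it != lru.end(); ++it) {
		if (*it == id) {
			lru.erase(it);
			lru_mutex.unlock();  // give other threads a go
			++removed;           // ... blocking "I/O" here ...
			goto scan_again;     // position is stale: rescan
		}
	}
	lru_mutex.unlock();
	return removed;
}
```

The cost of rescanning is bounded in the real code by skipping pages that cannot match (`buf_page_get_space(bpage) != id`) before taking any per-block mutex.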
@@ -921,7 +977,8 @@
 	buf_page_t*	b;
 	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
+	ut_ad(mutex_own(&buf_pool->zip_mutex));
 	ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);

 	/* Find the first successor of bpage in the LRU list
@@ -961,7 +1018,7 @@
 	ibool		freed;
 	ulint		scanned;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	if (!buf_LRU_evict_from_unzip_LRU(buf_pool)) {
 		return(FALSE);
@@ -976,12 +1033,16 @@
 		buf_block_t*	prev_block = UT_LIST_GET_PREV(unzip_LRU,
 							      block);

+		mutex_enter(&block->mutex);
+
 		ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
 		ut_ad(block->in_unzip_LRU_list);
 		ut_ad(block->page.in_LRU_list);

 		freed = buf_LRU_free_page(&block->page, false);

+		mutex_exit(&block->mutex);
+
 		block = prev_block;
 	}

@@ -1009,7 +1070,7 @@
 	ibool		freed;
 	ulint		scanned;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	for (bpage = UT_LIST_GET_LAST(buf_pool->LRU),
 	     scanned = 1, freed = FALSE;
@@ -1020,12 +1081,19 @@
 		unsigned	accessed;
 		buf_page_t*	prev_bpage = UT_LIST_GET_PREV(LRU,
 							      bpage);
+		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);

 		ut_ad(buf_page_in_file(bpage));
 		ut_ad(bpage->in_LRU_list);

 		accessed = buf_page_is_accessed(bpage);
+
+		mutex_enter(block_mutex);
+
 		freed = buf_LRU_free_page(bpage, true);
+
+		mutex_exit(block_mutex);
+
 		if (freed && !accessed) {
 			/* Keep track of pages that are evicted without
 			ever being accessed. This gives us a measure of
@@ -1057,11 +1125,24 @@
 				if TRUE, otherwise scan only
 				'old' blocks. */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ibool	freed = FALSE;
+	bool	use_unzip_list = UT_LIST_GET_LEN(buf_pool->unzip_LRU) > 0;

-	return(buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all)
-	       || buf_LRU_free_from_common_LRU_list(
-			buf_pool, scan_all));
+	mutex_enter(&buf_pool->LRU_list_mutex);
+
+	if (use_unzip_list) {
+		freed = buf_LRU_free_from_unzip_LRU_list(buf_pool, scan_all);
+	}
+
+	if (!freed) {
+		freed = buf_LRU_free_from_common_LRU_list(buf_pool, scan_all);
+	}
+
+	if (!freed) {
+		mutex_exit(&buf_pool->LRU_list_mutex);
+	}
+
+	return(freed);
 }

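The rewritten `buf_LRU_scan_and_free_block()` above adopts an asymmetric convention: the function takes `LRU_list_mutex` itself, releases it on failure, but deliberately returns with it still held on success (the comment in the diff that the free path releases it downstream). That convention, reduced to its skeleton (illustrative names, not the InnoDB API):

```cpp
// Sketch of the "keep the mutex locked on success, release on failure"
// convention of buf_LRU_scan_and_free_block(). Hypothetical names.
#include <cassert>
#include <mutex>

// try_free() runs under lru_list_mutex and reports whether it freed a
// block. On success the mutex stays locked: the caller (or a callee on
// the success path) is responsible for unlocking it.
template <typename TryFree>
bool scan_and_free(std::mutex& lru_list_mutex, TryFree try_free)
{
	lru_list_mutex.lock();

	bool freed = try_free();

	if (!freed) {
		lru_list_mutex.unlock();  // nothing freed: hand it back
	}

	return freed;
}
```

Such conditional-release contracts are error-prone, which is why the diff also adds `ut_ad(mutex_own(...))` / `ut_ad(!mutex_own(...))` assertions at the callers to document who holds what on each path.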
 /******************************************************************//**
@@ -1082,8 +1163,6 @@

 		buf_pool = buf_pool_from_array(i);

-		buf_pool_mutex_enter(buf_pool);
-
 		if (!recv_recovery_on
 		    && UT_LIST_GET_LEN(buf_pool->free)
 		       + UT_LIST_GET_LEN(buf_pool->LRU)
@@ -1091,8 +1170,6 @@

 			ret = TRUE;
 		}
-
-		buf_pool_mutex_exit(buf_pool);
 	}

 	return(ret);
@@ -1110,9 +1187,9 @@
 {
 	buf_block_t*	block;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	mutex_enter(&buf_pool->free_list_mutex);

-	block = (buf_block_t*) UT_LIST_GET_FIRST(buf_pool->free);
+	block = (buf_block_t*) UT_LIST_GET_LAST(buf_pool->free);

 	if (block) {

@@ -1122,18 +1199,23 @@
 		ut_ad(!block->page.in_LRU_list);
 		ut_a(!buf_page_in_file(&block->page));
 		UT_LIST_REMOVE(list, buf_pool->free, (&block->page));
-
-		mutex_enter(&block->mutex);
-
 		buf_block_set_state(block, BUF_BLOCK_READY_FOR_USE);
+
+		mutex_exit(&buf_pool->free_list_mutex);
+
+		mutex_enter(&block->mutex);
+
 		UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE);

 		ut_ad(buf_pool_from_block(block) == buf_pool);

 		mutex_exit(&block->mutex);
+		return(block);
 	}

-	return(block);
+	mutex_exit(&buf_pool->free_list_mutex);
+
+	return(NULL);
 }

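`buf_LRU_get_free_only()` above now serializes on a dedicated `free_list_mutex` instead of the global buffer pool mutex: the block is detached and its state set under the free-list mutex, that mutex is dropped, and only then is the per-block mutex taken for frame initialization. A self-contained sketch of that handoff (types and names are illustrative, with a `std::vector` standing in for `UT_LIST`):

```cpp
// Sketch of the free-list handoff in buf_LRU_get_free_only() after the
// mutex split. Hypothetical types; not the InnoDB API.
#include <cassert>
#include <mutex>
#include <vector>

struct block_t {
	std::mutex mutex;      // per-block mutex
	int        state = 0;  // 0 == NOT_USED, 1 == READY_FOR_USE
};

struct pool_t {
	std::mutex            free_list_mutex;  // protects free_list only
	std::vector<block_t*> free_list;
};

block_t* get_free_only(pool_t* pool)
{
	std::unique_lock<std::mutex> free_guard(pool->free_list_mutex);

	if (pool->free_list.empty()) {
		return nullptr;  // guard releases free_list_mutex
	}

	// Take from the tail, mirroring the UT_LIST_GET_LAST change.
	block_t* block = pool->free_list.back();
	pool->free_list.pop_back();
	block->state = 1;  // READY_FOR_USE, still under free_list_mutex

	free_guard.unlock();  // the free list is no longer touched

	std::lock_guard<std::mutex> block_guard(block->mutex);
	// ... initialize the frame under the block mutex only ...
	return block;
}
```

Setting the state before dropping `free_list_mutex` matters: once the block is detached and marked, no other thread can find it on the free list, so the narrower block mutex suffices for the rest.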
 /******************************************************************//**
@@ -1147,8 +1229,6 @@
 /*===================================*/
 	const buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
 {
-	ut_ad(buf_pool_mutex_own(buf_pool));
-
 	if (!recv_recovery_on && UT_LIST_GET_LEN(buf_pool->free)
 	    + UT_LIST_GET_LEN(buf_pool->LRU) < buf_pool->curr_size / 20) {
 		ut_print_timestamp(stderr);
@@ -1253,10 +1333,10 @@
 	ibool	mon_value_was = FALSE;
 	ibool	started_monitor = FALSE;

+	ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
+
 	MONITOR_INC(MONITOR_LRU_GET_FREE_SEARCH);
 loop:
-	buf_pool_mutex_enter(buf_pool);
-
 	buf_LRU_check_size_of_non_data_objects(buf_pool);

 	/* If there is a block in the free list, take it */
@@ -1264,7 +1344,6 @@

 	if (block) {

-		buf_pool_mutex_exit(buf_pool);
 		ut_ad(buf_pool_from_block(block) == buf_pool);
 		memset(&block->page.zip, 0, sizeof block->page.zip);

@@ -1275,22 +1354,28 @@
 		return(block);
 	}

+	mutex_enter(&buf_pool->flush_state_mutex);
+
 	if (buf_pool->init_flush[BUF_FLUSH_LRU]
 	    && srv_use_doublewrite_buf
 	    && buf_dblwr != NULL) {

+		mutex_exit(&buf_pool->flush_state_mutex);
+
 		/* If there is an LRU flush happening in the background
 		then we wait for it to end instead of trying a single
 		page flush. If, however, we are not using doublewrite
 		buffer then it is better to do our own single page
 		flush instead of waiting for LRU flush to end. */
-		buf_pool_mutex_exit(buf_pool);
 		buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);
 		goto loop;
 	}

+	mutex_exit(&buf_pool->flush_state_mutex);
+
 	freed = FALSE;
 	if (buf_pool->try_LRU_scan || n_iterations > 0) {
+
 		/* If no block was in the free list, search from the
 		end of the LRU list and try to free a block there.
 		If we are doing for the first time we'll scan only
@@ -1308,8 +1393,6 @@
 		}
 	}

-	buf_pool_mutex_exit(buf_pool);
-
 	if (freed) {
 		goto loop;

@@ -1395,7 +1478,7 @@
 	ulint	new_len;

 	ut_a(buf_pool->LRU_old);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_ad(buf_pool->LRU_old_ratio >= BUF_LRU_OLD_RATIO_MIN);
 	ut_ad(buf_pool->LRU_old_ratio <= BUF_LRU_OLD_RATIO_MAX);
 #if BUF_LRU_OLD_RATIO_MIN * BUF_LRU_OLD_MIN_LEN <= BUF_LRU_OLD_RATIO_DIV * (BUF_LRU_OLD_TOLERANCE + 5)
@@ -1461,7 +1544,7 @@
 {
 	buf_page_t*	bpage;

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
 	ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN);

 	/* We first initialize all blocks in the LRU list as old and then use
@@ -1496,7 +1579,7 @@
 	ut_ad(buf_pool);
 	ut_ad(bpage);
 	ut_ad(buf_page_in_file(bpage));
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	if (buf_page_belongs_to_unzip_LRU(bpage)) {
 		buf_block_t*	block = (buf_block_t*) bpage;
@@ -1521,7 +1604,7 @@

 	ut_ad(buf_pool);
 	ut_ad(bpage);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	ut_a(buf_page_in_file(bpage));

@@ -1601,7 +1684,7 @@

 	ut_ad(buf_pool);
 	ut_ad(block);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	ut_a(buf_page_belongs_to_unzip_LRU(&block->page));

@@ -1630,7 +1713,7 @@

 	ut_ad(buf_pool);
 	ut_ad(bpage);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	ut_a(buf_page_in_file(bpage));

@@ -1686,7 +1769,7 @@

 	ut_ad(buf_pool);
 	ut_ad(bpage);
-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	ut_a(buf_page_in_file(bpage));
 	ut_ad(!bpage->in_LRU_list);
@@ -1770,7 +1853,7 @@
 {
 	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));

 	if (bpage->old) {
 		buf_pool->stat.n_pages_made_young++;
@@ -1796,12 +1879,14 @@
 Try to free a block. If bpage is a descriptor of a compressed-only
 page, the descriptor object will be freed as well.

-NOTE: If this function returns true, it will temporarily
-release buf_pool->mutex. Furthermore, the page frame will no longer be
-accessible via bpage.
-
-The caller must hold buf_pool->mutex and must not hold any
-buf_page_get_mutex() when calling this function.
+NOTE: If this function returns true, it will release the LRU list mutex,
+and temporarily release and relock the buf_page_get_mutex() mutex.
+Furthermore, the page frame will no longer be accessible via bpage. If this
+function returns false, the buf_page_get_mutex() might be temporarily released
+and relocked too.
+
+The caller must hold the LRU list and buf_page_get_mutex() mutexes.
+
 @return true if freed, false otherwise. */
 UNIV_INTERN
 bool
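The comment rewrite above inverts the locking contract of `buf_LRU_free_page()`: instead of taking the hash lock and block mutex itself, the function now requires the caller to hold the LRU list mutex and the block mutex, and enforces that with `ut_ad(mutex_own(...))` assertions. A toy version of such an ownership-checkable mutex and a callee-side assertion (illustrative only; `ib_mutex_t` is implemented differently):

```cpp
// Sketch of a debug mutex that can answer "does this thread own me?",
// the capability behind ut_ad(mutex_own(...)). Hypothetical names.
#include <cassert>
#include <mutex>
#include <thread>

class debug_mutex {
public:
	void lock()
	{
		m_.lock();
		owner_ = std::this_thread::get_id();  // record owner
	}
	void unlock()
	{
		owner_ = std::thread::id();  // clear before releasing
		m_.unlock();
	}
	bool own() const
	{
		return owner_ == std::this_thread::get_id();
	}
private:
	std::mutex      m_;
	std::thread::id owner_;
};

// The callee asserts the caller's locking contract instead of taking
// the locks itself, mirroring the new buf_LRU_free_page() precondition.
bool free_page(debug_mutex& lru_list_mutex, debug_mutex& block_mutex)
{
	assert(lru_list_mutex.own());  // caller holds the LRU list mutex
	assert(block_mutex.own());     // ... and the block mutex
	return true;
}
```

Pushing lock acquisition to the callers lets the hot eviction loops hold each per-block mutex for exactly one candidate at a time.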
@@ -1819,13 +1904,11 @@

 	ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);

-	ut_ad(buf_pool_mutex_own(buf_pool));
+	ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
+	ut_ad(mutex_own(block_mutex));
 	ut_ad(buf_page_in_file(bpage));
 	ut_ad(bpage->in_LRU_list);

-	rw_lock_x_lock(hash_lock);
-	mutex_enter(block_mutex);
-
 #if UNIV_WORD_SIZE == 4
 	/* On 32-bit systems, there is no padding in buf_page_t. On
 	other systems, Valgrind could complain about uninitialized pad
@@ -1836,7 +1919,7 @@
 	if (!buf_page_can_relocate(bpage)) {

 		/* Do not free buffer-fixed or I/O-fixed blocks. */
-		goto func_exit;
+		return(false);
 	}

 #ifdef UNIV_IBUF_COUNT_DEBUG
@@ -1848,7 +1931,7 @@
 		/* Do not completely free dirty blocks. */

 		if (bpage->oldest_modification) {
-			goto func_exit;
+			return(false);
 		}
 	} else if ((bpage->oldest_modification)
 		   && (buf_page_get_state(bpage)
@@ -1857,18 +1940,13 @@
 		ut_ad(buf_page_get_state(bpage)
 		      == BUF_BLOCK_ZIP_DIRTY);

-func_exit:
-		rw_lock_x_unlock(hash_lock);
-		mutex_exit(block_mutex);
 		return(false);

 	} else if (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE) {
 		b = buf_page_alloc_descriptor();
 		ut_a(b);
-		memcpy(b, bpage, sizeof *b);
 	}

-	ut_ad(buf_pool_mutex_own(buf_pool));
 	ut_ad(buf_page_in_file(bpage));
 	ut_ad(bpage->in_LRU_list);
 	ut_ad(!bpage->in_flush_list == !bpage->oldest_modification);
@@ -1887,12 +1965,45 @@
 	}
 #endif /* UNIV_DEBUG */

-#ifdef UNIV_SYNC_DEBUG
+	mutex_exit(block_mutex);
3263 | 1891 | ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX)); | 1969 | |
3264 | 1892 | #endif /* UNIV_SYNC_DEBUG */ | 1970 | rw_lock_x_lock(hash_lock); |
3265 | 1893 | ut_ad(buf_page_can_relocate(bpage)); | 1971 | mutex_enter(block_mutex); |
3266 | 1972 | |||
3267 | 1973 | if (UNIV_UNLIKELY(!buf_page_can_relocate(bpage) | ||
3268 | 1974 | || ((zip || !bpage->zip.data) | ||
3269 | 1975 | && bpage->oldest_modification))) { | ||
3270 | 1976 | |||
3271 | 1977 | not_freed: | ||
3272 | 1978 | rw_lock_x_unlock(hash_lock); | ||
3273 | 1979 | if (b) { | ||
3274 | 1980 | buf_page_free_descriptor(b); | ||
3275 | 1981 | } | ||
3276 | 1982 | |||
3277 | 1983 | return(false); | ||
3278 | 1984 | } else if (UNIV_UNLIKELY(bpage->oldest_modification | ||
3279 | 1985 | && (buf_page_get_state(bpage) | ||
3280 | 1986 | != BUF_BLOCK_FILE_PAGE))) { | ||
3281 | 1987 | |||
3282 | 1988 | ut_ad(buf_page_get_state(bpage) | ||
3283 | 1989 | == BUF_BLOCK_ZIP_DIRTY); | ||
3284 | 1990 | goto not_freed; | ||
3285 | 1991 | } | ||
3286 | 1992 | |||
3287 | 1993 | if (b) { | ||
3288 | 1994 | memcpy(b, bpage, sizeof *b); | ||
3289 | 1995 | } | ||
3290 | 1894 | 1996 | ||
3291 | 1895 | if (!buf_LRU_block_remove_hashed(bpage, zip)) { | 1997 | if (!buf_LRU_block_remove_hashed(bpage, zip)) { |
3292 | 1998 | |||
3293 | 1999 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
3294 | 2000 | |||
3295 | 2001 | if (b) { | ||
3296 | 2002 | buf_page_free_descriptor(b); | ||
3297 | 2003 | } | ||
3298 | 2004 | |||
3299 | 2005 | mutex_enter(block_mutex); | ||
3300 | 2006 | |||
3301 | 1896 | return(true); | 2007 | return(true); |
3302 | 1897 | } | 2008 | } |
3303 | 1898 | 2009 | ||
3304 | @@ -1997,6 +2108,8 @@ | |||
3305 | 1997 | buf_LRU_add_block_low(b, buf_page_is_old(b)); | 2108 | buf_LRU_add_block_low(b, buf_page_is_old(b)); |
3306 | 1998 | } | 2109 | } |
3307 | 1999 | 2110 | ||
3308 | 2111 | mutex_enter(&buf_pool->zip_mutex); | ||
3309 | 2112 | rw_lock_x_unlock(hash_lock); | ||
3310 | 2000 | if (b->state == BUF_BLOCK_ZIP_PAGE) { | 2113 | if (b->state == BUF_BLOCK_ZIP_PAGE) { |
3311 | 2001 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | 2114 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG |
3312 | 2002 | buf_LRU_insert_zip_clean(b); | 2115 | buf_LRU_insert_zip_clean(b); |
3313 | @@ -2008,35 +2121,16 @@ | |||
3314 | 2008 | 2121 | ||
3315 | 2009 | bpage->zip.data = NULL; | 2122 | bpage->zip.data = NULL; |
3316 | 2010 | page_zip_set_size(&bpage->zip, 0); | 2123 | page_zip_set_size(&bpage->zip, 0); |
3317 | 2011 | mutex_exit(block_mutex); | ||
3318 | 2012 | 2124 | ||
3319 | 2013 | /* Prevent buf_page_get_gen() from | 2125 | /* Prevent buf_page_get_gen() from |
3324 | 2014 | decompressing the block while we release | 2126 | decompressing the block while we release block_mutex. */ |
3321 | 2015 | buf_pool->mutex and block_mutex. */ | ||
3322 | 2016 | block_mutex = buf_page_get_mutex(b); | ||
3323 | 2017 | mutex_enter(block_mutex); | ||
3325 | 2018 | buf_page_set_sticky(b); | 2127 | buf_page_set_sticky(b); |
3344 | 2019 | mutex_exit(block_mutex); | 2128 | mutex_exit(&buf_pool->zip_mutex); |
3345 | 2020 | 2129 | mutex_exit(block_mutex); | |
3346 | 2021 | rw_lock_x_unlock(hash_lock); | 2130 | |
3329 | 2022 | |||
3330 | 2023 | } else { | ||
3331 | 2024 | |||
3332 | 2025 | /* There can be multiple threads doing an LRU scan to | ||
3333 | 2026 | free a block. The page_cleaner thread can be doing an | ||
3334 | 2027 | LRU batch whereas user threads can potentially be doing | ||
3335 | 2028 | multiple single page flushes. As we release | ||
3336 | 2029 | buf_pool->mutex below we need to make sure that no one | ||
3337 | 2030 | else considers this block as a victim for page | ||
3338 | 2031 | replacement. This block is already out of page_hash | ||
3339 | 2032 | and we are about to remove it from the LRU list and put | ||
3340 | 2033 | it on the free list. */ | ||
3341 | 2034 | mutex_enter(block_mutex); | ||
3342 | 2035 | buf_page_set_sticky(bpage); | ||
3343 | 2036 | mutex_exit(block_mutex); | ||
3347 | 2037 | } | 2131 | } |
3348 | 2038 | 2132 | ||
3350 | 2039 | buf_pool_mutex_exit(buf_pool); | 2133 | mutex_exit(&buf_pool->LRU_list_mutex); |
3351 | 2040 | 2134 | ||
3352 | 2041 | /* Remove possible adaptive hash index on the page. | 2135 | /* Remove possible adaptive hash index on the page. |
3353 | 2042 | The page was declared uninitialized by | 2136 | The page was declared uninitialized by |
3354 | @@ -2069,13 +2163,17 @@ | |||
3355 | 2069 | checksum); | 2163 | checksum); |
3356 | 2070 | } | 2164 | } |
3357 | 2071 | 2165 | ||
3358 | 2072 | buf_pool_mutex_enter(buf_pool); | ||
3359 | 2073 | |||
3360 | 2074 | mutex_enter(block_mutex); | 2166 | mutex_enter(block_mutex); |
3363 | 2075 | buf_page_unset_sticky(b != NULL ? b : bpage); | 2167 | |
3364 | 2076 | mutex_exit(block_mutex); | 2168 | if (b) { |
3365 | 2169 | mutex_enter(&buf_pool->zip_mutex); | ||
3366 | 2170 | buf_page_unset_sticky(b); | ||
3367 | 2171 | mutex_exit(&buf_pool->zip_mutex); | ||
3368 | 2172 | } | ||
3369 | 2077 | 2173 | ||
3370 | 2078 | buf_LRU_block_free_hashed_page((buf_block_t*) bpage); | 2174 | buf_LRU_block_free_hashed_page((buf_block_t*) bpage); |
3371 | 2175 | ut_ad(mutex_own(block_mutex)); | ||
3372 | 2176 | ut_ad(!mutex_own(&buf_pool->LRU_list_mutex)); | ||
3373 | 2079 | return(true); | 2177 | return(true); |
3374 | 2080 | } | 2178 | } |
3375 | 2081 | 2179 | ||
3376 | @@ -2091,7 +2189,6 @@ | |||
3377 | 2091 | buf_pool_t* buf_pool = buf_pool_from_block(block); | 2189 | buf_pool_t* buf_pool = buf_pool_from_block(block); |
3378 | 2092 | 2190 | ||
3379 | 2093 | ut_ad(block); | 2191 | ut_ad(block); |
3380 | 2094 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
3381 | 2095 | ut_ad(mutex_own(&block->mutex)); | 2192 | ut_ad(mutex_own(&block->mutex)); |
3382 | 2096 | 2193 | ||
3383 | 2097 | switch (buf_block_get_state(block)) { | 2194 | switch (buf_block_get_state(block)) { |
3384 | @@ -2109,8 +2206,6 @@ | |||
3385 | 2109 | ut_ad(!block->page.in_flush_list); | 2206 | ut_ad(!block->page.in_flush_list); |
3386 | 2110 | ut_ad(!block->page.in_LRU_list); | 2207 | ut_ad(!block->page.in_LRU_list); |
3387 | 2111 | 2208 | ||
3388 | 2112 | buf_block_set_state(block, BUF_BLOCK_NOT_USED); | ||
3389 | 2113 | |||
3390 | 2114 | UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE); | 2209 | UNIV_MEM_ALLOC(block->frame, UNIV_PAGE_SIZE); |
3391 | 2115 | #ifdef UNIV_DEBUG | 2210 | #ifdef UNIV_DEBUG |
3392 | 2116 | /* Wipe contents of page to reveal possible stale pointers to it */ | 2211 | /* Wipe contents of page to reveal possible stale pointers to it */ |
3393 | @@ -2125,18 +2220,19 @@ | |||
3394 | 2125 | if (data) { | 2220 | if (data) { |
3395 | 2126 | block->page.zip.data = NULL; | 2221 | block->page.zip.data = NULL; |
3396 | 2127 | mutex_exit(&block->mutex); | 2222 | mutex_exit(&block->mutex); |
3397 | 2128 | buf_pool_mutex_exit_forbid(buf_pool); | ||
3398 | 2129 | 2223 | ||
3399 | 2130 | buf_buddy_free( | 2224 | buf_buddy_free( |
3400 | 2131 | buf_pool, data, page_zip_get_size(&block->page.zip)); | 2225 | buf_pool, data, page_zip_get_size(&block->page.zip)); |
3401 | 2132 | 2226 | ||
3402 | 2133 | buf_pool_mutex_exit_allow(buf_pool); | ||
3403 | 2134 | mutex_enter(&block->mutex); | 2227 | mutex_enter(&block->mutex); |
3404 | 2135 | page_zip_set_size(&block->page.zip, 0); | 2228 | page_zip_set_size(&block->page.zip, 0); |
3405 | 2136 | } | 2229 | } |
3406 | 2137 | 2230 | ||
3407 | 2231 | mutex_enter(&buf_pool->free_list_mutex); | ||
3408 | 2232 | buf_block_set_state(block, BUF_BLOCK_NOT_USED); | ||
3409 | 2138 | UT_LIST_ADD_FIRST(list, buf_pool->free, (&block->page)); | 2233 | UT_LIST_ADD_FIRST(list, buf_pool->free, (&block->page)); |
3410 | 2139 | ut_d(block->page.in_free_list = TRUE); | 2234 | ut_d(block->page.in_free_list = TRUE); |
3411 | 2235 | mutex_exit(&buf_pool->free_list_mutex); | ||
3412 | 2140 | 2236 | ||
3413 | 2141 | UNIV_MEM_ASSERT_AND_FREE(block->frame, UNIV_PAGE_SIZE); | 2237 | UNIV_MEM_ASSERT_AND_FREE(block->frame, UNIV_PAGE_SIZE); |
3414 | 2142 | } | 2238 | } |
3415 | @@ -2146,7 +2242,7 @@ | |||
3416 | 2146 | If the block is compressed-only (BUF_BLOCK_ZIP_PAGE), | 2242 | If the block is compressed-only (BUF_BLOCK_ZIP_PAGE), |
3417 | 2147 | the object will be freed. | 2243 | the object will be freed. |
3418 | 2148 | 2244 | ||
3420 | 2149 | The caller must hold buf_pool->mutex, the buf_page_get_mutex() mutex | 2245 | The caller must hold buf_pool->LRU_list_mutex, the buf_page_get_mutex() mutex |
3421 | 2150 | and the appropriate hash_lock. This function will release the | 2246 | and the appropriate hash_lock. This function will release the |
3422 | 2151 | buf_page_get_mutex() and the hash_lock. | 2247 | buf_page_get_mutex() and the hash_lock. |
3423 | 2152 | 2248 | ||
3424 | @@ -2171,7 +2267,7 @@ | |||
3425 | 2171 | rw_lock_t* hash_lock; | 2267 | rw_lock_t* hash_lock; |
3426 | 2172 | 2268 | ||
3427 | 2173 | ut_ad(bpage); | 2269 | ut_ad(bpage); |
3429 | 2174 | ut_ad(buf_pool_mutex_own(buf_pool)); | 2270 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
3430 | 2175 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 2271 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
3431 | 2176 | 2272 | ||
3432 | 2177 | fold = buf_page_address_fold(bpage->space, bpage->offset); | 2273 | fold = buf_page_address_fold(bpage->space, bpage->offset); |
3433 | @@ -2287,7 +2383,7 @@ | |||
3434 | 2287 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | 2383 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG |
3435 | 2288 | mutex_exit(buf_page_get_mutex(bpage)); | 2384 | mutex_exit(buf_page_get_mutex(bpage)); |
3436 | 2289 | rw_lock_x_unlock(hash_lock); | 2385 | rw_lock_x_unlock(hash_lock); |
3438 | 2290 | buf_pool_mutex_exit(buf_pool); | 2386 | mutex_exit(&buf_pool->LRU_list_mutex); |
3439 | 2291 | buf_print(); | 2387 | buf_print(); |
3440 | 2292 | buf_LRU_print(); | 2388 | buf_LRU_print(); |
3441 | 2293 | buf_validate(); | 2389 | buf_validate(); |
3442 | @@ -2314,13 +2410,11 @@ | |||
3443 | 2314 | 2410 | ||
3444 | 2315 | mutex_exit(&buf_pool->zip_mutex); | 2411 | mutex_exit(&buf_pool->zip_mutex); |
3445 | 2316 | rw_lock_x_unlock(hash_lock); | 2412 | rw_lock_x_unlock(hash_lock); |
3446 | 2317 | buf_pool_mutex_exit_forbid(buf_pool); | ||
3447 | 2318 | 2413 | ||
3448 | 2319 | buf_buddy_free( | 2414 | buf_buddy_free( |
3449 | 2320 | buf_pool, bpage->zip.data, | 2415 | buf_pool, bpage->zip.data, |
3450 | 2321 | page_zip_get_size(&bpage->zip)); | 2416 | page_zip_get_size(&bpage->zip)); |
3451 | 2322 | 2417 | ||
3452 | 2323 | buf_pool_mutex_exit_allow(buf_pool); | ||
3453 | 2324 | buf_page_free_descriptor(bpage); | 2418 | buf_page_free_descriptor(bpage); |
3454 | 2325 | return(false); | 2419 | return(false); |
3455 | 2326 | 2420 | ||
3456 | @@ -2344,14 +2438,15 @@ | |||
3457 | 2344 | page_hash. Only possibility is when while invalidating | 2438 | page_hash. Only possibility is when while invalidating |
3458 | 2345 | a tablespace we buffer fix the prev_page in LRU to | 2439 | a tablespace we buffer fix the prev_page in LRU to |
3459 | 2346 | avoid relocation during the scan. But that is not | 2440 | avoid relocation during the scan. But that is not |
3461 | 2347 | possible because we are holding buf_pool mutex. | 2441 | possible because we are holding LRU list mutex. |
3462 | 2348 | 2442 | ||
3463 | 2349 | 2) Not possible because in buf_page_init_for_read() | 2443 | 2) Not possible because in buf_page_init_for_read() |
3466 | 2350 | we do a look up of page_hash while holding buf_pool | 2444 | we do a look up of page_hash while holding LRU list |
3467 | 2351 | mutex and since we are holding buf_pool mutex here | 2445 | mutex and since we are holding LRU list mutex here |
3468 | 2352 | and by the time we'll release it in the caller we'd | 2446 | and by the time we'll release it in the caller we'd |
3469 | 2353 | have inserted the compressed only descriptor in the | 2447 | have inserted the compressed only descriptor in the |
3470 | 2354 | page_hash. */ | 2448 | page_hash. */ |
3471 | 2449 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); | ||
3472 | 2355 | rw_lock_x_unlock(hash_lock); | 2450 | rw_lock_x_unlock(hash_lock); |
3473 | 2356 | mutex_exit(&((buf_block_t*) bpage)->mutex); | 2451 | mutex_exit(&((buf_block_t*) bpage)->mutex); |
3474 | 2357 | 2452 | ||
3475 | @@ -2363,13 +2458,11 @@ | |||
3476 | 2363 | ut_ad(!bpage->in_free_list); | 2458 | ut_ad(!bpage->in_free_list); |
3477 | 2364 | ut_ad(!bpage->in_flush_list); | 2459 | ut_ad(!bpage->in_flush_list); |
3478 | 2365 | ut_ad(!bpage->in_LRU_list); | 2460 | ut_ad(!bpage->in_LRU_list); |
3479 | 2366 | buf_pool_mutex_exit_forbid(buf_pool); | ||
3480 | 2367 | 2461 | ||
3481 | 2368 | buf_buddy_free( | 2462 | buf_buddy_free( |
3482 | 2369 | buf_pool, data, | 2463 | buf_pool, data, |
3483 | 2370 | page_zip_get_size(&bpage->zip)); | 2464 | page_zip_get_size(&bpage->zip)); |
3484 | 2371 | 2465 | ||
3485 | 2372 | buf_pool_mutex_exit_allow(buf_pool); | ||
3486 | 2373 | page_zip_set_size(&bpage->zip, 0); | 2466 | page_zip_set_size(&bpage->zip, 0); |
3487 | 2374 | } | 2467 | } |
3488 | 2375 | 2468 | ||
3489 | @@ -2397,16 +2490,11 @@ | |||
3490 | 2397 | buf_block_t* block) /*!< in: block, must contain a file page and | 2490 | buf_block_t* block) /*!< in: block, must contain a file page and |
3491 | 2398 | be in a state where it can be freed */ | 2491 | be in a state where it can be freed */ |
3492 | 2399 | { | 2492 | { |
3497 | 2400 | #ifdef UNIV_DEBUG | 2493 | ut_ad(mutex_own(&block->mutex)); |
3494 | 2401 | buf_pool_t* buf_pool = buf_pool_from_block(block); | ||
3495 | 2402 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
3496 | 2403 | #endif | ||
3498 | 2404 | 2494 | ||
3499 | 2405 | mutex_enter(&block->mutex); | ||
3500 | 2406 | buf_block_set_state(block, BUF_BLOCK_MEMORY); | 2495 | buf_block_set_state(block, BUF_BLOCK_MEMORY); |
3501 | 2407 | 2496 | ||
3502 | 2408 | buf_LRU_block_free_non_file_page(block); | 2497 | buf_LRU_block_free_non_file_page(block); |
3503 | 2409 | mutex_exit(&block->mutex); | ||
3504 | 2410 | } | 2498 | } |
3505 | 2411 | 2499 | ||
3506 | 2412 | /******************************************************************//** | 2500 | /******************************************************************//** |
3507 | @@ -2419,19 +2507,26 @@ | |||
3508 | 2419 | be in a state where it can be freed; there | 2507 | be in a state where it can be freed; there |
3509 | 2420 | may or may not be a hash index to the page */ | 2508 | may or may not be a hash index to the page */ |
3510 | 2421 | { | 2509 | { |
3511 | 2510 | #if defined(UNIV_DEBUG) || defined(UNIV_SYNC_DEBUG) | ||
3512 | 2422 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 2511 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
3513 | 2512 | #endif | ||
3514 | 2513 | #ifdef UNIV_SYNC_DEBUG | ||
3515 | 2423 | const ulint fold = buf_page_address_fold(bpage->space, | 2514 | const ulint fold = buf_page_address_fold(bpage->space, |
3516 | 2424 | bpage->offset); | 2515 | bpage->offset); |
3517 | 2425 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold); | 2516 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold); |
3518 | 2517 | #endif | ||
3519 | 2426 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); | 2518 | ib_mutex_t* block_mutex = buf_page_get_mutex(bpage); |
3520 | 2427 | 2519 | ||
3525 | 2428 | ut_ad(buf_pool_mutex_own(buf_pool)); | 2520 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
3526 | 2429 | 2521 | ut_ad(mutex_own(block_mutex)); | |
3527 | 2430 | rw_lock_x_lock(hash_lock); | 2522 | #ifdef UNIV_SYNC_DEBUG |
3528 | 2431 | mutex_enter(block_mutex); | 2523 | ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX)); |
3529 | 2524 | #endif | ||
3530 | 2432 | 2525 | ||
3531 | 2433 | if (buf_LRU_block_remove_hashed(bpage, true)) { | 2526 | if (buf_LRU_block_remove_hashed(bpage, true)) { |
3532 | 2527 | mutex_enter(block_mutex); | ||
3533 | 2434 | buf_LRU_block_free_hashed_page((buf_block_t*) bpage); | 2528 | buf_LRU_block_free_hashed_page((buf_block_t*) bpage); |
3534 | 2529 | mutex_exit(block_mutex); | ||
3535 | 2435 | } | 2530 | } |
3536 | 2436 | 2531 | ||
3537 | 2437 | /* buf_LRU_block_remove_hashed() releases hash_lock and block_mutex */ | 2532 | /* buf_LRU_block_remove_hashed() releases hash_lock and block_mutex */ |
3538 | @@ -2466,7 +2561,7 @@ | |||
3539 | 2466 | } | 2561 | } |
3540 | 2467 | 2562 | ||
3541 | 2468 | if (adjust) { | 2563 | if (adjust) { |
3543 | 2469 | buf_pool_mutex_enter(buf_pool); | 2564 | mutex_enter(&buf_pool->LRU_list_mutex); |
3544 | 2470 | 2565 | ||
3545 | 2471 | if (ratio != buf_pool->LRU_old_ratio) { | 2566 | if (ratio != buf_pool->LRU_old_ratio) { |
3546 | 2472 | buf_pool->LRU_old_ratio = ratio; | 2567 | buf_pool->LRU_old_ratio = ratio; |
3547 | @@ -2478,7 +2573,7 @@ | |||
3548 | 2478 | } | 2573 | } |
3549 | 2479 | } | 2574 | } |
3550 | 2480 | 2575 | ||
3552 | 2481 | buf_pool_mutex_exit(buf_pool); | 2576 | mutex_exit(&buf_pool->LRU_list_mutex); |
3553 | 2482 | } else { | 2577 | } else { |
3554 | 2483 | buf_pool->LRU_old_ratio = ratio; | 2578 | buf_pool->LRU_old_ratio = ratio; |
3555 | 2484 | } | 2579 | } |
3556 | @@ -2583,7 +2678,7 @@ | |||
3557 | 2583 | ulint new_len; | 2678 | ulint new_len; |
3558 | 2584 | 2679 | ||
3559 | 2585 | ut_ad(buf_pool); | 2680 | ut_ad(buf_pool); |
3561 | 2586 | buf_pool_mutex_enter(buf_pool); | 2681 | mutex_enter(&buf_pool->LRU_list_mutex); |
3562 | 2587 | 2682 | ||
3563 | 2588 | if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) { | 2683 | if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) { |
3564 | 2589 | 2684 | ||
3565 | @@ -2641,6 +2736,10 @@ | |||
3566 | 2641 | 2736 | ||
3567 | 2642 | ut_a(buf_pool->LRU_old_len == old_len); | 2737 | ut_a(buf_pool->LRU_old_len == old_len); |
3568 | 2643 | 2738 | ||
3569 | 2739 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
3570 | 2740 | |||
3571 | 2741 | mutex_enter(&buf_pool->free_list_mutex); | ||
3572 | 2742 | |||
3573 | 2644 | UT_LIST_VALIDATE(list, buf_page_t, buf_pool->free, CheckInFreeList()); | 2743 | UT_LIST_VALIDATE(list, buf_page_t, buf_pool->free, CheckInFreeList()); |
3574 | 2645 | 2744 | ||
3575 | 2646 | for (bpage = UT_LIST_GET_FIRST(buf_pool->free); | 2745 | for (bpage = UT_LIST_GET_FIRST(buf_pool->free); |
3576 | @@ -2650,6 +2749,10 @@ | |||
3577 | 2650 | ut_a(buf_page_get_state(bpage) == BUF_BLOCK_NOT_USED); | 2749 | ut_a(buf_page_get_state(bpage) == BUF_BLOCK_NOT_USED); |
3578 | 2651 | } | 2750 | } |
3579 | 2652 | 2751 | ||
3580 | 2752 | mutex_exit(&buf_pool->free_list_mutex); | ||
3581 | 2753 | |||
3582 | 2754 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
3583 | 2755 | |||
3584 | 2653 | UT_LIST_VALIDATE( | 2756 | UT_LIST_VALIDATE( |
3585 | 2654 | unzip_LRU, buf_block_t, buf_pool->unzip_LRU, | 2757 | unzip_LRU, buf_block_t, buf_pool->unzip_LRU, |
3586 | 2655 | CheckUnzipLRUAndLRUList()); | 2758 | CheckUnzipLRUAndLRUList()); |
3587 | @@ -2663,7 +2766,7 @@ | |||
3588 | 2663 | ut_a(buf_page_belongs_to_unzip_LRU(&block->page)); | 2766 | ut_a(buf_page_belongs_to_unzip_LRU(&block->page)); |
3589 | 2664 | } | 2767 | } |
3590 | 2665 | 2768 | ||
3592 | 2666 | buf_pool_mutex_exit(buf_pool); | 2769 | mutex_exit(&buf_pool->LRU_list_mutex); |
3593 | 2667 | } | 2770 | } |
3594 | 2668 | 2771 | ||
3595 | 2669 | /**********************************************************************//** | 2772 | /**********************************************************************//** |
3596 | @@ -2699,7 +2802,7 @@ | |||
3597 | 2699 | const buf_page_t* bpage; | 2802 | const buf_page_t* bpage; |
3598 | 2700 | 2803 | ||
3599 | 2701 | ut_ad(buf_pool); | 2804 | ut_ad(buf_pool); |
3601 | 2702 | buf_pool_mutex_enter(buf_pool); | 2805 | mutex_enter(&buf_pool->LRU_list_mutex); |
3602 | 2703 | 2806 | ||
3603 | 2704 | bpage = UT_LIST_GET_FIRST(buf_pool->LRU); | 2807 | bpage = UT_LIST_GET_FIRST(buf_pool->LRU); |
3604 | 2705 | 2808 | ||
3605 | @@ -2756,7 +2859,7 @@ | |||
3606 | 2756 | bpage = UT_LIST_GET_NEXT(LRU, bpage); | 2859 | bpage = UT_LIST_GET_NEXT(LRU, bpage); |
3607 | 2757 | } | 2860 | } |
3608 | 2758 | 2861 | ||
3610 | 2759 | buf_pool_mutex_exit(buf_pool); | 2862 | mutex_exit(&buf_pool->LRU_list_mutex); |
3611 | 2760 | } | 2863 | } |
3612 | 2761 | 2864 | ||
3613 | 2762 | /**********************************************************************//** | 2865 | /**********************************************************************//** |
3614 | 2763 | 2866 | ||
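The central pattern in the `buf_LRU_free_page()` changes above is drop-and-reacquire with revalidation: `block_mutex` is released so the `page_hash` X-lock can be taken in the correct latching order, and the freeability conditions are then rechecked under both locks, because another thread may have buffer-fixed or dirtied the page in the unlocked window. A minimal, self-contained sketch of that relock-and-recheck idiom follows; the types and names are simplified illustrations, not the InnoDB API:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

struct Page {
    int  buf_fix_count       = 0;  // analogous to bpage->buf_fix_count
    long oldest_modification = 0;  // analogous to bpage->oldest_modification
};

// Returns true if the page was freed. Mirrors the flow in the diff:
// check under block_mutex, release it so hash_lock can be taken in
// latching order, retake block_mutex, then recheck before freeing.
bool try_free_page(Page& page, std::mutex& block_mutex,
                   std::shared_mutex& hash_lock) {
    std::unique_lock<std::mutex> block(block_mutex);

    // Cheap first check, analogous to buf_page_can_relocate().
    if (page.buf_fix_count != 0 || page.oldest_modification != 0) {
        return false;
    }

    // Latching order demands hash_lock before block_mutex, so drop
    // block_mutex, take hash_lock exclusively, reacquire block_mutex.
    block.unlock();
    std::unique_lock<std::shared_mutex> hash(hash_lock);
    block.lock();

    // The page may have been fixed or dirtied in the window above;
    // the conditions must be revalidated before committing to free.
    if (page.buf_fix_count != 0 || page.oldest_modification != 0) {
        return false;
    }

    // ... remove from hash and LRU, move the block to the free list ...
    return true;
}
```

The second check is not redundant: it is the price of releasing `block_mutex` to respect the lock order, exactly as the `not_freed:` path in the diff handles the recheck failure.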
3615 | === modified file 'Percona-Server/storage/innobase/buf/buf0rea.cc' | |||
3616 | --- Percona-Server/storage/innobase/buf/buf0rea.cc 2013-08-06 15:16:34 +0000 | |||
3617 | +++ Percona-Server/storage/innobase/buf/buf0rea.cc 2013-09-20 05:29:11 +0000 | |||
3618 | @@ -63,10 +63,15 @@ | |||
3619 | 63 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 63 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
3620 | 64 | const bool uncompressed = (buf_page_get_state(bpage) | 64 | const bool uncompressed = (buf_page_get_state(bpage) |
3621 | 65 | == BUF_BLOCK_FILE_PAGE); | 65 | == BUF_BLOCK_FILE_PAGE); |
3622 | 66 | const ulint fold = buf_page_address_fold(bpage->space, | ||
3623 | 67 | bpage->offset); | ||
3624 | 68 | rw_lock_t* hash_lock = buf_page_hash_lock_get(buf_pool, fold); | ||
3625 | 69 | |||
3626 | 70 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
3627 | 71 | rw_lock_x_lock(hash_lock); | ||
3628 | 72 | mutex_enter(buf_page_get_mutex(bpage)); | ||
3629 | 66 | 73 | ||
3630 | 67 | /* First unfix and release lock on the bpage */ | 74 | /* First unfix and release lock on the bpage */ |
3631 | 68 | buf_pool_mutex_enter(buf_pool); | ||
3632 | 69 | mutex_enter(buf_page_get_mutex(bpage)); | ||
3633 | 70 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ); | 75 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ); |
3634 | 71 | ut_ad(bpage->buf_fix_count == 0); | 76 | ut_ad(bpage->buf_fix_count == 0); |
3635 | 72 | 77 | ||
3636 | @@ -79,15 +84,13 @@ | |||
3637 | 79 | BUF_IO_READ); | 84 | BUF_IO_READ); |
3638 | 80 | } | 85 | } |
3639 | 81 | 86 | ||
3640 | 82 | mutex_exit(buf_page_get_mutex(bpage)); | ||
3641 | 83 | |||
3642 | 84 | /* remove the block from LRU list */ | 87 | /* remove the block from LRU list */ |
3643 | 85 | buf_LRU_free_one_page(bpage); | 88 | buf_LRU_free_one_page(bpage); |
3644 | 86 | 89 | ||
3645 | 90 | mutex_exit(&buf_pool->LRU_list_mutex); | ||
3646 | 91 | |||
3647 | 87 | ut_ad(buf_pool->n_pend_reads > 0); | 92 | ut_ad(buf_pool->n_pend_reads > 0); |
3651 | 88 | buf_pool->n_pend_reads--; | 93 | os_atomic_decrement_ulint(&buf_pool->n_pend_reads, 1); |
3649 | 89 | |||
3650 | 90 | buf_pool_mutex_exit(buf_pool); | ||
3652 | 91 | } | 94 | } |
3653 | 92 | 95 | ||
3654 | 93 | /********************************************************************//** | 96 | /********************************************************************//** |
3655 | @@ -216,6 +219,7 @@ | |||
3656 | 216 | #endif | 219 | #endif |
3657 | 217 | 220 | ||
3658 | 218 | ut_ad(buf_page_in_file(bpage)); | 221 | ut_ad(buf_page_in_file(bpage)); |
3659 | 222 | ut_ad(!mutex_own(&buf_pool_from_bpage(bpage)->LRU_list_mutex)); | ||
3660 | 219 | 223 | ||
3661 | 220 | if (sync) { | 224 | if (sync) { |
3662 | 221 | thd_wait_begin(NULL, THD_WAIT_DISKIO); | 225 | thd_wait_begin(NULL, THD_WAIT_DISKIO); |
3663 | @@ -332,11 +336,8 @@ | |||
3664 | 332 | high = fil_space_get_size(space); | 336 | high = fil_space_get_size(space); |
3665 | 333 | } | 337 | } |
3666 | 334 | 338 | ||
3667 | 335 | buf_pool_mutex_enter(buf_pool); | ||
3668 | 336 | |||
3669 | 337 | if (buf_pool->n_pend_reads | 339 | if (buf_pool->n_pend_reads |
3670 | 338 | > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) { | 340 | > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) { |
3671 | 339 | buf_pool_mutex_exit(buf_pool); | ||
3672 | 340 | 341 | ||
3673 | 341 | return(0); | 342 | return(0); |
3674 | 342 | } | 343 | } |
3675 | @@ -345,8 +346,12 @@ | |||
3676 | 345 | that is, reside near the start of the LRU list. */ | 346 | that is, reside near the start of the LRU list. */ |
3677 | 346 | 347 | ||
3678 | 347 | for (i = low; i < high; i++) { | 348 | for (i = low; i < high; i++) { |
3679 | 349 | |||
3680 | 350 | rw_lock_t* hash_lock; | ||
3681 | 351 | |||
3682 | 348 | const buf_page_t* bpage = | 352 | const buf_page_t* bpage = |
3684 | 349 | buf_page_hash_get(buf_pool, space, i); | 353 | buf_page_hash_get_s_locked(buf_pool, space, i, |
3685 | 354 | &hash_lock); | ||
3686 | 350 | 355 | ||
3687 | 351 | if (bpage | 356 | if (bpage |
3688 | 352 | && buf_page_is_accessed(bpage) | 357 | && buf_page_is_accessed(bpage) |
3689 | @@ -357,13 +362,16 @@ | |||
3690 | 357 | if (recent_blocks | 362 | if (recent_blocks |
3691 | 358 | >= BUF_READ_AHEAD_RANDOM_THRESHOLD(buf_pool)) { | 363 | >= BUF_READ_AHEAD_RANDOM_THRESHOLD(buf_pool)) { |
3692 | 359 | 364 | ||
3694 | 360 | buf_pool_mutex_exit(buf_pool); | 365 | rw_lock_s_unlock(hash_lock); |
3695 | 361 | goto read_ahead; | 366 | goto read_ahead; |
3696 | 362 | } | 367 | } |
3697 | 363 | } | 368 | } |
3698 | 369 | |||
3699 | 370 | if (bpage) { | ||
3700 | 371 | rw_lock_s_unlock(hash_lock); | ||
3701 | 372 | } | ||
3702 | 364 | } | 373 | } |
3703 | 365 | 374 | ||
3704 | 366 | buf_pool_mutex_exit(buf_pool); | ||
3705 | 367 | /* Do nothing */ | 375 | /* Do nothing */ |
3706 | 368 | return(0); | 376 | return(0); |
3707 | 369 | 377 | ||
3708 | @@ -551,6 +559,7 @@ | |||
3709 | 551 | buf_page_t* bpage; | 559 | buf_page_t* bpage; |
3710 | 552 | buf_frame_t* frame; | 560 | buf_frame_t* frame; |
3711 | 553 | buf_page_t* pred_bpage = NULL; | 561 | buf_page_t* pred_bpage = NULL; |
3712 | 562 | unsigned pred_bpage_is_accessed = 0; | ||
3713 | 554 | ulint pred_offset; | 563 | ulint pred_offset; |
3714 | 555 | ulint succ_offset; | 564 | ulint succ_offset; |
3715 | 556 | ulint count; | 565 | ulint count; |
3716 | @@ -602,10 +611,7 @@ | |||
3717 | 602 | 611 | ||
3718 | 603 | tablespace_version = fil_space_get_version(space); | 612 | tablespace_version = fil_space_get_version(space); |
3719 | 604 | 613 | ||
3720 | 605 | buf_pool_mutex_enter(buf_pool); | ||
3721 | 606 | |||
3722 | 607 | if (high > fil_space_get_size(space)) { | 614 | if (high > fil_space_get_size(space)) { |
3723 | 608 | buf_pool_mutex_exit(buf_pool); | ||
3724 | 609 | /* The area is not whole, return */ | 615 | /* The area is not whole, return */ |
3725 | 610 | 616 | ||
3726 | 611 | return(0); | 617 | return(0); |
3727 | @@ -613,7 +619,6 @@ | |||
3728 | 613 | 619 | ||
3729 | 614 | if (buf_pool->n_pend_reads | 620 | if (buf_pool->n_pend_reads |
3730 | 615 | > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) { | 621 | > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) { |
3731 | 616 | buf_pool_mutex_exit(buf_pool); | ||
3732 | 617 | 622 | ||
3733 | 618 | return(0); | 623 | return(0); |
3734 | 619 | } | 624 | } |
3735 | @@ -636,7 +641,11 @@ | |||
3736 | 636 | fail_count = 0; | 641 | fail_count = 0; |
3737 | 637 | 642 | ||
3738 | 638 | for (i = low; i < high; i++) { | 643 | for (i = low; i < high; i++) { |
3740 | 639 | bpage = buf_page_hash_get(buf_pool, space, i); | 644 | |
3741 | 645 | rw_lock_t* hash_lock; | ||
3742 | 646 | |||
3743 | 647 | bpage = buf_page_hash_get_s_locked(buf_pool, space, i, | ||
3744 | 648 | &hash_lock); | ||
3745 | 640 | 649 | ||
3746 | 641 | if (bpage == NULL || !buf_page_is_accessed(bpage)) { | 650 | if (bpage == NULL || !buf_page_is_accessed(bpage)) { |
3747 | 642 | /* Not accessed */ | 651 | /* Not accessed */ |
3748 | @@ -653,7 +662,7 @@ | |||
3749 | 653 | a little against this. */ | 662 | a little against this. */ |
3750 | 654 | int res = ut_ulint_cmp( | 663 | int res = ut_ulint_cmp( |
3751 | 655 | buf_page_is_accessed(bpage), | 664 | buf_page_is_accessed(bpage), |
3753 | 656 | buf_page_is_accessed(pred_bpage)); | 665 | pred_bpage_is_accessed); |
3754 | 657 | /* Accesses not in the right order */ | 666 | /* Accesses not in the right order */ |
3755 | 658 | if (res != 0 && res != asc_or_desc) { | 667 | if (res != 0 && res != asc_or_desc) { |
3756 | 659 | fail_count++; | 668 | fail_count++; |
3757 | @@ -662,12 +671,20 @@ | |||
3758 | 662 | 671 | ||
3759 | 663 | if (fail_count > threshold) { | 672 | if (fail_count > threshold) { |
3760 | 664 | /* Too many failures: return */ | 673 | /* Too many failures: return */ |
3762 | 665 | buf_pool_mutex_exit(buf_pool); | 674 | if (bpage) { |
3763 | 675 | rw_lock_s_unlock(hash_lock); | ||
3764 | 676 | } | ||
3765 | 666 | return(0); | 677 | return(0); |
3766 | 667 | } | 678 | } |
3767 | 668 | 679 | ||
3770 | 669 | if (bpage && buf_page_is_accessed(bpage)) { | 680 | if (bpage) { |
3771 | 670 | pred_bpage = bpage; | 681 | if (buf_page_is_accessed(bpage)) { |
3772 | 682 | pred_bpage = bpage; | ||
3773 | 683 | pred_bpage_is_accessed | ||
3774 | 684 | = buf_page_is_accessed(bpage); | ||
3775 | 685 | } | ||
3776 | 686 | |||
3777 | 687 | rw_lock_s_unlock(hash_lock); | ||
3778 | 671 | } | 688 | } |
3779 | 672 | } | 689 | } |
3780 | 673 | 690 | ||
3781 | @@ -677,7 +694,6 @@ | |||
3782 | 677 | bpage = buf_page_hash_get(buf_pool, space, offset); | 694 | bpage = buf_page_hash_get(buf_pool, space, offset); |
3783 | 678 | 695 | ||
3784 | 679 | if (bpage == NULL) { | 696 | if (bpage == NULL) { |
3785 | 680 | buf_pool_mutex_exit(buf_pool); | ||
3786 | 681 | 697 | ||
3787 | 682 | return(0); | 698 | return(0); |
3788 | 683 | } | 699 | } |
3789 | @@ -703,8 +719,6 @@ | |||
3790 | 703 | pred_offset = fil_page_get_prev(frame); | 719 | pred_offset = fil_page_get_prev(frame); |
3791 | 704 | succ_offset = fil_page_get_next(frame); | 720 | succ_offset = fil_page_get_next(frame); |
3792 | 705 | 721 | ||
3793 | 706 | buf_pool_mutex_exit(buf_pool); | ||
3794 | 707 | |||
3795 | 708 | if ((offset == low) && (succ_offset == offset + 1)) { | 722 | if ((offset == low) && (succ_offset == offset + 1)) { |
3796 | 709 | 723 | ||
3797 | 710 | /* This is ok, we can continue */ | 724 | /* This is ok, we can continue */ |
3798 | @@ -961,7 +975,8 @@ | |||
3799 | 961 | 975 | ||
3800 | 962 | os_aio_print_debug = FALSE; | 976 | os_aio_print_debug = FALSE; |
3801 | 963 | buf_pool = buf_pool_get(space, page_nos[i]); | 977 | buf_pool = buf_pool_get(space, page_nos[i]); |
3803 | 964 | while (buf_pool->n_pend_reads >= recv_n_pool_free_frames / 2) { | 978 | while (buf_pool->n_pend_reads |
3804 | 979 | >= recv_n_pool_free_frames / 2) { | ||
3805 | 965 | 980 | ||
3806 | 966 | os_aio_simulated_wake_handler_threads(); | 981 | os_aio_simulated_wake_handler_threads(); |
3807 | 967 | os_thread_sleep(10000); | 982 | os_thread_sleep(10000); |
3808 | 968 | 983 | ||
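The buf0rea.cc hunks above replace the pool-wide `buf_pool_mutex` with a page-hash `hash_lock` taken in shared mode just long enough to read the neighbor page's fields during the read-ahead probe. A minimal sketch of that pattern, with illustrative names (`MiniPage`, `probe_accessed`) that are not InnoDB's:

```cpp
#include <shared_mutex>
#include <unordered_map>
#include <cstdint>

// Sketch: a read-ahead probe takes the hash shard's rw-lock in shared
// mode, copies the one descriptor field it needs, and drops the lock
// before moving on -- no pool-wide mutex is held across the scan.
struct MiniPage {
    uint32_t space;
    uint32_t offset;
    bool     accessed;   // stands in for buf_page_is_accessed()
};

struct MiniPageHash {
    std::shared_mutex                      hash_lock;  // one shard
    std::unordered_map<uint64_t, MiniPage> map;

    static uint64_t key(uint32_t space, uint32_t offset) {
        return (uint64_t(space) << 32) | offset;
    }

    // Returns true and copies the accessed flag if the page is cached;
    // a miss would count toward the probe's fail_count.
    bool probe_accessed(uint32_t space, uint32_t offset, bool* accessed) {
        std::shared_lock<std::shared_mutex> s(hash_lock);
        auto it = map.find(key(space, offset));
        if (it == map.end()) {
            return false;
        }
        *accessed = it->second.accessed;
        return true;    // s-lock released on return, as in the diff
    }
};
```

The early-return paths in the diff (`fail_count > threshold`, `bpage == NULL`) now only need to release the s-lock they actually hold, which is what the added `rw_lock_s_unlock(hash_lock)` calls do.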
3809 | === modified file 'Percona-Server/storage/innobase/fsp/fsp0fsp.cc' | |||
3810 | --- Percona-Server/storage/innobase/fsp/fsp0fsp.cc 2013-05-30 12:47:19 +0000 | |||
3811 | +++ Percona-Server/storage/innobase/fsp/fsp0fsp.cc 2013-09-20 05:29:11 +0000 | |||
3812 | @@ -2870,7 +2870,7 @@ | |||
3813 | 2870 | 2870 | ||
3814 | 2871 | /* The convoluted mutex acquire is to overcome latching order | 2871 | /* The convoluted mutex acquire is to overcome latching order |
3815 | 2872 | issues: The problem is that the fil_mutex is at a lower level | 2872 | issues: The problem is that the fil_mutex is at a lower level |
3817 | 2873 | than the tablespace latch and the buffer pool mutex. We have to | 2873 | than the tablespace latch and the buffer pool mutexes. We have to |
3818 | 2874 | first prevent any operations on the file system by acquiring the | 2874 | first prevent any operations on the file system by acquiring the |
3819 | 2875 | dictionary mutex. Then acquire the tablespace latch to obey the | 2875 | dictionary mutex. Then acquire the tablespace latch to obey the |
3820 | 2876 | latching order and then release the dictionary mutex. That way we | 2876 | latching order and then release the dictionary mutex. That way we |
3821 | 2877 | 2877 | ||
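The fsp0fsp.cc comment describes a classic latching-order workaround: the fil mutex ranks below the tablespace latch, so the thread first freezes file-system operations via the dictionary mutex, then takes the tablespace latch in the legal order, then drops the dictionary mutex. A sketch of that acquire sequence under assumed names (`LatchOrderDemo` and its members are illustrative, not InnoDB's):

```cpp
#include <mutex>
#include <shared_mutex>

// Sketch of the "convoluted mutex acquire" the comment describes.
struct LatchOrderDemo {
    std::mutex        dict_mutex;       // freezes file-system operations
    std::shared_mutex tablespace_latch; // ranks above fil_mutex
    std::mutex        fil_mutex;        // lowest-ranked latch

    // Never waits on a higher-ranked latch while holding a lower one.
    void acquire_for_extension() {
        dict_mutex.lock();       // step 1: block file-system ops
        tablespace_latch.lock(); // step 2: obey the latching order
        dict_mutex.unlock();     // step 3: dict mutex no longer needed
        fil_mutex.lock();        // step 4: safe, we rank above it now
    }

    void release() {
        fil_mutex.unlock();
        tablespace_latch.unlock();
    }
};
```

The change in the hunk itself is only textual ("mutex" to "mutexes"), since after the split there are several buffer pool mutexes sitting above the fil mutex in the latching order.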
3822 | === modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc' | |||
3823 | --- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-08-30 13:23:53 +0000 | |||
3824 | +++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-09-20 05:29:11 +0000 | |||
3825 | @@ -294,6 +294,11 @@ | |||
3826 | 294 | # endif /* !PFS_SKIP_BUFFER_MUTEX_RWLOCK */ | 294 | # endif /* !PFS_SKIP_BUFFER_MUTEX_RWLOCK */ |
3827 | 295 | {&buf_pool_mutex_key, "buf_pool_mutex", 0}, | 295 | {&buf_pool_mutex_key, "buf_pool_mutex", 0}, |
3828 | 296 | {&buf_pool_zip_mutex_key, "buf_pool_zip_mutex", 0}, | 296 | {&buf_pool_zip_mutex_key, "buf_pool_zip_mutex", 0}, |
3829 | 297 | {&buf_pool_LRU_list_mutex_key, "buf_pool_LRU_list_mutex", 0}, | ||
3830 | 298 | {&buf_pool_free_list_mutex_key, "buf_pool_free_list_mutex", 0}, | ||
3831 | 299 | {&buf_pool_zip_free_mutex_key, "buf_pool_zip_free_mutex", 0}, | ||
3832 | 300 | {&buf_pool_zip_hash_mutex_key, "buf_pool_zip_hash_mutex", 0}, | ||
3833 | 301 | {&buf_pool_flush_state_mutex_key, "buf_pool_flush_state_mutex", 0}, | ||
3834 | 297 | {&cache_last_read_mutex_key, "cache_last_read_mutex", 0}, | 302 | {&cache_last_read_mutex_key, "cache_last_read_mutex", 0}, |
3835 | 298 | {&dict_foreign_err_mutex_key, "dict_foreign_err_mutex", 0}, | 303 | {&dict_foreign_err_mutex_key, "dict_foreign_err_mutex", 0}, |
3836 | 299 | {&dict_sys_mutex_key, "dict_sys_mutex", 0}, | 304 | {&dict_sys_mutex_key, "dict_sys_mutex", 0}, |
3837 | @@ -15485,7 +15490,7 @@ | |||
3838 | 15485 | for (ulint i = 0; i < srv_buf_pool_instances; i++) { | 15490 | for (ulint i = 0; i < srv_buf_pool_instances; i++) { |
3839 | 15486 | buf_pool_t* buf_pool = &buf_pool_ptr[i]; | 15491 | buf_pool_t* buf_pool = &buf_pool_ptr[i]; |
3840 | 15487 | 15492 | ||
3842 | 15488 | buf_pool_mutex_enter(buf_pool); | 15493 | mutex_enter(&buf_pool->LRU_list_mutex); |
3843 | 15489 | 15494 | ||
3844 | 15490 | for (buf_block_t* block = UT_LIST_GET_LAST( | 15495 | for (buf_block_t* block = UT_LIST_GET_LAST( |
3845 | 15491 | buf_pool->unzip_LRU); | 15496 | buf_pool->unzip_LRU); |
3846 | @@ -15497,14 +15502,19 @@ | |||
3847 | 15497 | ut_ad(block->in_unzip_LRU_list); | 15502 | ut_ad(block->in_unzip_LRU_list); |
3848 | 15498 | ut_ad(block->page.in_LRU_list); | 15503 | ut_ad(block->page.in_LRU_list); |
3849 | 15499 | 15504 | ||
3850 | 15505 | mutex_enter(&block->mutex); | ||
3851 | 15500 | if (!buf_LRU_free_page(&block->page, false)) { | 15506 | if (!buf_LRU_free_page(&block->page, false)) { |
3852 | 15507 | mutex_exit(&block->mutex); | ||
3853 | 15501 | all_evicted = false; | 15508 | all_evicted = false; |
3854 | 15509 | } else { | ||
3855 | 15510 | mutex_exit(&block->mutex); | ||
3856 | 15511 | mutex_enter(&buf_pool->LRU_list_mutex); | ||
3857 | 15502 | } | 15512 | } |
3858 | 15503 | 15513 | ||
3859 | 15504 | block = prev_block; | 15514 | block = prev_block; |
3860 | 15505 | } | 15515 | } |
3861 | 15506 | 15516 | ||
3863 | 15507 | buf_pool_mutex_exit(buf_pool); | 15517 | mutex_exit(&buf_pool->LRU_list_mutex); |
3864 | 15508 | } | 15518 | } |
3865 | 15509 | 15519 | ||
3866 | 15510 | return(all_evicted); | 15520 | return(all_evicted); |
3867 | 15511 | 15521 | ||
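The ha_innodb.cc hunk rewrites the unzip-LRU eviction loop for the split: the scan holds `LRU_list_mutex`, each candidate's block mutex is taken before the free attempt, a successful `buf_LRU_free_page()` releases the LRU list mutex internally so the loop re-acquires it, and a failed attempt just drops the block mutex. A sketch of that handoff, assuming simplified stand-ins (`MiniPool`, `free_page`) rather than the real InnoDB types:

```cpp
#include <mutex>

// Sketch of the eviction loop's mutex handoff after the split.
struct MiniBlock {
    std::mutex mutex;
    bool       evictable = true;
    bool       freed     = false;
};

struct MiniPool {
    std::mutex LRU_list_mutex;
    MiniBlock  blocks[2];

    // Mimics buf_LRU_free_page(): on success it releases the LRU list
    // mutex internally; on failure the caller keeps its mutexes.
    bool free_page(MiniBlock* b) {
        if (!b->evictable) {
            return false;
        }
        b->freed = true;
        LRU_list_mutex.unlock();
        return true;
    }

    bool evict_all() {
        bool all_evicted = true;
        LRU_list_mutex.lock();
        for (MiniBlock& b : blocks) {
            b.mutex.lock();
            if (!free_page(&b)) {
                b.mutex.unlock();
                all_evicted = false;
            } else {
                b.mutex.unlock();
                LRU_list_mutex.lock();  // re-acquire for the next block
            }
        }
        LRU_list_mutex.unlock();
        return all_evicted;
    }
};
```

Note how the two branches mirror the diff: `mutex_exit(&block->mutex)` happens on both paths, but only the success path re-enters the LRU list mutex.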
3868 | === modified file 'Percona-Server/storage/innobase/handler/i_s.cc' | |||
3869 | --- Percona-Server/storage/innobase/handler/i_s.cc 2013-08-14 03:57:21 +0000 | |||
3870 | +++ Percona-Server/storage/innobase/handler/i_s.cc 2013-09-20 05:29:11 +0000 | |||
3871 | @@ -2103,7 +2103,7 @@ | |||
3872 | 2103 | 2103 | ||
3873 | 2104 | buf_pool = buf_pool_from_array(i); | 2104 | buf_pool = buf_pool_from_array(i); |
3874 | 2105 | 2105 | ||
3876 | 2106 | buf_pool_mutex_enter(buf_pool); | 2106 | mutex_enter(&buf_pool->zip_free_mutex); |
3877 | 2107 | 2107 | ||
3878 | 2108 | for (uint x = 0; x <= BUF_BUDDY_SIZES; x++) { | 2108 | for (uint x = 0; x <= BUF_BUDDY_SIZES; x++) { |
3879 | 2109 | buf_buddy_stat_t* buddy_stat; | 2109 | buf_buddy_stat_t* buddy_stat; |
3880 | @@ -2122,7 +2122,8 @@ | |||
3881 | 2122 | (ulong) (buddy_stat->relocated_usec / 1000000)); | 2122 | (ulong) (buddy_stat->relocated_usec / 1000000)); |
3882 | 2123 | 2123 | ||
3883 | 2124 | if (reset) { | 2124 | if (reset) { |
3885 | 2125 | /* This is protected by buf_pool->mutex. */ | 2125 | /* This is protected by |
3886 | 2126 | buf_pool->zip_free_mutex. */ | ||
3887 | 2126 | buddy_stat->relocated = 0; | 2127 | buddy_stat->relocated = 0; |
3888 | 2127 | buddy_stat->relocated_usec = 0; | 2128 | buddy_stat->relocated_usec = 0; |
3889 | 2128 | } | 2129 | } |
3890 | @@ -2133,7 +2134,7 @@ | |||
3891 | 2133 | } | 2134 | } |
3892 | 2134 | } | 2135 | } |
3893 | 2135 | 2136 | ||
3895 | 2136 | buf_pool_mutex_exit(buf_pool); | 2137 | mutex_exit(&buf_pool->zip_free_mutex); |
3896 | 2137 | 2138 | ||
3897 | 2138 | if (status) { | 2139 | if (status) { |
3898 | 2139 | break; | 2140 | break; |
3899 | @@ -4954,12 +4955,16 @@ | |||
3900 | 4954 | out: structure filled with scanned | 4955 | out: structure filled with scanned |
3901 | 4955 | info */ | 4956 | info */ |
3902 | 4956 | { | 4957 | { |
3903 | 4958 | ib_mutex_t* mutex = buf_page_get_mutex(bpage); | ||
3904 | 4959 | |||
3905 | 4957 | ut_ad(pool_id < MAX_BUFFER_POOLS); | 4960 | ut_ad(pool_id < MAX_BUFFER_POOLS); |
3906 | 4958 | 4961 | ||
3907 | 4959 | page_info->pool_id = pool_id; | 4962 | page_info->pool_id = pool_id; |
3908 | 4960 | 4963 | ||
3909 | 4961 | page_info->block_id = pos; | 4964 | page_info->block_id = pos; |
3910 | 4962 | 4965 | ||
3911 | 4966 | mutex_enter(mutex); | ||
3912 | 4967 | |||
3913 | 4963 | page_info->page_state = buf_page_get_state(bpage); | 4968 | page_info->page_state = buf_page_get_state(bpage); |
3914 | 4964 | 4969 | ||
3915 | 4965 | /* Only fetch information for buffers that map to a tablespace, | 4970 | /* Only fetch information for buffers that map to a tablespace, |
3916 | @@ -4998,6 +5003,7 @@ | |||
3917 | 4998 | break; | 5003 | break; |
3918 | 4999 | case BUF_IO_READ: | 5004 | case BUF_IO_READ: |
3919 | 5000 | page_info->page_type = I_S_PAGE_TYPE_UNKNOWN; | 5005 | page_info->page_type = I_S_PAGE_TYPE_UNKNOWN; |
3920 | 5006 | mutex_exit(mutex); | ||
3921 | 5001 | return; | 5007 | return; |
3922 | 5002 | } | 5008 | } |
3923 | 5003 | 5009 | ||
3924 | @@ -5018,6 +5024,8 @@ | |||
3925 | 5018 | } else { | 5024 | } else { |
3926 | 5019 | page_info->page_type = I_S_PAGE_TYPE_UNKNOWN; | 5025 | page_info->page_type = I_S_PAGE_TYPE_UNKNOWN; |
3927 | 5020 | } | 5026 | } |
3928 | 5027 | |||
3929 | 5028 | mutex_exit(mutex); | ||
3930 | 5021 | } | 5029 | } |
3931 | 5022 | 5030 | ||
3932 | 5023 | /*******************************************************************//** | 5031 | /*******************************************************************//** |
3933 | @@ -5075,7 +5083,6 @@ | |||
3934 | 5075 | buffer pool info printout, we are not required to | 5083 | buffer pool info printout, we are not required to |
3935 | 5076 | preserve the overall consistency, so we can | 5084 | preserve the overall consistency, so we can |
3936 | 5077 | release mutex periodically */ | 5085 | release mutex periodically */ |
3937 | 5078 | buf_pool_mutex_enter(buf_pool); | ||
3938 | 5079 | 5086 | ||
3939 | 5080 | /* GO through each block in the chunk */ | 5087 | /* GO through each block in the chunk */ |
3940 | 5081 | for (n_blocks = num_to_process; n_blocks--; block++) { | 5088 | for (n_blocks = num_to_process; n_blocks--; block++) { |
3941 | @@ -5086,8 +5093,6 @@ | |||
3942 | 5086 | num_page++; | 5093 | num_page++; |
3943 | 5087 | } | 5094 | } |
3944 | 5088 | 5095 | ||
3945 | 5089 | buf_pool_mutex_exit(buf_pool); | ||
3946 | 5090 | |||
3947 | 5091 | /* Fill in information schema table with information | 5096 | /* Fill in information schema table with information |
3948 | 5092 | just collected from the buffer chunk scan */ | 5097 | just collected from the buffer chunk scan */ |
3949 | 5093 | status = i_s_innodb_buffer_page_fill( | 5098 | status = i_s_innodb_buffer_page_fill( |
3950 | @@ -5609,9 +5614,9 @@ | |||
3951 | 5609 | DBUG_ENTER("i_s_innodb_fill_buffer_lru"); | 5614 | DBUG_ENTER("i_s_innodb_fill_buffer_lru"); |
3952 | 5610 | RETURN_IF_INNODB_NOT_STARTED(tables->schema_table_name); | 5615 | RETURN_IF_INNODB_NOT_STARTED(tables->schema_table_name); |
3953 | 5611 | 5616 | ||
3955 | 5612 | /* Obtain buf_pool mutex before allocate info_buffer, since | 5617 | /* Obtain buf_pool->LRU_list_mutex before allocate info_buffer, since |
3956 | 5613 | UT_LIST_GET_LEN(buf_pool->LRU) could change */ | 5618 | UT_LIST_GET_LEN(buf_pool->LRU) could change */ |
3958 | 5614 | buf_pool_mutex_enter(buf_pool); | 5619 | mutex_enter(&buf_pool->LRU_list_mutex); |
3959 | 5615 | 5620 | ||
3960 | 5616 | lru_len = UT_LIST_GET_LEN(buf_pool->LRU); | 5621 | lru_len = UT_LIST_GET_LEN(buf_pool->LRU); |
3961 | 5617 | 5622 | ||
3962 | @@ -5645,7 +5650,7 @@ | |||
3963 | 5645 | ut_ad(lru_pos == UT_LIST_GET_LEN(buf_pool->LRU)); | 5650 | ut_ad(lru_pos == UT_LIST_GET_LEN(buf_pool->LRU)); |
3964 | 5646 | 5651 | ||
3965 | 5647 | exit: | 5652 | exit: |
3967 | 5648 | buf_pool_mutex_exit(buf_pool); | 5653 | mutex_exit(&buf_pool->LRU_list_mutex); |
3968 | 5649 | 5654 | ||
3969 | 5650 | if (info_buffer) { | 5655 | if (info_buffer) { |
3970 | 5651 | status = i_s_innodb_buf_page_lru_fill( | 5656 | status = i_s_innodb_buf_page_lru_fill( |
3971 | 5652 | 5657 | ||
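The i_s.cc hunks replace pool-wide locking in the INFORMATION_SCHEMA scans with per-page locking: `buf_page_get_mutex(bpage)` is held only while one page's fields are copied, and the diff adds a `mutex_exit()` on the early `BUF_IO_READ` return so every exit path releases it. A sketch of that shape using RAII, with illustrative names and a reduced two-state io_fix:

```cpp
#include <mutex>
#include <cstdint>

enum class IoFix : uint8_t { NONE, READ };

struct MiniPage {
    std::mutex mutex;       // stands in for buf_page_get_mutex()
    IoFix      io_fix = IoFix::NONE;
    uint32_t   space  = 0;
    uint32_t   offset = 0;
};

struct PageInfo {
    uint32_t space  = 0;
    uint32_t offset = 0;
    bool     known  = false;   // false => I_S_PAGE_TYPE_UNKNOWN
};

// Locks only this page while copying its fields; the unique_lock
// guarantees the release on the early-return path that the diff
// had to add an explicit mutex_exit() for.
void fill_page_info(MiniPage* page, PageInfo* info) {
    std::unique_lock<std::mutex> guard(page->mutex);
    info->space  = page->space;
    info->offset = page->offset;
    if (page->io_fix == IoFix::READ) {
        info->known = false;   // contents in flux; report UNKNOWN
        return;                // guard releases the mutex here too
    }
    info->known = true;
}
```

This is also why the surrounding hunks can delete `buf_pool_mutex_enter()`/`exit()` around the chunk scan: consistency of the printout is per page, not per pool.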
3972 | === modified file 'Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc' | |||
3973 | --- Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc 2013-09-02 10:01:38 +0000 | |||
3974 | +++ Percona-Server/storage/innobase/ibuf/ibuf0ibuf.cc 2013-09-20 05:29:11 +0000 | |||
3975 | @@ -4611,7 +4611,7 @@ | |||
3976 | 4611 | ut_ad(!block || buf_block_get_space(block) == space); | 4611 | ut_ad(!block || buf_block_get_space(block) == space); |
3977 | 4612 | ut_ad(!block || buf_block_get_page_no(block) == page_no); | 4612 | ut_ad(!block || buf_block_get_page_no(block) == page_no); |
3978 | 4613 | ut_ad(!block || buf_block_get_zip_size(block) == zip_size); | 4613 | ut_ad(!block || buf_block_get_zip_size(block) == zip_size); |
3980 | 4614 | ut_ad(!block || buf_block_get_io_fix(block) == BUF_IO_READ); | 4614 | ut_ad(!block || buf_block_get_io_fix_unlocked(block) == BUF_IO_READ); |
3981 | 4615 | 4615 | ||
3982 | 4616 | if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE | 4616 | if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE |
3983 | 4617 | || trx_sys_hdr_page(space, page_no)) { | 4617 | || trx_sys_hdr_page(space, page_no)) { |
3984 | 4618 | 4618 | ||
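The ibuf0ibuf.cc assertion switches to `buf_block_get_io_fix_unlocked()`: a read of `io_fix` without holding the block mutex, legal only where the caller knows the value cannot change underneath it (here the page is pinned by a pending read). A sketch of the idea with `std::atomic`; InnoDB's field is actually a bit-field read under documented protection rules, so the atomic is an assumption of this sketch, not the real implementation:

```cpp
#include <atomic>

enum IoFix { IO_NONE, IO_READ, IO_WRITE };

struct MiniBlock {
    std::atomic<int> io_fix{IO_NONE};

    // The locked accessor would assert block-mutex ownership; this
    // _unlocked variant skips that assertion for callers that know
    // the state is stable (e.g. they hold the read pin on the page).
    int get_io_fix_unlocked() const {
        return io_fix.load(std::memory_order_relaxed);
    }
};
```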
3985 | === modified file 'Percona-Server/storage/innobase/include/buf0buddy.h' | |||
3986 | --- Percona-Server/storage/innobase/include/buf0buddy.h 2013-08-06 15:16:34 +0000 | |||
3987 | +++ Percona-Server/storage/innobase/include/buf0buddy.h 2013-09-20 05:29:11 +0000 | |||
3988 | @@ -36,8 +36,8 @@ | |||
3989 | 36 | 36 | ||
3990 | 37 | /**********************************************************************//** | 37 | /**********************************************************************//** |
3991 | 38 | Allocate a block. The thread calling this function must hold | 38 | Allocate a block. The thread calling this function must hold |
3994 | 39 | buf_pool->mutex and must not hold buf_pool->zip_mutex or any | 39 | buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any |
3995 | 40 | block->mutex. The buf_pool->mutex may be released and reacquired. | 40 | block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired. |
3996 | 41 | This function should only be used for allocating compressed page frames. | 41 | This function should only be used for allocating compressed page frames. |
3997 | 42 | @return allocated block, never NULL */ | 42 | @return allocated block, never NULL */ |
3998 | 43 | UNIV_INLINE | 43 | UNIV_INLINE |
3999 | @@ -52,8 +52,8 @@ | |||
4000 | 52 | ibool* lru) /*!< in: pointer to a variable | 52 | ibool* lru) /*!< in: pointer to a variable |
4001 | 53 | that will be assigned TRUE if | 53 | that will be assigned TRUE if |
4002 | 54 | storage was allocated from the | 54 | storage was allocated from the |
4005 | 55 | LRU list and buf_pool->mutex was | 55 | LRU list and buf_pool->LRU_list_mutex |
4006 | 56 | temporarily released */ | 56 | was temporarily released */ |
4007 | 57 | __attribute__((malloc, nonnull)); | 57 | __attribute__((malloc, nonnull)); |
4008 | 58 | 58 | ||
4009 | 59 | /**********************************************************************//** | 59 | /**********************************************************************//** |
4010 | 60 | 60 | ||
4011 | === modified file 'Percona-Server/storage/innobase/include/buf0buddy.ic' | |||
4012 | --- Percona-Server/storage/innobase/include/buf0buddy.ic 2013-08-06 15:16:34 +0000 | |||
4013 | +++ Percona-Server/storage/innobase/include/buf0buddy.ic 2013-09-20 05:29:11 +0000 | |||
4014 | @@ -35,8 +35,8 @@ | |||
4015 | 35 | 35 | ||
4016 | 36 | /**********************************************************************//** | 36 | /**********************************************************************//** |
4017 | 37 | Allocate a block. The thread calling this function must hold | 37 | Allocate a block. The thread calling this function must hold |
4020 | 38 | buf_pool->mutex and must not hold buf_pool->zip_mutex or any block->mutex. | 38 | buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any |
4021 | 39 | The buf_pool_mutex may be released and reacquired. | 39 | block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired. |
4022 | 40 | @return allocated block, never NULL */ | 40 | @return allocated block, never NULL */ |
4023 | 41 | UNIV_INTERN | 41 | UNIV_INTERN |
4024 | 42 | void* | 42 | void* |
4025 | @@ -48,8 +48,8 @@ | |||
4026 | 48 | ibool* lru) /*!< in: pointer to a variable that | 48 | ibool* lru) /*!< in: pointer to a variable that |
4027 | 49 | will be assigned TRUE if storage was | 49 | will be assigned TRUE if storage was |
4028 | 50 | allocated from the LRU list and | 50 | allocated from the LRU list and |
4031 | 51 | buf_pool->mutex was temporarily | 51 | buf_pool->LRU_list_mutex was |
4032 | 52 | released */ | 52 | temporarily released */ |
4033 | 53 | __attribute__((malloc, nonnull)); | 53 | __attribute__((malloc, nonnull)); |
4034 | 54 | 54 | ||
4035 | 55 | /**********************************************************************//** | 55 | /**********************************************************************//** |
4036 | @@ -88,8 +88,8 @@ | |||
4037 | 88 | 88 | ||
4038 | 89 | /**********************************************************************//** | 89 | /**********************************************************************//** |
4039 | 90 | Allocate a block. The thread calling this function must hold | 90 | Allocate a block. The thread calling this function must hold |
4042 | 91 | buf_pool->mutex and must not hold buf_pool->zip_mutex or any | 91 | buf_pool->LRU_list_mutex and must not hold buf_pool->zip_mutex or any |
4043 | 92 | block->mutex. The buf_pool->mutex may be released and reacquired. | 92 | block->mutex. The buf_pool->LRU_list_mutex may be released and reacquired. |
4044 | 93 | This function should only be used for allocating compressed page frames. | 93 | This function should only be used for allocating compressed page frames. |
4045 | 94 | @return allocated block, never NULL */ | 94 | @return allocated block, never NULL */ |
4046 | 95 | UNIV_INLINE | 95 | UNIV_INLINE |
4047 | @@ -104,10 +104,10 @@ | |||
4048 | 104 | ibool* lru) /*!< in: pointer to a variable | 104 | ibool* lru) /*!< in: pointer to a variable |
4049 | 105 | that will be assigned TRUE if | 105 | that will be assigned TRUE if |
4050 | 106 | storage was allocated from the | 106 | storage was allocated from the |
4053 | 107 | LRU list and buf_pool->mutex was | 107 | LRU list and buf_pool->LRU_list_mutex |
4054 | 108 | temporarily released */ | 108 | was temporarily released */ |
4055 | 109 | { | 109 | { |
4057 | 110 | ut_ad(buf_pool_mutex_own(buf_pool)); | 110 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
4058 | 111 | ut_ad(ut_is_2pow(size)); | 111 | ut_ad(ut_is_2pow(size)); |
4059 | 112 | ut_ad(size >= UNIV_ZIP_SIZE_MIN); | 112 | ut_ad(size >= UNIV_ZIP_SIZE_MIN); |
4060 | 113 | ut_ad(size <= UNIV_PAGE_SIZE); | 113 | ut_ad(size <= UNIV_PAGE_SIZE); |
4061 | @@ -129,7 +129,6 @@ | |||
4062 | 129 | ulint size) /*!< in: block size, | 129 | ulint size) /*!< in: block size, |
4063 | 130 | up to UNIV_PAGE_SIZE */ | 130 | up to UNIV_PAGE_SIZE */ |
4064 | 131 | { | 131 | { |
4065 | 132 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
4066 | 133 | ut_ad(ut_is_2pow(size)); | 132 | ut_ad(ut_is_2pow(size)); |
4067 | 134 | ut_ad(size >= UNIV_ZIP_SIZE_MIN); | 133 | ut_ad(size >= UNIV_ZIP_SIZE_MIN); |
4068 | 135 | ut_ad(size <= UNIV_PAGE_SIZE); | 134 | ut_ad(size <= UNIV_PAGE_SIZE); |
4069 | 136 | 135 | ||
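The buf0buddy hunks retarget the allocator's debug precondition from `ut_ad(buf_pool_mutex_own(buf_pool))` to `ut_ad(mutex_own(&buf_pool->LRU_list_mutex))`. `std::mutex` has no portable "owned by me" query, so this sketch tracks the owning thread id alongside the lock to mimic `mutex_own()`; all names here are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <atomic>

// A mutex wrapper that can answer "does the calling thread hold me?",
// which is what InnoDB's debug-build mutex_own() provides.
struct OwnedMutex {
    std::mutex                   m;
    std::atomic<std::thread::id> owner{};

    void lock()   { m.lock(); owner = std::this_thread::get_id(); }
    void unlock() { owner = std::thread::id(); m.unlock(); }
    bool owned_by_caller() const {
        return owner == std::this_thread::get_id();
    }
};

struct MiniBuddy {
    OwnedMutex LRU_list_mutex;

    void* alloc(size_t size) {
        // Precondition from the header comment above: the caller
        // holds LRU_list_mutex (was: the single buf_pool->mutex).
        assert(LRU_list_mutex.owned_by_caller());
        return ::operator new(size);
    }
};
```

Documenting the precondition as an assertion rather than prose is what lets the hunk at line 132 simply delete the old `buf_pool_mutex_own` check where it no longer applies.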
4070 | === modified file 'Percona-Server/storage/innobase/include/buf0buf.h' | |||
4071 | --- Percona-Server/storage/innobase/include/buf0buf.h 2013-09-02 10:01:38 +0000 | |||
4072 | +++ Percona-Server/storage/innobase/include/buf0buf.h 2013-09-20 05:29:11 +0000 | |||
4073 | @@ -208,19 +208,6 @@ | |||
4074 | 208 | }; | 208 | }; |
4075 | 209 | 209 | ||
4076 | 210 | #ifndef UNIV_HOTBACKUP | 210 | #ifndef UNIV_HOTBACKUP |
4077 | 211 | /********************************************************************//** | ||
4078 | 212 | Acquire mutex on all buffer pool instances */ | ||
4079 | 213 | UNIV_INLINE | ||
4080 | 214 | void | ||
4081 | 215 | buf_pool_mutex_enter_all(void); | ||
4082 | 216 | /*===========================*/ | ||
4083 | 217 | |||
4084 | 218 | /********************************************************************//** | ||
4085 | 219 | Release mutex on all buffer pool instances */ | ||
4086 | 220 | UNIV_INLINE | ||
4087 | 221 | void | ||
4088 | 222 | buf_pool_mutex_exit_all(void); | ||
4089 | 223 | /*==========================*/ | ||
4090 | 224 | 211 | ||
4091 | 225 | /********************************************************************//** | 212 | /********************************************************************//** |
4092 | 226 | Creates the buffer pool. | 213 | Creates the buffer pool. |
4093 | @@ -581,7 +568,7 @@ | |||
4094 | 581 | page frame */ | 568 | page frame */ |
4095 | 582 | /********************************************************************//** | 569 | /********************************************************************//** |
4096 | 583 | Increments the modify clock of a frame by 1. The caller must (1) own the | 570 | Increments the modify clock of a frame by 1. The caller must (1) own the |
4098 | 584 | buf_pool->mutex and block bufferfix count has to be zero, (2) or own an x-lock | 571 | LRU list mutex and block bufferfix count has to be zero, (2) or own an x-lock |
4099 | 585 | on the block. */ | 572 | on the block. */ |
4100 | 586 | UNIV_INLINE | 573 | UNIV_INLINE |
4101 | 587 | void | 574 | void |
4102 | @@ -938,6 +925,17 @@ | |||
4103 | 938 | const buf_block_t* block) /*!< in: pointer to the control block */ | 925 | const buf_block_t* block) /*!< in: pointer to the control block */ |
4104 | 939 | __attribute__((pure)); | 926 | __attribute__((pure)); |
4105 | 940 | /*********************************************************************//** | 927 | /*********************************************************************//** |
4106 | 928 | Gets the io_fix state of a block. Does not assert that the | ||
4107 | 929 | buf_page_get_mutex() mutex is held, to be used in the cases where it is safe | ||
4108 | 930 | not to hold it. | ||
4109 | 931 | @return io_fix state */ | ||
4110 | 932 | UNIV_INLINE | ||
4111 | 933 | enum buf_io_fix | ||
4112 | 934 | buf_page_get_io_fix_unlocked( | ||
4113 | 935 | /*=========================*/ | ||
4114 | 936 | const buf_page_t* bpage) /*!< in: pointer to the control block */ | ||
4115 | 937 | __attribute__((pure)); | ||
4116 | 938 | /*********************************************************************//** | ||
4117 | 941 | Sets the io_fix state of a block. */ | 939 | Sets the io_fix state of a block. */ |
4118 | 942 | UNIV_INLINE | 940 | UNIV_INLINE |
4119 | 943 | void | 941 | void |
4120 | @@ -955,7 +953,7 @@ | |||
4121 | 955 | enum buf_io_fix io_fix);/*!< in: io_fix state */ | 953 | enum buf_io_fix io_fix);/*!< in: io_fix state */ |
4122 | 956 | /*********************************************************************//** | 954 | /*********************************************************************//** |
4123 | 957 | Makes a block sticky. A sticky block implies that even after we release | 955 | Makes a block sticky. A sticky block implies that even after we release |
4125 | 958 | the buf_pool->mutex and the block->mutex: | 956 | the buf_pool->LRU_list_mutex and the block->mutex: |
4126 | 959 | * it cannot be removed from the flush_list | 957 | * it cannot be removed from the flush_list |
4127 | 960 | * the block descriptor cannot be relocated | 958 | * the block descriptor cannot be relocated |
4128 | 961 | * it cannot be removed from the LRU list | 959 | * it cannot be removed from the LRU list |
4129 | @@ -1410,6 +1408,19 @@ | |||
4130 | 1410 | 1408 | ||
4131 | 1411 | #endif /* !UNIV_HOTBACKUP */ | 1409 | #endif /* !UNIV_HOTBACKUP */ |
4132 | 1412 | 1410 | ||
4133 | 1411 | #ifdef UNIV_DEBUG | ||
4134 | 1412 | /********************************************************************//** | ||
4135 | 1413 | Checks if buf_pool->zip_mutex is owned and is serving for a given page as its | ||
4136 | 1414 | block mutex. | ||
4137 | 1415 | @return true if buf_pool->zip_mutex is owned. */ | ||
4138 | 1416 | UNIV_INLINE | ||
4139 | 1417 | bool | ||
4140 | 1418 | buf_own_zip_mutex_for_page( | ||
4141 | 1419 | /*=======================*/ | ||
4142 | 1420 | const buf_page_t* bpage) | ||
4143 | 1421 | __attribute__((nonnull,warn_unused_result)); | ||
4144 | 1422 | #endif /* UNIV_DEBUG */ | ||
4145 | 1423 | |||
4146 | 1413 | /** The common buffer control block structure | 1424 | /** The common buffer control block structure |
4147 | 1414 | for compressed and uncompressed frames */ | 1425 | for compressed and uncompressed frames */ |
4148 | 1415 | 1426 | ||
4149 | @@ -1421,18 +1432,14 @@ | |||
4150 | 1421 | None of these bit-fields must be modified without holding | 1432 | None of these bit-fields must be modified without holding |
4151 | 1422 | buf_page_get_mutex() [buf_block_t::mutex or | 1433 | buf_page_get_mutex() [buf_block_t::mutex or |
4152 | 1423 | buf_pool->zip_mutex], since they can be stored in the same | 1434 | buf_pool->zip_mutex], since they can be stored in the same |
4155 | 1424 | machine word. Some of these fields are additionally protected | 1435 | machine word. */ |
4154 | 1425 | by buf_pool->mutex. */ | ||
4156 | 1426 | /* @{ */ | 1436 | /* @{ */ |
4157 | 1427 | 1437 | ||
4162 | 1428 | unsigned space:32; /*!< tablespace id; also protected | 1438 | unsigned space:32; /*!< tablespace id. */ |
4163 | 1429 | by buf_pool->mutex. */ | 1439 | unsigned offset:32; /*!< page number. */ |
4160 | 1430 | unsigned offset:32; /*!< page number; also protected | ||
4161 | 1431 | by buf_pool->mutex. */ | ||
4164 | 1432 | 1440 | ||
4165 | 1433 | unsigned state:BUF_PAGE_STATE_BITS; | 1441 | unsigned state:BUF_PAGE_STATE_BITS; |
4168 | 1434 | /*!< state of the control block; also | 1442 | /*!< state of the control block. |
4167 | 1435 | protected by buf_pool->mutex. | ||
4169 | 1436 | State transitions from | 1443 | State transitions from |
4170 | 1437 | BUF_BLOCK_READY_FOR_USE to | 1444 | BUF_BLOCK_READY_FOR_USE to |
4171 | 1438 | BUF_BLOCK_MEMORY need not be | 1445 | BUF_BLOCK_MEMORY need not be |
4172 | @@ -1450,11 +1457,21 @@ | |||
4173 | 1450 | #ifndef UNIV_HOTBACKUP | 1457 | #ifndef UNIV_HOTBACKUP |
4174 | 1451 | unsigned flush_type:2; /*!< if this block is currently being | 1458 | unsigned flush_type:2; /*!< if this block is currently being |
4175 | 1452 | flushed to disk, this tells the | 1459 | flushed to disk, this tells the |
4177 | 1453 | flush_type. | 1460 | flush_type. Writes during flushing |
4178 | 1461 | protected by buf_page_get_mutex_enter() | ||
4179 | 1462 | mutex and the corresponding flush state | ||
4180 | 1463 | mutex. | ||
4181 | 1454 | @see buf_flush_t */ | 1464 | @see buf_flush_t */ |
4185 | 1455 | unsigned io_fix:2; /*!< type of pending I/O operation; | 1465 | unsigned io_fix:2; /*!< type of pending I/O operation. |
4186 | 1456 | also protected by buf_pool->mutex | 1466 | Transitions from BUF_IO_NONE to |
4187 | 1457 | @see enum buf_io_fix */ | 1467 | BUF_IO_WRITE and back are protected by |
4188 | 1468 | the buf_page_get_mutex() mutex and the | ||
4189 | 1469 | corresponding flush state mutex. The | ||
4190 | 1470 | flush state mutex protection for io_fix | ||
4191 | 1471 | and flush_type is not strictly | ||
4192 | 1472 | required, but it ensures consistent | ||
4193 | 1473 | buffer pool instance state snapshots in | ||
4194 | 1474 | buf_pool_validate_instance(). */ | ||
4195 | 1458 | unsigned buf_fix_count:19;/*!< count of how manyfold this block | 1475 | unsigned buf_fix_count:19;/*!< count of how manyfold this block |
4196 | 1459 | is currently bufferfixed */ | 1476 | is currently bufferfixed */ |
4197 | 1460 | unsigned buf_pool_index:6;/*!< index number of the buffer pool | 1477 | unsigned buf_pool_index:6;/*!< index number of the buffer pool |
4198 | @@ -1466,7 +1483,7 @@ | |||
4199 | 1466 | #endif /* !UNIV_HOTBACKUP */ | 1483 | #endif /* !UNIV_HOTBACKUP */ |
4200 | 1467 | page_zip_des_t zip; /*!< compressed page; zip.data | 1484 | page_zip_des_t zip; /*!< compressed page; zip.data |
4201 | 1468 | (but not the data it points to) is | 1485 | (but not the data it points to) is |
4203 | 1469 | also protected by buf_pool->mutex; | 1486 | protected by buf_pool->zip_mutex; |
4204 | 1470 | state == BUF_BLOCK_ZIP_PAGE and | 1487 | state == BUF_BLOCK_ZIP_PAGE and |
4205 | 1471 | zip.data == NULL means an active | 1488 | zip.data == NULL means an active |
4206 | 1472 | buf_pool->watch */ | 1489 | buf_pool->watch */ |
4207 | @@ -1479,15 +1496,13 @@ | |||
4208 | 1479 | ibool in_zip_hash; /*!< TRUE if in buf_pool->zip_hash */ | 1496 | ibool in_zip_hash; /*!< TRUE if in buf_pool->zip_hash */ |
4209 | 1480 | #endif /* UNIV_DEBUG */ | 1497 | #endif /* UNIV_DEBUG */ |
4210 | 1481 | 1498 | ||
4213 | 1482 | /** @name Page flushing fields | 1499 | /** @name Page flushing fields */ |
4212 | 1483 | All these are protected by buf_pool->mutex. */ | ||
4214 | 1484 | /* @{ */ | 1500 | /* @{ */ |
4215 | 1485 | 1501 | ||
4216 | 1486 | UT_LIST_NODE_T(buf_page_t) list; | 1502 | UT_LIST_NODE_T(buf_page_t) list; |
4217 | 1487 | /*!< based on state, this is a | 1503 | /*!< based on state, this is a |
4218 | 1488 | list node, protected either by | 1504 | list node, protected either by |
4221 | 1489 | buf_pool->mutex or by | 1505 | a corresponding list mutex, |
4220 | 1490 | buf_pool->flush_list_mutex, | ||
4222 | 1491 | in one of the following lists in | 1506 | in one of the following lists in |
4223 | 1492 | buf_pool: | 1507 | buf_pool: |
4224 | 1493 | 1508 | ||
4225 | @@ -1500,7 +1515,8 @@ | |||
4226 | 1500 | then the node pointers are | 1515 | then the node pointers are |
4227 | 1501 | covered by buf_pool->flush_list_mutex. | 1516 | covered by buf_pool->flush_list_mutex. |
4228 | 1502 | Otherwise these pointers are | 1517 | Otherwise these pointers are |
4230 | 1503 | protected by buf_pool->mutex. | 1518 | protected by a corresponding list |
4231 | 1519 | mutex. | ||
4232 | 1504 | 1520 | ||
4233 | 1505 | The contents of the list node | 1521 | The contents of the list node |
4234 | 1506 | is undefined if !in_flush_list | 1522 | is undefined if !in_flush_list |
4235 | @@ -1523,8 +1539,8 @@ | |||
4236 | 1523 | reads can happen while holding | 1539 | reads can happen while holding |
4237 | 1524 | any one of the two mutexes */ | 1540 | any one of the two mutexes */ |
4238 | 1525 | ibool in_free_list; /*!< TRUE if in buf_pool->free; when | 1541 | ibool in_free_list; /*!< TRUE if in buf_pool->free; when |
4241 | 1526 | buf_pool->mutex is free, the following | 1542 | buf_pool->free_list_mutex is free, the |
4242 | 1527 | should hold: in_free_list | 1543 | following should hold: in_free_list |
4243 | 1528 | == (state == BUF_BLOCK_NOT_USED) */ | 1544 | == (state == BUF_BLOCK_NOT_USED) */ |
4244 | 1529 | #endif /* UNIV_DEBUG */ | 1545 | #endif /* UNIV_DEBUG */ |
4245 | 1530 | lsn_t newest_modification; | 1546 | lsn_t newest_modification; |
4246 | @@ -1547,9 +1563,7 @@ | |||
4247 | 1547 | reads can happen while holding | 1563 | reads can happen while holding |
4248 | 1548 | any one of the two mutexes */ | 1564 | any one of the two mutexes */ |
4249 | 1549 | /* @} */ | 1565 | /* @} */ |
4253 | 1550 | /** @name LRU replacement algorithm fields | 1566 | /** @name LRU replacement algorithm fields */ |
4251 | 1551 | These fields are protected by buf_pool->mutex only (not | ||
4252 | 1552 | buf_pool->zip_mutex or buf_block_t::mutex). */ | ||
4254 | 1553 | /* @{ */ | 1567 | /* @{ */ |
4255 | 1554 | 1568 | ||
4256 | 1555 | UT_LIST_NODE_T(buf_page_t) LRU; | 1569 | UT_LIST_NODE_T(buf_page_t) LRU; |
4257 | @@ -1560,7 +1574,10 @@ | |||
4258 | 1560 | debugging */ | 1574 | debugging */ |
4259 | 1561 | #endif /* UNIV_DEBUG */ | 1575 | #endif /* UNIV_DEBUG */ |
4260 | 1562 | unsigned old:1; /*!< TRUE if the block is in the old | 1576 | unsigned old:1; /*!< TRUE if the block is in the old |
4262 | 1563 | blocks in buf_pool->LRU_old */ | 1577 | blocks in buf_pool->LRU_old. Protected |
4263 | 1578 | by the LRU list mutex. May be read for | ||
4264 | 1579 | heuristics purposes under the block | ||
4265 | 1580 | mutex instead. */ | ||
4266 | 1564 | unsigned freed_page_clock:31;/*!< the value of | 1581 | unsigned freed_page_clock:31;/*!< the value of |
4267 | 1565 | buf_pool->freed_page_clock | 1582 | buf_pool->freed_page_clock |
4268 | 1566 | when this block was the last | 1583 | when this block was the last |
4269 | @@ -1612,8 +1629,7 @@ | |||
4270 | 1612 | used in debugging */ | 1629 | used in debugging */ |
4271 | 1613 | #endif /* UNIV_DEBUG */ | 1630 | #endif /* UNIV_DEBUG */ |
4272 | 1614 | ib_mutex_t mutex; /*!< mutex protecting this block: | 1631 | ib_mutex_t mutex; /*!< mutex protecting this block: |
4275 | 1615 | state (also protected by the buffer | 1632 | state, io_fix, buf_fix_count, |
4274 | 1616 | pool mutex), io_fix, buf_fix_count, | ||
4276 | 1617 | and accessed; we introduce this new | 1633 | and accessed; we introduce this new |
4277 | 1618 | mutex in InnoDB-5.1 to relieve | 1634 | mutex in InnoDB-5.1 to relieve |
4278 | 1619 | contention on the buffer pool mutex */ | 1635 | contention on the buffer pool mutex */ |
4279 | @@ -1622,8 +1638,8 @@ | |||
4280 | 1622 | unsigned lock_hash_val:32;/*!< hashed value of the page address | 1638 | unsigned lock_hash_val:32;/*!< hashed value of the page address |
4281 | 1623 | in the record lock hash table; | 1639 | in the record lock hash table; |
4282 | 1624 | protected by buf_block_t::lock | 1640 | protected by buf_block_t::lock |
4285 | 1625 | (or buf_block_t::mutex, buf_pool->mutex | 1641 | (or buf_block_t::mutex in |
4286 | 1626 | in buf_page_get_gen(), | 1642 | buf_page_get_gen(), |
4287 | 1627 | buf_page_init_for_read() | 1643 | buf_page_init_for_read() |
4288 | 1628 | and buf_page_create()) */ | 1644 | and buf_page_create()) */ |
4289 | 1629 | ibool check_index_page_at_flush; | 1645 | ibool check_index_page_at_flush; |
4290 | @@ -1646,8 +1662,8 @@ | |||
4291 | 1646 | positioning: if the modify clock has | 1662 | positioning: if the modify clock has |
4292 | 1647 | not changed, we know that the pointer | 1663 | not changed, we know that the pointer |
4293 | 1648 | is still valid; this field may be | 1664 | is still valid; this field may be |
4296 | 1649 | changed if the thread (1) owns the | 1665 | changed if the thread (1) owns the LRU |
4297 | 1650 | pool mutex and the page is not | 1666 | list mutex and the page is not |
4298 | 1651 | bufferfixed, or (2) the thread has an | 1667 | bufferfixed, or (2) the thread has an |
4299 | 1652 | x-latch on the block */ | 1668 | x-latch on the block */ |
4300 | 1653 | /* @} */ | 1669 | /* @} */ |
4301 | @@ -1754,11 +1770,11 @@ | |||
4302 | 1754 | ulint n_page_gets; /*!< number of page gets performed; | 1770 | ulint n_page_gets; /*!< number of page gets performed; |
4303 | 1755 | also successful searches through | 1771 | also successful searches through |
4304 | 1756 | the adaptive hash index are | 1772 | the adaptive hash index are |
4310 | 1757 | counted as page gets; this field | 1773 | counted as page gets. */ |
4311 | 1758 | is NOT protected by the buffer | 1774 | ulint n_pages_read; /*!< number read operations. Accessed |
4312 | 1759 | pool mutex */ | 1775 | atomically. */ |
4313 | 1760 | ulint n_pages_read; /*!< number read operations */ | 1776 | ulint n_pages_written;/*!< number write operations. Accessed |
4314 | 1761 | ulint n_pages_written;/*!< number write operations */ | 1777 | atomically.*/ |
4315 | 1762 | ulint n_pages_created;/*!< number of pages created | 1778 | ulint n_pages_created;/*!< number of pages created |
4316 | 1763 | in the pool with no read */ | 1779 | in the pool with no read */ |
4317 | 1764 | ulint n_ra_pages_read_rnd;/*!< number of pages read in | 1780 | ulint n_ra_pages_read_rnd;/*!< number of pages read in |
4318 | @@ -1798,12 +1814,16 @@ | |||
4319 | 1798 | 1814 | ||
4320 | 1799 | /** @name General fields */ | 1815 | /** @name General fields */ |
4321 | 1800 | /* @{ */ | 1816 | /* @{ */ |
4322 | 1801 | ib_mutex_t mutex; /*!< Buffer pool mutex of this | ||
4323 | 1802 | instance */ | ||
4324 | 1803 | ib_mutex_t zip_mutex; /*!< Zip mutex of this buffer | 1817 | ib_mutex_t zip_mutex; /*!< Zip mutex of this buffer |
4325 | 1804 | pool instance, protects compressed | 1818 | pool instance, protects compressed |
4326 | 1805 | only pages (of type buf_page_t, not | 1819 | only pages (of type buf_page_t, not |
4327 | 1806 | buf_block_t */ | 1820 | buf_block_t */ |
4328 | 1821 | ib_mutex_t LRU_list_mutex; | ||
4329 | 1822 | ib_mutex_t free_list_mutex; | ||
4330 | 1823 | ib_mutex_t zip_free_mutex; | ||
4331 | 1824 | ib_mutex_t zip_hash_mutex; | ||
4332 | 1825 | ib_mutex_t flush_state_mutex; /*!< Flush state protection | ||
4333 | 1826 | mutex */ | ||
4334 | 1807 | ulint instance_no; /*!< Array index of this buffer | 1827 | ulint instance_no; /*!< Array index of this buffer |
4335 | 1808 | pool instance */ | 1828 | pool instance */ |
4336 | 1809 | ulint old_pool_size; /*!< Old pool size in bytes */ | 1829 | ulint old_pool_size; /*!< Old pool size in bytes */ |
4337 | @@ -1814,9 +1834,6 @@ | |||
4338 | 1814 | ulint buddy_n_frames; /*!< Number of frames allocated from | 1834 | ulint buddy_n_frames; /*!< Number of frames allocated from |
4339 | 1815 | the buffer pool to the buddy system */ | 1835 | the buffer pool to the buddy system */ |
4340 | 1816 | #endif | 1836 | #endif |
4341 | 1817 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | ||
4342 | 1818 | ulint mutex_exit_forbidden; /*!< Forbid release mutex */ | ||
4343 | 1819 | #endif | ||
4344 | 1820 | ulint n_chunks; /*!< number of buffer pool chunks */ | 1837 | ulint n_chunks; /*!< number of buffer pool chunks */ |
4345 | 1821 | buf_chunk_t* chunks; /*!< buffer pool chunks */ | 1838 | buf_chunk_t* chunks; /*!< buffer pool chunks */ |
4346 | 1822 | ulint curr_size; /*!< current pool size in pages */ | 1839 | ulint curr_size; /*!< current pool size in pages */ |
4347 | @@ -1828,26 +1845,23 @@ | |||
4348 | 1828 | buf_page_in_file() == TRUE, | 1845 | buf_page_in_file() == TRUE, |
4349 | 1829 | indexed by (space_id, offset). | 1846 | indexed by (space_id, offset). |
4350 | 1830 | page_hash is protected by an | 1847 | page_hash is protected by an |
4357 | 1831 | array of mutexes. | 1848 | array of mutexes. */ |
4352 | 1832 | Changes in page_hash are protected | ||
4353 | 1833 | by buf_pool->mutex and the relevant | ||
4354 | 1834 | page_hash mutex. Lookups can happen | ||
4355 | 1835 | while holding the buf_pool->mutex or | ||
4356 | 1836 | the relevant page_hash mutex. */ | ||
4358 | 1837 | hash_table_t* zip_hash; /*!< hash table of buf_block_t blocks | 1849 | hash_table_t* zip_hash; /*!< hash table of buf_block_t blocks |
4359 | 1838 | whose frames are allocated to the | 1850 | whose frames are allocated to the |
4360 | 1839 | zip buddy system, | 1851 | zip buddy system, |
4361 | 1840 | indexed by block->frame */ | 1852 | indexed by block->frame */ |
4362 | 1841 | ulint n_pend_reads; /*!< number of pending read | 1853 | ulint n_pend_reads; /*!< number of pending read |
4365 | 1842 | operations */ | 1854 | operations. Accessed atomically */ |
4366 | 1843 | ulint n_pend_unzip; /*!< number of pending decompressions */ | 1855 | ulint n_pend_unzip; /*!< number of pending decompressions. |
4367 | 1856 | Accessed atomically */ | ||
4368 | 1844 | 1857 | ||
4369 | 1845 | time_t last_printout_time; | 1858 | time_t last_printout_time; |
4370 | 1846 | /*!< when buf_print_io was last time | 1859 | /*!< when buf_print_io was last time |
4372 | 1847 | called */ | 1860 | called. Accesses not protected */ |
4373 | 1848 | buf_buddy_stat_t buddy_stat[BUF_BUDDY_SIZES_MAX + 1]; | 1861 | buf_buddy_stat_t buddy_stat[BUF_BUDDY_SIZES_MAX + 1]; |
4374 | 1849 | /*!< Statistics of buddy system, | 1862 | /*!< Statistics of buddy system, |
4376 | 1850 | indexed by block size */ | 1863 | indexed by block size. Protected by |
4377 | 1864 | zip_free_mutex. */ | ||
4378 | 1851 | buf_pool_stat_t stat; /*!< current statistics */ | 1865 | buf_pool_stat_t stat; /*!< current statistics */ |
4379 | 1852 | buf_pool_stat_t old_stat; /*!< old statistics */ | 1866 | buf_pool_stat_t old_stat; /*!< old statistics */ |
4380 | 1853 | 1867 | ||
4381 | @@ -1874,10 +1888,12 @@ | |||
4382 | 1874 | list */ | 1888 | list */ |
4383 | 1875 | ibool init_flush[BUF_FLUSH_N_TYPES]; | 1889 | ibool init_flush[BUF_FLUSH_N_TYPES]; |
4384 | 1876 | /*!< this is TRUE when a flush of the | 1890 | /*!< this is TRUE when a flush of the |
4386 | 1877 | given type is being initialized */ | 1891 | given type is being initialized. |
4387 | 1892 | Protected by flush_state_mutex. */ | ||
4388 | 1878 | ulint n_flush[BUF_FLUSH_N_TYPES]; | 1893 | ulint n_flush[BUF_FLUSH_N_TYPES]; |
4389 | 1879 | /*!< this is the number of pending | 1894 | /*!< this is the number of pending |
4391 | 1880 | writes in the given flush type */ | 1895 | writes in the given flush type. |
4392 | 1896 | Protected by flush_state_mutex. */ | ||
4393 | 1881 | os_event_t no_flush[BUF_FLUSH_N_TYPES]; | 1897 | os_event_t no_flush[BUF_FLUSH_N_TYPES]; |
4394 | 1882 | /*!< this is in the set state | 1898 | /*!< this is in the set state |
4395 | 1883 | when there is no flush batch | 1899 | when there is no flush batch |
4396 | @@ -1904,7 +1920,8 @@ | |||
4397 | 1904 | billion! A thread is allowed | 1920 | billion! A thread is allowed |
4398 | 1905 | to read this for heuristic | 1921 | to read this for heuristic |
4399 | 1906 | purposes without holding any | 1922 | purposes without holding any |
4401 | 1907 | mutex or latch */ | 1923 | mutex or latch. For non-heuristic |
4402 | 1924 | purposes protected by LRU_list_mutex */ | ||
4403 | 1908 | ibool try_LRU_scan; /*!< Set to FALSE when an LRU | 1925 | ibool try_LRU_scan; /*!< Set to FALSE when an LRU |
4404 | 1909 | scan for free block fails. This | 1926 | scan for free block fails. This |
4405 | 1910 | flag is used to avoid repeated | 1927 | flag is used to avoid repeated |
4406 | @@ -1913,8 +1930,7 @@ | |||
4407 | 1913 | available in the scan depth for | 1930 | available in the scan depth for |
4408 | 1914 | eviction. Set to TRUE whenever | 1931 | eviction. Set to TRUE whenever |
4409 | 1915 | we flush a batch from the | 1932 | we flush a batch from the |
4412 | 1916 | buffer pool. Protected by the | 1933 | buffer pool. Accessed atomically. */ |
4411 | 1917 | buf_pool->mutex */ | ||
4413 | 1918 | /* @} */ | 1934 | /* @} */ |
4414 | 1919 | 1935 | ||
4415 | 1920 | /** @name LRU replacement algorithm fields */ | 1936 | /** @name LRU replacement algorithm fields */ |
4416 | @@ -1942,7 +1958,8 @@ | |||
4417 | 1942 | 1958 | ||
4418 | 1943 | UT_LIST_BASE_NODE_T(buf_block_t) unzip_LRU; | 1959 | UT_LIST_BASE_NODE_T(buf_block_t) unzip_LRU; |
4419 | 1944 | /*!< base node of the | 1960 | /*!< base node of the |
4421 | 1945 | unzip_LRU list */ | 1961 | unzip_LRU list. The list is protected |
4422 | 1962 | by LRU list mutex. */ | ||
4423 | 1946 | 1963 | ||
4424 | 1947 | /* @} */ | 1964 | /* @} */ |
4425 | 1948 | /** @name Buddy allocator fields | 1965 | /** @name Buddy allocator fields |
4426 | @@ -1959,8 +1976,7 @@ | |||
4427 | 1959 | 1976 | ||
4428 | 1960 | buf_page_t* watch; | 1977 | buf_page_t* watch; |
4429 | 1961 | /*!< Sentinel records for buffer | 1978 | /*!< Sentinel records for buffer |
4432 | 1962 | pool watches. Protected by | 1979 | pool watches. */ |
4431 | 1963 | buf_pool->mutex. */ | ||
4433 | 1964 | 1980 | ||
4434 | 1965 | #if BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN | 1981 | #if BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN |
4435 | 1966 | # error "BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN" | 1982 | # error "BUF_BUDDY_LOW > UNIV_ZIP_SIZE_MIN" |
4436 | @@ -1968,18 +1984,10 @@ | |||
4437 | 1968 | /* @} */ | 1984 | /* @} */ |
4438 | 1969 | }; | 1985 | }; |
4439 | 1970 | 1986 | ||
4442 | 1971 | /** @name Accessors for buf_pool->mutex. | 1987 | /** @name Accessors for buffer pool mutexes |
4443 | 1972 | Use these instead of accessing buf_pool->mutex directly. */ | 1988 | Use these instead of accessing buffer pool mutexes directly. */ |
4444 | 1973 | /* @{ */ | 1989 | /* @{ */ |
4445 | 1974 | 1990 | ||
4446 | 1975 | /** Test if a buffer pool mutex is owned. */ | ||
4447 | 1976 | #define buf_pool_mutex_own(b) mutex_own(&b->mutex) | ||
4448 | 1977 | /** Acquire a buffer pool mutex. */ | ||
4449 | 1978 | #define buf_pool_mutex_enter(b) do { \ | ||
4450 | 1979 | ut_ad(!mutex_own(&b->zip_mutex)); \ | ||
4451 | 1980 | mutex_enter(&b->mutex); \ | ||
4452 | 1981 | } while (0) | ||
4453 | 1982 | |||
4454 | 1983 | /** Test if flush list mutex is owned. */ | 1991 | /** Test if flush list mutex is owned. */ |
4455 | 1984 | #define buf_flush_list_mutex_own(b) mutex_own(&b->flush_list_mutex) | 1992 | #define buf_flush_list_mutex_own(b) mutex_own(&b->flush_list_mutex) |
4456 | 1985 | 1993 | ||
4457 | @@ -2035,31 +2043,6 @@ | |||
4458 | 2035 | # define buf_block_hash_lock_held_s_or_x(b, p) (TRUE) | 2043 | # define buf_block_hash_lock_held_s_or_x(b, p) (TRUE) |
4459 | 2036 | #endif /* UNIV_SYNC_DEBUG */ | 2044 | #endif /* UNIV_SYNC_DEBUG */ |
4460 | 2037 | 2045 | ||
4461 | 2038 | #if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG | ||
4462 | 2039 | /** Forbid the release of the buffer pool mutex. */ | ||
4463 | 2040 | # define buf_pool_mutex_exit_forbid(b) do { \ | ||
4464 | 2041 | ut_ad(buf_pool_mutex_own(b)); \ | ||
4465 | 2042 | b->mutex_exit_forbidden++; \ | ||
4466 | 2043 | } while (0) | ||
4467 | 2044 | /** Allow the release of the buffer pool mutex. */ | ||
4468 | 2045 | # define buf_pool_mutex_exit_allow(b) do { \ | ||
4469 | 2046 | ut_ad(buf_pool_mutex_own(b)); \ | ||
4470 | 2047 | ut_a(b->mutex_exit_forbidden); \ | ||
4471 | 2048 | b->mutex_exit_forbidden--; \ | ||
4472 | 2049 | } while (0) | ||
4473 | 2050 | /** Release the buffer pool mutex. */ | ||
4474 | 2051 | # define buf_pool_mutex_exit(b) do { \ | ||
4475 | 2052 | ut_a(!b->mutex_exit_forbidden); \ | ||
4476 | 2053 | mutex_exit(&b->mutex); \ | ||
4477 | 2054 | } while (0) | ||
4478 | 2055 | #else | ||
4479 | 2056 | /** Forbid the release of the buffer pool mutex. */ | ||
4480 | 2057 | # define buf_pool_mutex_exit_forbid(b) ((void) 0) | ||
4481 | 2058 | /** Allow the release of the buffer pool mutex. */ | ||
4482 | 2059 | # define buf_pool_mutex_exit_allow(b) ((void) 0) | ||
4483 | 2060 | /** Release the buffer pool mutex. */ | ||
4484 | 2061 | # define buf_pool_mutex_exit(b) mutex_exit(&b->mutex) | ||
4485 | 2062 | #endif | ||
4486 | 2063 | #endif /* !UNIV_HOTBACKUP */ | 2046 | #endif /* !UNIV_HOTBACKUP */ |
4487 | 2064 | /* @} */ | 2047 | /* @} */ |
4488 | 2065 | 2048 | ||
4489 | 2066 | 2049 | ||
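The buf0buf.h hunk above replaces the single `buf_pool->mutex` with several narrower mutexes (`LRU_list_mutex`, `free_list_mutex`, `zip_free_mutex`, `zip_hash_mutex`, `flush_state_mutex`), each protecting one structure. A minimal sketch of the idea, using `std::mutex` stand-ins — the mutex names mirror the patch, but the struct and the two helper functions are invented for illustration:

```cpp
#include <mutex>

// Hypothetical model of the split: each list gets its own mutex, so
// unrelated operations no longer serialize on one buffer pool mutex.
struct buf_pool_t {
    std::mutex LRU_list_mutex;     // protects the LRU and unzip_LRU lists
    std::mutex free_list_mutex;    // protects the free list
    std::mutex zip_free_mutex;     // protects buddy-allocator free lists
    std::mutex zip_hash_mutex;     // protects zip_hash
    std::mutex flush_state_mutex;  // protects init_flush[] and n_flush[]
    unsigned   LRU_len  = 0;       // stand-in for LRU list state
    unsigned   free_len = 0;       // stand-in for free list state
};

// An LRU operation now takes only the LRU list mutex...
void lru_add(buf_pool_t* pool) {
    std::lock_guard<std::mutex> g(pool->LRU_list_mutex);
    ++pool->LRU_len;
}

// ...while a free-list operation takes only the free list mutex, so the
// two can run concurrently where the old buf_pool->mutex serialized them.
void free_add(buf_pool_t* pool) {
    std::lock_guard<std::mutex> g(pool->free_list_mutex);
    ++pool->free_len;
}
```

This also explains why the old `buf_pool_mutex_enter()`/`buf_pool_mutex_exit()` accessors and the `mutex_exit_forbidden` debug counter are deleted in the hunks above: there is no longer one mutex to wrap.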
4490 | === modified file 'Percona-Server/storage/innobase/include/buf0buf.ic' | |||
4491 | --- Percona-Server/storage/innobase/include/buf0buf.ic 2013-06-25 13:13:06 +0000 | |||
4492 | +++ Percona-Server/storage/innobase/include/buf0buf.ic 2013-09-20 05:29:11 +0000 | |||
4493 | @@ -121,7 +121,7 @@ | |||
4494 | 121 | /*==========================*/ | 121 | /*==========================*/ |
4495 | 122 | const buf_page_t* bpage) /*!< in: block */ | 122 | const buf_page_t* bpage) /*!< in: block */ |
4496 | 123 | { | 123 | { |
4498 | 124 | /* This is sometimes read without holding buf_pool->mutex. */ | 124 | /* This is sometimes read without holding any buffer pool mutex. */ |
4499 | 125 | return(bpage->freed_page_clock); | 125 | return(bpage->freed_page_clock); |
4500 | 126 | } | 126 | } |
4501 | 127 | 127 | ||
4502 | @@ -420,8 +420,21 @@ | |||
4503 | 420 | /*================*/ | 420 | /*================*/ |
4504 | 421 | const buf_page_t* bpage) /*!< in: pointer to the control block */ | 421 | const buf_page_t* bpage) /*!< in: pointer to the control block */ |
4505 | 422 | { | 422 | { |
4507 | 423 | ut_ad(bpage != NULL); | 423 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4508 | 424 | return buf_page_get_io_fix_unlocked(bpage); | ||
4509 | 425 | } | ||
4510 | 424 | 426 | ||
4511 | 427 | /*********************************************************************//** | ||
4512 | 428 | Gets the io_fix state of a block. Does not assert that the | ||
4513 | 429 | buf_page_get_mutex() mutex is held, to be used in the cases where it is safe | ||
4514 | 430 | not to hold it. | ||
4515 | 431 | @return io_fix state */ | ||
4516 | 432 | UNIV_INLINE | ||
4517 | 433 | enum buf_io_fix | ||
4518 | 434 | buf_page_get_io_fix_unlocked( | ||
4519 | 435 | /*=========================*/ | ||
4520 | 436 | const buf_page_t* bpage) /*!< in: pointer to the control block */ | ||
4521 | 437 | { | ||
4522 | 425 | enum buf_io_fix io_fix = (enum buf_io_fix) bpage->io_fix; | 438 | enum buf_io_fix io_fix = (enum buf_io_fix) bpage->io_fix; |
4523 | 426 | #ifdef UNIV_DEBUG | 439 | #ifdef UNIV_DEBUG |
4524 | 427 | switch (io_fix) { | 440 | switch (io_fix) { |
4525 | @@ -449,6 +462,21 @@ | |||
4526 | 449 | } | 462 | } |
4527 | 450 | 463 | ||
4528 | 451 | /*********************************************************************//** | 464 | /*********************************************************************//** |
4529 | 465 | Gets the io_fix state of a block. Does not assert that the | ||
4530 | 466 | buf_page_get_mutex() mutex is held, to be used in the cases where it is safe | ||
4531 | 467 | not to hold it. | ||
4532 | 468 | @return io_fix state */ | ||
4533 | 469 | UNIV_INLINE | ||
4534 | 470 | enum buf_io_fix | ||
4535 | 471 | buf_block_get_io_fix_unlocked( | ||
4536 | 472 | /*==========================*/ | ||
4537 | 473 | const buf_block_t* block) /*!< in: pointer to the control block */ | ||
4538 | 474 | { | ||
4539 | 475 | return(buf_page_get_io_fix_unlocked(&block->page)); | ||
4540 | 476 | } | ||
4541 | 477 | |||
4542 | 478 | |||
4543 | 479 | /*********************************************************************//** | ||
4544 | 452 | Sets the io_fix state of a block. */ | 480 | Sets the io_fix state of a block. */ |
4545 | 453 | UNIV_INLINE | 481 | UNIV_INLINE |
4546 | 454 | void | 482 | void |
4547 | @@ -457,10 +485,6 @@ | |||
4548 | 457 | buf_page_t* bpage, /*!< in/out: control block */ | 485 | buf_page_t* bpage, /*!< in/out: control block */ |
4549 | 458 | enum buf_io_fix io_fix) /*!< in: io_fix state */ | 486 | enum buf_io_fix io_fix) /*!< in: io_fix state */ |
4550 | 459 | { | 487 | { |
4551 | 460 | #ifdef UNIV_DEBUG | ||
4552 | 461 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
4553 | 462 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
4554 | 463 | #endif | ||
4555 | 464 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 488 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4556 | 465 | 489 | ||
4557 | 466 | bpage->io_fix = io_fix; | 490 | bpage->io_fix = io_fix; |
4558 | @@ -481,7 +505,7 @@ | |||
4559 | 481 | 505 | ||
4560 | 482 | /*********************************************************************//** | 506 | /*********************************************************************//** |
4561 | 483 | Makes a block sticky. A sticky block implies that even after we release | 507 | Makes a block sticky. A sticky block implies that even after we release |
4563 | 484 | the buf_pool->mutex and the block->mutex: | 508 | the buf_pool->LRU_list_mutex and the block->mutex: |
4564 | 485 | * it cannot be removed from the flush_list | 509 | * it cannot be removed from the flush_list |
4565 | 486 | * the block descriptor cannot be relocated | 510 | * the block descriptor cannot be relocated |
4566 | 487 | * it cannot be removed from the LRU list | 511 | * it cannot be removed from the LRU list |
4567 | @@ -496,10 +520,11 @@ | |||
4568 | 496 | { | 520 | { |
4569 | 497 | #ifdef UNIV_DEBUG | 521 | #ifdef UNIV_DEBUG |
4570 | 498 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 522 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
4572 | 499 | ut_ad(buf_pool_mutex_own(buf_pool)); | 523 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
4573 | 500 | #endif | 524 | #endif |
4574 | 501 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 525 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4575 | 502 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_NONE); | 526 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_NONE); |
4576 | 527 | ut_ad(bpage->in_LRU_list); | ||
4577 | 503 | 528 | ||
4578 | 504 | bpage->io_fix = BUF_IO_PIN; | 529 | bpage->io_fix = BUF_IO_PIN; |
4579 | 505 | } | 530 | } |
4580 | @@ -512,10 +537,6 @@ | |||
4581 | 512 | /*==================*/ | 537 | /*==================*/ |
4582 | 513 | buf_page_t* bpage) /*!< in/out: control block */ | 538 | buf_page_t* bpage) /*!< in/out: control block */ |
4583 | 514 | { | 539 | { |
4584 | 515 | #ifdef UNIV_DEBUG | ||
4585 | 516 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
4586 | 517 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
4587 | 518 | #endif | ||
4588 | 519 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 540 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4589 | 520 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_PIN); | 541 | ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_PIN); |
4590 | 521 | 542 | ||
4591 | @@ -531,10 +552,6 @@ | |||
4592 | 531 | /*==================*/ | 552 | /*==================*/ |
4593 | 532 | const buf_page_t* bpage) /*!< control block being relocated */ | 553 | const buf_page_t* bpage) /*!< control block being relocated */ |
4594 | 533 | { | 554 | { |
4595 | 534 | #ifdef UNIV_DEBUG | ||
4596 | 535 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
4597 | 536 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
4598 | 537 | #endif | ||
4599 | 538 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 555 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4600 | 539 | ut_ad(buf_page_in_file(bpage)); | 556 | ut_ad(buf_page_in_file(bpage)); |
4601 | 540 | ut_ad(bpage->in_LRU_list); | 557 | ut_ad(bpage->in_LRU_list); |
4602 | @@ -554,8 +571,12 @@ | |||
4603 | 554 | { | 571 | { |
4604 | 555 | #ifdef UNIV_DEBUG | 572 | #ifdef UNIV_DEBUG |
4605 | 556 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 573 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
4606 | 557 | ut_ad(buf_pool_mutex_own(buf_pool)); | ||
4607 | 558 | #endif | 574 | #endif |
4608 | 575 | /* Buffer page mutex is not strictly required here for heuristic | ||
4609 | 576 | purposes even if LRU mutex is not being held. Keep the assertion | ||
4610 | 577 | for now since all the callers hold it. */ | ||
4611 | 578 | ut_ad(mutex_own(buf_page_get_mutex(bpage)) | ||
4612 | 579 | || mutex_own(&buf_pool->LRU_list_mutex)); | ||
4613 | 559 | ut_ad(buf_page_in_file(bpage)); | 580 | ut_ad(buf_page_in_file(bpage)); |
4614 | 560 | 581 | ||
4615 | 561 | return(bpage->old); | 582 | return(bpage->old); |
4616 | @@ -574,7 +595,7 @@ | |||
4617 | 574 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | 595 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); |
4618 | 575 | #endif /* UNIV_DEBUG */ | 596 | #endif /* UNIV_DEBUG */ |
4619 | 576 | ut_a(buf_page_in_file(bpage)); | 597 | ut_a(buf_page_in_file(bpage)); |
4621 | 577 | ut_ad(buf_pool_mutex_own(buf_pool)); | 598 | ut_ad(mutex_own(&buf_pool->LRU_list_mutex)); |
4622 | 578 | ut_ad(bpage->in_LRU_list); | 599 | ut_ad(bpage->in_LRU_list); |
4623 | 579 | 600 | ||
4624 | 580 | #ifdef UNIV_LRU_DEBUG | 601 | #ifdef UNIV_LRU_DEBUG |
4625 | @@ -619,11 +640,7 @@ | |||
4626 | 619 | /*==================*/ | 640 | /*==================*/ |
4627 | 620 | buf_page_t* bpage) /*!< in/out: control block */ | 641 | buf_page_t* bpage) /*!< in/out: control block */ |
4628 | 621 | { | 642 | { |
4629 | 622 | #ifdef UNIV_DEBUG | ||
4630 | 623 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
4631 | 624 | ut_ad(!buf_pool_mutex_own(buf_pool)); | ||
4632 | 625 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); | 643 | ut_ad(mutex_own(buf_page_get_mutex(bpage))); |
4633 | 626 | #endif | ||
4634 | 627 | ut_a(buf_page_in_file(bpage)); | 644 | ut_a(buf_page_in_file(bpage)); |
4635 | 628 | 645 | ||
4636 | 629 | if (!bpage->access_time) { | 646 | if (!bpage->access_time) { |
4637 | @@ -885,10 +902,6 @@ | |||
4638 | 885 | /*===========*/ | 902 | /*===========*/ |
4639 | 886 | buf_block_t* block) /*!< in, own: block to be freed */ | 903 | buf_block_t* block) /*!< in, own: block to be freed */ |
4640 | 887 | { | 904 | { |
4641 | 888 | buf_pool_t* buf_pool = buf_pool_from_bpage((buf_page_t*) block); | ||
4642 | 889 | |||
4643 | 890 | buf_pool_mutex_enter(buf_pool); | ||
4644 | 891 | |||
4645 | 892 | mutex_enter(&block->mutex); | 905 | mutex_enter(&block->mutex); |
4646 | 893 | 906 | ||
4647 | 894 | ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE); | 907 | ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE); |
4648 | @@ -896,8 +909,6 @@ | |||
4649 | 896 | buf_LRU_block_free_non_file_page(block); | 909 | buf_LRU_block_free_non_file_page(block); |
4650 | 897 | 910 | ||
4651 | 898 | mutex_exit(&block->mutex); | 911 | mutex_exit(&block->mutex); |
4652 | 899 | |||
4653 | 900 | buf_pool_mutex_exit(buf_pool); | ||
4654 | 901 | } | 912 | } |
4655 | 902 | #endif /* !UNIV_HOTBACKUP */ | 913 | #endif /* !UNIV_HOTBACKUP */ |
4656 | 903 | 914 | ||
4657 | @@ -962,7 +973,7 @@ | |||
4658 | 962 | 973 | ||
4659 | 963 | /********************************************************************//** | 974 | /********************************************************************//** |
4660 | 964 | Increments the modify clock of a frame by 1. The caller must (1) own the | 975 | Increments the modify clock of a frame by 1. The caller must (1) own the |
4662 | 965 | buf_pool mutex and block bufferfix count has to be zero, (2) or own an x-lock | 976 | LRU list mutex and block bufferfix count has to be zero, (2) or own an x-lock |
4663 | 966 | on the block. */ | 977 | on the block. */ |
4664 | 967 | UNIV_INLINE | 978 | UNIV_INLINE |
4665 | 968 | void | 979 | void |
4666 | @@ -973,7 +984,7 @@ | |||
4667 | 973 | #ifdef UNIV_SYNC_DEBUG | 984 | #ifdef UNIV_SYNC_DEBUG |
4668 | 974 | buf_pool_t* buf_pool = buf_pool_from_bpage((buf_page_t*) block); | 985 | buf_pool_t* buf_pool = buf_pool_from_bpage((buf_page_t*) block); |
4669 | 975 | 986 | ||
4671 | 976 | ut_ad((buf_pool_mutex_own(buf_pool) | 987 | ut_ad((mutex_own(&buf_pool->LRU_list_mutex) |
4672 | 977 | && (block->page.buf_fix_count == 0)) | 988 | && (block->page.buf_fix_count == 0)) |
4673 | 978 | || rw_lock_own(&(block->lock), RW_LOCK_EXCLUSIVE)); | 989 | || rw_lock_own(&(block->lock), RW_LOCK_EXCLUSIVE)); |
4674 | 979 | #endif /* UNIV_SYNC_DEBUG */ | 990 | #endif /* UNIV_SYNC_DEBUG */ |
4675 | @@ -1371,39 +1382,6 @@ | |||
4676 | 1371 | sync_thread_add_level(&block->lock, level, FALSE); | 1382 | sync_thread_add_level(&block->lock, level, FALSE); |
4677 | 1372 | } | 1383 | } |
4678 | 1373 | #endif /* UNIV_SYNC_DEBUG */ | 1384 | #endif /* UNIV_SYNC_DEBUG */ |
4679 | 1374 | /********************************************************************//** | ||
4680 | 1375 | Acquire mutex on all buffer pool instances. */ | ||
4681 | 1376 | UNIV_INLINE | ||
4682 | 1377 | void | ||
4683 | 1378 | buf_pool_mutex_enter_all(void) | ||
4684 | 1379 | /*==========================*/ | ||
4685 | 1380 | { | ||
4686 | 1381 | ulint i; | ||
4687 | 1382 | |||
4688 | 1383 | for (i = 0; i < srv_buf_pool_instances; i++) { | ||
4689 | 1384 | buf_pool_t* buf_pool; | ||
4690 | 1385 | |||
4691 | 1386 | buf_pool = buf_pool_from_array(i); | ||
4692 | 1387 | buf_pool_mutex_enter(buf_pool); | ||
4693 | 1388 | } | ||
4694 | 1389 | } | ||
4695 | 1390 | |||
4696 | 1391 | /********************************************************************//** | ||
4697 | 1392 | Release mutex on all buffer pool instances. */ | ||
4698 | 1393 | UNIV_INLINE | ||
4699 | 1394 | void | ||
4700 | 1395 | buf_pool_mutex_exit_all(void) | ||
4701 | 1396 | /*=========================*/ | ||
4702 | 1397 | { | ||
4703 | 1398 | ulint i; | ||
4704 | 1399 | |||
4705 | 1400 | for (i = 0; i < srv_buf_pool_instances; i++) { | ||
4706 | 1401 | buf_pool_t* buf_pool; | ||
4707 | 1402 | |||
4708 | 1403 | buf_pool = buf_pool_from_array(i); | ||
4709 | 1404 | buf_pool_mutex_exit(buf_pool); | ||
4710 | 1405 | } | ||
4711 | 1406 | } | ||
4712 | 1407 | /*********************************************************************//** | 1385 | /*********************************************************************//** |
4713 | 1408 | Get the nth chunk's buffer block in the specified buffer pool. | 1386 | Get the nth chunk's buffer block in the specified buffer pool. |
4714 | 1409 | @return the nth chunk's buffer block. */ | 1387 | @return the nth chunk's buffer block. */ |
4715 | @@ -1421,4 +1399,26 @@ | |||
4716 | 1421 | *chunk_size = chunk->size; | 1399 | *chunk_size = chunk->size; |
4717 | 1422 | return(chunk->blocks); | 1400 | return(chunk->blocks); |
4718 | 1423 | } | 1401 | } |
4719 | 1402 | |||
4720 | 1403 | #ifdef UNIV_DEBUG | ||
4721 | 1404 | /********************************************************************//** | ||
4722 | 1405 | Checks if buf_pool->zip_mutex is owned and is serving for a given page as its | ||
4723 | 1406 | block mutex. | ||
4724 | 1407 | @return true if buf_pool->zip_mutex is owned. */ | ||
4725 | 1408 | UNIV_INLINE | ||
4726 | 1409 | bool | ||
4727 | 1410 | buf_own_zip_mutex_for_page( | ||
4728 | 1411 | /*=======================*/ | ||
4729 | 1412 | const buf_page_t* bpage) | ||
4730 | 1413 | { | ||
4731 | 1414 | buf_pool_t* buf_pool = buf_pool_from_bpage(bpage); | ||
4732 | 1415 | |||
4733 | 1416 | ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE | ||
4734 | 1417 | || buf_page_get_state(bpage) == BUF_BLOCK_ZIP_DIRTY); | ||
4735 | 1418 | ut_ad(buf_page_get_mutex(bpage) == &buf_pool->zip_mutex); | ||
4736 | 1419 | |||
4737 | 1420 | return(mutex_own(&buf_pool->zip_mutex)); | ||
4738 | 1421 | } | ||
4739 | 1422 | #endif /* UNIV_DEBUG */ | ||
4740 | 1423 | |||
4741 | 1424 | #endif /* !UNIV_HOTBACKUP */ | 1424 | #endif /* !UNIV_HOTBACKUP */ |
4742 | 1425 | 1425 | ||
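The buf0buf.ic hunk above splits the io_fix getter in two: the locked variant keeps the `ut_ad(mutex_own(...))` assertion and delegates to a new `_unlocked` variant for callers that can safely read without the block mutex. A simplified sketch of that pattern, assuming a toy `buf_page_t` (the real one is far larger) and using a `std::lock_guard` in place of the debug-only `ut_ad` ownership assertion:

```cpp
#include <mutex>

enum buf_io_fix { BUF_IO_NONE, BUF_IO_READ, BUF_IO_WRITE, BUF_IO_PIN };

// Toy control block: only the fields the accessors need.
struct buf_page_t {
    std::mutex mutex;                 // block mutex
    buf_io_fix io_fix = BUF_IO_NONE;
};

// Unlocked read: the caller guarantees it is safe without the mutex
// (the real function documents exactly that contract).
inline buf_io_fix buf_page_get_io_fix_unlocked(const buf_page_t* bpage) {
    return bpage->io_fix;
}

// Locked read: the real code asserts mutex ownership with ut_ad() and
// then calls the unlocked variant; here we take the lock to model it.
inline buf_io_fix buf_page_get_io_fix(buf_page_t* bpage) {
    std::lock_guard<std::mutex> guard(bpage->mutex);
    return buf_page_get_io_fix_unlocked(bpage);
}
```

Keeping the assertion only in the locked entry point lets hot paths that merely need a heuristic snapshot of `io_fix` skip the block mutex without silencing the debug check everywhere else.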
4743 | === modified file 'Percona-Server/storage/innobase/include/buf0flu.h' | |||
4744 | --- Percona-Server/storage/innobase/include/buf0flu.h 2013-08-16 09:11:51 +0000 | |||
4745 | +++ Percona-Server/storage/innobase/include/buf0flu.h 2013-09-20 05:29:11 +0000 | |||
4746 | @@ -37,7 +37,7 @@ | |||
4747 | 37 | extern ibool buf_page_cleaner_is_active; | 37 | extern ibool buf_page_cleaner_is_active; |
4748 | 38 | 38 | ||
4749 | 39 | /********************************************************************//** | 39 | /********************************************************************//** |
4751 | 40 | Remove a block from the flush list of modified blocks. */ | 40 | Remove a block from the flush list of modified blocks. */ |
4752 | 41 | UNIV_INTERN | 41 | UNIV_INTERN |
4753 | 42 | void | 42 | void |
4754 | 43 | buf_flush_remove( | 43 | buf_flush_remove( |
4755 | @@ -75,9 +75,9 @@ | |||
4756 | 75 | # if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG | 75 | # if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG |
4757 | 76 | /********************************************************************//** | 76 | /********************************************************************//** |
4758 | 77 | Writes a flushable page asynchronously from the buffer pool to a file. | 77 | Writes a flushable page asynchronously from the buffer pool to a file. |
4762 | 78 | NOTE: buf_pool->mutex and block->mutex must be held upon entering this | 78 | NOTE: block->mutex must be held upon entering this function, and it will be |
4763 | 79 | function, and they will be released by this function after flushing. | 79 | released by this function after flushing. This is loosely based on |
4764 | 80 | This is loosely based on buf_flush_batch() and buf_flush_page(). | 80 | buf_flush_batch() and buf_flush_page(). |
4765 | 81 | @return TRUE if the page was flushed and the mutexes released */ | 81 | @return TRUE if the page was flushed and the mutexes released */ |
4766 | 82 | UNIV_INTERN | 82 | UNIV_INTERN |
4767 | 83 | ibool | 83 | ibool |
4768 | @@ -232,9 +232,8 @@ | |||
4769 | 232 | Writes a flushable page asynchronously from the buffer pool to a file. | 232 | Writes a flushable page asynchronously from the buffer pool to a file. |
4770 | 233 | NOTE: in simulated aio we must call | 233 | NOTE: in simulated aio we must call |
4771 | 234 | os_aio_simulated_wake_handler_threads after we have posted a batch of | 234 | os_aio_simulated_wake_handler_threads after we have posted a batch of |
4775 | 235 | writes! NOTE: buf_pool->mutex and buf_page_get_mutex(bpage) must be | 235 | writes! NOTE: buf_page_get_mutex(bpage) must be held upon entering this |
4776 | 236 | held upon entering this | 236 | function, and it will be released by this function. */ |
4774 | 237 | function. */ | ||
4777 | 238 | UNIV_INTERN | 237 | UNIV_INTERN |
4778 | 239 | void | 238 | void |
4779 | 240 | buf_flush_page( | 239 | buf_flush_page( |
4780 | 241 | 240 | ||
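The buf0flu.h hunks above narrow the entry contract of `buf_flush_page()` and the single-page flush: only the block mutex (no longer `buf_pool->mutex` as well) is held on entry, and the function releases it before returning. A hedged sketch of that "enter locked, return unlocked" contract — the types and function body are invented; `std::unique_lock` stands in for the InnoDB mutex so the release is observable:

```cpp
#include <mutex>

struct block_t {
    std::mutex mutex;       // block mutex, held by the caller on entry
    bool       flushed = false;
};

// Caller enters with block->mutex held (owned by block_guard); the
// function drops it once the write is posted, mirroring the comment
// "released by this function after flushing".
bool flush_page(block_t* block, std::unique_lock<std::mutex>& block_guard) {
    block->flushed = true;   // stand-in for posting the async write
    block_guard.unlock();    // contract: mutex released before return
    return true;
}
```

Encoding the release in the callee is what makes the doc-comment changes in this hunk load-bearing: callers must not unlock again after a successful return.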
=== modified file 'Percona-Server/storage/innobase/include/buf0flu.ic'
--- Percona-Server/storage/innobase/include/buf0flu.ic	2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/buf0flu.ic	2013-09-20 05:29:11 +0000
@@ -69,7 +69,6 @@
 	ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
 #endif /* UNIV_SYNC_DEBUG */
 
-	ut_ad(!buf_pool_mutex_own(buf_pool));
 	ut_ad(!buf_flush_list_mutex_own(buf_pool));
 	ut_ad(!mtr->made_dirty || log_flush_order_mutex_own());
 
@@ -116,7 +115,6 @@
 	ut_ad(rw_lock_own(&(block->lock), RW_LOCK_EX));
 #endif /* UNIV_SYNC_DEBUG */
 
-	ut_ad(!buf_pool_mutex_own(buf_pool));
 	ut_ad(!buf_flush_list_mutex_own(buf_pool));
 	ut_ad(log_flush_order_mutex_own());
 
=== modified file 'Percona-Server/storage/innobase/include/buf0lru.h'
--- Percona-Server/storage/innobase/include/buf0lru.h	2013-08-16 09:11:51 +0000
+++ Percona-Server/storage/innobase/include/buf0lru.h	2013-09-20 05:29:11 +0000
@@ -79,12 +79,14 @@
 Try to free a block. If bpage is a descriptor of a compressed-only
 page, the descriptor object will be freed as well.
 
-NOTE: If this function returns true, it will temporarily
-release buf_pool->mutex. Furthermore, the page frame will no longer be
-accessible via bpage.
-
-The caller must hold buf_pool->mutex and must not hold any
-buf_page_get_mutex() when calling this function.
+NOTE: If this function returns true, it will release the LRU list mutex,
+and temporarily release and relock the buf_page_get_mutex() mutex.
+Furthermore, the page frame will no longer be accessible via bpage. If this
+function returns false, the buf_page_get_mutex() might be temporarily released
+and relocked too.
+
+The caller must hold the LRU list and buf_page_get_mutex() mutexes.
+
 @return true if freed, false otherwise. */
 UNIV_INTERN
 bool
@@ -291,7 +293,7 @@
 extern buf_LRU_stat_t	buf_LRU_stat_cur;
 
 /** Running sum of past values of buf_LRU_stat_cur.
-Updated by buf_LRU_stat_update(). Protected by buf_pool->mutex. */
+Updated by buf_LRU_stat_update(). */
 extern buf_LRU_stat_t	buf_LRU_stat_sum;
 
 /********************************************************************//**
 
=== modified file 'Percona-Server/storage/innobase/include/sync0sync.h'
--- Percona-Server/storage/innobase/include/sync0sync.h	2013-08-06 15:16:34 +0000
+++ Percona-Server/storage/innobase/include/sync0sync.h	2013-09-20 05:29:11 +0000
@@ -71,6 +71,11 @@
 extern mysql_pfs_key_t	buffer_block_mutex_key;
 extern mysql_pfs_key_t	buf_pool_mutex_key;
 extern mysql_pfs_key_t	buf_pool_zip_mutex_key;
+extern mysql_pfs_key_t	buf_pool_LRU_list_mutex_key;
+extern mysql_pfs_key_t	buf_pool_free_list_mutex_key;
+extern mysql_pfs_key_t	buf_pool_zip_free_mutex_key;
+extern mysql_pfs_key_t	buf_pool_zip_hash_mutex_key;
+extern mysql_pfs_key_t	buf_pool_flush_state_mutex_key;
 extern mysql_pfs_key_t	cache_last_read_mutex_key;
 extern mysql_pfs_key_t	dict_foreign_err_mutex_key;
 extern mysql_pfs_key_t	dict_sys_mutex_key;
@@ -632,7 +637,7 @@
 Search system mutex
 |
 V
-Buffer pool mutex
+Buffer pool mutexes
 |
 V
 Log mutex
@@ -723,11 +728,15 @@
 			SYNC_SEARCH_SYS, as memory allocation
 			can call routines there! Otherwise
 			the level is SYNC_MEM_HASH. */
-#define SYNC_BUF_POOL		150	/* Buffer pool mutex */
+#define SYNC_BUF_LRU_LIST	151
 #define SYNC_BUF_PAGE_HASH	149	/* buf_pool->page_hash rw_lock */
 #define SYNC_BUF_BLOCK		146	/* Block mutex */
-#define SYNC_BUF_FLUSH_LIST	145	/* Buffer flush list mutex */
-#define SYNC_DOUBLEWRITE	140
+#define SYNC_BUF_FREE_LIST	145
+#define SYNC_BUF_ZIP_FREE	144
+#define SYNC_BUF_ZIP_HASH	143
+#define SYNC_BUF_FLUSH_STATE	142
+#define SYNC_BUF_FLUSH_LIST	141	/* Buffer flush list mutex */
+#define SYNC_DOUBLEWRITE	139
 #define SYNC_ANY_LATCH		135
 #define SYNC_MEM_HASH		131
 #define SYNC_MEM_POOL		130
 
=== modified file 'Percona-Server/storage/innobase/sync/sync0sync.cc'
--- Percona-Server/storage/innobase/sync/sync0sync.cc	2013-09-02 10:01:38 +0000
+++ Percona-Server/storage/innobase/sync/sync0sync.cc	2013-09-20 05:29:11 +0000
@@ -1201,7 +1201,11 @@
 		/* fallthrough */
 	}
 	case SYNC_BUF_FLUSH_LIST:
-	case SYNC_BUF_POOL:
+	case SYNC_BUF_LRU_LIST:
+	case SYNC_BUF_FREE_LIST:
+	case SYNC_BUF_ZIP_FREE:
+	case SYNC_BUF_ZIP_HASH:
+	case SYNC_BUF_FLUSH_STATE:
 		/* We can have multiple mutexes of this type therefore we
 		can only check whether the greater than condition holds. */
 		if (!sync_thread_levels_g(array, level-1, TRUE)) {
@@ -1215,17 +1219,12 @@
 
 	case SYNC_BUF_PAGE_HASH:
 		/* Multiple page_hash locks are only allowed during
-		buf_validate and that is where buf_pool mutex is already
-		held. */
+		buf_validate. */
 		/* Fall through */
 
 	case SYNC_BUF_BLOCK:
-		/* Either the thread must own the buffer pool mutex
-		(buf_pool->mutex), or it is allowed to latch only ONE
-		buffer block (block->mutex or buf_pool->zip_mutex). */
 		if (!sync_thread_levels_g(array, level, FALSE)) {
 			ut_a(sync_thread_levels_g(array, level - 1, TRUE));
-			ut_a(sync_thread_levels_contain(array, SYNC_BUF_POOL));
 		}
 		break;
 	case SYNC_REC_LOCK:
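(Editor's note: the latching-order check the patch touches can be boiled down to a tiny standalone model. sync_thread_levels_g() passes only when every latch the thread already holds has a level strictly greater than the given limit. The sketch below is a simplification with level values mirroring the patched sync0sync.h, not the real InnoDB implementation.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Level values mirroring the patched sync0sync.h ordering. */
enum {
	SYNC_BUF_LRU_LIST   = 151,
	SYNC_BUF_PAGE_HASH  = 149,
	SYNC_BUF_BLOCK      = 146,
	SYNC_BUF_FREE_LIST  = 145,
	SYNC_BUF_ZIP_FREE   = 144,
	SYNC_BUF_FLUSH_LIST = 141
};

#define MAX_LEVELS 8	/* per-thread array of held levels, 0 = empty slot */

/* Simplified stand-in for sync_thread_levels_g(): true iff every held
   level is strictly greater than 'limit'. */
static bool levels_g(const int held[MAX_LEVELS], int limit)
{
	for (size_t i = 0; i < MAX_LEVELS; i++) {
		if (held[i] != 0 && held[i] <= limit) {
			return false;
		}
	}
	return true;
}
```

For example, a thread holding only SYNC_BUF_LRU_LIST (151) may take a block mutex at level 146, but once it also holds SYNC_BUF_FLUSH_LIST (141), taking SYNC_BUF_FREE_LIST (145) would violate the order — exactly the deadlock-avoidance ranking the new constants encode.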
Hi Laurynas,
This is not a complete review, I'm only about 10% into the patch, but I'm posting the comments I have so far to parallelize things a bit:
- the code in btr_blob_free() can be simplified. Just initialize
‘freed’ with ‘false’, then assign to it the result of
buf_LRU_free_page() whenever it is called, and then do this at the
end:

	if (!freed) {
		mutex_exit(&buf_pool->LRU_list_mutex);
	}

Would result in less hairy code.
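(Editor's note: the suggested restructuring can be sketched as follows. The stubs are hypothetical stand-ins for the real buf_LRU_free_page() and mutex calls; only the control flow matters.)

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mutex: 1 = held, 0 = released. */
static void mutex_exit_stub(int *m)
{
	*m = 0;
}

/* Stand-in for buf_LRU_free_page(): on success it releases the LRU
   list mutex itself and returns true; on failure it returns false
   with the mutex still held. */
static bool lru_free_page_stub(bool success, int *lru_mutex)
{
	if (success) {
		mutex_exit_stub(lru_mutex);
	}
	return success;
}

/* The shape suggested for btr_blob_free(): initialize 'freed' to
   false, assign it wherever the free is attempted, and release the
   mutex at one single exit point. */
static void blob_free_sketch(bool can_free, int *lru_mutex)
{
	bool freed = false;

	if (can_free) {
		freed = lru_free_page_stub(true, lru_mutex);
	}

	if (!freed) {
		mutex_exit_stub(lru_mutex);
	}
}
```

Either way the mutex ends up released exactly once on every path, without scattering mutex_exit() calls across branches.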
- wrong comments for buf_LRU_free_page():
s/block mutex/buf_page_get_mutex() mutex/
- the comments for buf_LRU_free_page() say that both LRU_list_mutex
and block_mutex may be released temporarily if ‘true’ is
returned. But:

	1) even if ‘false’ is returned, block_mutex may also be released
	temporarily

	2) the comments don’t mention that if ‘true’ is returned,
	LRU_list_mutex is always released upon return, but block_mutex is
	always locked. And callers of buf_LRU_free_page() rely on that.
- the following code in buf_LRU_free_page() is missing a
buf_page_free_descriptor() call if b != NULL. Which is a potential
memory leak.

	if (!buf_LRU_block_remove_hashed(bpage, zip)) {
+
+		mutex_exit(&buf_pool->LRU_list_mutex);
+
+		mutex_enter(block_mutex);
+
		return(true);
	}
- the patch removes buf_pool_mutex_enter_all() from
btr_search_validate_one_table(), but then does a number of dirty
reads from ‘block’ before it locks block->mutex. Any reasons to not
lock block->mutex earlier?
- the following checks for mutex != NULL in buf_buddy_relocate() seem
to be redundant, since they are made after mutex_enter(mutex), so we
are guaranteed mutex != NULL if we reach that code:

@@ -584,7 +604,11 @@ buf_buddy_relocate(

	mutex_enter(mutex);

-	if (buf_page_can_relocate(bpage)) {
+	rw_lock_s_unlock(hash_lock);
+
+	mutex_enter(&buf_pool->zip_free_mutex);
+
+	if (mutex && buf_page_can_relocate(bpage)) {

and

-	mutex_exit(mutex);
+	if (mutex)
+		mutex_exit(mutex);
+
+	ut_ad(mutex_own(&buf_pool->zip_free_mutex));

and the last hunk is also missing braces (in case you decide to keep
it).
- asserting that zip_free_mutex is locked also looks redundant to me,
because it is locked just a few lines above, and there’s nothing in
the code path that could release it.
- os_atomic_load_ulint() / os_atomic_store_ulint()... I don’t think we
need that stuff. Their names are misleading as they don’t enforce
any atomicity. They should be named os_ordered_load_ulint() /
os_ordered_store_ulint(), but... what specific order are you trying
to enforce with those constructs?
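(Editor's note: to illustrate the distinction being drawn — the names below are invented for the sketch, not the patch's actual macros. A volatile access only stops the compiler from reordering or eliding the operation; a C11 atomic with an explicit memory order constrains both the compiler and the hardware, and is what a name containing "atomic" would suggest.)

```c
#include <assert.h>
#include <stdatomic.h>

/* "Ordered" but not atomic: on most platforms this compiles to an
   ordinary store with no inter-thread ordering or tearing guarantee. */
static void ordered_store_ulint(volatile unsigned long *ptr,
				unsigned long val)
{
	*ptr = val;
}

static unsigned long ordered_load_ulint(const volatile unsigned long *ptr)
{
	return *ptr;
}

/* A genuinely atomic counterpart with explicit memory ordering. */
static _Atomic unsigned long atomic_counter;

static void atomic_store_ulint(unsigned long val)
{
	atomic_store_explicit(&atomic_counter, val, memory_order_release);
}

static unsigned long atomic_load_ulint(void)
{
	return atomic_load_explicit(&atomic_counter, memory_order_acquire);
}
```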
- I don’t see a point in maintaining multiple list nodes in buf_page_t
(i.e. ‘free’, ‘flush_list’ and ‘zip_list’). As I understand, each
page may only be in a single list at any point in time, so splitting
the list node is purely cosmetic.
On the other hand, we are looking at a non-trivial buf_page_t size
increase (112 bytes before the patch, 144 bytes after). Leaving all
cache and memory locality questions aside, that’s 64 MB of memory
just for list node pointers on a system with a 32 GB buffer pool....
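(Editor's note: the 64 MB figure follows directly from the pool geometry, assuming the default 16 KB page size: a 32 GB pool holds 2^21 pages, and 144 − 112 = 32 extra bytes per descriptor gives 2^26 bytes. A one-line check:)

```c
#include <assert.h>

/* Total overhead of extra per-descriptor bytes across the pool. */
static unsigned long long
list_node_overhead(unsigned long long pool_bytes,
		   unsigned long long page_bytes,
		   unsigned long long extra_per_page)
{
	return (pool_bytes / page_bytes) * extra_per_page;
}
```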