Created by Laurynas Biveinis on 2013-09-20 and last modified on 2013-09-26
Get this branch:
bzr branch lp:~laurynas-biveinis/percona-server/xtradb-thread-priority-flag

Recent revisions

430. By Laurynas Biveinis on 2013-09-26

Implement priority RW lock

The priority RW lock is implemented through a new type, prio_rw_lock_t,
that includes the regular rw_lock_t as its first member (in several
places the code relies on the regular RW lock being a prefix of the
priority RW lock in memory). The priority lock extends it with separate
wait events for high-priority waiters and a corresponding waiters
flag. The priority RW lock API is implemented as C++ overloads of the
existing regular RW lock API in order to minimize code differences
outside sync0rw.*. In all cases the acquisition priority is determined
by the relative priority of the current thread; no new API to override
this has been defined.
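A minimal C++ sketch of the layout trick and the overload approach described above. `RwLock`, `PrioRwLock`, `as_regular()` and `lock_kind()` are hypothetical stand-ins reduced to a couple of fields, not the patch's definitions. Placing the regular lock first in a standard-layout struct makes a pointer to the priority lock usable as a pointer to the regular lock, and overloads let the same call spelling work on either type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the regular RW lock (the real rw_lock_t has a
// lock word, wait events, debug info and more).
struct RwLock {
    long lock_word = 0;
    bool waiters = false;  // regular waiters flag
};

// Hypothetical stand-in for the priority RW lock: the regular lock comes
// first, followed by the high-priority additions.
struct PrioRwLock {
    RwLock base_lock;                    // must remain the first member
    bool high_priority_waiters = false;  // assumed name for the new flag
};

// The prefix trick relies on a standard-layout type with the regular lock
// at offset 0; check both at compile time.
static_assert(std::is_standard_layout<PrioRwLock>::value,
              "PrioRwLock must be standard-layout");
static_assert(offsetof(PrioRwLock, base_lock) == 0,
              "the regular lock must be the first member");

// Code that only needs the regular part can view a priority lock as a
// regular one through the shared prefix.
RwLock* as_regular(PrioRwLock* lock) {
    return reinterpret_cast<RwLock*>(lock);
}

// Overloads: the compiler selects the implementation from the argument type.
enum class LockKind { Regular, Priority };
LockKind lock_kind(const RwLock*) { return LockKind::Regular; }
LockKind lock_kind(const PrioRwLock*) { return LockKind::Priority; }
```

Inheritance (`struct PrioRwLock : RwLock`) would give the same layout with implicit pointer conversions; the sketch keeps explicit composition because the description speaks of the regular lock as the first member.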

The priority order for priority RW lock waiters has been implemented
as follows:
1) high priority next-writer waiter;
2) high priority X waiters;
3) high priority S waiters;
4) regular priority next-writer waiter;
5) regular priority waiters.
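The wake-up order listed above amounts to a simple priority cascade. In this hedged sketch, `WaiterSnapshot`, its fields and `pick_waiters_to_wake()` are hypothetical; the real lock tracks waiters through flags and OS events rather than counts.

```cpp
#include <cassert>

// Which class of waiters to signal when the lock becomes available.
enum class WakeTarget {
    None,
    HighNextWriter,
    HighX,
    HighS,
    RegularNextWriter,
    Regular
};

// Hypothetical snapshot of who is currently waiting.
struct WaiterSnapshot {
    bool high_next_writer = false;     // high-priority thread waiting as next writer
    int high_x = 0;                    // high-priority X waiters
    int high_s = 0;                    // high-priority S waiters
    bool regular_next_writer = false;  // regular thread waiting as next writer
    int regular = 0;                   // all other regular-priority waiters
};

// Walks the five classes in priority order and returns the first non-empty one.
WakeTarget pick_waiters_to_wake(const WaiterSnapshot& w) {
    if (w.high_next_writer) return WakeTarget::HighNextWriter;        // 1)
    if (w.high_x > 0) return WakeTarget::HighX;                       // 2)
    if (w.high_s > 0) return WakeTarget::HighS;                       // 3)
    if (w.regular_next_writer) return WakeTarget::RegularNextWriter;  // 4)
    if (w.regular > 0) return WakeTarget::Regular;                    // 5)
    return WakeTarget::None;
}
```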

Another behavior change of the priority RW lock relative to the
regular RW lock is that a regular-priority S request always checks for
the presence of high-priority waiters before attempting to lock, and
does not attempt to lock if any are found. This is done even when the
lock attempt would succeed, to prevent a sequence of partially
overlapping S requests from starving a high-priority X waiter
indefinitely. The check is also a performance optimization, as it
skips redundant spinning; for that reason it is done for X requests
too, where the starvation issue does not apply.
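A sketch of the admission check described above, with stated simplifications: `PrioLockModel` and `try_s_lock()` are hypothetical, the model is single-threaded (the real lock uses atomic operations and spinning), and high-priority requests skip the check entirely, which is an assumption of the sketch rather than something stated above.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of the lock state; it shows only the
// admission decision.
struct PrioLockModel {
    int s_holders = 0;          // threads currently holding S
    bool x_held = false;        // whether an X lock is held
    int high_prio_waiters = 0;  // high-priority threads waiting for the lock
};

// Returns true if the S lock was granted immediately.
bool try_s_lock(PrioLockModel& lock, bool thread_is_high_priority) {
    // A regular-priority request backs off whenever high-priority waiters
    // exist, even though the lock may be free for S. Otherwise a stream of
    // overlapping S holders could keep a high-priority X waiter out forever.
    if (!thread_is_high_priority && lock.high_prio_waiters > 0) {
        return false;
    }
    if (lock.x_held) {
        return false;
    }
    ++lock.s_holders;
    return true;
}
```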

Adjust rw_lock_x_lock_func() to accept both regular and priority
locks by passing the lock as a generic void * pointer, which is then
cast and processed according to the newly added priority flags.
Likewise for rw_lock_s_lock_spin(). Refactor rw_lock_x_prepare_unlock()
out of rw_lock_x_unlock_func() so that both of its overloads can use
it.
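A hedged sketch of one entry point serving both lock types through `void *`, plus a shared unlock step used by two overloads. The types, the `is_prio_lock` parameter (standing in for the priority flags) and the `Woken` result are hypothetical; actually waking threads is not modelled, only the decision of which class would be signalled.

```cpp
#include <cassert>

struct RwLock {
    bool x_locked = false;    // simplified lock state
    int regular_waiters = 0;  // threads waiting at regular priority
};

struct PrioRwLock {
    RwLock base_lock;      // first member, so the void * prefix cast works
    int high_waiters = 0;  // threads waiting at high priority
};

// Single X-lock implementation for both types. `lock` points to an RwLock or
// a PrioRwLock; the hypothetical `is_prio_lock` flag says which. Returns true
// if the lock was taken without waiting.
bool x_lock_generic(void* lock, bool is_prio_lock, bool thread_is_high) {
    // Valid for both types because RwLock is the first member of PrioRwLock.
    RwLock* base = static_cast<RwLock*>(lock);
    if (is_prio_lock && !thread_is_high) {
        const PrioRwLock* prio = static_cast<const PrioRwLock*>(lock);
        if (prio->high_waiters > 0) {
            return false;  // regular-priority request defers to high priority
        }
    }
    if (base->x_locked) {
        return false;
    }
    base->x_locked = true;
    return true;
}

enum class Woken { None, Regular, High };

// Common unlock work shared by both overloads (the role the description
// gives rw_lock_x_prepare_unlock()).
void x_prepare_unlock(RwLock* lock) {
    assert(lock->x_locked);
    lock->x_locked = false;
}

Woken x_unlock(RwLock* lock) {
    x_prepare_unlock(lock);
    return lock->regular_waiters > 0 ? Woken::Regular : Woken::None;
}

Woken x_unlock(PrioRwLock* lock) {
    x_prepare_unlock(&lock->base_lock);
    if (lock->high_waiters > 0) {
        return Woken::High;  // high-priority waiters are signalled first
    }
    return lock->base_lock.regular_waiters > 0 ? Woken::Regular : Woken::None;
}
```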

Define new wait array object types PRIO_RW_LOCK_SHARED and

Convert the following locks to priority locks: fsp, page_hash, AHI,
index, purge.

At the same time fix http://bugs.mysql.com/bug.php?id=70417 / bug
(rw_lock_x_lock_func_nowait() calls os_thread_get_curr_id()
mostly needlessly) by pushing down the os_thread_get_curr_id() call to
its actual use site in rw_lock_x_lock_func_nowait().
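A hedged sketch of the push-down: the thread-id lookup moves from the top of the non-blocking X-lock function into the branches that use it, so a plain failure on a non-recursive lock never pays for it. `current_thread_id()` (which counts calls, in place of os_thread_get_curr_id()), `XLock` and `x_lock_nowait()` are hypothetical and much simpler than the InnoDB code. The sketch assumes the id is also needed to record the writer on success, and it is meant as a single-threaded illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for os_thread_get_curr_id(); counts its calls so the
// effect of the push-down is observable.
int g_id_lookups = 0;

std::thread::id current_thread_id() {
    ++g_id_lookups;
    return std::this_thread::get_id();
}

// Hypothetical, simplified X lock.
struct XLock {
    std::atomic<bool> locked{false};
    std::thread::id writer;  // holder's id; default-constructed when never set
    bool recursive = false;  // whether the holder may relock recursively
    int depth = 0;           // recursion depth of the current holder
};

// Non-blocking X-lock attempt. The thread id is looked up only where it is
// used: recording the writer on success and the recursive-relock check.
bool x_lock_nowait(XLock& lock) {
    bool expected = false;
    if (lock.locked.compare_exchange_strong(expected, true)) {
        lock.writer = current_thread_id();
        lock.depth = 1;
        return true;
    }
    // `writer` is read without the memory ordering a real lock would need.
    // Short-circuiting skips the lookup when the lock is not recursive.
    if (lock.recursive && lock.writer == current_thread_id()) {
        ++lock.depth;
        return true;
    }
    return false;
}
```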

429. By Laurynas Biveinis on 2013-09-23

Implement priority refill for the buffer pool free list

The free list has producers (the page cleaner thread, and single-page
LRU flushes by the query threads) and consumers (query and utility
threads). If the free list becomes empty under a heavy workload, the
consumers start waiting on the free list mutex in increasingly large
numbers. These waits cause the producers to wait on that mutex too,
greatly reducing the refill speed, which in turn prevents the
consumers from getting a page and ending their wait on the free list.

Fix by always acquiring the free list mutex with high priority in the
sole producer function buf_LRU_block_free_non_file_page(), and with
low priority in the sole consumer function buf_LRU_get_free_only().
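To see why the fix helps, here is a toy, single-threaded simulation of who acquires a contended mutex next when high-priority waiters are granted first, FIFO within each priority. `Waiter`, `GrantQueue` and `grant_order()` are hypothetical; nothing above says the real mutex is FIFO, so only the high-before-low property should be read from it.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// A thread waiting for the mutex. Producers (callers of
// buf_LRU_block_free_non_file_page()) wait with high priority; consumers
// (callers of buf_LRU_get_free_only()) with low priority.
struct Waiter {
    std::string name;
    bool high_priority;
};

class GrantQueue {
public:
    void wait(const Waiter& w) {
        (w.high_priority ? high_ : low_).push_back(w);
    }
    // On each release, grant the mutex to the oldest high-priority waiter if
    // any, otherwise to the oldest low-priority waiter.
    bool grant_next(Waiter* next) {
        std::deque<Waiter>& q = high_.empty() ? low_ : high_;
        if (q.empty()) return false;
        *next = q.front();
        q.pop_front();
        return true;
    }

private:
    std::deque<Waiter> high_;
    std::deque<Waiter> low_;
};

// Returns the order in which already-waiting threads acquire the mutex.
std::vector<std::string> grant_order(const std::vector<Waiter>& waiting) {
    GrantQueue queue;
    for (const Waiter& w : waiting) queue.wait(w);
    std::vector<std::string> order;
    Waiter next{};
    while (queue.grant_next(&next)) order.push_back(next.name);
    return order;
}
```

Even when the producer arrives after several consumers, it gets the mutex on the next release, so the refill is not stuck behind the consumer queue.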

428. By Laurynas Biveinis on 2013-09-23

Implement XtraDB priority mutex

The priority mutex is implemented through a new mutex type,
ib_prio_mutex_t. Its API is provided through overloads of the existing
ib_mutex_t API to minimize code changes. The mutex structure includes
the regular mutex in order to share as much code as possible. On
mutex acquisition, one of three priority options is available:
default priority, which means that srv_current_thread_priority is
used; high priority; or low priority. The latter two options are
provided for the cases where srv_current_thread_priority should be
ignored, and are available through two new macros, mutex_enter_first
and mutex_enter_last.
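The three acquisition options can be sketched as a small resolution function. `MutexPriority`, `acquire_as_high_priority()` and the `thread_local` flag `current_thread_priority` are hypothetical stand-ins for the default, mutex_enter_first and mutex_enter_last behavior and for srv_current_thread_priority.

```cpp
#include <cassert>

// Hypothetical stand-in for srv_current_thread_priority.
thread_local bool current_thread_priority = false;

enum class MutexPriority {
    Default,  // follow the calling thread's flag
    High,     // the role of mutex_enter_first: ignore the flag, go first
    Low       // the role of mutex_enter_last: ignore the flag, go last
};

// Decides whether an acquisition is treated as high priority.
bool acquire_as_high_priority(MutexPriority p) {
    switch (p) {
    case MutexPriority::High:
        return true;
    case MutexPriority::Low:
        return false;
    case MutexPriority::Default:
        break;
    }
    return current_thread_priority;
}
```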

When a high-priority acquisition is requested (either an explicit
high-priority request or a default-priority request from a
high-priority thread), a new wait array object type, SYNC_PRIO_MUTEX,
is used, which waits on a separate high-priority wait event for the
mutex. On mutex exit, any high-priority waiters are signalled first.
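A minimal sketch of the signalling policy, built on `std::mutex` and two `std::condition_variable`s playing the role of the regular and high-priority wait events. `PrioMutex` is hypothetical and not how InnoDB does it (InnoDB mutexes spin and then wait through the sync wait array). Letting regular waiters yield while high-priority waiters exist is a design choice of the sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class PrioMutex {
public:
    void enter(bool high_priority) {
        std::unique_lock<std::mutex> guard(state_);
        if (high_priority) {
            ++high_waiters_;
            // Wait on the separate high-priority "event".
            high_event_.wait(guard, [this] { return !locked_; });
            --high_waiters_;
        } else {
            // Regular waiters also stand back while high-priority ones wait.
            regular_event_.wait(
                guard, [this] { return !locked_ && high_waiters_ == 0; });
        }
        locked_ = true;
    }

    void exit() {
        std::unique_lock<std::mutex> guard(state_);
        assert(locked_);
        locked_ = false;
        if (high_waiters_ > 0) {
            high_event_.notify_one();  // high-priority waiters are signalled first
        } else {
            regular_event_.notify_all();
        }
    }

    bool is_locked() {
        std::lock_guard<std::mutex> guard(state_);
        return locked_;
    }

private:
    std::mutex state_;                      // protects the fields below
    std::condition_variable high_event_;    // stands in for the high-priority event
    std::condition_variable regular_event_; // stands in for the regular event
    bool locked_ = false;
    int high_waiters_ = 0;
};
```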

Convert the following mutexes to priority mutexes: hash table sync
object mutexes, buffer pool LRU and free list mutexes, data dictionary

427. By Laurynas Biveinis on 2013-09-23

Implement thread priority flag

Add a thread priority flag that InnoDB threads can check to decide
whether they should acquire a shared resource with priority or not.
The actual uses of this flag will come in follow-up revisions.

The flag, srv_current_thread_priority, is implemented through
thread-local storage (TLS). Two other alternatives were considered
for its implementation:
1. Passing the flag value from the DECLARE_THREAD functions to its use
   sites through the call stacks. However, these call stacks are very
   deep, and dozens of InnoDB functions would have to be patched to
   take the new argument.
2. pthread_setspecific()/pthread_getspecific(). This would have the
   advantage of being slightly more portable than TLS, but it is more
   complicated to use: a) the flag would have to be allocated on the
   heap; b) pthread_key_create() would have to be called through
   pthread_once(), and these calls would have to be placed in mysys
   instead of InnoDB, so any affected InnoDB threads that do not
   initialize mysys would have to be converted to do so; c)
   pthread_getspecific() calls would have to happen in very hot code
   paths, such as the mutex and rw-latch locking code.

Thus TLS appears to be the best option.
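A sketch of the TLS approach in C++11 terms: each thread sets its own copy of the flag at start-up, and code deep in the call stack reads it without an extra argument. The name `thread_priority_flag` and the helper functions are hypothetical, and the commit does not say which TLS mechanism it uses; `thread_local` is used here for illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread flag standing in for srv_current_thread_priority.
// Every thread gets its own copy, initialized to false.
thread_local bool thread_priority_flag = false;

// Deep in the call stack (e.g. inside mutex entry), the flag is read directly
// instead of being passed down as an argument.
bool deep_callee_wants_priority() {
    return thread_priority_flag;
}

// Runs a thread that sets its own flag at start-up, as a thread entry
// function could, and reports what a deep callee on that thread observes.
bool run_thread_with_priority(bool high_priority) {
    bool observed = false;
    std::thread worker([&observed, high_priority] {
        thread_priority_flag = high_priority;
        observed = deep_callee_wants_priority();
    });
    worker.join();
    return observed;
}
```

Setting the flag in one thread leaves every other thread's copy untouched, and reading it needs no key creation or heap allocation, matching the reasons given above for preferring TLS over pthread_getspecific().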

In UNIV_PERF_DEBUG or UNIV_DEBUG builds on Linux, the flag value for
individual InnoDB utility threads can be changed through the new
global dynamic variables innodb_priority_purge, innodb_priority_io,
innodb_priority_cleaner and innodb_priority_master. Added sys_vars
tests for them.

Branch metadata

Branch format: Branch format 7
Repository format: Bazaar repository format 2a (needs bzr 1.16 or later)
Stacked on: