innodb_adaptive_hash_index_partitions may cause hangup

Bug #791030 reported by Yasufumi Kinoshita
This bug affects 4 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS) | Status: Fix Released | Importance: High | Assigned to: Unassigned
Affects: 5.5 | Status: Fix Released | Importance: High | Assigned to: Unassigned

Bug Description

Setting innodb_adaptive_hash_index_partitions > 1 may cause a hang.
I sometimes hit this hang during high-spec benchmarks.

The following part of innodb_adaptive_hash_index_partitions.patch:

+	} else if (btr_search_index_num > 1) {
+		rw_lock_t*	btr_search_latch;
+
+		/* FIXME: This may be optimistic implementation still. */
+		btr_search_latch = (rw_lock_t*)(block->btr_search_latch);
+		if (UNIV_LIKELY(!btr_search_latch)) {
+			if (block->is_hashed) {
+				goto retry;
+			}
+			return;
+		}
+		rw_lock_s_lock(btr_search_latch);

is still somewhat insufficient.

Related branches

Changed in percona-server:
assignee: nobody → Yasufumi Kinoshita (yasufumi-kinoshita)
status: New → Confirmed
importance: Undecided → Medium
Stewart Smith (stewart)
Changed in percona-server:
importance: Medium → High
status: Confirmed → Triaged
Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

Next problematic place is:

--------------------------------------------------
			hash index semaphore! */

 #ifndef UNIV_SEARCH_DEBUG
-			if (!trx->has_search_latch) {
-				rw_lock_s_lock(&btr_search_latch);
-				trx->has_search_latch = TRUE;
+			if (!(trx->has_search_latch
+			      & ((ulint)1 << (index->id % btr_search_index_num)))) {
+				rw_lock_s_lock(btr_search_get_latch(index->id));
+				trx->has_search_latch |=
+					(ulint)1 << (index->id % btr_search_index_num);
			}
 #endif
			switch (row_sel_try_search_shortcut_for_mysql(
--------------------------------------------------

We should decide on an explicit latch order between the btr_search_latch_part[n] latches and rewrite the patch for this function.
This spot is more likely to cause a hang than the one reported before.

Revision history for this message
Yasufumi Kinoshita (yasufumi-kinoshita) wrote :

I found the true reason for the hangup. I will fix it today.

It seems necessary to release the s-latches on all of the hash index partitions if another thread is waiting for any one of them.

The current implementation (when someone waits for an s-latch that the trx holds, release only that one) might cause a wait chain across btr_search_latch_part[n]; however, a plain "release all" also seems insufficient.

Changed in percona-server:
status: Triaged → Fix Committed
Stewart Smith (stewart)
Changed in percona-server:
milestone: none → 5.5.13-20.4
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This issue is not fixed yet. Setting multiple AHI partitions (mostly in combination with multiple buffer pools) triggers a latch deadlock.

--Thread 139558755972864 has waited at btr0sea.c line 1197 for 930.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x9390f558 'btr_search_latch_part[i]'
a writer (thread id 139558755972864) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file btr0sea.c line 1099
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos6-64/target/BUILD/Percona-Server-5.5.27-rel28.1/Percona-Server-5.5.27-rel28.1/storage/innobase/btr/btr0sea.c line 669
InnoDB: Warning: a long semaphore wait:

As you can see, 139558755972864 is waiting on itself.

The call chain is as follows: (will provide full trace later)

    pthread_cond_wait
    os_cond_wait
    os_event_wait_low
    sync_array_wait_event
    rw_lock_x_lock_wait
    rw_lock_x_lock_low
    rw_lock_x_lock_func
    pfs_rw_lock_x_lock_func
    btr_search_drop_page_hash_index
    buf_LRU_free_block
    buf_LRU_free_from_common_LRU_list
    buf_LRU_search_and_free_block
    buf_LRU_get_free_block
    buf_block_alloc
    btr_search_check_free_space_in_heap
    btr_search_info_update_slow
    btr_search_info_update
    btr_cur_search_to_nth_level
    btr_pcur_open_with_no_init_func
    row_sel_try_search_shortcut_for_mysql
    row_search_for_mysql
    ha_innobase::index_read
    join_read_key
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    evaluate_join_record
    sub_select
    do_select
    JOIN::exec
    mysql_select
    handle_select
    execute_sqlcom_select
    mysql_execute_command
    mysql_parse
    dispatch_command
    do_handle_one_connection
    handle_one_connection
    start_thread
    clone

The suspect in question is btr_search_drop_page_hash_index, since it deals with multiple partitions differently, and the FIXME mentioned in the description is still present in the code.

=================================
 if (btr_search_index_num > 1) {
  rw_lock_t* btr_search_latch;

  /* FIXME: This may be optimistic implementation still. */
  btr_search_latch = (rw_lock_t*)(block->btr_search_latch);
  if (UNIV_LIKELY(!btr_search_latch)) {
   if (block->index) {
    goto retry;
   }
   return;
  }
  ......
  ..
=================================================

It has also been reported in lp:331659

tags: added: i26423
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

While checking an AHI-related issue, I noticed that
mem_heap_create_block uses buf_block_alloc, which round-robins
through the available buffer pools. So, with multiple
partitions and multiple buffer pools, could the latching
intended for one pool end up applied to another, resulting in
a deadlock?

Revision history for this message
vineet khanna (khannavin) wrote :

Hi,

I am having

mysql> show variables like 'innodb_adaptive_hash_index_partitions';
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| innodb_adaptive_hash_index_partitions | 1 |
+---------------------------------------+-------+

Still I am getting a SEMAPHORES issue:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 4641988, signal count 327450531
--Thread 1219737920 has waited at row0sel.c line 3698 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
number of readers 0, waiters flag 0, lock_word: 100000
Last time read locked in file btr0sea.c line 918
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos5-64/target/BUILD/Percona-Server-5.5.22-rel25.2/Percona-Server-5.5.22-rel25.2/storage/innobase/btr/btr0sea.c line 669
--Thread 1218406720 has waited at btr0sea.c line 1508 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
number of readers 0, waiters flag 0, lock_word: 100000
Last time read locked in file btr0sea.c line 918
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos5-64/target/BUILD/Percona-Server-5.5.22-rel25.2/Percona-Server-5.5.22-rel25.2/storage/innobase/btr/btr0sea.c line 669
--Thread 1219205440 has waited at btr0sea.c line 1508 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
number of readers 0, waiters flag 0, lock_word: 100000
Last time read locked in file btr0sea.c line 918

Please advise whether this is the same issue.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote : Re: [Launchpad] [Bug 791030] Re: innodb_adaptive_hash_index_partitions may cause hangup

* On Tue, Dec 11, 2012 at 07:18:42AM -0000, vineet khanna <email address hidden> wrote:
>Hi,
>
>I am having
>
>mysql> show variables like 'innodb_adaptive_hash_index_partitions';
>+---------------------------------------+-------+
>| Variable_name | Value |
>+---------------------------------------+-------+
>| innodb_adaptive_hash_index_partitions | 1 |
>+---------------------------------------+-------+
>
>Still i am getting SEMAPHORES issue:
>
>----------
>SEMAPHORES
>----------
>OS WAIT ARRAY INFO: reservation count 4641988, signal count 327450531
>--Thread 1219737920 has waited at row0sel.c line 3698 for 0.0000 seconds the semaphore:
>S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
>number of readers 0, waiters flag 0, lock_word: 100000
>Last time read locked in file btr0sea.c line 918
>Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos5-64/target/BUILD/Percona-Server-5.5.22-rel25.2/Percona-Server-5.5.22-rel25.2/storage/innobase/btr/btr0sea.c line 669
>--Thread 1218406720 has waited at btr0sea.c line 1508 for 0.0000 seconds the semaphore:
>S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
>number of readers 0, waiters flag 0, lock_word: 100000
>Last time read locked in file btr0sea.c line 918
>Last time write locked in file /home/jenkins/workspace/percona-server-5.5-rpms/label_exp/centos5-64/target/BUILD/Percona-Server-5.5.22-rel25.2/Percona-Server-5.5.22-rel25.2/storage/innobase/btr/btr0sea.c line 669
>--Thread 1219205440 has waited at btr0sea.c line 1508 for 0.0000 seconds the semaphore:
>S-lock on RW-latch at 0x29ea4b18 'btr_search_latch_part[i]'
>number of readers 0, waiters flag 0, lock_word: 100000
>Last time read locked in file btr0sea.c line 918
>
>
>Please update if its a same issue.
>
It is not. In your case, setting
innodb_adaptive_hash_index_partitions to a non-default value (say
8) should ameliorate your issue.

The lp issue is about circular lock deadlock like for
139558755972864 in comment#3.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

To add to the previous comment: the main issue is mostly noticeable when there are multiple buffer pools and multiple AHI partitions, so setting AHI partitions to a value like 8 should be fine.

Revision history for this message
vineet khanna (khannavin) wrote :

Hi Raghavendra,

As per your comment:

"innodb_adaptive_hash_index_partitions to non-default value (say 8) should ameliorate your issue."

Will this change expose me to Bug 791030, or will it resolve my problem?

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

It should resolve your problem. As mentioned in #8, it mostly manifests in the presence of multiple buffer pools and multiple AHI partitions.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I am still able to reproduce it (but not crash it, since that requires a much larger buffer pool) with the config in http://sprunge.us/aESA and a backtrace (though not captured just in time) in http://sprunge.us/INjC

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Following sysbench was used in #11:

sysbench --test=./oltp.lua --db-driver=mysql --mysql-engine-trx=yes --mysql-table-engine=innodb --mysql-user=root --mysql-password=test --oltp-table-size=30000 --num-threads=32 --max-requests=1000000 --oltp-tables-count=18 run

Revision history for this message
Alexey Kopytov (akopytov) wrote :

@Raghavendra,

Can you report it as another bug?

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Sure, will do.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Reported as lp:1100760

tags: added: xtradb
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Closing this, as the linked branch has been merged, and remaining issues reported elsewhere.

tags: added: ahi-partitions
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

It is not clear why the committed fix for this bug gathers a bitmask of all the X waiters.

The original 5.5 code is

	if (UNIV_UNLIKELY(rw_lock_get_writer(&btr_search_latch)
			  != RW_LOCK_NOT_LOCKED)
	    && trx->has_search_latch) {

		/* There is an x-latch request on the adaptive hash index:
		release the s-latch to reduce starvation and wait for
		BTR_SEA_TIMEOUT rounds before trying to keep it again over
		calls from MySQL */

		rw_lock_s_unlock(&btr_search_latch);
		trx->has_search_latch = FALSE;

The current XtraDB 5.5 code is

	should_release = 0;
	for (i = 0; i < btr_search_index_num; i++) {
		/* we should check all latches (fix Bug#791030) */
		if (UNIV_UNLIKELY(rw_lock_get_writer(btr_search_latch_part[i])
				  != RW_LOCK_NOT_LOCKED)) {
			should_release |= ((ulint)1 << i);
		}
	}

	if (UNIV_UNLIKELY(should_release)) {

		for (i = 0; i < btr_search_index_num; i++) {
			/* we should release all s-latches (fix Bug#791030) */
			if (trx->has_search_latch & ((ulint)1 << i)) {
				rw_lock_s_unlock(btr_search_latch_part[i]);
				trx->has_search_latch &= (~((ulint)1 << i));
			}
		}

Thus, it checks all the latches for X waiters and collects a bitmask of them. That bitmask is only used if any bit is set, and if so, the S latches that the current transaction believes it holds are released. The individual bit information in should_release is not used at all, and the should_release and trx->has_search_latch bitmasks are not necessarily identical.

It can be rewritten as

	should_release = false;
	for (i = 0; i < btr_search_index_num; i++) {
		/* we should check all latches (fix Bug#791030) */
		if (UNIV_UNLIKELY(rw_lock_get_writer(btr_search_latch_part[i])
				  != RW_LOCK_NOT_LOCKED)) {
			should_release = true;
			break;
		}
	}

	if (UNIV_UNLIKELY(should_release)) {
		trx_search_latch_release_if_reserved();
	}

Revision history for this message
Alexey Kopytov (akopytov) wrote :

And even that is suboptimal. The proposed code just exits the loop early as soon as it finds a latch with X waiters. Two problems:

1. the current thread will release all latches, even if there are only X waiters on latches it doesn't own.
2. the current thread will release all latches, though it may be sufficient to release only some.

I will comment separately on the 5.6 AHI port.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The questionable code has been removed with the fix for bug 1218347.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-480
