Comment 1 for bug 1026926

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This looks like a bug in the non-lazy table drop path: lock ordering is not preserved, which leads to a deadlock. (I presume tables were being dropped around that time?)

Does setting innodb_lazy_drop_table=ON help? The variable is dynamic, so there is no need to restart the server.

Regarding the lock wait itself (in the non-lazy case):

--Thread 140476533987072 has waited at buf0buf.c line 2529 for 303.00 seconds the semaphore:
S-lock on RW-latch at 0x416f048 '&buf_pool->page_hash_latch'
a writer (thread id 140476533987072) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0buf.c line 2529
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.24-rel26.0/storage/innobase/buf/buf0lru.c line 629

Both the waiter and the thread being waited upon are the same: 140476533987072. The reason is that the drop table path (buf0lru.c:629) also reaches buf0buf.c:2529; both are rooted in buf_LRU_remove_all_pages.

In other words,

at buf0buf.c:2529 in buf_page_get_gen:

  rw_lock_s_lock(&buf_pool->page_hash_latch);  /* the thread already holds the X-latch here, acquired at buf0lru.c:629 */

btr_search_drop_page_hash_when_freed already accounts for this kind of lock ordering, but I believe it only checks block->lock and not page_hash_latch.

"""
 /* If the caller has a latch on the page, then the caller must
 have a x-latch on the page and it must have already dropped
 the hash index for the page. Because of the x-latch that we
 are possibly holding, we cannot s-latch the page, but must
 (recursively) x-latch it, even though we are only reading. */
"""
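
To make the self-wait concrete, here is a minimal sketch of the latch rule the quoted comment describes. It is a toy, single-process model and not InnoDB's rw_lock_t: the type toy_latch_t and the toy_* functions are hypothetical names invented for illustration, and the "try" functions return false where the real blocking call would wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a read/write latch (hypothetical, not InnoDB code). */
typedef struct {
    unsigned long writer;  /* id of the thread holding the X-latch, 0 if none */
    int x_depth;           /* recursive X-latch depth held by that writer */
    int readers;           /* number of S-latch holders */
} toy_latch_t;

void toy_latch_init(toy_latch_t *l)
{
    l->writer = 0;
    l->x_depth = 0;
    l->readers = 0;
}

/* X-latch attempt: succeeds if the latch is free, or recursively if the
   caller is already the writer. */
bool toy_try_x_lock(toy_latch_t *l, unsigned long self)
{
    assert(self != 0);
    if (l->writer == self) {
        l->x_depth++;
        return true;
    }
    if (l->writer != 0 || l->readers > 0) {
        return false;
    }
    l->writer = self;
    l->x_depth = 1;
    return true;
}

/* S-latch attempt: fails while ANY writer holds the latch, including the
   caller itself. A blocking version would wait here, and if the writer is
   the caller, nobody is left to release the latch. */
bool toy_try_s_lock(toy_latch_t *l, unsigned long self)
{
    assert(self != 0);
    (void) self;
    if (l->writer != 0) {
        return false;
    }
    l->readers++;
    return true;
}

/* Release one level of the X-latch held by the caller. */
void toy_x_unlock(toy_latch_t *l, unsigned long self)
{
    assert(l->writer == self && l->x_depth > 0);
    if (--l->x_depth == 0) {
        l->writer = 0;
    }
}
```

In this model, a thread that takes the X-latch and then calls toy_try_s_lock on the same latch gets false, which corresponds to the 303-second wait in the semaphore output above. Calling toy_try_x_lock again from the same thread succeeds, which is the "(recursively) x-latch it" route that the comment describes for block->lock.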