InnoDB: Failing assertion: buf_page_in_file(bpage) in file buf0flu.cc line 2674 | abort (sig=6) in buf_pool_get_dirty_pages_count on ALTER TABLE

Bug #1227581 reported by Roel Van de Paar
24
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
Critical
Laurynas Biveinis
5.1
Won't Fix
Low
Unassigned
5.5
Fix Released
High
Laurynas Biveinis
5.6
Fix Released
Critical
Laurynas Biveinis

Bug Description

2013-09-18 19:40:43 11049 [Note] /ssd/Percona-Server-5.6.13-rc60.5-440-debug.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.6.13-rc60.5-debug-log' socket: '/ssd//228188/current1_1/tmp/master.sock' port: 13000 Percona Server with XtraDB (GPL), Release rc60.5, Revision 440-debug
2013-09-18 19:46:31 7fcf40229700 InnoDB: Assertion failure in thread 140528110966528 in file buf0flu.cc line 2674
InnoDB: Failing assertion: buf_page_in_file(bpage)
[...]
Query (7fceed859250): ALTER TABLE `table100_innodb_dynamic_int` ROW_FORMAT = COMPACT
Connection ID (thread ID): 10
Status: NOT_KILLED

Thread 1 (LWP 13807):
+bt
#0 0x000000363720c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000006874ae in handle_fatal_signal (sig=6) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/signal_handler.cc:251
#2 <signal handler called>
#3 0x0000003636e328a5 in raise () from /lib64/libc.so.6
#4 0x0000003636e34085 in abort () from /lib64/libc.so.6
#5 0x0000000000b8ba4a in buf_pool_get_dirty_pages_count (buf_pool=0x34ee028, id=78) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/buf/buf0flu.cc:2674
#6 0x0000000000b91f70 in buf_flush_dirty_pages (buf_pool=0x34ee028, id=78, flush=false, trx=0x0) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/buf/buf0lru.cc:731
#7 0x0000000000b9466d in buf_LRU_remove_pages (trx=0x0, buf_remove=<optimized out>, id=78, buf_pool=0x34ee028) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/buf/buf0lru.cc:911
#8 buf_LRU_flush_or_remove_pages (id=78, buf_remove=<optimized out>, trx=0x0) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/buf/buf0lru.cc:967
#9 0x0000000000bd245b in fil_delete_tablespace (id=78, buf_remove=BUF_REMOVE_FLUSH_NO_WRITE) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/fil/fil0fil.cc:2753
#10 0x0000000000ad2f90 in row_drop_table_for_mysql (name=0x7fce64075f01 "\330", <incomplete sequence \341>, trx=<optimized out>, drop_db=<optimized out>, nonatomic=<optimized out>) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/row/row0mysql.cc:4242
#11 0x0000000000a1632c in ha_innobase::commit_inplace_alter_table (this=0x7fce5403a600, altered_table=0x7fce54048e90, ha_alter_info=0x7fcf40226970, commit=<optimized out>) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/innobase/handler/handler0alter.cc:5735
#12 0x000000000077f2ab in mysql_inplace_alter_table (thd=0x3c8ad90, table_list=0x7fceed859408, table=0x7fce540fe190, altered_table=0x7fce54048e90, ha_alter_info=0x7fcf40226970, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fcf40226670, alter_ctx=0x7fcf40225050) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_table.cc:6546
#13 0x0000000000784e97 in mysql_alter_table (thd=<optimized out>, new_db=0x7fce54048e90 "`\227\004T\316\177", new_name=0x7fce00000000 <Address 0x7fce00000000 out of bounds>, create_info=<optimized out>, table_list=0x7fceed859408, alter_info=0x7fcf40227110, order_num=0, order=0x0, ignore=false) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_table.cc:8399
#14 0x0000000000894957 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x3c8ad90) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_alter.cc:317
#15 0x0000000000721e3b in mysql_execute_command (thd=0x3c8ad90) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_parse.cc:5159
#16 0x0000000000728628 in mysql_parse (thd=0x3c8ad90, rawbuf=0x3c8d150 "P\303\070\001", length=<optimized out>, parser_state=<optimized out>) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_parse.cc:6479
#17 0x000000000072978c in dispatch_command (command=COM_QUERY, thd=0x3c8ad90, packet=0x7fcf40228910 "\220\255\310\003", packet_length=62) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_parse.cc:1355
#18 0x000000000072a987 in do_command (thd=0x3c8ad90) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/sql_parse.cc:1049
#19 0x00000000007d7f57 in threadpool_process_request (thd=0x3c8ad90) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/threadpool_common.cc:312
#20 0x00000000007dac38 in handle_event (connection=0x3c52160) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/threadpool_unix.cc:1466
#21 worker_main (param=0x151d000 <all_groups>) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/sql/threadpool_unix.cc:1519
#22 0x00000000009e033a in pfs_spawn_thread (arg=0x7fce800f1df0) at /bzr/5.6-QA_dbg/Percona-Server-5.6.13-rc60.5/storage/perfschema/pfs.cc:1858
#23 0x0000003637207851 in start_thread () from /lib64/libpthread.so.0
#24 0x0000003636ee890d in clone () from /lib64/libc.so.6

Related branches

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

RQG @ 911 <email address hidden>
PS @ Percona-Server-5.6.13-rc60.5-440-debug.Linux.x86_64 (5.6-QA)

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

Upgrade to qablock: seen in many runs.

tags: added: qablock
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel -

Can you provide me more sets of stacktraces?

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

This is caused by a race condition between one thread performing a tablespace delete and another doing a compressed page flush list relocation. The latter thread will remove a dirty, thus present on the flush list, uncompressed page image from page hash and LRU list, set its status to BUF_BLOCK_REMOVE_HASH. Then the former thread would call buf_pool_get_dirty_pages_count(), find such page there and assert.

A debug-only issue.

Not a bug in Oracle, since they take both flush list and buffer pool mutex in buf_pool_get_dirty_pages_count(), preventing seeing flush list just before the flush list relocation.

Even though buf_pool_get_dirty_pages_count() is not present in 5.5, a cursory glance at 5.5 suggests that the code there needs to be checked to not assume that for every page in the flush list buf_page_in_file() holds. Probably other places in 5.6 too.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

5.5 triage Low because we haven't seen any actual crash like this.

tags: added: bp-split
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

In 5.6, if the LRU list mutex is held in addition to the flush list mutex, no BUF_BLOCK_REMOVE_HASH pages can be seen on the flush list. This is not the case for buf_do_flush_list_batch(), thus one of its subroutines buf_flush_page_and_try_neighbors() may hit the same issue. Likewise in buf_get_latched_pages_number_instance().

Revision history for this message
Roel Van de Paar (roel11) wrote :

Adding diff code path occurence

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Indeed buf_flush_set_hp() called from buf_do_flush_list_batch() may hit the same issue.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Definite improvement in revid 448 which contains fix. Still one seen. Likely diff code path again. Adding occurrence.

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

s/Fix Committed/Confirmed/ temporarily for odd (half-patched) bug status. Feel free to revert to FC on additional patch.

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

I think I have seen a similar crash in Percona Server 5.5.34. The assertion is as follows:

131111 23:07:37 InnoDB: Assertion failure in thread 140023754589952 in
file buf0flu.c line 1608
InnoDB: Failing assertion: buf_page_in_file(bpage)

And the asserting code is inside the function buf_flush_page_and_try_neighbors()

Based on Laurynas' comment #13 the assertion raised from inside the function buf_flush_page_and_try_neighbors() looks similar to me.

Unfortunately the stack traces are not really helpful:
131111 23:07:37 InnoDB: Assertion failure in thread 140023754589952 in
file buf0flu.c line 1608
InnoDB: Failing assertion: buf_page_in_file(bpage)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB:
http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
05:07:37 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this
binary
or one of the libraries it was linked against is corrupt, improperly
built,
or misconfigured. This error can also be caused by malfunctioning
hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=33
max_threads=3002
thread_count=26
connection_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
6602304 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona-server-5.5.34/bin/mysqld(my_print_stacktrace
+0x35)[0x7eb7c5]
/opt/percona-server-5.5.34/bin/mysqld(handle_fatal_signal
+0x3f4)[0x6a5ce4]
/lib64/libpthread.so.0[0x31e000f500]
/lib64/libc.so.6(gsignal+0x35)[0x31df8328e5]
/lib64/libc.so.6(abort+0x175)[0x31df8340c5]
/opt/percona-server-5.5.34/bin/mysqld[0x8953b7]
/opt/percona-server-5.5.34/bin/mysqld[0x895b8e]
/opt/percona-server-5.5.34/bin/mysqld[0x895e30]
/opt/percona-server-5.5.34/bin/mysqld[0x83b67d]
/lib64/libpthread.so.0[0x31e0007851]
/lib64/libc.so.6(clone+0x6d)[0x31df8e894d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find
information
in the manual which will help you identify the cause of the crash.

If this is indeed the same bug, is there any chance for the fix to be backported from 5.6.13

tags: added: i36642
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The 5.6 crash was BUF_BLOCK_REMOVE_HASH pages unexpectedly encountered in some paths. To see if the 5.5 crash is the same issue or something else, a "thread apply all bt" in GDB or a core dump is needed.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

I have reported a separate bug here because this looks different to me:
https://bugs.launchpad.net/percona-server/+bug/1250762

Revision history for this message
Roel Van de Paar (roel11) wrote :

If same, i36642 may increase triage for 5.5

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
tags: added: i37223
Revision history for this message
Alexey Bychko (abychko) wrote :
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There was another crash with the binary provided in the issue 37223.

131216 6:39:18 InnoDB: Assertion failure in thread 140026689693440 in
file buf0buf.ic line 749
InnoDB: Failing assertion: buf_page_in_file(bpage)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:39:18 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=37
max_threads=3002
thread_count=29
connection_count=29
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
6602304 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona-server-5.5.34/bin/mysqld(my_print_stacktrace+0x35)[0x7eb555]
/opt/percona-server-5.5.34/bin/mysqld(handle_fatal_signal+0x3f4)[0x6a5a84]
/lib64/libpthread.so.0[0x31e000f500]
/lib64/libc.so.6(gsignal+0x35)[0x31df8328e5]
/lib64/libc.so.6(abort+0x175)[0x31df8340c5]
/opt/percona-server-5.5.34/bin/mysqld[0x8a6506]
/opt/percona-server-5.5.34/bin/mysqld[0x8a69fe]
/opt/percona-server-5.5.34/bin/mysqld[0x8a6ca0]
/opt/percona-server-5.5.34/bin/mysqld[0x84c3bd]
/lib64/libpthread.so.0[0x31e0007851]
/lib64/libc.so.6(clone+0x6d)[0x31df8e894d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

The assertion is still the same

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

The code path still appears to be similar as shown in the duplicate bug #1250762:

srv_master_thread
buf_flush_list
buf_flush_flush_list_batch
buf_flush_batch
buf_flush_page_and_try_neighbors

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There was another crash after the fix. The full gdb backtrace is attached.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :
Download full text (3.2 KiB)

The symptoms appear to be similar, there is a race condition between a user thread doing BP page relocation and at the same time the master thread attempting to flush the page as part of a bigger batch.

The interesting threads are these:

Thread 1 (Thread 0x7ec739c9a700 (LWP 42350)):
...
#5 0x00000000008a6506 in buf_page_get_space (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/buf0buf.ic:749
#6 buf_flush_page_and_try_neighbors (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1634
#7 0x00000000008a69fe in buf_flush_flush_list_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1772
#8 buf_flush_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1854
#9 0x00000000008a6ca0 in buf_flush_list (min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:2071
#10 0x000000000084c3bd in srv_master_thread (arg=<value optimized out>) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/srv/srv0srv.c:3879

Thread 22 (Thread 0x7ec67baeb700 (LWP 10836)):
...
#6 buf_flush_relocate_on_flush_list (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x00000000008ab020 in buf_LRU_free_block (bpage=0x7ee75cabb450, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034
#8 0x00000000008ac217 in buf_LRU_free_from_unzip_LRU_list (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:981
#9 buf_LRU_search_and_free_block (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1085
#10 0x00000000008ac8ff in buf_LRU_get_free_block (buf_pool=0x2817898) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1326
#11 0x000000000089b74f in buf_page_get_gen (space=1256, zip_size=4096, offset=73393, rw_latch=3, guess=0x0, mode=10, file=0xa6ad00 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/btr/btr0cur.c", line=3615, mt
r=0x7ec67bae41f0) at /m...

Read more...

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

This is from "thread apply all bt" while the previous one was from "thread apply all bt full"

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The reocurrence in comments 29-33 has been split to bug 1269352.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Yet another instance of this is bug 1305364.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-136

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.