Segfault during remove_table_from_cache on slave node

Bug #482813 reported by Alex Yurchenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Fix Released
Medium
Seppo Jaakola

Bug Description

The subject says it all. Happened during sqlgen start. sqlgen was restarted in rapid succession, so there could be some remnants of the previous transactions/sessions on the nodes.

091114 16:43:21 [Warning] wsrep_abort_thd: 2551217040
091114 16:43:21 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=12
max_threads=151
threads_connected=4
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337741 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x93bbd58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x981073bc thread_stack 0x30000
/mnt/galera/mysql/libexec/mysqld(my_print_stacktrace+0x21)[0x849d5a1]
/mnt/galera/mysql/libexec/mysqld(handle_segfault+0x390)[0x81cf03e]
[0xbfffe420]
/mnt/galera/mysql/libexec/mysqld(wsrep_abort_thd+0x33)[0x81d8889]
/mnt/galera/mysql/libexec/mysqld(_Z23remove_table_from_cacheP3THDPKcS2_j+0x264)[0x821f97c]
/mnt/galera/mysql/libexec/mysqld(_Z15lock_table_nameP3THDP10TABLE_LISTb+0xfb)[0x81c8847]
/mnt/galera/mysql/libexec/mysqld(_Z16lock_table_namesP3THDP10TABLE_LIST+0x4e)[0x81c9474]
/mnt/galera/mysql/libexec/mysqld(_Z28lock_table_names_exclusivelyP3THDP10TABLE_LIST+0x19)[0x81c9579]
/mnt/galera/mysql/libexec/mysqld(_Z20mysql_rm_table_part2P3THDP10TABLE_LISTbbbb+0x64a)[0x82e4420]
/mnt/galera/mysql/libexec/mysqld(_Z14mysql_rm_tableP3THDP10TABLE_LISTcc+0x68)[0x82e4a2a]
/mnt/galera/mysql/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x1f8f)[0x81e07b1]
/mnt/galera/mysql/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x20a)[0x81ea170]
/mnt/galera/mysql/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc06)[0x81eb1c6]
/mnt/galera/mysql/libexec/mysqld[0x81ec3c5]
/mnt/galera/mysql/libexec/mysqld(_Z17wsrep_bf_apply_cbPvP16wsrep_apply_data+0x42)[0x81ec4a0]
/mnt/galera/galera/lib/libmmgalera.so[0x974b0218]
/mnt/galera/galera/lib/libmmgalera.so[0x974b6a6e]
/mnt/galera/mysql/libexec/mysqld(_Z25wsrep_replication_processP3THD+0xd9)[0x81dcdfb]
/mnt/galera/mysql/libexec/mysqld(start_wsrep_THD+0x1d9)[0x81d39c5]
/lib/libpthread.so.0[0xb7e382db]
/lib/libc.so.6(clone+0x5e)[0xb7d4012e]

Revision history for this message
Alex Yurchenko (ayurchen) wrote :
Download full text (3.5 KiB)

Remaining nodes had the following in the log:

node1:
091114 16:45:24 [ERROR] Slave SQL: Error 'Table 'test.comm01' doesn't exist' on opening tables, Error_code: 1146
091114 16:45:24 [Warning] RBR event 2 apply warning: 1146
091114 16:45:24 [ERROR] mm_galera.c:apply_write_set():657: apply failed for: 7, in level: 4
091114 16:45:24 [Warning] mm_galera.c:process_query_write_set_applying():802: ws apply failed, rcode: 7, seqno: 8909824, last_seen: 8909773
091114 16:45:24 [Warning] mm_galera.c:process_query_write_set_applying():830: ws applying is not possible, 8909824 - 1454417
091114 16:45:24 [ERROR] mm_galera.c:process_query_write_set():953: could not apply trx: 8909824 1454417
091114 16:45:24 [Note] wsrep recv thread exiting (code:0)
091114 16:45:24 [Note] wsrep starting shutdown
091114 16:45:24 [Note] /mnt/galera/mysql/libexec/mysqld: Normal shutdown

node2:
091114 16:43:13 [ERROR] Slave SQL: Error 'Table 'test.comm01' doesn't exist' on opening tables, Error_code: 1146
091114 16:43:13 [Warning] RBR event 2 apply warning: 1146
091114 16:43:13 [ERROR] mm_galera.c:apply_write_set():657: apply failed for: 7, in level: 4
091114 16:43:13 [Warning] mm_galera.c:process_query_write_set_applying():802: ws apply failed, rcode: 7, seqno: 8909824, last_seen: 8909773
091114 16:43:13 [Warning] mm_galera.c:process_query_write_set_applying():830: ws applying is not possible, 8909824 - 1454416
091114 16:43:13 [ERROR] mm_galera.c:process_query_write_set():953: could not apply trx: 8909824 1454416
091114 16:43:13 [Note] wsrep recv thread exiting (code:0)
091114 16:43:13 [Note] wsrep starting shutdown
091114 16:43:13 [Note] /mnt/galera/mysql/libexec/mysqld: Normal shutdown

node3:
091114 16:43:13 [ERROR] Slave SQL: Error 'Table 'test.comm01' doesn't exist' on opening tables, Error_code: 1146
091114 16:43:13 [Warning] RBR event 2 apply warning: 1146
091114 16:43:13 [ERROR] mm_galera.c:apply_write_set():657: apply failed for: 7, in level: 4
091114 16:43:13 [Warning] mm_galera.c:process_query_write_set_applying():802: ws apply failed, rcode: 7, seqno: 8909824, last_seen: 8909773
091114 16:43:13 [Warning] mm_galera.c:process_query_write_set_applying():830: ws applying is not possible, 8909824 - 1454415
091114 16:43:13 [ERROR] mm_galera.c:process_query_write_set():953: could not apply trx: 8909824 1454415
091114 16:43:13 [Note] wsrep recv thread exiting (code:0)
091114 16:43:13 [Note] wsrep starting shutdown
Cascading view changes, dropping state message
091114 16:43:13 [Note] gcs_group.c:gcs_group_handle_comp_msg():266: New COMPONENT: primary = yes, my_idx = 1, memb_num = 2
091114 16:43:13 [Note] gcs_core.c:core_handle_comp_msg():620: STATE EXCHANGE: Waiting for state UUID.
091114 16:43:13 [Note] gcs_core.c:core_handle_uuid_msg():697: STATE EXCHANGE: sent state msg: b5f194ce-d166-11de-0800-37da9a464a48
091114 16:43:13 [Note] gcs_group.c:gcs_group_handle_state_msg():422: STATE EXCHANGE: got state msg: b5f194ce-d166-11de-0800-37da9a464a48 from 0 (ip-10-226-130-241)
091114 16:43:13 [Note] gcs_group.c:gcs_group_handle_state_msg():422: STATE EXCHANGE: got state msg: b5f194ce-d166-11de-0800-37da9a464a48 from 1 (ip-10-226-103-194)
091114 16:43:13 [Note] gcs_group.c:gcs_gro...

Read more...

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
Changed in codership-mysql:
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

Test case is to run sqlgen load against a single node MySQL/Galera
When sqlgen is running, open mysql connection to the node, and issue some DDL
queries, like:

mysql> alter table comm00 add column (i int);
mysql> alter table comm00 drop column i;

Sooner or later, the node will segfault, here is the stack trace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xaf4a0b90 (LWP 6227)]
0x08391c47 in ha_wsrep_abort_transaction (bf_thd=0x0, victim_thd=0xa137c18)
    at handler.cc:4714
4714 for (info= bf_thd->transaction.stmt.ha_list; info; info= info->next())
(gdb) bt
#0 0x08391c47 in ha_wsrep_abort_transaction (bf_thd=0x0, victim_thd=0xa137c18)
    at handler.cc:4714
#1 0x0825a1ff in wsrep_abort_thd (bf_thd_ptr=0x0, victim_thd_ptr=0xa137c18)
    at sql_parse.cc:8412
#2 0x082ac7d3 in remove_table_from_cache (thd=0xa137c18, db=0x9f7a9b8 "test",
    table_name=0x9f7a9bd "comm00", flags=2) at sql_base.cc:8492
#3 0x083b0f81 in wait_while_table_is_used (thd=0xa137c18, table=0x9faabc8,
    function=HA_EXTRA_PREPARE_FOR_RENAME) at sql_table.cc:4207
#4 0x083bd56d in mysql_alter_table (thd=0xa137c18, new_db=0xa1a8998 "test",
    new_name=0xa1a8798 "comm00", create_info=0xaf49f240, table_list=0xa1a87c0,
    alter_info=0xaf49f530, order_num=0, order=0x0, ignore=false)
    at sql_table.cc:7304
#5 0x08264529 in mysql_execute_command (thd=0xa137c18) at sql_parse.cc:3196
#6 0x0826e1d4 in mysql_parse (thd=0xa137c18,
    inBuf=0xa1a8700 "alter table comm00 add column (i int)", length=37,
    found_semicolon=0xaf4a0030) at sql_parse.cc:6395
#7 0x0826fc20 in dispatch_command (command=COM_QUERY, thd=0xa137c18,
    packet=0xa16c661 "alter table comm00 add column (i int)", packet_length=37)
    at sql_parse.cc:1381
#8 0x08271d8e in do_command (thd=0xa137c18) at sql_parse.cc:964
#9 0x082593a0 in handle_one_connection (arg=0xa137c18) at sql_connect.cc:1165
#10 0xb7e7850f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#11 0xb7d69a0e in clone () from /lib/tls/i686/cmov/libc.so.6

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

The implementation for converting locking session to transaction interfered with TO isolation of DDL queries.
This was fixed in change set 2920. The fix required to carry flag of locking session in thr lock info.

Here is a simple script to cause conflicting DDL for sqlgen load generator

for i in `seq 1000`; do
    echo "round: $i";
    mysql -uroot -prootpass -hrotweiler test -e"alter table comm00 add column (i int);";
    mysql -uroot -prootpass -hrotweiler test -e"alter table comm00 drop column i ;" ;
done

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

The orginal problem does not happen anymore. However, when testing with high conflict rate DML load and simultaneous DDL load, it can happen, that delegate node crashes for not being able to apply "replaying" write set.

Will close this bug and open new DDL/DML concurrency bug, with deadline for 0.7.1 release.

Changed in codership-mysql:
status: In Progress → Fix Committed
milestone: none → 0.7
Changed in codership-mysql:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.