handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731 | Writing larger than 4GB bitmap file fails; bitmap write error causes heap corruption

Bug #1111226 reported by Roel Van de Paar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Laurynas Biveinis
5.1
Fix Released
High
Laurynas Biveinis
5.5
Fix Released
High
Laurynas Biveinis

Bug Description

========================= gdb :
Thread 10 (LWP 23954):
+bt
#0 0x0000000004e3969c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000007e2779 in my_write_core (sig=11) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/mysys/stacktrace.c:433
#2 0x00000000006ab0ea in handle_fatal_signal (sig=11) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/sql/signal_handler.cc:249
#3 <signal handler called>
#4 rbt_free_node (node=0x0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#5 0x00000000009935e9 in rbt_free_node (node=0x1040f1e0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#6 0x00000000009935e9 in rbt_free_node (node=0x1017d7d0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#7 0x0000000000993693 in rbt_free (tree=0x1017d760) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:746
#8 0x000000000094e1ad in log_online_read_shutdown () at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/log/log0online.c:743
#9 0x0000000000860a76 in srv_redo_log_follow_thread (arg=<optimized out>) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/srv/srv0srv.c:3093
#10 0x0000000004e34851 in start_thread () from /lib64/libpthread.so.0
#11 0x0000000005bfa11d in clone () from /lib64/libc.so.6

========================= Error log:
130126 13:58:08 InnoDB: Completed initialization of buffer pool
130126 13:58:08 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
130126 13:58:08 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: last tracked LSN in './ib_modified_log_2_8204.xdb' is 8204, but the last checkpoint LSN is 1597945. This might be due to a server crash or a very fast shutdown. Reading the log to advance the last tracked LSN.
InnoDB: continuing tracking changed pages from LSN 1597945
130126 13:58:12 InnoDB: Waiting for the background threads to start
130126 13:58:13 Percona XtraDB (http://www.percona.com) 1.1.8-rel29.3 started; log sequence number 1597945
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-skip-ndbcluster'
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-debug-assert-if-crashed-table'
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-debug-assert-on-error'
130126 13:58:14 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13080
130126 13:58:14 [Note] - '0.0.0.0' resolves to '0.0.0.0';
130126 13:58:14 [Note] Server socket created on IP: '0.0.0.0'.
130126 13:58:14 [Warning] 'user' entry '<email address hidden>' ignored in --skip-name-resolve mode.
130126 13:58:14 [Warning] 'proxies_priv' entry '@ <email address hidden>' ignored in --skip-name-resolve mode.
130126 13:58:15 [Note] Event Scheduler: Loaded 0 events
130126 13:58:15 [Note] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.29-rel29.3-debug-log' socket: '/ssd/210222/current1_5/tmp/master.sock' port: 13080 Percona Server with XtraDB (GPL), Release rel29.3, Revision 416-debug
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
130126 13:58:20 InnoDB: Error: Write to file ./ib_modified_log_2_8204.xdb failed at offset 35184372088832 8192.
InnoDB: 4096 bytes should have been written, only 0 were written.
InnoDB: Operating system error number 22.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
130126 13:58:20 InnoDB: Operating system error number 22 in a file operation.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
InnoDB: Error: failed writing changed page bitmap file './ib_modified_log_2_8204.xdb'
InnoDB: Error: log tracking bitmap write failed, stopping log tracking thread!
==23942== Thread 12:
==23942== Invalid read of size 8
==23942== at 0x9935E0: rbt_free_node (ut0rbt.c:731)
==23942== by 0x9935E8: rbt_free_node (ut0rbt.c:731)
==23942== by 0x9935E8: rbt_free_node (ut0rbt.c:731)
==23942== by 0x993692: rbt_free (ut0rbt.c:746)
==23942== by 0x94E1AC: log_online_read_shutdown (log0online.c:743)
==23942== by 0x860A75: srv_redo_log_follow_thread (srv0srv.c:3093)
==23942== by 0x4E34850: start_thread (in /lib64/libpthread-2.12.so)
==23942== by 0x179C76FF: ???
==23942== Address 0x8 is not stack'd, malloc'd or (recently) free'd
==23942==
10:58:20 UTC - mysqld got signal 11 ;

========================= Run details:
[roel@macpro2 210222]$ cat cmd113
ps -ef | grep 'cmdrun_113' | grep -v grep | awk '{print $2}' | xargs sudo kill -9
rm -Rf /ssd/210222/cmdrun_113
mkdir /ssd/210222/cmdrun_113
cd /ssd/randgen
bash -c "set -o pipefail; perl runall.pl --queries=100000000 --seed=5950 --duration=180 --querytimeout=60 --short_column_names --sqltrace --reporter=Shutdown,Backtrace,QueryTimeout,ErrorLog,ErrorLogAlarm --mysqld=--log-output=none --mysqld=--sql_mode=ONLY_FULL_GROUP_BY --grammar=conf/percona_qa/percona_qa.yy --gendata=conf/percona_qa/percona_qa.zz --basedir=/ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64 --valgrind --reporter=ValgrindErrors --validator=MarkErrorLog --threads=25 --views --notnull --mysqld=--innodb_track_changed_pages=1 --mysqld=--innodb_max_bitmap_file_size=1048576 --mysqld=--innodb_log_file_size=1048576 --mysqld=--innodb_log_files_in_group=2 --mysqld=--innodb_log_buffer_size=1048576 --mysqld=--innodb_log_block_size=512 --mysqld=--innodb_fast_shutdown=2 --mysqld=--innodb_adaptive_flushing_method=native --mysqld=--innodb_use_global_flush_log_at_trx_commit=0 --mysqld=--userstat --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--innodb_file_per_table=1 --mtr-build-thread=733 --mask=31450 --vardir1=/ssd/210222/cmdrun_113 > /ssd/210222/cmdrun113.log 2>&1"

========================= Notes:
Note the initial crash recovery. No idea why that is happening, but I suspect it has something to do with the startup options.

Tags: bitmap qa xtradb

Related branches

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
description: updated
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The first issue here is a a spectacular typo that causes the write to fail:

130126 13:58:20 InnoDB: Error: Write to file ./ib_modified_log_2_8204.xdb failed at offset 35184372088832 8192.

 success = os_file_write(log_bmp_sys->out.name, log_bmp_sys->out.file,
    block,
    (ulint)(log_bmp_sys->out.offset & 0xFFFFFFFF),
    (ulint)(log_bmp_sys->out.offset << 32),
    MODIFIED_PAGE_BLOCK_SIZE);

s/<</>> !

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The second issue is that a write error for a non-first bitmap page will result in the bitmap tree with some of its nodes being pointed to from the node freelist, causing a double freeing of them on the thread cleanup.

summary: - handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731
+ Writing larger than 4GB bitmap file fails; bitmap write error causes
+ heap corruption
summary: Writing larger than 4GB bitmap file fails; bitmap write error causes
- heap corruption
+ heap corruption | Was: handle_fatal_signal (sig=11) in rbt_free_node at
+ ut0rbt.c:731
summary: - Writing larger than 4GB bitmap file fails; bitmap write error causes
- heap corruption | Was: handle_fatal_signal (sig=11) in rbt_free_node at
- ut0rbt.c:731
+ handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731 | Writing
+ larger than 4GB bitmap file fails; bitmap write error causes heap
+ corruption
Revision history for this message
Roel Van de Paar (roel11) wrote :

Changed title for SEO (google bug search) purposes

tags: added: xtradb
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-628

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.