Xtrabackup 2.0.7 or 2.1.2 failed: expected log block no. 5458041, but got no. 1070811265 from the log file

Bug #1206309 reported by Nickolay Ihalainen
34
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB
Fix Released
High
Alexey Kopytov
2.0
Fix Released
High
Alexey Kopytov
2.1
Fix Released
High
Alexey Kopytov
2.2
Fix Released
High
Alexey Kopytov

Bug Description

Affected versions:
percona-xtrabackup-2.1.2-611.rhel5
percona-xtrabackup-2.0.7-552.rhel5

xtrabackup version 2.0.7 for Percona Server 5.1.59 unknown-linux-gnu (x86_64) (revision id: 552)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /database/mysql
xtrabackup: Target instance is assumed as followings.
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 2147483648
xtrabackup: using O_DIRECT
InnoDB: Warning: allocated tablespace 424, old maximum was 9 ...

xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 5458041, but got no. 1070811265 from the log file.
xtrabackup: Error: xtrabackup_copy_logfile() failed.
innobackupex: Error: ibbackup child process has died at /usr/bin/innobackupex line 386.

The error emitted after 179m45.411s since backup start.

Before backup start transaction log information:
Log sequence number 186856561401671
Log flushed up to 186856559716066
Last checkpoint at 186854266314745
Max checkpoint age 3476324967
Checkpoint age target 3367689812
Modified age 2265855030
Checkpoint age 2295086926
0 pending log writes, 0 pending chkp writes
1019036192 log i/o's done, 220.02 log i/o's/second

Right after xtrabackup crash:
Log sequence number 186919780065548
Log flushed up to 186919776998233
Last checkpoint at 186917463665464
Max checkpoint age 3476324967
Checkpoint age target 3367689812
Modified age 2313548226
Checkpoint age 2316400084
0 pending log writes, 0 pending chkp writes
1020869843 log i/o's done, 88.32 log i/o's/second

There are several errors from previous runs:
 xtrabackup: error: expected log block no. 3388077, but got no. 1068741301 from the log file.

>> log scanned up to (183619734185080)
>> log scanned up to (183619741409546)
xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 2551941, but got no. 1067905165 from the log file.

From previous month innodb recovery information:
130618 23:46:21 Percona XtraDB (http://www.percona.com) 1.0.15-12.7 started; log sequence number 166828151398485

As you can see block no. diff is always 1065353224 compare to much smaller LSN change:
(186919780065548-186856561401671)/1024/1024/1024 = 58G

Tags: i33538

Related branches

description: updated
summary: - Xtrabackup 2.0.7 failed: expected log block no. 5458041, but got no.
- 1070811265 from the log file
+ Xtrabackup 2.0.7 or 2.1.2 failed: expected log block no. 5458041, but
+ got no. 1070811265 from the log file
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Please provide the InnoDB log configuration from the server.

Changed in percona-xtrabackup:
status: New → Incomplete
Revision history for this message
Nickolay Ihalainen (ihanick) wrote :

There are innodb settings from my.cnf:
innodb_file_per_table=1
innodb_max_purge_lag=100000
innodb_buffer_pool_size=120GB
innodb_log_buffer_size=80M
innodb_log_file_size=2GB
innodb_open_files=1000
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=32
innodb_autoinc_lock_mode=0
innodb_file_format=Barracuda
innodb_io_capacity=400

And all innodb related settings from show variables:
have_innodb YES
ignore_builtin_innodb OFF
innodb_adaptive_checkpoint estimate
innodb_adaptive_flushing OFF
innodb_adaptive_hash_index ON
innodb_additional_mem_pool_size 8388608
innodb_auto_lru_dump 0
innodb_autoextend_increment 8
innodb_autoinc_lock_mode 0
innodb_buffer_pool_shm_checksum ON
innodb_buffer_pool_shm_key 0
innodb_buffer_pool_size 128849018880
innodb_change_buffering inserts
innodb_checkpoint_age_target 0
innodb_checksums ON
innodb_commit_concurrency 0
innodb_concurrency_tickets 500
innodb_data_file_path ibdata1:10M:autoextend
innodb_data_home_dir
innodb_dict_size_limit 0
innodb_doublewrite ON
innodb_doublewrite_file
innodb_enable_unsafe_group_commit 0
innodb_expand_import 0
innodb_extra_rsegments 0
innodb_extra_undoslots OFF
innodb_fast_checksum OFF
innodb_fast_recovery OFF
innodb_fast_shutdown 1
innodb_file_format Barracuda
innodb_file_format_check Barracuda
innodb_file_per_table ON
innodb_flush_log_at_trx_commit 2
innodb_flush_log_at_trx_commit_session 3
innodb_flush_method O_DIRECT
innodb_flush_neighbor_pages 1
innodb_force_recovery 0
innodb_ibuf_accel_rate 100
innodb_ibuf_active_contract 1
innodb_ibuf_max_size 64424493056
innodb_io_capacity 400
innodb_lazy_drop_table 0
innodb_lock_wait_timeout 50
innodb_locks_unsafe_for_binlog OFF
innodb_log_block_size 512
innodb_log_buffer_size 83886080
innodb_log_file_size 2147483648
innodb_log_files_in_group 2
innodb_log_group_home_dir ./
innodb_max_dirty_pages_pct 75
innodb_max_purge_lag 100000
innodb_mirrored_log_groups 1
innodb_old_blocks_pct 37
innodb_old_blocks_time 0
innodb_open_files 1000
innodb_overwrite_relay_log_info OFF
innodb_page_size 16384
innodb_pass_corrupt_table 0
innodb_read_ahead linear
innodb_read_ahead_threshold 56
innodb_read_io_threads 4
innodb_recovery_stats OFF
innodb_replication_delay 0
innodb_rollback_on_timeout OFF
innodb_show_locks_held 10
innodb_show_verbose_locks 0
innodb_spin_wait_delay 6
innodb_stats_auto_update 1
innodb_stats_method nulls_equal
innodb_stats_on_metadata ON
innodb_stats_sample_pages 8
innodb_stats_update_need_lock 1
innodb_strict_mode OFF
innodb_support_xa ON
innodb_sync_spin_loops 30
innodb_table_locks ON
innodb_thread_concurrency 32
innodb_thread_concurrency_timer_based OFF
innodb_thread_sleep_delay 10000
innodb_use_purge_thread 1
innodb_use_sys_malloc ON
innodb_use_sys_stats_table OFF
innodb_version 1.0.15-12.7
innodb_write_io_threads 4

tags: added: i33538
Changed in percona-xtrabackup:
status: Incomplete → New
Revision history for this message
Alexey Kopytov (akopytov) wrote :

The problem is a combination of 2 things:

1. InnoDB log block numbers wrap around at 0x40000000. So the next log block after 1073741823 has number 0, rather than 1073741824.
2. XtraBackup uses the following logic to detect "old" blocks (i.e. the current log tail after which it has to stop scanning): if the current log block number is less than the current scanned log block number, it is an old one, stop scanning.

#2 doesn't work in cases when log block numbers wrap around. I.e. we might be looking at an old block with a number before they had wrapped around, but the current "expected" number has already wrapped around.

Revision history for this message
Nickolay Ihalainen (ihanick) wrote :

The bug is a regression, it's possible to make a backup with 2.0.6.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

No, it's not a regression. 2 conditions must be met in order to hit this bug:

1. The log block numbers must wrap around, which only happens once per 1 GB of log writes
2. The wrap-around point must be between the last checkpoint and the current log tail at the time the backup starts.

Which makes it not that easy to hit. Even when you hit it, it may disappear later once the server moves the last checkpoint past the numbers overwrap point.

Revision history for this message
Alexey Kopytov (akopytov) wrote :

The fix for this bug was incorrect due to a couple of unfortunate omissions. Will submit a followup fix.

Revision history for this message
egezon (egezon-berisha) wrote :

This seem to be the issue on 2.2.3 as well. Has anybody else experieince the same thing in version 2.2.3.

Please find the error below:

  >> log scanned up to (40958533803520)
     >> log scanned up to (40958533803520)
     >> log scanned up to (40958533803520)
     >> log scanned up to (40958533803520)
     >> log scanned up to (40958533803520)
     xtrabackup: error: log block numbers mismatch:
     xtrabackup: error: expected log block no. 540241360, but got no. 544435660 from the log file.
     xtrabackup: Error: xtrabackup_copy_logfile() failed.
     innobackupex: Error: The xtrabackup child process has died at /usr/bin/innobackupex line 2672.

2014-08-29 17:46:43 -0500 [PID-12657:ERROR] Trace: #0 /home/xbm/xtrabackup-manager/includes/genericBackupTaker.class.php(633): genericBackupTaker->takeBackup(Object(backupJob), Object(backupSnapshotGroup), Object(SnapshotType), Object(SnapshotMethod), Array, '5072')
     #1 /home/xbm/xtrabackup-manager/includes/rotatingBackupTaker.class.php(273): genericBackupTaker->takeIncrementalBackupSnapshot(Object(backupJob), Object(backupSnapshotGroup), Object(backupSnapshot))
     #2 /home/xbm/xtrabackup-manager/includes/backupSnapshotTaker.class.php(220): rotatingBackupTaker->takeScheduledBackupSnapshot(Object(backupJob))
     #3 /home/xbm/xtrabackup-manager/includes/cliHandler.class.php(1008): backupSnapshotTaker->takeScheduledBackupSnapshot(Object(scheduledBackup))
     #4 /home/xbm/xtrabackup-manager/includes/cliHandler.class.php(145): cliHandler->handleBackupActions(Array, Object(Args))
     #5 /home/xbm/xtrabackup-manager/xbm(36): cliHandler->handleArguments(Array)
     #6 {main}

Revision history for this message
egezon (egezon-berisha) wrote :

Ignore the extra info in the comment above please.

Here is the tail:

>> log scanned up to (40958533803520)
>> log scanned up to (40958533803520)
>> log scanned up to (40958533803520)
>> log scanned up to (40958533803520)
xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 540241360, but got no. 544435660 from the log file.
xtrabackup: Error: xtrabackup_copy_logfile() failed.
innobackupex: Error: The xtrabackup child process has died at /usr/bin/innobackupex line 2672.

Revision history for this message
Wagner Bianchi (wagnerbianchijr) wrote :

Hey Guys, I hit the same bug streaming a backup between hosts. This is a 2TB dataset while I have the following configurations:

show engine innodb status\G select sleep(60); show engine innodb status\G
Log sequence number 170922005531445
1 row in set (0.00 sec)

1 row in set (1 min 0.00 sec)

Log sequence number 170922297590865
1 row in set (0.00 sec)

select round((170922297590865-170922005531445) / 1024 / 1024) as MB_per_minute;
+---------------+
| MB_per_minute |
+---------------+
| 279 |
+---------------+
1 row in set (0.00 sec)

select round(((170922297590865-170922005531445) / 1024 / 1024)*60) as MB_per_minute_per_hour;
+------------------------+
| MB_per_minute_per_hour |
+------------------------+
| 16712 |
+------------------------+
1 row in set (0.00 sec)

So, I have a little bit more than a file per hour, considering the below configuration.

select @@innodb_log_files_in_group, (@@innodb_log_file_size/1024/1024/1024) innodb_log_file_size, @@innodb_log_group_home_dir\G
*************************** 1. row ***************************
@@innodb_log_files_in_group: 2
innodb_log_file_size: 2.000000000000
@@innodb_log_group_home_dir: ./
1 row in set (0.00 sec)

So, the error on the xtrabackup output logs is:

xtrabackup: warning: Log block checksum mismatch (block no 688905413 at lsn 170777021876224):
expected 502359217, calculated checksum 192256968
xtrabackup: warning: this is possible when the log block has not been fully written by the server, will retry later.
160204 15:16:38 >> log scanned up to (170777021876224)
160204 15:16:39 >> log scanned up to (170777025428745)
160204 15:16:40 >> log scanned up to (170777028525963)

Why does that mismatch occurs?

xtrabackup: warning: Log block checksum mismatch (block no 688905413 at lsn 170777021876224):
expected 502359217, calculated checksum 192256968

This answer is not that completely clear to me yet:
https://bugs.launchpad.net/percona-xtrabackup/+bug/1206309/comments/3

Any additional advice here?

Revision history for this message
Wagner Bianchi (wagnerbianchijr) wrote :

Version in use:

xtrabackup version 2.3.3 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 525ca7d)

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Hi Wagner,

Initial bug report was about log block mismatch. From the output you provided I see only warning messages about log checksum mismatch which is not critical. It can be that block isn't fully written at the moment xtrabackup is reading it. Normally xtrabackup will retry starting with the block.
Does it answer your question? Do you see backup failures?

Revision history for this message
Wagner Bianchi (wagnerbianchijr) wrote :

Sure Sergei. I was suspecting about the hard drive speed or slowness as well. Thanks a lot man, cheers!!

Revision history for this message
Scott Solmonson (scott.solmonson) wrote :

Hi, we've run in to this bug as well.
The latest relevant comment is:

Alexey Kopytov (akopytov) wrote on 2013-08-10: #6
The fix for this bug was incorrect due to a couple of unfortunate omissions. Will submit a followup fix.

Please confirm if this was ever fixed, and in which version it was?
I see the status as "Fix Released" for all versions, but without diffs I can't tell if this is the final "good" fix or not, and it's not clear which versions contain the "fix".

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXB-382

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.