lp:~akopytov/percona-xtrabackup/bug1079700-2.0

Created by Alexey Kopytov on 2013-05-06 and last modified on 2013-05-07
Get this branch:
bzr branch lp:~akopytov/percona-xtrabackup/bug1079700-2.0

Branch information

Recent revisions

556. By Alexey Kopytov on 2013-05-07

Bug #1079700: Issues with renaming/rotating tables during the backup
              stage

The problem was in the way XtraBackup handled DDL on individual
tablespaces on the backup stage. It first created a list of tablespaces
to copy which is represented by fil_system, but when starting the actual
tablespace copy operation later, it opened tablespace files by name. If
a tablespace file could not be opened, XtraBackup assumed the tablespace
got removed after the fil_system list was created and ignored the
tablespace. This naturally only worked for cases when the tablespace
got dropped. A renamed tablespace would be missing in the resulting
backup, and tablespace rotations resulted in a missing tablespace and a
duplicate copy of another tablespace participating in rotation.

The idea of the fix is to make sure that once a tablespace is added to
fil_system, the underlying file is copied to backup with the same name
and space ID, regardless of what operations have been performed on the
tablespace/file during the backup procedure.

The only way to achieve that is to reuse file handles created when
opening tablespaces and adding them to fil_system, i.e. never attempt to
access tablespaces by name and rely on the fact that an open file handle
will always point to the same inode, even if the file is unlink()ed or
rename()d along the way. In other words, exclude the case when we are
going to copy a tablespace, but the file no longer exists.
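
The property the fix relies on can be checked with a small standalone
POSIX program. This is only a sketch of the general file handle
behaviour, not XtraBackup code: the function name, the temporary path
and the payload are hypothetical and chosen for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of XtraBackup or InnoDB.
   Returns 1 if data can still be read through a descriptor after the
   file has been rename()d and then unlink()ed, 0 if not, and -1 if the
   test file could not be set up. */
int handle_survives_rename_and_unlink(void)
{
    char old_name[] = "/tmp/xb_sketch_XXXXXX"; /* illustrative path */
    char new_name[64];
    const char payload[] = "tablespace page";
    char buf[sizeof(payload)];
    ssize_t n;
    int by_name;
    int fd;

    /* Create and open the file; this descriptor plays the role of the
       handle kept open for a tablespace. */
    fd = mkstemp(old_name);
    if (fd < 0) {
        return -1;
    }

    if (write(fd, payload, sizeof(payload)) != (ssize_t) sizeof(payload)) {
        close(fd);
        unlink(old_name);
        return -1;
    }

    /* Simulate DDL renaming the file while the "backup" is running. */
    snprintf(new_name, sizeof(new_name), "%s.renamed", old_name);
    if (rename(old_name, new_name) != 0) {
        close(fd);
        unlink(old_name);
        return -1;
    }

    /* Opening by the old name is expected to fail now. */
    by_name = open(old_name, O_RDONLY);
    if (by_name >= 0) {
        close(by_name);
        close(fd);
        unlink(new_name);
        return 0;
    }

    /* Simulate a subsequent DROP: remove the last directory entry. */
    if (unlink(new_name) != 0) {
        close(fd);
        return -1;
    }

    /* The already open descriptor still refers to the same inode. */
    n = pread(fd, buf, sizeof(buf), 0);
    close(fd);

    return n == (ssize_t) sizeof(payload)
        && memcmp(buf, payload, sizeof(payload)) == 0;
}
```

On a POSIX filesystem with a writable /tmp the function is expected to
return 1: the open by name fails after the rename, while the read
through the original descriptor still returns the data.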

This requires changes to fil_load_single_table_tablespace() to not close
the file handle after adding a space/node to fil_system, but keep the
node open and assign the handle to node->handle. We could use
fil_node_open_file() for that, but that function creates another handle
and opens the tablespace file by name, which would still leave some room
for a race condition during the time when fil_system is populated. This
also requires XtraBackup to close the node correctly, mainly to comply
with various invariants enforced by fil_validate() in debug builds. This
part is implemented in XtraBackup in xb_fil_node_close_file().

We also want to reuse file handles in fil_load_single_table_tablespace()
only at the backup stage. Historically, XtraBackup patches used
recv_recovery_on to detect whether we are currently in the 'backup' or
'recovery' mode. That doesn't always work reliably. For example, we may
initialize fil_system before we start recovery to apply an incremental
backup. So recovery is still not started, but we are not in the backup
mode either. To circumvent that, the patch introduces another global
variable srv_backup_mode which is set by XtraBackup only in
xtrabackup_backup_func(). This patch also changes remote tablespaces
support in innodb56.patch to use srv_backup_mode instead of
recv_recovery_on.

Another change necessitated by the patch is ignoring deleted tablespaces
on recovery, which has been implemented in XB patches to
recv_apply_hashed_log_recs(). After MLOG_FILE_DELETE is replayed on
recovery and the underlying file is deleted, possible updates to the
same tablespace done before deleting the tablespace are left in the redo
log. They will be ignored by the recovery code. This can never cause any
problems for the server, because file creation/removal is done
immediately, so on recovery InnoDB detects missing tablespace files and
does not add the corresponding log records to the hash table (see the
first lines in recv_add_to_hash_table()). The situation is different with XtraBackup: the
underlying file will be present when recovery starts, but then get
deleted when MLOG_FILE_DELETE is replayed, i.e. after all log records
from the current batch have been added to the hash table. As a result,
log records corresponding to the deleted tablespace are ignored, but are
still left in the hash table, so recv_apply_hashed_log_recs() hangs
forever waiting for recv_sys->n_addrs to become zero. This was also
possible with XtraBackup before this fix, i.e. when a tablespace is
removed after it has been copied by XtraBackup. This fix just makes this
condition more likely to occur, as we always copy all tablespaces rather
than ignore those removed before the copy operation is started.

To fix the above case, we also check for deleted tablespaces in
recv_apply_hashed_log_recs(). Log records corresponding to previously
deleted tablespaces are marked as processed and recv_sys->n_addrs is
decremented accordingly, so that no spurious unprocessed log records are
left in the hash table.
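
The bookkeeping described above can be sketched with a simplified,
self-contained model. All types and names below (rec_state,
log_rec_addr, recv_state, discard_recs_for_deleted_spaces) are
hypothetical stand-ins, not the actual InnoDB structures or the code in
the patch. The sketch only shows why the counter has to be decremented
for every record that is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the recovery hash table. */
enum rec_state { REC_NOT_PROCESSED, REC_PROCESSED };

struct log_rec_addr {
    unsigned long  space_id; /* tablespace the records belong to */
    enum rec_state state;
};

struct recv_state {
    struct log_rec_addr *addrs;
    size_t               n_total; /* entries in addrs */
    size_t               n_addrs; /* entries still unprocessed */
};

static int space_is_deleted(const unsigned long *deleted, size_t n_deleted,
                            unsigned long space_id)
{
    size_t i;

    for (i = 0; i < n_deleted; i++) {
        if (deleted[i] == space_id) {
            return 1;
        }
    }
    return 0;
}

/* Marks entries that belong to deleted tablespaces as processed and
   decrements the unprocessed counter for each of them, so that a loop
   waiting for n_addrs == 0 can terminate. Returns the number of
   entries discarded. */
size_t discard_recs_for_deleted_spaces(struct recv_state *rs,
                                       const unsigned long *deleted,
                                       size_t n_deleted)
{
    size_t discarded = 0;
    size_t i;

    assert(rs != NULL);

    for (i = 0; i < rs->n_total; i++) {
        struct log_rec_addr *addr = &rs->addrs[i];

        if (addr->state == REC_NOT_PROCESSED
            && space_is_deleted(deleted, n_deleted, addr->space_id)) {
            addr->state = REC_PROCESSED;
            rs->n_addrs--;
            discarded++;
        }
    }
    return discarded;
}
```

Without the decrement, entries for the deleted tablespace would be
skipped but still counted, which is the hang described above.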

Finally, now that we reuse fil_system file handles again, bug #870119
needs another fix. I.e. the number of open file handles grows to the
number of tablespaces we want to copy, so we want to prevent InnoDB LRU
policies from kicking in and closing/reusing file handles. There are
multiple ways to achieve that. Patching InnoDB code to disable the
fil_system LRU and file closing policies appeared to be too risky. The
easiest way is to set the allowed number of open InnoDB files to some
very large value unconditionally (i.e. override innodb_open_files with
the maximum possible value, LONG_MAX). InnoDB
does not allocate any resource for each srv_max_n_open_files
increment. It's just the maximum possible LRU list length, so this
change does not incur any additional resource consumption.

This revision also extends bug722638.sh (since there's 95% overlap
between that one and the test case for this bug), and renames
bug722638.sh to ddl.sh to better reflect the contents. It also backports
record_db_state() / verify_db_state() from the 2.1 test suite, because
messing with individual table checksums and checksum_table looks rather
cumbersome with a higher number of tables in the test.

555. By <email address hidden> on 2013-05-06

Merge lp:~hrvojem/percona-xtrabackup/rn-2.0.7-2.0

554. By Sergei Glushchenko on 2013-05-03

Bug 1175860: Orphaned xtrabackup_pid file Breaks Cluster SST
The fix is to remove xtrabackup_pid at startup or bail out if
unlink was not successful.

551. By <email address hidden> on 2013-04-30

Merge lp:~hrvojem/percona-xtrabackup/bug1158243-2.0

550. By <email address hidden> on 2013-04-30

Merge lp:~hrvojem/percona-xtrabackup/bug839306-2.0

549. By Alexey Kopytov on 2013-04-29

Merge from trunk.

548. By Alexey Kopytov on 2013-04-29

Fixed incorrect merge and a compiler warning.

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
Stacked on:
lp:percona-xtrabackup/2.1