If FLUSH TABLES WITH READ LOCK fails backup hangs

Bug #1311120 reported by Dmitry Gribov
This bug affects 1 person
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

If innobackupex fails to execute "FLUSH TABLES WITH READ LOCK", it dies and leaves ibbackup running. It should be more persistent about acquiring the lock, and it should shut down ibbackup properly.

Dmitry Gribov (grib-d) wrote :
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Dmitry,

Did you see this when innobackupex is itself a child process of
another wrapper, i.e. SST or any other kind of wrapper?

If so, then it may be related to
https://bugs.launchpad.net/percona-xtrabackup/+bug/1294782, where the
cause is the handling of SIGINT.

Dmitry Gribov (grib-d) wrote :

It's innobackupex running as SST from mysqld, in XtraDB Cluster.
I've also been rethinking the fix I propose. I added wait_for_ibbackup_finish(); to the set-lock error handler, but perhaps it should be added to kill_child_processes(); instead, so that ibbackup is stopped on ANY error. I have had no time for thorough testing, but we use it this way and it works fine.
I believe 1294782 is exactly the same problem, yes, and simply adding wait_for_ibbackup_finish() to kill_child_processes() should fix 1294782.
Still, retrying "FLUSH TABLES WITH READ LOCK" at least several times is worth doing too, because this minor error fails the whole SST at once, and then we have to start a new SST, which is too time-expensive.

By the way, I see no "begin" in the current innobackupex, but I do see "commit". I spent half an hour trying to understand why it is there; perhaps it is left over from some old mechanism and should be removed along with the compare_versions($mysql_server_version, '4.1.7') == 0 check.
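
The two ideas in this comment (retry the lock a few times, and stop the copy process on any error) can be sketched roughly as follows. This is an illustrative Python sketch, not innobackupex's Perl code: LockError, acquire_lock_with_retry, stop_child and backup are all hypothetical names, run_lock stands in for whatever issues "FLUSH TABLES WITH READ LOCK", and the attempt count and delay are arbitrary assumptions.

```python
import subprocess
import time


class LockError(Exception):
    """Hypothetical error raised when the lock statement fails."""


def acquire_lock_with_retry(run_lock, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_lock() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_lock()
        except LockError as exc:
            last_error = exc
            # Pause before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay)
    raise last_error


def stop_child(child, timeout=10.0):
    """Terminate the child if it is still running, then reap it."""
    if child.poll() is None:
        child.terminate()
        try:
            child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child ignored the polite request; force it down.
            child.kill()
            child.wait()
    return child.returncode


def backup(run_lock, child_cmd, attempts=5, delay=1.0):
    """Start the copy process and take the lock; stop the child on any error."""
    child = subprocess.Popen(child_cmd)
    try:
        acquire_lock_with_retry(run_lock, attempts=attempts, delay=delay)
    except Exception:
        # Never leave the child running when the parent gives up.
        stop_child(child)
        raise
    # On success the caller continues the backup and later waits for the child.
    return child
```

Doing the cleanup in a single except block around the whole step mirrors the suggestion to handle it in one common place rather than only in the lock error handler, so any failure path stops the child before the error propagates.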

Alexey Kopytov (akopytov) wrote :

Closing as a duplicate of bug #1294782.

Dmitry,

Regarding that "COMMIT", yes, it is a part of some legacy code that should be removed. Originally it was "SET AUTOCOMMIT=0; INSERT <some dummy value as a binlog marker into a dummy table>; FTWRL; COMMIT;".

I'll make sure to remove that code along with the fix for bug #1294782.
