Merge lp:~akopytov/percona-xtrabackup/bug1294782-2.1 into lp:percona-xtrabackup/2.1

Proposed by Alexey Kopytov
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 741
Proposed branch: lp:~akopytov/percona-xtrabackup/bug1294782-2.1
Merge into: lp:percona-xtrabackup/2.1
Diff against target: 45 lines (+5/-12)
2 files modified
innobackupex.pl (+4/-11)
src/xtrabackup.cc (+1/-1)
To merge this branch: bzr merge lp:~akopytov/percona-xtrabackup/bug1294782-2.1
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Approve
Review via email: mp+217728@code.launchpad.net

Description of the change

    Bug #1294782: Innobackupex hangs on fatal error in some cases

    The problem was that the code in both innobackupex and xtrabackup, that
    was supposed to make sure no child processes are left running in case
    innobackupex got killed or failed with an error, relied on the fact the
    SIGTERM and SIGINT signals were not blocked by the xtrabackup process
    (or any other child processes spawned by innobackupex).

    When innobackupex terminates gracefully, it calls kill_child_processes()
    to send SIGTERM to its children.

    There’s also (Linux-specific) auto-termination logic in xtrabackup for
    cases when innobackupex is killed or terminates ungracefully. It worked
    as follows: the xtrabackup process called prctl(PR_SET_PDEATHSIG,
    SIGINT) on start, so that SIGINT is sent automatically in case the parent
    process (innobackupex) dies.

    However, both SIGTERM and SIGINT might be blocked by the
    process that had invoked innobackupex, for example, by the PXC server
    processes doing an SST, in which case they were also blocked by the
    xtrabackup process, since the signal mask is inherited by child
    processes.

    Fixed by replacing SIGTERM in innobackupex and SIGINT in xtrabackup
    auto-termination with SIGKILL.

    Also removed some ancient code around FTWRL. It has been unnecessary for
    a long time, which has been reported in duplicate bug #1311120.

    No test case, as it is impossible in Bash to create a signal mask that
    would be inherited by child processes. Signal masks created by ‘trap’
    are only inherited by built-in commands according to the manual.

http://jenkins.percona.com/view/PXB 2.1/job/percona-xtrabackup-2.1-param/565/

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'innobackupex.pl'
--- innobackupex.pl 2014-03-01 13:29:34 +0000
+++ innobackupex.pl 2014-04-30 09:33:56 +0000
@@ -242,7 +242,7 @@
242 's' => ' ');242 's' => ' ');
243243
244# signal that is sent to child processes when they are killed244# signal that is sent to child processes when they are killed
245my $kill_signal = 15;245my $kill_signal = 9;
246246
247# current local time247# current local time
248my $now;248my $now;
@@ -3376,16 +3376,9 @@
3376 usleep(1);3376 usleep(1);
3377 }3377 }
3378 }3378 }
3379 if (compare_versions($mysql_server_version, '4.0.22') == 03379
3380 || compare_versions($mysql_server_version, '4.1.7') == 0) {3380 mysql_query($con, "FLUSH TABLES WITH READ LOCK");
3381 # MySQL server version is 4.0.22 or 4.1.73381
3382 mysql_query($con, "COMMIT");
3383 mysql_query($con, "FLUSH TABLES WITH READ LOCK");
3384 } else {
3385 # MySQL server version is other than 4.0.22 or 4.1.7
3386 mysql_query($con, "FLUSH TABLES WITH READ LOCK");
3387 mysql_query($con, "COMMIT");
3388 }
3389 # stop query killer process3382 # stop query killer process
3390 if ($option_kill_long_queries_timeout) {3383 if ($option_kill_long_queries_timeout) {
3391 stop_query_killer();3384 stop_query_killer();
33923385
=== modified file 'src/xtrabackup.cc'
--- src/xtrabackup.cc 2014-04-27 11:31:56 +0000
+++ src/xtrabackup.cc 2014-04-30 09:33:56 +0000
@@ -5836,7 +5836,7 @@
5836 /* Ensure xtrabackup process is killed when the parent one5836 /* Ensure xtrabackup process is killed when the parent one
5837 (innobackupex) is terminated with an unhandled signal */5837 (innobackupex) is terminated with an unhandled signal */
58385838
5839 if (prctl(PR_SET_PDEATHSIG, SIGINT)) {5839 if (prctl(PR_SET_PDEATHSIG, SIGKILL)) {
5840 msg("prctl() failed with errno = %d\n", errno);5840 msg("prctl() failed with errno = %d\n", errno);
5841 exit(EXIT_FAILURE);5841 exit(EXIT_FAILURE);
5842 }5842 }

Subscribers

People subscribed via source and target branches

to all changes: