Merge lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects into lp:percona-xtrabackup/2.1

Proposed by Sergei Glushchenko
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 613
Proposed branch: lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects
Merge into: lp:percona-xtrabackup/2.1
Diff against target: 514 lines (+421/-3)
2 files modified
innobackupex (+240/-3)
test/t/kill_long_selects.sh (+181/-0)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) Approve
Review via email: mp+164748@code.launchpad.net

Description of the change

      Blueprint https://blueprints.launchpad.net/percona-xtrabackup/+spec/safe-ftwrl
      Try to catch the moment when no long queries are running.
      Try to kill all the queries which block the global lock.
      To determine a good moment, innobackupex analyses
      SHOW FULL PROCESSLIST output, particularly looking into
      Time field. If there is no process with Time greater than
      specified threshold, the moment is considered to be good.
      If no good moment been chatched during specified time,
      innobackupex bail out.
      After the global lock issued, spawn child process to kill
      the queries which bloc the global lock.

There is a lot of Jenkins failures, most of them are unrelated.
There are potential issues with test case as it rely heavily on timings. Suggestions to improve are welcome.

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.1-param/315/

To post a comment you must log in.
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Sergei,

There are conflicts in innobackupex. Please also re-run Jenkins tests with exactly the same branch as has been submitted for reviews.

review: Needs Fixing
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Sergei,

Thanks. I would like the following things in user-visible text to be fixed before it is merged. Otherwise approved.

   - s/enales/enables/
   - s/"before start "FLUSH TABLE WITH READ LOCK"/before FLUSH TABLES
     WITH READ LOCK is started by innobackupex/
   - s/time in seconds to wait moment when there are no.../time in
     seconds to wait until there are no.../
   - s/This option specifies number of seconds innobackupex will wait
     after it issued the global lock and before it start to kill
     queries/This option specifies the number of seconds innobackupex
     waits between starting FLUSH TABLES WITH READ LOCK and killing
     those queries that block it./
   - s/which means no query killing/which means innobackupex will not
     attempt to kill any queries/
   - s/If value is not 0, innobackupex will wait not longer than
     specified amount of seconds for a good moment to issue the global
     lock. If no good moment being catched, innobackupex will bail out./
     This option specifies time in seconds that innobackupex should
     wait for queries that would block FTWRL before running it. If
     there are still such queries when the timeout expires,
     innobackupex terminates with an error.
   - s/Default is 0 which means no waiting for a good moment, just issue
     the global lock mandatory./Default is 0, in which case
     innobackupex does not wait for queries to complete and starts FTWRL
     immediately./
   - s/This option specifes the minimum age of a query to be considered as
     "long running" and potentially a blocker of the global lock. Such a
     queries will prevent the global lock from being issued (see
     --lock-wait-timeout).This option does not have any meaning if
     --lock-wait-timeout=0./
      This option specifies the query run time threshold which is used by
     innobackupex to detect long-running queries with a non-zero value
     of --lock-wait-timeout. FTWRL is not started until such
     long-running queries exist. This option has no effect if
     --lock-wait-timeout is 0.

review: Approve
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Alexey,

I made required changes. Also I removed two documentation items, which turned out to be a garbage.

 =item --ftwrl-wait
and
 =item --ftwrl-wait-timeout=SECONDS

I run test locally and, re-pushed the branch, but did not start Jenkins build.

Thanks,
Sergei.

Revision history for this message
Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'innobackupex'
--- innobackupex 2013-05-22 10:25:21 +0000
+++ innobackupex 2013-05-26 07:22:28 +0000
@@ -202,6 +202,26 @@
202# process id of ibbackup program (runs as a child process of this script)202# process id of ibbackup program (runs as a child process of this script)
203my $ibbackup_pid = '';203my $ibbackup_pid = '';
204204
205# process id of long queries killer
206my $query_killer_pid;
207
208# set kill long queries timeout in seconds
209my $option_kill_long_queries_timeout = 0;
210
211# waiting for an appropriate time to start FTWRL timeout
212my $option_lock_wait_timeout = 0;
213
214# how old should be query to be waited for
215my $option_lock_wait_threshold = 60;
216
217# which type of queries we are waiting for during the pre-FTWRL phase
218# possible values are "update" and "all"
219my $option_lock_wait_query_type = "all";
220
221# which type of queries wa are waiting when clearing the way for FTWRL
222# by killing; possible values are "select" and "all"
223my $option_kill_long_query_type = "all";
224
205# a counter for numbering mysql connection checks225# a counter for numbering mysql connection checks
206my $hello_id = 0;226my $hello_id = 0;
207227
@@ -277,7 +297,7 @@
277 if ($mysql{dbh}) {297 if ($mysql{dbh}) {
278 print STDERR "Connected successfully\n";298 print STDERR "Connected successfully\n";
279 $option_ibbackup_binary = set_xtrabackup_version();299 $option_ibbackup_binary = set_xtrabackup_version();
280 mysql_close();300 mysql_close(\%mysql);
281 } else {301 } else {
282 die "Failed to connect to MySQL server to detect version.\nYou must set xtrabackup version to use with --ibbackup option.\nPossible values are xtrabackup_55 (for MySQL 5.5), xtrabackup_56 (for MySQL or Percona Server 5.6, or xtrabackup (for MySQL 5.1 with InnoDB plugin or Percona Server)\n";302 die "Failed to connect to MySQL server to detect version.\nYou must set xtrabackup version to use with --ibbackup option.\nPossible values are xtrabackup_55 (for MySQL 5.5), xtrabackup_56 (for MySQL or Percona Server 5.6, or xtrabackup (for MySQL 5.1 with InnoDB plugin or Percona Server)\n";
283 }303 }
@@ -1417,6 +1437,7 @@
1417#1437#
1418# mysql_query subroutine send a query to MySQL server child process.1438# mysql_query subroutine send a query to MySQL server child process.
1419# Parameters:1439# Parameters:
1440# con mysql connection
1420# query query to execute1441# query query to execute
1421#1442#
1422sub mysql_query {1443sub mysql_query {
@@ -1439,6 +1460,10 @@
1439 } elsif ($query eq 'SHOW SLAVE STATUS') {1460 } elsif ($query eq 'SHOW SLAVE STATUS') {
1440 $con->{slave_status} =1461 $con->{slave_status} =
1441 $con->{dbh}->selectrow_hashref("SHOW SLAVE STATUS");1462 $con->{dbh}->selectrow_hashref("SHOW SLAVE STATUS");
1463 } elsif ($query eq 'SHOW PROCESSLIST' or
1464 $query eq "SHOW FULL PROCESSLIST") {
1465 $con->{processlist} =
1466 $con->{dbh}->selectall_hashref($query, "Id");
1442 } else {1467 } else {
1443 $con->{dbh}->do($query);1468 $con->{dbh}->do($query);
1444 }1469 }
@@ -1602,16 +1627,183 @@
1602 $mysql_slave_position = "master host '$master', filename '$filename', position $position";1627 $mysql_slave_position = "master host '$master', filename '$filename', position $position";
1603}1628}
16041629
1630sub eat_sql_whitespace {
1631 my ($query) = @_;
1632
1633 while (1) {
1634 if ($query =~ m/^\/\*/) {
1635 $query =~ s/^\/\*.*?\*\///s;
1636 } elsif ($query =~ m/^[ \t\n\r]/s) {
1637 $query =~ s/^[ \t\n\r]+//s;
1638 } elsif ($query =~ m/^\(/) {
1639 $query =~ s/^\(//;
1640 } else {
1641 return $query;
1642 }
1643 }
1644
1645}
1646
1647
1648sub is_query {
1649 my ($query) = @_;
1650
1651 $query = eat_sql_whitespace($query);
1652 if ($query =~
1653 m/^(insert|update|delete|replace|alter|load|select|do|handler|call|execute|begin)/i) {
1654 return 1;
1655 }
1656
1657 return 0;
1658}
1659
1660
1661sub is_select_query {
1662 my ($query) = @_;
1663
1664 $query = eat_sql_whitespace($query);
1665 if ($query =~ m/^select/i) {
1666 return 1;
1667 }
1668
1669 return 0;
1670}
1671
1672
1673sub is_update_query {
1674 my ($query) = @_;
1675
1676 $query = eat_sql_whitespace($query);
1677 if ($query =~ m/^(insert|update|delete|replace|alter|load)/i) {
1678 return 1;
1679 }
1680
1681 return 0;
1682}
1683
1684
1685sub have_queries_to_wait_for {
1686 my ($con, $threshold) = @_;
1687
1688 $now = current_time();
1689
1690 mysql_query($con, "SHOW FULL PROCESSLIST");
1691
1692 my $processlist = $con->{processlist};
1693
1694 while (my ($id, $process) = each %$processlist) {
1695 if (defined($process->{Info}) &&
1696 $process->{Time} >= $threshold &&
1697 (($option_lock_wait_query_type eq "all" &&
1698 is_query($process->{Info})) ||
1699 is_update_query($process->{Info}))) {
1700 print STDERR "\n$now $prefix Waiting for query $id (duration " .
1701 "$process->{Time} sec): $process->{Info}\n";
1702 return 1;
1703 }
1704 }
1705
1706 return 0;
1707}
1708
1709
1710sub kill_long_queries {
1711 my ($con, $timeout) = @_;
1712
1713 $now = current_time();
1714
1715 mysql_query($con, "SHOW FULL PROCESSLIST");
1716
1717 my $processlist = $con->{processlist};
1718
1719 while (my ($id, $process) = each %$processlist) {
1720 if (defined($process->{Info}) &&
1721 $process->{Time} >= $timeout &&
1722 (($option_kill_long_query_type eq "all" &&
1723 is_query($process->{Info})) ||
1724 is_select_query($process->{Info}))) {
1725 print STDERR "\n$now $prefix Killing query $id (duration " .
1726 "$process->{Time} sec): $process->{Info}\n";
1727 mysql_query($con, "KILL $id");
1728 }
1729 }
1730}
1731
1732
1733sub wait_for_no_updates {
1734 my ($con, $timeout, $threshold) = @_;
1735 my $start_time = time();
1736
1737 while (time() <= $start_time + $timeout) {
1738 if (!(have_queries_to_wait_for($con, $threshold))) {
1739 return;
1740 }
1741 sleep(1);
1742 }
1743
1744 Die "Unable to obtain lock. Please try again.";
1745}
1746
1747
1748sub start_query_killer {
1749
1750 my ($kill_timeout, $pcon) = @_;
1751 my $start_time = time();
1752 my $pid = fork();
1753
1754 if ($pid) {
1755 # parent process
1756 $query_killer_pid = $pid;
1757 } else {
1758 # child process
1759 my $end = 0;
1760 local $SIG{HUP} = sub { $end = 1 };
1761
1762 $pcon->{dbh}->{InactiveDestroy} = 1;
1763
1764 sleep($kill_timeout);
1765
1766 my %con = mysql_connect(abort_on_error => 1, keepalives => 0);
1767
1768 while (!$end) {
1769 kill_long_queries(\%con, time() - $start_time);
1770 sleep(1);
1771 }
1772
1773 mysql_close(\%con);
1774
1775 exit(0);
1776 }
1777}
1778
1779sub stop_query_killer {
1780 if (defined($query_killer_pid)) {
1781 kill 'HUP' => $query_killer_pid;
1782 waitpid($query_killer_pid, 0);
1783 print STDERR "Query killing process is finished\n";
1784 }
1785}
1786
16051787
1606#1788#
1607# mysql_lockall subroutine puts a read lock on all tables in all databases.1789# mysql_lockall subroutine puts a read lock on all tables in all databases.
1608# 1790#
1609sub mysql_lockall {1791sub mysql_lockall {
1610 my $con = shift;1792 my $con = shift;
1793 my $queries_hash_ref;
1794
1795 if ($option_lock_wait_timeout) {
1796 wait_for_no_updates($con, $option_lock_wait_timeout,
1797 $option_lock_wait_threshold);
1798 }
16111799
1612 $now = current_time();1800 $now = current_time();
1613 print STDERR "$now $prefix Starting to lock all tables...\n";1801 print STDERR "$now $prefix Starting to lock all tables...\n";
16141802
1803 # start query killer process
1804 if ($option_kill_long_queries_timeout) {
1805 start_query_killer($option_kill_long_queries_timeout, $con);
1806 }
1615 if (compare_versions($mysql_server_version, '4.0.22') == 01807 if (compare_versions($mysql_server_version, '4.0.22') == 0
1616 || compare_versions($mysql_server_version, '4.1.7') == 0) {1808 || compare_versions($mysql_server_version, '4.1.7') == 0) {
1617 # MySQL server version is 4.0.22 or 4.1.71809 # MySQL server version is 4.0.22 or 4.1.7
@@ -1622,9 +1814,12 @@
1622 mysql_query($con, "FLUSH TABLES WITH READ LOCK");1814 mysql_query($con, "FLUSH TABLES WITH READ LOCK");
1623 mysql_query($con, "COMMIT");1815 mysql_query($con, "COMMIT");
1624 }1816 }
1817 # stop query killer process
1818 if ($option_kill_long_queries_timeout) {
1819 stop_query_killer();
1820 }
1625 write_binlog_info($con);1821 write_binlog_info($con);
1626 write_galera_info($con) if $option_galera_info;1822 write_galera_info($con) if $option_galera_info;
1627
1628 $now = current_time();1823 $now = current_time();
1629 print STDERR "$now $prefix All tables locked and flushed to disk\n";1824 print STDERR "$now $prefix All tables locked and flushed to disk\n";
1630}1825}
@@ -1657,6 +1852,7 @@
1657 kill($kill_signal, $ibbackup_pid);1852 kill($kill_signal, $ibbackup_pid);
1658 $ibbackup_pid = '';1853 $ibbackup_pid = '';
1659 }1854 }
1855 stop_query_killer();
1660}1856}
16611857
16621858
@@ -1996,7 +2192,15 @@
1996 'rebuild-indexes' => \$option_rebuild_indexes,2192 'rebuild-indexes' => \$option_rebuild_indexes,
1997 'rebuild-threads=i' => \$option_rebuild_threads,2193 'rebuild-threads=i' => \$option_rebuild_threads,
1998 'debug-sleep-before-unlock=i' =>2194 'debug-sleep-before-unlock=i' =>
1999 \$option_debug_sleep_before_unlock2195 \$option_debug_sleep_before_unlock,
2196 'kill-long-queries-timeout=i' =>
2197 \$option_kill_long_queries_timeout,
2198 'kill-long-query-type=s' =>
2199 \$option_kill_long_query_type,
2200 'lock-wait-timeout=i' => \$option_lock_wait_timeout,
2201 'lock-wait-threshold=i' => \$option_lock_wait_threshold,
2202 'lock-wait-query-type=s' =>
2203 \$option_lock_wait_query_type
2000 );2204 );
2001 2205
2002 if (!$rcode) {2206 if (!$rcode) {
@@ -2036,6 +2240,18 @@
2036 $option_compress = 0;2240 $option_compress = 0;
2037 }2241 }
20382242
2243 # validate lock-wait-query-type and kill-long-query-type values
2244 if (!(grep {$_ eq $option_lock_wait_query_type} qw/all update/)) {
2245 Die "Wrong value of lock-wait-query-type. ".
2246 "Possible values are all|update, but $option_lock_wait_query_type ".
2247 "is specified.";
2248 }
2249 if (!(grep {$_ eq $option_kill_long_query_type} qw/all select/)) {
2250 Die "Wrong value of kill-long-query-type. ".
2251 "Possible values are all|select, but $option_kill_long_query_type ".
2252 "is specified.";
2253 }
2254
2039 if ($option_stream eq 'tar') {2255 if ($option_stream eq 'tar') {
2040 $stream_cmd = 'tar chf -';2256 $stream_cmd = 'tar chf -';
2041 } elsif ($option_stream eq 'xbstream') {2257 } elsif ($option_stream eq 'xbstream') {
@@ -3148,6 +3364,27 @@
31483364
3149This option specifies the log sequence number (LSN) to use for the incremental backup. The option accepts a string argument. It is used with the --incremental option. It is used instead of specifying --incremental-basedir. For databases created by MySQL and Percona Server 5.0-series versions, specify the LSN as two 32-bit integers in high:low format. For databases created in 5.1 and later, specify the LSN as a single 64-bit integer.3365This option specifies the log sequence number (LSN) to use for the incremental backup. The option accepts a string argument. It is used with the --incremental option. It is used instead of specifying --incremental-basedir. For databases created by MySQL and Percona Server 5.0-series versions, specify the LSN as two 32-bit integers in high:low format. For databases created in 5.1 and later, specify the LSN as a single 64-bit integer.
31503366
3367=item --kill-long-queries-timeout=SECONDS
3368
3369This option specifies the number of seconds innobackupex waits between starting FLUSH TABLES WITH READ LOCK and killing those queries that block it. Default is 0 seconds, which means innobackupex will not attempt to kill any queries.
3370
3371=item --kill-long-query-type=all|update
3372
3373This option specifies which types of queries should be killed to unblock the global lock. Default is "all".
3374
3375=item --lock-wait-timeout=SECONDS
3376
3377This option specifies time in seconds that innobackupex should wait for queries that would block FTWRL before running it. If there are still such queries when the timeout expires, innobackupex terminates with an error.
3378Default is 0, in which case innobackupex does not wait for queries to complete and starts FTWRL immediately.
3379
3380=item --lock-wait-threshold=SECONDS
3381
3382This option specifies the query run time threshold which is used by innobackupex to detect long-running queries with a non-zero value of --lock-wait-timeout. FTWRL is not started until such long-running queries exist. This option has no effect if --lock-wait-timeout is 0. Default value is 60 seconds.
3383
3384=item --lock-wait-query-type=all|update
3385
3386This option specifies which types of queries are allowed to complete before innobackuped will issue the global lock. Default is all.
3387
3151=item --move-back3388=item --move-back
31523389
3153Move all the files in a previously made backup from the backup directory to the actual datadir location. Use with caution, as it removes backup files.3390Move all the files in a previously made backup from the backup directory to the actual datadir location. Use with caution, as it removes backup files.
31543391
=== added file 'test/t/kill_long_selects.sh'
--- test/t/kill_long_selects.sh 1970-01-01 00:00:00 +0000
+++ test/t/kill_long_selects.sh 2013-05-26 07:22:28 +0000
@@ -0,0 +1,181 @@
1############################################################################
2# Test kill-long-queries and kill-long-queries-timeout optins
3############################################################################
4
5. inc/common.sh
6
7function bg_run()
8{
9 local varname=$1
10
11 shift
12
13 ( for cmd in "$@"
14 do
15 eval "$cmd"
16 done ) &
17
18 local pid=$!
19
20 eval "$varname=$pid"
21
22}
23
24function mysql_select()
25{
26 vlog "Run select query with duration $1 seconds"
27 ${MYSQL} ${MYSQL_ARGS} -c test 2> /dev/null <<EOF
28 /* Run background /*SELECT*\ */
29 (
30 SELECT SLEEP($1) FROM t1 FOR UPDATE
31 ) UNION ALL
32 (
33 SELECT 1
34 );
35EOF
36}
37
38function mysql_update()
39{
40 vlog "Run update query with duration $1 seconds"
41 ${MYSQL} ${MYSQL_ARGS} -c test 2> /dev/null <<EOF
42 /* This is not SELECT but rather an /*UPDATE*\
43 query */
44 UPDATE t1 SET a = SLEEP($1);
45EOF
46}
47
48
49function bg_kill_ok()
50{
51 vlog "Killing $1, expecting it is alive"
52 run_cmd kill $1
53}
54
55
56function bg_wait_ok()
57{
58 vlog "Waiting for $1, expecting it's success"
59 run_cmd wait $1
60}
61
62
63function bg_wait_fail()
64{
65 vlog "Waiting for $1, expecting it would fail"
66 run_cmd_expect_failure wait $1
67}
68
69function kill_all_queries()
70{
71 run_cmd $MYSQL $MYSQL_ARGS test <<EOF
72 select concat('KILL ',id,';') from information_schema.processlist
73 where user='root' and time > 2 into outfile '$MYSQLD_TMPDIR/killall.sql';
74 source $MYSQLD_TMPDIR/killall.sql;
75EOF
76 rm -f $MYSQLD_TMPDIR/killall.sql
77}
78
79start_server --innodb_file_per_table
80
81run_cmd $MYSQL $MYSQL_ARGS test <<EOF
82CREATE TABLE t1(a INT) ENGINE=InnoDB;
83INSERT INTO t1 VALUES (1);
84EOF
85
86mkdir $topdir/full
87
88# ==============================================================
89vlog "===================== case 1 ====================="
90bg_run bg_select_pid "mysql_select 3"
91bg_run bg_update_pid "sleep 1" "mysql_update 3"
92
93innobackupex $topdir/full --kill-long-queries-timeout=5 \
94 --kill-long-query-type=all
95
96bg_wait_ok $bg_select_pid
97bg_wait_ok $bg_update_pid
98kill_all_queries
99
100
101# ==============================================================
102vlog "===================== case 2 ====================="
103bg_run bg_select_pid "mysql_select 200"
104bg_run bg_update_pid "sleep 1" "mysql_update 5"
105
106innobackupex $topdir/full --kill-long-queries-timeout=3 \
107 --kill-long-query-type=select
108
109bg_wait_fail $bg_select_pid
110bg_wait_ok $bg_update_pid
111kill_all_queries
112
113
114# ==============================================================
115vlog "===================== case 3 ====================="
116bg_run bg_select_pid "mysql_select 200"
117bg_run bg_update_pid "mysql_update 200"
118
119innobackupex $topdir/full --kill-long-queries-timeout=3 \
120 --kill-long-query-type=all
121
122bg_wait_fail $bg_select_pid
123bg_wait_fail $bg_update_pid
124kill_all_queries
125
126
127# ==============================================================
128vlog "===================== case 4 ====================="
129bg_run bg_select_pid "mysql_select 200"
130bg_run bg_update_pid "mysql_update 200"
131
132sleep 1
133
134run_cmd_expect_failure ${IB_BIN} ${IB_ARGS} $topdir/full \
135 --lock-wait-timeout=3 \
136 --lock-wait-query-type=all \
137 --lock-wait-threshold=1 \
138 --kill-long-queries-timeout=1 \
139 --kill-long-query-type=all
140
141bg_kill_ok $bg_select_pid
142bg_kill_ok $bg_update_pid
143kill_all_queries
144
145
146# ==============================================================
147vlog "===================== case 5 ====================="
148bg_run bg_select_pid "mysql_select 200"
149bg_run bg_update_pid "mysql_update 200"
150
151sleep 2
152
153run_cmd_expect_failure ${IB_BIN} ${IB_ARGS} $topdir/full \
154 --lock-wait-timeout=3 \
155 --lock-wait-query-type=update \
156 --lock-wait-threshold=2 \
157 --kill-long-queries-timeout=1 \
158 --kill-long-query-type=all
159
160bg_kill_ok $bg_select_pid
161bg_kill_ok $bg_update_pid
162kill_all_queries
163
164
165# ==============================================================
166vlog "===================== case 6 ====================="
167bg_run bg_update_pid "mysql_update 5"
168bg_run bg_select_pid "sleep 1" "mysql_select 200"
169
170sleep 2
171
172innobackupex $topdir/full \
173 --lock-wait-timeout=6 \
174 --lock-wait-query-type=update \
175 --lock-wait-threshold=2 \
176 --kill-long-queries-timeout=1 \
177 --kill-long-query-type=all
178
179bg_wait_fail $bg_select_pid
180bg_wait_ok $bg_update_pid
181kill_all_queries

Subscribers

People subscribed via source and target branches