Merge lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects into lp:percona-xtrabackup/2.1

Proposed by Sergei Glushchenko on 2013-05-20
Status: Merged
Approved by: Alexey Kopytov on 2013-06-10
Approved revision: 606
Merged at revision: 613
Proposed branch: lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects
Merge into: lp:percona-xtrabackup/2.1
Diff against target: 514 lines (+421/-3)
2 files modified
innobackupex (+240/-3)
test/t/kill_long_selects.sh (+181/-0)
To merge this branch: bzr merge lp:~sergei.glushchenko/percona-xtrabackup/BT30559-kill-long-selects
Reviewer Review Type Date Requested Status
Alexey Kopytov (community) 2013-05-20 Approve on 2013-06-10
Review via email: mp+164748@code.launchpad.net

Description of the Change

      Blueprint https://blueprints.launchpad.net/percona-xtrabackup/+spec/safe-ftwrl
      Try to catch the moment when no long queries are running.
      Try to kill all the queries which block the global lock.
      To determine a good moment, innobackupex analyses
      SHOW FULL PROCESSLIST output, particularly looking into
      Time field. If there is no process with Time greater than
      specified threshold, the moment is considered to be good.
      If no good moment been chatched during specified time,
      innobackupex bail out.
      After the global lock issued, spawn child process to kill
      the queries which bloc the global lock.

There is a lot of Jenkins failures, most of them are unrelated.
There are potential issues with test case as it rely heavily on timings. Suggestions to improve are welcome.

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.1-param/315/

To post a comment you must log in.
Alexey Kopytov (akopytov) wrote :

Sergei,

There are conflicts in innobackupex. Please also re-run Jenkins tests with exactly the same branch as has been submitted for reviews.

review: Needs Fixing
Alexey Kopytov (akopytov) wrote :

Sergei,

Thanks. I would like the following things in user-visible text to be fixed before it is merged. Otherwise approved.

   - s/enales/enables/
   - s/"before start "FLUSH TABLE WITH READ LOCK"/before FLUSH TABLES
     WITH READ LOCK is started by innobackupex/
   - s/time in seconds to wait moment when there are no.../time in
     seconds to wait until there are no.../
   - s/This option specifies number of seconds innobackupex will wait
     after it issued the global lock and before it start to kill
     queries/This option specifies the number of seconds innobackupex
     waits between starting FLUSH TABLES WITH READ LOCK and killing
     those queries that block it./
   - s/which means no query killing/which means innobackupex will not
     attempt to kill any queries/
   - s/If value is not 0, innobackupex will wait not longer than
     specified amount of seconds for a good moment to issue the global
     lock. If no good moment being catched, innobackupex will bail out./
     This option specifies time in seconds that innobackupex should
     wait for queries that would block FTWRL before running it. If
     there are still such queries when the timeout expires,
     innobackupex terminates with an error.
   - s/Default is 0 which means no waiting for a good moment, just issue
     the global lock mandatory./Default is 0, in which case
     innobackupex does not wait for queries to complete and starts FTWRL
     immediately./
   - s/This option specifes the minimum age of a query to be considered as
     "long running" and potentially a blocker of the global lock. Such a
     queries will prevent the global lock from being issued (see
     --lock-wait-timeout).This option does not have any meaning if
     --lock-wait-timeout=0./
      This option specifies the query run time threshold which is used by
     innobackupex to detect long-running queries with a non-zero value
     of --lock-wait-timeout. FTWRL is not started until such
     long-running queries exist. This option has no effect if
     --lock-wait-timeout is 0.

review: Approve

Alexey,

I made required changes. Also I removed two documentation items, which turned out to be a garbage.

 =item --ftwrl-wait
and
 =item --ftwrl-wait-timeout=SECONDS

I run test locally and, re-pushed the branch, but did not start Jenkins build.

Thanks,
Sergei.

Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'innobackupex'
2--- innobackupex 2013-05-22 10:25:21 +0000
3+++ innobackupex 2013-05-26 07:22:28 +0000
4@@ -202,6 +202,26 @@
5 # process id of ibbackup program (runs as a child process of this script)
6 my $ibbackup_pid = '';
7
8+# process id of long queries killer
9+my $query_killer_pid;
10+
11+# set kill long queries timeout in seconds
12+my $option_kill_long_queries_timeout = 0;
13+
14+# waiting for an appropriate time to start FTWRL timeout
15+my $option_lock_wait_timeout = 0;
16+
17+# how old should be query to be waited for
18+my $option_lock_wait_threshold = 60;
19+
20+# which type of queries we are waiting for during the pre-FTWRL phase
21+# possible values are "update" and "all"
22+my $option_lock_wait_query_type = "all";
23+
24+# which type of queries wa are waiting when clearing the way for FTWRL
25+# by killing; possible values are "select" and "all"
26+my $option_kill_long_query_type = "all";
27+
28 # a counter for numbering mysql connection checks
29 my $hello_id = 0;
30
31@@ -277,7 +297,7 @@
32 if ($mysql{dbh}) {
33 print STDERR "Connected successfully\n";
34 $option_ibbackup_binary = set_xtrabackup_version();
35- mysql_close();
36+ mysql_close(\%mysql);
37 } else {
38 die "Failed to connect to MySQL server to detect version.\nYou must set xtrabackup version to use with --ibbackup option.\nPossible values are xtrabackup_55 (for MySQL 5.5), xtrabackup_56 (for MySQL or Percona Server 5.6, or xtrabackup (for MySQL 5.1 with InnoDB plugin or Percona Server)\n";
39 }
40@@ -1417,6 +1437,7 @@
41 #
42 # mysql_query subroutine send a query to MySQL server child process.
43 # Parameters:
44+# con mysql connection
45 # query query to execute
46 #
47 sub mysql_query {
48@@ -1439,6 +1460,10 @@
49 } elsif ($query eq 'SHOW SLAVE STATUS') {
50 $con->{slave_status} =
51 $con->{dbh}->selectrow_hashref("SHOW SLAVE STATUS");
52+ } elsif ($query eq 'SHOW PROCESSLIST' or
53+ $query eq "SHOW FULL PROCESSLIST") {
54+ $con->{processlist} =
55+ $con->{dbh}->selectall_hashref($query, "Id");
56 } else {
57 $con->{dbh}->do($query);
58 }
59@@ -1602,16 +1627,183 @@
60 $mysql_slave_position = "master host '$master', filename '$filename', position $position";
61 }
62
63+sub eat_sql_whitespace {
64+ my ($query) = @_;
65+
66+ while (1) {
67+ if ($query =~ m/^\/\*/) {
68+ $query =~ s/^\/\*.*?\*\///s;
69+ } elsif ($query =~ m/^[ \t\n\r]/s) {
70+ $query =~ s/^[ \t\n\r]+//s;
71+ } elsif ($query =~ m/^\(/) {
72+ $query =~ s/^\(//;
73+ } else {
74+ return $query;
75+ }
76+ }
77+
78+}
79+
80+
81+sub is_query {
82+ my ($query) = @_;
83+
84+ $query = eat_sql_whitespace($query);
85+ if ($query =~
86+ m/^(insert|update|delete|replace|alter|load|select|do|handler|call|execute|begin)/i) {
87+ return 1;
88+ }
89+
90+ return 0;
91+}
92+
93+
94+sub is_select_query {
95+ my ($query) = @_;
96+
97+ $query = eat_sql_whitespace($query);
98+ if ($query =~ m/^select/i) {
99+ return 1;
100+ }
101+
102+ return 0;
103+}
104+
105+
106+sub is_update_query {
107+ my ($query) = @_;
108+
109+ $query = eat_sql_whitespace($query);
110+ if ($query =~ m/^(insert|update|delete|replace|alter|load)/i) {
111+ return 1;
112+ }
113+
114+ return 0;
115+}
116+
117+
118+sub have_queries_to_wait_for {
119+ my ($con, $threshold) = @_;
120+
121+ $now = current_time();
122+
123+ mysql_query($con, "SHOW FULL PROCESSLIST");
124+
125+ my $processlist = $con->{processlist};
126+
127+ while (my ($id, $process) = each %$processlist) {
128+ if (defined($process->{Info}) &&
129+ $process->{Time} >= $threshold &&
130+ (($option_lock_wait_query_type eq "all" &&
131+ is_query($process->{Info})) ||
132+ is_update_query($process->{Info}))) {
133+ print STDERR "\n$now $prefix Waiting for query $id (duration " .
134+ "$process->{Time} sec): $process->{Info}\n";
135+ return 1;
136+ }
137+ }
138+
139+ return 0;
140+}
141+
142+
143+sub kill_long_queries {
144+ my ($con, $timeout) = @_;
145+
146+ $now = current_time();
147+
148+ mysql_query($con, "SHOW FULL PROCESSLIST");
149+
150+ my $processlist = $con->{processlist};
151+
152+ while (my ($id, $process) = each %$processlist) {
153+ if (defined($process->{Info}) &&
154+ $process->{Time} >= $timeout &&
155+ (($option_kill_long_query_type eq "all" &&
156+ is_query($process->{Info})) ||
157+ is_select_query($process->{Info}))) {
158+ print STDERR "\n$now $prefix Killing query $id (duration " .
159+ "$process->{Time} sec): $process->{Info}\n";
160+ mysql_query($con, "KILL $id");
161+ }
162+ }
163+}
164+
165+
166+sub wait_for_no_updates {
167+ my ($con, $timeout, $threshold) = @_;
168+ my $start_time = time();
169+
170+ while (time() <= $start_time + $timeout) {
171+ if (!(have_queries_to_wait_for($con, $threshold))) {
172+ return;
173+ }
174+ sleep(1);
175+ }
176+
177+ Die "Unable to obtain lock. Please try again.";
178+}
179+
180+
181+sub start_query_killer {
182+
183+ my ($kill_timeout, $pcon) = @_;
184+ my $start_time = time();
185+ my $pid = fork();
186+
187+ if ($pid) {
188+ # parent process
189+ $query_killer_pid = $pid;
190+ } else {
191+ # child process
192+ my $end = 0;
193+ local $SIG{HUP} = sub { $end = 1 };
194+
195+ $pcon->{dbh}->{InactiveDestroy} = 1;
196+
197+ sleep($kill_timeout);
198+
199+ my %con = mysql_connect(abort_on_error => 1, keepalives => 0);
200+
201+ while (!$end) {
202+ kill_long_queries(\%con, time() - $start_time);
203+ sleep(1);
204+ }
205+
206+ mysql_close(\%con);
207+
208+ exit(0);
209+ }
210+}
211+
212+sub stop_query_killer {
213+ if (defined($query_killer_pid)) {
214+ kill 'HUP' => $query_killer_pid;
215+ waitpid($query_killer_pid, 0);
216+ print STDERR "Query killing process is finished\n";
217+ }
218+}
219+
220
221 #
222 # mysql_lockall subroutine puts a read lock on all tables in all databases.
223 #
224 sub mysql_lockall {
225 my $con = shift;
226+ my $queries_hash_ref;
227+
228+ if ($option_lock_wait_timeout) {
229+ wait_for_no_updates($con, $option_lock_wait_timeout,
230+ $option_lock_wait_threshold);
231+ }
232
233 $now = current_time();
234 print STDERR "$now $prefix Starting to lock all tables...\n";
235
236+ # start query killer process
237+ if ($option_kill_long_queries_timeout) {
238+ start_query_killer($option_kill_long_queries_timeout, $con);
239+ }
240 if (compare_versions($mysql_server_version, '4.0.22') == 0
241 || compare_versions($mysql_server_version, '4.1.7') == 0) {
242 # MySQL server version is 4.0.22 or 4.1.7
243@@ -1622,9 +1814,12 @@
244 mysql_query($con, "FLUSH TABLES WITH READ LOCK");
245 mysql_query($con, "COMMIT");
246 }
247+ # stop query killer process
248+ if ($option_kill_long_queries_timeout) {
249+ stop_query_killer();
250+ }
251 write_binlog_info($con);
252 write_galera_info($con) if $option_galera_info;
253-
254 $now = current_time();
255 print STDERR "$now $prefix All tables locked and flushed to disk\n";
256 }
257@@ -1657,6 +1852,7 @@
258 kill($kill_signal, $ibbackup_pid);
259 $ibbackup_pid = '';
260 }
261+ stop_query_killer();
262 }
263
264
265@@ -1996,7 +2192,15 @@
266 'rebuild-indexes' => \$option_rebuild_indexes,
267 'rebuild-threads=i' => \$option_rebuild_threads,
268 'debug-sleep-before-unlock=i' =>
269- \$option_debug_sleep_before_unlock
270+ \$option_debug_sleep_before_unlock,
271+ 'kill-long-queries-timeout=i' =>
272+ \$option_kill_long_queries_timeout,
273+ 'kill-long-query-type=s' =>
274+ \$option_kill_long_query_type,
275+ 'lock-wait-timeout=i' => \$option_lock_wait_timeout,
276+ 'lock-wait-threshold=i' => \$option_lock_wait_threshold,
277+ 'lock-wait-query-type=s' =>
278+ \$option_lock_wait_query_type
279 );
280
281 if (!$rcode) {
282@@ -2036,6 +2240,18 @@
283 $option_compress = 0;
284 }
285
286+ # validate lock-wait-query-type and kill-long-query-type values
287+ if (!(grep {$_ eq $option_lock_wait_query_type} qw/all update/)) {
288+ Die "Wrong value of lock-wait-query-type. ".
289+ "Possible values are all|update, but $option_lock_wait_query_type ".
290+ "is specified.";
291+ }
292+ if (!(grep {$_ eq $option_kill_long_query_type} qw/all select/)) {
293+ Die "Wrong value of kill-long-query-type. ".
294+ "Possible values are all|select, but $option_kill_long_query_type ".
295+ "is specified.";
296+ }
297+
298 if ($option_stream eq 'tar') {
299 $stream_cmd = 'tar chf -';
300 } elsif ($option_stream eq 'xbstream') {
301@@ -3148,6 +3364,27 @@
302
303 This option specifies the log sequence number (LSN) to use for the incremental backup. The option accepts a string argument. It is used with the --incremental option. It is used instead of specifying --incremental-basedir. For databases created by MySQL and Percona Server 5.0-series versions, specify the LSN as two 32-bit integers in high:low format. For databases created in 5.1 and later, specify the LSN as a single 64-bit integer.
304
305+=item --kill-long-queries-timeout=SECONDS
306+
307+This option specifies the number of seconds innobackupex waits between starting FLUSH TABLES WITH READ LOCK and killing those queries that block it. Default is 0 seconds, which means innobackupex will not attempt to kill any queries.
308+
309+=item --kill-long-query-type=all|update
310+
311+This option specifies which types of queries should be killed to unblock the global lock. Default is "all".
312+
313+=item --lock-wait-timeout=SECONDS
314+
315+This option specifies time in seconds that innobackupex should wait for queries that would block FTWRL before running it. If there are still such queries when the timeout expires, innobackupex terminates with an error.
316+Default is 0, in which case innobackupex does not wait for queries to complete and starts FTWRL immediately.
317+
318+=item --lock-wait-threshold=SECONDS
319+
320+This option specifies the query run time threshold which is used by innobackupex to detect long-running queries with a non-zero value of --lock-wait-timeout. FTWRL is not started until such long-running queries exist. This option has no effect if --lock-wait-timeout is 0. Default value is 60 seconds.
321+
322+=item --lock-wait-query-type=all|update
323+
324+This option specifies which types of queries are allowed to complete before innobackuped will issue the global lock. Default is all.
325+
326 =item --move-back
327
328 Move all the files in a previously made backup from the backup directory to the actual datadir location. Use with caution, as it removes backup files.
329
330=== added file 'test/t/kill_long_selects.sh'
331--- test/t/kill_long_selects.sh 1970-01-01 00:00:00 +0000
332+++ test/t/kill_long_selects.sh 2013-05-26 07:22:28 +0000
333@@ -0,0 +1,181 @@
334+############################################################################
335+# Test kill-long-queries and kill-long-queries-timeout optins
336+############################################################################
337+
338+. inc/common.sh
339+
340+function bg_run()
341+{
342+ local varname=$1
343+
344+ shift
345+
346+ ( for cmd in "$@"
347+ do
348+ eval "$cmd"
349+ done ) &
350+
351+ local pid=$!
352+
353+ eval "$varname=$pid"
354+
355+}
356+
357+function mysql_select()
358+{
359+ vlog "Run select query with duration $1 seconds"
360+ ${MYSQL} ${MYSQL_ARGS} -c test 2> /dev/null <<EOF
361+ /* Run background /*SELECT*\ */
362+ (
363+ SELECT SLEEP($1) FROM t1 FOR UPDATE
364+ ) UNION ALL
365+ (
366+ SELECT 1
367+ );
368+EOF
369+}
370+
371+function mysql_update()
372+{
373+ vlog "Run update query with duration $1 seconds"
374+ ${MYSQL} ${MYSQL_ARGS} -c test 2> /dev/null <<EOF
375+ /* This is not SELECT but rather an /*UPDATE*\
376+ query */
377+ UPDATE t1 SET a = SLEEP($1);
378+EOF
379+}
380+
381+
382+function bg_kill_ok()
383+{
384+ vlog "Killing $1, expecting it is alive"
385+ run_cmd kill $1
386+}
387+
388+
389+function bg_wait_ok()
390+{
391+ vlog "Waiting for $1, expecting it's success"
392+ run_cmd wait $1
393+}
394+
395+
396+function bg_wait_fail()
397+{
398+ vlog "Waiting for $1, expecting it would fail"
399+ run_cmd_expect_failure wait $1
400+}
401+
402+function kill_all_queries()
403+{
404+ run_cmd $MYSQL $MYSQL_ARGS test <<EOF
405+ select concat('KILL ',id,';') from information_schema.processlist
406+ where user='root' and time > 2 into outfile '$MYSQLD_TMPDIR/killall.sql';
407+ source $MYSQLD_TMPDIR/killall.sql;
408+EOF
409+ rm -f $MYSQLD_TMPDIR/killall.sql
410+}
411+
412+start_server --innodb_file_per_table
413+
414+run_cmd $MYSQL $MYSQL_ARGS test <<EOF
415+CREATE TABLE t1(a INT) ENGINE=InnoDB;
416+INSERT INTO t1 VALUES (1);
417+EOF
418+
419+mkdir $topdir/full
420+
421+# ==============================================================
422+vlog "===================== case 1 ====================="
423+bg_run bg_select_pid "mysql_select 3"
424+bg_run bg_update_pid "sleep 1" "mysql_update 3"
425+
426+innobackupex $topdir/full --kill-long-queries-timeout=5 \
427+ --kill-long-query-type=all
428+
429+bg_wait_ok $bg_select_pid
430+bg_wait_ok $bg_update_pid
431+kill_all_queries
432+
433+
434+# ==============================================================
435+vlog "===================== case 2 ====================="
436+bg_run bg_select_pid "mysql_select 200"
437+bg_run bg_update_pid "sleep 1" "mysql_update 5"
438+
439+innobackupex $topdir/full --kill-long-queries-timeout=3 \
440+ --kill-long-query-type=select
441+
442+bg_wait_fail $bg_select_pid
443+bg_wait_ok $bg_update_pid
444+kill_all_queries
445+
446+
447+# ==============================================================
448+vlog "===================== case 3 ====================="
449+bg_run bg_select_pid "mysql_select 200"
450+bg_run bg_update_pid "mysql_update 200"
451+
452+innobackupex $topdir/full --kill-long-queries-timeout=3 \
453+ --kill-long-query-type=all
454+
455+bg_wait_fail $bg_select_pid
456+bg_wait_fail $bg_update_pid
457+kill_all_queries
458+
459+
460+# ==============================================================
461+vlog "===================== case 4 ====================="
462+bg_run bg_select_pid "mysql_select 200"
463+bg_run bg_update_pid "mysql_update 200"
464+
465+sleep 1
466+
467+run_cmd_expect_failure ${IB_BIN} ${IB_ARGS} $topdir/full \
468+ --lock-wait-timeout=3 \
469+ --lock-wait-query-type=all \
470+ --lock-wait-threshold=1 \
471+ --kill-long-queries-timeout=1 \
472+ --kill-long-query-type=all
473+
474+bg_kill_ok $bg_select_pid
475+bg_kill_ok $bg_update_pid
476+kill_all_queries
477+
478+
479+# ==============================================================
480+vlog "===================== case 5 ====================="
481+bg_run bg_select_pid "mysql_select 200"
482+bg_run bg_update_pid "mysql_update 200"
483+
484+sleep 2
485+
486+run_cmd_expect_failure ${IB_BIN} ${IB_ARGS} $topdir/full \
487+ --lock-wait-timeout=3 \
488+ --lock-wait-query-type=update \
489+ --lock-wait-threshold=2 \
490+ --kill-long-queries-timeout=1 \
491+ --kill-long-query-type=all
492+
493+bg_kill_ok $bg_select_pid
494+bg_kill_ok $bg_update_pid
495+kill_all_queries
496+
497+
498+# ==============================================================
499+vlog "===================== case 6 ====================="
500+bg_run bg_update_pid "mysql_update 5"
501+bg_run bg_select_pid "sleep 1" "mysql_select 200"
502+
503+sleep 2
504+
505+innobackupex $topdir/full \
506+ --lock-wait-timeout=6 \
507+ --lock-wait-query-type=update \
508+ --lock-wait-threshold=2 \
509+ --kill-long-queries-timeout=1 \
510+ --kill-long-query-type=all
511+
512+bg_wait_fail $bg_select_pid
513+bg_wait_ok $bg_update_pid
514+kill_all_queries

Subscribers

People subscribed via source and target branches