Merge lp:~stewart/percona-xtrabackup/2.0-parallel-test into lp:percona-xtrabackup/2.0

Proposed by Stewart Smith
Status: Superseded
Proposed branch: lp:~stewart/percona-xtrabackup/2.0-parallel-test
Merge into: lp:percona-xtrabackup/2.0
Diff against target: 555 lines (+445/-12)
10 files modified
test/inc/common.sh (+1/-0)
test/run.sh (+14/-0)
test/t/xb_incremental_compressed.inc (+0/-5)
test/t/xb_incremental_compressed_16kb.sh (+2/-0)
test/t/xb_incremental_compressed_1kb.sh (+2/-0)
test/t/xb_incremental_compressed_2kb.sh (+2/-0)
test/t/xb_incremental_compressed_4kb.sh (+2/-0)
test/t/xb_incremental_compressed_8kb.sh (+2/-0)
test/testrun.c (+405/-0)
test/testrun.sh (+15/-7)
To merge this branch: bzr merge lp:~stewart/percona-xtrabackup/2.0-parallel-test
Reviewer                    Review Type  Date Requested  Status
Alexey Kopytov (community)                               Needs Fixing
Review via email: mp+133406@code.launchpad.net

This proposal has been superseded by a proposal from 2013-01-08.

Description of the change

Introduce a parallel test runner to XtraBackup.

This *dramatically* reduces the time a build through Jenkins takes. An individual build+test run can now take about 10 minutes instead of 30 to 50 minutes. Multiplied by 100 or so, this is a big improvement.

I've made the parallel test runner execute using exactly the same commands as the non-parallel one, so the Jenkins jobs don't need to be changed.

The parallel runner is just simple straight C and likely builds on any POSIX system released in the past twenty years.
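
For readers unfamiliar with the pattern, the core of such a runner is a fork/pipe pair: the parent starts each test in a child process and reads its output back through a pipe. The sketch below is a minimal illustration under stated assumptions, not code from testrun.c; the helper name run_and_capture and the use of /bin/sh -c are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not part of testrun.c): run `cmd` through /bin/sh,
 * copy up to outsz-1 bytes of its stdout into `out`, and return the
 * child's exit status, or -1 on error or abnormal termination. */
static int run_and_capture(const char *cmd, char *out, size_t outsz)
{
  int fd[2];
  pid_t pid;
  size_t used= 0;
  ssize_t r;
  int status;

  assert(cmd != NULL && out != NULL && outsz > 0);

  if (pipe(fd) == -1)
    return -1;

  pid= fork();
  if (pid == -1)
  {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }

  if (pid == 0)
  {
    /* child: send stdout into the pipe, then exec the command */
    close(fd[0]);
    dup2(fd[1], STDOUT_FILENO);
    close(fd[1]);
    execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
    _exit(127); /* only reached if exec failed */
  }

  /* parent: read until EOF (child closed its end) or the buffer is full */
  close(fd[1]);
  while ((r= read(fd[0], out + used, outsz - 1 - used)) > 0)
    used+= (size_t)r;
  out[used]= '\0';
  close(fd[0]);

  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Calling run_and_capture("printf ok", buf, sizeof(buf)) should leave "ok" in buf and return 0. A runner for several jobs would keep one such pipe per child and multiplex the read ends with select(), which is what the proposed testrun.c does.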

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/291/

Stewart Smith (stewart) wrote :

Note that I'm still tweaking the algorithm that automatically picks how many concurrent jobs to run. I've found 1.0 * NRCPUS to work better than 1.5 * NRCPUS on our Jenkins cluster, but we still seem to hit a couple of timeouts. That could be fixed either by increasing the timeout (currently 10 minutes) or by reducing parallelism.
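
The trade-off between CPU count and memory can be made concrete with a small function. This is only a sketch: pick_njobs, cpu_factor and mb_per_job are hypothetical names, and the policy of capping jobs at half of physical memory follows the values visible in the proposed testrun.c (128 MiB per job, half of RAM) rather than any tuned result.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: choose a job count from the CPU count and the
 * amount of physical memory. cpu_factor is the multiplier under
 * discussion (1.0, 1.5, ...); mb_per_job is an assumed memory budget
 * for one running test. */
static long pick_njobs(long ncpus, double cpu_factor,
                       long phys_pages, long page_size, long mb_per_job)
{
  long njobs;
  long pages_per_job;
  long usable_pages;

  assert(page_size > 0);

  njobs= (long)(ncpus * cpu_factor);
  pages_per_job= (mb_per_job * 1024L * 1024L) / page_size;
  usable_pages= phys_pages / 2;   /* leave half of RAM for everything else */

  /* if the CPU-based count would not fit in memory, scale it down */
  if (pages_per_job > 0 && pages_per_job * njobs > usable_pages)
    njobs= usable_pages / pages_per_job;

  /* always run at least one job */
  return njobs < 1 ? 1 : njobs;
}
```

On a real host the inputs would come from sysconf(_SC_NPROCESSORS_ONLN), sysconf(_SC_PHYS_PAGES) and sysconf(_SC_PAGESIZE). Keeping the multiplier as a parameter makes it easy to compare 1.0 and 1.5 without recompiling the policy itself.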

Stewart Smith (stewart) wrote :

> http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/295/ was aborted?

Just kicked off http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/304/ and hopefully it goes better.

Alexey Kopytov (akopytov) wrote :

Still looks worse than the current non-parallel builds:

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/304/ (273 failures)

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/305/ (320 failures, but it actually looks cleaner, because in 304 many hosts failed to execute tests due to "./run.sh: illegal option -- f")

review: Needs Fixing

Preview Diff

=== modified file 'test/inc/common.sh'
--- test/inc/common.sh 2012-11-12 05:06:16 +0000
+++ test/inc/common.sh 2013-01-08 06:28:21 +0000
@@ -65,6 +65,7 @@
         then
             vlog "Found a leftover mysqld processes with PID `cat $file`, stopping it"
             kill -9 `cat $file` 2>/dev/null || true
+            rm -f $file
         fi
     done
 }

=== added file 'test/run.sh'
--- test/run.sh 1970-01-01 00:00:00 +0000
+++ test/run.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+XB_BUILD="autodetect"
+while getopts "fc:" options; do
+    case $options in
+        c ) XB_BUILD="$OPTARG";;
+        f ) ;; # ignored
+    esac
+done
+
+rm -rf results/ var/ test_results.subunit
+
+CFLAGS=-g make testrun
+./testrun -c $XB_BUILD

=== renamed file 'test/t/xb_incremental_compressed.sh' => 'test/t/xb_incremental_compressed.inc'
--- test/t/xb_incremental_compressed.sh 2012-07-27 11:56:50 +0000
+++ test/t/xb_incremental_compressed.inc 2013-01-08 06:28:21 +0000
@@ -181,8 +181,3 @@

     stop_server
 }
-
-for page_size in 1 2 4 8 16; do
-    test_incremental_compressed ${page_size}
-    clean
-done

=== added file 'test/t/xb_incremental_compressed_16kb.sh'
--- test/t/xb_incremental_compressed_16kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_16kb.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 16

=== added file 'test/t/xb_incremental_compressed_1kb.sh'
--- test/t/xb_incremental_compressed_1kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_1kb.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 1

=== added file 'test/t/xb_incremental_compressed_2kb.sh'
--- test/t/xb_incremental_compressed_2kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_2kb.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 2

=== added file 'test/t/xb_incremental_compressed_4kb.sh'
--- test/t/xb_incremental_compressed_4kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_4kb.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 4

=== added file 'test/t/xb_incremental_compressed_8kb.sh'
--- test/t/xb_incremental_compressed_8kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_8kb.sh 2013-01-08 06:28:21 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 8

=== added file 'test/testrun.c'
--- test/testrun.c 1970-01-01 00:00:00 +0000
+++ test/testrun.c 2013-01-08 06:28:21 +0000
@@ -0,0 +1,405 @@
+/*
+ * testrun.c - a parallel test runner for the XtraBackup test suite
+ */
+/* BEGIN LICENSE
+ * Copyright (C) 2012 Percona Inc.
+ *
+ * Written by Stewart Smith
+ *
+ * This program is free software: you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2, as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranties of
+ * MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
+ * PURPOSE. See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ * END LICENSE */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <string.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <sys/select.h>
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+#include <assert.h>
+
+#include <signal.h>
+
+struct testcase
+{
+  char *name;
+  char *buf;
+  size_t end_of_buf;
+  size_t bufsz;
+};
+
+pid_t *childpid;
+int nchildpid=0;
+
+static void kill_children(int sig)
+{
+  int i;
+  int status;
+
+  (void)sig;
+
+  fprintf(stderr, "Killing child processes...\n");
+
+  for(i=0; i<nchildpid; i++)
+    if(childpid[i] > 0)
+    {
+      kill(childpid[i], SIGKILL);
+      wait(&status);
+    }
+
+  exit(EXIT_SUCCESS);
+}
+
+int collect_testcases_filter(const struct dirent *a)
+{
+  int l;
+
+  if (a->d_name[0] == '.')
+    return 0;
+
+  l= strlen(a->d_name);
+
+  if (l > 2 && strncmp(a->d_name + l - 3, ".sh", 3)==0)
+    return 1;
+
+  return 0;
+}
+
+static int collect_testcases(const char* suitedir, struct testcase **cases)
+{
+  struct dirent **namelist;
+  int n;
+  int i;
+
+  n= scandir(suitedir, &namelist, collect_testcases_filter, alphasort);
+
+  *cases= (struct testcase*) malloc(sizeof(struct testcase)*n);
+
+  for(i=0; i<n; i++)
+  {
+    (*cases)[i].name= strdup(namelist[i]->d_name);
+    (*cases)[i].buf= NULL;
+    (*cases)[i].end_of_buf= 0;
+    (*cases)[i].bufsz= 0;
+    free(namelist[i]);
+  }
+
+  free(namelist);
+
+  return n;
+}
+
+static void free_testcases(struct testcase *cases, int n)
+{
+  while (n>0)
+  {
+    free(cases[--n].name);
+  }
+  free(cases);
+}
+
+static int run_testcase_in_child(int nr, struct testcase *t, pid_t *cpid, const char* xbtarget)
+{
+  int fd[2];
+
+  printf("[%d] LAUNCHING - %s\n", nr, t->name);
+
+  if (pipe(fd) == -1)
+  {
+    perror("pipe");
+    exit(EXIT_FAILURE);
+  }
+
+  *cpid= fork();
+  if (*cpid == 0)
+  {
+    /* child */
+    close(fd[0]);
+
+    char tname[500];
+    snprintf(tname, sizeof(tname), "t/%s",t->name);
+
+    char basedir[PATH_MAX];
+    char cwd[PATH_MAX];
+    snprintf(basedir, sizeof(basedir), "%s/var/%d", getcwd(cwd,sizeof(cwd)), nr);
+
+    mkdir("var",0700);
+    mkdir(basedir,0700);
+
+    char logname[PATH_MAX];
+    snprintf(logname, sizeof(logname), "%s/var/%d.log", getcwd(cwd,sizeof(cwd)), nr);
+
+    int logfd= open(logname, O_WRONLY|O_APPEND|O_CREAT, 0600);
+    dup2(logfd, STDOUT_FILENO);
+    dup2(logfd, STDERR_FILENO);
+    close(logfd);
+
+    char subunitfd[50];
+    snprintf(subunitfd, sizeof(subunitfd), "/dev/fd/%d", fd[1]);
+
+    char* xbtarget_param;
+    if (xbtarget)
+      xbtarget_param= strdup(xbtarget);
+    else
+      xbtarget_param= NULL;
+
+    char *const newargv[] = {"testrun.sh", "-n",
+                             "-t", tname,
+                             "-b", basedir,
+                             "-r", subunitfd,
+                             (xbtarget)? "-c" : NULL, xbtarget_param,
+                             NULL };
+    char *newenviron[] = { NULL };
+    execve(newargv[0], newargv, newenviron);
+    perror("execve");
+    exit(EXIT_FAILURE);
+  }
+  else
+  {
+    /* parent */
+    close(fd[1]);
+    fcntl(fd[0], F_SETFL, O_NONBLOCK);
+    return fd[0];
+  }
+}
+
+static inline void subunit_progress_sign(int fd, int n, char *sign)
+{
+  char *buf;
+  const char* fmt= "progress: %s%d\n";
+  size_t sz= 1+strlen(fmt)+100;
+  size_t l;
+
+  buf= (char*)malloc(sz);
+
+  l= snprintf(buf, sz, fmt, sign, n);
+  assert(l < sz);
+
+  write(fd, buf, l);
+
+  free(buf);
+}
+
+static void run_testcases(struct testcase *testcases, int nrcases,
+                          int njobs, int timeout, const char* xbtarget)
+{
+  int childfd[njobs];
+  int nfds= 0;
+  int retval;
+  pid_t chpid[njobs];
+  int status;
+  int next_testcase= 0;
+  int i;
+  fd_set rfds;
+  fd_set efds;
+  struct timeval tv;
+  int nchildren;
+  int childtest[njobs];
+
+  int subunitfd= open("test_results.subunit", O_TRUNC|O_WRONLY|O_APPEND|O_CREAT, 0600);
+  subunit_progress_sign(subunitfd, nrcases, "");
+
+  if (nrcases < njobs)
+    njobs= nrcases;
+
+  childpid= chpid;
+  nchildpid= njobs;
+
+  for(i=0; i<njobs; i++)
+  {
+    childtest[i]=next_testcase++;
+    childfd[i]= run_testcase_in_child(i, &testcases[childtest[i]], &childpid[i], xbtarget);
+  }
+
+  fflush(stdout);
+
+loop:
+  FD_ZERO(&efds);
+  FD_ZERO(&rfds);
+
+  nchildren=0;
+
+  for (i=0; i<njobs; i++)
+  {
+    if (childfd[i] != -1)
+    {
+      FD_SET(childfd[i], &efds);
+      FD_SET(childfd[i], &rfds);
+      nfds= (childfd[i] > nfds)? childfd[i] : nfds;
+      nchildren++;
+    }
+  }
+
+  tv.tv_sec= timeout;
+  tv.tv_usec= 0;
+
+  retval = select(nfds+1, &rfds, NULL, &efds, &tv);
+
+  if (retval == -1)
+    perror("select()");
+  else if (retval)
+  {
+    int childexited=0;
+
+    for (i=0; i<njobs; i++)
+    {
+      if(childfd[i] != -1
+         && (FD_ISSET(childfd[i], &efds) || FD_ISSET(childfd[i], &rfds)))
+      {
+        ssize_t r=0;
+
+        do {
+          struct testcase *t= &testcases[childtest[i]];
+
+          if(t->bufsz == t->end_of_buf)
+          {
+            t->bufsz+=4000;
+            t->buf= (char*)realloc(t->buf, t->bufsz);
+          }
+
+          r= read(childfd[i], t->buf+t->end_of_buf, t->bufsz - t->end_of_buf);
+          if (r>0)
+            t->end_of_buf+= r;
+        } while(r>0);
+
+        pid_t waited= waitpid(childpid[i], &status, WNOHANG);
+        if (!(WIFEXITED(status) || WIFSIGNALED(status)))
+          continue;
+
+        if (waited != childpid[i])
+          continue;
+
+        write(subunitfd, testcases[childtest[i]].buf,
+              testcases[childtest[i]].end_of_buf);
+
+        close(childfd[i]);
+        printf("[%d] completed %s status %d\n",
+               i, testcases[childtest[i]].name, WEXITSTATUS(status));
+        childfd[i]=-1;
+        nchildren--;
+
+        if (next_testcase < nrcases)
+        {
+          childtest[i]=next_testcase++;
+          childfd[i]= run_testcase_in_child(i, &testcases[childtest[i]], &childpid[i], xbtarget);
+          nfds= (childfd[i] > nfds)? childfd[i] : nfds;
+          nchildren++;
+        }
+        printf("\nnrchildren= %d, %d tests remaining\n",
+               nchildren, nrcases-next_testcase);
+        childexited=1;
+        fflush(stdout);
+      }
+    }
+    if (childexited)
+    {
+      printf ("Running: ");
+      for(i=0; i<njobs; i++)
+        if (childfd[i] != -1)
+          printf("%s ",testcases[childtest[i]].name);
+      printf("\n");
+    }
+  }
+  else
+  {
+    printf("Timeout\n");
+    kill_children(SIGKILL);
+    exit(EXIT_FAILURE);
+  }
+
+  if (nchildren==0)
+    goto end;
+
+  goto loop;
+
+end:
+
+  close(subunitfd);
+  return;
+}
+
+int main(int argc, char* argv[])
+{
+  const char* suitedir= "t/";
+  int njobs= 4;
+  int opt;
+  int nrcases;
+  struct testcase *testcases;
+  int timeout= 600;
+  const char* xbtarget= NULL;
+
+  struct sigaction sa;
+
+  sa.sa_flags= 0;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_handler = kill_children;
+  sigaction(SIGHUP, &sa, NULL);
+  sigaction(SIGINT, &sa, NULL);
+  sigaction(SIGQUIT, &sa, NULL);
+
+#ifdef _SC_NPROCESSORS_ONLN
+  njobs= sysconf(_SC_NPROCESSORS_ONLN) /2;
+#endif
+
+#ifdef _SC_PHYS_PAGES
+  long nrpages= sysconf(_SC_PHYS_PAGES);
+  long pagesize= sysconf(_SC_PAGESIZE);
+  long pages_per_job= (128*(1 << 20)) / pagesize;
+  nrpages= nrpages/2;
+  if ((pages_per_job * njobs) > nrpages)
+    njobs= nrpages / pages_per_job;
+#endif
+
+  if (njobs == 0)
+    njobs= 1;
+
+  while ((opt = getopt(argc, argv, "j:s:t:c:")) != -1)
+  {
+    switch (opt) {
+    case 'c':
+      xbtarget= optarg;
+      break;
+    case 's':
+      suitedir= optarg;
+      break;
+    case 'j':
+      njobs= atoi(optarg);
+      break;
+    case 't':
+      timeout= atoi(optarg);
+      break;
+    default:
+      fprintf(stderr, "Usage: %s [-s suite] [-j parallel] [-t timeout]\n",
+              argv[0]);
+      exit(EXIT_FAILURE);
+    }
+  }
+
+  printf("%s running with: -s %s -j %d -t %d\n\n",
+         argv[0], suitedir,njobs, timeout);
+
+  nrcases= collect_testcases(suitedir, &testcases);
+
+  printf("Found %d testcases\n", nrcases);
+
+  run_testcases(testcases, nrcases, njobs, timeout, xbtarget);
+
+  free_testcases(testcases, nrcases);
+
+  return 0;
+}

=== renamed file 'test/run.sh' => 'test/testrun.sh'
--- test/run.sh 2012-10-27 16:18:44 +0000
+++ test/testrun.sh 2013-01-08 06:28:21 +0000
@@ -9,8 +9,6 @@
 set +e

 result=0
-rm -rf results
-mkdir results

 function usage()
 {
@@ -56,7 +54,7 @@

 function set_vars()
 {
-    TEST_BASEDIR="$PWD"
+    TEST_BASEDIR=${TEST_BASEDIR:-"$PWD"}
     MYSQL_BASEDIR=${MYSQL_BASEDIR:-"$PWD/server"}
     PORT_BASE=$((3306 + $RANDOM))

@@ -220,7 +218,10 @@
 XTRACE_OPTION=""
 XB_BUILD="autodetect"
 force=""
-while getopts "fgh?:t:s:d:c:" options; do
+KEEP_RESULTS=0
+SUBUNIT_OUT=test_results.subunit
+
+while getopts "fgh?:t:s:d:c:b:nr:" options; do
     case $options in
         f ) force="yes";;
         t ) tname="$OPTARG";;
@@ -228,12 +229,22 @@
         h ) usage; exit;;
         s ) tname="$OPTARG/*.sh";;
         d ) export MYSQL_BASEDIR="$OPTARG";;
+        b ) TEST_BASEDIR="$OPTARG";;
         c ) XB_BUILD="$OPTARG";;
+        n ) KEEP_RESULTS=1;;
+        r ) SUBUNIT_OUT="$OPTARG";;
         ? ) echo "Use \`$0 -h' for the list of available options."
             exit -1;;
     esac
 done

+if [ $KEEP_RESULTS -eq 0 ];
+then
+    rm -rf results
+    rm -f $SUBUNIT_OUT
+fi
+mkdir results
+
 set_vars

 if [ -n "$tname" ]
@@ -268,9 +279,6 @@

 source subunit.sh

-SUBUNIT_OUT=test_results.subunit
-rm -f $SUBUNIT_OUT
-
 echo "========================================================================"

 for t in $tests
