Merge lp:~stewart/percona-xtrabackup/2.0-parallel-test into lp:percona-xtrabackup/2.0

Proposed by Stewart Smith
Status: Merged
Approved by: Alexey Kopytov
Approved revision: no longer in the source branch.
Merged at revision: 493
Proposed branch: lp:~stewart/percona-xtrabackup/2.0-parallel-test
Merge into: lp:percona-xtrabackup/2.0
Diff against target: 555 lines (+445/-12)
10 files modified
test/inc/common.sh (+1/-0)
test/run.sh (+14/-0)
test/t/xb_incremental_compressed.inc (+0/-5)
test/t/xb_incremental_compressed_16kb.sh (+2/-0)
test/t/xb_incremental_compressed_1kb.sh (+2/-0)
test/t/xb_incremental_compressed_2kb.sh (+2/-0)
test/t/xb_incremental_compressed_4kb.sh (+2/-0)
test/t/xb_incremental_compressed_8kb.sh (+2/-0)
test/testrun.c (+405/-0)
test/testrun.sh (+15/-7)
To merge this branch: bzr merge lp:~stewart/percona-xtrabackup/2.0-parallel-test
Reviewer                      Review Type   Date Requested   Status
Alexey Kopytov (community)    -             -                Approve
Review via email: mp+142414@code.launchpad.net

This proposal supersedes a proposal from 2012-11-08.

Description of the change

Introduce a parallel test runner to XtraBackup.

This *dramatically* reduces the time a build through Jenkins takes. An individual build+test now takes about 10 minutes instead of 30 to 50 minutes. Multiplied by 100 or so, this is a big improvement.

I've made the parallel test runner accept exactly the same commands as the non-parallel one, so the Jenkins jobs don't need to change.

The parallel runner is plain C and should build on any POSIX system released in the past twenty years.
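
The core of a runner like this is small. A minimal sketch, assuming two hypothetical helpers `spawn_with_pipe` and `drain_and_reap` (illustrative names, not taken from testrun.c), of how a POSIX program might launch one job and capture what it writes to stdout:

```c
#include <assert.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: start `path` with `argv` and return the read end of
 * a pipe connected to the child's stdout, or -1 on error. */
static int spawn_with_pipe(const char *path, char *const argv[], pid_t *pid)
{
  int fd[2];

  if (pipe(fd) == -1)
    return -1;

  *pid = fork();
  if (*pid == -1)
  {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }

  if (*pid == 0)
  {
    /* child: route stdout into the pipe, then become the job */
    close(fd[0]);
    dup2(fd[1], STDOUT_FILENO);
    close(fd[1]);
    execv(path, argv);
    _exit(127); /* only reached if execv() failed */
  }

  /* parent: keep only the read end */
  close(fd[1]);
  return fd[0];
}

/* Hypothetical helper: read until EOF (or until buf is full) into a
 * NUL-terminated buffer, then reap the child. Returns the child's exit
 * status, or -1 if it did not exit normally. */
static int drain_and_reap(int fd, pid_t pid, char *buf, size_t bufsz)
{
  size_t used = 0;
  ssize_t r;
  int status;

  assert(bufsz > 0);

  while (used + 1 < bufsz &&
         (r = read(fd, buf + used, bufsz - 1 - used)) > 0)
    used += (size_t)r;
  buf[used] = '\0';
  close(fd);

  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

A parallel runner would hold several such descriptors at once and multiplex them (for example with select()) instead of draining one child to completion as this sketch does.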

New Jenkins run (with fix for test suite killing unrelated processes):
http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/323/

Once I've merged in the other few fixes for running tests, I'll do another Jenkins run.

Stewart Smith (stewart) wrote : Posted in a previous version of this proposal

Note that I'm still tweaking the algorithm that automatically picks how many concurrent jobs to execute. I've found 1.0 * NRCPUS to work better than 1.5 * NRCPUS on our Jenkins cluster, but it seems we may still have hit a couple of timeouts. This could be fixed by either increasing the timeout (10 minutes) or reducing parallelism.
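
For illustration, a sketch of one way such a heuristic could be written. The helper names `pick_njobs` and `default_njobs` are hypothetical, as are the scaling factor and memory budget parameters. For comparison, the testrun.c in this proposal's diff starts from half the online CPUs and budgets 128 MB per job against half of physical memory.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: scale the CPU count by `cpu_factor`, then cap the
 * result so that njobs * bytes_per_job fits into `mem_bytes`.
 * Always returns at least 1. */
static int pick_njobs(long ncpus, double cpu_factor,
                      long long mem_bytes, long long bytes_per_job)
{
  long njobs;
  long long mem_cap;

  assert(bytes_per_job > 0);

  njobs = (long)(ncpus * cpu_factor);
  mem_cap = mem_bytes / bytes_per_job;
  if (njobs > mem_cap)
    njobs = (long)mem_cap;

  return njobs < 1 ? 1 : (int)njobs;
}

/* Hypothetical wrapper: feed pick_njobs() from sysconf() where the
 * (non-standard but common) names are available. */
static int default_njobs(double cpu_factor, long long bytes_per_job)
{
  long ncpus = 1;
  long long mem_bytes = bytes_per_job; /* fallback: room for one job */

#ifdef _SC_NPROCESSORS_ONLN
  ncpus = sysconf(_SC_NPROCESSORS_ONLN);
  if (ncpus < 1)
    ncpus = 1;
#endif
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  {
    long pages = sysconf(_SC_PHYS_PAGES);
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pages > 0 && pagesize > 0)
      mem_bytes = (long long)pages * pagesize;
  }
#endif

  return pick_njobs(ncpus, cpu_factor, mem_bytes, bytes_per_job);
}
```

Keeping the arithmetic in a pure function makes it easy to try factors such as 1.0 and 1.5 without touching the machine probing code.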

Stewart Smith (stewart) wrote : Posted in a previous version of this proposal

> http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/295/ was aborted?

Just kicked off http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/304/ and hopefully it goes better.

Alexey Kopytov (akopytov) wrote : Posted in a previous version of this proposal

Still looks worse than the current non-parallel builds:

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/304/ (273 failures)

http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/305/ (320 failures, but it actually looks cleaner, because in 304 many hosts failed to execute tests due to "./run.sh: illegal option -- f")

review: Needs Fixing
Stewart Smith (stewart) wrote :

New Jenkins run (with fix for test suite killing unrelated processes):
http://jenkins.percona.com/view/XtraBackup/job/percona-xtrabackup-2.0-param/323/

Once I've merged in the other few fixes for running tests, I'll do another Jenkins run.
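
The test/inc/common.sh hunk in this proposal's diff removes a leftover PID file right after killing the process it names, which is presumably part of that fix: a stale PID file can otherwise lead a later cleanup pass to signal an unrelated process that has reused the PID. A sketch of the same pattern in C, assuming a hypothetical helper name `kill_from_pidfile`:

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: kill the process named in `pidfile`, then delete the
 * file so that a later cleanup pass cannot signal a recycled PID.
 * Returns 0 if the file was read and removed, -1 otherwise. */
static int kill_from_pidfile(const char *pidfile)
{
  FILE *f;
  long pid = 0;
  int parsed;

  assert(pidfile != NULL);

  f = fopen(pidfile, "r");
  if (f == NULL)
    return -1;
  parsed = fscanf(f, "%ld", &pid);
  fclose(f);

  /* Never signal PID 0, 1 or negative values (process groups). */
  if (parsed == 1 && pid > 1)
    kill((pid_t)pid, SIGKILL); /* failure is fine: it may already be gone */

  return unlink(pidfile);
}
```

Removing the file in the same step as the kill is the important part; the signal itself is best effort.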

Alexey Kopytov (akopytov) :
review: Approve

Preview Diff

=== modified file 'test/inc/common.sh'
--- test/inc/common.sh 2012-11-12 05:06:16 +0000
+++ test/inc/common.sh 2013-01-08 22:35:30 +0000
@@ -65,6 +65,7 @@
     then
         vlog "Found a leftover mysqld processes with PID `cat $file`, stopping it"
         kill -9 `cat $file` 2>/dev/null || true
+        rm -f $file
     fi
 done
 }
=== added file 'test/run.sh'
--- test/run.sh 1970-01-01 00:00:00 +0000
+++ test/run.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+XB_BUILD="autodetect"
+while getopts "fc:" options; do
+    case $options in
+        c ) XB_BUILD="$OPTARG";;
+        f ) ;; # ignored
+    esac
+done
+
+rm -rf results/ var/ test_results.subunit
+
+CFLAGS=-g make testrun
+./testrun -c $XB_BUILD
=== renamed file 'test/t/xb_incremental_compressed.sh' => 'test/t/xb_incremental_compressed.inc'
--- test/t/xb_incremental_compressed.sh 2012-07-27 11:56:50 +0000
+++ test/t/xb_incremental_compressed.inc 2013-01-08 22:35:30 +0000
@@ -181,8 +181,3 @@
 
     stop_server
 }
-
-for page_size in 1 2 4 8 16; do
-    test_incremental_compressed ${page_size}
-    clean
-done
=== added file 'test/t/xb_incremental_compressed_16kb.sh'
--- test/t/xb_incremental_compressed_16kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_16kb.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 16
=== added file 'test/t/xb_incremental_compressed_1kb.sh'
--- test/t/xb_incremental_compressed_1kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_1kb.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 1
=== added file 'test/t/xb_incremental_compressed_2kb.sh'
--- test/t/xb_incremental_compressed_2kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_2kb.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 2
=== added file 'test/t/xb_incremental_compressed_4kb.sh'
--- test/t/xb_incremental_compressed_4kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_4kb.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 4
=== added file 'test/t/xb_incremental_compressed_8kb.sh'
--- test/t/xb_incremental_compressed_8kb.sh 1970-01-01 00:00:00 +0000
+++ test/t/xb_incremental_compressed_8kb.sh 2013-01-08 22:35:30 +0000
@@ -0,0 +1,2 @@
+source t/xb_incremental_compressed.inc
+test_incremental_compressed 8
=== added file 'test/testrun.c'
--- test/testrun.c 1970-01-01 00:00:00 +0000
+++ test/testrun.c 2013-01-08 22:35:30 +0000
@@ -0,0 +1,405 @@
+/*
+ * testrun.c - a parallel test runner for the XtraBackup test suite
+ */
+/* BEGIN LICENSE
+ * Copyright (C) 2012 Percona Inc.
+ *
+ * Written by Stewart Smith
+ *
+ * This program is free software: you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2, as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranties of
+ * MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
+ * PURPOSE. See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ * END LICENSE */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <string.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <sys/select.h>
+
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+#include <assert.h>
+
+#include <signal.h>
+
+struct testcase
+{
+  char *name;
+  char *buf;
+  size_t end_of_buf;
+  size_t bufsz;
+};
+
+pid_t *childpid;
+int nchildpid=0;
+
+static void kill_children(int sig)
+{
+  int i;
+  int status;
+
+  (void)sig;
+
+  fprintf(stderr, "Killing child processes...\n");
+
+  for(i=0; i<nchildpid; i++)
+    if(childpid[i] > 0)
+    {
+      kill(childpid[i], SIGKILL);
+      wait(&status);
+    }
+
+  exit(EXIT_SUCCESS);
+}
+
+int collect_testcases_filter(const struct dirent *a)
+{
+  int l;
+
+  if (a->d_name[0] == '.')
+    return 0;
+
+  l= strlen(a->d_name);
+
+  if (l > 2 && strncmp(a->d_name + l - 3, ".sh", 3)==0)
+    return 1;
+
+  return 0;
+}
+
+static int collect_testcases(const char* suitedir, struct testcase **cases)
+{
+  struct dirent **namelist;
+  int n;
+  int i;
+
+  n= scandir(suitedir, &namelist, collect_testcases_filter, alphasort);
+
+  *cases= (struct testcase*) malloc(sizeof(struct testcase)*n);
+
+  for(i=0; i<n; i++)
+  {
+    (*cases)[i].name= strdup(namelist[i]->d_name);
+    (*cases)[i].buf= NULL;
+    (*cases)[i].end_of_buf= 0;
+    (*cases)[i].bufsz= 0;
+    free(namelist[i]);
+  }
+
+  free(namelist);
+
+  return n;
+}
+
+static void free_testcases(struct testcase *cases, int n)
+{
+  while (n>0)
+  {
+    free(cases[--n].name);
+  }
+  free(cases);
+}
+
+static int run_testcase_in_child(int nr, struct testcase *t, pid_t *cpid, const char* xbtarget)
+{
+  int fd[2];
+
+  printf("[%d] LAUNCHING - %s\n", nr, t->name);
+
+  if (pipe(fd) == -1)
+  {
+    perror("pipe");
+    exit(EXIT_FAILURE);
+  }
+
+  *cpid= fork();
+  if (*cpid == 0)
+  {
+    /* child */
+    close(fd[0]);
+
+    char tname[500];
+    snprintf(tname, sizeof(tname), "t/%s",t->name);
+
+    char basedir[PATH_MAX];
+    char cwd[PATH_MAX];
+    snprintf(basedir, sizeof(basedir), "%s/var/%d", getcwd(cwd,sizeof(cwd)), nr);
+
+    mkdir("var",0700);
+    mkdir(basedir,0700);
+
+    char logname[PATH_MAX];
+    snprintf(logname, sizeof(logname), "%s/var/%d.log", getcwd(cwd,sizeof(cwd)), nr);
+
+    int logfd= open(logname, O_WRONLY|O_APPEND|O_CREAT, 0600);
+    dup2(logfd, STDOUT_FILENO);
+    dup2(logfd, STDERR_FILENO);
+    close(logfd);
+
+    char subunitfd[50];
+    snprintf(subunitfd, sizeof(subunitfd), "/dev/fd/%d", fd[1]);
+
+    char* xbtarget_param;
+    if (xbtarget)
+      xbtarget_param= strdup(xbtarget);
+    else
+      xbtarget_param= NULL;
+
+    char *const newargv[] = {"testrun.sh", "-n",
+                             "-t", tname,
+                             "-b", basedir,
+                             "-r", subunitfd,
+                             (xbtarget)? "-c" : NULL, xbtarget_param,
+                             NULL };
+    char *newenviron[] = { NULL };
+    execve(newargv[0], newargv, newenviron);
+    perror("execve");
+    exit(EXIT_FAILURE);
+  }
+  else
+  {
+    /* parent */
+    close(fd[1]);
+    fcntl(fd[0], F_SETFL, O_NONBLOCK);
+    return fd[0];
+  }
+}
+
+static inline void subunit_progress_sign(int fd, int n, char *sign)
+{
+  char *buf;
+  const char* fmt= "progress: %s%d\n";
+  size_t sz= 1+strlen(fmt)+100;
+  size_t l;
+
+  buf= (char*)malloc(sz);
+
+  l= snprintf(buf, sz, fmt, sign, n);
+  assert(l < sz);
+
+  write(fd, buf, l);
+
+  free(buf);
+}
+
+static void run_testcases(struct testcase *testcases, int nrcases,
+                          int njobs, int timeout, const char* xbtarget)
+{
+  int childfd[njobs];
+  int nfds= 0;
+  int retval;
+  pid_t chpid[njobs];
+  int status;
+  int next_testcase= 0;
+  int i;
+  fd_set rfds;
+  fd_set efds;
+  struct timeval tv;
+  int nchildren;
+  int childtest[njobs];
+
+  int subunitfd= open("test_results.subunit", O_TRUNC|O_WRONLY|O_APPEND|O_CREAT, 0600);
+  subunit_progress_sign(subunitfd, nrcases, "");
+
+  if (nrcases < njobs)
+    njobs= nrcases;
+
+  childpid= chpid;
+  nchildpid= njobs;
+
+  for(i=0; i<njobs; i++)
+  {
+    childtest[i]=next_testcase++;
+    childfd[i]= run_testcase_in_child(i, &testcases[childtest[i]], &childpid[i], xbtarget);
+  }
+
+  fflush(stdout);
+
+loop:
+  FD_ZERO(&efds);
+  FD_ZERO(&rfds);
+
+  nchildren=0;
+
+  for (i=0; i<njobs; i++)
+  {
+    if (childfd[i] != -1)
+    {
+      FD_SET(childfd[i], &efds);
+      FD_SET(childfd[i], &rfds);
+      nfds= (childfd[i] > nfds)? childfd[i] : nfds;
+      nchildren++;
+    }
+  }
+
+  tv.tv_sec= timeout;
+  tv.tv_usec= 0;
+
+  retval = select(nfds+1, &rfds, NULL, &efds, &tv);
+
+  if (retval == -1)
+    perror("select()");
+  else if (retval)
+  {
+    int childexited=0;
+
+    for (i=0; i<njobs; i++)
+    {
+      if(childfd[i] != -1
+         && (FD_ISSET(childfd[i], &efds) || FD_ISSET(childfd[i], &rfds)))
+      {
+        ssize_t r=0;
+
+        do {
+          struct testcase *t= &testcases[childtest[i]];
+
+          if(t->bufsz == t->end_of_buf)
+          {
+            t->bufsz+=4000;
+            t->buf= (char*)realloc(t->buf, t->bufsz);
+          }
+
+          r= read(childfd[i], t->buf+t->end_of_buf, t->bufsz - t->end_of_buf);
+          if (r>0)
+            t->end_of_buf+= r;
+        } while(r>0);
+
+        pid_t waited= waitpid(childpid[i], &status, WNOHANG);
+        if (!(WIFEXITED(status) || WIFSIGNALED(status)))
+          continue;
+
+        if (waited != childpid[i])
+          continue;
+
+        write(subunitfd, testcases[childtest[i]].buf,
+              testcases[childtest[i]].end_of_buf);
+
+        close(childfd[i]);
+        printf("[%d] completed %s status %d\n",
+               i, testcases[childtest[i]].name, WEXITSTATUS(status));
+        childfd[i]=-1;
+        nchildren--;
+
+        if (next_testcase < nrcases)
+        {
+          childtest[i]=next_testcase++;
+          childfd[i]= run_testcase_in_child(i, &testcases[childtest[i]], &childpid[i], xbtarget);
+          nfds= (childfd[i] > nfds)? childfd[i] : nfds;
+          nchildren++;
+        }
+        printf("\nnrchildren= %d, %d tests remaining\n",
+               nchildren, nrcases-next_testcase);
+        childexited=1;
+        fflush(stdout);
+      }
+    }
+    if (childexited)
+    {
+      printf ("Running: ");
+      for(i=0; i<njobs; i++)
+        if (childfd[i] != -1)
+          printf("%s ",testcases[childtest[i]].name);
+      printf("\n");
+    }
+  }
+  else
+  {
+    printf("Timeout\n");
+    kill_children(SIGKILL);
+    exit(EXIT_FAILURE);
+  }
+
+  if (nchildren==0)
+    goto end;
+
+  goto loop;
+
+end:
+
+  close(subunitfd);
+  return;
+}
+
+int main(int argc, char* argv[])
+{
+  const char* suitedir= "t/";
+  int njobs= 4;
+  int opt;
+  int nrcases;
+  struct testcase *testcases;
+  int timeout= 600;
+  const char* xbtarget= NULL;
+
+  struct sigaction sa;
+
+  sa.sa_flags= 0;
+  sigemptyset(&sa.sa_mask);
+  sa.sa_handler = kill_children;
+  sigaction(SIGHUP, &sa, NULL);
+  sigaction(SIGINT, &sa, NULL);
+  sigaction(SIGQUIT, &sa, NULL);
+
+#ifdef _SC_NPROCESSORS_ONLN
+  njobs= sysconf(_SC_NPROCESSORS_ONLN) /2;
+#endif
+
+#ifdef _SC_PHYS_PAGES
+  long nrpages= sysconf(_SC_PHYS_PAGES);
+  long pagesize= sysconf(_SC_PAGESIZE);
+  long pages_per_job= (128*(1 << 20)) / pagesize;
+  nrpages= nrpages/2;
+  if ((pages_per_job * njobs) > nrpages)
+    njobs= nrpages / pages_per_job;
+#endif
+
+  if (njobs == 0)
+    njobs= 1;
+
+  while ((opt = getopt(argc, argv, "j:s:t:c:")) != -1)
+  {
+    switch (opt) {
+    case 'c':
+      xbtarget= optarg;
+      break;
+    case 's':
+      suitedir= optarg;
+      break;
+    case 'j':
+      njobs= atoi(optarg);
+      break;
+    case 't':
+      timeout= atoi(optarg);
+      break;
+    default:
+      fprintf(stderr, "Usage: %s [-s suite] [-j parallel] [-t timeout]\n",
+              argv[0]);
+      exit(EXIT_FAILURE);
+    }
+  }
+
+  printf("%s running with: -s %s -j %d -t %d\n\n",
+         argv[0], suitedir,njobs, timeout);
+
+  nrcases= collect_testcases(suitedir, &testcases);
+
+  printf("Found %d testcases\n", nrcases);
+
+  run_testcases(testcases, nrcases, njobs, timeout, xbtarget);
+
+  free_testcases(testcases, nrcases);
+
+  return 0;
+}
=== renamed file 'test/run.sh' => 'test/testrun.sh'
--- test/run.sh 2012-10-27 16:18:44 +0000
+++ test/testrun.sh 2013-01-08 22:35:30 +0000
@@ -9,8 +9,6 @@
 set +e
 
 result=0
-rm -rf results
-mkdir results
 
 function usage()
 {
@@ -56,7 +54,7 @@
 
 function set_vars()
 {
-    TEST_BASEDIR="$PWD"
+    TEST_BASEDIR=${TEST_BASEDIR:-"$PWD"}
     MYSQL_BASEDIR=${MYSQL_BASEDIR:-"$PWD/server"}
     PORT_BASE=$((3306 + $RANDOM))
 
@@ -220,7 +218,10 @@
 XTRACE_OPTION=""
 XB_BUILD="autodetect"
 force=""
-while getopts "fgh?:t:s:d:c:" options; do
+KEEP_RESULTS=0
+SUBUNIT_OUT=test_results.subunit
+
+while getopts "fgh?:t:s:d:c:b:nr:" options; do
     case $options in
         f ) force="yes";;
         t ) tname="$OPTARG";;
@@ -228,12 +229,22 @@
         h ) usage; exit;;
         s ) tname="$OPTARG/*.sh";;
         d ) export MYSQL_BASEDIR="$OPTARG";;
+        b ) TEST_BASEDIR="$OPTARG";;
         c ) XB_BUILD="$OPTARG";;
+        n ) KEEP_RESULTS=1;;
+        r ) SUBUNIT_OUT="$OPTARG";;
         ? ) echo "Use \`$0 -h' for the list of available options."
             exit -1;;
     esac
 done
 
+if [ $KEEP_RESULTS -eq 0 ];
+then
+    rm -rf results
+    rm -f $SUBUNIT_OUT
+fi
+mkdir results
+
 set_vars
 
 if [ -n "$tname" ]
@@ -268,9 +279,6 @@
 
 source subunit.sh
 
-SUBUNIT_OUT=test_results.subunit
-rm -f $SUBUNIT_OUT
-
 echo "========================================================================"
 
 for t in $tests
