Merge lp:~vila/bzr/selftest-fixes into lp:~bzr/bzr/trunk-old

Proposed by Vincent Ladeuil
Status: Work in progress
Proposed branch: lp:~vila/bzr/selftest-fixes
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 170 lines (has conflicts)
Text conflict in NEWS
To merge this branch: bzr merge lp:~vila/bzr/selftest-fixes
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
bzr-core Pending
Review via email: mp+10364@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

This patch implements a better balancing algorithm for selftest --parallel=fork.

It leads to ~25% smaller elapsed time on 4-core (8 threads) host.
Or said otherwise, running the full test suite goes down from 4 minutes to 3 minutes.

I also added a '-Eslices' that helps understand how the tests are distributed.

Revision history for this message
Robert Collins (lifeless) wrote :

I'm fairly sure that this breaks compatibility with --parallel=ec2.

I think the concept is useful but please consider how to fit in with that plugin, and on windows, where external processes are much more expensive to start.

review: Needs Fixing
Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "robert" == Robert Collins <email address hidden> writes:

    robert> Review: Needs Fixing

    robert> I'm fairly sure that this breaks compatibility with
    robert> --parallel=ec2.

Right, care to give it a try ? You know it's far easier for you
than for me.

I presume your suspicion comes from the fact that the plugin uses
fork_for_tests right ?

    robert> I think the concept is useful but please consider how
    robert> to fit in with that plugin, and on windows, where
    robert> external processes are much more expensive to start.

Using a process for each slice was simpler than starting one
process by processor and then implementing a way to send the
slices via another socket/pipe, so I tried that approach a bit
and punted.

If you played a bit with -Eslices, you noticed that, indeed, many
processes are spawned (one by slice) and avoiding them can give
even better performances.

Bug given that the test suite is *not* fully running so far on
windows, being able to run even part of it, faster is still a win IMHO.

And even if starting new processes is slow, I doubt it can be slower
to run 8 parallel newly started processes than a single started
once process :).

Hmm, I don't doubt it, I know in fact, the overhead of starting
new processes and handling the results via subunit is high, even
on linux, yet --parallel=fork reduces the *elapsed* time
enormously. Unless that can't reproduced on windows, I far prefer
being able to reduce the elapsed time than delay using -parallel=fork there.

Would you be ok if instead of modifying fork_for_tests I create a
new fork_balanced_for_tests until we can reconcile both ?

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, 2009-08-24 at 06:18 +0000, Vincent Ladeuil wrote:
>
> robert> I think the concept is useful but please consider how
> robert> to fit in with that plugin, and on windows, where
> robert> external processes are much more expensive to start.
>
> Using a process for each slice was simpler than starting one
> process by processor and then implementing a way to send the
> slices via another socket/pipe, so I tried that approach a bit
> and punted.

A chatty protocol has its own issues indeed, including not being
friendly to xmlrpc apis and so on.

> If you played a bit with -Eslices, you noticed that, indeed, many
> processes are spawned (one by slice) and avoiding them can give
> even better performances.

> Bug given that the test suite is *not* fully running so far on
> windows, being able to run even part of it, faster is still a win
> IMHO.

Sure, I'm not debating that. Its only slightly faster though ;). Running
with --randomize should give approximately the same performance boost I
think.

> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).

On EC2 bringing up a warm machine takes about 60 seconds. Bringing up a
fresh machine takes about 8-10 minutes. Each machine then has 8 CPU's at
its disposal.

> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using -parallel=fork
> there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

Its not fork_for_tests, its the helper classes.

ec2test is easy to setup, docs are in the developer tree:
install the plugin.
. ~/.aws
install boto [support library]
./bzr push; ./bzr --selftest --parallel=ec2

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> Bug given that the test suite is *not* fully running so far on
> windows, being able to run even part of it, faster is still a win IMHO.
>
> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).
>
> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using -parallel=fork there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

I'll mention that "os.fork()" doesn't exist on Windows, you have to
spawn a subprocess and pass it data.

I don't know if this patch effects those cases or not. But if the
problem is that you're worried about *fork* performance on Windows, it
is, indeed awful :).

If you are spawning a subprocess, and passing it data, then it seems
like most of the work is already done, and you just need that subprocess
to check its input when it is finished, to see if there is more to do....

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqSqnsACgkQJdeBCYSNAAMjsgCfYLMIv7i/G2iPRBL8pnAikHoB
oUYAn2a9sqHXlSVw4oNmat+TWhz19G5y
=Ydab
-----END PGP SIGNATURE-----

Unmerged revisions

4625. By Vincent Ladeuil

NEWS entry and some cleanup for submission.

4624. By Vincent Ladeuil

-Eslices conditions statistics display.

* bzrlib/tests/__init__.py:
(selftest_debug_flags): Add a 'slices' debug flag.
(fork_for_tests.TestInOtherProcess.run): Display some key
statistics related to test suite slicing.

4623. By Vincent Ladeuil

Some tuning and cleanup.

* bzrlib/tests/__init__.py:
(fork_for_tests.TestInOtherProcess.run): Cap slize size at 8 to
avoid spawning too much processes.
(fork_for_tests.TestInOtherProcess.run_slice): Don't leave zombies
around, that hurts performances when tenths or hundreds are
produced.

4622. By Vincent Ladeuil

Implement a balancing scheme to maximize processor utilisation.

* bzrlib/tests/__init__.py:
(fork_for_tests): Change the palce we fork to better control which
tests are run where.

4621. By Vincent Ladeuil

Start hacking on balancing parallel selftest.

* bzrlib/tests/__init__.py:
(fork_for_tests): Start balancing forked selftest. This version
does not work.

4620. By Vincent Ladeuil

Fixed as per John's review.

4619. By Vincent Ladeuil

Make --parallel=fork work again.

* bzrlib/tests/__init__.py:
(run_suite): CoutingDecorator is incompatible with
--parallel=fork, don't use the former when the later is
required (even if we lose the toal number of tests that has to be
run...).

4618. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Fix test_write_group to not test inappropriate things on
 RemoteRepository. (Robert Collins)

4617. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Prepare test_foreign for rich roots as the default format.
 (Robert Collins)

4616. By Canonical.com Patch Queue Manager <email address hidden>

(robertc) Change a KnownFailure into a test with two success paths in
 preperation for 2a as default. (Robert Collins)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2009-08-30 23:51:10 +0000
3+++ NEWS 2009-08-31 04:35:37 +0000
4@@ -148,6 +148,7 @@
5 --version`` and ``bzr selftest``.
6 (Martin Pool, #409137)
7
8+<<<<<<< TREE
9 * bzr can now (again) capture crash data through the apport library,
10 so that a single human-readable file can be attached to bug reports.
11 This can be disabled by using ``-Dno_apport`` on the command line, or by
12@@ -222,6 +223,17 @@
13 ``TestCase.start_server`` that registers a cleanup and starts the server
14 should be used. (Robert Collins)
15
16+=======
17+Testing
18+*******
19+
20+* ``bzr selftest --parallel=fork`` now uses a better balancing algorithm
21+ leading to a further ~25% reduction of the overall elapsed time. A new
22+ ``-Eslices`` flag is available for selftest to display some statistics about
23+ how the tests are distributed between the processes.
24+ (Vincent Ladeuil)
25+
26+>>>>>>> MERGE-SOURCE
27 bzr 1.18
28 ########
29
30
31=== modified file 'bzrlib/tests/__init__.py'
32--- bzrlib/tests/__init__.py 2009-08-28 21:05:31 +0000
33+++ bzrlib/tests/__init__.py 2009-08-31 04:35:38 +0000
34@@ -3051,49 +3051,94 @@
35 """
36 concurrency = osutils.local_concurrency()
37 result = []
38+
39 from subunit import TestProtocolClient, ProtocolTestCase
40 try:
41 from subunit.test_results import AutoTimingTestResultDecorator
42 except ImportError:
43 AutoTimingTestResultDecorator = lambda x:x
44- class TestInOtherProcess(ProtocolTestCase):
45+
46+ all_tests = list(iter_suite_tests(suite))
47+ nb_tests = len(all_tests)
48+ shared_cur_test = [0]
49+
50+ import threading
51+ class TestInOtherProcess(object):
52 # Should be in subunit, I think. RBC.
53- def __init__(self, stream, pid):
54- ProtocolTestCase.__init__(self, stream)
55- self.pid = pid
56+ # Or in testtools given the coupling with ConcurrentTestSuite ? --vila
57+ # 20090819
58+
59+ def __init__(self, suite_semaphore, rank):
60+ self.suite_semaphore = suite_semaphore
61+ self.rank = rank
62+ self.nb_slices = 0
63+ self.nb_tests = 0
64
65 def run(self, result):
66- try:
67- ProtocolTestCase.run(self, result)
68- finally:
69- os.waitpid(self.pid, os.WNOHANG)
70-
71- test_blocks = partition_tests(suite, concurrency)
72- for process_tests in test_blocks:
73- process_suite = TestSuite()
74- process_suite.addTests(process_tests)
75- c2pread, c2pwrite = os.pipe()
76- pid = os.fork()
77- if pid == 0:
78- try:
79- os.close(c2pread)
80- # Leave stderr and stdout open so we can see test noise
81- # Close stdin so that the child goes away if it decides to
82- # read from stdin (otherwise its a roulette to see what
83- # child actually gets keystrokes for pdb etc).
84- sys.stdin.close()
85- sys.stdin = None
86- stream = os.fdopen(c2pwrite, 'wb', 1)
87- subunit_result = AutoTimingTestResultDecorator(
88- TestProtocolClient(stream))
89- process_suite.run(subunit_result)
90- finally:
91- os._exit(0)
92- else:
93- os.close(c2pwrite)
94- stream = os.fdopen(c2pread, 'rb', 1)
95- test = TestInOtherProcess(stream, pid)
96- result.append(test)
97+ self.suite_semaphore.acquire()
98+ cur_test = shared_cur_test[0]
99+ while cur_test < nb_tests:
100+ # The slice size should be a balance between the overhead of
101+ # processing a slice, and the ability to feed as many children
102+ # as possible for the longest possible time (or said otherwise:
103+ # the last standing child should run alone for the shortest
104+ # possible time). So we start by saying that each child will
105+ # handle as many slices as there are children and reducing the
106+ # slice size from there.
107+ slice_size = ((nb_tests - cur_test)
108+ / (concurrency * concurrency)) + 8
109+ if 'slices' in selftest_debug_flags:
110+ note('New slice for %d: %5d', self.rank, slice_size)
111+ # give a slice to first free child
112+ first, last = cur_test, cur_test + slice_size
113+ shared_cur_test[0] = last
114+ self.suite_semaphore.release()
115+
116+ self.nb_slices += 1
117+ self.nb_tests += slice_size
118+ self.run_slice(result, all_tests[first:last])
119+ if shared_cur_test[0] > nb_tests:
120+ break
121+ self.suite_semaphore.acquire()
122+ cur_test = shared_cur_test[0]
123+ self.suite_semaphore.release()
124+ if 'slices' in selftest_debug_flags:
125+ note('%d ran %5d tests in %5d slices',
126+ self.rank, self.nb_tests, self.nb_slices)
127+
128+ def run_slice(self, result, tests):
129+ (f_read, f_write) = os.pipe()
130+ pid = os.fork()
131+ if pid == 0:
132+ try:
133+ # Leave stderr and stdout open so we can see test noise
134+ # Close stdin so that the child goes away if it decides
135+ # to read from stdin (otherwise its a roulette to see
136+ # what child actually gets keystrokes for pdb etc).
137+ sys.stdin.close()
138+ sys.stdin = None
139+ feedback = os.fdopen(f_write, 'wb', 1)
140+ os.close(f_read)
141+ subunit_result = AutoTimingTestResultDecorator(
142+ TestProtocolClient(feedback))
143+ process_suite = TestSuite()
144+ process_suite.addTests(tests)
145+ process_suite.run(subunit_result)
146+ finally:
147+ os._exit(0)
148+ else:
149+ feedback = os.fdopen(f_read, 'rb', 1)
150+ os.close(f_write)
151+ try:
152+ test = ProtocolTestCase(feedback)
153+ test.run(result)
154+ finally:
155+ os.waitpid(pid, 0)
156+
157+ suite_semaphore = threading.Semaphore(1)
158+ for proc in range(0, concurrency):
159+ test = TestInOtherProcess(suite_semaphore, proc)
160+ result.append(test)
161 return result
162
163
164@@ -3254,6 +3299,8 @@
165 # rather than failing tests. And no longer raise
166 # LockContention when fctnl locks are not being used
167 # with proper exclusion rules.
168+# -Eslices Will output information about how the test suite is
169+# sliced while running with --parallel=fork
170 selftest_debug_flags = set()
171
172