Merge into trunk-old : selftest-fixes : Code : Bazaar

Status:	Work in progress
Proposed branch:	lp:~vila/bzr/selftest-fixes
Merge into:	lp:~bzr/bzr/trunk-old
Diff against target:	170 lines (has conflicts: text conflict in NEWS)
To merge this branch:	bzr merge lp:~vila/bzr/selftest-fixes

Reviewer	Date Requested	Status
Robert Collins (community)	2009-08-20	Needs Fixing on 2009-08-24
bzr-core	2009-08-19	Pending
Review via email: mp+10364@code.launchpad.net

Vincent Ladeuil (vila) wrote on 2009-08-19:

This patch implements a better balancing algorithm for selftest --parallel=fork.

It cuts the elapsed time by ~25% on a 4-core (8 threads) host.
Said otherwise, running the full test suite goes down from 4 minutes to 3 minutes.

I also added a '-Eslices' flag that helps understand how the tests are distributed.
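
To make the balancing idea concrete, here is a minimal sketch of the slice-size schedule, based on the formula in the diff for this proposal (remaining tests divided by the square of the concurrency, plus 8). The function name `slice_schedule` and the `floor` parameter are illustrative and not part of bzrlib; clamping the last slice to the remaining tests is an assumption added here so the sizes sum exactly to the number of tests.

```python
def slice_schedule(nb_tests, concurrency, floor=8):
    """Return the successive slice sizes handed out to free children.

    Hypothetical helper: mirrors the formula in the patch, but uses
    explicit integer division and clamps the final slice.
    """
    sizes = []
    cur_test = 0
    while cur_test < nb_tests:
        remaining = nb_tests - cur_test
        # Big slices first, smaller ones as the suite drains, so that the
        # last standing child runs alone for as short a time as possible.
        size = remaining // (concurrency * concurrency) + floor
        size = min(size, remaining)  # clamp: an addition, not in the patch
        sizes.append(size)
        cur_test += size
    return sizes
```

With 1000 tests and a concurrency of 4, the first slice holds 1000 // 16 + 8 = 70 tests, and later slices shrink as fewer tests remain.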

Robert Collins (lifeless) wrote on 2009-08-24:

I'm fairly sure that this breaks compatibility with --parallel=ec2.

I think the concept is useful but please consider how to fit in with that plugin, and on windows, where external processes are much more expensive to start.

review: Needs Fixing

Vincent Ladeuil (vila) wrote on 2009-08-24:

>>>>> "robert" == Robert Collins <email address hidden> writes:

robert> Review: Needs Fixing

robert> I'm fairly sure that this breaks compatibility with
robert> --parallel=ec2.

Right, care to give it a try? You know it's far easier for you
than for me.

I presume your suspicion comes from the fact that the plugin uses
fork_for_tests, right?

    robert> I think the concept is useful but please consider how
    robert> to fit in with that plugin, and on windows, where
    robert> external processes are much more expensive to start.

Using a process for each slice was simpler than starting one
process by processor and then implementing a way to send the
slices via another socket/pipe, so I tried that approach a bit
and punted.

If you played a bit with -Eslices, you noticed that, indeed, many
processes are spawned (one per slice) and avoiding them can give
even better performance.
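
As an illustration of the process-per-slice approach, here is a minimal sketch, assuming a POSIX host where `os.fork()` is available. The names `run_slice_in_child` and `work` are hypothetical; the real patch streams subunit results through the pipe, whereas this sketch sends one plain text line per item.

```python
import os


def run_slice_in_child(items, work):
    """Run work(item) for each item in a forked child, one child per slice.

    Hypothetical helper: results come back to the parent as text lines
    over a pipe, standing in for the subunit stream used by the patch.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write results, then exit without running parent cleanups.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                for item in items:
                    out.write("%s\n" % (work(item),))
            status = 0
        finally:
            os._exit(status)
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    try:
        with os.fdopen(read_fd) as feedback:
            return [line.rstrip("\n") for line in feedback]
    finally:
        os.waitpid(pid, 0)
```

Each call pays for one fork and one pipe, which is the per-slice overhead discussed above.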

But given that the test suite is *not* fully running so far on
windows, being able to run even part of it faster is still a win IMHO.

And even if starting new processes is slow, I doubt it can be slower
to run 8 parallel newly started processes than a single started
once process :).

Hmm, I don't doubt it, I know in fact, the overhead of starting
new processes and handling the results via subunit is high, even
on linux, yet --parallel=fork reduces the *elapsed* time
enormously. Unless that can't be reproduced on windows, I far prefer
being able to reduce the elapsed time than delay using --parallel=fork there.

Would you be ok if instead of modifying fork_for_tests I create a
new fork_balanced_for_tests until we can reconcile both ?

Robert Collins (lifeless) wrote on 2009-08-24:

On Mon, 2009-08-24 at 06:18 +0000, Vincent Ladeuil wrote:
>
> robert> I think the concept is useful but please consider how
> robert> to fit in with that plugin, and on windows, where
> robert> external processes are much more expensive to start.
>
> Using a process for each slice was simpler than starting one
> process by processor and then implementing a way to send the
> slices via another socket/pipe, so I tried that approach a bit
> and punted.

A chatty protocol has its own issues indeed, including not being
friendly to xmlrpc apis and so on.

> If you played a bit with -Eslices, you noticed that, indeed, many
> processes are spawned (one by slice) and avoiding them can give
> even better performances.

> But given that the test suite is *not* fully running so far on
> windows, being able to run even part of it faster is still a win
> IMHO.

Sure, I'm not debating that. It's only slightly faster though ;). Running
with --randomize should give approximately the same performance boost I
think.

> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).

On EC2 bringing up a warm machine takes about 60 seconds. Bringing up a
fresh machine takes about 8-10 minutes. Each machine then has 8 CPUs at
its disposal.

> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't be reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using --parallel=fork
> there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

It's not fork_for_tests, it's the helper classes.

ec2test is easy to set up, docs are in the developer tree:
install the plugin.
. ~/.aws
install boto [support library]
./bzr push; ./bzr selftest --parallel=ec2

-Rob

John A Meinel (jameinel) wrote on 2009-08-24:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> But given that the test suite is *not* fully running so far on
> windows, being able to run even part of it faster is still a win IMHO.
>
> And even if starting new processes is slow, I doubt it can be slower
> to run 8 parallel newly started processes than a single started
> once process :).
>
> Hmm, I don't doubt it, I know in fact, the overhead of starting
> new processes and handling the results via subunit is high, even
> on linux, yet --parallel=fork reduces the *elapsed* time
> enormously. Unless that can't be reproduced on windows, I far prefer
> being able to reduce the elapsed time than delay using --parallel=fork there.
>
> Would you be ok if instead of modifying fork_for_tests I create a
> new fork_balanced_for_tests until we can reconcile both ?

I'll mention that "os.fork()" doesn't exist on Windows, you have to
spawn a subprocess and pass it data.

I don't know if this patch affects those cases or not. But if the
problem is that you're worried about *fork* performance on Windows, it
is, indeed, awful :).

If you are spawning a subprocess, and passing it data, then it seems
like most of the work is already done, and you just need that subprocess
to check its input when it is finished, to see if there is more to do....

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqSqnsACgkQJdeBCYSNAAMjsgCfYLMIv7i/G2iPRBL8pnAikHoB
oUYAn2a9sqHXlSVw4oNmat+TWhz19G5y
=Ydab
-----END PGP SIGNATURE-----
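
The suggestion above (a subprocess that checks its input for more work once it has finished a slice) can be sketched as follows. This is only an illustration under stated assumptions: the worker script, the line-based protocol, and the name `run_slices_with_one_worker` are all hypothetical, and the worker merely counts the names it receives instead of running tests.

```python
import subprocess
import sys

# Hypothetical worker: reads one slice per line, replies, then asks for more.
WORKER = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:                 # EOF: the parent has no more slices
        break
    names = line.split()
    sys.stdout.write("ran %d\n" % len(names))
    sys.stdout.flush()
"""


def run_slices_with_one_worker(slices):
    """Feed several slices to a single long-lived subprocess."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        universal_newlines=True)
    replies = []
    try:
        for names in slices:
            proc.stdin.write(" ".join(names) + "\n")
            proc.stdin.flush()
            replies.append(proc.stdout.readline().strip())
    finally:
        proc.stdin.close()       # signals EOF so the worker exits
        proc.wait()
        proc.stdout.close()
    return replies
```

The process start-up cost is paid once per worker rather than once per slice, at the price of a request/reply exchange for every slice.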

Unmerged revisions

4625. By Vincent Ladeuil on 2009-08-19

NEWS entry and some cleanup for submission.

4624. By Vincent Ladeuil on 2009-08-18

-Eslices conditions statistics display.

* bzrlib/tests/__init__.py:
(selftest_debug_flags): Add a 'slices' debug flag.
(fork_for_tests.TestInOtherProcess.run): Display some key
statistics related to test suite slicing.

4623. By Vincent Ladeuil on 2009-08-18

Some tuning and cleanup.

* bzrlib/tests/__init__.py:
(fork_for_tests.TestInOtherProcess.run): Keep the slice size at 8 or
more to avoid spawning too many processes.
(fork_for_tests.TestInOtherProcess.run_slice): Don't leave zombies
around, that hurts performance when tens or hundreds are
produced.
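
A minimal sketch of the zombie point, assuming a POSIX host: a blocking `os.waitpid(pid, 0)` always reaps the child, whereas a call with `os.WNOHANG` returns immediately and can leave a zombie behind if the child has not exited yet. The helper name `spawn_and_reap` is hypothetical.

```python
import os


def spawn_and_reap(exit_code):
    """Fork a child that exits at once, then reap it with a blocking wait."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    # Flag 0 blocks until the child has exited, so no zombie is left.
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status)
```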

4622. By Vincent Ladeuil on 2009-08-18

Implement a balancing scheme to maximize processor utilisation.

* bzrlib/tests/__init__.py:
(fork_for_tests): Change the place we fork to better control which
tests are run where.

4621. By Vincent Ladeuil on 2009-08-18

Start hacking on balancing parallel selftest.

* bzrlib/tests/__init__.py:
(fork_for_tests): Start balancing forked selftest. This version
does not work.

4620. By Vincent Ladeuil on 2009-08-17

Fixed as per John's review.

4619. By Vincent Ladeuil on 2009-08-17

Make --parallel=fork work again.

* bzrlib/tests/__init__.py:
(run_suite): CountingDecorator is incompatible with
--parallel=fork, don't use the former when the latter is
required (even if we lose the total number of tests that has to be
run...).

4618. By Canonical.com Patch Queue Manager <email address hidden> on 2009-08-17

(robertc) Fix test_write_group to not test inappropriate things on
RemoteRepository. (Robert Collins)

4617. By Canonical.com Patch Queue Manager <email address hidden> on 2009-08-17

(robertc) Prepare test_foreign for rich roots as the default format.
(Robert Collins)

4616. By Canonical.com Patch Queue Manager <email address hidden> on 2009-08-16

(robertc) Change a KnownFailure into a test with two success paths in
preparation for 2a as default. (Robert Collins)

 === modified file 'NEWS'
 --- NEWS	2009-08-30 23:51:10 +0000
 +++ NEWS	2009-08-31 04:35:37 +0000
@@ -148,6 +148,7 @@
    --version`` and ``bzr selftest``.
    (Martin Pool, #409137)
++<<<<<<< TREE
  * bzr can now (again) capture crash data through the apport library,
    so that a single human-readable file can be attached to bug reports.
    This can be disabled by using ``-Dno_apport`` on the command line, or by
@@ -222,6 +223,17 @@
    ``TestCase.start_server`` that registers a cleanup and starts the server
    should be used. (Robert Collins)
++=======
++Testing
++*******
++
++* ``bzr selftest --parallel=fork`` now uses a better balancing algorithm
++  leading to a further ~25% reduction of the overall elapsed time. A new
++  ``-Eslices`` flag is available for selftest to display some statistics about
++  how the tests are distributed between the processes.
++  (Vincent Ladeuil)
++
++>>>>>>> MERGE-SOURCE
  bzr 1.18
  ########
 === modified file 'bzrlib/tests/__init__.py'
 --- bzrlib/tests/__init__.py	2009-08-28 21:05:31 +0000
 +++ bzrlib/tests/__init__.py	2009-08-31 04:35:38 +0000
@@ -3051,49 +3051,94 @@
      """
      concurrency = osutils.local_concurrency()
      result = []
++
      from subunit import TestProtocolClient, ProtocolTestCase
      try:
          from subunit.test_results import AutoTimingTestResultDecorator
      except ImportError:
          AutoTimingTestResultDecorator = lambda x:x
--    class TestInOtherProcess(ProtocolTestCase):
++
++    all_tests = list(iter_suite_tests(suite))
++    nb_tests = len(all_tests)
++    shared_cur_test = [0]
++
++    import threading
++    class TestInOtherProcess(object):
          # Should be in subunit, I think. RBC.
--        def __init__(self, stream, pid):
--            ProtocolTestCase.__init__(self, stream)
--            self.pid = pid
++        # Or in testtools given the coupling with ConcurrentTestSuite ? --vila
++        # 20090819
++
++        def __init__(self, suite_semaphore, rank):
++            self.suite_semaphore = suite_semaphore
++            self.rank = rank
++            self.nb_slices = 0
++            self.nb_tests = 0
          def run(self, result):
--            try:
--                ProtocolTestCase.run(self, result)
--            finally:
--                os.waitpid(self.pid, os.WNOHANG)
--
--    test_blocks = partition_tests(suite, concurrency)
--    for process_tests in test_blocks:
--        process_suite = TestSuite()
--        process_suite.addTests(process_tests)
--        c2pread, c2pwrite = os.pipe()
--        pid = os.fork()
--        if pid == 0:
--            try:
--                os.close(c2pread)
--                # Leave stderr and stdout open so we can see test noise
--                # Close stdin so that the child goes away if it decides to
--                # read from stdin (otherwise its a roulette to see what
--                # child actually gets keystrokes for pdb etc).
--                sys.stdin.close()
--                sys.stdin = None
--                stream = os.fdopen(c2pwrite, 'wb', 1)
--                subunit_result = AutoTimingTestResultDecorator(
--                    TestProtocolClient(stream))
--                process_suite.run(subunit_result)
--            finally:
--                os._exit(0)
--        else:
--            os.close(c2pwrite)
--            stream = os.fdopen(c2pread, 'rb', 1)
--            test = TestInOtherProcess(stream, pid)
--            result.append(test)
++            self.suite_semaphore.acquire()
++            cur_test = shared_cur_test[0]
++            while cur_test < nb_tests:
++                # The slice size should be a balance between the overhead of
++                # processing a slice, and the ability to feed as many children
++                # as possible for the longest possible time (or said otherwise:
++                # the last standing child should run alone for the shortest
++                # possible time). So we start by saying that each child will
++                # handle as many slices as there are children and reducing the
++                # slice size from there.
++                slice_size = ((nb_tests - cur_test)
++                              / (concurrency * concurrency)) + 8
++                if 'slices' in selftest_debug_flags:
++                    note('New slice for %d: %5d', self.rank, slice_size)
++                # give a slice to first free child
++                first, last = cur_test, cur_test + slice_size
++                shared_cur_test[0] = last
++                self.suite_semaphore.release()
++
++                self.nb_slices += 1
++                self.nb_tests += slice_size
++                self.run_slice(result, all_tests[first:last])
++                if shared_cur_test[0] > nb_tests:
++                    break
++                self.suite_semaphore.acquire()
++                cur_test = shared_cur_test[0]
++            self.suite_semaphore.release()
++            if 'slices' in selftest_debug_flags:
++                note('%d ran %5d tests in %5d slices',
++                     self.rank, self.nb_tests, self.nb_slices)
++
++        def run_slice(self, result, tests):
++            (f_read, f_write) = os.pipe()
++            pid = os.fork()
++            if pid == 0:
++                try:
++                    # Leave stderr and stdout open so we can see test noise
++                    # Close stdin so that the child goes away if it decides
++                    # to read from stdin (otherwise its a roulette to see
++                    # what child actually gets keystrokes for pdb etc).
++                    sys.stdin.close()
++                    sys.stdin = None
++                    feedback = os.fdopen(f_write, 'wb', 1)
++                    os.close(f_read)
++                    subunit_result = AutoTimingTestResultDecorator(
++                        TestProtocolClient(feedback))
++                    process_suite = TestSuite()
++                    process_suite.addTests(tests)
++                    process_suite.run(subunit_result)
++                finally:
++                    os._exit(0)
++            else:
++                feedback = os.fdopen(f_read, 'rb', 1)
++                os.close(f_write)
++                try:
++                    test = ProtocolTestCase(feedback)
++                    test.run(result)
++                finally:
++                    os.waitpid(pid, 0)
++
++    suite_semaphore = threading.Semaphore(1)
++    for proc in range(0, concurrency):
++        test = TestInOtherProcess(suite_semaphore, proc)
++        result.append(test)
      return result
@@ -3254,6 +3299,8 @@
  #                           rather than failing tests. And no longer raise
  #                           LockContention when fctnl locks are not being used
  #                           with proper exclusion rules.
++#   -Eslices                Will output information about how the test suite is
++#                           sliced while running with --parallel=fork
  selftest_debug_flags = set()
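
The dispatching loop in the diff above can be sketched in isolation as follows, assuming each `TestInOtherProcess.run` is driven from its own thread. The names `dispatch`, `run_slice` and `floor` are illustrative. The sketch uses a `threading.Lock` in a `with` block instead of the patch's `Semaphore(1)` and clamps the last slice; in the patch as posted, the `break` taken when `shared_cur_test[0] > nb_tests` appears to reach the final `release()` without a matching `acquire()`, which the `with` form sidesteps.

```python
import threading


def dispatch(all_tests, concurrency, run_slice, floor=8):
    """Hand out slices of all_tests to `concurrency` worker threads.

    Hypothetical sketch: run_slice(rank, chunk) stands in for forking a
    child and replaying its subunit stream.
    """
    lock = threading.Lock()
    cursor = [0]  # shared index of the next undispatched test
    per_worker = [[] for _ in range(concurrency)]

    def worker(rank):
        while True:
            with lock:  # acquire and release are always paired
                first = cursor[0]
                remaining = len(all_tests) - first
                if remaining <= 0:
                    return
                size = remaining // (concurrency * concurrency) + floor
                size = min(size, remaining)
                cursor[0] = first + size
            chunk = all_tests[first:first + size]
            run_slice(rank, chunk)  # runs outside the lock
            per_worker[rank].extend(chunk)

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(concurrency)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return per_worker
```

Whichever worker becomes free first takes the next slice, so every test is dispatched exactly once regardless of how long individual slices take.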
