Merge lp:~vila/bzr/869366-test-server-race into lp:bzr/2.4
Status: | Merged |
---|---|
Approved by: | John A Meinel |
Approved revision: | no longer in the source branch. |
Merged at revision: | 6053 |
Proposed branch: | lp:~vila/bzr/869366-test-server-race |
Merge into: | lp:bzr/2.4 |
Diff against target: |
159 lines (+61/-21) 3 files modified
bzrlib/tests/test_server.py (+3/-2) bzrlib/tests/test_test_server.py (+55/-19) doc/en/release-notes/bzr-2.4.txt (+3/-0) |
To merge this branch: | bzr merge lp:~vila/bzr/869366-test-server-race |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
John A Meinel | Approve | ||
Review via email: mp+78634@code.launchpad.net |
Commit message
Fix random test failure due to race in test_server_
Description of the change
This is was one of the worst races I had to chase.
For context, this is in bzr-2.4 and only the tests are concerned.
My deepest thanks goes to gz for pushing me enough to dive into this madness
again.
Here is the story with its gory details:
These tests are about exception handling in the tests servers.
First of all, two kind of servers are tested here:
* TestingTCPServer runs in a single thread with basically runs a loop
around:
accept
handle_request
which means that an exception during a request processing just kill the
whole server
* TestingThreadin
connection, so basically a loop:
accept
handle_request by spawning a thread where the processing occurs
Now, test_server_
also test_exception_
side when an exception is encountered while processing a request.
For TestingTCPServer it's simpler, the exception kills the server, easy,
just collect the exception.
TestingThreadin
- the server thread spawns the connection thread and continues until it
blocks in the accept() call. Here, the exception is caught in the
connection thread and can be checked there.
That was the first race I started chasing and ended up with: the exception
should be checked in the connection thread, not the server thread.
- the server thread spawns the connection thread but doesn't get any cycle
until the connection thread processes the request, raises and captures the
exception. At that point, the server thread get some cycles and captures
the exception *clearing* it in the connection thread *before* reaching the
accept() call and as such re-raises, captures the exception and never
reaches the accept() call !
So the fix is that we should check both the connection thread and the server
thread with some care:
- first the connection thread, if we get the exception, we're good,
- if it's not in the connection thread, we need to wait for the server to
stop before getting... our precious.
Quite a journey.
Now as mentioned in the bug report, this has been encountered ~20 times
during the last year.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 10/7/2011 6:34 PM, Vincent Ladeuil wrote: nAThread. test_server_ crash_while_ responding /bugs.launchpad .net/bzr/ +bug/869366 /code.launchpad .net/~vila/ bzr/869366- test-server- race/+merge/ 78634
> Vincent Ladeuil has proposed merging
> lp:~vila/bzr/869366-test-server-race into lp:bzr/2.4.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #869366 in
> Bazaar: "TestTCPServerI
> random failure" https:/
>
> For more details, see:
> https:/
>
> This is was one of the worst races I had to chase.
>
> For context, this is in bzr-2.4 and only the tests are concerned.
>
merge: approve
I like the change, and I like that it makes it more deterministic. I
have a couple of comments that might improve clarity, but nothing that
must be done. So feel free to land this, though think about my comments.
John
=:->
...
def serve(self):
self. serving = True clear()
self. started. set()
- - self.stopped.
# We are listening and ready to accept connections
[comment] This doesn't seem related or necessary. It would allow us to
run 'serve' multiple times, though I suppose we never specifically
need to.
class FailingDuringRe sponseHandler( TCPConnectionHa ndler):
- - def handle_ connection( self): readline( ) currentThread( ).set_sync_ event(sync) connection( request) :
- - req = self.rfile.
- - threading.
+ def handle_
^- [comment] changing the 'self' parameter to not be self seems
confusing. I think I preferred your form using:
test = self
Or even better would be to pass 'self' to the sponseHandler' s __init__ constructor, and saving it as
FailingDuringRe
an attribute. This particular change will conflict landing on bzr.dev,
but should be obvious to fix.
The tests are meant to be roughly identical, except the latter case enigmail. mozdev. org/
marks the exception as ignored. Can we move the common code to a
helper to make it purely obvious that this is the case?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAk6 POyQACgkQJdeBCY SNAAPEMQCeN5NAB M6irUH7z0BRwK9C r9od XTuWmWbh03xR2P9 6U
9MwAni0HIw57Kw7
=/GQC
-----END PGP SIGNATURE-----