Code review comment for lp:~vila/bzr/leaking-tests-catch-them-all

Revision history for this message
Vincent Ladeuil (vila) wrote :

> Okay, I don't completely understand the why, but this is easy to fix. Don't
> store the real get_transport method on the test instance and close over self,
> just close over the real method.

Ok, I'll fix that, but (mentioning it here instead of IRC to have a track record ;),
avoiding cycles shouldn't require us to avoid closures. Breaking the cycle during
cleanup should be enough.

> Looking at that, there are a number of differences along the lines of:
>
> bt.test_remote.TestStacking.test_unstacked_get_parent_map
> Was: ok 882ms
> Test leaked threads, current number of live threads: 1662
> Now: ok 40898ms
> thread ('127.0.0.1', 2122) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2121) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2120) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2119) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2118) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2117) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2116) -> ('127.0.0.1', 2113) hung
> thread ('127.0.0.1', 2115) -> ('127.0.0.1', 2113) hung
>
> Test leaked threads, current number of live threads: 256
>
> So, you've done a great job reducing the number of leaking threads, but a
> subset of them have turned into hang timeouts at 5 seconds apiece. There are
> also tests that used to leak a thread and now don't and are 10x faster, so
> fixing these other ones should result in an overall speed up.

Right, so, this only occurs on windows. As mentioned in the comment above the line
that 'print.*hung' the real fix is to kill the thread and I didn't want to venture into that
on top of all the other fixes.
I don't know why this happens on windows only, but the symptom is that both the client
and the server are waiting for some change on the underlying socket... and this doesn't happen
on any other platform.

Now, the 5s timeout may be too high but I think I kept it that way to allow some minimal
time to poke under pdb... note that I don't see the same hung tests than you (from memory
I see far less <100, compared to >x1000 leaking tests).

And yes, the overall test suite is faster because there is less sockets to care about mainly,
the speed difference vary greatly from platform to platform though...

« Back to merge proposal