Martin [gz] wrote:
[...]
> As written, the code relies on refcounting working, so cycles need to be
> avoided. The alternative is to run a full gc collection after every test,
> which slow enough to be worth avoiding.

Another alternative: why not a single full gc collection at the end of the
entire run, and then report any test cases that are unexpectedly still
alive?

I don't think you should assume that “uncollected at this instant” implies
“can't be collected yet”.  And the real issue is uncollectable objects, not
objects that have fractionally outlived their usefulness, AFAICS.

The point is that if a VM chooses to allow say 20 or even 200 used test case
instances to be alive simultaneously before collecting them, then that's
fine.  It'll still keep the memory ceiling reasonably low rather than
growing in proportion to the number of tests.  So requiring that a test case
lifetime is strictly minimal is too strong a requirement, and I'm not sure
it's worth paying the price for it.  The price is either assuming a
refcounting VM (and the fairly extreme senstivity to refcounting differences
due to different versions of dependencies like testtools or even the Python
stdlib), or forcing a full GC after every test.  I agree that the latter is
too much, but I also think the former isn't worth paying either.

To be clear, I don't mind (small) contortions to make most or even all tests
be collected promptly via refcounting, under common conditions.  I do mind
emitting warnings when that fails, because I think the noise will outweigh
the benefits.

I'd be very happy to have an optional flag to selftest to force a full GC
after every test (or more creatively, after every N tests, then once a leak
is found re-run the last N individually to pinpoint it...).  And maybe even
an optional flag to warn if the refcount-based attempt to release the test
case doesn't release it.  It's good to have tools to help developers
diagnose memory issues.  I just don't believe these tools should be on all
the time unless they are very cheap and very accurate.

> The branch needs a bit more work still, about a hundred tests still need
> poking to avoid cycles, and needs a NEWS entry and so on. I'd like some
> feedback on if people think this is a worthwhile change over just
> stripping the __dict__ and if there are any other issues I've not
> realised.

Why not strip __dict__ as well?  Defense in depth...

Basically, I'm not inclined to go to extreme lengths to get utter perfection
here: if the memory consumption is down, and is likely stay down, and is
fairly easy to diagnose and correct if it doesn't stay down, then I think
that's clearly Good Enough, even if there are a handful of tests that don't
relinquish all their memory, or don't relinquish it as promptly as they
theoretically could.

Aside from the comments above, most of the diff looks ok to me except these
bits:

> +++ bzrlib/tests/test_errors.py	2010-10-20 23:41:12 +0000
> @@ -575,12 +575,15 @@
>          try:
>              1/0
>          except ZeroDivisionError:
> -            exc_info = sys.exc_info()
> -        err = errors.HookFailed('hook stage', 'hook name', exc_info, warn=False)
> -        self.assertStartsWith(
> -            str(err), 'Hook \'hook name\' during hook stage failed:\n')
> -        self.assertEndsWith(
> -            str(err), 'integer division or modulo by zero')
> +            err = errors.HookFailed('hook stage', 'hook name', sys.exc_info(),
> +                warn=False)
> +        try:
> +            self.assertStartsWith(
> +                str(err), 'Hook \'hook name\' during hook stage failed:\n')
> +            self.assertEndsWith(
> +                str(err), 'integer division or modulo by zero')
> +        finally:
> +            del err
>  
>      def test_tip_change_rejected(self):

That hunk looks odd.  Surely falling off the end of the function
already deletes all the locals?  And if the test fails, I don't really care
about the extra memory cost.  Certainly not enough to clutter the test.

[...]
> +    def _run_selftest_with_suite(self, **kwargs):
> +        sio = StringIO()
> +        gc.disable()
> +        try:
> +            tests.selftest(test_suite_factory=self._get_suite, stream=sio,
> +                **kwargs)
> +        finally:
> +            gc.enable()

To be pedantic this shouldn't gc.enable unless gc.isenabled() was true
before.

-Andrew.