Merge lp:~gz/bzr/cleanup_testcases_by_collection_613247 into lp:bzr
Status: Superseded
Proposed branch: lp:~gz/bzr/cleanup_testcases_by_collection_613247
Merge into: lp:bzr
Diff against target: 427 lines (+190/-88) (has conflicts), 4 files modified
  bzrlib/errors.py (+1/-0)
  bzrlib/tests/TestUtil.py (+25/-1)
  bzrlib/tests/__init__.py (+56/-87)
  bzrlib/tests/test_selftest.py (+108/-0)
  Text conflict in bzrlib/tests/__init__.py
To merge this branch: bzr merge lp:~gz/bzr/cleanup_testcases_by_collection_613247
Related bugs:
Reviewers:
  Andrew Bennetts: Needs Fixing
  Vincent Ladeuil: Needs Fixing
Review via email: mp+38999@code.launchpad.net
This proposal supersedes a proposal from 2010-08-15.
Commit message
Description of the change
This branch ensures that each test case is cleaned up as soon as it has run, rather than lingering until the entire suite completes. On my machine this reduces the memory usage of a full test run from ~575MB to ~140MB, which prevents OOM-related errors towards the end. In fact, the first revision alone is a sufficient fix; however, Robert's intention in removing an earlier hack was to allow each test case instance to be deallocated after being run. To make that happen, a lot of code needs to be touched so that decrementing the refcount after a test completes will actually result in its deallocation.
The most visible change is that a whine will now be printed to the output stream if a testcase stays alive. As this style has proved rather too easy to ignore, I may try an alternative approach once the remaining cases are fixed. The test result has already been recorded by this point, but adding a new stub test that runs next and fails may be an option.
As written, the code relies on refcounting working, so cycles need to be avoided. The alternative is to run a full gc collection after every test, which is slow enough to be worth avoiding.
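To illustrate why cycles matter for this approach (a minimal CPython sketch, not bzrlib code): refcounting frees an acyclic object the moment its last reference is dropped, while a cyclic one lingers until a full collection runs.

```python
import gc
import weakref

class FakeCase(object):
    """Stand-in for a test case instance (illustrative only)."""

gc.disable()  # make the demonstration deterministic

# No cycle: refcounting reclaims the object as soon as the last
# strong reference goes away, with no collector pass needed.
case = FakeCase()
ref = weakref.ref(case)
del case
assert ref() is None

# With a cycle, only the cyclic garbage collector can reclaim it, so
# the instance lingers -- multiplied by ~25000 tests, that adds up.
case = FakeCase()
case.me = case
ref = weakref.ref(case)
del case
assert ref() is not None
gc.collect()
assert ref() is None
gc.enable()
```

This immediate-deallocation behaviour is a CPython implementation detail, which is exactly the "relies on refcounting working" caveat above.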
Some changes to testtools are needed to avoid some needless exc_info cycles, but even with that fixed we're hamstrung on a few fronts. Firstly, cases with errors or failures are kept in lists on the result object. In testtools, there is also a list for expected failures but bazaar doesn't use this one. However, the new unittest _ExpectedFailure gets the old exc_info which creates cycles.
The last change on the branch guts the TestDecorator classes. This is needed because otherwise they were keeping extra references to test cases alive, but I found the way they worked previously overly complicated anyway. I'd like to simplify them further if there's not a good backward compatibility reason for keeping the old style.
The branch still needs a bit more work: about a hundred tests still need poking to avoid cycles, and it needs a NEWS entry and so on. I'd like some feedback on whether people think this is a worthwhile change over just stripping the __dict__, and on any other issues I've not realised.
Robert Collins (lifeless) wrote: Posted in a previous version of this proposal
Martin Packman (gz) wrote: Posted in a previous version of this proposal
> I'm not worried about the references to test cases in errors/failures
> lists : they are a benefit (and the __dict__ stripping was a _REAL_
> pain there in the past).
Indeed that mostly won't be an issue (and can sometimes be useful), my one worry is the occasional breaks-everything rev will then go back to swapping my machine to death. That's why the current implementation still wipes all uncollected tests, but I'm okay with the idea of removing that once everything else is sorted.
A few things I forgot to mention in the initial review comment.
The lock decorators now use this horribly obnoxious pattern:
    self.lock_read()
    try:
        result = unbound(self, *args, **kwargs)
    except:
        import sys
        exc_info = sys.exc_info()
        try:
            self.unlock()
        finally:
            try:
                raise exc_info[0], exc_info[1], exc_info[2]
            finally:
                del exc_info
    else:
        self.unlock()
        return result
If anyone can think of a sane way of spelling that, please say.
If we go with this, the need to avoid cycles will have to be documented, as it can be hard to understand without decent knowledge of Python internals. I've been using a rough reference-tracking function to fix things, but it's not worth landing. Is there any package that can help contributors diagnose things like this?
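For what it's worth, the stdlib gc module can stand in for such a tracking helper. A rough, CPython-specific sketch (find_referrers is an illustrative name, not the function used on the branch):

```python
import gc
import sys

def find_referrers(obj):
    """Return the objects that still refer to obj, skipping this
    function's own frame (which necessarily holds a reference)."""
    frame = sys._getframe()
    return [r for r in gc.get_referrers(obj) if r is not frame]

class Case(object):
    pass

case = Case()
holder = {"leaked": case}   # simulate something keeping the case alive
referrers = find_referrers(case)
# The offending dict shows up among the referrers, pointing at the leak.
assert any(r is holder for r in referrers)
```

In practice you would walk a few levels of referrers to find the root that keeps a test case alive; this one-level version is often enough to spot the culprit.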
Would there be any benefit in running all tests with gc disabled by default, and only enabling it for those that explicitly require it?
Is there a better way of telling if the test you just ran failed other than comparing failure counts on the result object?
Robert Collins (lifeless) wrote: Posted in a previous version of this proposal
> Is there a better way of telling if the test you just ran failed other than comparing failure counts on the result object?
Where do you need to know this?
Martin Packman (gz) wrote: Posted in a previous version of this proposal
> > Is there a better way of telling if the test you just ran failed other
> > than comparing failure counts on the result object?
>
> Where do you need to know this?
It has to be outside all the normal methods, unfortunately. Even stopTest is too early, as that's passed a reference to the test case, so clearly the caller is going to be keeping it alive. Instead I added the collection accounting in TestSuite.run, which was already popping testcases from its internal list; now it uses a weakref to check that they really expire after being run.
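The weakref accounting described above can be sketched like this (illustrative names, not the exact TestUtil code):

```python
import gc
import unittest
import weakref

def run_and_collect_case(case, result):
    """Run the case against result, returning only a weak reference so
    this frame holds the last strong reference to the instance."""
    case.run(result)
    return weakref.ref(case)

class Passing(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

result = unittest.TestResult()
ref = run_and_collect_case(Passing("test_ok"), result)
# A passing, cycle-free case should now be dead: the result object only
# keeps strong references to tests that errored or failed.
gc.collect()   # belt and braces for non-refcounting VMs
assert ref() is None
```

If ref() still returns the case here, something other than the suite is keeping it alive, which is exactly the condition the branch whines about.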
I've put up <lp:~gz/testtools/avoid_exc_info_cycles> with the testtools changes I needed.
Vincent Ladeuil (vila) wrote: Posted in a previous version of this proposal
Hmm, I appreciate the effort to reduce the memory footprint, but unfortunately this seems far from landable :-/
BZR_PLUGIN_
bzr selftest: /home/vila/
/home/
bzr-2.2b3 python-2.6.5 Linux-2.
-------
Ran 8 tests in 0.489s
OK
2 tests skipped
Huh ? Where are my tests gone ? (Strong suspicion around your modification of __iter__)
Ran 24856 tests in 1050.614s
FAILED (failures=2, errors=2, known_failure_
Including failures for bb.test_log so your fix is wrong there.
On the whole, from a superficial reading, the way you avoid ref cycles seems
hard to understand, so cycles are very likely to be re-introduced. How about
breaking the ref-cycles with addCleanup calls instead (by deleting the
offending attributes)?
Now, a way to track the uncollectable test cases sounds like a good way to progress, but given the amount of work, could you instead devise a patch where the check depends on some option?
Once we reach 0 we can toggle the default value for this option.
For the run above, I also get ~4000 Uncollected test cases, far too much to land this patch :-/
Martin Packman (gz) wrote: Posted in a previous version of this proposal
Thanks for looking at this Vincent.
> Huh ? Where are my tests gone ? (Strong suspicion around your modification of
> __iter__)
I suspect this is related to --parallel which I thought might be affected but didn't have any easy way of testing on my machine, I'll take another look at that code.
> Including failures for bb.test_log so your fix is wrong there.
Gah, how did I miss those...
> On the overall, with a superficial reading, the way you avoid ref cycles seem
> hard to understand
> so very likely to be re-introduced. How about breaking the ref-cycles with
> addCleanup calls instead
> (by deleting the offending attributes) ?
Yes, that it's hard to understand is a big concern; the worst part of this patch is that it retroactively changes a bunch of tests in subtle ways. It's much easier to adjust the test you're writing right now, because you get told there's a cycle. So moving in the direction of actually causing a failure, once all existing issues are fixed, would I think prevent regressions without being too taxing on contributors.
Note that sometimes cycles do have to be avoided, because it's easier than trying to break them later, particularly with exc_info and closures, where it's internal objects pointing at each other rather than public TestCase attributes.
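A minimal illustration of the exc_info problem, in modern Python 3 spelling (the Python 2 sys.exc_info() tuples of this era behave the same way): the caught exception references the traceback, the traceback references the frame, and the frame's locals reference the exception again.

```python
import gc
import weakref

class Payload(object):
    """Stand-in for a heavyweight object in the failing frame."""

def make_cycle():
    payload = Payload()
    try:
        1 / 0
    except ZeroDivisionError as err:
        saved = err   # this alias survives Python 3's implicit `del err`
    return weakref.ref(payload), saved

gc.disable()  # keep the demonstration deterministic
ref, exc = make_cycle()
del exc
# The exception -> traceback -> frame -> locals -> exception cycle keeps
# the whole frame, including payload, alive with no outside references.
assert ref() is not None
gc.collect()
assert ref() is None
gc.enable()
```

Deleting the exc_info reference (as the lock decorator contortion above does) is what breaks this chain without waiting for the cyclic collector.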
> Now, a way to track the uncollectable test case sounds like a good way to
> progress but given the amount of work, could you instead devise a patch where
> the check depends on some option ?
> Once we reach 0 we can toggle the default value for this option.
Well, the "print junk to stdout" is my less intrusive option, but the aim is to get it to 0 before landing. However...
> For the run above, I also get ~4000 Uncollected test cases, far too much to
> land this patch :-/
This could well be just needing the testtools exc_info cycle cleanup as well, branch for that linked in previous comment.
...getting 4000 complaints from not having the testtools change probably does mean this needs to land quieter, making everyone patch testtools or be spammed isn't polite.
Vincent Ladeuil (vila) wrote: Posted in a previous version of this proposal
> Thanks for looking at this Vincent.
>
> > Huh ? Where are my tests gone ? (Strong suspicion around your modification
> > of __iter__)
>
> I suspect this is related to --parallel which I thought might be affected but
> didn't have any easy way of testing on my machine, I'll take another look at
> that code.
Clearly --parallel specific and at least one __iter__ you've modified should be involved.
>
> > Including failures for bb.test_log so your fix is wrong there.
>
> Gah, how did I miss those...
I'm pretty sure the way the log formatters are registered requires that a single instance is used, and you've changed that. In this case I think breaking the cycle with an addCleanup should be ok.
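The addCleanup approach being suggested can be sketched as follows (the Formatter here is a made-up stand-in, not bzrlib's log formatter):

```python
import unittest

class LogFormatterTest(unittest.TestCase):
    """Illustrative only: break a test-to-helper reference cycle by
    deleting the offending attribute in a cleanup."""

    def setUp(self):
        # Imagine a formatter that keeps a back-reference to the test,
        # forming a cycle: self -> formatter -> class -> self.
        self.formatter = type("Formatter", (), {"owner": self})()
        # Break the cycle once the test finishes, instead of contorting
        # the test body to avoid creating it in the first place.
        self.addCleanup(delattr, self, "formatter")

    def test_something(self):
        self.assertIs(self.formatter.owner, self)

result = unittest.TestResult()
case = LogFormatterTest("test_something")
case.run(result)
assert result.wasSuccessful()
assert not hasattr(case, "formatter")   # cleanup removed the attribute
```

The win over avoidance is that the test body stays natural, and the cleanup line with a comment documents exactly which reference was the problem.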
>
> > On the overall, with a superficial reading, the way you avoid ref cycles
> > seem hard to understand so very likely to be re-introduced. How about
> > breaking the ref-cycles with addCleanup calls instead (by deleting the
> > offending attributes) ?
>
> Yes, that it's hard to understand is a big concern. At least this patch, with
> retroactively changing a bunch of tests in subtle ways, is the worst part of
> it. It's much easier to adjust the test you're writing right now because you
> get told there's a cycle.
I understand that, but avoiding the cycle may make the code less clear than breaking it during tearDown,
especially if there is an accompanying comment ;)
> So, moving in the direction of actually causing a
> failure once all existing issues are fixed would I think prevent regressions
> without being too taxing on contributors.
Yup, +1 on that.
> Note, that sometimes cycles do have to be avoided, because it's easier than
> trying to break them later, particularly with exc_info and closures where it's
> internal objects pointing at each other rather than public TestCase
> attributes.
No problem with that, but again adding a comment will help document such cases (so people
can learn from them) and avoid regressions.
> > Now, a way to track the uncollectable test case sounds like a good way to
> > progress but given the amount of work, could you instead devise a patch
> > where the check depends on some option ?
> > Once we reach 0 we can toggle the default value for this option.
>
> Well, the "print junk to stdout" is my less intrusive option, but the aim is
> to get it to 0 before landing. However...
I'm fine with that as long as an option controls it; there are precedents, look for the lock checks.
> > For the run above, I also get ~4000 Uncollected test cases, far too much to
> > land this patch :-/
>
> This could well be just needing the testtools exc_info cycle cleanup as well,
> branch for that linked in previous comment.
Right, testing with your testtools patch, I still get ~2000 uncollected tests.
>
> ...getting 4000 complaints from not having the testtools change probably does
> mean this needs to land quieter, making everyone patch testtools or be spammed
> isn't polite.
Yup, that's why I suggest an option to control the output that could stay off by default
until you reach 0 uncollected tests, which you can then target in several steps.
In fact, ideally, the first step could be a patch that just imp...
Martin Packman (gz) wrote: Posted in a previous version of this proposal
Quick followup on some points from review last time round.
> > > Huh ? Where are my tests gone ? (Strong suspicion around your
> > > modification of __iter__)
> >
> > I suspect this is related to --parallel which I thought might be affected
> > but didn't have any easy way of testing on my machine, I'll take another
> > look at that code.
>
> Clearly --parallel specific and at least one __iter__ you've modified should
> be involved.
This was a combination of subunit playing very differently to the normal selftest result object, and bug 625551 meaning the errors from the children were silently eaten so I didn't notice.
> I'm pretty sure the way the log formatter are registered requires that a
> single instance is used
> and you've changed that. In this case I think breaking the cycle with an
> addCleanup should be ok.
Made this change.
> I'm fine with that as long as an option control it, there are precedents, look
> for the lock checks.
I'm still avoiding this for the moment, so people with 8GB ram at least get some console spam before they completely break my box.
> Right, testing with your testtools patch, I still get ~2000 uncollected tests.
The testtools change landed for 0.9.7 and I'd be interested in a new count. Should now not break with subunit which should help.
Andrew Bennetts (spiv) wrote:
Martin [gz] wrote:
[...]
> As written, the code relies on refcounting working, so cycles need to be
> avoided. The alternative is to run a full gc collection after every test,
> which is slow enough to be worth avoiding.
Another alternative: why not a single full gc collection at the end of the
entire run, and then report any test cases that are unexpectedly still
alive?
I don't think you should assume that “uncollected at this instant” implies
“can't be collected yet”. And the real issue is uncollectable objects, not
objects that have fractionally outlived their usefulness, AFAICS.
The point is that if a VM chooses to allow say 20 or even 200 used test case
instances to be alive simultaneously before collecting them, then that's
fine. It'll still keep the memory ceiling reasonably low rather than
growing in proportion to the number of tests. So requiring that a test case
lifetime is strictly minimal is too strong a requirement, and I'm not sure
it's worth paying the price for it. The price is either assuming a
refcounting VM (and the fairly extreme sensitivity to refcounting differences
due to different versions of dependencies like testtools or even the Python
stdlib), or forcing a full GC after every test. I agree that the latter is
too much, but I also think the former isn't worth paying either.
To be clear, I don't mind (small) contortions to make most or even all tests
be collected promptly via refcounting, under common conditions. I do mind
emitting warnings when that fails, because I think the noise will outweigh
the benefits.
I'd be very happy to have an optional flag to selftest to force a full GC
after every test (or more creatively, after every N tests, then once a leak
is found re-run the last N individually to pinpoint it...). And maybe even
an optional flag to warn if the refcount-based attempt to release the test
case doesn't release it. It's good to have tools to help developers
diagnose memory issues. I just don't believe these tools should be on all
the time unless they are very cheap and very accurate.
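The "full GC every N tests" flag could be prototyped as a small TestResult subclass (hypothetical names; selftest's real option plumbing differs):

```python
import gc
import unittest

class PeriodicGCResult(unittest.TestResult):
    """Force a full garbage collection every `gc_every` tests, so leaks
    surface without paying the collection cost on every single test."""

    def __init__(self, gc_every=100):
        super(PeriodicGCResult, self).__init__()
        self.gc_every = gc_every
        self.collections = 0

    def stopTest(self, test):
        super(PeriodicGCResult, self).stopTest(test)
        if self.testsRun % self.gc_every == 0:
            gc.collect()
            self.collections += 1

class Trivial(unittest.TestCase):
    def test_nothing(self):
        pass

result = PeriodicGCResult(gc_every=2)
for _ in range(5):
    Trivial("test_nothing").run(result)
assert result.testsRun == 5
assert result.collections == 2   # after the 2nd and 4th tests
```

When a leak is detected at some collection point, only the last N tests need re-running individually to pinpoint it, as suggested above.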
> The branch needs a bit more work still, about a hundred tests still need
> poking to avoid cycles, and needs a NEWS entry and so on. I'd like some
> feedback on if people think this is a worthwhile change over just
> stripping the __dict__ and if there are any other issues I've not
> realised.
Why not strip __dict__ as well? Defense in depth...
Basically, I'm not inclined to go to extreme lengths to get utter perfection
here: if the memory consumption is down, and is likely to stay down, and is
fairly easy to diagnose and correct if it doesn't stay down, then I think
that's clearly Good Enough, even if there are a handful of tests that don't
relinquish all their memory, or don't relinquish it as promptly as they
theoretically could.
Aside from the comments above, most of the diff looks ok to me except these
bits:
> +++ bzrlib/
> @@ -575,12 +575,15 @@
> try:
> 1/0
> except ZeroDivisionError:
> - exc_info = sys.exc_info()
> - err = errors.
> - self.asser...
Andrew Bennetts (spiv) wrote:
I said:
> Why not strip __dict__ as well? Defense in depth...
I just realised you do! So that's nice :)
Andrew Bennetts (spiv) wrote:
FWIW, under 4-way --parallel=fork on my laptop:
* I see significant memory savings too (no hard figures as I wasn't watching closely enough, but at least half the size, possibly better).
* The time drops by about 30s (from 560s down to 530s), which is the sort of modest improvement I was hoping to see :)
So, nice work! I just don't want it to be a burden on future development.
Martin Packman (gz) wrote:
Thanks Andrew, this is great feedback.
> Another alternative: why not a single full gc collection at the end of the
> entire run, and then report any test cases that are unexpectedly still
> alive?
I didn't take this approach because I'm already in swapping death by that point. Now the main issues are fixed in this branch that sounds like a reasonable option.
> The point is that if a VM chooses to allow say 20 or even 200 used test case
> instances to be alive simultaneously before collecting them, then that's
> fine. It'll still keep the memory ceiling reasonably low rather than
> growing in proportion to the number of tests. So requiring that a test case
> lifetime is strictly minimal is too strong a requirement, and I'm not sure
> it's worth paying the price for it. The price is either assuming a
> refcounting VM (and the fairly extreme sensitivity to refcounting differences
> due to different versions of dependencies like testtools or even the Python
> stdlib), or forcing a full GC after every test. I agree that the latter is
> too much, but I also think the former isn't worth paying either.
It's true cycles are harmless in and of themselves nowadays, but they do make execution non-deterministic. By and large, they are also avoidable, and I think something of a code smell in tests at least. Many of the changes I've made to tests actually improve the code, and where it makes things worse tends to be in tests using slightly odd patterns.
I think there are benefits in not creating cycles in bzrlib proper. There are cases where building up a lot of garbage really hurts†, and while performance regressions should be noticed, I like selftest coverage.
There's only one Python implementation bzr can use, and it refcounts, so I don't think relying on that is much of a stretch. Neither Jython nor IronPython‡ are funded any more, and PyPy...
> Why not strip __dict__ as well? Defense in depth...
As you noted this branch does do that, but Robert was trying to avoid this. Having failed tests in a examinable state at the end of the run is potentially useful.
> Basically, I'm not inclined to go to extreme lengths to get utter perfection
> here: if the memory consumption is down, and is likely to stay down, and is
> fairly easy to diagnose and correct if it doesn't stay down, then I think
> that's clearly Good Enough, even if there are a handful of tests that don't
> relinquish all their memory, or don't relinquish it as promptly as they
> theoretically could.
I feel this is where we are and have been with thread leaks, and it's been pain felt disproportionately by the wrong people. It's much easier for the test writer to get things correct than for those who end up suffering from the random problems. This is why I'm adding noise, and even considering making tests fail, for such a minor thing. It might be an overreaction, but I'd prefer an always-on, testable solution to one big switch that will be generally ignored.
> That hunk looks odd. Surely falling off the end of the function
> already deletes all the locals? And if the test fails, I don't really care
> about the extra memory cost. Certainly not enough to clutter the test.
What...
Andrew Bennetts (spiv) wrote:
Martin [gz] wrote:
> Thanks Andrew, this is great feedback.
>
> > Another alternative: why not a single full gc collection at the end of the
> > entire run, and then report any test cases that are unexpectedly still
> > alive?
>
> I didn't take this approach because I'm already in swapping death by that
> point. Now the main issues are fixed in this branch that sounds like a
> reasonable option.
Ah, right. For unexpected swap death it would be good to have an optional mode
along the lines of this patch. Although hopefully it'll never return :)
> > The point is that if a VM chooses to allow say 20 or even 200 used test case
> > instances to be alive simultaneously before collecting them, then that's
> > fine. It'll still keep the memory ceiling reasonably low rather than
> > growing in proportion to the number of tests. So requiring that a test case
> > lifetime is strictly minimal is too strong a requirement, and I'm not sure
> > it's worth paying the price for it. The price is either assuming a
> > refcounting VM (and the fairly extreme sensitivity to refcounting differences
> > due to different versions of dependencies like testtools or even the Python
> > stdlib), or forcing a full GC after every test. I agree that the latter is
> > too much, but I also think the former isn't worth paying either.
>
> It's true cycles are harmless in and of themselves nowadays, but they do make
> execution non-deterministic. By and large, they are also avoidable and I think
> something of a code smell in tests at least. Many of the changes I've made to
> tests actually improve the code, and where it makes things worse tends to be
> in tests using slightly odd patterns.
I'm not really bothered by a little non-determinism in tests, especially as we
do so little with __del__ methods or weakref callbacks. If we want to remove
non-determinism in tests we have a much bigger problem: threads.
I'd say the changes you've made to test code are 50-50: about half a touch
nicer, about half a bit worse. I'm talking in terms of clarity, which is
probably the most important concern for test code. Adding try/finally blocks
that happen to break cycles a casual reader or writer wouldn't even think of,
when cycles aren't significant to the behaviour being tested, detract from the
test.
> I think there are benefits in not creating cycles in bzrlib proper. There are
> cases where building up a lot of garbage really hurts†, and while performance
> regressions should be noticed, I like selftest coverage.
Yes, John's noticed the same thing. Manually breaking cycles is a terribly
fragile way to get a 5% benefit though (5% being what I saw in terms of selftest
performance, and roughly what John's noticed in the past for some other issues).
The best fix is to simply create fewer objects, or at least fewer gc-tracked
objects: hence StaticTuple, and other performance work John's done.
Manually breaking cycles here and there throughout the code isn't really
tackling root causes.
I don't mind it in generic higher-order function infrastructure like
bzrlib.decorators and bzrlib.cleanups, where many places benefit from a fairly
simple change (although even there it m...
Vincent Ladeuil (vila) wrote:
This is definitely cleaner ! Excellent.
Yet:
./bzr selftest -s bzrlib.
and doing a full run outputs an 'Uncollected test case:' line for every test.
Am I missing some special version of testtools? (I'm running with lp:testtools revno 128.)
Martin Packman (gz) wrote:
Sorry, had left tip on launchpad in a slightly broken state, have synced it up with my local branch again.
Andrew Bennetts (spiv) wrote:
I still see a fair few "Uncollected test case" messages when doing "./bzr selftest". Too many to be useful, I fear.
And if I use --parallel=fork (which I do almost all the time) then every test is reported as uncollected! So this still needs a bit more work I guess. I'm going to put this back to Work in Progress to declutter the review queue.
Andrew Bennetts (spiv) wrote:
On the plus side it does still pass regular selftest locally, even after merging latest lp:bzr (no conflicts currently).
Preview Diff
1 | === modified file 'bzrlib/errors.py' |
2 | --- bzrlib/errors.py 2011-06-21 11:34:52 +0000 |
3 | +++ bzrlib/errors.py 2011-07-01 00:12:55 +0000 |
4 | @@ -1659,6 +1659,7 @@ |
5 | |
6 | def __init__(self, exc_info): |
7 | import traceback |
8 | + # GZ 2010-08-10: Cycle with exc_tb/exc_info affects at least one test |
9 | self.exc_type, self.exc_value, self.exc_tb = exc_info |
10 | self.exc_info = exc_info |
11 | traceback_strings = traceback.format_exception( |
12 | |
13 | === modified file 'bzrlib/tests/TestUtil.py' |
14 | --- bzrlib/tests/TestUtil.py 2011-05-26 08:05:45 +0000 |
15 | +++ bzrlib/tests/TestUtil.py 2011-07-01 00:12:55 +0000 |
16 | @@ -19,6 +19,7 @@ |
17 | import sys |
18 | import logging |
19 | import unittest |
20 | +import weakref |
21 | |
22 | from bzrlib import pyutils |
23 | |
24 | @@ -82,14 +83,37 @@ |
25 | tests = list(self) |
26 | tests.reverse() |
27 | self._tests = [] |
28 | + stream = getattr(result, "stream", None) |
29 | + # With subunit, not only is stream underscored, but the actual result |
30 | + # object is hidden inside a wrapper decorator, get out the real stream |
31 | + if stream is None: |
32 | + stream = result.decorated._stream |
33 | + stored_count = 0 |
34 | + count_stored_tests = getattr(result, "_count_stored_tests", int) |
35 | + from bzrlib.tests import selftest_debug_flags |
36 | + notify = "uncollected_cases" in selftest_debug_flags |
37 | while tests: |
38 | if result.shouldStop: |
39 | self._tests = reversed(tests) |
40 | break |
41 | - tests.pop().run(result) |
42 | + case = _run_and_collect_case(tests.pop(), result)() |
43 | + new_stored_count = count_stored_tests() |
44 | + if case is not None and isinstance(case, unittest.TestCase): |
45 | + if stored_count == new_stored_count and notify: |
46 | + # Testcase didn't fail, but somehow is still alive |
47 | + stream.write("Uncollected test case: %s\n" % (case.id(),)) |
48 | + # Zombie the testcase but leave a working stub id method |
49 | + case.__dict__ = {"id": lambda _id=case.id(): _id} |
50 | + stored_count = new_stored_count |
51 | return result |
52 | |
53 | |
54 | +def _run_and_collect_case(case, res): |
55 | + """Run test case against result and use weakref to drop the refcount""" |
56 | + case.run(res) |
57 | + return weakref.ref(case) |
58 | + |
59 | + |
60 | class TestLoader(unittest.TestLoader): |
61 | """Custom TestLoader to extend the stock python one.""" |
62 | |
63 | |
64 | === modified file 'bzrlib/tests/__init__.py' |
65 | --- bzrlib/tests/__init__.py 2011-06-27 15:42:09 +0000 |
66 | +++ bzrlib/tests/__init__.py 2011-07-01 00:12:55 +0000 |
67 | @@ -495,6 +495,10 @@ |
68 | self.not_applicable_count += 1 |
69 | self.report_not_applicable(test, reason) |
70 | |
71 | + def _count_stored_tests(self): |
72 | + """Count of tests instances kept alive due to not succeeding""" |
73 | + return self.error_count + self.failure_count + self.known_failure_count |
74 | + |
75 | def _post_mortem(self, tb=None): |
76 | """Start a PDB post mortem session.""" |
77 | if os.environ.get('BZR_TEST_PDB', None): |
78 | @@ -3159,6 +3163,9 @@ |
79 | result_decorators=result_decorators, |
80 | ) |
81 | runner.stop_on_failure=stop_on_failure |
82 | + if isinstance(suite, unittest.TestSuite): |
83 | + # Empty out _tests list of passed suite and populate new TestSuite |
84 | + suite._tests[:], suite = [], TestSuite(suite) |
85 | # built in decorator factories: |
86 | decorators = [ |
87 | random_order(random_seed, runner), |
88 | @@ -3262,34 +3269,17 @@ |
89 | |
90 | class TestDecorator(TestUtil.TestSuite): |
91 | """A decorator for TestCase/TestSuite objects. |
92 | - |
93 | - Usually, subclasses should override __iter__(used when flattening test |
94 | - suites), which we do to filter, reorder, parallelise and so on, run() and |
95 | - debug(). |
96 | + |
97 | + Contains rather than flattening suite passed on construction |
98 | """ |
99 | |
100 | - def __init__(self, suite): |
101 | - TestUtil.TestSuite.__init__(self) |
102 | - self.addTest(suite) |
103 | - |
104 | - def countTestCases(self): |
105 | - cases = 0 |
106 | - for test in self: |
107 | - cases += test.countTestCases() |
108 | - return cases |
109 | - |
110 | - def debug(self): |
111 | - for test in self: |
112 | - test.debug() |
113 | - |
114 | - def run(self, result): |
115 | - # Use iteration on self, not self._tests, to allow subclasses to hook |
116 | - # into __iter__. |
117 | - for test in self: |
118 | - if result.shouldStop: |
119 | - break |
120 | - test.run(result) |
121 | - return result |
122 | + def __init__(self, suite=None): |
123 | + super(TestDecorator, self).__init__() |
124 | + if suite is not None: |
125 | + self.addTest(suite) |
126 | + |
127 | + # Don't need subclass run method with suite emptying |
128 | + run = unittest.TestSuite.run |
129 | |
130 | |
131 | class CountingDecorator(TestDecorator): |
132 | @@ -3306,90 +3296,50 @@ |
133 | """A decorator which excludes test matching an exclude pattern.""" |
134 | |
135 | def __init__(self, suite, exclude_pattern): |
136 | - TestDecorator.__init__(self, suite) |
137 | - self.exclude_pattern = exclude_pattern |
138 | - self.excluded = False |
139 | - |
140 | - def __iter__(self): |
141 | - if self.excluded: |
142 | - return iter(self._tests) |
143 | - self.excluded = True |
144 | - suite = exclude_tests_by_re(self, self.exclude_pattern) |
145 | - del self._tests[:] |
146 | - self.addTests(suite) |
147 | - return iter(self._tests) |
148 | + super(ExcludeDecorator, self).__init__( |
149 | + exclude_tests_by_re(suite, exclude_pattern)) |
150 | |
151 | |
152 | class FilterTestsDecorator(TestDecorator): |
153 | """A decorator which filters tests to those matching a pattern.""" |
154 | |
155 | def __init__(self, suite, pattern): |
156 | - TestDecorator.__init__(self, suite) |
157 | - self.pattern = pattern |
158 | - self.filtered = False |
159 | - |
160 | - def __iter__(self): |
161 | - if self.filtered: |
162 | - return iter(self._tests) |
163 | - self.filtered = True |
164 | - suite = filter_suite_by_re(self, self.pattern) |
165 | - del self._tests[:] |
166 | - self.addTests(suite) |
167 | - return iter(self._tests) |
168 | + super(FilterTestsDecorator, self).__init__( |
169 | + filter_suite_by_re(suite, pattern)) |
170 | |
171 | |
172 | class RandomDecorator(TestDecorator): |
173 | """A decorator which randomises the order of its tests.""" |
174 | |
175 | def __init__(self, suite, random_seed, stream): |
176 | - TestDecorator.__init__(self, suite) |
177 | - self.random_seed = random_seed |
178 | - self.randomised = False |
179 | - self.stream = stream |
180 | - |
181 | - def __iter__(self): |
182 | - if self.randomised: |
183 | - return iter(self._tests) |
184 | - self.randomised = True |
185 | - self.stream.write("Randomizing test order using seed %s\n\n" % |
186 | - (self.actual_seed())) |
187 | + random_seed = self.actual_seed(random_seed) |
188 | + stream.write("Randomizing test order using seed %s\n\n" % |
189 | + (random_seed,)) |
190 | # Initialise the random number generator. |
191 | - random.seed(self.actual_seed()) |
192 | - suite = randomize_suite(self) |
193 | - del self._tests[:] |
194 | - self.addTests(suite) |
195 | - return iter(self._tests) |
196 | + random.seed(random_seed) |
197 | + super(RandomDecorator, self).__init__(randomize_suite(suite)) |
198 | |
199 | - def actual_seed(self): |
200 | - if self.random_seed == "now": |
201 | + @staticmethod |
202 | + def actual_seed(seed): |
203 | + if seed == "now": |
204 | # We convert the seed to a long to make it reuseable across |
205 | # invocations (because the user can reenter it). |
206 | - self.random_seed = long(time.time()) |
207 | + return long(time.time()) |
208 | else: |
209 | # Convert the seed to a long if we can |
210 | try: |
211 | - self.random_seed = long(self.random_seed) |
212 | - except: |
213 | + return long(seed) |
214 | + except (TypeError, ValueError): |
215 | pass |
216 | - return self.random_seed |
217 | + return seed |
218 | |
219 | |
220 | class TestFirstDecorator(TestDecorator): |
221 | """A decorator which moves named tests to the front.""" |
222 | |
223 | def __init__(self, suite, pattern): |
224 | - TestDecorator.__init__(self, suite) |
225 | - self.pattern = pattern |
226 | - self.filtered = False |
227 | - |
228 | - def __iter__(self): |
229 | - if self.filtered: |
230 | - return iter(self._tests) |
231 | - self.filtered = True |
232 | - suites = split_suite_by_re(self, self.pattern) |
233 | - del self._tests[:] |
234 | - self.addTests(suites) |
235 | - return iter(self._tests) |
236 | + super(TestFirstDecorator, self).__init__() |
237 | + self.addTests(split_suite_by_re(suite, pattern)) |
238 | |
239 | |
240 | def partition_tests(suite, count): |
241 | @@ -3442,9 +3392,12 @@ |
242 | os.waitpid(self.pid, 0) |
243 | |
244 | test_blocks = partition_tests(suite, concurrency) |
245 | + # Clear the tests from the original suite so it doesn't keep them alive |
246 | + suite._tests[:] = [] |
247 | for process_tests in test_blocks: |
248 | - process_suite = TestUtil.TestSuite() |
249 | - process_suite.addTests(process_tests) |
250 | + process_suite = TestUtil.TestSuite(process_tests) |
251 | + # Also clear each split list so new suite has only reference |
252 | + process_tests[:] = [] |
253 | c2pread, c2pwrite = os.pipe() |
254 | pid = os.fork() |
255 | if pid == 0: |
256 | @@ -3456,12 +3409,15 @@ |
257 | # read from stdin (otherwise its a roulette to see what |
258 | # child actually gets keystrokes for pdb etc). |
259 | sys.stdin.close() |
260 | - sys.stdin = None |
261 | + # GZ 2011-06-16: Why set stdin to None? Breaks multi fork. |
262 | + #sys.stdin = None |
263 | stream = os.fdopen(c2pwrite, 'wb', 1) |
264 | subunit_result = AutoTimingTestResultDecorator( |
265 | TestProtocolClient(stream)) |
266 | process_suite.run(subunit_result) |
267 | finally: |
268 | + # GZ 2011-06-16: Is always exiting with silent success |
269 | + # really the right thing? Hurts debugging. |
270 | os._exit(0) |
271 | else: |
272 | os.close(c2pwrite) |
273 | @@ -3574,8 +3530,13 @@ |
274 | # with proper exclusion rules. |
275 | # -Ethreads Will display thread ident at creation/join time to |
276 | # help track thread leaks |
277 | +<<<<<<< TREE |
278 | |
279 | # -Econfig_stats Will collect statistics using addDetail |
280 | +======= |
281 | +# -Euncollected_cases Display the identity of any test cases that weren't |
282 | +# deallocated after being completed. |
283 | +>>>>>>> MERGE-SOURCE |
284 | selftest_debug_flags = set() |
285 | |
286 | |
287 | @@ -4683,6 +4644,14 @@ |
288 | from subunit.test_results import AutoTimingTestResultDecorator |
289 | class SubUnitBzrProtocolClient(TestProtocolClient): |
290 | |
291 | + # GZ 2011-05-26: This duplicates logic in ExtendedTestResult.stopTest |
292 | + def stopTest(self, test): |
293 | + super(SubUnitBzrProtocolClient, self).stopTest(test) |
294 | + # Break bound method cycles added in Python 2.7 unittest rewrite |
295 | + type_equality_funcs = getattr(test, "_type_equality_funcs", None) |
296 | + if type_equality_funcs is not None: |
297 | + type_equality_funcs.clear() |
298 | + |
299 | def addSuccess(self, test, details=None): |
300 | # The subunit client always includes the details in the subunit |
301 | # stream, but we don't want to include it in ours. |
302 | |
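The `stopTest` override above clears `_type_equality_funcs` to break bound-method cycles. The same cleanup can be sketched against stock `unittest` (a standalone illustration, not bzrlib code; the class and test names here are invented). On Python 2.7 this mapping held bound methods such as `case.assertDictEqual`, giving a `case -> dict -> bound method -> case` cycle; later Pythons store method names instead, so the clear is then a harmless no-op.

```python
import unittest

class PassingTest(unittest.TestCase):
    """Hypothetical test case used only to demonstrate the cleanup."""
    def test_dicts(self):
        self.assertEqual({"a": 1}, {"a": 1})

case = PassingTest("test_dicts")
case.run(unittest.TestResult())
# On Python 2.7, _type_equality_funcs mapped types to bound methods,
# so the dict pinned the case in a reference cycle. Clearing it after
# the run breaks that cycle, mirroring the stopTest hook in the branch.
funcs = getattr(case, "_type_equality_funcs", None)
if funcs is not None:
    funcs.clear()
```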
303 | === modified file 'bzrlib/tests/test_selftest.py' |
304 | --- bzrlib/tests/test_selftest.py 2011-06-17 13:59:23 +0000 |
305 | +++ bzrlib/tests/test_selftest.py 2011-07-01 00:12:55 +0000 |
306 | @@ -17,6 +17,7 @@ |
307 | """Tests for the test framework.""" |
308 | |
309 | from cStringIO import StringIO |
310 | +import gc |
311 | import doctest |
312 | import os |
313 | import signal |
314 | @@ -3385,6 +3386,113 @@ |
315 | self.assertLength(1, calls) |
316 | |
317 | |
318 | +class TestUncollectedWarnings(tests.TestCase): |
319 | + """Check a test case still alive after being run emits a warning""" |
320 | + |
321 | + class Test(tests.TestCase): |
322 | + def test_pass(self): |
323 | + pass |
324 | + def test_self_ref(self): |
325 | + self.also_self = self.test_self_ref |
326 | + def test_skip(self): |
327 | + self.skip("Don't need") |
328 | + |
329 | + def _get_suite(self): |
330 | + return TestUtil.TestSuite([ |
331 | + self.Test("test_pass"), |
332 | + self.Test("test_self_ref"), |
333 | + self.Test("test_skip"), |
334 | + ]) |
335 | + |
336 | + def _inject_stream_into_subunit(self, stream): |
337 | + """To be overridden by subclasses that run tests out of process""" |
338 | + |
339 | + def _run_selftest_with_suite(self, **kwargs): |
340 | + sio = StringIO() |
341 | + self._inject_stream_into_subunit(sio) |
342 | + old_flags = tests.selftest_debug_flags |
343 | + tests.selftest_debug_flags = old_flags.union(["uncollected_cases"]) |
344 | + gc_on = gc.isenabled() |
345 | + if gc_on: |
346 | + gc.disable() |
347 | + try: |
348 | + tests.selftest(test_suite_factory=self._get_suite, stream=sio, |
349 | + **kwargs) |
350 | + finally: |
351 | + if gc_on: |
352 | + gc.enable() |
353 | + tests.selftest_debug_flags = old_flags |
354 | + output = sio.getvalue() |
355 | + self.assertNotContainsRe(output, "Uncollected test case.*test_pass") |
356 | + self.assertContainsRe(output, "Uncollected test case.*test_self_ref") |
357 | + return output |
358 | + |
359 | + def test_testsuite(self): |
360 | + self._run_selftest_with_suite() |
361 | + |
362 | + def test_pattern(self): |
363 | + out = self._run_selftest_with_suite(pattern="test_(?:pass|self_ref)$") |
364 | + self.assertNotContainsRe(out, "test_skip") |
365 | + |
366 | + def test_exclude_pattern(self): |
367 | + out = self._run_selftest_with_suite(exclude_pattern="test_skip$") |
368 | + self.assertNotContainsRe(out, "test_skip") |
369 | + |
370 | + def test_random_seed(self): |
371 | + self._run_selftest_with_suite(random_seed="now") |
372 | + |
373 | + def test_matching_tests_first(self): |
374 | + self._run_selftest_with_suite(matching_tests_first=True, |
375 | + pattern="test_self_ref$") |
376 | + |
377 | + def test_starting_with_and_exclude(self): |
378 | + out = self._run_selftest_with_suite(starting_with=["bt."], |
379 | + exclude_pattern="test_skip$") |
380 | + self.assertNotContainsRe(out, "test_skip") |
381 | + |
382 | + def test_additional_decorator(self): |
383 | + self._run_selftest_with_suite( |
384 | + suite_decorators=[tests.TestDecorator]) |
385 | + |
386 | + |
387 | +class TestUncollectedWarningsSubunit(TestUncollectedWarnings): |
388 | + """Check warnings from tests staying alive are emitted with subunit""" |
389 | + |
390 | + _test_needs_features = [features.subunit] |
391 | + |
392 | + def _run_selftest_with_suite(self, **kwargs): |
393 | + return TestUncollectedWarnings._run_selftest_with_suite(self, |
394 | + runner_class=tests.SubUnitBzrRunner, **kwargs) |
395 | + |
396 | + |
397 | +class TestUncollectedWarningsForking(TestUncollectedWarnings): |
398 | + """Check warnings from tests staying alive are emitted when forking""" |
399 | + |
400 | + _test_needs_features = [features.subunit] |
401 | + |
402 | + def _inject_stream_into_subunit(self, stream): |
403 | + """Monkey-patch subunit so the extra output goes to stream not stdout |
404 | + |
405 | + Some APIs need rewriting so this kind of bogus hackery can be replaced |
406 | + by passing the stream param from run_tests down into ProtocolTestCase. |
407 | + """ |
408 | + from subunit import ProtocolTestCase |
409 | + _original_init = ProtocolTestCase.__init__ |
410 | + def _init_with_passthrough(self, *args, **kwargs): |
411 | + _original_init(self, *args, **kwargs) |
412 | + self._passthrough = stream |
413 | + self.overrideAttr(ProtocolTestCase, "__init__", _init_with_passthrough) |
414 | + |
415 | + def _run_selftest_with_suite(self, **kwargs): |
416 | + # GZ 2011-05-26: Add a PosixSystem feature so this check can go away |
417 | + if getattr(os, "fork", None) is None: |
418 | + raise tests.TestNotApplicable("Platform doesn't support forking") |
419 | + # Make sure the fork code is actually invoked by claiming two cores |
420 | + self.overrideAttr(osutils, "local_concurrency", lambda: 2) |
421 | + kwargs.setdefault("suite_decorators", []).append(tests.fork_decorator) |
422 | + return TestUncollectedWarnings._run_selftest_with_suite(self, **kwargs) |
423 | + |
424 | + |
425 | class TestEnvironHandling(tests.TestCase): |
426 | |
427 | def test_overrideEnv_None_called_twice_doesnt_leak(self): |
So, I think this is more correct; fixing cycles is pretty important
for us in other ways: I like that we're finding latent issues here.
I'm not worried about the references to test cases in errors/failures
lists: they are a benefit (and the __dict__ stripping was a _REAL_
pain there in the past).
-Rob
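Rob's point about cycles is easy to demonstrate with stock `unittest` (a standalone sketch, not bzrlib code; `SelfRefTest` is an invented name): a case that stores a bound method on itself, like `test_self_ref` in the new tests, can only be reclaimed by a full gc pass, which is exactly the per-test cost the branch avoids by removing cycles.

```python
import gc
import unittest
import weakref

class SelfRefTest(unittest.TestCase):
    """Stores a bound method on itself, as test_self_ref above does."""
    def runTest(self):
        # self -> bound method -> self is a reference cycle, so plain
        # refcounting can never drop the case to zero references.
        self.also_self = self.runTest

gc.disable()  # the branch relies on refcounting, so mimic that here
try:
    case = SelfRefTest()
    case.run(unittest.TestResult())
    ref = weakref.ref(case)
    del case
    # The cycle keeps the case alive after the last external reference.
    assert ref() is not None
    # Only a full collection reclaims it, the per-test cost avoided.
    gc.collect()
    assert ref() is None
finally:
    gc.enable()
```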