Merge lp:~mbp/bzr/386180-check into lp:~bzr/bzr/trunk-old
- 386180-check
- Merge into trunk-old
Status: | Rejected | ||||
---|---|---|---|---|---|
Rejected by: | Martin Pool | ||||
Proposed branch: | lp:~mbp/bzr/386180-check | ||||
Merge into: | lp:~bzr/bzr/trunk-old | ||||
Diff against target: | 32 lines | ||||
To merge this branch: | bzr merge lp:~mbp/bzr/386180-check | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | Disapprove | ||
Review via email: mp+7365@code.launchpad.net |
Commit message
Description of the change
Martin Pool (mbp) wrote : | # |
Robert Collins (lifeless) wrote : | # |
On Fri, 2009-06-12 at 01:50 +0000, Martin Pool wrote:
>
> This may be a bit controversial because, hey, no one likes reducing
> test coverage, and it's possible that sometimes a test will fail in
> LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think
> pqm rarely catches such tests, and it's not worth the latency price we
> pay.
We could also drop latency by getting a second CPU in pqm and using
selftest --parallel=fork in make check. This wouldn't reduce coverage at
all. Every CPU we add would reduce test latency.
C and a UTF8 locale really does catch a surprising amount of bugs. I'm
really hesitant to do this.
review -1
-Rob
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> Review: Disapprove
> On Fri, 2009-06-12 at 01:50 +0000, Martin Pool wrote:
>> This may be a bit controversial because, hey, no one likes reducing
>> test coverage, and it's possible that sometimes a test will fail in
>> LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think
>> pqm rarely catches such tests, and it's not worth the latency price we
>> pay.
>
> We could also drop latency by getting a second CPU in pqm and using
> selftest --parallel=fork in make check. This wouldn't reduce coverage at
> all. Every CPU we add would reduce test latency.
>
> C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> really hesitant to do this.
>
> review -1
>
> -Rob
>
I'll also comment that it does more than just test with UTF-8 and C, it
also tests with "-O" and without.
Now, I suppose our code is 'assert clean' so we don't need that any
more. But at least one time it was designed such that it ran with all
asserts off, and then with all asserts on.
I would probably argue that we just need better code coverage like we
have in some of the blackbox tests. (EncodingAdapter coverage.)
Honestly, it has been a long time since one of my submissions has been
rejected once it got to [ascii]. Whenever I saw that, it meant I could
relax as I knew it would land.
So, IMO, that sort of thing should be being tested with blackbox
permutation tests, rather than whole-test-suite testing...
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAko
uYwAoLQ8+
=MjUy
-----END PGP SIGNATURE-----
Martin Pool (mbp) wrote : | # |
2009/6/12 Robert Collins <email address hidden>:
> We could also drop latency by getting a second CPU in pqm and using
> selftest --parallel=fork in make check. This wouldn't reduce coverage at
> all. Every CPU we add would reduce test latency.
Sure, but that's not a one-line change. :-) There's an RT open for
months if not years asking for a faster machine; it doesn't mean we
shouldn't be more efficient.
It may even have multiple cores already? Or it may be worth forking
two processes even on a single core. At any rate it's orthogonal to
this patch.
> C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> really hesitant to do this.
So that's the key data point: how often do you get changes bounced
because of a failure in [ascii] mode? If it's trapping at least say
once per month then I'd agree it's worth keeping.
--
Martin <http://
Martin Pool (mbp) wrote : | # |
2009/6/12 John A Meinel <email address hidden>:
> I'll also comment that it does more than just test with UTF-8 and C, it
> also tests with "-O" and without.
>
> Now, I suppose our code is 'assert clean' so we don't need that any
> more. But at least one time it was designed such that it ran with all
> asserts off, and then with all asserts on.
I think banning asserts is a good example of being a bit smarter to
reduce test time. (And actual runtime variance too.)
>
> I would probably argue that we just need better code coverage like we
> have in some of the blackbox tests. (EncodingAdapter coverage.)
>
> Honestly, it has been a long time since one of my submissions has been
> rejected once it got to [ascii]. Whenever I saw that, it meant I could
> relax as I knew it would land.
>
> So, IMO, that sort of thing should be being tested with blackbox
> permutation tests, rather than whole-test-suite testing...
That's what I'm saying.
I think running the tests in ascii mode is to some extent a laziness
against adding real tests where we need them. It might make us
realize some tests should run in say a non-utf8 locale or a
non-English language.
--
Martin <http://
Robert Collins (lifeless) wrote : | # |
On Fri, 2009-06-12 at 08:12 +0000, Martin Pool wrote:
> 2009/6/12 Robert Collins <email address hidden>:
> > C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> > really hesitant to do this.
>
> So that's the key data point: how often do you get changes bounced
> because of a failure in [ascii] mode? If it's trapping at least say
> once per month then I'd agree it's worth keeping.
For me, about 50% of non-ascii wt related changes I make get trapped.
The frequency of those changes varies depending on work tasks of course.
-Rob
Vincent Ladeuil (vila) wrote : | # |
>>>>> "martin" == Martin Pool <email address hidden> writes:
<snip/>
martin> I think running the tests in ascii mode is to some
martin> extent a laziness against adding real tests where we
martin> need them.
Then we can't delete the [ascii] tests until we add the missing
tests.
Or did I miss something ?
martin> It might make us realize some tests should run in say
martin> a non-utf8 locale or a non-English language.
My limited experience in that regard is:
http://
If you look at that patch history, you'll notice that the last
commit there fixed an issue detected by PQM indeed.
We also had a long standing bug where PQM wasn't correctly
testing the unicode issues.
All in all, I'm far from convinced we can get rid of them.
Vincent
John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Vincent Ladeuil wrote:
>>>>>> "martin" == Martin Pool <email address hidden> writes:
>
> <snip/>
>
> martin> I think running the tests in ascii mode is to some
> martin> extent a laziness against adding real tests where we
> martin> need them.
>
> Then we can't delete the [ascii] tests until we add the missing
> tests.
>
> Or did I miss something ?
>
Well, it means running the *entire* test suite in ascii mode. It isn't a
specific subset of tests.
Since one can always run:
LANG=C ./bzr selftest
if that catches things that
./bzr selftest
doesn't, then it is a good indication of a missing test....
> martin> It might make us realize some tests should run in say
> martin> a non-utf8 locale or a non-English language.
>
> My limited experience in that regard is:
>
> http://
>
> If you look at that patch history, you'll notice that the last
> commit there fixed an issue detected by PQM indeed.
>
> We also had a long standing bug where PQM wasn't correctly
> testing the unicode issues.
>
> All in all, I'm far from convinced we can get rid of them.
>
> Vincent
So I think the change you bring up is this one:
http://
Which is actually *only* adding "self.requireFe
which means that it is *skipping* your new tests on ASCII mode.
I'm not convinced that having PQM test that it can skip tests because of
LANG values is entirely necessary.
Now if you had actually fixed a bug in your code because of a PQM
rejection, I might be more convinced.
Personally, I think we can work towards a 2-test system. With PQM
running tests that must pass to get merged into mainline, and a
build-bot style testing across more platforms/
whether we are passing sufficiently well to make a release. This is
somewhat similar to what Launchpad switched to, except they gutted their
PQM test suite, so it only tests that the merge was clean.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAko
56UAniioZto6BOp
=CubY
-----END PGP SIGNATURE-----
Vincent Ladeuil (vila) wrote : | # |
>>>>> "martin" == Martin Pool <email address hidden> writes:
martin> It may even have multiple cores already? Or it may be
martin> worth forking two processes even on a single core.
Not in my, admittedly, limited experiments.
martin> At any rate it's orthogonal to this patch.
If the purpose of this patch was to make PQM snappier, I think
you just vetoed your own patch by proposing a better alternative :)
martin> So that's the key data point: how often do you get
martin> changes bounced because of a failure in [ascii] mode?
martin> If it's trapping at least say once per month then I'd
martin> agree it's worth keeping.
Wow, that is heavily influenced by the way you run your tests !
I know many people rely on PQM to do it and their experience may
be more valuable than mine which almost never do that.
Still, my experience is that pqm has often caught failures in
ascii, less than once a month, but what a blessing !
I don't want to sound like the TDD extremist I am not (or maybe I
am ? :) but the sooner you catch a bug the cheaper it is to
fix[1].
The key data point here, to me, is that risking to miss an easy
bug catching that will cost X engineer hours should be compared
to the cost of upgrading PQM[2].
Vincent
[1]: Which, to me, definitely rules out 'Test Last' against 'Test
First', but I diverge.
[2]: If you haven't used --parallel=fork daily on a multi-core
yet, you should really try.
Vincent Ladeuil (vila) wrote : | # |
>>>>> "jam" == John A Meinel <email address hidden> writes:
jam> Vincent Ladeuil wrote:
>>>>>>> "martin" == Martin Pool <email address hidden> writes:
>>
>> <snip/>
>>
martin> I think running the tests in ascii mode is to some
martin> extent a laziness against adding real tests where we
martin> need them.
>>
>> Then we can't delete the [ascii] tests until we add the missing
>> tests.
>>
>> Or did I miss something ?
>>
jam> Well, it means running the *entire* test suite in ascii
jam> mode. It isn't a specific subset of tests.
I know :)
But I've been surprised by many test failures in ascii...
jam> Since one can always run:
jam> LANG=C ./bzr selftest
But who does ?
PQM does.
jam> if that catches things that
jam> ./bzr selftest
jam> doesn't, then it is a good indication of a missing test....
Not in the case I refer below though...
<snip/>
jam> Now if you had actually fixed a bug in your code because
jam> of a PQM rejection, I might be more convinced.
I had to dig more to find such examples.
But I still consider fixing bug in *test* code to be of value, we
rely on tests to validate our code, we can't accept bugs there
either.
jam> Personally, I think we can work towards a 2-test
jam> system. With PQM running tests that must pass to get
jam> merged into mainline, and a build-bot style testing
jam> across more platforms/
jam> are passing sufficiently well to make a release. This is
jam> somewhat similar to what Launchpad switched to, except
jam> they gutted their PQM test suite, so it only tests that
jam> the merge was clean.
Now you're bribing me in a way... I can hardly believe :)
Of course I support that idea.
But let's do things in the right order, first we set up the test
farm, then we relax PQM constraints :)
Vincent
Robert Collins (lifeless) wrote : | # |
On Fri, 2009-06-12 at 13:09 +0000, John A Meinel wrote:
> Personally, I think we can work towards a 2-test system. With PQM
> running tests that must pass to get merged into mainline, and a
> build-bot style testing across more platforms/
> whether we are passing sufficiently well to make a release. This is
> somewhat similar to what Launchpad switched to, except they gutted their
> PQM test suite, so it only tests that the merge was clean.
My dream test environment would be:
pqm runs tests in parallel on all platforms we care about.
*and* the test suite is about 10 times faster than it is today.
-Rob
Martin Pool (mbp) wrote : | # |
2009/6/12 Vincent Ladeuil <email address hidden>:
> martin> At any rate it's orthogonal to this patch.
>
> If the purpose of this patch was to make PQM snappier, I think
> you just vetoed your own patch by proposing a better alternative :)
It's not better because it will not be done soon.
> The key data point here, to me, is that risking to miss an easy
> bug catching that will cost X engineer hours should be compared
> to the cost of upgrading PQM[2].
The point is at the moment that slow merges cost engineer time too,
especially when trying to make a release.
--
Martin <http://
Vincent Ladeuil (vila) wrote : | # |
>>>>> "martin" == Martin Pool writes:
martin> 2009/6/12 Vincent Ladeuil <email address hidden>:
>> martin> At any rate it's orthogonal to this patch.
>>
>> If the purpose of this patch was to make PQM snappier, I think
>> you just vetoed your own patch by proposing a better alternative :)
martin> It's not better because it will not be done soon.
Ok, I didn't understand that.
>> The key data point here, to me, is that risking to miss an easy
>> bug catching that will cost X engineer hours should be compared
>> to the cost of upgrading PQM[2].
martin> The point is at the moment that slow merges cost engineer time too,
martin> especially when trying to make a release.
Point taken.
Martin Pool (mbp) wrote : | # |
I'm going to reject my own patch. Once we have Buildbot going, I'd like to merge this change and then set up a different machine that asynchronously tests them in LANG=C and for that matter LANG=fr_FR.iso-8859
Preview Diff
1 | === modified file 'Makefile' |
2 | --- Makefile 2009-06-10 09:34:02 +0000 |
3 | +++ Makefile 2009-06-16 02:35:41 +0000 |
4 | @@ -1,4 +1,4 @@ |
5 | -# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd |
6 | +# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd |
7 | # |
8 | # This program is free software; you can redistribute it and/or modify |
9 | # it under the terms of the GNU General Public License as published by |
10 | @@ -35,8 +35,6 @@ |
11 | |
12 | check-nodocs: extensions |
13 | $(PYTHON) -Werror -O ./bzr selftest -1v $(tests) |
14 | - @echo "Running all tests with no locale." |
15 | - LC_CTYPE= LANG=C LC_ALL= ./bzr selftest -1v $(tests) 2>&1 | sed -e 's/^/[ascii] /' |
16 | |
17 | # Run Python style checker (apt-get install pyflakes) |
18 | # |
19 | |
20 | === modified file 'NEWS' |
21 | --- NEWS 2009-06-15 15:20:24 +0000 |
22 | +++ NEWS 2009-06-16 02:35:41 +0000 |
23 | @@ -206,6 +206,9 @@ |
24 | Testing |
25 | ******* |
26 | |
27 | +* ``make check`` no longer repeats the test run in ``LANG=C``. |
28 | + (Martin Pool, #386180) |
29 | + |
30 | * The number of cores is now correctly detected on OSX. (John Szakmeister) |
31 | |
32 | * The number of cores is also detected on Solaris and win32. (Vincent Ladeuil) |
Let's make our landing process a bit easier by cutting in half the time for pqm to process a merge: run 'bzr selftest' only once from 'make check'.
This may be a bit controversial because, hey, no one likes reducing test coverage, and it's possible that sometimes a test will fail in LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think pqm rarely catches such tests, and it's not worth the latency price we pay.
Catching and fixing bugs related to locales is good, but I think better tackled by:
* running tests in diverse locales (operating systems, languages, codepages, etc) and we can't do them all in pqm
* adding tests for specific operations in specific encodings