Merge lp:~mbp/bzr/386180-check into lp:~bzr/bzr/trunk-old

Proposed by Martin Pool
Status: Rejected
Rejected by: Martin Pool
Proposed branch: lp:~mbp/bzr/386180-check
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 32 lines
To merge this branch: bzr merge lp:~mbp/bzr/386180-check
Reviewer Review Type Date Requested Status
Robert Collins (community) Disapprove
Review via email: mp+7365@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

Let's make our landing process a bit easier by cutting in half the time for pqm to process a merge: run 'bzr selftest' only once from 'make check'.

This may be a bit controversial because, hey, no one likes reducing test coverage, and it's possible that sometimes a test will fail in LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think pqm rarely catches such tests, and it's not worth the latency price we pay.

Catching and fixing bugs related to locales is good, but I think better tackled by:

 * running tests in diverse locales (operating systems, languages, codepages, etc) and we can't do them all in pqm
 * adding tests for specific operations in specific encodings

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2009-06-12 at 01:50 +0000, Martin Pool wrote:
>
> This may be a bit controversial because, hey, no one likes reducing
> test coverage, and it's possible that sometimes a test will fail in
> LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think
> pqm rarely catches such tests, and it's not worth the latency price we
> pay.

We could also drop latency by getting a second CPU in pqm and using
selftest --parallel=fork in make check. This wouldn't reduce coverage at
all. Every CPU we add would reduce test latency.

C and a UTF8 locale really does catch a surprising amount of bugs. I'm
really hesitant to do this.

 review -1

-Rob

review: Disapprove
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Review: Disapprove
> On Fri, 2009-06-12 at 01:50 +0000, Martin Pool wrote:
>> This may be a bit controversial because, hey, no one likes reducing
>> test coverage, and it's possible that sometimes a test will fail in
>> LANG=C but pass in LANG=en_AU@UTF8 or vice versa. However, I think
>> pqm rarely catches such tests, and it's not worth the latency price we
>> pay.
>
> We could also drop latency by getting a second CPU in pqm and using
> selftest --parallel=fork in make check. This wouldn't reduce coverage at
> all. Every CPU we add would reduce test latency.
>
> C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> really hesitant to do this.
>
> review -1
>
> -Rob
>

I'll also comment that it does more than just test with UTF-8 and C, it
also tests with "-O" and without.

Now, I suppose our code is 'assert clean' so we don't need that any
more. But at least one time it was designed such that it ran with all
asserts off, and then with all asserts on.

I would probably argue that we just need better code coverage like we
have in some of the blackbox tests. (EncodingAdapter coverage.)

Honestly, it has been a long time since one of my submissions has been
rejected once it got to [ascii]. Whenever I saw that, it meant I could
relax as I knew it would land.

So, IMO, that sort of thing should be being tested with blackbox
permutation tests, rather than whole-test-suite testing...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoxxhkACgkQJdeBCYSNAAND/ACeMv6mi4BzKqMC4Osr3BnPWCWR
uYwAoLQ8+3U0ki2MZugJ7QgZsePgFiwy
=MjUy
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

2009/6/12 Robert Collins <email address hidden>:

> We could also drop latency by getting a second CPU in pqm and using
> selftest --parallel=fork in make check. This wouldn't reduce coverage at
> all. Every CPU we add would reduce test latency.

Sure, but that's not a one-line change. :-) There's an RT open for
months if not years asking for a faster machine; it doesn't mean we
shouldn't be more efficient.

It may even have multiple cores already? Or it may be worth forking
two processes even on a single core. At any rate it's orthogonal to
this patch.

> C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> really hesitant to do this.

So that's the key data point: how often do you get changes bounced
because of a failure in [ascii] mode? If it's trapping at least say
once per month then I'd agree it's worth keeping.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Martin Pool (mbp) wrote :

2009/6/12 John A Meinel <email address hidden>:

> I'll also comment that it does more than just test with UTF-8 and C, it
> also tests with "-O" and without.
>
> Now, I suppose our code is 'assert clean' so we don't need that any
> more. But at least one time it was designed such that it ran with all
> asserts off, and then with all asserts on.

I think banning asserts is a good example of being a bit smarter to
reduce test time. (And actual runtime variance too.)

>
> I would probably argue that we just need better code coverage like we
> have in some of the blackbox tests. (EncodingAdapter coverage.)
>
> Honestly, it has been a long time since one of my submissions has been
> rejected once it got to [ascii]. Whenever I saw that, it meant I could
> relax as I knew it would land.
>
> So, IMO, that sort of thing should be being tested with blackbox
> permutation tests, rather than whole-test-suite testing...

That's what I'm saying.

I think running the tests in ascii mode is to some extent a laziness
against adding real tests where we need them. It might make us
realize some tests should run in say a non-utf8 locale or a
non-English language.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2009-06-12 at 08:12 +0000, Martin Pool wrote:
> 2009/6/12 Robert Collins <email address hidden>:

> > C and a UTF8 locale really does catch a surprising amount of bugs. I'm
> > really hesitant to do this.
>
> So that's the key data point: how often do you get changes bounced
> because of a failure in [ascii] mode? If it's trapping at least say
> once per month then I'd agree it's worth keeping.

For me, about 50% of non-ascii wt related changes I make get trapped.
The frequency of those changes varies depending on work tasks of course.

-Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "martin" == Martin Pool <email address hidden> writes:

<snip/>

    martin> I think running the tests in ascii mode is to some
    martin> extent a laziness against adding real tests where we
    martin> need them.

Then we can't delete the [ascii] tests until we add the missing
tests.

Or did I miss something ?

    martin> It might make us realize some tests should run in say
    martin> a non-utf8 locale or a non-English language.

My limited experience in that regard is:

http://bazaar.launchpad.net/~bzr/bzr/trunk/revision/4332

If you look at that patch history, you'll notice that the last
commit there fixed an issue detected by PQM indeed.

We also had a long standing bug where PQM wasn't correctly
testing the unicode issues.

All in all, I'm far from convinced we can get rid of them.

    Vincent

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "martin" == Martin Pool <email address hidden> writes:
>
> <snip/>
>
> martin> I think running the tests in ascii mode is to some
> martin> extent a laziness against adding real tests where we
> martin> need them.
>
> Then we can't delete the [ascii] tests until we add the missing
> tests.
>
> Or did I miss something ?
>

Well, it means running the *entire* test suite in ascii mode. It isn't a
specific subset of tests.

Since one can always run:

LANG=C ./bzr selftest

if that catches things that
./bzr selftest

doesn't, then it is a good indication of a missing test....

> martin> It might make us realize some tests should run in say
> martin> a non-utf8 locale or a non-English language.
>
> My limited experience in that regard is:
>
> http://bazaar.launchpad.net/~bzr/bzr/trunk/revision/4332
>
> If you look at that patch history, you'll notice that the last
> commit there fixed an issue detected by PQM indeed.
>
> We also had a long standing bug where PQM wasn't correctly
> testing the unicode issues.
>
> All in all, I'm far from convinced we can get rid of them.
>
> Vincent

So I think the change you bring up is this one:
http://bazaar.launchpad.net/~bzr/bzr/trunk/revision/4241.14.25

Which is actually *only* adding "self.requireFeature(UnicodeFilename)"
which means that it is *skipping* your new tests on ASCII mode.

I'm not convinced that having PQM test that it can skip tests because of
LANG values is entirely necessary.

Now if you had actually fixed a bug in your code because of a PQM
rejection, I might be more convinced.

Personally, I think we can work towards a 2-test system. With PQM
running tests that must pass to get merged into mainline, and a
build-bot style testing across more platforms/permutations that test
whether we are passing sufficiently well to make a release. This is
somewhat similar to what Launchpad switched to, except they gutted their
PQM test suite, so it only tests that the merge was clean.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoyU1YACgkQJdeBCYSNAANCgQCfZ/meYDPuy8TlQiDLu3W0bnS7
56UAniioZto6BOpc0WDipN4vOL4SXmXJ
=CubY
-----END PGP SIGNATURE-----

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "martin" == Martin Pool <email address hidden> writes:

    martin> It may even have multiple cores already? Or it may be
    martin> worth forking two processes even on a single core.

Not in my, admittedly, limited experiments.

    martin> At any rate it's orthogonal to this patch.

If the purpose of this patch was to make PQM snappier, I think
you just vetoed your own patch by proposing a better alternative :)

    martin> So that's the key data point: how often do you get
    martin> changes bounced because of a failure in [ascii] mode?
    martin> If it's trapping at least say once per month then I'd
    martin> agree it's worth keeping.

Wow, that is heavily influenced by the way you run your tests !
I know many people rely on PQM to do it and their experience may
be more valuable than mine which almost never do that.

Still, my experience is that pqm has often caught failures in
ascii, less than once a month, but what a blessing !

I don't want to sound like the TDD extremist I am not (or maybe I
am ? :) but the sooner you catch a bug the cheaper it is to
fix[1].

The key data point here, to me, is that risking to miss an easy
bug catching that will cost X engineer hours should be compared
to the cost of upgrading PQM[2].

   Vincent

[1]: Which, to me, definitely rules out 'Test Last' against 'Test
First', but I diverge.

[2]: If you haven't used --parallel=fork daily on a multi-core
yet, you should really try.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> Vincent Ladeuil wrote:
    >>>>>>> "martin" == Martin Pool <email address hidden> writes:
    >>
    >> <snip/>
    >>
    martin> I think running the tests in ascii mode is to some
    martin> extent a laziness against adding real tests where we
    martin> need them.
    >>
    >> Then we can't delete the [ascii] tests until we add the missing
    >> tests.
    >>
    >> Or did I miss something ?
    >>

    jam> Well, it means running the *entire* test suite in ascii
    jam> mode. It isn't a specific subset of tests.

I know :)

But I've been surprised by many test failures in ascii...

    jam> Since one can always run:

    jam> LANG=C ./bzr selftest

But who does ?
PQM does.

    jam> if that catches things that
    jam> ./bzr selftest

    jam> doesn't, then it is a good indication of a missing test....

Not in the case I refer below though...

<snip/>

    jam> Now if you had actually fixed a bug in your code because
    jam> of a PQM rejection, I might be more convinced.

I had to dig more to find such examples.

But I still consider fixing bug in *test* code to be of value, we
rely on tests to validate our code, we can't accept bugs there
either.

    jam> Personally, I think we can work towards a 2-test
    jam> system. With PQM running tests that must pass to get
    jam> merged into mainline, and a build-bot style testing
    jam> across more platforms/permutations that test whether we
    jam> are passing sufficiently well to make a release. This is
    jam> somewhat similar to what Launchpad switched to, except
    jam> they gutted their PQM test suite, so it only tests that
    jam> the merge was clean.

Now you're bribing me in a way... I can hardly believe :)

Of course I support that idea.

But let's do things in the right order, first we set up the test
farm, then we relax PQM constraints :)

      Vincent

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2009-06-12 at 13:09 +0000, John A Meinel wrote:

> Personally, I think we can work towards a 2-test system. With PQM
> running tests that must pass to get merged into mainline, and a
> build-bot style testing across more platforms/permutations that test
> whether we are passing sufficiently well to make a release. This is
> somewhat similar to what Launchpad switched to, except they gutted their
> PQM test suite, so it only tests that the merge was clean.

My dream test environment would be:
pqm runs tests in parallel on all platforms we care about.
*and* the test suite is about 10 times faster than it is today.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

2009/6/12 Vincent Ladeuil <email address hidden>:

>    martin> At any rate it's orthogonal to this patch.
>
> If the purpose of this patch was to make PQM snappier, I think
> you just vetoed your own patch by proposing a better alternative :)

It's not better because it will not be done soon.

> The key data point here, to me, is that risking to miss an easy
> bug catching that will cost X engineer hours should be compared
> to the cost of upgrading PQM[2].

The point is at the moment that slow merges cost engineer time too,
especially when trying to make a release.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "martin" == Martin Pool writes:

    martin> 2009/6/12 Vincent Ladeuil <email address hidden>:
    >>    martin> At any rate it's orthogonal to this patch.
    >>
    >> If the purpose of this patch was to make PQM snappier, I think
    >> you just vetoed your own patch by proposing a better alternative :)

    martin> It's not better because it will not be done soon.

Ok, I didn't understand that.

    >> The key data point here, to me, is that risking to miss an easy
    >> bug catching that will cost X engineer hours should be compared
    >> to the cost of upgrading PQM[2].

    martin> The point is at the moment that slow merges cost engineer time too,
    martin> especially when trying to make a release.

Point taken.

Revision history for this message
Martin Pool (mbp) wrote :

I'm going to reject my own patch. Once we have Buildbot going, I'd like to merge this change and then set up a different machine that asynchronously tests them in LANG=C and for that matter LANG=fr_FR.iso-8859

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'Makefile'
2--- Makefile 2009-06-10 09:34:02 +0000
3+++ Makefile 2009-06-16 02:35:41 +0000
4@@ -1,4 +1,4 @@
5-# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd
6+# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd
7 #
8 # This program is free software; you can redistribute it and/or modify
9 # it under the terms of the GNU General Public License as published by
10@@ -35,8 +35,6 @@
11
12 check-nodocs: extensions
13 $(PYTHON) -Werror -O ./bzr selftest -1v $(tests)
14- @echo "Running all tests with no locale."
15- LC_CTYPE= LANG=C LC_ALL= ./bzr selftest -1v $(tests) 2>&1 | sed -e 's/^/[ascii] /'
16
17 # Run Python style checker (apt-get install pyflakes)
18 #
19
20=== modified file 'NEWS'
21--- NEWS 2009-06-15 15:20:24 +0000
22+++ NEWS 2009-06-16 02:35:41 +0000
23@@ -206,6 +206,9 @@
24 Testing
25 *******
26
27+* ``make check`` no longer repeats the test run in ``LANG=C``.
28+ (Martin Pool, #386180)
29+
30 * The number of cores is now correctly detected on OSX. (John Szakmeister)
31
32 * The number of cores is also detected on Solaris and win32. (Vincent Ladeuil)