Merge ~ddstreet/autopkgtest-cloud:master into autopkgtest-cloud:master

Proposed by Dan Streetman
Status: Merged
Merged at revision: 5714b725df6e4c447388863ee811690973401c01
Proposed branch: ~ddstreet/autopkgtest-cloud:master
Merge into: autopkgtest-cloud:master
Diff against target: 0 lines
Reviewer                Review Type    Status
Dan Streetman           community      Disapprove
Steve Langasek                         Approve
Iain Lane                              Needs Information
Ɓukasz Zemczak                         Pending
Dimitri John Ledkov                    Pending
Ubuntu Release Team                    Pending
Review via email: mp+369620@code.launchpad.net
Steve Langasek (vorlon) wrote:

Reading the referenced bug report, I'm failing to see how the determination has been made that any current flakiness is tied back to not running on a large enough instance.
Indeed, the logs referenced are from the time frame of 2019-05-21, at which time http://autopkgtest.ubuntu.com/packages/systemd/eoan/amd64 shows the tests were failing almost constantly. But since 2019-06-07, the tests have been passing consistently, with the only failures due to trying to run tests against kernels which were not compatible with the test suite; and this improvement clearly didn't require running the test suite on larger runners.

We should of course prioritize running test suites on larger instances wherever suitable to get the most value out of the CI; but on the other hand, we run tests by default on 1xCPU instances instead of 4xCPU instances because the larger instance type adversely affects overall throughput. So for me to +1 this, the benefits would need to be clearer than they currently are.

review: Needs Information
Dan Streetman (ddstreet) wrote:

> But since 2019-06-07, the tests have been passing consistently

This is primarily about systemd-upstream, where the test results are not nearly as rosy as our own. And our own test results will only get worse as more and more tests are added that run under non-accelerated qemu emulation inside the 1-CPU testbed, especially since upstream wants to start running qemu-wrapped tests on arm64 and s390x as well.

Upstream also asked to run the upstream test suite in parallel, as they have been doing for months on the CentOS testbed, but with an m1.small instance there is absolutely no chance of ever doing anything close to that; just running each test individually is pushing things.

> We should of course prioritize running test suites on larger instances
> wherever suitable to get the most value out of the CI; but on the other hand,
> we run tests by default on 1xCPU instances instead of 4xCPU instances because
> the larger instance type adversely affects overall throughput. So for me to
> +1 this, the benefits would need to be clearer than they currently are.

Unfortunately I can't prove anything, since the systemd-upstream flakiness on our testbeds is hard or impossible to reproduce locally, especially for non-Intel arches, and there is no autopkgtest-cloud API to control the testbed flavor for a few test runs that I could use to test it.

Looking at previous git log commit messages, I wasn't aware there was a strict requirement for proof before adding something to big_packages, but I'm probably just missing context:

"The new version of this package consistently fails on the test_isin
    test on ppc64el, most likely due to OOM."

"add heat-dashboard in big_packages

    New version of this package consistently fails autopktests on the
    armhf. In cosmic we install py2 and py3 which takes a long time."

"Lintian tests take long but can be run in parallel on big testbeds"

"Add nodejs to big_packages on x86"

Once I get my own autopkgtest-cloud development environment up in Canonistack, I'll see if I can get definitive proof and revisit this.

Dan Streetman (ddstreet) wrote:

Development upstream tests consistently fail on arm64:
$ autopkgtest-manager --ppa ddstreet/systemd-upstream systemd -r bionic -a all -g --passed -p 3
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/armhf): OK (2: SKIP)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/armhf): OK (2: SKIP)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/armhf): OK (2: SKIP)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/ppc64el): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/ppc64el): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/ppc64el): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/amd64): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/amd64): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/amd64): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/arm64): FAIL (4: FAIL)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/arm64): FAIL (4: FAIL)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/arm64): FAIL (4: FAIL)
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/i386): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/i386): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/i386): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/s390x): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/s390x): OK
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/s390x): OK
15 pass, 3 fail

These are all the arm64 tests since I got qemu working in the test:
$ autopkgtest-manager --ppa ddstreet/systemd-upstream systemd -r bionic -a arm64 -g --summary -p 9
systemd/243-rc1ubuntu0.18.04.1+upstream20190803b1 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc1ubuntu0.18.04.1+upstream20190803b1 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc1ubuntu0.18.04.1+upstream20190808b1 (bionic/arm64): FAIL (4: FAIL)
root-unittests FAIL non-zero exit status 134
upstream FAIL timed out
systemd/243-rc1ubuntu0.18.04.1+upstream20190812b1 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc1ubuntu0.18.04.1+upstream20190813b4 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc1ubuntu0.18.04.1+upstream20190819b3 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b1 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
systemd/243-rc2ubuntu0.18.04.1+upstream20190822b2 (bionic/arm64): FAIL (4: FAIL)
upstream FAIL timed out
0 pass, 9 fail

Some example log files from autopkgtest server:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic-ddstreet-systemd-upstream/bionic/arm64/s/systemd/20190823_143713_d3bee@/log.gz
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic-ddstreet-systemd-upstream/bionic/arm64/s/systemd/20190823_050908_a41cb@/log.gz
https://ob...


Iain Lane (laney) wrote:

I'll abstain from deciding whether big_packages is appropriate in itself or not (I haven't analysed the problem, and it looks like Steve has).

We often get very many systemd-upstream tests running in parallel. If we start running them on big instances, we'll reduce our available parallelism, which might make overall throughput worse on two axes: for systemd upstream themselves, and for all other tests we're trying to run on autopkgtest.ubuntu.com.

I think we'd want to ensure a hard limit on the number of parallel (systemd-)upstream tests, so as not to have an unacceptably negative impact on Ubuntu's own tests. They could easily be crowded out if just a few upstream PRs are running at once. For the first axis, as the operator of this service I don't have a particular opinion, but upstream developers might.
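
(For illustration, such a cap could be as simple as a counting semaphore in the worker around upstream job execution. This is a hypothetical Python sketch, not existing autopkgtest-cloud code; the limit of 8 and all names here are assumptions:)

    import threading
    import time

    # Hypothetical cap on concurrent systemd-upstream jobs; the value and
    # the names are illustrative, not taken from the real worker code.
    UPSTREAM_SLOTS = threading.BoundedSemaphore(value=8)

    def run_upstream_test(job):
        # Blocks while 8 upstream jobs are already in flight, so a burst of
        # upstream PRs cannot crowd out Ubuntu's own test queue.
        with UPSTREAM_SLOTS:
            time.sleep(1)  # stand-in for the actual test execution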

review: Needs Information
Dan Streetman (ddstreet) wrote:

> (I haven't analysed the problem, and it looks like Steve has).

I'm not sure that this is true; @vorlon, are you familiar with the systemd 'upstream' autopkgtest (or the other tests)?

> We often get very many systemd-upstream tests running in parallel

$ autopkgtest-manager --systemd -r bionic -a amd64 -g --timestamp -p 80 --passed
systemd-upstream/242-2[PR=13322]/2019-08-14T17:58:51 (bionic/amd64): OK
...

So: 80 tests since 8-14, an average of 8 tests/day. Some days definitely have more, while others have fewer.

> If we start running them on big instances, we'll reduce our available parallelism

If capacity is an issue, might I suggest looking at some of the commonly run tests that don't seem to provide much value, like glibc, which runs in big_packages and, on some days, runs multiple times per day.

The only test that runs is rebuild:
https://git.launchpad.net/ubuntu/+source/glibc/tree/debian/tests/control?h=ubuntu/eoan-devel
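
(For context, a rebuild-only autopkgtest is typically just a small stanza in debian/tests/control pointing at a script that rebuilds the source. The stanza below is an illustrative sketch of that pattern, not the exact glibc file:)

    Tests: rebuild
    Depends: @builddeps@
    Restrictions: allow-stderr

Here debian/tests/rebuild would essentially rebuild the source package and fail on any build error.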

I get that a simple rebuild can be useful, but it doesn't really seem that anyone is looking at, or caring about, the failures:
http://autopkgtest.ubuntu.com/packages/glibc/bionic/amd64

In contrast, the systemd (and especially systemd-upstream) tests actually do run a large suite of tests (which is growing every day, for systemd-upstream) and people actually do look at the results for every single test run (that fails).

> I think we'd want to ensure a hard limit on the number of parallel (systemd-)upstream tests

That's up to you, and I am fairly sure upstream appreciates the CI resources already provided. I would like to think that providing these resources for upstream CI actually decreases the workload for Canonical/Ubuntu in the long term, by catching and fixing bugs upstream instead of after we try to merge them into our devel release. So maybe it's worth finding or adding some extra capacity for the upstream systemd CI tests.

Just for clarification, this MP only suggests moving the systemd-upstream tests from the current m1.small flavor (1 vCPU, 2G RAM, 10G disk) to m1.large (4 vCPU, 8G RAM, 10G disk). The change will also increase the individual test timeout from 10,000 seconds to 20,000 seconds.
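
(For concreteness: in the autopkgtest-cloud worker, this kind of change amounts to listing the package in the worker configuration. The snippet below is a sketch from memory; the section and option names, and the other package listed, are assumptions rather than verbatim repository contents:)

    [autopkgtest]
    # Tests for these packages run on the larger flavor (e.g. m1.large),
    # with the per-test timeout raised from 10,000 s to 20,000 s:
    big_packages = glibc systemd-upstream
    # The alternative discussed below: keep the small flavor but raise
    # the timeout to 40,000 s:
    # long_tests = systemd-upstream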

The main driver of this suggested change is the long test duration, which could alternatively be addressed by marking the test as 'long_tests' (still an m1.small instance, but with the test timeout increased to 40,000 seconds). However, the tests already take long enough (hours), and stretching that out even further would only add to the time upstream has to wait before being able to assess a PR's test results.

The main driver of long test duration (by a large margin) is the 'upstream' test:
https://salsa.debian.org/ddstreet-guest/systemd/blob/experimental/debian/tests/upstream

Note that several of the tests are currently blacklisted; I have fixed those and am preparing to un-blacklist them. Also, many of the tests attempt to run under qemu, which currently doesn't work on non-Intel arches; I've fixed that as well and am in the process of enabling it.

> but upstream developers might.

I'll make them aware of this request so they can comment.

Evgeny Vereshchagin (evvers) wrote:

> I think we'd want to ensure a hard limit on the number of parallel (systemd-)upstream tests, so as not to have an unacceptably negative impact on Ubuntu's own tests.

Ubuntu CI doesn't cancel jobs when PRs are updated, which I think is why "systemd-upstream" might seem overly crowded (unintentionally wasting far more resources than necessary along the way). In a PR I opened recently, after I force-pushed code 4 times (which isn't unusual in the systemd repository), there were 16 jobs running in parallel instead of 4. Until this is fixed, I don't think the number of runs should be limited, so as not to make people wait too long for their PRs to be tested. If it's decided that the limit should be enforced anyway, I'm pretty sure nobody is going to wait for it, and it will most likely be turned off.
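
(To illustrate the kind of check such cancellation would need: before starting, or while running, a job, ask GitHub whether the commit under test is still the PR's head. A hypothetical Python sketch against the public GitHub API, not existing autopkgtest-cloud code:)

    import requests  # third-party library; assumed available in the worker

    def is_stale(owner: str, repo: str, pr_number: int, tested_sha: str) -> bool:
        # A job is stale when the PR head has moved past the commit being
        # tested (e.g. after a force-push); the worker could then skip or
        # cancel it instead of letting it run to completion.
        url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.json()["head"]["sha"] != tested_sha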

Evgeny Vereshchagin (evvers) wrote:

Speaking of wasting resources: aside from the numerous jobs that should be cancelled, Ubuntu CI also blindly runs all the integration tests even when it doesn't make sense. For example, the PR I mentioned (the one with 16 jobs) had nothing to do with Ubuntu and should have been skipped there altogether. If overall throughput is really important, I think it would probably make sense to address those issues first.

Just in case, here's what's skipped on CentOS CI:
https://github.com/systemd/systemd-centos-ci/blob/master/jenkins/runners/systemd-pr-build.sh
https://github.com/systemd/systemd-centos-ci/blob/8b4e41878b3f85d16cc6a51af401ebc49dd88f65/agent/testsuite.sh#L45
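
(The essence of that skip logic is a path filter: if every file a PR touches matches a "can't affect us" pattern, don't run the integration tests at all. A hypothetical sketch; the patterns are illustrative, not CentOS CI's actual list:)

    import fnmatch
    import subprocess

    # Files matching only these patterns cannot affect the build, so a full
    # integration-test run adds no information. Patterns are illustrative.
    SKIP_PATTERNS = ["docs/*", "man/*", "*.md", "LICENSES/*", ".github/*"]

    def pr_only_touches_skippable_files(base_ref: str = "origin/master") -> bool:
        changed = subprocess.check_output(
            ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
            text=True,
        ).splitlines()
        # Never skip an empty diff; require every changed path to match.
        return bool(changed) and all(
            any(fnmatch.fnmatch(path, pat) for pat in SKIP_PATTERNS)
            for path in changed
        )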

Dan Streetman (ddstreet) wrote:

I'm moving this to WIP for the following reasons:

- The autopkgtest.ubuntu.com cloud capacity seems to be quite limited.
- The autopkgtest-cloud code currently has no mechanism to:
  1) filter out PRs that don't need to be tested, and/or limit which tests are run based on the PR content, as mentioned in the comments above;
  2) cancel running tests for any reason, e.g. if a PR is force-pushed while the test is running.
- The amd64 and i386 tests will speed up significantly once patches to enable KVM in the tests are applied.

Unfortunately, once the patches enabling qemu (non-accelerated on non-Intel arches) are applied, some of the longer-running arm64 tests will need to be blacklisted, since based on my testing so far every run will exceed the 10,000 second test limit, unless other ways to reduce the test time can be found.

Steve Langasek (vorlon) wrote:

My $.02: I was actually prepared to enable this change, because while it sounds like the systemd-upstream tests are making suboptimal use of the infrastructure, we are not actually in terrible shape overall in terms of capacity, and I would certainly much rather have systemd-upstream tests being run that *do* have a chance of passing than run tests for 2 hours at a go that consume resources but that we know will always fail.

So even without any of the proposed enhancements (caps on parallelization; test cancellation on PR update), I think this is still a sensible thing to land as-is.

What we don't know is how much more CPU capacity these tests will consume as a result of running to completion on large instances rather than running for 2h on a small instance before being cut off. But I think we should land this and then assess.

BTW, running under KVM is not a great solution for improving efficiency on amd64/i386 either: the cloud units are already running as KVM guests, and nested KVM is not something we promise stability for.
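
(The constraint in practice: qemu can only use the kvm accelerator when /dev/kvm is exposed inside the guest, which under nested virtualization depends on the cloud host; otherwise it falls back to TCG software emulation, which is what makes the non-accelerated runs so slow. A minimal illustrative check:)

    import os

    def qemu_accel() -> str:
        # /dev/kvm is only present inside a cloud instance when the host
        # exposes (nested) KVM; without it, qemu falls back to TCG emulation.
        return "kvm" if os.access("/dev/kvm", os.R_OK | os.W_OK) else "tcg"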

review: Approve
Dan Streetman (ddstreet) wrote:

I unfortunately used my 'master' branch for this, and I can't see any way to change that in this MP. It also looks like I can't close it without 'deleting' it, which removes all the comments.

So I'm just going to take back my master branch; the commits in this MP will therefore no longer be correct, so please ignore it. If there is a way to change the source branch, or to close this without deleting it, please feel free to do so.

review: Disapprove
