Code review comment for ~rbalint/autopkgtest-cloud:more-retries

Iain Lane (laney) wrote :

Sorry Balint, this still needs fixing. :(

I'm not being clear, so let me try again:

There are two types of failure.

1. Permanent failures, which cause a test to be marked as a fail in the database. Frontends (e.g. proposed-migration or the website) then display this as a failure.
2. Temporary failures, where autopkgtest thinks that *it* caused the failure, or it's otherwise transient. These are reported by exit code 16, and autopkgtest-cloud queues them to be re-run.
  2.5. Sometimes permanent failures are misdetected as temporary ones, so we have this code to convert them back into permanent ones. In that case we override the code 16 with a code 4 ("at least one test failed").

What you want to introduce is a 1.5 that's kind of the opposite of 2.5. We want to convert some kinds of permanent failure into temporary ones, so that they get retried. We do *not* want to start retrying all permanent failures in general, which is what the MP currently would do.

I think that *above* the "if code == 16 ..." line, you should add a check for "if code in (2, 4, 6, 8):", which:

  - Greps the log for one of the *new* (different variable) strings that we want to retry on
  - If found, return as if it were a temporary failure, so that we retry. Do this up to three times.

Of course that code will be *common* with 2.5, so you can probably move some of that into functions and call it in both places.
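
To make that concrete, here is a rough sketch of the shape I have in mind. It is not the worker's actual code: the names (classify, log_matches, PERMANENT_PATTERNS, RETRY_PATTERNS, MAX_RETRIES) and the pattern strings are all made up for illustration, and how the retry count gets tracked is an assumption.

```python
import re

# Hypothetical placeholders; the real strings would come from the worker's
# existing configuration (2.5) and from the new variable this MP adds (1.5).
PERMANENT_PATTERNS = [r"known permanent failure marker"]
RETRY_PATTERNS = [r"known flaky failure marker"]

MAX_RETRIES = 3    # "Do this up to three times"
TEMPORARY = 16     # exit code that gets the test queued for a re-run
TEST_FAILED = 4    # "at least one test failed"


def log_matches(log, patterns):
    """Shared helper: True if any pattern occurs anywhere in the log text."""
    return any(re.search(pattern, log) for pattern in patterns)


def classify(code, log, retries_so_far):
    """Return the exit code the worker should act on (illustrative only)."""
    # 1.5: a permanent-looking failure whose log matches one of the *new*
    # strings is treated as temporary, but only up to MAX_RETRIES times.
    if code in (2, 4, 6, 8):
        if retries_so_far < MAX_RETRIES and log_matches(log, RETRY_PATTERNS):
            return TEMPORARY
        return code
    # 2.5: a temporary failure whose log matches a known permanent string
    # is overridden to a real test failure.
    if code == TEMPORARY and log_matches(log, PERMANENT_PATTERNS):
        return TEST_FAILED
    return code
```

The point is that only failures matching the new strings become retries; any other code in (2, 4, 6, 8) passes through unchanged, and both branches share the same log-matching helper.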

review: Needs Fixing
