~rbalint/autopkgtest-cloud:retry-relax

Last commit made on 2020-11-18
Get this branch:
git clone -b retry-relax https://git.launchpad.net/~rbalint/autopkgtest-cloud

Branch information

Name: retry-relax
Repository: lp:~rbalint/autopkgtest-cloud

Recent commits

ea12b5e... by Balint Reczey

worker/worker: Relax APT error matcher for retry

Sometimes parallel APT threads mix up the first letters of the message with
the previous line.
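
As a rough illustration of what relaxing such a matcher can mean, the sketch below compares a line-anchored pattern with an unanchored one. Both patterns and both log samples are assumptions made up for this example (the URL is a placeholder); they are not the worker's actual expressions.

```python
import re

# Hypothetical strict matcher: requires "E: " at the very start of a line.
STRICT = re.compile(r"^E: Failed to fetch", re.MULTILINE)

# Hypothetical relaxed matcher: drops the line-start prefix, so it still
# matches when the leading "E:" has ended up on the previous line.
RELAXED = re.compile(r"Failed to fetch")

# Invented log samples, for illustration only.
clean_log = (
    "Reading package lists...\n"
    "E: Failed to fetch http://archive.example.invalid/pool/foo.deb\n"
)
garbled_log = (
    "Reading package lists...E:\n"
    " Failed to fetch http://archive.example.invalid/pool/foo.deb\n"
)


def matches(pattern, log):
    """Return True if the pattern occurs anywhere in the log text."""
    return pattern.search(log) is not None
```

With these samples, the strict pattern only recognises the clean log, while the relaxed one recognises both. The trade-off is that an unanchored pattern can also match the phrase in unrelated output.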

7da2e65... by Balint Reczey

cross-toolchain-base is big on arm64, too

482291f... by Balint Reczey

worker: Retry on apt-mark error

LP: #1903840

f4d80a9... by Steve Langasek

Merge remote-tracking branch 'mwhudson/dask-big'

946473a... by Iain Lane

web/install: Block a couple more bots which were ignoring robots.txt

17df99d... by Michael Hudson-Doyle

add dask to big_packages

600c592... by Balint Reczey

worker: Retry on E: Failed to fetch http://ftpmaster.internal/

2924b4d... by Balint Reczey

worker: Retry on PPA connection timeout, too

90e4801... by Iain Lane

worker: Properly retry on failures we think might be temporary

We're seeing a test run currently looping with this trace:

WARNING: Saw Temporary failure resolving in log, which is a sign of a temporary failure.
WARNING: Retrying in 5 minutes. Log follows:
[ ... log ... ]
gzip: /tmp/autopkgtest-work.j5qth004/out/log: No such file or directory
Traceback (most recent call last):
  [ ... cut some bits of the trace ... ]
  File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 645, in request
    process_output_dir(out_dir, pkgname, code)
  File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 172, in process_output_dir
    subprocess.check_call(['gzip', '-9', os.path.join(dir, 'log')])
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gzip', '-9', '/tmp/autopkgtest-work.j5qth004/out/log']' returned non-zero exit status 1

But we should not be calling process_output_dir() when we're retrying.
That is to be called when we are about to upload the directory to swift.

What's happening is, we have this logic:

  for retry in range(3):
     <run the test>
     <did it permanently fail?> { /* 1 */
             <grep the log, to see if we think this might be transient>
             <break if not, otherwise print a warning, *delete the
              output directory* and retry>
     }

     <did it temporarily fail?> { /* 2 */
             <grep the log, to see if we think this might be permanent>
             <print a warning, delete the output directory and retry if
              not, otherwise break>
     } else { /* 3, passed */
             <break, no more retries, upload the result>
     }

We think the failure might be transient, so we clean up the output
directory and retry. But since we have two *separate* if statements
here, the second one's else clause is entered (the clause meant for a
run that has passed cleanly), so we break out of the loop and go on to
try to upload the result. This fails, because we have already cleaned up
the directory.

Instead, we should have one if statement here. If we enter the first
case, for 'permanent' failures, we should never go on to enter any of
the others.
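
A minimal sketch of the single-chain structure described above, assuming a simplified model of the worker: the `Outcome` enum, `should_retry`, `run_with_retries`, and the marker lists are hypothetical names for this illustration and do not come from worker/worker.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical classification of a single test run."""
    PASSED = 0
    PERMANENT_FAILURE = 1
    TEMPORARY_FAILURE = 2


def should_retry(outcome, log, transient_markers, permanent_markers):
    """Decide on a retry using ONE if/elif/else chain, so that a run
    handled by the first branch can never fall into the 'passed' branch."""
    if outcome is Outcome.PERMANENT_FAILURE:
        # Retry only if the log suggests the failure is really transient.
        return any(marker in log for marker in transient_markers)
    elif outcome is Outcome.TEMPORARY_FAILURE:
        # Retry unless the log suggests the failure is really permanent.
        return not any(marker in log for marker in permanent_markers)
    else:
        # Passed: no retry, the result can be uploaded.
        return False


def run_with_retries(run_test, cleanup, transient_markers,
                     permanent_markers, tries=3):
    """Run the test up to `tries` times. The output directory is cleaned
    up only when another attempt will actually follow."""
    for attempt in range(1, tries + 1):
        outcome, log = run_test()
        if attempt == tries or not should_retry(
                outcome, log, transient_markers, permanent_markers):
            break
        cleanup()
    return outcome, attempt
```

For example, with `transient_markers=["Temporary failure resolving"]`, a first run reported as a permanent failure whose log contains that string is cleaned up and retried, and the output directory of the final attempt is left in place for upload.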

0ae42b3... by Balint Reczey

pandas takes long on armhf (in lxd)