Merge lp:~rbalint/ubuntu-archive-tools/retry-intermittent into lp:ubuntu-archive-tools

Proposed by Balint Reczey on 2020-05-22
Status: Merged
Merged at revision: 1354
Proposed branch: lp:~rbalint/ubuntu-archive-tools/retry-intermittent
Merge into: lp:ubuntu-archive-tools
Diff against target: 267 lines (+146/-29)
1 file modified
retry-autopkgtest-regressions (+146/-29)
To merge this branch: bzr merge lp:~rbalint/ubuntu-archive-tools/retry-intermittent
Reviewer                               Review Type   Date Requested   Status
Iain Lane                                            2020-05-23       Approve on 2020-07-27
Ubuntu Package Archive Administrators                2020-05-22       Pending
Review via email: mp+384468@code.launchpad.net
Steve Langasek (vorlon) wrote :

I'd like Iain's review on this, since I understood he had plans to implement something like this on the infrastructure side instead.

Iain Lane (laney) wrote :

Not reviewed yet, but yes, the intermittent handling should be done on the worker side. I think "don't retry queued tests" is more appropriate for this script, although ideally that would be defended against on the server side too. There was some prior work on this, but it needs refreshing per a review from Łukasz: https://code.launchpad.net/~tsimonq2/autopkgtest-cloud/+git/bug-1654761/+merge/363643.

Can we please just keep the don't-retry-already-queued commits?

For intermittent handling, I'll outline here how I think it should be done. I'm planning to work on this in the next month, but feel free to get there before me. Please see:

https://git.launchpad.net/autopkgtest-cloud/tree/worker/worker

Have a look at the code around FAIL_STRINGS. The logic there is about converting *tmpfails* (what autopkgtest thinks it's going to be able to recover from) into real failures when it's probably not temporary at all. If I'm remembering correctly, these DNS and other failures end up being reported as permanent failures. Therefore I think that the fix here is to extend or replicate the FAIL_STRINGS (etc) logic to apply to all kinds of failure. Then you'd get the same behaviour we have for the tmpfail case: if we see one of those messages in the log, you get three tries to pass.
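
To make that concrete, here is a rough sketch of the kind of check I mean (the pattern strings and names are illustrative only, not the worker's actual code):

    # Illustrative only: patterns that point at an infrastructure hiccup
    # rather than a genuine test failure; the real list lives in
    # autopkgtest-cloud's worker as FAIL_STRINGS and friends.
    TRANSIENT_PATTERNS = [
        'Temporary failure resolving',
        'Connection timed out',
    ]

    MAX_TRIES = 3  # the same "three tries to pass" as the tmpfail case


    def should_requeue(log_text, tries_so_far):
        '''Requeue a failed test when its log looks transient.'''
        if tries_so_far >= MAX_TRIES:
            return False
        return any(pattern in log_text for pattern in TRANSIENT_PATTERNS)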

Balint Reczey (rbalint) wrote :

@laney thanks for the review.

I'm updating the retry patterns in autopkgtest-cloud then, please take a look:
https://code.launchpad.net/~rbalint/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/384609

Balint Reczey (rbalint) wrote :

@laney I've finished the logic for skipping already running and queued tests, and instead of retrying intermittent tests I've added logic for retrying based on patterns found in the logs.
This is useful for retrying similar failures without duplicating the intermittent handling.
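
For example, to retry only regressions whose logs match a given pattern, and then reuse the cached logs for a second pass (the patterns here are just illustrations):

    $ ./retry-autopkgtest-regressions --log-regex 'Temporary failure resolving' > retries
    $ ./retry-autopkgtest-regressions --force-cached --log-regex 'Segmentation fault' >> retries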

Balint Reczey (rbalint) wrote :

For the record, I'll be happy to resubmit this MP if it can't make it in before the repo conversion to git. :-)

1327. By Steve Langasek on 2020-05-28

Merge lp:~xnox/ubuntu-archive-tools/built-using-finally

1328. By Steve Langasek on 2020-05-28

don't pass architecture=arch to apt_pkg.parse_depends, not supported on xenial.

1329. By Steve Langasek on 2020-05-28

Merge lp:~brian-murray/ubuntu-archive-tools/pu-stopped-email

1330. By Steve Langasek on 2020-05-28

use https not http for yui

1331. By Łukasz Zemczak on 2020-05-28

kernel-sru-review: look at kernel-stable-master-bug for the master bug in the description first, and only then fallback to the tags. The tags can sometimes be out-of-date.

1332. By Iain Lane on 2020-06-09

Resurrect libemail-address-xs-perl, new lintian dep

1333. By Sebastien Bacher on 2020-06-10

revert temporary change from the previous commit

1334. By Steve Langasek on 2020-06-17

Merge lp:~vorlon/ubuntu-archive-tools/clean-NBS-for-EOL

1335. By Steve Langasek on 2020-06-17

Merge lp:~brian-murray/ubuntu-archive-tools/identify-security

1336. By Steve Langasek on 2020-06-17

s/focal/groovy

1337. By Steve Langasek on 2020-06-17

Drop override for finished spdlog bootstrap

1338. By Steve Langasek on 2020-06-22

Allow canonical-support as an owning team for packages in main

1339. By Steve Langasek on 2020-06-30

Fix retry-autopkgtest-regressions for compatibility with the new britney yaml schema

1340. By Steve Langasek on 2020-07-01

Fix sru-report for compatibility with the new britney yaml schema

1341. By Steve Langasek on 2020-07-01

Merge lp:~laney/ubuntu-archive-tools/r-a-r-excuses-xz

1342. By Steve Langasek on 2020-07-02

Merge lp:~vorlon/ubuntu-archive-tools/close-EOL-bugs

1343. By Matthias Klose on 2020-07-07

update-i386-whitelist: add python3.9

1344. By Steve Langasek on 2020-07-10

Merge lp:~laney/ubuntu-archive-tools/sru-report-excuses-xz

1345. By Steve Langasek on 2020-07-10

fix url to point to ubuntu-archive, not laney

1346. By Steve Langasek on 2020-07-10

Merge lp:~laney/ubuntu-archive-tools/retry-autopkgtest-regressions-lzma-cache

1347. By Matthias Klose on 2020-07-14

add new lintian dependencies to bootstrap ...

1348. By Matthias Klose on 2020-07-14

add more lintian dependencies to bootstrap ...

1349. By Steve Langasek on 2020-07-14

drop override for finished lintian bootstrap
add override for opencc bootstrap

1350. By Łukasz Zemczak on 2020-07-16

Add new kernels to sru-report, comment out tag-based master guessing in kernel-sru-review.

1351. By Steve Langasek on 2020-07-17

Merge lp:~albertomilone/ubuntu-archive-tools/nvidia-450-whitelist

1352. By Balint Reczey on 2020-07-17

retry-autopkgtest-regressions: Factor out caching to a function

1353. By Balint Reczey on 2020-07-17

retry-autopkgtest-regressions: Don't trigger running and queued tests again

1354. By Balint Reczey on 2020-07-17

retry-autopkgtest-regressions: Add --log-regex and --cached-only

Iain Lane (laney) wrote :

I've just re-reviewed, sorry for the extended delay.

I have some comments below. Apart from the http->https one (I'll fix that before pushing), they are mostly nitpicks, but it would be good if you could address them.

The --log-regex thing obviously makes the runtime way longer, so I think some progress output would be nice there. Maybe behind a --verbose flag, not sure.
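
Something as small as this would do (a sketch only: pkg, arch and release are the loop variables in get_regressions, and a --verbose option doesn't exist yet):

    import sys

    # Hypothetical: print one line per downloaded log so long runs
    # visibly make progress.
    if args.verbose:
        print('checking log for %s/%s on %s' % (pkg, arch, release),
              file=sys.stderr)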

I'm going to merge this though, so it'll have to be a follow-up.

Thanks a lot!

review: Approve

Preview Diff

=== modified file 'retry-autopkgtest-regressions'
--- retry-autopkgtest-regressions 2020-07-10 19:44:05 +0000
+++ retry-autopkgtest-regressions 2020-07-17 14:12:34 +0000
@@ -19,19 +19,23 @@
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
 # USA

-from datetime import datetime
+from datetime import datetime, timedelta
 import dateutil.parser
 from dateutil.tz import tzutc
 import urllib.request
 import urllib.parse
 import argparse
+import io
 import lzma
 import os
 import re
 import yaml
 import json
+import gzip

 request_url = 'https://autopkgtest.ubuntu.com/request.cgi'
+running_url = 'http://autopkgtest.ubuntu.com/static/running.json'
+queues_url = 'http://autopkgtest.ubuntu.com/queues.json'
 default_series = 'groovy'
 args = None

@@ -64,7 +68,8 @@
     parser.add_argument('--all-proposed', action='store_true',
                         help='run tests against all of proposed, i. e. with disabling apt pinning')
     parser.add_argument('--state', default='REGRESSION',
-                        help='generate commands for given test state (default: %(default)s)')
+                        help='Generate commands for given test state (default: %(default)s). '
+                        '--state=RUNNING also enables triggering already queued and running tests')
     parser.add_argument('--max-age', type=float, metavar='DAYS',
                         help='only consider candidates which are at most '
                         'this number of days old (float allowed)')
@@ -79,47 +84,55 @@
                         'against proposed, to re-establish a baseline for the '
                         'test. This currently only works for packages that '
                         'do not themselves have a newer version in proposed.')
+    parser.add_argument('--log-regex', default=None,
+                        help='only consider tests with logs matching the '
+                        '(Python) regular expression')
+    parser.add_argument('--force-cached', action='store_true',
+                        help='Do not try to download files again, use cached '
+                        'version when it is present. This is useful '
+                        'when triggering various subsets of tests in a row.')
     args = parser.parse_args()

     return args

-
-def get_regressions(excuses_url, release, retry_state, min_age, max_age,
-                    blocks, no_proposed):
-    '''Return dictionary with regressions
-
-    Return dict: release → pkg → arch → [trigger, ...]
+def get_url(url, force_cached):
+    '''Return a file object for the URL, possibly caching it
     '''
     cache_file = None

-    # load YAML excuses
-
     # ignore bileto urls wrt caching, they're usually too small to matter
     # and we don't do proper cache expiry
     m = re.search('people.canonical.com/~ubuntu-archive/proposed-migration/'
                   '([^/]*)/([^/]*)',
-                  excuses_url)
+                  url)
     if m:
         cache_dir = get_cache_dir()
         cache_file = os.path.join(cache_dir, '%s_%s' % (m.group(1), m.group(2)))
+    else:
+        # test logs can be cached, too
+        m = re.search(
+            'https://objectstorage.prodstack[-0-9]*.canonical.com/v1/'
+            'AUTH_[a-f0-9]*/autopkgtest-[a-z]*/([^/]*)/([^/]*)'
+            '/[a-z0-9]*/([^/]*)/([_a-f0-9]*)@/log.gz',
+            url)
+        if m:
+            cache_dir = get_cache_dir()
+            cache_file = os.path.join(
+                cache_dir, '%s_%s_%s_%s.gz' % (
                    m.group(1), m.group(2), m.group(3), m.group(4)))
+
+    if cache_file:
         try:
             prev_mtime = os.stat(cache_file).st_mtime
         except FileNotFoundError:
             prev_mtime = 0
         prev_timestamp = datetime.fromtimestamp(prev_mtime, tz=tzutc())
         new_timestamp = datetime.now(tz=tzutc()).timestamp()
-
-    try:
-        f = urllib.request.urlopen(excuses_url)
-        is_xz = True
-    except urllib.error.HTTPError as e:
-        if e.code == 404:
-            # some versions of britney output this file uncompressed, try that
-            # location too
-            f = urllib.request.urlopen(excuses_url.rstrip('.xz'))
-            is_xz = False
-        else:
-            raise
+        if force_cached:
+            return open(cache_file, 'rb')
+
+    f = urllib.request.urlopen(url)
+
     if cache_file:
         remote_ts = dateutil.parser.parse(f.headers['last-modified'])
         if remote_ts > prev_timestamp:
@@ -130,15 +143,89 @@
             os.utime(cache_file, times=(new_timestamp, new_timestamp))
         f.close()
         f = open(cache_file, 'rb')
-
-    if is_xz:
+    return f
+
+
+def already_triggered(release, arch, pkg, triggers, extra_params, running, queues):
+
+    # check queues
+    for queue_name, releases in queues.items():
+        for queue_release, arches in releases.items():
+            if not queue_release == release:
+                continue
+            for queue_arch, queue in arches.items():
+                if not queue or not queue_arch == arch:
+                    continue
+                for raw_item in queue:
+                    buf = io.StringIO(raw_item)
+                    queued_pkg = buf.readline()
+                    item_params_raw = buf.readline()
+                    if not item_params_raw:
+                        # it was only one line
+                        m = re.search('([^ ^{]*) (.*)', queued_pkg)
+                        queued_pkg = m.group(1)
+                        item_params_raw = m.group(2)
+                    item_params_json = json.loads(item_params_raw)
+                    queued_triggers = item_params_json['triggers']
+                    queued_extra_params = []
+                    try:
+                        queued_extra_params.append(('all-proposed', item_params_json['all-proposed']))
+                    except KeyError:
+                        pass
+
+                    if queued_pkg == pkg \
+                       and sorted(queued_triggers) == sorted(triggers) \
+                       and sorted(queued_extra_params) == sorted(extra_params):
+                        return True
+
+    # check running
+    for running_pkg, submissions in running.items():
+        for submission in submissions.values():
+            for running_release, arches in submission.items():
+                if not running_release == release:
+                    continue
+                for running_arch, running_params in arches.items():
+                    if not running_arch == arch:
+                        continue
+                    running_triggers = running_params[0]['triggers']
+                    running_extra_params = []
+                    try:
+                        running_extra_params.append(('all-proposed', running_params[0]['all-proposed']))
+                    except KeyError:
+                        pass
+                    if running_pkg == pkg \
+                       and sorted(running_triggers) == sorted(triggers) \
+                       and sorted(running_extra_params) == sorted(extra_params):
+                        return True
+    return False
+
+
+def get_regressions(excuses_url, release, retry_state, min_age, max_age,
+                    blocks, no_proposed, log_regex, force_cached):
+    '''Return dictionary with regressions
+
+    Return dict: release → pkg → arch → [trigger, ...]
+    '''
+
+    # load YAML excuses
+
+    try:
+        f = get_url(excuses_url, force_cached)
         lzma_f = lzma.open(f)
         excuses = yaml.load(lzma_f, Loader=yaml.CSafeLoader)
         lzma_f.close()
-    else:
-        excuses = yaml.load(f, Loader=yaml.CSafeLoader)
-        f.close()
+    except urllib.error.HTTPError as e:
+        if e.code == 404:
+            # some versions of britney output this file uncompressed, try that
+            # location too
+            f = get_url(excuses_url.rstrip('.xz'), force_cached)
+            excuses = yaml.load(f, Loader=yaml.CSafeLoader)
+            f.close()
+        else:
+            raise
+
     regressions = {}
+
     for excuse in excuses['sources']:
         if blocks and blocks != excuse['source']:
             continue
@@ -168,6 +255,11 @@
                 continue
             for arch, state in archinfo.items():
                 if state[0] == retry_state:
+                    if log_regex and state[1].endswith(".gz"):
+                        log = gzip.open(get_url(state[1], force_cached),
+                                        mode='rt', errors='replace')
+                        if not re.findall(log_regex, log.read(), re.MULTILINE):
+                            continue
                     regressions.setdefault(release, {}).setdefault(
                         pkg, {}).setdefault(arch, []).append(trigger)

@@ -180,6 +272,19 @@
 if args.all_proposed:
     extra_params.append(('all-proposed', '1'))

+if args.log_regex:
+    # expire old cache
+    cache_dir = get_cache_dir()
+    mtime_limit = datetime.now() - timedelta(weeks=4)
+    for entry in os.listdir(cache_dir):
+        cache_file = os.path.join(cache_dir, entry)
+        try:
+            mtime = os.stat(cache_file).st_mtime
+            if datetime.fromtimestamp(mtime) < mtime_limit:
+                os.remove(cache_file)
+        except FileNotFoundError:
+            pass
+
 if args.bileto:
     url_root = 'https://bileto.ubuntu.com'
     ticket_url = url_root + '/v2/ticket/%s' % args.bileto
@@ -200,11 +305,23 @@
     excuses_url = 'http://people.canonical.com/~ubuntu-archive/proposed-migration/%s/update_excuses.yaml.xz' % args.series
     regressions = get_regressions(excuses_url, args.series, args.state,
                                   args.min_age, args.max_age, args.blocks,
-                                  args.no_proposed)
+                                  args.no_proposed, args.log_regex,
+                                  args.force_cached)
+
+
+# load JSON running and queued tests
+running = json.loads(
+    get_url(running_url, args.force_cached).read().decode('utf-8'))
+queues = json.loads(
+    get_url(queues_url, args.force_cached).read().decode('utf-8'))

 for release, pkgmap in regressions.items():
     for pkg, archmap in pkgmap.items():
         for arch, triggers in archmap.items():
+            if not args.state == 'RUNNING' \
+               and already_triggered(release, arch, pkg, triggers,
+                                     extra_params, running, queues):
+                continue
             params = [('release', release), ('arch', arch), ('package', pkg)]
             params += [('trigger', t) for t in triggers]
             params += extra_params
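
For reference, these are the shapes that already_triggered() assumes for the two JSON feeds, as inferred from the parsing code above; the entries are made-up examples, not real data:

    # queues.json: queue name -> release -> arch -> list of raw items;
    # each item is the package name plus a JSON blob of request
    # parameters, either on one line ('pkg {...}') or on two lines.
    queues = {
        'huge': {
            'groovy': {
                'amd64': ['glibc {"triggers": ["glibc/2.31-0ubuntu9"]}'],
            },
        },
    }

    # running.json: package -> submission id -> release -> arch ->
    # [params, ...]; the request parameters are the first element.
    running = {
        'glibc': {
            '2b21fa0c': {
                'groovy': {
                    'amd64': [{'triggers': ['glibc/2.31-0ubuntu9']}],
                },
            },
        },
    }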
