Merge lp:~rbalint/ubuntu-archive-tools/retry-intermittent into lp:ubuntu-archive-tools

Proposed by Balint Reczey on 2020-05-22
Status: Needs review
Proposed branch: lp:~rbalint/ubuntu-archive-tools/retry-intermittent
Merge into: lp:ubuntu-archive-tools
Diff against target: 236 lines (+132/-14)
1 file modified
retry-autopkgtest-regressions (+132/-14)
To merge this branch: bzr merge lp:~rbalint/ubuntu-archive-tools/retry-intermittent
Reviewer                                 Review Type   Date Requested   Status
Iain Lane                                              2020-05-23       Pending
Ubuntu Package Archive Administrators                  2020-05-22       Pending
Review via email: mp+384468@code.launchpad.net
Steve Langasek (vorlon) wrote :

I'd like Iain's review on this, since I understood he had plans to implement something like this on the infrastructure side instead.

Iain Lane (laney) wrote :

Not reviewed yet, but yes, the intermittent handling should be done on the worker side. I think "don't retry queued tests" is more OK for this script, although ideally that would be defended against on the server side as well. There was some prior work on this, but it needs refreshing per a review from Łukasz. That's https://code.launchpad.net/~tsimonq2/autopkgtest-cloud/+git/bug-1654761/+merge/363643.

Can we please just keep the don't-retry-already-queued commits?

For intermittent handling, I'll outline here how I think it should be done. I'm planning to work on this in the next month, but feel free to get there before me. Please see:

https://git.launchpad.net/autopkgtest-cloud/tree/worker/worker

Have a look at the code around FAIL_STRINGS. The logic there is about converting *tmpfails* (what autopkgtest thinks it's going to be able to recover from) into real failures when it's probably not temporary at all. If I'm remembering correctly, these DNS and other failures end up being reported as permanent failures. Therefore I think that the fix here is to extend or replicate the FAIL_STRINGS (etc) logic to apply to all kinds of failure. Then you'd get the same behaviour we have for the tmpfail case: if we see one of those messages in the log, you get three tries to pass.
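
To make the suggestion concrete, here is a minimal sketch of that kind of check. The FAIL_STRINGS contents, the MAX_TRIES limit and the classify_result helper below are all illustrative stand-ins, not the actual autopkgtest-cloud worker code: the idea is to scan the log for known-bad messages and report a real failure instead of a tmpfail when one matches, or when the retry budget is used up.

    # Illustrative sketch only -- not the real worker code.
    # FAIL_STRINGS stands in for the worker's list of log messages that
    # should turn a "temporary" failure into a real one.
    FAIL_STRINGS = [
        'Temporary failure resolving',
        'Failed to fetch',
    ]

    MAX_TRIES = 3  # assumption: "three tries to pass", as described above

    def classify_result(log_text, tmpfail, tries):
        '''Return 'fail' or 'tmpfail' for a failed test run.

        tmpfail is what autopkgtest itself reported; the log check and the
        retry budget can downgrade it to a hard failure.
        '''
        if not tmpfail:
            return 'fail'
        if any(s in log_text for s in FAIL_STRINGS):
            return 'fail'
        if tries >= MAX_TRIES:
            return 'fail'
        return 'tmpfail'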

Balint Reczey (rbalint) wrote :

@laney thanks for the review.

I'm updating the retry patterns in autopkgtest-cloud then; please take a look:
https://code.launchpad.net/~rbalint/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/384609

1328. By Balint Reczey on 2020-05-27

retry-autopkgtest-regressions: Don't trigger running and queued tests again

1329. By Balint Reczey on 2020-05-27

retry-autopkgtest-regressions: Add --log-regex and --cached-only

Balint Reczey (rbalint) wrote :

@laney I've finished the logic for skipping tests that are already queued or running, and instead of retrying intermittent failures I've added logic for retrying based on patterns found in the logs.
This is useful for retrying similar failures without duplicating the intermittent handling.
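
For illustration, an invocation along these lines (the pattern itself is just an example) would limit the generated retries to regressions whose logs match the pattern, reusing any logs already downloaded in a previous run:

    ./retry-autopkgtest-regressions --log-regex 'Temporary failure resolving' --force-cached

--log-regex and --force-cached are the options added by this branch; the rest of the behaviour (state selection, age limits, and so on) is unchanged.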

Balint Reczey (rbalint) wrote :

For the record, I'll be happy to resubmit this MP if it can't make it in before the repo conversion to git. :-)

Unmerged revisions

1329. By Balint Reczey on 2020-05-27

retry-autopkgtest-regressions: Add --log-regex and --cached-only

1328. By Balint Reczey on 2020-05-27

retry-autopkgtest-regressions: Don't trigger running and queued tests again

1327. By Balint Reczey on 2020-05-22

retry-autopkgtest-regressions: Factor out caching to a function

Preview Diff

1=== modified file 'retry-autopkgtest-regressions'
2--- retry-autopkgtest-regressions 2020-04-24 15:35:59 +0000
3+++ retry-autopkgtest-regressions 2020-05-27 19:37:02 +0000
4@@ -19,18 +19,22 @@
5 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
6 # USA
7
8-from datetime import datetime
9+from datetime import datetime, timedelta
10 import dateutil.parser
11 from dateutil.tz import tzutc
12 import urllib.request
13 import urllib.parse
14 import argparse
15+import io
16 import os
17 import re
18 import yaml
19 import json
20+import gzip
21
22 request_url = 'https://autopkgtest.ubuntu.com/request.cgi'
23+running_url = 'http://autopkgtest.ubuntu.com/static/running.json'
24+queues_url = 'http://autopkgtest.ubuntu.com/queues.json'
25 default_series = 'groovy'
26 args = None
27
28@@ -63,7 +67,8 @@
29 parser.add_argument('--all-proposed', action='store_true',
30 help='run tests against all of proposed, i. e. with disabling apt pinning')
31 parser.add_argument('--state', default='REGRESSION',
32- help='generate commands for given test state (default: %(default)s)')
33+ help='Generate commands for given test state (default: %(default)s). '
34+ '--state=RUNNING also enables triggering already queued and running tests')
35 parser.add_argument('--max-age', type=float, metavar='DAYS',
36 help='only consider candidates which are at most '
37 'this number of days old (float allowed)')
38@@ -78,37 +83,54 @@
39 'against proposed, to re-establish a baseline for the '
40 'test. This currently only works for packages that '
41 'do not themselves have a newer version in proposed.')
42+ parser.add_argument('--log-regex', default=None,
43+ help='only consider tests with logs matching the '
44+ '(Python) regular expression')
45+ parser.add_argument('--force-cached', action='store_true',
46+ help='Do not try to download files again, use cached '
47+ 'version when it is present. This is useful '
48+ 'when triggering various subsets of tests in a row.')
49 args = parser.parse_args()
50
51 return args
52
53-
54-def get_regressions(excuses_url, release, retry_state, min_age, max_age,
55- blocks, no_proposed):
56- '''Return dictionary with regressions
57-
58- Return dict: release → pkg → arch → [trigger, ...]
59+def get_url(url, force_cached):
60+ ''' Return a file object for the URL, possibly caching it
61 '''
62 cache_file = None
63
64- # load YAML excuses
65-
66 # ignore bileto urls wrt caching, they're usually too small to matter
67 # and we don't do proper cache expiry
68 m = re.search('people.canonical.com/~ubuntu-archive/proposed-migration/'
69 '([^/]*)/([^/]*)',
70- excuses_url)
71+ url)
72 if m:
73 cache_dir = get_cache_dir()
74 cache_file = os.path.join(cache_dir, '%s_%s' % (m.group(1), m.group(2)))
75+ else:
76+ # test logs can be cached, too
77+ m = re.search(
78+ 'https://objectstorage.prodstack[-0-9]*.canonical.com/v1/'
79+ 'AUTH_[a-f0-9]*/autopkgtest-[a-z]*/([^/]*)/([^/]*)'
80+ '/[a-z0-9]*/([^/]*)/([_a-f0-9]*)@/log.gz',
81+ url)
82+ if m:
83+ cache_dir = get_cache_dir()
84+ cache_file = os.path.join(
85+ cache_dir, '%s_%s_%s_%s.gz' % (
86+ m.group(1), m.group(2), m.group(3), m.group(4)))
87+
88+ if cache_file:
89 try:
90 prev_mtime = os.stat(cache_file).st_mtime
91 except FileNotFoundError:
92 prev_mtime = 0
93 prev_timestamp = datetime.fromtimestamp(prev_mtime, tz=tzutc())
94 new_timestamp = datetime.now(tz=tzutc()).timestamp()
95+ if force_cached:
96+ return open(cache_file, 'rb')
97
98- f = urllib.request.urlopen(excuses_url)
99+ f = urllib.request.urlopen(url)
100 if cache_file:
101 remote_ts = dateutil.parser.parse(f.headers['last-modified'])
102 if remote_ts > prev_timestamp:
103@@ -119,9 +141,75 @@
104 os.utime(cache_file, times=(new_timestamp, new_timestamp))
105 f.close()
106 f = open(cache_file, 'rb')
107-
108+ return f
109+
110+
111+def already_triggered(release, arch, pkg, triggers, extra_params, running, queues):
112+
113+ # check queues
114+ for queue_name, releases in queues.items():
115+ for queue_release, arches in releases.items():
116+ if not queue_release == release:
117+ continue
118+ for queue_arch, queue in arches.items():
119+ if not queue or not queue_arch == arch:
120+ continue
121+ for raw_item in queue:
122+ buf = io.StringIO(raw_item)
123+ queued_pkg = buf.readline()
124+ item_params_raw = buf.readline()
125+ if not item_params_raw:
126+ # it was only one line
127+ m = re.search('([^ ^{]*) (.*)', queued_pkg)
128+ queued_pkg = m.group(1)
129+ item_params_raw = m.group(2)
130+ item_params_json = json.loads(item_params_raw)
131+ queued_triggers = item_params_json['triggers']
132+ queued_extra_params = []
133+ try:
134+ queued_extra_params.append(('all-proposed', item_params_json['all-proposed']))
135+ except KeyError:
136+ pass
137+
138+ if queued_pkg == pkg \
139+ and sorted(queued_triggers) == sorted(triggers) \
140+ and sorted(queued_extra_params) == sorted(extra_params):
141+ return True
142+
143+ # check running
144+ for running_pkg, submissions in running.items():
145+ for submission in submissions.values():
146+ for running_release, arches in submission.items():
147+ if not running_release == release:
148+ continue
149+ for running_arch, running_params in arches.items():
150+ if not running_arch == arch:
151+ continue
152+ running_triggers = running_params[0]['triggers']
153+ running_extra_params = []
154+ try:
155+ running_extra_params.append(('all-proposed', running_params[0]['all-proposed']))
156+ except KeyError:
157+ pass
158+ if running_pkg == pkg \
159+ and sorted(running_triggers) == sorted(triggers) \
160+ and sorted(running_extra_params) == sorted(extra_params):
161+ return True
162+ return False
163+
164+
165+def get_regressions(excuses_url, release, retry_state, min_age, max_age,
166+ blocks, no_proposed, log_regex, force_cached):
167+ '''Return dictionary with regressions
168+
169+ Return dict: release → pkg → arch → [trigger, ...]
170+ '''
171+
172+ # load YAML excuses
173+ f = get_url(excuses_url, force_cached)
174 excuses = yaml.load(f, Loader=yaml.CSafeLoader)
175 f.close()
176+
177 regressions = {}
178 for excuse in excuses['sources']:
179 if blocks and blocks != excuse['source']:
180@@ -148,6 +236,11 @@
181 trigger = excuse['source'] + '/' + excuse['new-version']
182 for arch, state in archinfo.items():
183 if state[0] == retry_state:
184+ if log_regex and state[1].endswith(".gz"):
185+ log = gzip.open(get_url(state[1], force_cached),
186+ mode='rt', errors='replace')
187+ if not re.findall(log_regex, log.read(), re.MULTILINE):
188+ continue
189 regressions.setdefault(release, {}).setdefault(
190 pkg, {}).setdefault(arch, []).append(trigger)
191
192@@ -160,6 +253,19 @@
193 if args.all_proposed:
194 extra_params.append(('all-proposed', '1'))
195
196+if args.log_regex:
197+ # expire old cache
198+ cache_dir = get_cache_dir()
199+ mtime_limit = datetime.now() - timedelta(weeks=4)
200+ for entry in os.listdir(cache_dir):
201+ cache_file = os.path.join(cache_dir, entry)
202+ try:
203+ mtime = os.stat(cache_file).st_mtime
204+ if datetime.fromtimestamp(mtime) < mtime_limit:
205+ os.remove(cache_file)
206+ except FileNotFoundError:
207+ pass
208+
209 if args.bileto:
210 url_root = 'https://bileto.ubuntu.com'
211 ticket_url = url_root + '/v2/ticket/%s' % args.bileto
212@@ -180,11 +286,23 @@
213 excuses_url = 'http://people.canonical.com/~ubuntu-archive/proposed-migration/%s/update_excuses.yaml' % args.series
214 regressions = get_regressions(excuses_url, args.series, args.state,
215 args.min_age, args.max_age, args.blocks,
216- args.no_proposed)
217+ args.no_proposed, args.log_regex,
218+ args.force_cached)
219+
220+
221+# load JSON running and queued tests
222+running = json.loads(
223+ get_url(running_url, args.force_cached).read().decode('utf-8'))
224+queues = json.loads(
225+ get_url(queues_url, args.force_cached).read().decode('utf-8'))
226
227 for release, pkgmap in regressions.items():
228 for pkg, archmap in pkgmap.items():
229 for arch, triggers in archmap.items():
230+ if not args.state == 'RUNNING' \
231+ and already_triggered(release, arch, pkg, triggers,
232+ extra_params, running, queues):
233+ continue
234 params = [('release', release), ('arch', arch), ('package', pkg)]
235 params += [('trigger', t) for t in triggers]
236 params += extra_params
