Merge autopkgtest-cloud:retry-download-all-results into autopkgtest-cloud:master

Proposed by Brian Murray
Status: Merged
Merged at revision: fa729a0a0374687617742ecc505ac835387b5d75
Proposed branch: autopkgtest-cloud:retry-download-all-results
Merge into: autopkgtest-cloud:master
Diff against target: 28 lines (+10/-1)
1 file modified
charms/focal/autopkgtest-web/webcontrol/download-all-results (+10/-1)
Reviewer Review Type Date Requested Status
Steve Langasek Approve
Julian Andres Klode (community) Approve
Review via email: mp+422584@code.launchpad.net

Description of the change

The web frontends did not have the same test results and when I was running download-all-results I noticed that it frequently exited due to being disconnected from swift. This change wraps the urlopen for retrieving the results in a retry loop.

I cowboyed this change on both web frontends and they were able to download all the results without needing to be restarted.

To post a comment you must log in.
Revision history for this message
Julian Andres Klode (juliank) wrote :

lgtm

review: Approve
Revision history for this message
Steve Langasek (vorlon) wrote :

Why do this with a retry within the script, vs generic failure handling at the level of the systemd unit?

Revision history for this message
Steve Langasek (vorlon) :
review: Needs Information
Revision history for this message
Steve Langasek (vorlon) wrote :

Thinking about this from the other direction - if we're retrying this inside the script because it's stateful, and it's expensive to rerun the whole script instead of retrying in the inner loop, why have a limit of 5 times? Seems arbitrary. Why not retry forever?

Revision history for this message
Brian Murray (brian-murray) wrote :

Come to find out the script actually isn't stateful and it will only try to download results which it doesn't yet have. I was led astray by the fact that some test results are missing 'testinfo.json' and the script will try and download those results every time.

Revision history for this message
Brian Murray (brian-murray) wrote :

> Come to find out the script actually isn't stateful and it will only try to
> download results which it doesn't yet have. I was led astray by the fact that
> some test results are missing 'testinfo.json' and the script will try and
> download those results every time.

Having said that I added the retry part to "list_remote_container" which actually builds the dictionary of known test results. Previously if that failed for any release container then the whole script would crash and no other releases would be updated. So yes the retry belongs in here.

Revision history for this message
Julian Andres Klode (juliank) wrote :

Retries in the script are better anyway to avoid having to go over all the failures again, systemd retries should be there as a safety net, but don't block this.

Revision history for this message
Steve Langasek (vorlon) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/charms/focal/autopkgtest-web/webcontrol/download-all-results b/charms/focal/autopkgtest-web/webcontrol/download-all-results
2index 6696be7..ab0a1e3 100755
3--- a/charms/focal/autopkgtest-web/webcontrol/download-all-results
4+++ b/charms/focal/autopkgtest-web/webcontrol/download-all-results
5@@ -20,6 +20,7 @@ import json
6 import configparser
7 import urllib.parse
8 import time
9+import http
10
11 from distro_info import UbuntuDistroInfo
12 from helpers.utils import get_test_id, init_db
13@@ -41,7 +42,15 @@ def list_remote_container(container_url):
14 url += f"&marker={urllib.parse.quote(start)}"
15
16 LOGGER.debug('Retrieving "%s"', url)
17- resp = urlopen(url)
18+ for retry in range(5):
19+ try:
20+ resp = urlopen(url)
21+ except http.client.RemoteDisconnected as e:
22+ LOGGER.debug('Got disconnected, sleeping')
23+ time.sleep(5)
24+ continue
25+ else:
26+ break
27 json_string = resp.read()
28 json_data = json.loads(json_string)
29

Subscribers

People subscribed via source and target branches