Merge ~pwlars/testflinger-cli:dont-let-poll-die into testflinger-cli:master

Proposed by Paul Larson
Status: Merged
Approved by: Paul Larson
Approved revision: 6c1a552069f6ba63b70232f3741dcf267d27ab89
Merged at revision: 1ec2a0aac3b17000e5db729adbf0fd3331e4a4ec
Proposed branch: ~pwlars/testflinger-cli:dont-let-poll-die
Merge into: testflinger-cli:master
Diff against target: 18 lines (+6/-1)
1 file modified
testflinger-cli (+6/-1)
Reviewer: Paul Larson
Review status: Approve
Review via email: mp+326528@code.launchpad.net

Description of the change

In one of today's test runs, testflinger-cli crashed while polling [1]. The job continues to run in the background, but because Jenkins loses its monitor of the job status, it tries to continue and fails. So even though the test might run, we don't get the email at the end with the summary and all that. This change should make testflinger-cli retry if it hits spurious errors like this, rather than just dying.

Paul Larson (pwlars) wrote :

[1]
Timeout while trying to communicate with the server.
Traceback (most recent call last):
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/requests-2.18.1-py3.5.egg/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/util/retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 388, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/lib/python3.5/site-packages/urllib3-1.21.1-py3.5.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='testflinger.canonical.com', port=80): Read timed out. (read timeout=15)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/mnt/jenkins/jobs/stlouis-stlouis-kernel-edge/workspace/testflinger-cli/env/bin/testflinger-cli", line 4, in <module>
    __import__('pkg_resources').run_...

Paul Larson (pwlars) wrote :

I've tried this locally from a virtualenv, and it seems to be working, or at least not causing problems. I'd like to land it in trunk at least, so that we can safely kick off some runs and have better hope of not hitting that problem again.

review: Approve

Preview Diff

diff --git a/testflinger-cli b/testflinger-cli
index d51b4f2..4625ac5 100755
--- a/testflinger-cli
+++ b/testflinger-cli
@@ -194,7 +194,12 @@ def poll(ctx, job_id):
             continue
         if output:
             print(output, end='', flush=True)
-        job_state = conn.get_status(job_id)
+        try:
+            job_state = conn.get_status(job_id)
+        except:
+            # If something breaks here, just retry so we don't affect
+            # a running test monitor that relies on poll
+            continue
         if job_state == 'complete':
             break
         time.sleep(10)
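
For reference, here is a minimal sketch of a slightly more defensive variant of the retry in the hunk above. It is not the code that was merged. Two things differ. It catches Exception rather than using a bare except, so KeyboardInterrupt and SystemExit still propagate. It also sleeps before retrying, since, as far as the hunk shows, the continue skips the time.sleep(10) at the bottom of the loop. The function name poll_until_complete and the get_status, interval and sleep parameters are hypothetical and exist only to make the example self-contained. get_status stands in for conn.get_status.

```python
import sys
import time


def poll_until_complete(get_status, job_id, interval=10, sleep=time.sleep):
    """Poll get_status(job_id) until it returns 'complete'.

    Hypothetical helper: get_status stands in for conn.get_status from
    the diff; interval and sleep are illustrative parameters.
    """
    while True:
        try:
            job_state = get_status(job_id)
        except Exception as exc:
            # Unlike a bare except, this lets KeyboardInterrupt and
            # SystemExit through, so Ctrl-C still stops the poll.
            print("status check failed, retrying: %r" % (exc,),
                  file=sys.stderr, flush=True)
            sleep(interval)  # wait before retrying instead of looping hot
            continue
        if job_state == 'complete':
            return job_state
        sleep(interval)
```

Passing sleep in as a parameter is only there so the loop can be exercised without real delays. In the CLI itself a plain time.sleep(10) in the except branch would have the same effect.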
