testflinger-cli

Merge ~pwlars/testflinger-cli:safer-polling into testflinger-cli:master

Proposed by Paul Larson on 2017-06-29

Status:	Merged
Approved by:	Paul Larson on 2017-07-14
Approved revision:	bb3f35afe75880e368fb69631f8123daa9ba9bf7
Merged at revision:	b34625416293ca9bf1291a2e8fceba19676eb63d
Proposed branch:	~pwlars/testflinger-cli:safer-polling
Merge into:	testflinger-cli:master
Diff against target:	69 lines (+26/-23), 1 file modified: testflinger-cli (+26/-23)

Reviewer	Review Type	Date Requested	Status
Paul Larson			Approve on 2017-07-14
Review via email: mp+326567@code.launchpad.net

Description of the change

I found even more cases where polling can fail. This time it failed to even get the job_state at the beginning of polling, not because of a bad job_id or anything, just because of a timeout. With this change, poll keeps retrying in that case instead of exiting. The downside is that if something is really stuck, or the server continually times out, the Jenkins job could be stuck waiting forever (or until the Jenkins job timeout is reached). I'm leaning towards that being OK though, because the alternative is pretty annoying: testflinger is still running the test job, but the server timed out on a response, so we crash on the Jenkins side and don't see proper results from it.

We could also take the approach of "make sure the server never times out", and there may be more we could do there too, but in general, I think tools should handle failure cases gracefully.
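
As a rough illustration of the trade-off described above, here is a minimal sketch of tolerant polling with an optional upper bound on the wait. It is not the code in this branch: `poll_until_complete`, `max_wait`, and `FlakyConn` are hypothetical names, and treating every exception as transient is an assumption made to keep the example short (the real change still exits on HTTP errors).

```python
import time


def get_job_state(conn, job_id):
    """Return the job state, or 'unknown' if the lookup fails."""
    try:
        return conn.get_status(job_id)
    except Exception:
        # Assumption: any failure here (for example a timeout) is transient,
        # so report 'unknown' and let the caller retry.
        return 'unknown'


def poll_until_complete(conn, job_id, interval=10, max_wait=None):
    """Poll until the job completes or the optional max_wait (seconds) expires.

    With max_wait=None this waits indefinitely, which matches the
    "could be stuck waiting forever" downside discussed above.
    """
    deadline = None if max_wait is None else time.monotonic() + max_wait
    job_state = get_job_state(conn, job_id)
    while job_state != 'complete':
        if deadline is not None and time.monotonic() >= deadline:
            break  # give up and report the last state we saw
        time.sleep(interval)
        job_state = get_job_state(conn, job_id)
    return job_state


class FlakyConn:
    """Hypothetical fake connection whose first two status calls time out."""

    def __init__(self):
        self.calls = 0

    def get_status(self, job_id):
        self.calls += 1
        if self.calls <= 2:
            raise TimeoutError('simulated server timeout')
        return 'complete'
```

Calling `poll_until_complete(FlakyConn(), 'some-job-id', interval=0)` rides out the two simulated timeouts and returns once the fake server answers. Passing a `max_wait` would be one way to keep a CI job from hanging if the server never recovers, at the cost of sometimes giving up on a job that is still running.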


Paul Larson (pwlars) wrote on 2017-07-14:

I'd like to go ahead and land this and try it in trunk at least. If it seems to work well with our Jenkins jobs (which pull from there), then I'll promote the snap to candidate/stable as well.

review: Approve

Preview Diff


diff --git a/testflinger-cli b/testflinger-cli
index 4625ac5..181fc06 100755
--- a/testflinger-cli
+++ b/testflinger-cli
@@ -169,20 +169,10 @@ def artifacts(ctx, job_id, filename):
 @click.pass_context
 def poll(ctx, job_id):
     conn = ctx.obj['conn']
-    try:
-        job_state = conn.get_status(job_id)
-    except testflinger_cli.HTTPError as e:
-        if e.status == 204:
-            print('No data found for that job id. Check the job id to be sure '
-                  'it is correct')
-        elif e.status == 400:
-            print('Invalid job id specified. Check the job id to be sure it '
-                  'is correct')
-        if e.status == 404:
-            print('Received 404 error from server. Are you sure this '
-                  'is a testflinger server?')
-        sys.exit(1)
-    while True:
+    job_state = get_job_state(conn, job_id)
+    while job_state != 'complete':
+        print('sleeping, job state was {}'.format(job_state))
+        time.sleep(10)
         output = ''
         try:
             output = conn.get_output(job_id)
@@ -194,17 +184,30 @@ def poll(ctx, job_id):
             continue
         if output:
             print(output, end='', flush=True)
-        try:
-            job_state = conn.get_status(job_id)
-        except:
-            # If something breaks here, just retry so we don't affect
-            # a running test monitor that relies on poll
-            continue
-        if job_state == 'complete':
-            break
-        time.sleep(10)
+        job_state = get_job_state(conn, job_id)
     print(job_state)
 
 
+def get_job_state(conn, job_id):
+    try:
+        return conn.get_status(job_id)
+    except testflinger_cli.HTTPError as e:
+        if e.status == 204:
+            print('No data found for that job id. Check the job id to be sure '
+                  'it is correct')
+        elif e.status == 400:
+            print('Invalid job id specified. Check the job id to be sure it '
+                  'is correct')
+        if e.status == 404:
+            print('Received 404 error from server. Are you sure this '
+                  'is a testflinger server?')
+        sys.exit(1)
+    except:
+        # If we fail to get the job_state here, it could be because of timeout
+        # but we can keep going and retrying
+        pass
+    return 'unknown'
+
+
 if __name__ == '__main__':
     cli(obj={})
