Merge lp:~adeuring/juju-ci-tools/add-missing-results-files-to-s3 into lp:juju-ci-tools

Proposed by Abel Deuring
Status: Merged
Merged at revision: 692
Proposed branch: lp:~adeuring/juju-ci-tools/add-missing-results-files-to-s3
Merge into: lp:juju-ci-tools
Diff against target: 189 lines (+148/-11)
3 files modified
add-missing-result-yaml-files.py (+135/-0)
backup-to-s3.py (+2/-11)
utility.py (+11/-0)
To merge this branch: bzr merge lp:~adeuring/juju-ci-tools/add-missing-results-files-to-s3
Reviewer: Curtis Hovey (community)
Review type: code
Status: Approve
Review via email: mp+236008@code.launchpad.net

Description of the change

This branch adds a new script that checks whether the files stored in S3 for each test run include a result.yaml or result.json file.

Additionally, it checks whether existing "result.(json|yaml)" files contain the final result information (the key "result" in the main dictionary). The final result is added if it is missing.

Being a bit paranoid, I added checks that the data recorded in S3 matches the data in the main state file (lines 92..111).
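The consistency check described above can be sketched roughly like this (a standalone illustration with made-up data, not the exact script code):

```python
# Illustrative sketch of the consistency check, using made-up data:
# the S3 result dict must be a key-subset of the state file entry,
# and the shared keys must carry identical values.
s3_result = {"result": {"status": "curse", "build": 1268}}
state_entry = {
    "result": {"status": "curse", "build": 1268},
    "finished": "2014-09-26T15:12:24.000000Z",
}

s3_keys = set(s3_result)
state_keys = set(state_entry)

if not s3_keys.issubset(state_keys):
    print("Warning: S3 file has keys missing from the state file:",
          s3_keys - state_keys)
else:
    # Compare only the keys both sides share.
    comparable = {k: v for k, v in state_entry.items() if k in s3_keys}
    if comparable != s3_result:
        print("Warning: diverging data")
```

Here the S3 data passes both checks; the warnings shown below for revisions 1855 and 1856 come from the second comparison failing.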

In a dry run of the script on my laptop, the second check produced warnings for some revisions because the compared dictionaries do not match. Some details:

revision 1855

S3 file:
  "complete_failure_info": {'functional-ha-recovery': 835, 'hp-deploy-trusty-amd64': 1715, 'run-unit-tests-utopic-amd64': 690}
state file:
  "complete_failure_info": {'functional-ha-recovery': 836, 'hp-deploy-trusty-amd64': 1715, 'run-unit-tests-utopic-amd64': 690}

revision 1856:

S3 file
  "result": {"status": "curse", "build": 1268}
state file:
      result: {build: 1269, status: bless}

Somewhat worrying... But the S3 files for both revisions contain a final result, so the script will not change anything.

I'm a bit too tired to dive into the other failures; I can do this tomorrow. But it probably makes sense to explicitly prevent changes to result files that show this type of inconsistency.

Revision history for this message
Curtis Hovey (sinzui) wrote :

Thank you very much for this script. I have a question. I think this branch is really valuable and would like it merged quickly.

I am not surprised by divergent result information. juju-reports didn't try to get revised results for many months. This also accounts for retries of tests that still failed.

review: Needs Information (code)
686. By Abel Deuring

If two result files exist for one revision, use the timestamp of the older file but the data from the newer file.
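The rule from this revision can be illustrated with a small sketch (hypothetical timestamps; the real script maps upload times to file names in the same way):

```python
from datetime import datetime

# Two result files for one revision, keyed by upload time (made-up values).
result_files = {
    datetime(2014, 3, 2, 9, 30): 'result.yaml',   # original upload
    datetime(2014, 8, 14, 12, 0): 'result.json',  # later backfilled copy
}

# Keep the timestamp of the older file, but the data from the newer file.
result_file_time = min(result_files)
result_file_name = result_files[max(result_files)]
```

This preserves the original completion time of the build while preferring the revised result data.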

Revision history for this message
Abel Deuring (adeuring) wrote :

On 26.09.2014 at 16:33, Curtis Hovey wrote:
> Review: Needs Information code
>
> Thank you very much for this script. I have a question. I think this branch is really valuable and would like it merged quickly.
>
> I am not surprised by divergent result information. juju-reports didn't try to get revised results for many months. This also accounts for retries of tests that still failed.

Meanwhile, I think I understand a bit better how the results diverged. But
that's a discussion about possible improvements to ci-director.

>
> Diff comments:
>
>> + # The most recent version may currently be building, hence a check
>> + # if the result file is useless.
>> + del all_revisions[max(all_revisions)]
>> + for revision_number, revision_data in sorted(all_revisions.items()):
>> + if not revision_data['result']:
>> + result_file_name = None
>> + else:
>> + # If both a result.yaml and a result.json file exist, use
>> + # the older one.
>> + older = min(revision_data['result'])
>
> The older one? Isn't the older one more likely to be wrong? I think the newer one is the replacement with revised data. If this is the case, we want the newer file with the older file's timestamp.

Argh, right. Fixed.

>> +def s3_cmd(params, drop_output=False):
>> + s3cfg_path = os.path.join(
>> + os.environ['HOME'], 'cloud-city/juju-qa.s3cfg')
>> + if drop_output:
>> + return subprocess.check_call(
>> + ['s3cmd', '-c', s3cfg_path] + params, stdout=open('/dev/null', 'w'))
>
> Maybe --no-progress should be the default. I think most of the cron spam I get is because juju-reports never uses this option.

Right, I've added "--no-progress" as a non-removable parameter.
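The effect of that change can be sketched like so (a simplified stand-in for the s3_cmd wrapper moved into utility.py; the config path is shortened):

```python
def build_s3_command(params, s3cfg_path='cloud-city/juju-qa.s3cfg'):
    # --no-progress is baked into the base command, ahead of the caller's
    # params, so no caller can forget it; it silences the transfer progress
    # meter that otherwise ends up as cron mail.
    return ['s3cmd', '-c', s3cfg_path, '--no-progress'] + list(params)
```

Every invocation built this way carries the flag, regardless of what the caller passes in `params`.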

Revision history for this message
Curtis Hovey (sinzui) wrote :

Thank you.

review: Approve (code)

Preview Diff

1=== added file 'add-missing-result-yaml-files.py'
2--- add-missing-result-yaml-files.py 1970-01-01 00:00:00 +0000
3+++ add-missing-result-yaml-files.py 2014-09-26 15:12:24 +0000
4@@ -0,0 +1,135 @@
5+#!/usr/bin/env python
6+
7+"""Add missing result.yaml in S3; ensure that existing files contain
8+the final result.
9+"""
10+
11+from __future__ import print_function
12+from argparse import ArgumentParser
13+from datetime import datetime
14+import json
15+import os
16+import re
17+from tempfile import NamedTemporaryFile
18+import yaml
19+
20+from utility import (
21+ s3_cmd,
22+ temp_dir,
23+)
24+
25+ARCHIVE_URL = 's3://juju-qa-data/juju-ci/products/'
26+ISO_8601_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
27+LONG_AGO = datetime(2000, 1, 1)
28+
29+
30+def get_ci_director_state():
31+ state_file_path = os.path.join(
32+ os.environ['HOME'], '.config/ci-director-state')
33+ with open(state_file_path) as state_file:
34+ return yaml.load(state_file)['versions']
35+
36+
37+def list_s3_files():
38+ text = s3_cmd(['ls', '-r', ARCHIVE_URL])
39+ for line in text.strip().split('\n'):
40+ file_date, file_time, size, url = re.split(r'\s+', line)
41+ file_date = [int(part) for part in file_date.split('-')]
42+ file_time = [int(part) for part in file_time.split(':')]
43+ file_time = datetime(*(file_date + file_time))
44+ revision_number, filename = re.search(
45+ r'^{}version-(\d+)/(.*)$'.format(ARCHIVE_URL), url).groups()
46+ yield int(revision_number), filename, file_time
47+
48+
49+def get_s3_revision_info():
50+ all_revisions = {}
51+ for revision_number, file_name, file_time in list_s3_files():
52+ revision = all_revisions.setdefault(revision_number, {
53+ 'result': {},
54+ 'artifact_time': LONG_AGO,
55+ })
56+ if file_name in ('result.yaml', 'result.json'):
57+ # Many result.json files were added on 2014-08-14 for older
58+ # builds, so we may have both a result.yaml file and a
59+ # result.json file.
60+ revision['result'][file_time] = file_name
61+ else:
62+ revision['artifact_time'] = max(
63+ revision['artifact_time'], file_time)
64+ # The most recent version may currently be building, so checking
65+ # whether its result file exists is useless.
66+ del all_revisions[max(all_revisions)]
67+ for revision_number, revision_data in sorted(all_revisions.items()):
68+ if not revision_data['result']:
69+ result_file_name = None
70+ result_file_time = revision_data['artifact_time']
71+ else:
72+ result_file_time = min(revision_data['result'])
73+ # If both a result.yaml and a result.json file exist, use
74+ # the newer one.
75+ newer = max(revision_data['result'])
76+ result_file_name = revision_data['result'][newer]
77+ yield revision_number, result_file_name, result_file_time
78+
79+def main(args):
80+ ci_director_state = get_ci_director_state()
81+ for revision_number, result_file, artifact_time in get_s3_revision_info():
82+ state_file_result = ci_director_state.get(revision_number)
83+ if state_file_result is None:
84+ print(
85+ "Warning: No state file data available for revision",
86+ revision_number)
87+ continue
88+ if result_file is not None:
89+ with temp_dir() as workspace:
90+ copy_from = '{}version-{}/{}'.format(
91+ ARCHIVE_URL, revision_number, result_file)
92+ copy_to = os.path.join(workspace, result_file)
93+ s3_cmd(['--no-progress', 'get', copy_from, copy_to])
94+ with open(copy_to) as f:
95+ s3_result = yaml.load(f)
96+ # For paranoids: Check that the data from S3 is a subset
97+ # of the data from the state file
98+ s3_keys = set(s3_result)
99+ state_keys = set(ci_director_state[revision_number])
100+ if not s3_keys.issubset(state_keys):
101+ print(
102+ "Warning: S3 result file for {} contains keys that do "
103+ "not exist in the main state file: {}".format(
104+ revision_number, s3_keys.difference(state_keys)))
105+ continue
106+ comparable_state_data = dict(
107+ (k, v)
108+ for k, v in ci_director_state[revision_number].items()
109+ if k in s3_keys)
110+ if comparable_state_data != s3_result:
111+ # This can happen when the result file was written
112+ # while a -devel job was still running.
113+ print(
114+ "Warning: Diverging data for revision {} in S3 ({}) "
115+ "and in state file ({}).".format(
116+ revision_number, s3_result,
117+ ci_director_state[revision_number]))
118+ if 'result' in s3_result:
119+ continue
120+
121+ if 'finished' not in state_file_result:
122+ state_file_result['finished'] = artifact_time.strftime(
123+ ISO_8601_FORMAT)
124+ with NamedTemporaryFile() as new_result_file:
125+ json.dump(state_file_result, new_result_file)
126+ new_result_file.flush()
127+ dest_url = '{}version-{}/result.json'.format(
128+ ARCHIVE_URL, revision_number)
129+ params = ['put', new_result_file.name, dest_url]
130+ if args.dry_run:
131+ print(*(['s3cmd'] + params))
132+ else:
133+ s3_cmd(params)
134+
135+if __name__ == '__main__':
136+ parser = ArgumentParser()
137+ parser.add_argument('--dry-run', action='store_true')
138+ args = parser.parse_args()
139+ main(args)
140
141=== modified file 'backup-to-s3.py'
142--- backup-to-s3.py 2014-07-25 12:02:32 +0000
143+++ backup-to-s3.py 2014-09-26 15:12:24 +0000
144@@ -6,6 +6,8 @@
145 import re
146 import subprocess
147
148+from utility import s3_cmd
149+
150
151 MAX_BACKUPS = 10
152 BACKUP_URL = 's3://juju-qa-data/juju-ci/backups/'
153@@ -26,17 +28,6 @@
154 ]
155
156
157-def s3_cmd(params, drop_output=False):
158- s3cfg_path = os.path.join(
159- os.environ['HOME'], 'cloud-city/juju-qa.s3cfg')
160- if drop_output:
161- return subprocess.check_call(
162- ['s3cmd', '-c', s3cfg_path] + params, stdout=open('/dev/null', 'w'))
163- else:
164- return subprocess.check_output(
165- ['s3cmd', '-c', s3cfg_path] + params)
166-
167-
168 def current_backups():
169 """Return a list of S3 URLs of existing backups."""
170 # We expect lines like
171
172=== modified file 'utility.py'
173--- utility.py 2014-09-20 00:05:37 +0000
174+++ utility.py 2014-09-26 15:12:24 +0000
175@@ -133,3 +133,14 @@
176 'path': path, 'mount': df_result[5], 'required': required,
177 'available': available, 'purpose': purpose
178 })
179+
180+
181+def s3_cmd(params, drop_output=False):
182+ s3cfg_path = os.path.join(
183+ os.environ['HOME'], 'cloud-city/juju-qa.s3cfg')
184+ command = ['s3cmd', '-c', s3cfg_path, '--no-progress'] + params
185+ if drop_output:
186+ return subprocess.check_call(
187+ command, stdout=open('/dev/null', 'w'))
188+ else:
189+ return subprocess.check_output(command)
