Ubuntu CI Services

Merge lp:~vila/ubuntu-ci-services-itself/debug into lp:ubuntu-ci-services-itself

debug
Merge into trunk

Proposed by Vincent Ladeuil on 2014-03-17

Status:	Merged
Approved by:	Vincent Ladeuil on 2014-03-18
Approved revision:	405
Merged at revision:	411
Proposed branch:	lp:~vila/ubuntu-ci-services-itself/debug
Merge into:	lp:ubuntu-ci-services-itself
Diff against target:	245 lines (+83/-54), 3 files modified:
    test_runner/run_test.py (+7/-3)
    test_runner/tstrun/run_worker.py (+64/-42)
    test_runner/tstrun/testbed.py (+12/-9)
To merge this branch:	bzr merge lp:~vila/ubuntu-ci-services-itself/debug

Reviewer	Review Type	Date Requested	Status
Andy Doan (community)		2014-03-17	Approve on 2014-03-18
PS Jenkins bot (community)	continuous-integration		Approve on 2014-03-18
Review via email: mp+211384@code.launchpad.net

Commit message

Rename summary and log produced on the test bed and add them to the artifacts

Description of the change

This provides more logs for the test runner to help debug.

Paul encountered a case where the subunit stream content was broken. Diagnosing
that requires the summary.log created by adt-run.
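For illustration, here is a minimal sketch of recovering per-test statuses from an adt-run summary when the subunit stream can't be trusted. It is a hypothetical variant of the `parse_summary` helper called in `run_test.py` (its body isn't shown here): it takes lines rather than a path, and it assumes each summary line starts with the test name followed by a status word such as PASS, FAIL or SKIP.

```python
def parse_summary(lines):
    """Yield (test_name, status) pairs from adt-run summary lines.

    Assumption: each useful line is '<test name> <STATUS> [details...]';
    any text after the status word is ignored.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Blank or truncated line: nothing we can report on.
            continue
        yield fields[0], fields[1]
```

Usage would look like `with open('libpng-summary.log') as f: statuses = list(parse_summary(f))`.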

The log of the test run itself wasn't saved as an artifact; it now is,
package by package.

Finally, the cloud-init log was only available on errors; it's now available
in all cases, as it may contain useful info for diagnosis (like which
package *versions* were installed).

I've tested locally and will run a ticket through the engine (as soon as I get an updated deployment...)

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-03-17:

FAILED: Continuous integration, rev:403
No commit message was specified in the merge proposal. Click on the following link and set the commit message (if you want a jenkins rebuild you need to trigger it yourself):
https://code.launchpad.net/~vila/ubuntu-ci-services-itself/debug/+merge/211384/+edit-commit-message

http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/449/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/449/rebuild

review: Needs Fixing (continuous-integration)

lp:~vila/ubuntu-ci-services-itself/debug updated on 2014-03-17

404. By Vincent Ladeuil on 2014-03-17: Fix pep8 issues.

Revision history for this message

Andy Doan (doanac) wrote on 2014-03-17:

100 + results.setdefault('artifacts', []).append({
101 + 'name': '{}.{}'.format(self.logger_name, name),
102 + 'reference': url,
103 + 'type': kind,
104 + })

there's a method in the base class that does this now:

http://bazaar.launchpad.net/~canonical-ci-engineering/ubuntu-ci-services-itself/trunk/view/head:/ci-utils/ci_utils/amqp_worker.py#L135
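For readers without `amqp_worker.py` at hand, here is a minimal stand-in doing the same bookkeeping as the removed `results.setdefault(...)` code. The argument order follows the `self._create_artifact(name, url, results, kind)` call in the diff; the real method's body isn't shown here, so treat `create_artifact` as a hypothetical sketch.

```python
def create_artifact(name, reference, results, kind):
    """Append an artifact record to results['artifacts'].

    The list is created on first use, so callers can start with {}.
    """
    results.setdefault('artifacts', []).append({
        'name': name,
        'reference': reference,
        'type': kind,
    })
```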

7 def run_test(package):
114 + def save_testbed_artifacts(self, logger, results, test_bed, package):

I wish there was a better way to keep the artifacts in sync between these two modules, i.e. just pull everything from a directory and infer from each file's type whether it's a LOGS or RESULTS artifact. But I don't see anything obvious or easy right now. Just something to think about in the future.
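One possible shape for that idea, sketched under clearly stated assumptions: every regular file in a (hypothetical) local directory of files downloaded from the testbed becomes an artifact, and the kind is inferred from the file suffix, with `.subunit` files treated as RESULTS and everything else as LOGS. `classify_artifacts` and `RESULT_SUFFIXES` are illustrative names, not code from either module.

```python
import os

# Assumption: only subunit streams count as RESULTS; anything else is a log.
RESULT_SUFFIXES = ('.subunit',)


def classify_artifacts(directory):
    """Return sorted (file_name, kind) pairs for every file in directory."""
    artifacts = []
    for entry in sorted(os.listdir(directory)):
        # Skip sub-directories such as adt-run's 'results' output dir.
        if not os.path.isfile(os.path.join(directory, entry)):
            continue
        kind = 'RESULTS' if entry.endswith(RESULT_SUFFIXES) else 'LOGS'
        artifacts.append((entry, kind))
    return artifacts
```

The trade-off is that naming conventions, rather than explicit calls, then decide what gets uploaded, so a stray file in that directory would silently become an artifact.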

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-03-17:

PASSED: Continuous integration, rev:404
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/451/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/451/rebuild

review: Approve (continuous-integration)

lp:~vila/ubuntu-ci-services-itself/debug updated on 2014-03-18

405. By Vincent Ladeuil on 2014-03-18: Fix review comments.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2014-03-18:

> 100 + results.setdefault('artifacts', []).append({
> 101 + 'name': '{}.{}'.format(self.logger_name, name),
> 102 + 'reference': url,
> 103 + 'type': kind,
> 104 + })
>
> there's a method in the base class that does this now:
>
> http://bazaar.launchpad.net/~canonical-ci-engineering/ubuntu-ci-services-itself/trunk/view/head:/ci-utils/ci_utils/amqp_worker.py#L135

Done, the new code now calls it.

>
>
> 7 def run_test(package):
> 114 + def save_testbed_artifacts(self, logger, results, test_bed, package):
>
> I wish there was a better way to keep the artifacts in sync between these two
> modules. ie just pull everything from a directory and make an assumption on
> the file-type if its LOGS/RESULTS type. But I don't see anything obvious or
> easy right now. Just something to think about in the future.

Agreed, it's not super pretty, especially when you consider that they are downloaded from the testbed to be uploaded to swift. I didn't want to refactor too aggressively either without tests.

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-03-18:

PASSED: Continuous integration, rev:405
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/453/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/453/rebuild

review: Approve (continuous-integration)

Revision history for this message

Andy Doan (doanac) wrote on 2014-03-18:

On 03/17/2014 07:52 PM, Vincent Ladeuil wrote:
> Agreed, it's not super pretty, especially when you consider that they are downloaded from the testbed to be uploaded to swift. I didn't want to refactor too aggressively either without tests.
agreed also.

Revision history for this message

Andy Doan (doanac) on 2014-03-18:

review: Approve

Revision history for this message

Vincent Ladeuil (vila) wrote on 2014-03-18:

Finally! After an epic fight against swift uploads on HP (ticket filed with support), it appears that I can get some success as long as my uploads stay below some mysterious limit... and with enough retries.
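Retrying is the generic workaround here. A minimal sketch with exponential backoff, assuming the upload is wrapped in a zero-argument callable that raises on failure; `with_retries` is a hypothetical helper (the branch itself does not add one), and `sleep` is injectable so the delays can be checked without actually waiting.

```python
import time


def with_retries(upload, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call upload() until it succeeds or attempts are exhausted.

    Waits 'delay' seconds after the first failure, multiplying the wait by
    'backoff' each time; re-raises the last exception if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
```

For example: `url = with_retries(lambda: data_store.put_file(name, value, 'text/plain'))`. Retries alone won't help if an upload exceeds the limit every time; that case would still need smaller uploads.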

So, I've tested this: http://15.125.78.170/ticket.html?ticket_id=1

All logs are there, properly named, even in the left column under 'Image Testing' for easy access. Phew, what a journey.

Preview Diff

 === modified file 'test_runner/run_test.py'
 --- test_runner/run_test.py	2014-03-14 10:37:28 +0000
 +++ test_runner/run_test.py	2014-03-18 00:50:25 +0000
@@ -137,7 +137,8 @@
  def run_test(package):
--    output = open('result.subunit', 'wb')
++    subunit_path = '{}.subunit'.format(package)
++    output = open(subunit_path, 'wb')
      result = subunit.TestProtocolClient(output)
      result.startTestRun()
      os.mkdir(package)
@@ -147,15 +148,18 @@
      # MISSINGTEST: dsc file not found
      dsc_path = get_dsc_path(package, cwd=package)
      try:
++        summary_path = '{}-summary.log'.format(package)
++        log_path = '{}.log'.format(package)
          cmd = ['sudo', 'adt-run', dsc_path,
--               '--summary', 'summary.log',
++               '--summary', summary_path,
++               '--log-file', log_path,
                 # Required to get files produced in 'results'
                 '--paths-testbed',
                 '--output-dir', 'results',
                 '---', 'adt-virt-null']
          proc, out, err = run(*cmd, out=None, err=None, check_rc=False)
          rc = proc.returncode
--        for name, status in parse_summary('summary.log'):
++        for name, status in parse_summary(summary_path):
              process_test_result(package, result, name, status)
      finally:
          result.stopTestRun()
 === modified file 'test_runner/tstrun/run_worker.py'
 --- test_runner/tstrun/run_worker.py	2014-03-14 21:54:37 +0000
 +++ test_runner/tstrun/run_worker.py	2014-03-18 00:50:25 +0000
@@ -28,39 +28,60 @@
          super(TestRunnerWorker, self).__init__('image_test')
          self.data_store = None
--    def save_subunit(self, log, package, stream, results):
--        # make this exception-safe since we can already report pass/fail
--        # with or without this file
--        try:
--            log.info('Saving subunit results for {}'.format(package))
--            name = 'test-runner.{}-subunit-stream'.format(package)
--            url = self.data_store.put_file(name, stream, 'text/plain')
--            results.setdefault('artifacts', []).append({
--                'name': name,
--                'reference': url,
--                'type': 'RESULTS',
--            })
--        except:
--            log.exception(
--                'Unable to upload subunit result for {}'.format(package))
--
--    def save_testbed_console(self, log, test_bed, results):
--        # make this exception-safe since we can already report pass/fail
--        # with or without this file
--        try:
--            log.info('Saving testbed console')
--            name = 'test-runner.testbed-cloud-init.log'
--            console = test_bed.get_cloud_init_console()
--            url = self.data_store.put_file(name, console, 'text/plain')
--            results.setdefault('artifacts', []).append({
--                'name': name,
--                'reference': url,
--                'type': 'LOGS',
--            })
--        except:
--            log.exception('unable to upload testbed console')
--
--    def handle_request(self, log, params):
++    def save_artifact(self, logger,  results, name, value, description=None,
++                      kind='LOGS'):
++        """Save an artifact catching and reporting exceptions.
++
++        This should only be used to upload artifacts after the pass/fail status
++        has been established so it's safe to fail the upload.
++
++        :param logger: To report execution.
++
++        :param results: The dict holding the 'artifacts' attribute.
++
++        :param name: The artifact name.
++
++        :param value: The artifact value as a string.
++
++        :param description: For reporting purposes.
++
++        :param kind: The kind of artifact (defaults to 'LOGS').
++        """
++        if description is None:
++            description = name
++        try:
++            logger.info('Saving {}'.format(description))
++            url = self.data_store.put_file(name, value, 'text/plain')
++            self._create_artifact('{}.{}'.format(self.logger_name, name),
++                                  url, results, kind)
++        except:
++            logger.exception('Unable to upload {}'.format(description))
++
++    def save_testbed_console(self, logger, results, test_bed):
++        self.save_artifact(logger, results,
++                           'testbed-cloud-init.log',
++                           test_bed.get_cloud_init_console(),
++                           'testbed console')
++
++    def save_testbed_artifacts(self, logger, results, test_bed, package):
++        subunit_path = '{}.subunit'.format(package)
++        self.save_artifact(
++            logger, results,
++            subunit_path, test_bed.get_remote_content(subunit_path),
++            'subunit results for {}'.format(package),
++            'RESULTS')
++        package_log_path = '{}.log'.format(package)
++        self.save_artifact(
++            logger, results,
++            package_log_path, test_bed.get_remote_content(package_log_path),
++            'adt-run log for {}'.format(package))
++        summary_log_path = '{}-summary.log'.format(package)
++        self.save_artifact(
++            logger, results,
++            summary_log_path, test_bed.get_remote_content(summary_log_path),
++            'adt-run summary for {}'.format(package))
++
++    def handle_request(self, logger, params):
          ticket_id = params['ticket_id']
          progress_queue = params['progress_trigger']
          image_id = params['image_id']
@@ -69,7 +90,7 @@
          results = {}
          def status_cb(msg):
--            log.info(msg)
++            logger.info(msg)
              amqp_utils.progress_update(progress_queue, {'message': msg})
          self.data_store = self._create_data_store(ticket_id)
@@ -79,41 +100,42 @@
              test_bed = testbed.TestBed('testbed-{}'.format(progress_queue),
                                         flavors, image_id, status_cb)
          except:
--            log.exception(
++            logger.exception(
                  'The testbed creation for {} failed'.format(ticket_id))
              return amqp_utils.progress_failed, results
          try:
              test_bed.setup()
          except:
--            log.exception(
++            logger.exception(
                  'The testbed setup for {} failed'.format(ticket_id))
              if test_bed.instance is not None:
--                self.save_testbed_console(log, test_bed, results)
++                self.save_testbed_console(logger, results, test_bed)
                  test_bed.teardown()
              return amqp_utils.progress_failed, results
++        self.save_testbed_console(logger, results, test_bed)
          status_cb('The test bed is ready')
          # The tests will succeed unless they fail ;)
          notify = amqp_utils.progress_completed
          try:
              for package in package_list:
                  if params.get('cancelled'):
--                    log.error('The request  for {} has been cancelled,'
--                              ' exiting'.format(ticket_id))
++                    logger.error('The request for {} has been cancelled,'
++                                 ' exiting'.format(ticket_id))
                      return amqp_utils.progress_failed, results
                  status_cb('Testing {}'.format(package))
                  # uci-vms shell adt-run ...  --- adt-virt-null
--                return_code, subunit_stream = test_bed.run_test(package)
++                return_code = test_bed.run_test(package)
                  # 0 is success, 8 is skipped and considered a success
                  if return_code not in (0, 8):
                      # At least one test failed
                      notify = amqp_utils.progress_failed
--                self.save_subunit(log, package, subunit_stream, results)
++                self.save_testbed_artifacts(logger, results, test_bed, package)
              status_cb('Test completed for ticket {}'.format(ticket_id))
              return notify, results
          except:
--            log.exception(
++            logger.exception(
                  'Exception while handling ticket {}'.format(ticket_id))
              raise
          finally:
 === modified file 'test_runner/tstrun/testbed.py'
 --- test_runner/tstrun/testbed.py	2014-03-14 17:57:46 +0000
 +++ test_runner/tstrun/testbed.py	2014-03-18 00:50:25 +0000
@@ -245,7 +245,7 @@
      def wait_for_cloud_init(self):
          # FIXME: cloud_init_timeout should be a config option (related to
          # get_ip_timeout and probably the two can be merged) -- vila 2014-01-30
--        cloud_init_timeout = 300  # in seconds so 5 minutes
++        cloud_init_timeout = 600  # in seconds so 10 minutes
          timeout_limit = time.time() + cloud_init_timeout
          while time.time() < timeout_limit:
              # A relatively cheap way to catch cloud-init completion is to watch
@@ -262,8 +262,6 @@
                  # We're good to go
                  log.info(
                      'cloud-init completed for {}'.format(self.instance.id))
--                # FIXME: Right place to report how long it
--                # took for cloud-init to finish. -- vila 2014-01-30
                  return
              time.sleep(5)
          raise NovaClientException('Instance never completed cloud-init')
@@ -295,12 +293,13 @@
          out, err = proc.communicate()
          return proc, out, err
++    def get_remote_content(self, path):
++        _, content, _ = self.ssh('cat', path, out=subprocess.PIPE)
++        return content
++
      def run_test(self, package):
          proc, _, _ = self.ssh('/tmp/run_test.py', package)
--        # FIXME: Does that look like a pipe ? -- vila 2014-02-06
--        _, subunit_stream, _ = self.ssh('cat', 'result.subunit',
--                                        out=subprocess.PIPE)
--        return proc.returncode, subunit_stream
++        return proc.returncode
  # The following are helpers for local manual tests, they should disappear once
@@ -345,6 +344,10 @@
      from ci_utils import dump_stack
      dump_stack.install_stack_dump_signal()
      test_bed = test_print_ip()
--    print test_bed.run_test('libpng')  # Known to run no tests
--    print test_bed.run_test('juju-core')
++    # libpng is known to run no tests
++    for package in ('libpng',):
++        print test_bed.run_test(package)
++#        print test_bed.get_remote_content('{}.log'.format(package))
++        print test_bed.get_remote_content('{}-summary.log'.format(package))
++        print test_bed.get_remote_content('{}.subunit'.format(package))
      test_bed.teardown()