Merge lp:~vila/ubuntu-ci-services-itself/debug into lp:ubuntu-ci-services-itself

Proposed by Vincent Ladeuil on 2014-03-17
Status: Merged
Approved by: Vincent Ladeuil on 2014-03-18
Approved revision: 405
Merged at revision: 411
Proposed branch: lp:~vila/ubuntu-ci-services-itself/debug
Merge into: lp:ubuntu-ci-services-itself
Diff against target: 245 lines (+83/-54)
3 files modified
test_runner/run_test.py (+7/-3)
test_runner/tstrun/run_worker.py (+64/-42)
test_runner/tstrun/testbed.py (+12/-9)
To merge this branch: bzr merge lp:~vila/ubuntu-ci-services-itself/debug
Reviewer Review Type Date Requested Status
Andy Doan (community) 2014-03-17 Approve on 2014-03-18
PS Jenkins bot (community) continuous-integration Approve on 2014-03-18
Review via email: mp+211384@code.launchpad.net

Commit message

Rename summary and log produced on the test bed and add them to the artifacts

Description of the change

This provides more logs for the test runner to help debug.

Paul encountered a case where the subunit stream content was broken. This
requires the summary.log created by adt-run.

The log of the test run itself wasn't saved as an artifact, it is now,
package by package.

Finally, the cloud-init log was only available on errors, it's now available
in all cases as it may contains useful info for diagnosis (like which
package *versions* were installed).

I've tested locally and will run a ticket through the engine (as soon as I get an updated deployment...)

To post a comment you must log in.
PS Jenkins bot (ps-jenkins) wrote :

FAILED: Continuous integration, rev:403
No commit message was specified in the merge proposal. Click on the following link and set the commit message (if you want a jenkins rebuild you need to trigger it yourself):
https://code.launchpad.net/~vila/ubuntu-ci-services-itself/debug/+merge/211384/+edit-commit-message

http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/449/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/449/rebuild

review: Needs Fixing (continuous-integration)
404. By Vincent Ladeuil on 2014-03-17

Fix pep8 issues.

Andy Doan (doanac) wrote :

100 + results.setdefault('artifacts', []).append({
101 + 'name': '{}.{}'.format(self.logger_name, name),
102 + 'reference': url,
103 + 'type': kind,
104 + })

there's a method in the base class that does this now:

 http://bazaar.launchpad.net/~canonical-ci-engineering/ubuntu-ci-services-itself/trunk/view/head:/ci-utils/ci_utils/amqp_worker.py#L135

7 def run_test(package):
114 + def save_testbed_artifacts(self, logger, results, test_bed, package):

I wish there was a better way to keep the artifacts in sync between these two modules. ie just pull everything from a directory and make an assumption on the file-type if its LOGS/RESULTS type. But I don't see anything obvious or easy right now. Just something to think about in the future.

PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:404
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/451/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/451/rebuild

review: Approve (continuous-integration)
405. By Vincent Ladeuil on 2014-03-18

Fix review comments.

Vincent Ladeuil (vila) wrote :

> 100 + results.setdefault('artifacts', []).append({
> 101 + 'name': '{}.{}'.format(self.logger_name, name),
> 102 + 'reference': url,
> 103 + 'type': kind,
> 104 + })
>
> there's a method in the base class that does this now:
>
> http://bazaar.launchpad.net/~canonical-ci-engineering/ubuntu-ci-services-
> itself/trunk/view/head:/ci-utils/ci_utils/amqp_worker.py#L135

Called.

>
>
> 7 def run_test(package):
> 114 + def save_testbed_artifacts(self, logger, results, test_bed,
> package):
>
> I wish there was a better way to keep the artifacts in sync between these two
> modules. ie just pull everything from a directory and make an assumption on
> the file-type if its LOGS/RESULTS type. But I don't see anything obvious or
> easy right now. Just something to think about in the future.

Agreed, it's not super pretty, especially when you consider that they are downloaded from the testbed to be uploaded to swift. I didn't want to refactor too aggressively either without tests.

PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:405
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/453/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/453/rebuild

review: Approve (continuous-integration)
Andy Doan (doanac) wrote :

On 03/17/2014 07:52 PM, Vincent Ladeuil wrote:
> Agreed, it's not super pretty, especially when you consider that they are downloaded from the testbed to be uploaded to swift. I didn't want to refactor too aggressively either without tests.
agreed also.

Andy Doan (doanac) :
review: Approve
Vincent Ladeuil (vila) wrote :

Finally ! After an epic fight against swift uploads on hp (ticket filed with support), it appears that I can get some success as long as my uploads stay below some mysterious limit... with enough retries.

So, I've tested this: http://15.125.78.170/ticket.html?ticket_id=1

All logs are there, properly named, even in the left column under 'Image Testing' for easy access, pfew, what a journey.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'test_runner/run_test.py'
2--- test_runner/run_test.py 2014-03-14 10:37:28 +0000
3+++ test_runner/run_test.py 2014-03-18 00:50:25 +0000
4@@ -137,7 +137,8 @@
5
6
7 def run_test(package):
8- output = open('result.subunit', 'wb')
9+ subunit_path = '{}.subunit'.format(package)
10+ output = open(subunit_path, 'wb')
11 result = subunit.TestProtocolClient(output)
12 result.startTestRun()
13 os.mkdir(package)
14@@ -147,15 +148,18 @@
15 # MISSINGTEST: dsc file not found
16 dsc_path = get_dsc_path(package, cwd=package)
17 try:
18+ summary_path = '{}-summary.log'.format(package)
19+ log_path = '{}.log'.format(package)
20 cmd = ['sudo', 'adt-run', dsc_path,
21- '--summary', 'summary.log',
22+ '--summary', summary_path,
23+ '--log-file', log_path,
24 # Required to get files produced in 'results'
25 '--paths-testbed',
26 '--output-dir', 'results',
27 '---', 'adt-virt-null']
28 proc, out, err = run(*cmd, out=None, err=None, check_rc=False)
29 rc = proc.returncode
30- for name, status in parse_summary('summary.log'):
31+ for name, status in parse_summary(summary_path):
32 process_test_result(package, result, name, status)
33 finally:
34 result.stopTestRun()
35
36=== modified file 'test_runner/tstrun/run_worker.py'
37--- test_runner/tstrun/run_worker.py 2014-03-14 21:54:37 +0000
38+++ test_runner/tstrun/run_worker.py 2014-03-18 00:50:25 +0000
39@@ -28,39 +28,60 @@
40 super(TestRunnerWorker, self).__init__('image_test')
41 self.data_store = None
42
43- def save_subunit(self, log, package, stream, results):
44- # make this exception-safe since we can already report pass/fail
45- # with or without this file
46- try:
47- log.info('Saving subunit results for {}'.format(package))
48- name = 'test-runner.{}-subunit-stream'.format(package)
49- url = self.data_store.put_file(name, stream, 'text/plain')
50- results.setdefault('artifacts', []).append({
51- 'name': name,
52- 'reference': url,
53- 'type': 'RESULTS',
54- })
55- except:
56- log.exception(
57- 'Unable to upload subunit result for {}'.format(package))
58-
59- def save_testbed_console(self, log, test_bed, results):
60- # make this exception-safe since we can already report pass/fail
61- # with or without this file
62- try:
63- log.info('Saving testbed console')
64- name = 'test-runner.testbed-cloud-init.log'
65- console = test_bed.get_cloud_init_console()
66- url = self.data_store.put_file(name, console, 'text/plain')
67- results.setdefault('artifacts', []).append({
68- 'name': name,
69- 'reference': url,
70- 'type': 'LOGS',
71- })
72- except:
73- log.exception('unable to upload testbed console')
74-
75- def handle_request(self, log, params):
76+ def save_artifact(self, logger, results, name, value, description=None,
77+ kind='LOGS'):
78+ """Save an artifact catching and reporting exceptions.
79+
80+ This should only be used to upload artifacts after the pass/fail status
81+ has been established to it's safe to fail the upload.
82+
83+ :param logger: To report execution.
84+
85+ :param results: The dict holding the 'artifacts' attribute.
86+
87+ :param name: The artifact name.
88+
89+ :param value: The artifact value as a string.
90+
91+ :param description: For reporting purposes.
92+
93+ :param kind: The kind of artifact (defaults to 'LOGS').
94+ """
95+ if description is None:
96+ description = name
97+ try:
98+ logger.info('Saving {}'.format(description))
99+ url = self.data_store.put_file(name, value, 'text/plain')
100+ self._create_artifact('{}.{}'.format(self.logger_name, name),
101+ url, results, kind)
102+ except:
103+ logger.exception('Unable to upload {}'.format(description))
104+
105+ def save_testbed_console(self, logger, results, test_bed):
106+ self.save_artifact(logger, results,
107+ 'testbed-cloud-init.log',
108+ test_bed.get_cloud_init_console(),
109+ 'testbed console')
110+
111+ def save_testbed_artifacts(self, logger, results, test_bed, package):
112+ subunit_path = '{}.subunit'.format(package)
113+ self.save_artifact(
114+ logger, results,
115+ subunit_path, test_bed.get_remote_content(subunit_path),
116+ 'subunit results for {}'.format(package),
117+ 'RESULTS')
118+ package_log_path = '{}.log'.format(package)
119+ self.save_artifact(
120+ logger, results,
121+ package_log_path, test_bed.get_remote_content(package_log_path),
122+ 'adt-run log for {}'.format(package))
123+ summary_log_path = '{}-summary.log'.format(package)
124+ self.save_artifact(
125+ logger, results,
126+ summary_log_path, test_bed.get_remote_content(summary_log_path),
127+ 'adt-run summary for {}'.format(package))
128+
129+ def handle_request(self, logger, params):
130 ticket_id = params['ticket_id']
131 progress_queue = params['progress_trigger']
132 image_id = params['image_id']
133@@ -69,7 +90,7 @@
134 results = {}
135
136 def status_cb(msg):
137- log.info(msg)
138+ logger.info(msg)
139 amqp_utils.progress_update(progress_queue, {'message': msg})
140
141 self.data_store = self._create_data_store(ticket_id)
142@@ -79,41 +100,42 @@
143 test_bed = testbed.TestBed('testbed-{}'.format(progress_queue),
144 flavors, image_id, status_cb)
145 except:
146- log.exception(
147+ logger.exception(
148 'The testbed creation for {} failed'.format(ticket_id))
149 return amqp_utils.progress_failed, results
150
151 try:
152 test_bed.setup()
153 except:
154- log.exception(
155+ logger.exception(
156 'The testbed setup for {} failed'.format(ticket_id))
157 if test_bed.instance is not None:
158- self.save_testbed_console(log, test_bed, results)
159+ self.save_testbed_console(logger, results, test_bed)
160 test_bed.teardown()
161 return amqp_utils.progress_failed, results
162
163+ self.save_testbed_console(logger, results, test_bed)
164 status_cb('The test bed is ready')
165 # The tests will succeed unless they fail ;)
166 notify = amqp_utils.progress_completed
167 try:
168 for package in package_list:
169 if params.get('cancelled'):
170- log.error('The request for {} has been cancelled,'
171- ' exiting'.format(ticket_id))
172+ logger.error('The request for {} has been cancelled,'
173+ ' exiting'.format(ticket_id))
174 return amqp_utils.progress_failed, results
175 status_cb('Testing {}'.format(package))
176 # uci-vms shell adt-run ... --- adt-virt-null
177- return_code, subunit_stream = test_bed.run_test(package)
178+ return_code = test_bed.run_test(package)
179 # 0 is success, 8 is skipped and considered a success
180 if return_code not in (0, 8):
181 # At least one test failed
182 notify = amqp_utils.progress_failed
183- self.save_subunit(log, package, subunit_stream, results)
184+ self.save_testbed_artifacts(logger, results, test_bed, package)
185 status_cb('Test completed for ticket {}'.format(ticket_id))
186 return notify, results
187 except:
188- log.exception(
189+ logger.exception(
190 'Exception while handling ticket {}'.format(ticket_id))
191 raise
192 finally:
193
194=== modified file 'test_runner/tstrun/testbed.py'
195--- test_runner/tstrun/testbed.py 2014-03-14 17:57:46 +0000
196+++ test_runner/tstrun/testbed.py 2014-03-18 00:50:25 +0000
197@@ -245,7 +245,7 @@
198 def wait_for_cloud_init(self):
199 # FIXME: cloud_init_timeout should be a config option (related to
200 # get_ip_timeout and probably the two can be merged) -- vila 2014-01-30
201- cloud_init_timeout = 300 # in seconds so 5 minutes
202+ cloud_init_timeout = 600 # in seconds so 10 minutes
203 timeout_limit = time.time() + cloud_init_timeout
204 while time.time() < timeout_limit:
205 # A relatively cheap way to catch cloud-init completion is to watch
206@@ -262,8 +262,6 @@
207 # We're good to go
208 log.info(
209 'cloud-init completed for {}'.format(self.instance.id))
210- # FIXME: Right place to report how long it
211- # took for cloud-init to finish. -- vila 2014-01-30
212 return
213 time.sleep(5)
214 raise NovaClientException('Instance never completed cloud-init')
215@@ -295,12 +293,13 @@
216 out, err = proc.communicate()
217 return proc, out, err
218
219+ def get_remote_content(self, path):
220+ _, content, _ = self.ssh('cat', path, out=subprocess.PIPE)
221+ return content
222+
223 def run_test(self, package):
224 proc, _, _ = self.ssh('/tmp/run_test.py', package)
225- # FIXME: Does that look like a pipe ? -- vila 2014-02-06
226- _, subunit_stream, _ = self.ssh('cat', 'result.subunit',
227- out=subprocess.PIPE)
228- return proc.returncode, subunit_stream
229+ return proc.returncode
230
231
232 # The following are helpers for local manual tests, they should disappear once
233@@ -345,6 +344,10 @@
234 from ci_utils import dump_stack
235 dump_stack.install_stack_dump_signal()
236 test_bed = test_print_ip()
237- print test_bed.run_test('libpng') # Known to run no tests
238- print test_bed.run_test('juju-core')
239+ # libpng is known to run no tests
240+ for package in ('libpng',):
241+ print test_bed.run_test(package)
242+# print test_bed.get_remote_content('{}.log'.format(package))
243+ print test_bed.get_remote_content('{}-summary.log'.format(package))
244+ print test_bed.get_remote_content('{}.subunit'.format(package))
245 test_bed.teardown()

Subscribers

People subscribed via source and target branches