Merge lp:~terceiro/lava-scheduler/better-logging into lp:lava-scheduler

Proposed by Antonio Terceiro
Status: Merged
Merged at revision: 260
Proposed branch: lp:~terceiro/lava-scheduler/better-logging
Merge into: lp:lava-scheduler
Diff against target: 11 lines (+2/-0)
1 file modified
lava_scheduler_app/logfile_helper.py (+2/-0)
To merge this branch: bzr merge lp:~terceiro/lava-scheduler/better-logging
Reviewer Review Type Date Requested Status
Tyler Baker Approve
Review via email: mp+183992@code.launchpad.net

Description of the change

Catch other types of errors from the dispatcher logs

To post a comment you must log in.
Revision history for this message
Tyler Baker (tyler-baker) wrote :

I have tested this change locally. It is working properly, and really makes a difference when trying to find the root cause of a LAVA job.

Some examples:

http://community.validation.linaro.org/scheduler/job/708

http://community.validation.linaro.org/scheduler/job/255

Approved.

review: Approve
Revision history for this message
Tyler Baker (tyler-baker) wrote :

After playing around with this a bit..

I think we should remove the check for "is finished with error"

and instead raise a CriticalError in the dispatcher in job.py:

                    if status == 'fail':
                        # XXX mwhudson, 2013-01-17: I have no idea what this
                        # code is doing.
                        logging.warning(
                            "[ACTION-E] %s is finished with error (%s)." %
                            (cmd['command'], err))
                        err_msg = ("Lava failed at action %s with error:"
                                   "%s\n") % (cmd['command'],
                                              unicode(str(err),
                                                      'ascii', 'replace'))
                        if cmd['command'] == 'lava_test_run':
                            err_msg += "Lava failed on test: %s" % \
                                       params.get('test_name', "Unknown")
                        err_msg = err_msg + traceback.format_exc()
- print err_msg
+ raise CriticalError(err_msg)

Notice in this job:

http://community.validation.linaro.org/scheduler/job/825

OperationFailed: Soft reboot failed
OperationFailed: Soft reboot failed
CriticalError: Could not get the Android test image booted properly
<LAVA_DISPATCHER>2013-09-05 02:20:44 AM WARNING: [ACTION-E] boot_linaro_android_image is finished with error (Failed to boot test image.).

The last line looks a bit cryptic, would it not be better if it were like this? (What the expected output of the code snippet above would be)

OperationFailed: Soft reboot failed
OperationFailed: Soft reboot failed
CriticalError: Could not get the Android test image booted properly
CriticalError: boot_linaro_android_image is finished with error (Failed to boot test image.)

260. By Antonio Terceiro

Don't look for "is finished with error" anymore, message is too much

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lava_scheduler_app/logfile_helper.py'
2--- lava_scheduler_app/logfile_helper.py 2013-08-28 15:13:07 +0000
3+++ lava_scheduler_app/logfile_helper.py 2013-09-05 16:59:40 +0000
4@@ -5,6 +5,8 @@
5 errors = ""
6 for line in logfile:
7 if line.find("CriticalError:") != -1 or \
8+ line.find("OperationFailed:") != -1 or \
9+ line.find("ErrorMessage:") != -1 or \
10 line.find("Lava failed on test:") != -1:
11 errors += line
12

Subscribers

People subscribed via source and target branches