Merge lp:~jeffmarcom/opencompute/add-idle-and-syslog-check-tests into lp:opencompute/checkbox

Proposed by Jeff Marcom
Status: Merged
Approved by: Jeff Lane 
Approved revision: 2156
Merged at revision: 2154
Proposed branch: lp:~jeffmarcom/opencompute/add-idle-and-syslog-check-tests
Merge into: lp:opencompute/checkbox
Diff against target: 57 lines (+28/-0)
3 files modified
data/whitelists/opencompute-ready-local.whitelist (+3/-0)
debian/changelog (+1/-0)
jobs/miscellanea.txt.in (+24/-0)
To merge this branch: bzr merge lp:~jeffmarcom/opencompute/add-idle-and-syslog-check-tests
Reviewer Review Type Date Requested Status
Jeff Lane  Approve
Review via email: mp+187627@code.launchpad.net

Commit message

Added a couple of idle verification and syslog error check tests to the opencompute ready whitelist

Description of the change

This adds a couple of idle verification and syslog error check tests to the opencompute ready whitelist.

2154. By Jeff Marcom

Added 12hr idle, idle check, and syslog check tests to opencompute ready whitelist

Signed-off-by: Jeff Marcom <email address hidden>

2155. By Jeff Marcom

Added job definitions for 12hr idle and syslog tests

Signed-off-by: Jeff Marcom <email address hidden>

2156. By Jeff Marcom

Updated changelog

Signed-off-by: Jeff Marcom <email address hidden>

Jeff Marcom (jeffmarcom) wrote :

rebased to fix changelog conflict

Jeff Lane  (bladernr) wrote :

I wonder about the placement of the 12 hour idle test.

Your assumption seems to be that the tester will idle the machine for > 12 hours before running checkbox, as evidenced by the idle check test and syslog check to look for errors. However, what if the tester idles for 12 hours then reboots the machine before starting checkbox? They won't necessarily know not to do that until the idle test itself runs after idle_check and syslog_check.

What about actually putting that at the end of the test run and making it all automatic:
at end of the run do a manual test for 12 hour IDLE that looks like this:
description:
 PURPOSE:
  To test that the system idles for 12 hours minimum
 STEPS:
  Select Test to begin the 12 hour idle test. Wait the prescribed amount of time and the
  testing should complete automatically
 VERIFICATION:
  The verification of this test is automatic
command:
 for x in `seq 1 12`; do echo "12 Hour Idle Test: Hour $x"; sleep 1h; done && true

followed up by idle_check and syslog_check just before finishing the run.

That, at least, guarantees that all the bits will work. My concern is that the way it's currently laid out is brittle (not sure if my idea is that much less brittle, but it seems so to me, feel free to poke holes)

review: Needs Information
Jeff Marcom (jeffmarcom) wrote :

> I wonder about the placement of the 12 hour idle test.
>
> Your assumption seems to be that the tester will idle the machine for > 12
> hours before running checkbox, as evidenced by the idle check test and syslog
> check to look for errors. However, what if the tester idles for 12 hours then
> reboots the machine before starting checkbox? They won't necessarily know
> not to do that until the idle test itself runs after idle_check and syslog_check.

That's why I tried to include a manual verification test. I assume it may happen.
>
> What about actually putting that at the end of the test run and making it all
> automatic:
> at end of the run do a manual test for 12 hour IDLE that looks like this:
> description:
> PURPOSE:
> To test that the system idles for 12 hours minimum
> STEPS:
> Select Test to begin the 12 hour idle test. Wait the prescribed amount of
> time and the
> testing should complete automatically
> VERIFICATION:
> The verification of this test is automatic
> command:
> for x in `seq 1 12`; do echo "12 Hour Idle Test: Hour $x"; sleep 1h; done &&
> true
>
> followed up by idle_check and syslog_check just before finishing the run.
>
> That, at least, guarantees that all the bits will work. My concern is that
> the way it's currently laid out is brittle (not sure if my idea is that much
> less brittle, but it seems so to me, feel free to poke holes)

The problem is that I believe they want checkbox to actually perform testing for ready in ~ 4 hours.

Jeff Marcom (jeffmarcom) wrote :

I don't think it's "brittle" in that the test works as advertised...I would agree that the test is extremely bare-bones...but so is the test itself really.

Jeff Lane  (bladernr) wrote :

Fair enough, though I'd really like them to provide feedback on these things too. We can change it as necessary later on.

review: Approve

Preview Diff

=== modified file 'data/whitelists/opencompute-ready-local.whitelist'
--- data/whitelists/opencompute-ready-local.whitelist 2013-09-25 17:30:41 +0000
+++ data/whitelists/opencompute-ready-local.whitelist 2013-09-25 23:51:31 +0000
@@ -68,6 +68,9 @@
 memory/mcelog_check
 __miscellanea__
 miscellanea/ipmi_test
+miscellanea/idle_check
+miscellanea/syslog_check
+miscellanea/12hr_idle_verify
 __networking__
 networking/detect
 networking/bandwidth

=== modified file 'debian/changelog'
--- debian/changelog 2013-09-25 22:41:35 +0000
+++ debian/changelog 2013-09-25 23:51:31 +0000
@@ -5,6 +5,7 @@
 * Updated plainbox based on version 0.4.dev in lp:checkbox (16.12)
 * Updated checkbox OCP intro prompt
 * Updated Open Compute ready whitelist with new power management, cpu stress, and networking tests"
+ * Added 12hr idle verification test and sylog check for PCI/Device errors

 [ Jeff Lane ]
 * Updated OCP Checkbox to latest checkbox trunk, 0.16.11 revno 2353

=== modified file 'jobs/miscellanea.txt.in'
--- jobs/miscellanea.txt.in 2013-09-08 04:14:01 +0000
+++ jobs/miscellanea.txt.in 2013-09-25 23:51:31 +0000
@@ -1,3 +1,27 @@
+plugin: shell
+name: miscellanea/syslog_check
+command: SYSLOG_ERROR=`grep -i "${acpi,device} error" /var/log/syslog -A 20 -B 20`; echo $SYSLOG_ERROR >> /dev/stderr; [ -z "$SYSLOG_ERROR" ]
+_description:
+ Checks the system log for any critical pci/pcie or acpi event errors
+
+plugin: shell
+name: miscellanea/idle_check
+command: [ `uptime | awk '{split($0,a,":"); print a[1]}'` -lt 12 ]
+
+plugin: manual
+name: miscellanea/12hr_idle_verify
+depends: miscellanea/idle_check
+_description:
+ PURPOSE:
+ Warning. We could not verify that the system has already completed a 12 hour idle session.
+ This test will ask you to verify the system has remained active and booted to an OS for 12+ hours.
+ STEPS:
+ 1. Prior to running this test, you should have attempted to let the machine idle for more than 12 hrs while booted to an OS.
+ VERIFICATION:
+ 1. Select Yes if the system remained on and booted to an operating system for 12 hours without error.
+ 2. Select No if you have not allowed the system to be on and booted to an operating system for 12 hours.
+ 3. Select No if you attempted to let the system remain idle for 12 hours and it failed to do so for any reason/
+
 plugin: manual
 name: miscellanea/tester-info
 _description:
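
A possible hardening of the two shell jobs in the diff above, offered as a sketch and not as what was merged. As written, `${acpi,device}` inside double quotes is parsed by bash as a parameter expansion and not as brace expansion, and splitting `uptime` output on `:` yields the hour of the current time, so both commands may not test what their descriptions say. The helper names `uptime_hours` and `syslog_check`, their optional file arguments, and the exact error pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: the names, default paths as arguments, and the
# error pattern are illustrative assumptions, not part of the merged branch.

# Print whole hours since boot.
# $1: file in /proc/uptime format, "<seconds up> <seconds idle>"
#     (defaults to /proc/uptime).
uptime_hours() {
    awk '{ print int($1 / 3600) }' "${1:-/proc/uptime}"
}

# Scan a log for ACPI or device errors.
# $1: log file to scan (defaults to /var/log/syslog).
# Returns 0 if nothing matched, 1 if matches were found (printed to
# stderr with 20 lines of context), 2 if the log cannot be read.
syslog_check() {
    log="${1:-/var/log/syslog}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -E enables alternation; "|| true" keeps a no-match grep (exit 1)
    # from aborting callers that run under "set -e".
    matches=$(grep -iE -A 20 -B 20 '(acpi|device) error' "$log" || true)
    if [ -z "$matches" ]; then
        return 0
    fi
    printf '%s\n' "$matches" >&2
    return 1
}
```

In a job definition, `[ "$(uptime_hours)" -lt 12 ]` would keep the comparison used by miscellanea/idle_check. Note that /proc/uptime reports time since boot, so it can show that the machine was not rebooted but not how idle it was.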
