Merge lp:~asac/ubuntu-test-cases/default-systemsettle-test into lp:ubuntu-test-cases/touch

Proposed by Alexander Sack
Status: Superseded
Proposed branch: lp:~asac/ubuntu-test-cases/default-systemsettle-test
Merge into: lp:ubuntu-test-cases/touch
Diff against target: 140 lines (+119/-0)
3 files modified
systemsettle/systemsettle.sh (+108/-0)
systemsettle/tc_control (+10/-0)
tslist.run (+1/-0)
To merge this branch: bzr merge lp:~asac/ubuntu-test-cases/default-systemsettle-test
Reviewer Review Type Date Requested Status
Gema Gomez (community) Needs Fixing
Review via email: mp+179916@code.launchpad.net

This proposal supersedes a proposal from 2013-08-13.

This proposal has been superseded by a proposal from 2013-08-13.

Commit message

Add systemsettle test to the default smoke test suite: we wait until the average system idle level rises above 99.25% before declaring a device and image ready for further testing.

Description of the change

Be aware that the tc_control part is untested, while the script itself is tested. It should be easy to adjust, so ideally please do so during merge/commit.

Paul Larson (pwlars) wrote :

Seems to run ok on my device under utah. I'm guessing the intent of this is to catch a runaway process, right? A couple of questions:

+timeout: 720
Any particular reason for 12 minutes timeout?

123 - test: vmstat
124 +- test: systemsettle
125 - test: netstat
Any preference as to where it runs? You seem to have put it somewhere in the middle, but I wasn't sure if there was a reason for that.

Gema Gomez (gema) wrote :

The test case documentation needs to be somewhat explanatory of what the test case is trying to achieve, rather than talking about what script to run:
108 +action: |
109 + 1. run systemsettle.sh to wait for system to become idle
110 +expected_results: |
111 + 1. run systemsettle.sh succeeds

I was expecting something along the following lines:
action: |
  1. Check the CPU load every minute for 10 minutes
expected_results: |
  1. The load doesn't exceed X value

Whatever you are trying to actually do, I am not sure my description is accurate either, but you get the idea.

review: Needs Fixing
Alexander Sack (asac) wrote :

hi,

would be great if you could fix those nits while merging, to suit your own needs.


Alexander Sack (asac) wrote :

the purpose of this is to have logic that waits until the system
has calmed down (settled). It is supposed to be run a) as part of the
default suite and b), as discussed on IRC, later also as a prerequisite
before we start individual test runs (autopilots, benchmarks, whatever).

the 12 minute timeout is tuned to be 2 minutes more than we expect the
run to take using the current defaults set in the script. we basically
give the system at most 10 minutes to settle for now. I guess that's far
too long, so we could reduce it by trial and error to something more
reasonable.
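
The relationship between the timeout and the script defaults described above can be sketched roughly as follows. The thread does not state the defaults in force at the time, so the three loop values below are assumptions chosen only so that the worst case comes to 10 minutes; the variable names follow the script in the preview diff.

```shell
#!/bin/bash
# Assumed values for illustration only; not taken from the script.
settle_max=10       # assumed: total settle attempts
vmstat_wait=1       # assumed: seconds between vmstat samples
vmstat_repeat=60    # assumed: samples per attempt
margin=120          # the 2 extra minutes mentioned in the comment

# Upper bound: every attempt runs to completion without settling.
# (vmstat prints its first sample immediately, so the real run is a
# little shorter than this.)
worst_case=$((settle_max * vmstat_wait * vmstat_repeat))
timeout=$((worst_case + margin))

echo "worst-case settle time: ${worst_case}s"
echo "tc_control timeout:     ${timeout}s"
```

Lowering any of the three loop values would allow a correspondingly shorter timeout in tc_control.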


Andy Doan (doanac) wrote :

Chris added this to his jenkins setup and it basically works:

 http://142.197.155.43:8080/view/settle/job/settle-saucy-touch-mako-smoke-default/2/console

UTAH failed this test because it never settled (whoopsie was being bad). I see one issue I'd change:

57 +while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_count" -lt "$settle_max"; do

We already run the test with a timeout of 12 minutes, so the "settle_max" check in the loop shouldn't be needed. However, it looks like settle_max got hit first instead of the timeout, and then the pass/fail logic ran. I think you should:

1) remove the settle_max logic
2) move the logic at the very end that determines pass/fail into your cleanup function
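
One way the second suggestion could look is sketched below. This is only an illustration under assumptions: the variable names mirror the script under review, while the helper below_target, the verdict function and the subshell demo are hypothetical and not part of the proposed branch.

```shell
#!/bin/bash
# Sketch only: names mirror the script under review; the helper
# below_target and the overall layout are assumptions.

idle_avg_min=99.25
idle_avg=0

# Hypothetical helper: succeeds when $1 is numerically less than $2.
below_target () { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'; }

# Pass/fail decision, taken from whatever was measured last.
verdict () {
    if below_target "$idle_avg" "$idle_avg_min"; then
        echo "system did not settle. FAILED."
        return 1
    fi
    echo "system settled. SUCCESS"
    return 0
}

# Trap handler: report the verdict, remove temp files, exit with the result.
cleanup () {
    verdict
    code=$?
    rm -f "$top_log"
    trap - EXIT    # do not run the handler again on the way out
    exit "$code"
}

# Demo in a subshell so the sketch can show the resulting exit status.
(
    top_log=$(mktemp)
    trap cleanup EXIT INT TERM
    idle_avg=99.5   # stand-in for the measurement loop
)
echo "exit status: $?"
```

Because the decision lives in the handler, it also runs when the script is terminated from outside (for example by a harness timeout sending TERM), which is the case the suggestion is about.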

Alexander Sack (asac) wrote :

feel free to make the changes needed to land it. I wrote this code to give folks a head start on getting insight into things like the whoopsie case and more...

I also don't really understand what you're saying, so I really think it would be best to just make the changes you suggest while merging.

Alexander Sack (asac) wrote :

oh, on the settle_max thing I have no opinion. I just made the script so that it makes sense when run without utah.

Alexander Sack (asac) wrote :

In the test run the console output looks very garbled...

in reality it dumps a nice top so you see which process goes looping

Andy Doan (doanac) wrote :

On 08/13/2013 03:36 PM, Alexander Sack wrote:
> In the test run the console output looks very garbled...
>
> in reality it dumps a nice top so you see which process goes looping

yeah. it also shows up fine in the UTAH yaml. don't worry about that

14. By Alexander Sack

systemsettle: add run-forever option for utah timeout support and improve toplog formatting
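
A minimal sketch of how a run-forever option can bypass the attempt limit, assuming it works as in the preview diff by prepending '-' to the attempt counter so that the compared value never reaches settle_max. The function name may_retry is hypothetical.

```shell
#!/bin/bash
settle_max=1

# Hypothetical helper: succeeds while another settle attempt is allowed.
may_retry () {
    local prefix=$1 count=$2
    # With prefix '-', the counter is read as -0, -1, -2, ... and so
    # always stays below settle_max.
    test "$prefix$count" -lt "$settle_max"
}

for count in 0 1 2; do
    may_retry ''  "$count" && echo "bounded:     attempt $count allowed"
    may_retry '-' "$count" && echo "run-forever: attempt $count allowed"
done
```

With settle_max=1, only attempt 0 is allowed in bounded mode, while every attempt is allowed with the prefix, leaving the harness timeout as the only limit.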

15. By Alexander Sack

systemsettle: improve tc_control action and expected_results wording

Alexander Sack (asac) wrote :
16. By Alexander Sack

systemsettle: refactor pass/success exit code logic into trap handler

Unmerged revisions

Preview Diff

1=== added directory 'systemsettle'
2=== added file 'systemsettle/systemsettle.sh'
3--- systemsettle/systemsettle.sh 1970-01-01 00:00:00 +0000
4+++ systemsettle/systemsettle.sh 2013-08-13 21:03:13 +0000
5@@ -0,0 +1,108 @@
6+#!/bin/bash
7+
8+calc () { awk "BEGIN{ print $* }" ;}
9+
10+cleanup () { rm -f $top_log $vmstat_log $vmstat_log.reduced; exit $exit_code;}
11+
12+if test -z "$1"; then
13+ echo "ERROR: you need to provide the average idle value"
14+ echo "Usage: systemsettle.sh <avg-idle> [run-forever]"
15+ echo " - e.g. systemsettle.sh 99.25"
16+ echo " - e.g. systemsettle.sh 99.25 run-forever"
17+ exit 129
18+fi
19+
20+if test "$2" = "run-forever"; then
21+ settle_prefix='-'
22+fi
23+
24+# minimum average idle level required to succeed
25+idle_avg_min=$1
26+
27+# how many total attempts to settle the system
28+settle_max=1
29+
30+# measurement details: vmstat $vmstat_wait $vmstat_repeat
31+vmstat_wait=1
32+vmstat_repeat=10
33+
34+# how many samples to ignore
35+vmstat_ignore=1
36+
37+# exit code storage
38+exit_code=2
39+
40+# tweak cut field by arch
41+if uname -m | grep -q armv7; then
42+ idle_pos=16
43+elif uname -m | grep -q i.86; then
44+ idle_pos=15
45+else
46+ echo "machine \'`uname -m`\' not supported"
47+ exit 128
48+fi
49+
50+# set and calc more runtime values
51+vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
52+settle_count=0
53+idle_avg=0
54+
55+echo "System Settle run - quiesce the system"
56+echo "--------------------------------------"
57+echo
58+echo " + cmd: \'vmstat $vmstat_wait $vmstat_repeat\' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
59+echo
60+
61+trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
62+vmstat_log=`mktemp -t`
63+top_log=`mktemp -t`
64+
65+while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
66+ echo Starting settle run $settle_count:
67+
68+ # get vmstat
69+ vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
70+ cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
71+
72+ # log top output for potential debugging
73+ echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
74+ echo "========================" >> $top_log
75+ top -n 1 -b >> $top_log
76+ echo >> $top_log
77+
78+ # calc average of idle field for this measurement
79+ sum=0
80+ count=0
81+ while read line; do
82+ idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f 15`
83+ sum=`calc $sum + $idle`
84+ count=`calc $count + 1`
85+ done < $vmstat_log.reduced
86+
87+ idle_avg=`calc $sum.0 / $count.0`
88+ settle_count=`calc $settle_count + 1`
89+
90+ echo
91+ echo "Measurement:"
92+ echo " + idle level: $idle_avg"
93+ echo " + idle sum: $sum / count: $count"
94+ echo
95+done
96+
97+if test `calc $idle_avg '<' $idle_avg_min` = 1; then
98+ echo "System failed to settle to target idle level ($idle_avg_min)"
99+ echo " + check out the following top log taken at each retry:"
100+
101+ # dumb toplog indented
102+ while read line; do
103+ echo " $line"
104+ done < $top_log
105+
106+ echo
107+ echo "system did not settle. FAILED."
108+ exit_code=1
109+else
110+ echo "system settled. SUCCESS"
111+ exit_code=0
112+fi
113+
114
115=== added file 'systemsettle/tc_control'
116--- systemsettle/tc_control 1970-01-01 00:00:00 +0000
117+++ systemsettle/tc_control 2013-08-13 21:03:13 +0000
118@@ -0,0 +1,10 @@
119+description: check if system settles to idle average > 99.25%
120+dependencies: none
121+action: |
122+ 1. Take CPU load samples for 10 minutes and fail if average idle never goes above 99.25% percent
123+expected_results: |
124+ 1. When doing nothing, system calms down to at least 99.25% idle level
125+type: userland
126+timeout: 720
127+command: ./systemsettle.sh 99.25 run-forever
128+run_as: root
129
130=== modified file 'tslist.run'
131--- tslist.run 2013-06-17 20:59:34 +0000
132+++ tslist.run 2013-08-13 21:03:13 +0000
133@@ -1,6 +1,7 @@
134 - test: pwd
135 - test: uname
136 - test: vmstat
137+- test: systemsettle
138 - test: netstat
139 - test: ifconfig
140 - test: route
