Ubuntu Test Cases

Merge lp:~asac/ubuntu-test-cases/default-systemsettle-test into lp:ubuntu-test-cases/touch

Proposed by Alexander Sack on 2013-08-13

Status:	Superseded
Proposed branch:	lp:~asac/ubuntu-test-cases/default-systemsettle-test
Merge into:	lp:ubuntu-test-cases/touch
Diff against target:	140 lines (+119/-0), 3 files modified
	systemsettle/systemsettle.sh (+108/-0)
	systemsettle/tc_control (+10/-0)
	tslist.run (+1/-0)
To merge this branch:	bzr merge lp:~asac/ubuntu-test-cases/default-systemsettle-test

Reviewer	Review Type	Date Requested	Status
Gema Gomez (community)		2013-08-13	Needs Fixing on 2013-08-13
Review via email: mp+179916@code.launchpad.net

This proposal supersedes a proposal from 2013-08-13.

This proposal has been superseded by a proposal from 2013-08-13.

Commit message

Add the systemsettle test to the default smoke test suite: we wait until the average system idle level rises above 99.25% before declaring a device and image ready for further testing.

Description of the change

Be aware that the tc_control part is untested, while the script itself has been tested. It should be easy to adjust, so please do so during merge/commit if needed.
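
The check described in the commit message amounts to averaging the idle column of a few vmstat samples and comparing the result with the 99.25 threshold. A minimal sketch follows. The helper names avg_idle and is_settled are hypothetical, and locating the "id" column by its header (instead of the fixed field position the proposed script uses) is an assumption made for illustration.

```shell
#!/bin/bash
# Sketch only: avg_idle and is_settled are hypothetical helpers, not part of
# the proposed systemsettle.sh.

# Read vmstat-style output on stdin and print the average of the "id" column,
# ignoring the first $1 samples (default 1, because the first vmstat sample
# reports averages since boot).
avg_idle () {
  awk -v skip="${1:-1}" '
    # Header rows: look for the field labelled "id" until it is found.
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
    # Data rows: drop the first "skip" samples, then accumulate.
    seen++ < skip { next }
    { sum += $col; n++ }
    END { if (n == 0) exit 1; printf "%.2f\n", sum / n }
  '
}

# Succeed when the measured average ($1) has reached the minimum ($2).
is_settled () {
  awk -v avg="$1" -v min="$2" 'BEGIN { exit !(avg >= min) }'
}
```

On a device this could be driven as `is_settled "$(vmstat 1 10 | avg_idle 1)" 99.25`, assuming vmstat is installed.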

Paul Larson (pwlars) wrote on 2013-08-13:

Seems to run ok on my device under utah, I'm guessing the intent of this is to catch if we have a runaway process right? A couple of questions:

+timeout: 720
Any particular reason for 12 minutes timeout?

123 - test: vmstat
124 +- test: systemsettle
125 - test: netstat
Any preference as to where it runs? You seem to have put it somewhere in the middle, but I wasn't sure if there was a reason for that.

Gema Gomez (gema) wrote on 2013-08-13:

The test case documentation needs to be somewhat explanatory of what the test case is trying to achieve, rather than talking about what script to run:
108 +action: |
109 + 1. run systemsettle.sh to wait for system to become idle
110 +expected_results: |
111 + 1. run systemsettle.sh succeeds

I was expecting something along the following lines:
action: |
  1. Check the CPU load every minute for 10 minutes
expected_results: |
  1. The load doesn't exceed X value

Whatever you are trying to actually do, I am not sure my description is accurate either, but you get the idea.

review: Needs Fixing

Alexander Sack (asac) wrote on 2013-08-13:

hi,

It would be great if you could fix those nits to suit your own needs while merging.

On Tue, Aug 13, 2013 at 5:55 PM, Gema Gomez
<email address hidden> wrote:
> Review: Needs Fixing
>
> The test case documentation needs to be somewhat explanatory of what the test case is trying to achieve, rather than talking about what script to run:
> 108 +action: |
> 109 + 1. run systemsettle.sh to wait for system to become idle
> 110 +expected_results: |
> 111 + 1. run systemsettle.sh succeeds
>
> I was expecting something along the following lines:
> action: |
>   1. Check the CPU load every minute for 10 minutes
> expected_results: |
>   1. The load doesn't exceed X value
>
> Whatever you are trying to actually do, I am not sure my description is accurate either, but you get the idea.
>
> --
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/179916
> You are the owner of lp:~asac/ubuntu-test-cases/default-systemsettle-test.

Alexander Sack (asac) wrote on 2013-08-13:

The purpose of this is to have logic that waits until the system
has calmed down (settled). It is supposed to be run a) as part of the
default suite and b), as discussed on IRC, later also as a prerequisite
before we start individual test runs (autopilots, benchmarks, whatever).

The 12-minute timeout is tuned to be 2 minutes more than we expect the
run to take using the current defaults set in the script. We basically
give the system 10 minutes at most to settle for now. I guess that's far
too long, so we could reduce it by trial and error to something more
reasonable.
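
The relationship between the script's own settle budget and the harness timeout can be written out as simple arithmetic. This is only a sketch: harness_timeout is a hypothetical helper, and the attempt count in the usage note is an assumed example, since the defaults of the earlier revision are not shown on this page.

```shell
#!/bin/bash
# Sketch only: harness_timeout is a hypothetical helper, not part of the branch.

# usage: harness_timeout <attempts> <samples-per-attempt> <seconds-per-sample> <margin-seconds>
harness_timeout () {
  local worst_case=$(( $1 * $2 * $3 ))   # longest the settle loop itself can run
  echo $(( worst_case + $4 ))            # add headroom so the harness fires last
}
```

For instance, `harness_timeout 60 10 1 120` prints 720, which matches the tc_control timeout if one assumes 60 attempts of ten one-second samples plus the 2-minute margin.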

On Tue, Aug 13, 2013 at 5:42 PM, Paul Larson <email address hidden> wrote:
> Seems to run ok on my device under utah, I'm guessing the intent of this is to catch if we have a runaway process right? A couple of questions:
>
> +timeout: 720
> Any particular reason for 12 minutes timeout?
>
> 123 - test: vmstat
> 124 +- test: systemsettle
> 125 - test: netstat
> Any preference as to where it runs? You seem to have put it somewhere in the middle, but I wasn't sure if there was a reason for that.
> --
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/179916
> You are the owner of lp:~asac/ubuntu-test-cases/default-systemsettle-test.

Andy Doan (doanac) wrote on 2013-08-13:

Chris added this to his jenkins setup and it basically works:

http://142.197.155.43:8080/view/settle/job/settle-saucy-touch-mako-smoke-default/2/console

UTAH failed this test because it never settled (whoopsie was being bad). I see one issue I'd change:

57 +while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_count" -lt "$settle_max"; do

We already run the test with a timeout of 12 minutes, so the "settle_max" check in the loop shouldn't be needed. However, it looks like settle_max got hit first instead of the timeout, and then the pass/fail logic ran. I think you should:

1) remove the settle_max logic
2) move the logic at the very end that determines pass/fail into your cleanup function
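
A rough sketch of what these two points could look like together: the loop has no attempt limit, and the verdict is decided in the trap handler, so it is reported whether the loop ends by itself or the harness interrupts it. The function names below_target, verdict and settle are hypothetical, and the measurement command is passed in as an argument purely so the sketch stays self-contained.

```shell
#!/bin/bash
# Sketch only: below_target, verdict and settle are hypothetical names; the
# variables mirror those in the proposed script.
idle_avg=0
idle_avg_min=99.25

# True while the last measured idle average is under the required minimum.
below_target () {
  awk -v a="$idle_avg" -v m="$idle_avg_min" 'BEGIN { exit !(a < m) }'
}

# Trap handler: decide pass/fail from the last measurement and exit with it.
verdict () {
  trap - EXIT INT TERM        # avoid re-entering the handler on exit
  if below_target; then
    echo "system did not settle. FAILED."
    exit 1
  fi
  echo "system settled. SUCCESS"
  exit 0
}

# usage: settle <min-idle> <measure-command...>
# Loops without an attempt limit; only the harness timeout bounds it.
settle () {
  idle_avg_min=$1; shift
  idle_avg=0
  trap verdict EXIT INT TERM
  while below_target; do
    idle_avg=$("$@")          # the command must print one idle average
  done
  exit                        # leave through the EXIT trap
}
```

Note that settle ends the calling shell, so it is meant to be the last step of a script or to run in a subshell.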

Alexander Sack (asac) wrote on 2013-08-13:

Feel free to make whatever changes are needed to land it. I wrote this code to give folks a head start on getting insight into things like the whoopsie case and more...

I also don't really understand what you are saying, so I really think it would be best to just make the changes you suggest while merging.

Alexander Sack (asac) wrote on 2013-08-13:

oh on the settle_max thing i have no opinion. I just made the script so it makes sense if run without utah.

Alexander Sack (asac) wrote on 2013-08-13:

In the test run the console output looks very garbled...

In reality it dumps a nice top so you can see which process is looping.

Andy Doan (doanac) wrote on 2013-08-13:

On 08/13/2013 03:36 PM, Alexander Sack wrote:
> In the test run the console output looks very garbled...
>
> in reality it dumps a nice top so you see which process goes looping

yeah. it also shows up fine in the UTAH yaml. don't worry about that

lp:~asac/ubuntu-test-cases/default-systemsettle-test updated on 2013-08-13

14. By Alexander Sack on 2013-08-13: systemsettle: add run-forever option for utah timeout support and improve toplog formatting
15. By Alexander Sack on 2013-08-13: systemsettle: improve tc_control action and expected_results wording
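
The run-forever option from revision 14 relies on a small trick visible in the diff further down: settle_prefix is set to "-" and glued in front of the attempt counter before it is compared with settle_max. The sketch below isolates that comparison; keep_going is a hypothetical helper name used only for illustration.

```shell
#!/bin/bash
# Sketch only: keep_going is a hypothetical helper that isolates the
# loop-limit test from the proposed script.

# usage: keep_going <prefix> <count> <max>
# With an empty prefix this is the usual "count < max" test. With prefix "-"
# the left operand becomes -count, which is below any positive max for every
# count >= 0, so the limit never triggers.
keep_going () {
  test "$1$2" -lt "$3"
}
```

With settle_max=1, `keep_going "" 1 1` fails after the first attempt, while `keep_going "-" 1 1` keeps succeeding, which leaves the UTAH timeout as the only bound.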

Alexander Sack (asac) wrote on 2013-08-13:

addressed stuff and resubmitted: https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/180004

lp:~asac/ubuntu-test-cases/default-systemsettle-test updated on 2013-08-13

16. By Alexander Sack on 2013-08-13: systemsettle: refactor pass/success exit code logic into trap handler

Preview Diff

 === added directory 'systemsettle'
 === added file 'systemsettle/systemsettle.sh'
 --- systemsettle/systemsettle.sh	1970-01-01 00:00:00 +0000
 +++ systemsettle/systemsettle.sh	2013-08-13 21:03:13 +0000
@@ -0,0 +1,108 @@
++#!/bin/bash
++
++calc () { awk "BEGIN{ print $* }" ;}
++
++cleanup () { rm -f $top_log $vmstat_log $vmstat_log.reduced; exit $exit_code;}
++
++if test -z "$1"; then
++   echo "ERROR: you need to provide the average idle value"
++   echo "Usage: systemsettle.sh <avg-idle> [run-forever]"
++   echo "       - e.g. systemsettle.sh 99.25"
++   echo "       - e.g. systemsettle.sh 99.25 run-forever"
++   exit 129
++fi
++
++if test "$2" = "run-forever"; then
++  settle_prefix='-'
++fi
++
++# minimum average idle level required to succeed
++idle_avg_min=$1
++
++# how many total attempts to settle the system
++settle_max=1
++
++# measurement details: vmstat $vmstat_wait $vmstat_repeat
++vmstat_wait=1
++vmstat_repeat=10
++
++# how many samples to ignore
++vmstat_ignore=1
++
++# exit code storage
++exit_code=2
++
++# tweak cut field by arch
++if uname -m | grep -q armv7; then
++  idle_pos=16
++elif uname -m | grep -q i.86; then
++  idle_pos=15
++else
++  echo "machine '`uname -m`' not supported"
++  exit 128
++fi
++
++# set and calc more runtime values
++vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
++settle_count=0
++idle_avg=0
++
++echo "System Settle run - quiesce the system"
++echo "--------------------------------------"
++echo
++echo "  + cmd: 'vmstat $vmstat_wait $vmstat_repeat' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
++echo
++
++trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
++vmstat_log=`mktemp -t`
++top_log=`mktemp -t`
++
++while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
++  echo Starting settle run $settle_count:
++
++  # get vmstat
++  vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
++  cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
++
++  # log top output for potential debugging
++  echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
++  echo "========================" >> $top_log
++  top -n 1 -b >> $top_log
++  echo >> $top_log
++
++  # calc average of idle field for this measurement
++  sum=0
++  count=0
++  while read line; do
++     idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f $idle_pos`
++     sum=`calc $sum + $idle`
++     count=`calc $count + 1`
++  done < $vmstat_log.reduced
++
++  idle_avg=`calc $sum.0 / $count.0`
++  settle_count=`calc $settle_count + 1`
++
++  echo
++  echo "Measurement:"
++  echo "  + idle level: $idle_avg"
++  echo "  + idle sum: $sum / count: $count"
++  echo
++done
++
++if test `calc $idle_avg '<' $idle_avg_min` = 1; then
++  echo "System failed to settle to target idle level ($idle_avg_min)"
++  echo "   + check out the following top log taken at each retry:"
++
++  # dump toplog indented
++  while read line; do
++    echo "  $line"
++  done < $top_log
++
++  echo
++  echo "system did not settle. FAILED."
++  exit_code=1
++else
++  echo "system settled. SUCCESS"
++  exit_code=0
++fi
++
 === added file 'systemsettle/tc_control'
 --- systemsettle/tc_control	1970-01-01 00:00:00 +0000
 +++ systemsettle/tc_control	2013-08-13 21:03:13 +0000
@@ -0,0 +1,10 @@
++description: check if system settles to idle average > 99.25%
++dependencies: none
++action: |
++  1. Take CPU load samples for 10 minutes and fail if average idle never goes above 99.25%
++expected_results: |
++  1. When doing nothing, system calms down to at least 99.25% idle level
++type: userland
++timeout: 720
++command: ./systemsettle.sh 99.25 run-forever
++run_as: root
 === modified file 'tslist.run'
 --- tslist.run	2013-06-17 20:59:34 +0000
 +++ tslist.run	2013-08-13 21:03:13 +0000
@@ -1,6 +1,7 @@
  - test: pwd
  - test: uname
  - test: vmstat
++- test: systemsettle
  - test: netstat
  - test: ifconfig
  - test: route