Ubuntu Test Cases

Merge lp:~asac/ubuntu-test-cases/default-systemsettle-test into lp:ubuntu-test-cases/touch

Proposed by Alexander Sack on 2013-08-13

Status:	Merged
Merged at revision:	10
Proposed branch:	lp:~asac/ubuntu-test-cases/default-systemsettle-test
Merge into:	lp:ubuntu-test-cases/touch
Diff against target:	149 lines (+128/-0), 3 files modified
	systemsettle/systemsettle.sh (+117/-0)
	systemsettle/tc_control (+10/-0)
	tslist.run (+1/-0)
To merge this branch:	bzr merge lp:~asac/ubuntu-test-cases/default-systemsettle-test

Reviewer	Review Type	Date Requested	Status
Gema Gomez		2013-08-13	Pending
Review via email: mp+180004@code.launchpad.net

This proposal supersedes a proposal from 2013-08-13.

Description of the change

be aware that the tc_control part is untested, while the script itself is tested. it should be easy to adjust, so please do so during merge/commit.

addressed all previous comments by gema and doanac by:

16. By Alexander Sack 27 seconds ago

systemsettle: refactor pass/success exit code logic into trap handler

15. By Alexander Sack 4 minutes ago

systemsettle: improve tc_control action and expected_results wording

14. By Alexander Sack 6 minutes ago

systemsettle: add run-forever option for utah timeout support and improve toplog formatting

please merge :)

Paul Larson (pwlars) wrote on 2013-08-13: Posted in a previous version of this proposal

Seems to run ok on my device under utah, I'm guessing the intent of this is to catch if we have a runaway process right? A couple of questions:

+timeout: 720
Any particular reason for 12 minutes timeout?

123 - test: vmstat
124 +- test: systemsettle
125 - test: netstat
Any preference as to where it runs? You seem to have put it somewhere in the middle, but I wasn't sure if there was a reason for that.

Gema Gomez (gema) wrote on 2013-08-13: Posted in a previous version of this proposal

The test case documentation needs to be somewhat explanatory of what the test case is trying to achieve, rather than talking about what script to run:
108 +action: |
109 + 1. run systemsettle.sh to wait for system to become idle
110 +expected_results: |
111 + 1. run systemsettle.sh succeeds

I was expecting something along the following lines:
action: |
1. Check the CPU load every minute for 10 minutes
expected_results: |
1. The load doesn't exceed X value

Whatever you are trying to actually do, I am not sure my description is accurate either, but you get the idea.

review: Needs Fixing

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

hi,

would be great if you could fix those nits while merging to your own needs.

On Tue, Aug 13, 2013 at 5:55 PM, Gema Gomez
<email address hidden> wrote:
> Review: Needs Fixing
>
> The test case documentation needs to be somewhat explanatory of what the test case is trying to achieve, rather than talking about what script to run:
> 108 +action: |
> 109 + 1. run systemsettle.sh to wait for system to become idle
> 110 +expected_results: |
> 111 + 1. run systemsettle.sh succeeds
>
> I was expecting something along the following lines:
> action: |
> 1. Check the CPU load every minute for 10 minutes
> expected_results: |
> 1. The load doesn't exceed X value
>
> Whatever you are trying to actually do, I am not sure my description is accurate either, but you get the idea.
>
> --
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/179916
> You are the owner of lp:~asac/ubuntu-test-cases/default-systemsettle-test.

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

the purpose of this is to have logic that will wait until the system
has calmed down (settled). It is supposed to be run a) as part of the
default suite and b), as discussed on IRC, later also as a prereq before we
start individual test runs (autopilots, benchmarks, whatever).

the 12 minute timeout is tuned to be 2 minutes more than we expect the
run to take using the current defaults set in the script. we basically
give the system 10 minutes at max to settle for now. guess that's far
too long, so we could reduce it by trial and error to something more
reasonable.

On Tue, Aug 13, 2013 at 5:42 PM, Paul Larson <email address hidden> wrote:
> Seems to run ok on my device under utah, I'm guessing the intent of this is to catch if we have a runaway process right? A couple of questions:
>
> +timeout: 720
> Any particular reason for 12 minutes timeout?
>
> 123 - test: vmstat
> 124 +- test: systemsettle
> 125 - test: netstat
> Any preference as to where it runs? You seem to have put it somewhere in the middle, but I wasn't sure if there was a reason for that.
> --
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/179916
> You are the owner of lp:~asac/ubuntu-test-cases/default-systemsettle-test.
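
For reference, the 12-minute figure can be reproduced from the script's tunables. The sketch below assumes the defaults from the final revision of systemsettle.sh (vmstat_wait=6, vmstat_repeat=10, settle_max=10) and treats the extra 2 minutes as a fixed margin; margin, settle_budget and utah_timeout are illustrative names, not variables from the script.

```shell
vmstat_wait=6      # seconds between vmstat samples
vmstat_repeat=10   # samples per settle run
settle_max=10      # settle runs before giving up
margin=120         # assumed safety margin: the "2 minutes more"

# nominal time the script may spend sampling before it gives up
settle_budget=$((vmstat_wait * vmstat_repeat * settle_max))
utah_timeout=$((settle_budget + margin))

echo "settle budget: ${settle_budget}s, timeout: ${utah_timeout}s"
```

This is a nominal budget: vmstat prints its first sample immediately and each round also runs top, so the real duration differs slightly. In the final revision, passing run-forever bypasses the settle_max bound, leaving the tc_control timeout as the only limit.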

Andy Doan (doanac) wrote on 2013-08-13: Posted in a previous version of this proposal

Chris added this to his jenkins setup and it basically works:

http://142.197.155.43:8080/view/settle/job/settle-saucy-touch-mako-smoke-default/2/console

UTAH failed this test because it never settled (whoopsie was being bad). I see one issue I'd change:

57 +while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_count" -lt "$settle_max"; do

We already run the test with a timeout of 12 minutes so the "settle_max" check for the loop shouldn't be needed. However, it looks like settle_max got hit first instead of the timeout and then the pass/fail logic gets hit. I think you should:

1) remove settle_max logic
2) move the logic at the very end that determines pass/fail into your cleanup function
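
A minimal sketch of what these two suggestions could look like together, assuming a hypothetical settle_once function that performs one measurement round and returns 0 once the system is idle enough. The loop has no retry limit, so it relies on the caller's timeout (UTAH's, here) to end it, and the verdict is printed from the EXIT trap so that it also appears when the run is cut short by SIGTERM or SIGINT.

```shell
# run_until_settled is an illustrative name; the body runs in a subshell
# so that its traps do not leak into the calling shell.
run_until_settled () (
  result=1                      # assume failure until a round succeeds

  report () {
    if [ "$result" -eq 0 ]; then
      echo "system settled. SUCCESS"
    else
      echo "system did not settle. FAILED."
    fi
    exit "$result"
  }

  trap report EXIT              # single place that decides pass/fail
  trap 'exit 1' INT TERM        # a signal ends the loop and triggers report

  # no settle_max check: only success or the external timeout stops this
  while ! settle_once; do :; done
  result=0
)
```

The design choice is that every exit path, normal or signalled, goes through one handler, so the exit code and the printed verdict cannot disagree.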

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

feel free to do the changes that need to happen to land it. I did this code to give folks a head start to get insight into things like whoopsie case and more...

I also don't really understand what you are saying, so I really think it would be cool to just change what you suggest while merging.

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

oh on the settle_max thing i have no opinion. I just made the script so it makes sense if run without utah.

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

In the test run the console output looks very garbled...

in reality it dumps a nice top so you see which process goes looping

Andy Doan (doanac) wrote on 2013-08-13: Posted in a previous version of this proposal

On 08/13/2013 03:36 PM, Alexander Sack wrote:
> In the test run the console output looks very garbled...
>
> in reality it dumps a nice top so you see which process goes looping

yeah. it also shows up fine in the UTAH yaml. don't worry about that

Alexander Sack (asac) wrote on 2013-08-13: Posted in a previous version of this proposal

addressed stuff and resubmitted: https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/180004

lp:~asac/ubuntu-test-cases/default-systemsettle-test updated on 2013-08-13

16. By Alexander Sack on 2013-08-13: systemsettle: refactor pass/success exit code logic into trap handler

Alexander Sack (asac) wrote on 2013-08-13:

fwiw, I repushed revision 16 a few times. I didn't see a new diff
coming through mail, so please check the web when reviewing for the
real, latest code.

On Tue, Aug 13, 2013 at 11:05 PM, Alexander Sack <asac@ubuntu.com> wrote:
> Alexander Sack has proposed merging lp:~asac/ubuntu-test-cases/default-systemsettle-test into lp:ubuntu-test-cases/touch.
>
> Requested reviews:
>   Gema Gomez (gema)
>
> For more details, see:
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/180004
>
> be aware that the tc_control part is untested, while the script is. should be easy to adjust so please do during merge/commit at best.
>
> addressed all previous comments by gema and doanac by:
>
>  16. By Alexander Sack 27 seconds ago
>
>     systemsettle: refactor pass/success exit code logic into trap handler
>
> 15. By Alexander Sack 4 minutes ago
>
>     systemsettle: improve tc_control action and expected_results wording
>
> 14. By Alexander Sack 6 minutes ago
>
>     systemsettle: add run-forever option for utah timeout support and improve toplog formatting
>
>
> please merge :)
> --
> https://code.launchpad.net/~asac/ubuntu-test-cases/default-systemsettle-test/+merge/180004
> You are the owner of lp:~asac/ubuntu-test-cases/default-systemsettle-test.
>
> === added directory 'systemsettle'
> === added file 'systemsettle/systemsettle.sh'
> --- systemsettle/systemsettle.sh        1970-01-01 00:00:00 +0000
> +++ systemsettle/systemsettle.sh        2013-08-13 21:04:16 +0000
> @@ -0,0 +1,108 @@
> +#!/bin/bash
> +
> +calc () { awk "BEGIN{ print $* }" ;}
> +
> +cleanup () { rm -f $top_log $vmstat_log $vmstat_log.reduced; exit $exit_code;}
> +
> +if test -z "$1"; then
> +   echo "ERROR: you need to provide the average idle value"
> +   echo "Usage: systemsettle.sh <avg-idle> [run-forever]"
> +   echo "       - e.g. systemsettle.sh 99.25"
> +   echo "       - e.g. systemsettle.sh 99.25 run-forever"
> +   exit 129
> +fi
> +
> +if test "$2" = "run-forever"; then
> +  settle_prefix='-'
> +fi
> +
> +# minimum average idle level required to succeed
> +idle_avg_min=$1
> +
> +# how many total attempts to settle the system
> +settle_max=1
> +
> +# measurement details: vmstat $vmstat_wait $vmstat_repeat
> +vmstat_wait=1
> +vmstat_repeat=10
> +
> +# how many samples to ignore
> +vmstat_ignore=1
> +
> +# exit code storage
> +exit_code=2
> +
> +# tweak cut field by arch
> +if uname -m | grep -q armv7; then
> +  idle_pos=16
> +elif uname -m | grep -q i.86; then
> +  idle_pos=15
> +else
> +  echo "machine \'`uname -m`\' not supported"
> +  exit 128
> +fi
> +
> +# set and calc more runtime values
> +vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
> +settle_count=0
> +idle_avg=0
> +
> +echo "System Settle run - quiesce the system"
> +echo "--------------------------------------"
> +echo
> +echo "  + cmd: \'vmstat $vmstat_wait $vmstat_repeat\' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
> +echo
> +
> +trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
> +vmstat_log=`mktemp -t`
> +top_log=`mktemp -t`
> +
> +while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
> +  echo Starting settle run $settle_count:
> +
> +  # get vmstat
> +  vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
> +  cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
> +
> +  # log top output for potential debugging
> +  echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
> +  echo "========================" >> $top_log
> +  top -n 1 -b >> $top_log
> +  echo >> $top_log
> +
> +  # calc average of idle field for this measurement
> +  sum=0
> +  count=0
> +  while read line; do
> +     idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f 15`
> +     sum=`calc $sum + $idle`
> +     count=`calc $count + 1`
> +  done < $vmstat_log.reduced
> +
> +  idle_avg=`calc $sum.0 / $count.0`
> +  settle_count=`calc $settle_count + 1`
> +
> +  echo
> +  echo "Measurement:"
> +  echo "  + idle level: $idle_avg"
> +  echo "  + idle sum: $sum / count: $count"
> +  echo
> +done
> +
> +if test `calc $idle_avg '<' $idle_avg_min` = 1; then
> +  echo "System failed to settle to target idle level ($idle_avg_min)"
> +  echo "   + check out the following top log taken at each retry:"
> +
> +  # dumb toplog indented
> +  while read line; do
> +    echo "  $line"
> +  done < $top_log
> +
> +  echo
> +  echo "system did not settle. FAILED."
> +  exit_code=1
> +else
> +  echo "system settled. SUCCESS"
> +  exit_code=0
> +fi
> +
>
> === added file 'systemsettle/tc_control'
> --- systemsettle/tc_control     1970-01-01 00:00:00 +0000
> +++ systemsettle/tc_control     2013-08-13 21:04:16 +0000
> @@ -0,0 +1,10 @@
> +description: check if system settles to idle average > 99.25%
> +dependencies: none
> +action: |
> +  1. Take CPU load samples for 10 minutes and fail if average idle never goes above 99.25% percent
> +expected_results: |
> +  1. When doing nothing, system calms down to at least 99.25% idle level
> +type: userland
> +timeout: 720
> +command: ./systemsettle.sh 99.25 run-forever
> +run_as: root
>
> === modified file 'tslist.run'
> --- tslist.run  2013-06-17 20:59:34 +0000
> +++ tslist.run  2013-08-13 21:04:16 +0000
> @@ -1,6 +1,7 @@
>  - test: pwd
>  - test: uname
>  - test: vmstat
> +- test: systemsettle
>  - test: netstat
>  - test: ifconfig
>  - test: route
>
>

Andy Doan (doanac) wrote on 2013-08-13:

this is close, but not quite right. The problem I see is related to the signal handling somehow. Ctrl-C works well, but if I run "kill <pid>" from another terminal, it's really slow to respond and when it does, it doesn't exit the process. The problem is that when we run this in practice, UTAH is going to send it a SIGTERM when it has timed out and then send a SIGKILL. Given that it responds to the SIGTERM too slowly, the process will just exit with no proper cleanup.

However, that might be okay since it will still exit with a bad return code and show the test as failed?

Alexander Sack (asac) wrote on 2013-08-14:

no its not okay. we want the top report that it dumps in case of failure...

I believe my initial revision was on the spot :-P ...

SIGTERM takes a while, because it doesn't propagate down to vmstat ... you should give SIGTERM more time to finish (you always should if you hope for graceful shutdown anyway) or not run it with "run-forever"
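
The delay described here follows from how bash handles traps: a trap handler runs only after the foreground command the shell is waiting for has finished, whereas the wait builtin returns as soon as a trapped signal arrives. A minimal sketch of that pattern follows, assuming a hypothetical run_measurement wrapper around the sampling command; it is not part of the proposed script.

```shell
child=

# Runs when INT or TERM arrives while the shell is blocked in wait.
on_signal () {
  # The signal reached only this shell, so stop the sampler explicitly.
  if [ -n "$child" ]; then kill "$child" 2>/dev/null; fi
  echo "interrupted: the top log would be dumped here"
  exit 1
}

# Hypothetical wrapper, used e.g. as: run_measurement vmstat 6 10
run_measurement () {
  trap on_signal INT TERM
  "$@" &            # start the sampler in the background ...
  child=$!
  wait "$child"     # ... and block in wait, which a trapped signal interrupts
}
```

With this shape, a plain "kill <pid>" aimed at the script is handled promptly even though the signal is never delivered to the sampler itself.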

Alexander Sack (asac) wrote on 2013-08-14:

OK, I looked up kill foo and found that in order to make SIGTERM behave like the SIGINT from ctrl-c (propagate to the whole process group) you would have to send kill with a negative PID: kill -TERM -1234

So yeah, you should fix it in utah and this is all good as it is ...

btw, ctrl-c sends SIGINT afaik...
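
To illustrate the negative-PID form, the sketch below starts a stand-in workload in its own process group and signals the whole group. It assumes setsid from util-linux is available and that job control is off (the default in a non-interactive script), so that $! is the group leader's PID; the sleep commands stand in for the script and its vmstat child.

```shell
# Stand-in workload: a shell with two background children. setsid makes it
# the leader of a new session and process group, so its PID is also the
# process group ID.
setsid sh -c 'sleep 300 & sleep 300 & wait' &
leader=$!
sleep 1                      # give setsid time to create the group

# "kill -TERM $leader" would signal only the leader; the negative PID
# addresses every process in the group at once.
kill -TERM "-$leader"

status=0
wait "$leader" || status=$?
echo "leader exit status: $status"
```

An exit status of 143 (128 + 15) from wait indicates that the leader was terminated by SIGTERM.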

Alexander Sack (asac) wrote on 2013-08-14:

btw, i checked utah and in process.py you already try to kill all children manually as well ... so not sure if that code is buggy or if you didn't try the test in the real utah code ...

in any case, I have pushed an inspirational branch that might work (not tested) that replaces that manual business with OS facilities ...

see: http://bazaar.launchpad.net/~asac/utah/use-os_killpg/revision/996

Preview Diff

=== added directory 'systemsettle'
=== added file 'systemsettle/systemsettle.sh'
--- systemsettle/systemsettle.sh	1970-01-01 00:00:00 +0000
+++ systemsettle/systemsettle.sh	2013-08-13 21:47:11 +0000
@@ -0,0 +1,117 @@
+#!/bin/bash
+
+set -e
+
+# default exit code storage
+dump_error=1
+
+calc () { awk "BEGIN{ print $* }" ;}
+
+cleanup () {
+  if ! test "$dump_error" = 0; then
+    echo "System failed to settle to target idle level ($idle_avg_min)"
+    echo "   + check out the following top log taken at each retry:"
+
+    # dumb toplog indented
+    while read line; do
+      echo "  $line"
+    done < $top_log
+
+    echo
+    # dont rerun this logic in case we get multiple signals
+    dump_error=0
+  fi
+  rm -f $top_log $vmstat_log $vmstat_log.reduced
+}
+
+if test -z "$1"; then
+   echo "ERROR: you need to provide the average idle value"
+   echo "Usage: systemsettle.sh <avg-idle> [run-forever]"
+   echo "       - e.g. systemsettle.sh 99.25"
+   echo "       - e.g. systemsettle.sh 99.25 run-forever"
+   exit 129
+fi
+
+if test "$2" = "run-forever"; then
+  settle_prefix='-'
+fi
+
+# minimum average idle level required to succeed
+idle_avg_min=$1
+
+# how many total attempts to settle the system
+settle_max=10
+
+# measurement details: vmstat $vmstat_wait $vmstat_repeat
+vmstat_wait=6
+vmstat_repeat=10
+
+# how many samples to ignore
+vmstat_ignore=1
+
+# tweak cut field by arch
+if uname -m | grep -q armv7; then
+  idle_pos=16
+elif uname -m | grep -q i.86; then
+  idle_pos=15
+else
+  echo "machine \'`uname -m`\' not supported"
+  exit 128
+fi
+
+# set and calc more runtime values
+vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
+settle_count=0
+idle_avg=0
+
+echo "System Settle run - quiesce the system"
+echo "--------------------------------------"
+echo
+echo "  + cmd: \'vmstat $vmstat_wait $vmstat_repeat\' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
+echo
+
+trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
+vmstat_log=`mktemp -t`
+top_log=`mktemp -t`
+
+while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
+  echo Starting settle run $settle_count:
+
+  # get vmstat
+  vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
+  cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
+
+  # log top output for potential debugging
+  echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
+  echo "========================" >> $top_log
+  top -n 1 -b >> $top_log
+  echo >> $top_log
+
+  # calc average of idle field for this measurement
+  sum=0
+  count=0
+  while read line; do
+     idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f 15`
+     sum=`calc $sum + $idle`
+     count=`calc $count + 1`
+  done < $vmstat_log.reduced
+
+  idle_avg=`calc $sum.0 / $count.0`
+  settle_count=`calc $settle_count + 1`
+
+  echo
+  echo "Measurement:"
+  echo "  + idle level: $idle_avg"
+  echo "  + idle sum: $sum / count: $count"
+  echo
+done
+
+if test `calc $idle_avg '<' $idle_avg_min` = 1; then
+  echo "system not settled. FAIL"
+  exit 1
+else
+  echo "system settled. SUCCESS"
+  dump_error=0
+  exit 0
+fi
+
=== added file 'systemsettle/tc_control'
--- systemsettle/tc_control	1970-01-01 00:00:00 +0000
+++ systemsettle/tc_control	2013-08-13 21:47:11 +0000
@@ -0,0 +1,10 @@
+description: check if system settles to idle average > 99.25%
+dependencies: none
+action: |
+  1. Take CPU load samples for 10 minutes and fail if average idle never goes above 99.25% percent
+expected_results: |
+  1. When doing nothing, system calms down to at least 99.25% idle level
+type: userland
+timeout: 720
+command: ./systemsettle.sh 99.25 run-forever
+run_as: root
=== modified file 'tslist.run'
--- tslist.run	2013-06-17 20:59:34 +0000
+++ tslist.run	2013-08-13 21:47:11 +0000
@@ -1,6 +1,7 @@
 - test: pwd
 - test: uname
 - test: vmstat
+- test: systemsettle
 - test: netstat
 - test: ifconfig
 - test: route
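
One detail in the diff above: idle_pos is set per architecture, but the sampling loop reads a fixed field (cut -d ' ' -f 15). A minimal sketch of an alternative that avoids per-architecture positions altogether, assuming vmstat labels the idle column "id" in its header row (as the procps version does); idle_average is a hypothetical helper, not part of the merged script.

```shell
# idle_average reads vmstat output on stdin and prints the mean of the "id"
# column, skipping the first $1 sample rows (default 1).
idle_average () {
  awk -v skip="${1:-1}" '
    # the header row names the columns; remember where "id" sits
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
    # sample rows start with a number; the "procs ..." banner row does not
    col && $1 ~ /^[0-9]+$/ {
      if (seen++ < skip) next
      sum += $col; n++
    }
    # fail if no usable sample row was found
    END { if (n) printf "%.2f\n", sum / n; else exit 1 }
  '
}
```

It could be used as "vmstat 6 10 | idle_average 1", which would also replace the separate tail and read loop.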
