Merge into touch : use-top-and-always-dumb-toplog : Code : Ubuntu Test Cases

Status:	Merged
Merged at revision:	12
Proposed branch:	lp:~asac/ubuntu-test-cases/use-top-and-always-dumb-toplog
Merge into:	lp:ubuntu-test-cases/touch
Diff against target:	140 lines (+32/-33) 1 file modified systemsettle/systemsettle.sh (+32/-33)
To merge this branch:	bzr merge lp:~asac/ubuntu-test-cases/use-top-and-always-dumb-toplog

Reviewer	Review Type	Date Requested	Status
Paul Larson		2013-08-19	Approve on 2013-08-19
Review via email: mp+180889@code.launchpad.net

This proposal supersedes a proposal from 2013-08-19.

Description of the change

+ use top in batch mode instead of vmstat
+ improve logging: echo the CLI arguments given, and dump top_log regardless of success or failure
+ top_log is now more useful for seeing what was going on, as we dump exactly what was measured
+ addresses all previous comments (except the request for optionally dumping the top log)
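As a rough illustration of the measurement described above, the following sketch averages the idle ("id") field over the Cpu summary lines of top's batch output. The helper name average_idle and the sample lines are assumptions made for illustration; they are not taken from systemsettle.sh, which uses calc and a slightly different sed expression.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemsettle.sh): read top batch output
# on stdin, pick the idle value from each Cpu summary line, print the mean.
average_idle() {
  grep 'Cpu' |
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)[ %]*id.*/\1/p' |
    awk '{ sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n; else print "0.00" }'
}

# Two made-up sample lines in the style of a top Cpu summary line.
printf '%s\n' \
  '%Cpu(s):  3.0 us,  1.0 sy,  0.0 ni, 74.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st' \
  '%Cpu(s):  2.5 us,  1.0 sy,  0.0 ni, 74.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st' |
  average_idle
```

With these two samples the idle sum is 148.5 over a count of 2, so the sketch prints 74.25. In real use the input would come from something like top -b -d 6 -n 10, and the first sample(s) would be dropped before averaging.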

Paul Larson (pwlars) wrote on 2013-08-19: Posted in a previous version of this proposal

I haven't looked too deeply just yet, but there are some things I see after a quick trial run locally:

>Measurement:
> + idle level: 148.50
> + idle sum: 148.5 / count: 2
>
>system settled. SUCCESS
I told it to only pass if it hit 99% or better, and my system was constantly at 80% or less idle... clearly this is not good.

Using top in the way you are using it right now means that we only see that it's starting the first pass, but then we don't get any more feedback that something is happening until the end of the entire run.

27 +echo " cmd = 'top -b -d $vmstat_wait -n $vmstat_repeat' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
Instead of printing out 7 lines of verbose detail about the options we just passed it, I think it would make more sense to just have the above line do something like print the exact command line args. I had thought about changing this in the first pass, but waited.

48 + top -b -d $vmstat_wait -n $vmstat_repeat >> $top_log
49 + cat $top_log | grep '.Cpu.*' | tail -n $vmstat_tail > $vmstat_log.reduced
If we are no longer using vmstat, we may as well change the variable names to reflect that as well.

Printing the entire top log of every run is not always desirable, and sometimes gets in the way. Considering our primary use, I think it's sensible to have it on by default, but we should at least have an option to turn it off.
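One way such an off switch could look is sketched below. The -q flag and the function names parse_dump_flag and maybe_dump are purely hypothetical; systemsettle.sh has no such option in this proposal. The dump stays on by default, as suggested above.

```shell
#!/bin/sh
# Hypothetical sketch: -q is NOT an existing systemsettle.sh option.
# Sets dump_top_log to 1 (default, dump the log) or 0 (when -q is given).
parse_dump_flag() {
  dump_top_log=1
  OPTIND=1
  while getopts "q" opt; do
    case "$opt" in
      q) dump_top_log=0 ;;
      *) return 129 ;;
    esac
  done
}

# Print the log file given as $1, indented, unless dumping was disabled.
maybe_dump() {
  if [ "$dump_top_log" = 1 ]; then
    while IFS= read -r line; do
      echo "  $line"
    done < "$1"
  fi
}
```

A cleanup handler could then call maybe_dump "$top_log" instead of always printing the log.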

review: Needs Fixing

Alexander Sack (asac) wrote on 2013-08-19: Posted in a previous version of this proposal

> I haven't looked too deeply just yet, but there are some things I see after a
> quick trial run locally:
>
> >Measurement:
> > + idle level: 148.50
> > + idle sum: 148.5 / count: 2
> >
> >system settled. SUCCESS
> I told it to only pass if it hit 99% or better, and my system was constantly
> at 80% or less idle... clearly this is not good.

I will fix the problem you saw with your 140% idle average ... need a way to reproduce. The script works here on my system and on the phone.

>
> Using top in the way you are using it right now means that we only see that
> it's starting the first pass, but then we don't get any more feedback that
> something is happening until the end of the entire run.

We see the result... not the pumping. Instead you get everything nicely at the end. I think it's good that way. I will tweak the output to make it clearer that it's just "measuring now..."

>
> 27 +echo " cmd = 'top -b -d $vmstat_wait -n $vmstat_repeat' ignoring
> first $vmstat_ignore (tail: $vmstat_tail)"
> Instead of printing out 7 lines of verbose detail about the options we just
> passed it, I think it would make more sense to just have the above line do
> something like print the exact command line args. I had thought about changing
> this in the first pass, but waited.

Yeah, let me do something about this. I will probably remove that line because we have all the details now, and then use set -x / set +x to turn echoing of the command line on and off (so we really see what happens).
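A minimal sketch of the set -x / set +x idea, assuming a hypothetical wrapper named run_traced (not part of the script): the shell prints the exact command to stderr before running it, and tracing is switched off again afterwards.

```shell
#!/bin/sh
# Hypothetical wrapper: trace exactly one command, then stop tracing.
run_traced() {
  set -x
  "$@"
  status=$?
  # The redirection is a common trick to keep "set +x" out of the trace.
  { set +x; } 2>/dev/null
  return $status
}

# The trace goes to stderr; the command's own output is unchanged.
run_traced echo "measuring now..."
```

In the settle loop this could wrap the top invocation, so the log shows the real command line rather than a hand-written echo of it.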

>
> 48 + top -b -d $vmstat_wait -n $vmstat_repeat >> $top_log
> 49 + cat $top_log | grep '.Cpu.*' | tail -n $vmstat_tail >
> $vmstat_log.reduced
> If we are no longer using vmstat, we may as well change the variable names to
> reflect that as well.
>
> Printing the entire top log of every run is not always desirable, and
> sometimes gets in the way. Considering our primary use, I think it's sensible
> to have it on by default, but we should at least have an option to turn it
> off.

I see that some might want it, but I don't think that use case is really relevant to maintain right now... unless you have a real case where we don't want it in automation.

The

Alexander Sack (asac) wrote on 2013-08-19: Posted in a previous version of this proposal

Pushed an updated version which addresses all comments except the request for conveniently disabling dumping of the top log at the end.

Paul Larson (pwlars) wrote on 2013-08-19:

Tested the new version and it fixes the bug I saw.

review: Approve

 === modified file 'systemsettle/systemsettle.sh'
 --- systemsettle/systemsettle.sh	2013-08-16 04:33:28 +0000
 +++ systemsettle/systemsettle.sh	2013-08-19 15:01:53 +0000
@@ -9,19 +9,17 @@
  cleanup () {
    if ! test "$dump_error" = 0; then
--    echo "System failed to settle to target idle level ($idle_avg_min)"
--    echo "   + check out the following top log taken at each retry:"
++    echo "Check out the following top log taken at each retry:"
++    echo
      # dumb toplog indented
      while read line; do
        echo "  $line"
      done < $top_log
--
--    echo
      # dont rerun this logic in case we get multiple signals
      dump_error=0
    fi
--  rm -f $top_log $vmstat_log $vmstat_log.reduced
++  rm -f $top_log $top_log.reduced
+ }
  function show_usage() {
@@ -30,10 +28,10 @@
     echo "Options:"
     echo " -r  run forever without exiting"
     echo " -p  minimum idle percent to wait for (Default: 99)"
--   echo " -c  number of times to run vmstat at each iteration (Default: 10)"
--   echo " -d  seconds to delay between each vmstat iteration (Default: 6)"
--   echo " -i  vmstat measurements to ignore from each loop (Default: 1)"
--   echo " -m  maximum loops of vmstat before giving up if minimum idle"
++   echo " -c  number of times to run top at each iteration (Default: 10)"
++   echo " -d  seconds to delay between each top iteration (Default: 6)"
++   echo " -i  top measurements to ignore from each loop (Default: 1)"
++   echo " -m  maximum loops of top before giving up if minimum idle"
     echo "     percent is not reached (Default: 1)"
     exit 129
+ }
@@ -46,11 +44,11 @@
                ;;
          p)    idle_avg_min=$OPTARG
                ;;
--        c)    vmstat_repeat=$OPTARG
--              ;;
--        d)    vmstat_wait=$OPTARG
--              ;;
--        i)    vmstat_ignore=$OPTARG
++        c)    top_repeat=$OPTARG
++              ;;
++        d)    top_wait=$OPTARG
++              ;;
++        i)    top_ignore=$OPTARG
                ;;
          m)    settle_max=$OPTARG
                ;;
@@ -59,54 +57,56 @@
  # minimum average idle level required to succeed
  idle_avg_min=${idle_avg_min:-99}
--# measurement details: vmstat $vmstat_wait $vmstat_repeat
--vmstat_repeat=${vmstat_repeat:-10}
--vmstat_wait=${vmstat_wait:-6}
++# measurement details: top $top_wait $top_repeat
++top_repeat=${top_repeat:-10}
++top_wait=${top_wait:-6}
  # how many samples to ignore
--vmstat_ignore=${vmstat_ignore:-1}
++top_ignore=${top_ignore:-1}
  # how many total attempts to settle the system
  settle_max=${settle_max:-10}
  # set and calc more runtime values
--vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
++top_tail=`calc $top_repeat - $top_ignore`
  settle_count=0
  idle_avg=0
  echo "System Settle run - quiesce the system"
  echo "--------------------------------------"
  echo
--echo "  + cmd: \'vmstat $vmstat_wait $vmstat_repeat\' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
++echo "  idle_avg_min   = '$idle_avg_min'"
++echo "  top_repeat  = '$top_repeat'"
++echo "  top_wait    = '$top_wait'"
++echo "  top_ignore  = '$top_ignore'"
++echo "  settle_max     = '$settle_max'"
++echo "  run_forever    = '$settle_prefix' (- = yes)"
  echo
  trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
--vmstat_log=`mktemp -t`
  top_log=`mktemp -t`
  while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
--  echo Starting settle run $settle_count:
--
--  # get vmstat
--  vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
--  cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
--
--  # log top output for potential debugging
++  echo -n "Starting system idle measurement (run: $settle_count) ... "
++
++  # get top
    echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
    echo "========================" >> $top_log
--  top -n 1 -b >> $top_log
++  top -b -d $top_wait -n $top_repeat >> $top_log
++  cat $top_log | grep '.Cpu.*' | tail -n $top_tail > $top_log.reduced
    echo >> $top_log
    # calc average of idle field for this measurement
    sum=0
    count=0
    while read line; do
--     idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f 15`
++     idle=`echo $line | sed -e 's/.* \([0-9\.]*\) id.*/\1/'`
       sum=`calc $sum + $idle`
       count=`calc $count + 1`
--  done < $vmstat_log.reduced
++  done < $top_log.reduced
--  idle_avg=`calc $sum.0 / $count.0`
++  idle_avg=`calc $sum / $count`
    settle_count=`calc $settle_count + 1`
++  echo " DONE."
    echo
    echo "Measurement:"
    echo "  + idle level: $idle_avg"
@@ -119,7 +119,6 @@
    exit 1
  else
    echo "system settled. SUCCESS"
--  dump_error=0
    exit 0
  fi
