Merge lp:~asac/ubuntu-test-cases/use-top-and-always-dumb-toplog into lp:ubuntu-test-cases/touch

Proposed by Alexander Sack
Status: Merged
Merged at revision: 12
Proposed branch: lp:~asac/ubuntu-test-cases/use-top-and-always-dumb-toplog
Merge into: lp:ubuntu-test-cases/touch
Diff against target: 140 lines (+32/-33)
1 file modified
systemsettle/systemsettle.sh (+32/-33)
To merge this branch: bzr merge lp:~asac/ubuntu-test-cases/use-top-and-always-dumb-toplog
Reviewer Review Type Date Requested Status
Paul Larson Approve
Review via email: mp+180889@code.launchpad.net

This proposal supersedes a proposal from 2013-08-19.

Description of the change

+ use top in batchmode instead of vmstat
+ improve logging; echo cli arguments given as well as
+ dumb top_log regardless of success and failyure
+ top_log is now more useful to see what was going on as we dumb exactly what was measured
+ addresses all previous comments (minus request for optionally dumping top log)

To post a comment you must log in.
Revision history for this message
Paul Larson (pwlars) wrote : Posted in a previous version of this proposal

I haven't looked too deeply just yet, but there are some things I see after a quick trial run locally:

>Measurement:
> + idle level: 148.50
> + idle sum: 148.5 / count: 2
>
>system settled. SUCCESS
I told it to only pass if it hit 99% or better, and my system was constantly at 80% or less idle... clearly this is not good.

Using top in the way you are using it right now means that we only see that it's starting the first pass, but then we don't get any more feedback that something is happening until the end of the entire run.

27 +echo " cmd = 'top -b -d $vmstat_wait -n $vmstat_repeat' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
Instead of printing out 7 lines of verbose detail about the options we just passed it, I think it would make more sense to just have the above line do something like print the exact command line args. I had thought about changing this in the first pass, but waited.

48 + top -b -d $vmstat_wait -n $vmstat_repeat >> $top_log
49 + cat $top_log | grep '.Cpu.*' | tail -n $vmstat_tail > $vmstat_log.reduced
If we are no longer using vmstat, we may as well change the variable names to reflect that as well.

Printing the entire top log of every run is not always desirable, and sometimes gets in the way. Considering our primary use, I think it's sensible to have it on by default, but we should at least have an option to turn it off.

review: Needs Fixing
Revision history for this message
Alexander Sack (asac) wrote : Posted in a previous version of this proposal

> I haven't looked too deeply just yet, but there are some things I see after a
> quick trial run locally:
>
> >Measurement:
> > + idle level: 148.50
> > + idle sum: 148.5 / count: 2
> >
> >system settled. SUCCESS
> I told it to only pass if it hit 99% or better, and my system was constantly
> at 80% or less idle... clearly this is not good.

I will fix the problem you saw with your 140% idle average ... need a way to reproduce. The script works here on my system and on the phone.

>
> Using top in the way you are using it right now means that we only see that
> it's starting the first pass, but then we don't get any more feedback that
> something is happening until the end of the entire run.

We see the result... not the pumping. Instead you get everything nicely at the end. I think its good that way. I will tweak the output to be clearer that its just "measuring now..."

>
> 27 +echo " cmd = 'top -b -d $vmstat_wait -n $vmstat_repeat' ignoring
> first $vmstat_ignore (tail: $vmstat_tail)"
> Instead of printing out 7 lines of verbose detail about the options we just
> passed it, I think it would make more sense to just have the above line do
> something like print the exact command line args. I had thought about changing
> this in the first pass, but waited.

Yeah let me do something about this. will probably remove that line because we have all the details now. And then using set -x set +x to turn on and off echoing the command line (so we really see what happens).

>
> 48 + top -b -d $vmstat_wait -n $vmstat_repeat >> $top_log
> 49 + cat $top_log | grep '.Cpu.*' | tail -n $vmstat_tail >
> $vmstat_log.reduced
> If we are no longer using vmstat, we may as well change the variable names to
> reflect that as well.
>
> Printing the entire top log of every run is not always desirable, and
> sometimes gets in the way. Considering our primary use, I think it's sensible
> to have it on by default, but we should at least have an option to turn it
> off.

I see that some might want it, but then I don't think that use case is really relevant to maintain right now... unless you have a real case where we dont want it in automation.

The

Revision history for this message
Alexander Sack (asac) wrote : Posted in a previous version of this proposal

pushed an updated version which addresses all comments except the request for conveniently disabling dumping the toplog at the end.

Revision history for this message
Paul Larson (pwlars) wrote :

Tested the new version and it fixes the bug I saw.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'systemsettle/systemsettle.sh'
2--- systemsettle/systemsettle.sh 2013-08-16 04:33:28 +0000
3+++ systemsettle/systemsettle.sh 2013-08-19 15:01:53 +0000
4@@ -9,19 +9,17 @@
5
6 cleanup () {
7 if ! test "$dump_error" = 0; then
8- echo "System failed to settle to target idle level ($idle_avg_min)"
9- echo " + check out the following top log taken at each retry:"
10+ echo "Check out the following top log taken at each retry:"
11
12+ echo
13 # dumb toplog indented
14 while read line; do
15 echo " $line"
16 done < $top_log
17-
18- echo
19 # dont rerun this logic in case we get multiple signals
20 dump_error=0
21 fi
22- rm -f $top_log $vmstat_log $vmstat_log.reduced
23+ rm -f $top_log $top_log.reduced
24 }
25
26 function show_usage() {
27@@ -30,10 +28,10 @@
28 echo "Options:"
29 echo " -r run forever without exiting"
30 echo " -p minimum idle percent to wait for (Default: 99)"
31- echo " -c number of times to run vmstat at each iteration (Default: 10)"
32- echo " -d seconds to delay between each vmstat iteration (Default: 6)"
33- echo " -i vmstat measurements to ignore from each loop (Default: 1)"
34- echo " -m maximum loops of vmstat before giving up if minimum idle"
35+ echo " -c number of times to run top at each iteration (Default: 10)"
36+ echo " -d seconds to delay between each top iteration (Default: 6)"
37+ echo " -i top measurements to ignore from each loop (Default: 1)"
38+ echo " -m maximum loops of top before giving up if minimum idle"
39 echo " percent is not reached (Default: 1)"
40 exit 129
41 }
42@@ -46,11 +44,11 @@
43 ;;
44 p) idle_avg_min=$OPTARG
45 ;;
46- c) vmstat_repeat=$OPTARG
47- ;;
48- d) vmstat_wait=$OPTARG
49- ;;
50- i) vmstat_ignore=$OPTARG
51+ c) top_repeat=$OPTARG
52+ ;;
53+ d) top_wait=$OPTARG
54+ ;;
55+ i) top_ignore=$OPTARG
56 ;;
57 m) settle_max=$OPTARG
58 ;;
59@@ -59,54 +57,56 @@
60
61 # minimum average idle level required to succeed
62 idle_avg_min=${idle_avg_min:-99}
63-# measurement details: vmstat $vmstat_wait $vmstat_repeat
64-vmstat_repeat=${vmstat_repeat:-10}
65-vmstat_wait=${vmstat_wait:-6}
66+# measurement details: top $top_wait $top_repeat
67+top_repeat=${top_repeat:-10}
68+top_wait=${top_wait:-6}
69 # how many samples to ignore
70-vmstat_ignore=${vmstat_ignore:-1}
71+top_ignore=${top_ignore:-1}
72 # how many total attempts to settle the system
73 settle_max=${settle_max:-10}
74
75 # set and calc more runtime values
76-vmstat_tail=`calc $vmstat_repeat - $vmstat_ignore`
77+top_tail=`calc $top_repeat - $top_ignore`
78 settle_count=0
79 idle_avg=0
80
81 echo "System Settle run - quiesce the system"
82 echo "--------------------------------------"
83 echo
84-echo " + cmd: \'vmstat $vmstat_wait $vmstat_repeat\' ignoring first $vmstat_ignore (tail: $vmstat_tail)"
85+echo " idle_avg_min = '$idle_avg_min'"
86+echo " top_repeat = '$top_repeat'"
87+echo " top_wait = '$top_wait'"
88+echo " top_ignore = '$top_ignore'"
89+echo " settle_max = '$settle_max'"
90+echo " run_forever = '$settle_prefix' (- = yes)"
91 echo
92
93 trap cleanup EXIT INT QUIT ILL KILL SEGV TERM
94-vmstat_log=`mktemp -t`
95 top_log=`mktemp -t`
96
97 while test `calc $idle_avg '<' $idle_avg_min` = 1 -a "$settle_prefix$settle_count" -lt "$settle_max"; do
98- echo Starting settle run $settle_count:
99-
100- # get vmstat
101- vmstat $vmstat_wait $vmstat_repeat | tee $vmstat_log
102- cat $vmstat_log | tail -n $vmstat_tail > $vmstat_log.reduced
103-
104- # log top output for potential debugging
105+ echo -n "Starting system idle measurement (run: $settle_count) ... "
106+
107+ # get top
108 echo "TOP DUMP (after settle run: $settle_count)" >> $top_log
109 echo "========================" >> $top_log
110- top -n 1 -b >> $top_log
111+ top -b -d $top_wait -n $top_repeat >> $top_log
112+ cat $top_log | grep '.Cpu.*' | tail -n $top_tail > $top_log.reduced
113 echo >> $top_log
114
115 # calc average of idle field for this measurement
116 sum=0
117 count=0
118 while read line; do
119- idle=`echo $line | sed -e 's/\s\s*/ /g' | cut -d ' ' -f 15`
120+ idle=`echo $line | sed -e 's/.* \([0-9\.]*\) id.*/\1/'`
121 sum=`calc $sum + $idle`
122 count=`calc $count + 1`
123- done < $vmstat_log.reduced
124+ done < $top_log.reduced
125
126- idle_avg=`calc $sum.0 / $count.0`
127+ idle_avg=`calc $sum / $count`
128 settle_count=`calc $settle_count + 1`
129
130+ echo " DONE."
131 echo
132 echo "Measurement:"
133 echo " + idle level: $idle_avg"
134@@ -119,7 +119,6 @@
135 exit 1
136 else
137 echo "system settled. SUCCESS"
138- dump_error=0
139 exit 0
140 fi
141

Subscribers

People subscribed via source and target branches