Merge lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling into lp:~maas-committers/maas/trunk

Proposed by Julian Edwards
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 3027
Proposed branch: lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 103 lines (+30/-36)
1 file modified
etc/maas/templates/power/amt.template (+30/-36)
To merge this branch: bzr merge lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+235070@code.launchpad.net

Commit message

Changes to the AMT power template so that it catches errors from the amttool and exits with a code of 1 (which means you can retry). Also remove all the in-template retries as that is now handled by the core driver.

Description of the change

Tested on my NUCs by changing the IP to a bogus one which results in a similar error to the one in the linked bug. Also verified that the reties DTRT centrally, and finally I verified that the happy path works for starting and stopping the node.

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good.

review: Approve
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Ah didn't know it waited in the core!

I think these are still required as they are delays between ops *in the
template*, not external retries.

Thanks for the review, it's always nice to have a shell expert on hand ;)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'etc/maas/templates/power/amt.template'
2--- etc/maas/templates/power/amt.template 2014-09-12 10:27:06 +0000
3+++ etc/maas/templates/power/amt.template 2014-09-18 01:22:37 +0000
4@@ -23,6 +23,7 @@
5
6 # Exit with failure message.
7 # Parameters: exit code, and error message.
8+# Fail with exit code 1 to ask for a retry, exit code 2 for a hard fail.
9 fail() {
10 echo "$2" >&2
11 exit $1
12@@ -31,7 +32,8 @@
13
14 # Perform a command using amttool.
15 issue_amt_command() {
16- LC_ALL='C' AMT_PASSWORD="$power_pass" amttool "$ip_address" "$@"
17+ LC_ALL='C' AMT_PASSWORD="$power_pass" amttool "$ip_address" "$@" || fail 1 "amttool failure"
18+ # Stderr will be caught by the power driver, no need to repeat it.
19 }
20
21 # Check that amttool is present.
22@@ -44,17 +46,14 @@
23 # Retry the state if it fails because it often fails the first time.
24 local state=
25 local count=
26- for attempts in $(seq 0 10)
27- do
28- state=$(issue_amt_command info | grep '^Powerstate:' | awk '{print $2}')
29- if [ -n "$state" ]
30- then
31- break
32- fi
33- # Wait 1 second between retries. AMT controllers are generally very
34- # light and may not be comfortable with more frequent queries.
35- sleep 1
36- done
37+ state=$(issue_amt_command info | grep '^Powerstate:' | awk '{print $2}')
38+ if [ -n "$state" ]
39+ then
40+ break
41+ fi
42+ # Wait 1 second between queries AMT controllers are generally very
43+ # light and may not be comfortable with more frequent queries.
44+ sleep 1
45 case "$state" in
46 S[0-4])
47 # Wide awake (S0), or asleep (S1-S4), but not a clean slate that
48@@ -80,18 +79,15 @@
49 # Power the machine on, and boot it using PXE.
50 power_on() {
51 local attempts=
52- # Try several times. Power commands often fail the first time.
53- for attempts in $(seq 0 10)
54- do
55- # Issue the AMT command; amttool will prompt for confirmation.
56- yes | issue_amt_command powerup pxe
57- if [ "$(query_state)" = 'on' ]
58- then
59- # Success. Machine is on.
60- return 0
61- fi
62- sleep 1
63- done
64+ # Do not retry here, it's handled in the core power driver.
65+ # Issue the AMT command; amttool will prompt for confirmation.
66+ yes | issue_amt_command powerup pxe
67+ sleep 1
68+ if [ "$(query_state)" = 'on' ]
69+ then
70+ # Success. Machine is on.
71+ return 0
72+ fi
73 fail 1 "Machine is not powering on. Giving up."
74 }
75
76@@ -99,18 +95,16 @@
77 # Power the machine off.
78 power_off() {
79 local attempts=
80- # Try several times. Power commands often fail the first time.
81- for attempts in $(seq 0 10)
82- do
83- if [ "$(query_state)" = 'off' ]
84- then
85- # Success. Machine is off.
86- return 0
87- fi
88- # Issue the AMT command; amttool will prompt for confirmation.
89- yes | issue_amt_command powerdown
90- sleep 1
91- done
92+ # Do not retry here, it's handled in the core power driver.
93+ # Issue the AMT command; amttool will prompt for confirmation.
94+ yes | issue_amt_command powerdown
95+ sleep 1
96+ if [ "$(query_state)" = 'off' ]
97+ then
98+ # Success. Machine is off.
99+ return 0
100+ fi
101+
102 fail 1 "Machine is not powering off. Giving up."
103 }
104