Merge lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling into lp:~maas-committers/maas/trunk

Proposed by Julian Edwards
Status: Merged
Approved by: Julian Edwards
Approved revision: no longer in the source branch.
Merged at revision: 3027
Proposed branch: lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 103 lines (+30/-36)
1 file modified
etc/maas/templates/power/amt.template (+30/-36)
To merge this branch: bzr merge lp:~julian-edwards/maas/fix-amt-template-retries-and-error-handling
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+235070@code.launchpad.net

Commit message

Changes to the AMT power template so that it catches errors from the amttool and exits with a code of 1 (which means you can retry). Also remove all the in-template retries as that is now handled by the core driver.

Description of the change

Tested on my NUCs by changing the IP to a bogus one which results in a similar error to the one in the linked bug. Also verified that the reties DTRT centrally, and finally I verified that the happy path works for starting and stopping the node.

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good.

review: Approve
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Ah didn't know it waited in the core!

I think these are still required as they are delays between ops *in the
template*, not external retries.

Thanks for the review, it's always nice to have a shell expert on hand ;)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'etc/maas/templates/power/amt.template'
--- etc/maas/templates/power/amt.template 2014-09-12 10:27:06 +0000
+++ etc/maas/templates/power/amt.template 2014-09-18 01:22:37 +0000
@@ -23,6 +23,7 @@
2323
24# Exit with failure message.24# Exit with failure message.
25# Parameters: exit code, and error message.25# Parameters: exit code, and error message.
26# Fail with exit code 1 to ask for a retry, exit code 2 for a hard fail.
26fail() {27fail() {
27 echo "$2" >&228 echo "$2" >&2
28 exit $129 exit $1
@@ -31,7 +32,8 @@
3132
32# Perform a command using amttool.33# Perform a command using amttool.
33issue_amt_command() {34issue_amt_command() {
34 LC_ALL='C' AMT_PASSWORD="$power_pass" amttool "$ip_address" "$@"35 LC_ALL='C' AMT_PASSWORD="$power_pass" amttool "$ip_address" "$@" || fail 1 "amttool failure"
36 # Stderr will be caught by the power driver, no need to repeat it.
35}37}
3638
37# Check that amttool is present.39# Check that amttool is present.
@@ -44,17 +46,14 @@
44 # Retry the state if it fails because it often fails the first time.46 # Retry the state if it fails because it often fails the first time.
45 local state=47 local state=
46 local count=48 local count=
47 for attempts in $(seq 0 10)49 state=$(issue_amt_command info | grep '^Powerstate:' | awk '{print $2}')
48 do50 if [ -n "$state" ]
49 state=$(issue_amt_command info | grep '^Powerstate:' | awk '{print $2}')51 then
50 if [ -n "$state" ]52 break
51 then53 fi
52 break54 # Wait 1 second between queries AMT controllers are generally very
53 fi55 # light and may not be comfortable with more frequent queries.
54 # Wait 1 second between retries. AMT controllers are generally very56 sleep 1
55 # light and may not be comfortable with more frequent queries.
56 sleep 1
57 done
58 case "$state" in57 case "$state" in
59 S[0-4])58 S[0-4])
60 # Wide awake (S0), or asleep (S1-S4), but not a clean slate that59 # Wide awake (S0), or asleep (S1-S4), but not a clean slate that
@@ -80,18 +79,15 @@
80# Power the machine on, and boot it using PXE.79# Power the machine on, and boot it using PXE.
81power_on() {80power_on() {
82 local attempts=81 local attempts=
83 # Try several times. Power commands often fail the first time.82 # Do not retry here, it's handled in the core power driver.
84 for attempts in $(seq 0 10)83 # Issue the AMT command; amttool will prompt for confirmation.
85 do84 yes | issue_amt_command powerup pxe
86 # Issue the AMT command; amttool will prompt for confirmation.85 sleep 1
87 yes | issue_amt_command powerup pxe86 if [ "$(query_state)" = 'on' ]
88 if [ "$(query_state)" = 'on' ]87 then
89 then88 # Success. Machine is on.
90 # Success. Machine is on.89 return 0
91 return 090 fi
92 fi
93 sleep 1
94 done
95 fail 1 "Machine is not powering on. Giving up."91 fail 1 "Machine is not powering on. Giving up."
96}92}
9793
@@ -99,18 +95,16 @@
99# Power the machine off.95# Power the machine off.
100power_off() {96power_off() {
101 local attempts=97 local attempts=
102 # Try several times. Power commands often fail the first time.98 # Do not retry here, it's handled in the core power driver.
103 for attempts in $(seq 0 10)99 # Issue the AMT command; amttool will prompt for confirmation.
104 do100 yes | issue_amt_command powerdown
105 if [ "$(query_state)" = 'off' ]101 sleep 1
106 then102 if [ "$(query_state)" = 'off' ]
107 # Success. Machine is off.103 then
108 return 0104 # Success. Machine is off.
109 fi105 return 0
110 # Issue the AMT command; amttool will prompt for confirmation.106 fi
111 yes | issue_amt_command powerdown107
112 sleep 1
113 done
114 fail 1 "Machine is not powering off. Giving up."108 fail 1 "Machine is not powering off. Giving up."
115}109}
116110