Merge ~ltrager/maas:lp1730524 into maas:master

Proposed by Lee Trager
Status: Merged
Approved by: Lee Trager
Approved revision: 3923d84a1a312adaa0d0bdc5ff5d07fdaedf5ac2
Merge reported by: MAAS Lander
Merged at revision: not available
Proposed branch: ~ltrager/maas:lp1730524
Merge into: maas:master
Diff against target: 24 lines (+5/-1)
1 file modified
src/metadataserver/user_data/templates/commissioning.template (+5/-1)
Reviewer Review Type Date Requested Status
MAAS Lander Approve
Blake Rouse (community) Approve
Review via email: mp+334894@code.launchpad.net

Commit message

LP: #1730524 - Ping the metadata service before starting commissioning.

Before starting commissioning ping the metadata service, if unable exit. This
prevents the BMC password from being changed with the ephemeral environment
having no way to signal.

To post a comment you must log in.
~ltrager/maas:lp1730524 updated
3923d84... by Lee Trager

s/ipmi/bmc/g

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Looks good.

This is a good change, but I would like to see this even more robust.

1. Check we can talk to MAAS. (as you have added)
2. Save current password on BMC.
3. Update password on BMC.
4. Update MAAS with new BMC.
5. If #4 fails then reset BMC password back to saved from #2.

I think that #5 would really make this a robust change.

I am not going to block you on this branch, as I know your trying to make this a quick fix. But it would be a good change to see.

review: Approve
Revision history for this message
Andres Rodriguez (andreserl) wrote :

I agree with Blake in that while this seems to be a good change, I'm not quite sure it will address the issue.

So, I investigate this issue before and the real issue is not exactly what Blake is describing. The IPMI discover script already ensures that if there's an error updating the password on the BMC, that it won't update the password on MAAS.

However, the issue is different. What happens is that while or right before MAAS is to run the IPMI discovery script, a different error could cause MAAS to mark a machine failed commissioning. Doing so, doesn't prevent the machine from continueing on running the IPMI discovery, as such, the BMC password gets changed, and even though MAAS had already amrked the machine failed commissioning, MAAS never stops running any remaining script.

This is because cloud-init runs the scripts. So this would be:

1. cloud-init starts
2. cloud-init sends failure message
3. right after the message, maas starts ipmi discovery and changes password
4. Howeve5r before password has been sent to MAAS, MAAS processes 2 above and marks machine failed commissioning
5. cloud-init continues running ipmi discover, resets password on BMC and attempts to send it back to MAAS, but fails because MAAS has already marked the machine as failed commissioning and removes the token.
6. cloud-init continues running other parts of its scripts.

Revision history for this message
Blake Rouse (blake-rouse) wrote :

If you do what I describe then your #5 would fail and the script would set it back to the original.

The issue is that MAAS should verify that it can communicate to that BMC. The ultimate workflow would be this.

1. Save the password
2. Change the password
3. Update MAAS with that password
4. MAAS performs a power check
5. Script polls MAAS until a valid power check occurred
6. If #5 fails revert back to #1.

Revision history for this message
MAAS Lander (maas-lander) wrote :

UNIT TESTS
-b lp1730524 lp:~ltrager/maas into -b master lp:~maas-committers/maas

STATUS: SUCCESS
COMMIT: 3923d84a1a312adaa0d0bdc5ff5d07fdaedf5ac2

review: Approve
Revision history for this message
Lee Trager (ltrager) wrote :

In master today when commissioning runs the user data script runs to completion and ignores failed commands. This is because the modprobes needed for IPMI discovery fail when no IPMI device is available. This bug is related to LP:1730456 where the node could not access the metadata server. What was happening is commissioning would run change the BMC password, then try to send it to the region via the metadata service. Since the node couldn't access the metadata service the password was changed but MAAS wasn't told what the new password is.

Now commissioning tries to access the metadata service first. If it can't commissioning does not proceed so no new BMC password is set. I agree it would be a good idea to store the original BMC username and password however I'm not sure how to retrieve the existing username and password for IPMI, moonshot, and wedge. If someone could give me the code to do that I can add it to this branch. Otherwise I'd like to merge as is and create a separate bug for that.

Revision history for this message
Lee Trager (ltrager) wrote :

Spoke with Andres over IRC, landing in master only.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/src/metadataserver/user_data/templates/commissioning.template b/src/metadataserver/user_data/templates/commissioning.template
2index f0d49ab..bea840f 100644
3--- a/src/metadataserver/user_data/templates/commissioning.template
4+++ b/src/metadataserver/user_data/templates/commissioning.template
5@@ -27,6 +27,10 @@ add_ipmi_config() {
6 main() {
7 prep_maas_api_helper
8
9+ # LP:1730524 - Make sure we can signal the metadata service before updating
10+ # the BMC username and password.
11+ signal WORKING "Starting [maas-bmc-autodetect]" || exit 1
12+
13 # Install IPMI deps
14 aptget install freeipmi-tools openipmi ipmitool sshpass
15
16@@ -61,7 +65,7 @@ main() {
17 if [ ! -z "$power_settings" ]; then
18 signal \
19 "--power-type=${power_type}" "--power-parameters=${power_settings}" \
20- WORKING "Finished [maas-ipmi-autodetect]"
21+ WORKING "Finished [maas-bmc-autodetect]"
22 fi
23
24 maas-run-remote-scripts "--config=${CRED_CFG}" "${TEMP_D}"

Subscribers

People subscribed via source and target branches