when net booting servers with MAAS, grub should never wait for user input to reboot

Bug #1747927 reported by Jason Hobbs
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
grub2 (Ubuntu)  New     Undecided   Unassigned

Bug Description

In bug 1743249, grub sometimes would hang waiting for user input on the keyboard. This is never an appropriate action for a net booting server, at least not one booting under MAAS direction. It may be appropriate to pause for 30 seconds or something so someone can see the error, but to just hang is wrong.

https://imgur.com/a/as8Sx

summary: - when net booting servers, grub should never wait for user input to
- reboot
+ when net booting servers with MAAS, grub should never wait for user
+ input to reboot
Steve Langasek (vorlon) wrote :

As I asked in https://bugs.launchpad.net/maas/+bug/1743249/comments/65:

What do you think the correct behavior should be when grub cannot find the file that it needs in order to boot? Should grub enter a boot loop, retrying endlessly? Should it try to halt the system? Why is either of these options more correct than putting the machine to a console prompt?

Changed in grub2 (Ubuntu):
status: New → Incomplete
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1747927] Re: when net booting servers with MAAS, grub should never wait for user input to reboot

Steve,

Sorry, I missed comment #65 on the original bug.

I think this is a bit different question than what to do when it can't
find the file.

In that bug, grub has been instructed to fall back to
grub.cfg-default-amd64 if it can't find the file. In some cases it
does, but in others it displays the error message linked above and
hangs up.

I don't know what happens if it tries to get grub.cfg-default-amd64
and can't - we haven't ever run into that.

In the case of not being able to find the file, or of an error, I
think it should go back to UEFI and let UEFI handle it as a failed
boot/can't boot from that device, and let UEFI handle that however it
handles it.

I think that's better because it leaves it up to UEFI to handle it
just like it would any other failed boot. If that's by rebooting, or
retrying the boot devices, things may work the second time, for
example, if the network boot server is no longer overloaded.

Looking at the grub code, it only halts in grub_abort() if grub thinks
there is an input method available. I guess there technically is one
available here because you could log in via remote console on the BMC
and input text, but that doesn't seem very friendly.

Jason


tags: removed: cdo-qa-blocker
Changed in grub2 (Ubuntu):
status: Incomplete → New
Steve Langasek (vorlon) wrote :

On Wed, Feb 07, 2018 at 06:19:16PM -0000, Jason Hobbs wrote:
> Steve,

> Sorry, I missed comment #65 on the original bug.

> I think this is a bit different question than what to do when it can't
> find the file.

> In that bug, grub has been instructed to fall back to
> grub.cfg-default-amd64 if it can't find the file. In some cases it
> does, but in others it displays the error message linked above and
> hangs up.

The intent of this grub.cfg is to fall back if grub.cfg-<mac> *does not
exist*. The current situation is that it's falling back to
grub.cfg-default-amd64 not because the file does not exist, but because of a
server error.

It is not unambiguously correct to fall back to grub.cfg-default-amd64 in
that case. The current error messages in this scenario do point to a grub
bug. I just don't know that fixing that bug is going to result in the
desired behavior. Booting from local disk (which is what
grub.cfg-default-amd64 does) doesn't seem to me to be sensible error
handling for MAAS failing to serve the per-host file, and at a high level I
don't see why anyone would want that instead of halting or powering off the
machine. In all cases, for a MAAS-managed system, the solution is the same:
shoot the node in the head, and power it back on to try again.
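
To make the distinction above concrete, here is a minimal sketch of the policy being argued for: fall back to the default config only when the per-host file genuinely does not exist, and treat every other failure as a failed boot. This is not grub code. It assumes the config is fetched over TFTP, where error code 1 means "File not found" (RFC 1350); the names `Action` and `choose_action` and the `timed_out` flag are made up for illustration.

```python
from enum import Enum

# TFTP error code for "File not found" (RFC 1350).
# Assumption: the per-host config is fetched over TFTP.
TFTP_FILE_NOT_FOUND = 1


class Action(Enum):
    """Hypothetical next steps after trying to fetch grub.cfg-<mac>."""
    USE_PER_HOST = "load grub.cfg-<mac>"
    USE_DEFAULT = "load grub.cfg-default-amd64"
    FAIL_BOOT = "treat as a failed boot"


def choose_action(tftp_error_code=None, timed_out=False):
    """Map the outcome of the per-host fetch to the next boot step.

    tftp_error_code is None when no TFTP error packet was received.
    """
    if timed_out:
        # No answer from the server: this is a server problem,
        # not evidence that the file is missing.
        return Action.FAIL_BOOT
    if tftp_error_code is None:
        return Action.USE_PER_HOST
    if tftp_error_code == TFTP_FILE_NOT_FOUND:
        # Only a genuine "does not exist" justifies the fallback.
        return Action.USE_DEFAULT
    # Any other error (undefined error, access violation, ...).
    return Action.FAIL_BOOT
```

With this split, a server error or timeout never ends in a local-disk boot; what "failed boot" means in practice (halt, power off, or return to firmware) is left open here.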

> In the case of not being able to find the file, or of an error, I
> think it should go back to UEFI and let UEFI handle it as a failed
> boot/can't boot from that device, and let UEFI handle that however it
> handles it.

I quite agree with this - it matches what I suggested to Andres would be
sensible error handling in the case of MAAS not being able to serve the
correct grub.cfg-<mac> within the time limit. But I think this is a
domain-specific policy, not something that grub should have as built-in
behavior in the event of all failed config file reads, and therefore is best
achieved by having the MAAS rack controller itself detect that it's having
problems with the db, and serving a straight 'exit' as the full contents of
grub.cfg-<mac> in place of the real config.
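
A rough sketch of that server-side idea, for illustration only: it is not MAAS code, and `BackendUnavailable`, `config_body_for` and the `lookup` callable are hypothetical names. The assumption is that the rack controller has some function that returns the per-host config text, returns None when no per-host config exists, and raises when the database cannot be reached in time.

```python
from typing import Callable, Optional

# Body served in place of the real config when the backend is in trouble.
# 'exit' is the grub command that leaves grub and returns to the firmware.
EXIT_CONFIG = "exit\n"


class BackendUnavailable(Exception):
    """Hypothetical error: the rack controller cannot reach its db."""


def config_body_for(mac: str,
                    lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the text to serve as grub.cfg-<mac>.

    Returning None means "no per-host config exists"; the caller is
    expected to answer file-not-found so grub can fall back to
    grub.cfg-default-amd64 as intended.
    """
    try:
        return lookup(mac)
    except (BackendUnavailable, TimeoutError):
        # Backend problem: hand control back to the firmware instead of
        # letting grub fall back to booting from local disk.
        return EXIT_CONFIG
```

This keeps the policy in MAAS rather than in grub: grub still sees a readable config file in every case, and only a truly missing per-host file triggers the default fallback.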
