Machine fails to boot if MAAS server is not available

Bug #1680917 reported by Michael Iatrou
This bug affects 3 people
Affects   Status         Importance   Assigned to    Milestone
MAAS      Invalid        Critical     Unassigned
curtin    Fix Released   Critical     Blake Rouse

Bug Description

maas 2.1.3+bzr5573-0ubuntu1~16.04.1
Machine type: Intel NUC (power type: Intel AMT, UEFI mode)

Steps to reproduce:

1. Deploy Ubuntu 16.04 on the node (Intel NUC) with MAAS (default flat disk layout, default network config).
2. SSH into the deployed node.
3. On the MAAS server, terminate all MAAS services.
4. On the deployed node, reboot.

Observed behavior: after power-cycling, the machine stalls waiting to PXE boot.
Expected behavior: the machine should boot from its disk.

Control environment 1:

1. On the same node, install Ubuntu 16.04 from a USB stick (*no* changes to the AMT settings, boot device order, etc.).
2. Reboot the device.

Observed behavior: the machine boots properly from its disk.

Control environment 2:

1. Deploy Ubuntu 16.04 with MAAS on a KVM/virsh VM (default flat disk layout, default network config).
2. SSH into the deployed VM.
3. On the MAAS server, terminate all MAAS services.
4. On the deployed VM, reboot.

Observed behavior: the machine is booting from its disk, as expected.

Conclusions:

1. PEBCAK - possible, but the behavior was confirmed on multiple different NUCs, from multiple users.

2. Intel's EFI implementation is fragile - possible, but the scenarios above were tested with two different firmware versions (a mature one about 1.5 years old, and the latest release, about 2 months old). Most importantly, when Ubuntu is installed from a USB stick, the machine boots as expected.

3. There is something subtle in the way MAAS, curtin and cloud-init lay down the system that triggers the observed behavior. If someone could test this on different hardware or a different power type (e.g. IPMI), I believe it would help with the triage.

The issue is consistently reproducible, happy to provide any logs and configuration details as needed.

Output from dpkg -l '*maas*': http://paste.ubuntu.com/24335257/

Changed in maas:
status: New → Incomplete
Michael Iatrou (michael.iatrou) wrote :

Many thanks to Francisco Hernandez for fact checking the repro steps and replicating the issue!

Andres Rodriguez (andreserl) wrote :

Hi Michael,

I've been able to reproduce this with a NUC, but not with a server.

I'll keep it as Incomplete for the time being, but this might well be an issue with UEFI not falling back to booting from disk.

Changed in maas:
milestone: none → 2.2.0rc2
Andres Rodriguez (andreserl) wrote :

OK, I can confirm that this is definitely a bug. Since MAAS doesn't write any UEFI configuration to the filesystem, I believe this is a curtin issue. To reproduce it, I installed three different machines:

A. Machine on legacy boot mode
B. Machine on EFI mode (NUC)
C. Machine on EFI mode (Server)

I then:

1. Killed MAAS completely.
2. Rebooted (A): the machine booted into the OS just fine.
3. Rebooted (B): the machine didn't boot into the OS.
4. Rebooted (C): the machine didn't boot into the OS.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → Critical
status: Triaged → Invalid
status: Invalid → Incomplete
Michael Iatrou (michael.iatrou) wrote :

Thanks for the prompt action, Andres!

After a bit more debugging, here is the root cause:

Although this is a UEFI system, we do not install a signed Linux kernel.
It seems that some EFI implementations are more pedantic than others, which is why this affects Intel NUCs but not other systems.

Indeed, going through the repro steps with the workaround added:

1. Deploy Ubuntu 16.04 on the node (Intel NUC) with MAAS (default flat disk layout, default network config).
2. SSH into the deployed node.
3. Apply the workaround:
     sudo apt-get install linux-signed-generic
     sudo dpkg-reconfigure grub-efi-amd64-signed
4. On the MAAS server, terminate all MAAS services.
5. On the deployed node, reboot.

Observed behavior: the machine properly boots from its disk, as expected.
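The workaround in step 3 can be scripted so that it only runs on UEFI boots. This is a minimal Python sketch, assuming a Linux node and root privileges for the real run; the helper names (is_uefi_boot, workaround_commands, apply_workaround) are hypothetical, and the UEFI check relies on the kernel exposing /sys/firmware/efi only when booted via UEFI. It defaults to a dry run that just prints the commands.

```python
import os
import subprocess

# The two workaround commands from step 3 (must run as root; apt-get will
# ask for confirmation when run interactively).
WORKAROUND_CMDS = [
    ["apt-get", "install", "linux-signed-generic"],
    ["dpkg-reconfigure", "grub-efi-amd64-signed"],
]


def is_uefi_boot(efi_dir="/sys/firmware/efi"):
    """Linux exposes /sys/firmware/efi only when booted via UEFI."""
    return os.path.isdir(efi_dir)


def workaround_commands(efi_dir="/sys/firmware/efi"):
    """Return the commands to run; none are needed on legacy-BIOS boots."""
    if not is_uefi_boot(efi_dir):
        return []
    return [list(cmd) for cmd in WORKAROUND_CMDS]


def apply_workaround(dry_run=True, efi_dir="/sys/firmware/efi"):
    """Print (dry run) or execute the workaround commands, returning them."""
    cmds = workaround_commands(efi_dir)
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

Calling apply_workaround(dry_run=False) as root would perform the same two steps as the manual workaround; on a legacy-BIOS machine such as (A) above it does nothing.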

From my side I am happy with the workaround for the time being, but it would be great if the fix could land in 2.2.0rc2.

Ryan Harper (raharper) wrote :

Is there any way curtin could know whether it needs the signed version or not? I.e., if we default to installing the signed packages, is that going to break someone else?

Ryan Harper (raharper) wrote :

On the systems that required the signed packages, can someone run "sudo bootctl status"?

My local, older i5 NUC does not have Secure Boot enabled; I'm curious
whether the machines that fail to boot without MAAS, and that need the
signed packages, report Secure Boot as enabled or not. Here is the
output from my machine:

(lambic) ~ % sudo bootctl status
Using EFI System Parition at /boot/efi.
System:
     Firmware: n/a (n/a)
  Secure Boot: disabled
   Setup Mode: setup

Loader:
      Product: n/a
    Partition: n/a
         File: └─n/a

Boot Loader Binaries:
          ESP: /dev/disk/by-partuuid/5f811d5e-73fe-44a0-a57c-6f8eaa91743e
systemd-boot not installed in ESP.
No default/fallback boot loader installed in ESP.

Boot Loader Entries in EFI Variables:
        Title: ubuntu
           ID: 0x0000
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/5f811d5e-73fe-44a0-a57c-6f8eaa91743e
         File: └─/EFI/ubuntu/shimx64.efi
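The check Ryan asks for can also be automated by parsing the "Secure Boot:" line of bootctl status. A minimal Python sketch, assuming the line format shown in the output above (it may differ between systemd versions); secure_boot_state and bootctl_secure_boot are hypothetical helper names.

```python
import re
import subprocess


def secure_boot_state(bootctl_output):
    """Extract the value of the 'Secure Boot:' line, or None if it is absent."""
    match = re.search(r"^[ \t]*Secure Boot:[ \t]*(\S+)", bootctl_output, re.MULTILINE)
    return match.group(1) if match else None


def bootctl_secure_boot():
    """Run 'bootctl status' (typically needs root) and parse its output."""
    result = subprocess.run(["bootctl", "status"], capture_output=True, text=True)
    return secure_boot_state(result.stdout)


# Excerpt of the output quoted above.
SAMPLE = """System:
     Firmware: n/a (n/a)
  Secure Boot: disabled
   Setup Mode: setup
"""
print(secure_boot_state(SAMPLE))  # disabled
```

Only the first token after the colon is kept, so a value with extra detail such as "enabled (user)" is reported as "enabled".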

Ryan Harper (raharper) wrote :

I'm somewhat confused about how a system that requires Secure Boot could PXE boot from MAAS and then fall back to the local disk.

Revision history for this message
Michael Iatrou (michael.iatrou) wrote :
Ryan Harper (raharper) wrote : Re: [Bug 1680917] Re: Machine fails to boot if MAAS server is not available

Thanks for confirming that Secure Boot is not enabled.
Then I don't understand why installing the signed packages is required.
With Secure Boot enabled, signed packages are *surely* required, but I'm
not sure why the firmware cares when it's disabled; that feels like a
firmware bug.

On Fri, Apr 7, 2017 at 4:32 PM, Michael Iatrou <email address hidden>
wrote:

> @Ryan
>
> Here is the config from my system:
>
> https://imagebin.ca/v/3II8ELx9oJsE
> https://imagebin.ca/v/3II8W9Kd2Leh
> https://imagebin.ca/v/3II9B3AOdE3L

Changed in maas:
milestone: 2.2.0rc2 → 2.2.0rc3
Changed in maas:
milestone: 2.2.0rc3 → 2.2.1
Changed in maas:
milestone: 2.2.1 → 2.2.0rc4
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: Incomplete → Invalid
Changed in maas:
assignee: Blake Rouse (blake-rouse) → nobody
Changed in curtin:
status: New → Triaged
importance: Undecided → Critical
Ryan Harper (raharper) wrote :

If the importance is Critical, then can we get the required information?

When should curtin install shim or signed grub? How is curtin to know which
package to install if Secure Boot isn't enabled?

The questions in comments 5 and 9 are still unanswered; we don't have enough
information to make forward progress.
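One way to frame Ryan's question is to separate detecting the Secure Boot state from choosing packages. The minimal Python sketch below reads the standard UEFI SecureBoot variable through efivarfs (assuming it is mounted at /sys/firmware/efi/efivars) and applies a hypothetical package policy. The names parse_secure_boot, read_secure_boot, and choose_grub_packages are illustrative only; they are not curtin's API, and the policy is not taken from the curtin 17.1 fix.

```python
import os

# SecureBoot is a UEFI global variable; efivarfs names files <Name>-<VendorGUID>.
SECURE_BOOT_VAR = ("/sys/firmware/efi/efivars/"
                   "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c")


def parse_secure_boot(data):
    """efivarfs prepends 4 bytes of attributes; the next byte is the value (1 = on)."""
    if len(data) < 5:
        return None
    return data[4] == 1


def read_secure_boot(path=SECURE_BOOT_VAR):
    """Return True/False, or None when the variable can't be read (e.g. BIOS boot)."""
    try:
        with open(path, "rb") as f:
            return parse_secure_boot(f.read())
    except OSError:
        return None


def choose_grub_packages(uefi):
    """Hypothetical policy: key on UEFI rather than Secure Boot, because the NUC
    in this report needed the signed packages even with Secure Boot disabled."""
    if not uefi:
        return ["grub-pc"]
    return ["grub-efi-amd64-signed", "shim-signed", "linux-signed-generic"]


uefi = os.path.isdir("/sys/firmware/efi")
print("UEFI:", uefi, "Secure Boot:", read_secure_boot())
print("packages:", choose_grub_packages(uefi))
```

The design choice here reflects the thread: Secure Boot was reported as disabled on the failing NUC, so its state alone would not have told curtin to install the signed packages.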

Changed in curtin:
status: Triaged → Incomplete
Changed in curtin:
status: Incomplete → Triaged
assignee: nobody → Blake Rouse (blake-rouse)
status: Triaged → In Progress
Ryan Harper (raharper)
Changed in curtin:
status: In Progress → Fix Committed
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released