Ubuntu removes CentOS UEFI boot entry on shutdown

Bug #1906379 reported by Lee Trager
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Unassigned
curtin    Fix Released   Undecided    Unassigned

Bug Description

When MAAS deploys CentOS 7/8, it boots into the user-selected commissioning environment (Xenial, Bionic, or Focal) and runs Curtin. If the system uses UEFI, Curtin creates a boot entry with efibootmgr. Adding and ordering the entry is logged, and I have manually confirmed that it is created properly. Once Ubuntu shuts down to boot into the deployed system, it seems to remove CentOS's UEFI boot entry. I have confirmed this happens during the shutdown process by using the system firmware to select a temporary boot device (Ubuntu LiveCD) and running efibootmgr -v, which showed only the system default boot entries.

This can cause boot failures when network booting isn't available or when a GRUB bug such as LP:1906344 is hit.

Reproduction:
1. Deploy CentOS 7 or 8 using MAAS to a UEFI system.
2. Verify that the UEFI boot entry was added in the installation/Curtin log.
3. In the deployed system or a rescue environment, observe that `efibootmgr -v` does not include CentOS.
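
Step 3 can be checked mechanically. The following is a minimal sketch, not part of MAAS or Curtin, that parses the text printed by `efibootmgr` (with or without `-v`) and reports whether a labelled entry is present and listed in BootOrder. The function names `parse_efibootmgr` and `label_in_boot_order` are hypothetical, and the label match is a simple prefix comparison, so it assumes the label is not a prefix of another entry's label.

```python
import re


def parse_efibootmgr(output):
    """Return (boot_order, entries) parsed from efibootmgr text output.

    boot_order is a list of 4-digit hex strings; entries maps each
    4-digit hex number to the rest of its line (label, plus the device
    path when -v was used).
    """
    boot_order = []
    entries = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("BootOrder:"):
            value = line.split(":", 1)[1].strip()
            boot_order = [num for num in value.split(",") if num]
            continue
        # Matches "Boot0007* centos ..."; "BootCurrent:" and "BootNext:"
        # do not match because they lack four hex digits after "Boot".
        match = re.match(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$", line)
        if match:
            entries[match.group(1)] = match.group(2)
    return boot_order, entries


def label_in_boot_order(output, label):
    """True if an entry starting with `label` exists and is in BootOrder."""
    boot_order, entries = parse_efibootmgr(output)
    for num, text in entries.items():
        # Assumption: the label is followed by whitespace or end of line.
        if text == label or text.startswith((label + " ", label + "\t")):
            if num in boot_order:
                return True
    return False
```

Running `label_in_boot_order(output, "centos")` on output captured in the ephemeral environment and again in the deployed system would show whether the entry went missing between the two.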


Lee Trager (ltrager)
Changed in maas:
milestone: none → 2.9.0rc4
Lee Trager (ltrager)
description: updated
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0rc4 → none
Dimitri John Ledkov (xnox) wrote :

I'm not sure if the entry is deleted, or indeed, fails to be added.

I'm deploying onto an NVMe drive, and running efibootmgr from the chrooted system, whilst booted into Ubuntu's v5.4 kernel, fails:

chroot /mnt efibootmgr -v --create --write-signature --label centos --disk /dev/nvme0n1 --part 1 --loader /boot/efi/EFI/centos/shimx64.efi
Could not prepare Boot variable: No such file or directory

But running the same command from the Focal environment (i.e. without chroot) succeeds:

efibootmgr -v --create --write-signature --label centos --disk /dev/nvme0n1 --part 1 --loader /boot/efi/EFI/centos/shimx64.efi

$ sudo efibootmgr -v --create --write-signature --label centos --disk /dev/nvme0n1 --part 1 --loader /boot/efi/EFI/centos/shimx64.efi
BootNext: 0002
BootCurrent: 0002
Timeout: 2 seconds
BootOrder: 0000,0002,0003,0004,0005,0006,0001
Boot0001* UEFI: Built-in EFI Shell VenMedia(5023b95c-db26-429b-a648-bd47664c8012)..BO
Boot0002* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection PciRoot(0x3)/Pci(0x3,0x4)/Pci(0x0,0x0)/MAC(18c04d711026,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0003* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection PciRoot(0x3)/Pci(0x3,0x4)/Pci(0x0,0x1)/MAC(18c04d711027,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0004* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection PciRoot(0x3)/Pci(0x3,0x4)/Pci(0x0,0x0)/MAC(18c04d711026,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot0005* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection PciRoot(0x3)/Pci(0x3,0x4)/Pci(0x0,0x1)/MAC(18c04d711027,1)/IPv6([::]:<->[::]:,0,0)..BO
Boot0006* UEFI OS HD(1,GPT,86ad9d07-0596-4077-965b-bded3e4da2e5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0000* centos HD(1,GPT,86ad9d07-0596-4077-965b-bded3e4da2e5,0x800,0x100000)/File(\boot\efi\EFI\centos\shimx64.efi)

I wonder if their RHEL 8 efibootmgr is compatible with the v5.4 Ubuntu kernel w.r.t. NVMe drives.

Dimitri John Ledkov (xnox) wrote :

So my efibootmgr issue is the same as

Bug-Ubuntu: https://bugs.launchpad.net/bugs/1891718
Bug: https://github.com/rhboot/efivar/issues/157
Origin: upstream, https://github.com/rhboot/efivar/commit/4e12f997f8b6af76ef65e7045c232b7d642a1af4

because that is not fixed in the CentOS efivar package.

Dimitri John Ledkov (xnox) wrote :

If I set "nvme_core.multipath=N" in /MAAS/r/settings/configuration/kernel-parameters

I can deploy CentOS 8 successfully and ssh into it.

Dimitri John Ledkov (xnox) wrote :

CentOS 7 does not appear to support NVMe drives. Deploying onto an sda drive works.
CentOS 8 does not appear to support multipath NVMe in its efibootmgr.
Deploying with NVMe multipath disabled works, and deploying onto an sda drive works.

The centos efibootmgr entry is present, working, and boots in all of those cases, and it is updated each time with the new GPT partuuid.

I'm not sure at this point what is wrong, or how to reproduce this.

Somehow I did observe that I had to reboot the machine by hand before the centos user appeared and was usable.

Can you please explain in more detail what you are observing? What are the steps to reproduce this issue? Are there any logs?

Dimitri John Ledkov (xnox) wrote :

I wonder if I'm not catching this at the right time.

Lee Trager (ltrager) wrote :

I can reproduce this using a couple of the machines in our CI and in KVM. The KVM uses /usr/share/OVMF/OVMF_CODE.secboot.fd and all virtio devices[1]. The attached Curtin log is from deploying the latest CentOS 8 image to KVM running UEFI. You can see the UEFI entry is successfully created and verified.

Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmp41l7sf8c/target', 'efibootmgr', '--create', '--write-signature', '--label', 'centos', '--disk', '/dev/vda', '--part', '1', '--loader', '/boot/efi/EFI/centos/shimx64.efi'] with allowed return codes [0] (capture=True)

Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmp41l7sf8c/target', 'efibootmgr', '-o', '0003,0007,0004,0001,0002,0006,0000,0005'] with allowed return codes [0] (capture=False)
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0007,0004,0001,0002,0006,0000,0005
Boot0000* UiApp
Boot0001* UEFI Misc Device
Boot0002* UEFI Misc Device 2
Boot0003* UEFI PXEv4 (MAC:52540084334D)
Boot0004* UEFI HTTPv4 (MAC:52540084334D)
Boot0005* EFI Internal Shell
Boot0006* UEFI QEMU DVD-ROM QM00001
Boot0007* centos

I manually verified this by SSHing into the VM while it was deploying, preventing rebooting with `sudo touch /tmp/block-reboot`, waiting for MAAS to receive the Curtin log, and then verifying in the Focal ephemeral environment with `sudo efibootmgr -v` that centos was added to the boot order.

# efibootmgr -v
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0007,0004,0001,0002,0006,0000,0005
Boot0000* UiApp FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI Misc Device PciRoot(0x0)/Pci(0x2,0x4)/Pci(0x0,0x0)N.....YM....R,Y.
Boot0002* UEFI Misc Device 2 PciRoot(0x0)/Pci(0x2,0x5)/Pci(0x0,0x0)N.....YM....R,Y.
Boot0003* UEFI PXEv4 (MAC:52540084334D) PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(52540084334d,1)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.
Boot0004* UEFI HTTPv4 (MAC:52540084334D) PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(52540084334d,1)/IPv4(0.0.0.00.0.0.0,0,0)/Uri()N.....YM....R,Y.
Boot0005* EFI Internal Shell FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
Boot0006* UEFI QEMU DVD-ROM QM00001 PciRoot(0x0)/Pci(0x1f,0x2)/Sata(0,65535,0)N.....YM....R,Y.
Boot0007* centos HD(1,GPT,950e1d23-47b8-4c55-9180-41012f34861b,0x800,0x100000)/File(\boot\efi\EFI\centos\shimx64.efi

Once I rebooted the machine to allow the deployment to finish, I checked `efibootmgr -v` in CentOS 8 and now I see:

# efibootmgr -v
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0004,0001,0002,0006,0000,0005
Boot0000* UiApp FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI Misc Device PciRoot(0x0)/Pci(0x2,0x4)/Pci(0x0,0x0)N.....YM....R,Y.
Boot0002* UEFI Misc Device 2 PciRoot(0x0)/Pci(0x2,0x5)/Pci(0x0,0x0)N.....YM....R,Y.
Boot0003* UEFI PXEv4 (MAC:52540084334D) PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(52540084334d,1)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.
Boot0004* UEFI HTTPv4 (MAC:52540084334D) PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(52540084334d,1)/IPv4(0.0.0.00.0.0.0,0,0)/Uri()N.....YM....R,Y.
Boot0005* EFI Internal Shell F...


Dimitri John Ledkov (xnox) wrote :

@ltrager

Thanks for this. I do see that you are able to reproduce this with "normal" non-NVMe drives, and also using KVM.

Let me try to reproduce this again.

Dimitri John Ledkov (xnox) wrote :

Boot0000* centos HD(1,GPT,952730e9-c1fb-4ecb-9e9d-9bb55d99979f,0x800,0x100000)/File(\boot\efi\EFI\centos\shimx64.efi)

$ sudo mount /dev/disk/by-partuuid/952730e9-c1fb-4ecb-9e9d-9bb55d99979f /mnt/

$ ls /mnt/boot/efi/EFI/centos/shimx64.efi
ls: cannot access '/mnt/boot/efi/EFI/centos/shimx64.efi': No such file or directory
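
The manual check above (mount the partition named in the entry, then look for the loader at the path given in File(...)) can be sketched as follows. This is only an illustration: the helper names `loader_path_from_entry` and `loader_exists_on_esp` are hypothetical, `esp_root` is assumed to be wherever the ESP partition is mounted (/mnt in the commands above), and the lookup is only case-insensitive if the underlying filesystem is.

```python
import re
from pathlib import Path


def loader_path_from_entry(entry):
    """Extract the File(...) path from one `efibootmgr -v` entry line.

    Returns the backslash-separated path as printed, or None if the
    entry has no File() node (for example a PXE entry).
    """
    match = re.search(r"File\(([^)]*)\)", entry)
    return match.group(1) if match else None


def loader_exists_on_esp(entry, esp_root):
    """True if the entry's loader exists relative to the mounted ESP root."""
    loader = loader_path_from_entry(entry)
    if loader is None:
        return False
    # The firmware resolves the path from the root of the partition, so
    # split on backslashes and join the parts under the mount directory.
    parts = [part for part in loader.split("\\") if part]
    return Path(esp_root, *parts).is_file()
```

For the entry shown above, `loader_path_from_entry` yields `\boot\efi\EFI\centos\shimx64.efi`, and the existence check fails for the same reason the `ls` does.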

"\boot\efi" is the mountpoint of the ESP, but the path that goes into the UEFI variable should be relative to the root of the ESP, i.e. "\EFI\centos\shimx64.efi".

Plus, it is probably best to create the entry the same way that the fallback mode would create it:

cat /mnt/EFI/centos/BOOTX64.CSV
shimx64.efi,CentOS Linux,,This is the boot entry for CentOS Linux

Aka, "CentOS Linux".
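
To read the label out of that file programmatically, something like the following might work. It is a sketch under assumptions: the file is taken to be UTF-16 with a byte-order mark (which would explain the two unprintable characters `cat` shows before the first field), the four comma-separated fields are taken to be loader, label, options and description as in the line above, and `parse_fallback_csv` is a hypothetical name.

```python
def parse_fallback_csv(raw):
    """Parse the first record of a BOOTX64.CSV-style file given as bytes.

    Assumes UTF-16 text; the "utf-16" codec consumes a leading byte-order
    mark and uses it to pick the endianness.
    """
    text = raw.decode("utf-16")
    for line in text.splitlines():
        line = line.strip().strip("\x00")
        if not line:
            continue
        # Split into at most four fields so commas inside the
        # description (the last field) are preserved.
        fields = line.split(",", 3)
        fields += [""] * (4 - len(fields))
        loader, label, options, description = fields
        return {
            "loader": loader,
            "label": label,
            "options": options,
            "description": description,
        }
    return None
```

With the file contents shown above, the `label` field would be "CentOS Linux" and the `loader` field "shimx64.efi".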

Whatever is creating the centos efi entry is adding a wrong prefix.
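
As an illustration of the prefix problem (this is not the actual curtin change, which is in the commit referenced later in this report), a loader path given relative to the target root can be converted to an ESP-relative path by stripping the ESP mountpoint first. The function name `esp_relative_loader` and the default mountpoint `/boot/efi` are assumptions made for this sketch.

```python
from pathlib import PurePosixPath


def esp_relative_loader(loader, esp_mountpoint="/boot/efi"):
    """Convert a loader path to a backslash path relative to the ESP root.

    `loader` is a POSIX path as seen inside the target system, e.g.
    /boot/efi/EFI/centos/shimx64.efi. If it is not under the mountpoint
    it is assumed to be ESP-relative already.
    """
    path = PurePosixPath(loader)
    try:
        rel = path.relative_to(esp_mountpoint)
    except ValueError:
        # Not under the mountpoint: only drop the leading "/" if present.
        rel = PurePosixPath(*path.parts[1:]) if path.is_absolute() else path
    return "\\" + "\\".join(rel.parts)
```

Passing the loader used in the log, `/boot/efi/EFI/centos/shimx64.efi`, gives `\EFI\centos\shimx64.efi`, which is the form the UEFI variable needs.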

Dimitri John Ledkov (xnox) wrote :

suspecting curtin bug

Changed in curtin:
status: New → Incomplete
Changed in curtin:
status: Incomplete → Confirmed
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 56cc9cce to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=56cc9cce

Changed in curtin:
status: Confirmed → Fix Committed
Michael Hudson-Doyle (mwhudson) wrote : Fixed in curtin version 21.2.

This bug is believed to be fixed in curtin in version 21.2. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released
Changed in maas:
assignee: Lee Trager (ltrager) → nobody
Changed in maas:
milestone: none → 3.5.0
status: Triaged → Fix Committed
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released