2.8.2 deploy and commission fails corrupted bootorder variable detected

Bug #1894217 reported by David van der Spek
This bug affects 5 people
Affects | Status | Importance | Assigned to | Milestone
MAAS | Invalid | Undecided | Unassigned |
curtin (Ubuntu) | Fix Released | Medium | Ryan Harper |

Bug Description

When trying to redeploy OpenStack on Dell R620s in EFI boot mode with MAAS 2.8.2, the deployment failed. I believe this was caused by the previously reported boot order bug, which makes the system try to boot from the "ubuntu" entry directly (an issue I have experienced before). When trying to recommission with 18.04 (not 20.04, due to #1869116), the commissioning fails, and on the machine where the "ubuntu" boot entry was not present the firmware gives the error "Corrupted BootOrder variable". I have not seen this before; however, I think this could cause serious issues in production.


summary: - 2.8.2 commission fails corrupted bootorder variable detected
+ 2.8.2 deploy and commission fails corrupted bootorder variable detected
David van der Spek (vanderspek-david) wrote : Re: 2.8.2 fails corrupted bootorder variable detected

The error is being caused by the UEFI boot order. Here is a snippet from the log. I have attached the full log.

BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 000B,0007,0008,0009,0004
Boot0000* EFI Network 1
Boot0001* EFI Network 2
Boot0002* EFI Network 3
Boot0003* EFI Network 4
Boot0004* EFI Fixed Disk Boot Device 1
Boot0005* EFI Fixed Disk Boot Device 1
Boot0007 EFI Network 2
Boot0008 EFI Network 3
Boot0009 EFI Network 4
Boot000A* EFI Fixed Disk Boot Device 1
Boot000B* ubuntu
Removing duplicate EFI entry (0007, {'name': 'EFI Network 2', 'path': 'PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbcdee55,1)/IPv4(0.0.0.00.0.0.0,0,0)'})
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpcty9a921/target', 'efibootmgr', '--bootnum=0007', '--delete-bootnum'] with allowed return codes [0] (capture=False)
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 000B,0008,0009,0004
Boot0000* EFI Network 1
Boot0001* EFI Network 2
Boot0002* EFI Network 3
Boot0003* EFI Network 4
Boot0004* EFI Fixed Disk Boot Device 1
Boot0005* EFI Fixed Disk Boot Device 1
Boot0008 EFI Network 3
Boot0009 EFI Network 4
Boot000A* EFI Fixed Disk Boot Device 1
Boot000B* ubuntu
Removing duplicate EFI entry (0008, {'name': 'EFI Network 3', 'path': 'PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x2)/MAC(ecf4bbcdee56,1)/IPv4(0.0.0.00.0.0.0,0,0)'})
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpcty9a921/target', 'efibootmgr', '--bootnum=0008', '--delete-bootnum'] with allowed return codes [0] (capture=False)
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 000B,0009,0004
Boot0000* EFI Network 1
Boot0001* EFI Network 2
Boot0002* EFI Network 3
Boot0003* EFI Network 4
Boot0004* EFI Fixed Disk Boot Device 1
Boot0005* EFI Fixed Disk Boot Device 1
Boot0009 EFI Network 4
Boot000A* EFI Fixed Disk Boot Device 1
Boot000B* ubuntu
Removing duplicate EFI entry (0009, {'name': 'EFI Network 4', 'path': 'PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x3)/MAC(ecf4bbcdee57,1)/IPv4(0.0.0.00.0.0.0,0,0)'})
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpcty9a921/target', 'efibootmgr', '--bootnum=0009', '--delete-bootnum'] with allowed return codes [0] (capture=False)
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 000B,0004
Boot0000* EFI Network 1
Boot0001* EFI Network 2
Boot0002* EFI Network 3
Boot0003* EFI Network 4
Boot0004* EFI Fixed Disk Boot Device 1
Boot0005* EFI Fixed Disk Boot Device 1
Boot000A* EFI Fixed Disk Boot Device 1
Boot000B* ubuntu
Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
TIMED subp(['udevadm', 'settle']): 0.009
Running command ['umount', '/tmp/tmpcty9a921/target/sys/firmware/efi/efivars'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpcty9a921/target/sys'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpcty9a921/target/run'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpcty9a921/target/proc'] with allowed return codes [0] (capture=False)
Running command ['umount', '/tmp/tmpcty9a921/target/dev'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/dev', '/tmp/tmpc...


summary: - 2.8.2 deploy and commission fails corrupted bootorder variable detected
+ 2.8.2 fails corrupted bootorder variable detected
David van der Spek (vanderspek-david) wrote :
summary: - 2.8.2 fails corrupted bootorder variable detected
+ 2.8.2 deploy and commission fails corrupted bootorder variable detected
Ryan Harper (raharper) wrote :

Thanks for filing this bug and linking to the other EFI related one. They aren't the same.
This looks like a bug in curtin's remove duplicate code:

Removing duplicate EFI entry (0006, {'name': 'EFI Network 1', 'path': 'PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(ecf4bbcdee54,1)/IPv4(0.0.0.00.0.0.0,0,0)'})
Removing duplicate EFI entry (0007, {'name': 'EFI Network 2', 'path': 'PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbcdee55,1)/IPv4(0.0.0.00.0.0.0,0,0)'})

Those entries aren't duplicates; the MAC addresses in the 'path' element differ. Also, I don't think we should ever attempt to remove the BootCurrent entry, and there should be a curtin.grub config option to enable/disable duplicate removal.

Changed in curtin (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
David van der Spek (vanderspek-david) wrote :

I'd like to quickly add that reverting back to MAAS version: 2.8.1 (8567-g.c4825ca06) has allowed me to successfully deploy machines again. Are there any more logs I could send or testing I could do to help solve this issue?

Dan Watkins (oddbloke)
Changed in curtin (Ubuntu):
status: Confirmed → Triaged
Ryan Harper (raharper) wrote :

Hi David,

Would you be able to attach:

1) efibootmgr -v output (with all of the network entries present)? It appears that there
may be some duplicate entries in the menu; I'd like to confirm that by examining the verbose
output, specifically Boot0000 -> Boot0003.

Boot0000* EFI Network 1
Boot0001* EFI Network 2
Boot0002* EFI Network 3
Boot0003* EFI Network 4

2) could you attach the deploy log from your successful deployment?

Changed in curtin (Ubuntu):
status: Triaged → Incomplete
David van der Spek (vanderspek-david) wrote :

Hi Ryan,

I just successfully deployed OpenStack using MAAS 2.8.1. I did need to remove and re-add the machines from MAAS before I could get them to work again. Using juju to SSH into the three bare metal machines, this is the output of efibootmgr. In the attachment I have the installation logs from MAAS for the 3 machines, in the same order. I hope these logs are useful; if not, I can retry deploying with 2.8.2 or do whatever else might be needed.

BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0004,0010,000C,000D,000E,000F
Boot0000* EFI Network 1 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(ecf4bbce7b8c,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0001 EFI Network 2 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbce7b8d,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0002 EFI Network 3 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x2)/MAC(ecf4bbce7b8e,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0003 EFI Network 4 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x3)/MAC(ecf4bbce7b8f,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0004* EFI Network 1 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(ecf4bbce7b8c,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0005* EFI Fixed Disk Boot Device 1 PcieRoot(0x0)/Pci(0x2,0x2)/Pci(0x0,0x0)/Ctrl(0x0)/SCSI(0,0)/HD(1,GPT,a54cd15f-3741-405f-9075-7f60363b6f34,0x800,0x100000)
Boot0006* EFI Fixed Disk Boot Device 1 PcieRoot(0x0)/Pci(0x2,0x2)/Pci(0x0,0x0)/SAS(4433221105000000,0,1,NoTopology)/HD(1,GPT,5603b135-0290-4962-95a8-1cafd309d1fb,0x800,0x100000)
Boot0007* EFI Network 1 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(ecf4bbce7b8c,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0008* EFI Network 2 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbce7b8d,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0009* EFI Network 3 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x2)/MAC(ecf4bbce7b8e,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot000A* EFI Fixed Disk Boot Device 1 PcieRoot(0x0)/Pci(0x2,0x2)/Pci(0x0,0x0)/SAS(4433221105000000,0,1,NoTopology)/HD(1,GPT,df3108c7-4b8f-483b-8440-f738fb75c9e2,0x800,0x100000)
Boot000B* EFI Network 4 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x3)/MAC(ecf4bbce7b8f,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot000C* EFI Network 2 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbce7b8d,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot000D* EFI Network 3 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x2)/MAC(ecf4bbce7b8e,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot000E* EFI Network 4 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x3)/MAC(ecf4bbce7b8f,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot000F* EFI Fixed Disk Boot Device 1 PcieRoot(0x0)/Pci(0x2,0x2)/Pci(0x0,0x0)/SAS(4433221105000000,0,1,NoTopology)/HD(1,GPT,845a3fc2-60d2-4639-ad8b-b36806bdd567,0x800,0x100000)
Boot0010* ubuntu HD(1,GPT,845a3fc2-60d2-4639-ad8b-b36806bdd567,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)

BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,000B,0007,0008,0009,0004
Boot0000* EFI Network 1 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(ecf4bbcdee54,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0001* EFI Network 2 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)/MAC(ecf4bbcdee55,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0002* EFI Network 3 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x2)/MAC(ecf4bbcdee56,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0003* EFI Network 4 PcieRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x3)/MAC(ecf4bbcdee57,1)/IPv4(0.0.0.00.0.0.0,0,0)
Boot0004* EFI Fixed Disk Boot Device 1 PcieRoot(0x0...


Seth Tanner (sjtanner) wrote :

I am seeing the same issue with Dell R620s and MAAS 2.7.3; I had previously deployed nodes successfully with 2.7.1. We have other servers that are not Dell R620s, and they all deploy without issue.

Ryan Harper (raharper) wrote :

Thanks for the logs.

They certainly seem to be duplicate entries, so AFAICT, the removal seems reasonable.
However, removing BootCurrent shouldn't happen as we plan to boot back into that entry anyhow.

I've got a branch that I'll build and put into a PPA so you can test here shortly. It does two things:

1) does not remove BootCurrent if it is a duplicate
2) adds a new curtin grub config to enable users to override the duplicate removal behavior
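The two behaviours listed above can be sketched roughly as follows. This is a minimal illustration and not curtin's actual code: the function name `find_duplicate_entries`, the `remove_duplicates` flag, and the shortened 'path' values are assumptions made for the example; only the dictionary layout ('current', 'order', 'entries') follows the parsed efibootmgr output quoted elsewhere in this report.

```python
def find_duplicate_entries(efi_output, remove_duplicates=True):
    """Return boot numbers whose (name, path) repeat an earlier entry.

    BootCurrent is recorded as "seen" first, so it is never returned,
    even when another entry carries the same name and path.
    `remove_duplicates` is a hypothetical stand-in for the config switch.
    """
    if not remove_duplicates:
        return []
    entries = efi_output.get('entries', {})
    current = efi_output.get('current')
    seen = set()
    if current in entries:
        seen.add(tuple(sorted(entries[current].items())))
    duplicates = []
    for bootnum in sorted(entries):
        if bootnum == current:
            continue  # never offer BootCurrent for removal
        key = tuple(sorted(entries[bootnum].items()))
        if key in seen:
            duplicates.append(bootnum)
        else:
            seen.add(key)
    return duplicates


# Illustrative data: paths are shortened, not real efibootmgr output.
efi_output = {
    'current': '0006',
    'order': ['0006', '000B', '0007'],
    'entries': {
        '0000': {'name': 'EFI Network 1', 'path': 'MAC(ecf4bbcdee54,1)'},
        '0001': {'name': 'EFI Network 2', 'path': 'MAC(ecf4bbcdee55,1)'},
        '0006': {'name': 'EFI Network 1', 'path': 'MAC(ecf4bbcdee54,1)'},
        '0007': {'name': 'EFI Network 2', 'path': 'MAC(ecf4bbcdee55,1)'},
    },
}
print(find_duplicate_entries(efi_output))  # ['0000', '0007']
```

In this sketch entry 0000 is reported instead of 0006, because 0006 is BootCurrent and is registered before the scan starts.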

Ryan Harper (raharper) wrote :

Thanks for the logs.

Changed in curtin (Ubuntu):
assignee: nobody → Ryan Harper (raharper)
status: Incomplete → In Progress
Jeff Lane  (bladernr) wrote :

Hi Ryan, I can, I believe, reproduce this exact bug on one of the systems I'm trying to run cert on as well. I found this bug while in the process of filing my own. This is currently gating certification for that machine on Focal.

tags: added: blocks-hwcert-server
Ryan Harper (raharper) wrote :

Hi Jeff,

That's great. I'm just pushing this into a ppa for testing:

https://launchpad.net/~raharper/+archive/ubuntu/lp1894217

Groovy should publish any minute now, and I'll copy into focal/bionic/xenial pockets shortly after.

Please let me know if this fixes things.

Seth Tanner (sjtanner) wrote :

Hey Ryan,
The update to curtin solved the issue for us.

Ryan Harper (raharper) wrote :

@Seth

Thanks for verifying!

Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 06be71e1 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=06be71e1

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 20.1-29-g81144052-0ubuntu1

---------------
curtin (20.1-29-g81144052-0ubuntu1) groovy; urgency=medium

  * New upstream snapshot.
    - vmtest: Fix multiple issues with vmtest on master
    - Refactor uefi_remove_duplicates into find/remove functions for reuse
    - distro: run apt-get clean after dist-upgrade, install, upgrade
    - curthooks: UEFI remove dupes: don't remove BootCurrent, config option
      (LP: #1894217)
    - Pin the dependency on pyrsistent [Paride Legovini]
    - restore default of grub.update_nvram to True in install_grub
      [Michael Hudson-Doyle]
    - block: disk_to_byid_path handle missing /dev/disk/by-id directory
      (LP: #1876258)
    - UEFI: Handle missing BootCurrent entry when reordering UEFI entries
      (LP: #1789650)
    - dasd: fix off-by-one device_id devno range check [Paride Legovini]
    - curthooks: uefi_find_grub_device_ids handle type:mount without path
      (LP: #1892242)
    - netplan openvswitch yaml changed (LP: #1891608)
    - tools/curtainer: do not wait for snapd.seeded.service
    - tools/curtainer: enable using ubuntu-minimal images
    - vmtests: add Groovy [Paride Legovini]
    - Drop the Eoan vmtests (EOL) [Paride Legovini]
    - tools: rename remove-vmtest-release to vmtest-remove-release
    - Snooze the tests failing because of LP: #1861941 for two more months
      [Paride Legovini]
    - LP: #1671951 is Fix Released => Drop the PPA [Paride Legovini]
    - swaps: handle swapfiles on btrfs (LP: #1884161)
    - curtainer: fail is masking of zfs-mount or zfs-share fails
      [Paride Legovini]
    - multipath: handle multipath nvme name fields correctly (LP: #1878041)
    - curtainer: mask the zfs-mount and zfs-share services [Paride Legovini]
    - tools/jenkins-runner: shuffle test-cases to randomize load
      [Paride Legovini]
    - Add Trusty/UEFI/HWE-X vmtest, drop realpath add, drop shell code
    - LP: #1881977 - Install realpath on Trusty UEFI. [Lee Trager]
    - vmtests: fix PreservePartitionWipeVg storage config
    - Fix mdraid name creates broken configuration
      [James Falcon] (LP: #1803933)
    - vmtests: update skiptests
    - vmtest: allow installed centos images to reboot (LP: #1881011)

 -- Paride Legovini <email address hidden> Mon, 14 Sep 2020 17:53:15 +0200

Changed in curtin (Ubuntu):
status: In Progress → Fix Released
Jeff Lane  (bladernr) wrote :

This is installed on a MAAS server but installs still fail with the same issue:

ubuntu@maas-6-2-3rd-admin-nodes:~$ apt-cache policy curtin
curtin:
  Installed: 20.1-29-g81144052-0ubuntu1~18.04.1~dannf.1
  Candidate: 20.1-29-g81144052-0ubuntu1~18.04.1~dannf.1

Dann threw this into a PPA and installed it, since 20.1-29-g81144052-0ubuntu1 doesn't appear in updates:

 curtin | 20.1-2-g42a9667f-0ubuntu1~18.04.1 | bionic-updates | source, all
 curtin | 20.1-2-g42a9667f-0ubuntu1~20.04.1 | focal-updates | source, all

With 20.1-29 on the MAAS server (and presumably since this has not landed in updates, it's not being installed on the ephemeral either) installs are still failing:

https://pastebin.canonical.com/p/976FvZDDHd/

Changed in curtin (Ubuntu):
status: Fix Released → Confirmed
nb (nateybobo) wrote :

Hi @Ryan,

I'm suffering from this bug at the moment as a MAAS user: https://discourse.maas.io/t/maas-is-changing-my-boot-order/3491

I have MAAS installed via snap, which installed python3-curtin via: https://github.com/maas/maas/blob/master/snap/snapcraft.yaml

Is there a work around for me?

Ryan Harper (raharper) wrote :

@Jeff I cannot read the canonical pastebin output; can you use paste.ubuntu?

re: -updates; MAAS needs to initiate an SRU for archive users of MAAS.

@nb

I don't know for sure. I was told that MAAS snaps would pull curtin from git master (or the respective release branches).

dann frazier (dannf) wrote :

In comment #16 Jeff noticed that I had backported a package to see if it fixed the issue we were seeing. It did not, but it did have the patches intended to fix this issue. So I think we're seeing an additional issue. I added some debugging code to figure out what was going on:

@@ -487,15 +489,18 @@ def uefi_find_duplicate_entries(grubcfg,
     to_remove = []
     if efi_output is None:
         efi_output = util.get_efibootmgr(target=target)
+        LOG.info("DANNF: %s", efi_output)
     entries = efi_output.get('entries', {})
     current_bootnum = efi_output.get('current', None)
     # adding BootCurrent to seen first allows us to remove any other duplicate
     # entry of BootCurrent.

With this, I see:
DANNF: {'current': '0003', 'timeout': '5 seconds', 'order': ['0000'], 'entries': {'0000': {'name': 'ubuntu', 'path': 'HD(1,GPT,e45478ae-8e68-4171-a165-5a8ac6654f30,0x800,0x100000)/File(\\EFI\\ubuntu\\shimx64.efi)'}}}

Notice we have no entry for current/0003. I have not inspected the code to determine whether or not this is an issue parsing efibootmgr output or the actual variable state.
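The inconsistency described here can be checked mechanically. The following is a minimal sketch, not part of curtin: the helper name `missing_boot_entries` is hypothetical, and it only assumes the dictionary layout shown in the DANNF log line above.

```python
def missing_boot_entries(efi_output):
    """List boot numbers referenced by BootOrder or BootCurrent that
    have no matching BootXXXX entry in the parsed output."""
    entries = efi_output.get('entries', {})
    referenced = list(efi_output.get('order', []))
    current = efi_output.get('current')
    if current and current not in referenced:
        referenced.append(current)
    return [bootnum for bootnum in referenced if bootnum not in entries]


# Parsed state copied from the DANNF log line above.
efi_output = {
    'current': '0003',
    'timeout': '5 seconds',
    'order': ['0000'],
    'entries': {
        '0000': {
            'name': 'ubuntu',
            'path': 'HD(1,GPT,e45478ae-8e68-4171-a165-5a8ac6654f30,'
                    '0x800,0x100000)/File(\\EFI\\ubuntu\\shimx64.efi)',
        },
    },
}
print(missing_boot_entries(efi_output))  # ['0003']
```

A non-empty result does not say whether the parser or the firmware is at fault; it only flags that a lookup such as `entries[current_bootnum]` would fail.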

I made a few hacks which allowed me to deploy:

Index: curtin-20.1-2-g42a9667f/curtin/commands/curthooks.py
===================================================================
--- curtin-20.1-2-g42a9667f.orig/curtin/commands/curthooks.py
+++ curtin-20.1-2-g42a9667f/curtin/commands/curthooks.py
@@ -451,7 +451,9 @@ def uefi_reorder_loaders(grubcfg, target
         if currently_booted:
             if currently_booted in boot_order:
                 boot_order.remove(currently_booted)
-            boot_order = [currently_booted] + boot_order
+            entries = efi_output.get('entries', {})
+            if currently_booted in entries.keys():
+                boot_order = [currently_booted] + boot_order
             new_boot_order = ','.join(boot_order)
             LOG.debug(
                 "Setting currently booted %s as the first "
@@ -487,15 +489,18 @@ def uefi_find_duplicate_entries(grubcfg,
     to_remove = []
     if efi_output is None:
         efi_output = util.get_efibootmgr(target=target)
+        LOG.info("DANNF: %s", efi_output)
     entries = efi_output.get('entries', {})
     current_bootnum = efi_output.get('current', None)
     # adding BootCurrent to seen first allows us to remove any other duplicate
     # entry of BootCurrent.
-    if current_bootnum:
+    if current_bootnum and current_bootnum in entries.keys():
         seen.add(tuple(entries[current_bootnum].items()))
     for bootnum in sorted(entries):
         if bootnum == current_bootnum:
             continue
+        if bootnum not in entries.keys():
+            continue
         entry = entries[bootnum]
         t = tuple(entry.items())
         if t not in seen:

Post-deploy I checked to see if the no-entry-for-current issue persists, but it does not appear to:

$ sudo efibootmgr -v
BootCurrent: 0002
Timeout: 5 seconds
BootOrder: 0003,0004,0005,0002,0006,0001,0000
Boot0000 ubuntu HD(1,GPT,e45478ae-8e68-4171-a165-5a8ac6654f30,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)
Boot0002* UEFI: NIC1 IPv4 Quanta Dual Port 10G BASE-T Mezzanine PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0...


Ryan Harper (raharper) wrote :

@Dann Thanks.

Can you provide the log without your changes? Or at least the efibootmgr dump before curtin starts processing things?

> Notice we have no entry for current/0003. I have not inspected the code to determine whether or not this is an issue parsing efibootmgr output or the actual variable state.

Is this not the bug? The full logs will show what efibootmgr -v shows. If that state is not good (BootCurrent is set, but there is no entry to match), then that's a firmware bug (IMHO). If the current entry is present, then we likely do have a parsing bug.

Either way the full log with efibootmgr -v dump in the logs will shed light on what to do next.

Changed in curtin (Ubuntu):
status: Confirmed → Incomplete
nb (nateybobo) wrote :

@Ryan, FYI - I ran a successful test using a rebuilt Snap from the latest Curtin PPA:
snap refresh maas --channel=latest/edge/curtin-stable

My actions can be found here:
https://discourse.maas.io/t/maas-is-changing-my-boot-order/3491/9

dann frazier (dannf) wrote : Re: [Bug 1894217] Re: 2.8.2 deploy and commission fails corrupted bootorder variable detected

On Wed, Oct 7, 2020 at 4:30 PM Ryan Harper <email address hidden> wrote:
>
> @Dann Thanks.
>
> Can you provide the log without your changes? Or at least the
> efibootmgr dump before curtin starts processing things?

Well nuts. I got access to the system again and added that debug, and
now the problem has vanished:

[ 188.557031] cloud-init[4885]: DANNF:
[ 188.557313] cloud-init[4885]: BootCurrent: 0002
[ 188.557631] cloud-init[4885]: Timeout: 5 seconds
[ 188.558379] cloud-init[4885]: BootOrder: 0004,0005,0002,0006,0003,0001,0000
[ 188.560161] cloud-init[4885]: Boot0000 ubuntu HD(1,GPT,afcc79a9-d867-481c-8d98-fa0fb8c6063c,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)
[ 188.562671] cloud-init[4885]: Boot0002* UEFI: NIC1 IPv4 Quanta Dual Port 10G BASE-T Mezzanine PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(2c600c6fbb15,1)/IPv4(0.0.0.00.0.0.0,0,0)..BO

.. and the system deploys fine (w/o my workaround hack). I'm pretty
sure I've seen issues like this before with early UEFI dev boards -
probably just need to figure out the right set of circumstances to
recreate it. I'll keep poking at it.

Chad Smith (chad.smith)
Changed in curtin (Ubuntu):
status: Incomplete → Fix Committed
dann frazier (dannf) wrote :

@chad.smith - was there an additional fix applied for the issues being discussed in comments #16 & further? Are you looking for us to open a separate issue for that?

dann frazier (dannf) wrote :

@raharper - I was able to reproduce on a second system of the same model/firmware, and the debug log is attached, with the following change:

Index: curtin-20.1-29-g81144052/curtin/util.py
===================================================================
--- curtin-20.1-29-g81144052.orig/curtin/util.py
+++ curtin-20.1-29-g81144052/curtin/util.py
@@ -886,6 +886,7 @@ def get_efibootmgr(target=None):
     """
     with ChrootableTarget(target=target) as in_chroot:
         stdout, _ = in_chroot.subp(['efibootmgr', '-v'], capture=True)
+        LOG.info("DANNF:\n%s", stdout)
         output = parse_efibootmgr(stdout)
         return output

Ryan Harper (raharper) wrote :

Thanks for the logs Dann,

The error comes from efibootmgr itself; so I don't think this is the same issue. You can open a new bug; but it still looks like a firmware/platform issue.

efibootmgr claims there are 5 entries; and at least after a grub install, there is only one. What happened to all of those entries?

BootCurrent: 0003
Timeout: 10 seconds
BootOrder: 0000,0003,0004,0005,0006,0001
Boot0000* ubuntu HD(1,GPT,0937ffdf-628c-4161-8b2f-5920235669c6,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)

This bug is in curtin, and the change that's merged prevents curtin from removing the BootCurrent entry, 0003, which it did not do; otherwise we'd see:

Removing duplicate EFI entry
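The mismatch in the dump above (BootOrder names several entries, but only Boot0000 is listed) can be reproduced from the raw text with a small parser. This is a rough sketch and not curtin's `parse_efibootmgr`: the function name `parse_efibootmgr_text` and the regular expression are assumptions for illustration, and each entry is kept as its raw remainder rather than split into name and path.

```python
import re

# Matches lines such as "Boot0000* ubuntu HD(...)"; the '*' marks an active entry.
ENTRY_RE = re.compile(r'^Boot(?P<num>[0-9A-Fa-f]{4})(?P<active>\*?)\s+(?P<rest>.*)$')


def parse_efibootmgr_text(text):
    """Parse BootCurrent, BootOrder and BootXXXX lines from efibootmgr text."""
    result = {'current': None, 'order': [], 'entries': {}}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('BootCurrent:'):
            result['current'] = line.split(':', 1)[1].strip()
        elif line.startswith('BootOrder:'):
            value = line.split(':', 1)[1].strip()
            result['order'] = [b for b in value.split(',') if b]
        else:
            match = ENTRY_RE.match(line)
            if match:
                result['entries'][match.group('num')] = match.group('rest')
    return result


# The dump quoted in the comment above.
dump = r"""BootCurrent: 0003
Timeout: 10 seconds
BootOrder: 0000,0003,0004,0005,0006,0001
Boot0000* ubuntu HD(1,GPT,0937ffdf-628c-4161-8b2f-5920235669c6,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)
"""

parsed = parse_efibootmgr_text(dump)
missing = [b for b in parsed['order'] if b not in parsed['entries']]
print(missing)  # ['0003', '0004', '0005', '0006', '0001']
```

The five boot numbers reported as missing are the ones BootOrder references without a corresponding Boot line, including BootCurrent 0003.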

dann frazier (dannf)
Changed in maas:
status: New → Invalid
dann frazier (dannf) wrote :

OK, split out into bug 1899993

Paride Legovini (paride) wrote :

I'm setting this to Fix Released again, as the fix released with curtin 20.1-29-g81144052-0ubuntu1 did address the curtin bug. The remaining issue (LP: #1899993) was in the kernel and that's Fix Released too.

Changed in curtin (Ubuntu):
status: Fix Committed → Fix Released