UEFI installation overrides firmware PXE-boot settings

Bug #1311827 reported by Rod Smith
This bug affects 6 people
Affects          Status        Importance  Assigned to  Milestone
MAAS             Fix Released  High        Blake Rouse
curtin           Fix Released  High        Blake Rouse
curtin (Ubuntu)  Fix Released  Medium      Unassigned
  Trusty         Fix Released  Medium      Unassigned
  Vivid          Fix Released  Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Description]
Installing a node configured to boot in EFI/UEFI mode results in Ubuntu registering GRUB with the firmware, thus overriding the node's original PXE-boot setting. This can be seen via efibootmgr after the node has been started by running 'efibootmgr -v'.

The solution is to invoke grub-install with '--no-nvram'. That instructs GRUB not to update the firmware's NVRAM boot variables, which leaves the node's original PXE-boot setting intact.

Curtin detects whether grub-install supports the --no-nvram flag and uses it when it is available.
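
The detection step can be sketched roughly as follows. This is a minimal illustration, not curtin's actual code: the function names are hypothetical, and it assumes that support can be inferred from the text printed by 'grub-install --help'. The '--target' and '--efi-directory' values are example defaults only.

```python
import subprocess


def supports_no_nvram(help_text):
    """Return True if grub-install's help output mentions --no-nvram."""
    return "--no-nvram" in help_text


def build_grub_install_cmd(help_text, target="x86_64-efi", efi_dir="/boot/efi"):
    """Build a grub-install command line (hypothetical helper).

    --no-nvram is appended only when the installed grub-install
    advertises it, so older GRUB versions keep working unchanged.
    """
    cmd = ["grub-install", "--target=" + target, "--efi-directory=" + efi_dir]
    if supports_no_nvram(help_text):
        # Leave the firmware's Boot#### and BootOrder variables untouched.
        cmd.append("--no-nvram")
    return cmd


def probe_help_text():
    """Capture `grub-install --help`; assumes grub-install is on PATH."""
    result = subprocess.run(
        ["grub-install", "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

On a release whose GRUB lacks the flag (12.04, per the [Other] note below), the sketch simply falls back to the plain command, which is why the failure path remains there.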

[Impact]
MAAS generally works by expecting a system to always request boot from the network. MAAS then instructs it to boot from the local disk when it is installed, and feeds it network boot information for install or commissioning or enlistment.

The issue here is that the installation configures the firmware to boot from the local disk first. The end result is that installation works once, but when the system is returned to MAAS, a subsequent installation will fail.

[Test Case]
a.) enlist an EFI/UEFI system into MAAS with network boot configured
b.) acquire and start system to install 14.04
c.) release system
d.) acquire and start system to install 14.04

Currently step 'd' will boot from local disk. After fix is applied, system will boot from network and perform install correctly.

[Regression Potential]
Regression seems fairly unlikely. I can't think of a specific case that would fail that previously succeeded.

[Other]
As grub in 12.04 does not support the --no-nvram flag, installation of 12.04 will still cause this failure path.

=== End SRU Template ===

Installing a node configured to boot in EFI/UEFI mode results in Ubuntu registering GRUB with the firmware, thus overriding the node's original PXE-boot setting. This can be seen via efibootmgr after the node has been started:

$ sudo efibootmgr -v
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,000E,0007,0008,0009,000A,000B,000C,000D
Boot0000 Setup
Boot0001 Boot Menu
Boot0002 Diagnostic Splash Screen
Boot0003 Lenovo Diagnostics
Boot0004 Startup Interrupt Menu
Boot0005 ME Configuration Menu
Boot0006* ubuntu HD(1,800,100000,2017753f-1b99-424d-811b-91c2e8c3c816)File(\EFI\ubuntu\shimx64.efi)
Boot0007* USB CD 030a2400d23878bc820f604d8316c068ee79d25b86701296aa5a7848b66cd49dd3ba6a55
Boot0008* USB FDD 030a2400d23878bc820f604d8316c068ee79d25b6ff015a28830b543a8b8641009461e49
Boot0009* ATAPI CD0 030a2500d23878bc820f604d8316c068ee79d25baea2090adfde214e8b3a5e471856a35401
Boot000A* ATA HDD0 030a2500d23878bc820f604d8316c068ee79d25b91af625956449f41a7b91f4f892ab0f600
Boot000B* ATA HDD1 030a2500d23878bc820f604d8316c068ee79d25b91af625956449f41a7b91f4f892ab0f601
Boot000C* ATA HDD2 030a2500d23878bc820f604d8316c068ee79d25b91af625956449f41a7b91f4f892ab0f602
Boot000D* USB HDD 030a2400d23878bc820f604d8316c068ee79d25b33e821aaaf33bc4789bd419f88c50803
Boot000E* PCI LAN 030a2400d23878bc820f604d8316c068ee79d25b78a84aaf2b2afc4ea79cf5cc8f3d3803
Boot000F* IDER BOOT CDROM ACPI(a0341d0,0)PCI(16,2)ATAPI(0,1,0)
Boot0010* IDER BOOT Floppy ACPI(a0341d0,0)PCI(16,2)ATAPI(0,0,0)
Boot0012 Rescue and Recovery

Note that the system's BootOrder is set to load \EFI\ubuntu\shimx64.efi (item Boot0006, which in turn launches GRUB) from the hard disk first, and to PXE-boot (item Boot000E) second. When the process began, it was set to PXE-boot first.
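
To make that reading of the output concrete, here is a small sketch that parses 'efibootmgr -v' text and reports which entry the firmware will try first. The helper names are hypothetical and the parsing is deliberately simplistic (it assumes the line layout shown above); SAMPLE is an abbreviated excerpt of the listing in this report.

```python
import re

# Abbreviated excerpt of the efibootmgr output quoted in this report.
SAMPLE = r"""BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,000E,0007
Boot0000 Setup
Boot0006* ubuntu HD(1,800,100000,2017753f-1b99-424d-811b-91c2e8c3c816)File(\EFI\ubuntu\shimx64.efi)
Boot0007* USB CD 030a2400d23878bc820f604d8316c068ee79d25b86701296aa5a7848b66cd49dd3ba6a55
Boot000E* PCI LAN 030a2400d23878bc820f604d8316c068ee79d25b78a84aaf2b2afc4ea79cf5cc8f3d3803
"""


def parse_efibootmgr(output):
    """Return (boot_order, entries) parsed from `efibootmgr -v` text.

    boot_order is a list of four-hex-digit entry ids; entries maps each
    id to the rest of its line (label plus device path).
    """
    boot_order = []
    entries = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("BootOrder:"):
            ids = line.split(":", 1)[1].split(",")
            boot_order = [i.strip() for i in ids if i.strip()]
            continue
        match = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)$", line)
        if match:
            entries[match.group(1)] = match.group(2)
    return boot_order, entries


def first_boot_target(output):
    """Return the description of the first BootOrder entry, or None."""
    boot_order, entries = parse_efibootmgr(output)
    if not boot_order:
        return None
    return entries.get(boot_order[0])


if __name__ == "__main__":
    print(first_boot_target(SAMPLE))
```

Run against the excerpt, the first target is the "ubuntu" entry rather than "PCI LAN", which is the symptom described here.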

This works fine for bringing up a node initially, but it's different from the way a BIOS-mode installation works, in which the system is left PXE-booting by default. The UEFI approach will result in an inability to delete the node and re-enlist it, or even to stop the node and then re-start it and have Ubuntu re-installed, without either manually running efibootmgr or deleting the file or partition to which the boot manager entry points.

I'm attaching my MAAS log files. The UEFI-booting system is 192.168.0.56.

Here's the MAAS server version information:

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================================================-===================================================-============-===============================================================================
ii maas 1.5+bzr2252-0ubuntu1 all MAAS server all-in-one metapackage
ii maas-cli 1.5+bzr2252-0ubuntu1 all MAAS command line API tool
ii maas-cluster-controller 1.5+bzr2252-0ubuntu1 all MAAS server cluster controller
ii maas-common 1.5+bzr2252-0ubuntu1 all MAAS server common files
ii maas-dhcp 1.5+bzr2252-0ubuntu1 all MAAS DHCP server
ii maas-dns 1.5+bzr2252-0ubuntu1 all MAAS DNS server
ii maas-region-controller 1.5+bzr2252-0ubuntu1 all MAAS server complete region controller
ii maas-region-controller-min 1.5+bzr2252-0ubuntu1 all MAAS Server minimum region controller
ii python-django-maas 1.5+bzr2252-0ubuntu1 all MAAS server Django web framework
ii python-maas-client 1.5+bzr2252-0ubuntu1 all MAAS python API client
ii python-maas-provisioningserver 1.5+bzr2252-0ubuntu1 all MAAS server

Rod Smith (rodsmith) wrote :
tags: added: server-hwe
Changed in maas:
assignee: nobody → Jason Hobbs (jason-hobbs)
Changed in maas:
assignee: Jason Hobbs (jason-hobbs) → Blake Rouse (blake-rouse)
Changed in maas:
status: New → Triaged
importance: Undecided → High
Blake Rouse (blake-rouse) wrote :

With which installation method were you seeing this issue? Was it with d-i and/or curtin?

Rod Smith (rodsmith) wrote :

I think I confirmed it with both methods, but I'm more positive of the d-i install.

Blake Rouse (blake-rouse) wrote :

Do you know of a way this can be fixed? What would be a way to stop this from occurring? If it's occurring in d-i then it's also a standard Ubuntu installer issue.

Rod Smith (rodsmith) wrote :

I've just double-checked, and it does the same thing in the fastpath install as in the d-i install.

This is not a problem when doing a conventional (non-MAAS) install because it's the way EFI/UEFI is supposed to work; the OS registers its (on-disk) boot loader with the firmware, which then launches the boot loader based on its own built-in boot manager's rules. The problem is that this EFI convention doesn't work well with MAAS, or at least MAAS should modify it so that the MAAS server can retain control of its nodes' boot processes. As it stands now, MAAS is unable to re-start an EFI-booting node once it's been stopped unless somebody takes manual steps (to delete GRUB or to modify the boot order).

I don't know the details of the various scripts involved in the various installers, but from my perspective, the simplest way to prevent this problem from occurring is to have the installers NOT call efibootmgr to register GRUB with the firmware. This might require changes to MAAS scripts and/or to the GRUB installation scripts. It's conceivable that something else (system update tools, for instance) might also call efibootmgr to re-register GRUB, so inquiries should be made about that.

If the GRUB scripts automatically set it up, and if the GRUB package maintainers are unwilling to change it, then the second-best option is to have MAAS remove the GRUB entries from the NVRAM, or at least change the BootOrder variable to omit GRUB. This will entail parsing the efibootmgr output to identify GRUB so that its entry can be removed or omitted from BootOrder.
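
As a rough sketch of that second-best option, the following computes a BootOrder that omits any entry pointing into \EFI\ubuntu\ and builds the matching 'efibootmgr -o' command. The helper names and the path marker are assumptions made for illustration; this is not code from MAAS or curtin, and nothing here is executed against the firmware.

```python
import re

# Abbreviated excerpt of the efibootmgr output quoted in this report.
SAMPLE = r"""BootOrder: 0006,000E,0007
Boot0006* ubuntu HD(1,800,100000,2017753f-1b99-424d-811b-91c2e8c3c816)File(\EFI\ubuntu\shimx64.efi)
Boot0007* USB CD 030a2400d23878bc820f604d8316c068ee79d25b86701296aa5a7848b66cd49dd3ba6a55
Boot000E* PCI LAN 030a2400d23878bc820f604d8316c068ee79d25b78a84aaf2b2afc4ea79cf5cc8f3d3803
"""


def bootorder_without_grub(output, marker="\\EFI\\ubuntu\\"):
    """Return the BootOrder ids with GRUB/shim entries filtered out.

    An entry counts as GRUB if its device path contains `marker`
    (an assumption; other distributions use other directories).
    """
    order = []
    grub_ids = set()
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("BootOrder:"):
            ids = line.split(":", 1)[1].split(",")
            order = [i.strip().upper() for i in ids if i.strip()]
            continue
        match = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)$", line)
        if match and marker.lower() in match.group(2).lower():
            grub_ids.add(match.group(1).upper())
    return [i for i in order if i not in grub_ids]


def efibootmgr_order_cmd(order):
    """Build the efibootmgr command that would set the given BootOrder."""
    if not order:
        raise ValueError("refusing to set an empty BootOrder")
    return ["efibootmgr", "-o", ",".join(order)]


if __name__ == "__main__":
    print(" ".join(efibootmgr_order_cmd(bootorder_without_grub(SAMPLE))))
```

For the excerpt above this yields an order starting with 000E (PCI LAN), so the node would PXE-boot first again. Only BootOrder is changed here; the Boot0006 entry itself is left in NVRAM.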

Changed in maas:
assignee: Blake Rouse (blake-rouse) → nobody
JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
Changed in curtin:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
status: Triaged → In Progress
Changed in curtin:
status: Triaged → In Progress
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
milestone: none → 1.8.0
Changed in curtin:
status: In Progress → Fix Committed
Changed in maas:
status: In Progress → Fix Committed
Scott Moser (smoser)
Changed in curtin (Ubuntu):
status: New → Confirmed
status: Confirmed → Fix Released
Changed in curtin (Ubuntu Trusty):
status: New → Confirmed
Changed in curtin (Ubuntu Vivid):
status: New → Confirmed
Changed in curtin (Ubuntu):
importance: Undecided → Medium
Changed in curtin (Ubuntu Trusty):
importance: Undecided → Medium
Changed in curtin (Ubuntu Vivid):
importance: Undecided → Medium
Scott Moser (smoser)
description: updated
Changed in maas:
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Please test proposed package

Hello Roderick, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: Confirmed → Fix Committed
tags: added: verification-needed
Chris J Arges (arges) wrote :

Hello Roderick, or anyone else affected,

Accepted curtin into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr221-0ubuntu1~14.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Vivid):
status: Confirmed → Fix Committed
Scott Moser (smoser) wrote :

I've installed a UEFI system using curtin today and verified that it did not set boot from network.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 14:31:14 -0400

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr221-0ubuntu1~14.10.1

---------------
curtin (0.1.0~bzr221-0ubuntu1~14.10.1) vivid-proposed; urgency=medium

  * New upstream snapshot.
    - support installation to multipath devices. (LP: #1371634)
    - know that kernel version 4.2.0 maps to linux-generic-lts-wily
    - support install to arm64 systems that use UEFI for boot (LP: #1447834)
    - fix remaining usage of 'lsblk --out' rather than 'lsblk --output'
      (LP: #1386275)
    - retry 'apt-get update' on failure to avoid transient failures
      (LP: #1403133)
    - run udevadm settle before unmounting /dev in a target to avoid transient
      failures (LP: #1462139)
    - fixes and additions to tools used in development.
    - Add --no-nvram to the grub-install command for UEFI. (LP: #1311827)
    - avoid race condition and transient failure due busy device in mkfs
      (LP: #1443542)
    - improvements to device and partition naming code which allow installation
      devices with HP cciss smart array drives(LP: #1401190, #1263181)
    - do not consider devices < 1G as installable targets
  * debian/README.source fix doc on how to create new upstream snapshots

 -- Scott Moser <email address hidden> Wed, 24 Jun 2015 16:12:59 -0400

Changed in curtin (Ubuntu Vivid):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released