cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades

Bug #1889555 reported by Dimitri John Ledkov
This bug affects 1 person
Affects               Status         Importance   Assigned to   Milestone
cloud-init (Ubuntu)   Fix Released   Undecided    Dan Watkins
  Xenial              Fix Released   Undecided    Unassigned
  Bionic              Fix Released   Undecided    Unassigned
  Focal               Fix Released   Undecided    Unassigned
  Groovy              Fix Released   Undecided    Dan Watkins

Bug Description

=== Begin SRU Template ===
[Impact]
Older versions of cloud-init could misconfigure grub on nvme devices,
which could prevent instances from booting after a grub upgrade.

[Test Case]
For focal, bionic, and xenial verify the following:
1. on an affected instance, test that installing the new version of cloud-init appropriately updates debconf
2. on an affected instance, modify one of the debconf settings and test that installing the new version of cloud-init does not touch those values
3. in a container, confirm that cloud-init does not touch the values
4. in an unaffected instance (i.e. one without an NVMe root), confirm that cloud-init does not touch the values

Steps for test 1:
# Find an old affected image with
aws ec2 describe-images --filters "Name=name,Values=Ubuntu <release number>*"

# Launch an AWS instance with the affected image-id

# After startup, connect via SSH, then
# Verify we're on an nvme device
lsblk | grep nvme

# Verify install_devices set incorrectly
debconf-show grub-pc | grep "install_devices:"

# update cloud-init to proposed
mirror=http://archive.ubuntu.com/ubuntu
echo deb $mirror $(lsb_release -sc)-proposed main | tee /etc/apt/sources.list.d/proposed.list
apt-get update -q
apt-get install -qy cloud-init

# Verify "Reconfiguring grub" message in upgrade output

# Verify install_devices set correctly
debconf-show grub-pc | grep "install_devices:"

# Verify that after reboot we can still connect

Steps for test 2:
# Find an old affected image with
aws ec2 describe-images --filters "Name=name,Values=Ubuntu <release number>*"

# Launch an AWS instance with the affected image-id

# After startup, connect via SSH, then
# Verify we're on an nvme device
lsblk | grep nvme

# Verify install_devices set incorrectly
debconf-show grub-pc | grep "install_devices:"

# Update install device to something (anything) else
echo 'set grub-pc/install_devices /dev/sdb' | debconf-communicate

# update cloud-init to proposed
mirror=http://archive.ubuntu.com/ubuntu
echo deb $mirror $(lsb_release -sc)-proposed main | tee /etc/apt/sources.list.d/proposed.list
apt-get update -q
apt-get install -qy cloud-init

# Verify no "Reconfiguring grub" message in upgrade output
# Verify install_devices not changed
debconf-show grub-pc | grep "install_devices:"

Steps for test 3:
# Launch an LXD container from an affected image
lxc launch <image> <container>

# Obtain bash shell
lxc exec <container> -- bash

# Check install_devices
debconf-show grub-pc | grep "install_devices:"

# Update cloud-init to proposed
mirror=http://archive.ubuntu.com/ubuntu
echo deb $mirror $(lsb_release -sc)-proposed main | tee /etc/apt/sources.list.d/proposed.list
apt-get update -q
apt-get install -qy cloud-init

# Verify no "Reconfiguring grub" message in upgrade output
# Verify install_devices not changed
debconf-show grub-pc | grep "install_devices:"

Steps for test 4:
# Launch GCE image with:
gcloud compute instances create falcon-test --image <image> --image-project ubuntu-os-cloud --zone=us-central1-a

# After startup, connect via SSH, then
# Verify we're not on an nvme device
lsblk | grep nvme

# Check install_devices
debconf-show grub-pc | grep "install_devices:"

# update cloud-init to proposed

# Verify "Reconfiguring grub" message not in upgrade output

# Verify install_devices not changed
debconf-show grub-pc | grep "install_devices:"

# Verify that after reboot we can still connect
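
The before/after comparison of install_devices in the tests above can be scripted. The following is a minimal sketch, not part of the SRU template: the function name install_devices_value is hypothetical, and it assumes that debconf-show prints lines of the form "* grub-pc/install_devices: /dev/sda" (with or without the leading "*").

```shell
#!/bin/sh
# Hypothetical helper: read `debconf-show grub-pc` output on stdin and
# print only the value of grub-pc/install_devices.
# Related questions such as grub-pc/install_devices_empty are not matched,
# because the pattern requires a colon directly after "install_devices".
install_devices_value() {
    sed -n 's|^[* ]*grub-pc/install_devices: *||p'
}

# Possible usage on an instance (as root):
#   before=$(debconf-show grub-pc | install_devices_value)
#   ... upgrade cloud-init from -proposed ...
#   after=$(debconf-show grub-pc | install_devices_value)
#   [ "$before" = "$after" ] && echo "unchanged" || echo "changed: $before -> $after"
```

For test 1 the value is expected to change; for tests 2, 3 and 4 it is expected to stay the same.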

[Regression Potential]
If a user manually configured their system in such a way that both devices
exist and it matches our error condition, the grub install device
could be reconfigured incorrectly.

[Other Info]
Pull request: https://github.com/canonical/cloud-init/pull/514/files
Upstream commit: https://github.com/canonical/cloud-init/commit/f48acc2bdc41c347d2eb899038e2520383851103

==== Original Description ====
cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades

However, that only fixed the issue for the newly first-booted instances on nvme.

All existing boots of cloud-init on nvmes are still broken, and will fail to apply the latest grub2 update for BootHole mitigation.

Please add maintainer script changes to re-run cc_grub_dpkg, once only, when cloud-init is upgraded to a new SRU, to ensure that cc_grub_dpkg has been rerun once since the NVMe fixes.

You could guard this call so that it only runs if the grub-pc devices recorded in the debconf database do not exist on the instance (i.e. debconf has /dev/sda, and yet /dev/sda does not exist).
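
A guard of this kind might look roughly like the sketch below. It is an illustration only and not the code that was eventually shipped in the cloud-init postinst: the function name missing_devices is hypothetical, and it assumes the install_devices value is a comma- and/or space-separated list of device paths.

```shell
#!/bin/sh
# Hypothetical guard helper: given an install_devices value such as
# "/dev/sda, /dev/sdb", print every listed path that does not exist.
# Empty output means all configured devices are present.
missing_devices() {
    # Turn commas and spaces into newlines, then test each entry.
    printf '%s\n' "$1" | tr ', ' '\n\n' | while read -r dev; do
        [ -n "$dev" ] || continue
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
}

# Possible usage in a maintainer script (assumption, as root):
#   configured=$(debconf-show grub-pc | sed -n 's|^[* ]*grub-pc/install_devices: *||p')
#   if [ -n "$(missing_devices "$configured")" ]; then
#       echo "configured grub install device is missing; re-run cc_grub_dpkg"
#   fi
```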

information type: Public → Public Security
tags: added: regression-update
Revision history for this message
Dan Watkins (oddbloke) wrote :

OK, so the issue we're dealing with here is that bug 1877491 fixed the grub install device for _new_ NVMe instances, but it did not fix it on existing NVMe instances. So, for existing instances, they will still have an incorrect grub install device configured (something like /dev/sda).

grub has two parts: the core and its modules. These two components are expected to be updated in lockstep. If each component is using a different ABI (i.e. they are not in lockstep), then systems will fail to boot. The components are kept in lockstep by the grub packaging; it will install the core to the grub install device(s) to ensure this.

For NVMe systems which have an incorrect grub install device configured (i.e. any which were launched before 2020/07/15), the grub packaging will fail to perform this core installation. This means that the core and modules will be using incompatible ABIs, so such systems will fail to boot when next rebooted.

(Note that the core/modules ABI does not change on every update to the grub package, so this mismatched boot failure will not be observed on every grub update. This bug has been filed, however, because we _do_ have such a grub update pending/in progress.)

There is a grub bug (bug 1889556) for handling the case where the core and modules are mismatched, but the solution there will require manual user intervention. cloud-init can fix this for NVMe drives non-interactively by redetermining the grub install devices in its postinst using its existing logic.
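
To illustrate what "redetermining the grub install device" involves, here is a rough sketch that maps a root partition to the disk it lives on. It is not cloud-init's actual logic: the function name parent_disk is hypothetical, and the string handling only covers common naming schemes (nvme, sd, vd, xvd). On a real system, asking lsblk for the parent kernel name (lsblk -no PKNAME) is likely to be more robust.

```shell
#!/bin/sh
# Hypothetical helper: print the whole-disk device for a partition path.
# NVMe partitions are named <disk>p<N> (e.g. /dev/nvme0n1p1), whereas
# sd/vd/xvd partitions are named <disk><N> (e.g. /dev/sda1).
parent_disk() {
    case "$1" in
        /dev/nvme*p[0-9]*) printf '%s\n' "${1%p[0-9]*}" ;;   # strip "p<N>"
        /dev/nvme*)        printf '%s\n' "$1" ;;             # already a disk
        /dev/sd*|/dev/vd*|/dev/xvd*)
                           printf '%s\n' "${1%%[0-9]*}" ;;   # strip "<N>"
        *) return 1 ;;                                        # unknown scheme
    esac
}

# Possible usage (assumption, as root), reusing the debconf-communicate
# form from the test case in the bug description:
#   root_part=$(findmnt -n -o SOURCE /)
#   disk=$(parent_disk "$root_part") &&
#       echo "set grub-pc/install_devices $disk" | debconf-communicate
```

The point of the sketch is that an NVMe root such as /dev/nvme0n1p1 resolves to /dev/nvme0n1, not to a device like /dev/sda that does not exist on the instance.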

Dan Watkins (oddbloke)
Changed in cloud-init (Ubuntu Groovy):
status: New → In Progress
assignee: nobody → Dan Watkins (oddbloke)
Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Dimitri, or anyone else affected,

Accepted cloud-init into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/20.3-2-g371b392c-0ubuntu1~20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Chris Halse Rogers (raof) wrote : Proposed package upload rejected

An upload of cloud-init to bionic-proposed has been rejected from the upload queue for the following reason: "Unnecessary non-functional changes in ec2-dont-apply-full-imds-network-config.patch and renderer-do-not-prefer-netplan.patch".

Revision history for this message
Chris Halse Rogers (raof) wrote :

An upload of cloud-init to xenial-proposed has been rejected from the upload queue for the following reason: "Unnecessary non-functional changes in ec2-dont-apply-full-imds-network-config.patch and renderer-do-not-prefer-netplan.patch".

Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Dimitri, or anyone else affected,

Accepted cloud-init into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/20.3-2-g371b392c-0ubuntu1~18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in cloud-init (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic
Changed in cloud-init (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed-xenial
Revision history for this message
Chris Halse Rogers (raof) wrote :

Hello Dimitri, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/20.3-2-g371b392c-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Chad Smith (chad.smith)
description: updated
Revision history for this message
James Falcon (falcojr) wrote :
tags: added: verification-done verification-done-bionic verification-done-focal verification-done-xenial
removed: verification-needed verification-needed-bionic verification-needed-focal verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 20.3-15-g6d332e5c-0ubuntu1

---------------
cloud-init (20.3-15-g6d332e5c-0ubuntu1) groovy; urgency=medium

  * d/cloud-init.postinst: fix the grub install device for NVMe-rooted
    instances on upgrade. (LP: #1889555)
  * d/cloud-init.templates: add RbxCloud to Choices-C.
  * Add d/clean to fully clean the build artifacts.
  * d/control:
    - Bump Standards-Version to 4.5.0, no changes needed.
    - B-D on debhelper-compat; drop d/compat.
  * Bump the debhelper compat level to 13. Required changes:
    - Stop including the dh systemd plugin.
    - Switch from dh_systemd_start to dh_installsystemd
  * New upstream snapshot.
    - create a shutdown_command method in distro classes (#567)
      [Emmanuel Thomé]
    - user_data: remove unused constant (#566)
    - network: Fix type and respect name when rendering vlan in
      sysconfig. (#541) [Eduardo Otubo] (LP: #1788915, #1826608)
    - Retrieve SSH keys from IMDS first with OVF as a fallback (#509)
      [Thomas Stringer]
    - Add jqueuniet as contributor (#569) [Johann Queuniet]
    - distros: minor typo fix (#562)
    - Bump the integration-requirements versioned dependencies (#565)
      [Paride Legovini]
    - network-config-format-v1: fix typo in nameserver example (#564)
      [Stanislas]
    - Run cloud-init-local.service after the hv_kvp_daemon (#505)
      [Robert Schweikert]
    - Add method type hints for Azure helper (#540) [Johnson Shi]
    - systemd: add Before=shutdown.target when Conflicts=shutdown.target is
      used (#546) [Paride Legovini]
    - LXD: detach network from profile before deleting it (#542)
      [Paride Legovini] (LP: #1776958)
    - redhat spec: add missing BuildRequires (#552) [Paride Legovini]

 -- Chad Smith <email address hidden> Tue, 15 Sep 2020 20:19:10 -0600

Changed in cloud-init (Ubuntu Groovy):
status: In Progress → Fix Released
Revision history for this message
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 20.3-2-g371b392c-0ubuntu1~20.04.1

---------------
cloud-init (20.3-2-g371b392c-0ubuntu1~20.04.1) focal; urgency=medium

  * d/cloud-init.postinst: fix the grub install device for NVMe-rooted
    instances on upgrade. (LP: #1889555)
  * New upstream snapshot. (LP: #1893064)
    - util: remove debug statement (#556) [Joshua Powers]
    - Fix cloud config on chef example (#551) [lucasmoura]
    - Release 20.3 (#547) [James Falcon]
    - tox: bump the pylint version to 2.6.0 in the default run (#544)
      [Paride Legovini]
    - Azure: Add netplan driver filter when using hv_netvsc driver (#539)
      [James Falcon]
    - query: do not handle non-decodable non-gzipped content (#543)
    - DHCP sandboxing failing on noexec mounted /var/tmp (#521) [Eduardo Otubo]
    - Update the list of valid ssh keys. (#487) [Ole-Martin Bratteng]
    - cmd: cloud-init query to handle compressed userdata (#516)
    - Pushing cloud-init log to the KVP (#529) [Moustafa Moustafa]
    - Add Alpine Linux support. (#535) [dermotbradley]
    - Detect kernel version before swap file creation (#428) [Eduardo Otubo]
    - cli: add devel make-mime subcommand (#518)
    - user-data: only verify mime-types for TYPE_NEEDED and x-shellscript
      (#511)
    - DataSourceOracle: retry twice (and document why we retry at all) (#536)
    - Refactor Azure report ready code (#468) [Johnson Shi]
    - tox.ini: pin correct version of httpretty in xenial{,-dev} envs (#531)
    - Support Oracle IMDSv2 API (#528) [James Falcon]
    - .travis.yml: run a doc build during CI (#534)
    - doc/rtd/topics/datasources/ovf.rst: fix doc8 errors (#533)
    - Fix 'Users and Groups' configuration documentation (#530) [sshedi]
    - cloudinit.distros: update docstrings of add_user and create_user (#527)
    - Fix headers for device types in network v2 docs (#532)
      [Caleb Xavier Berger]
    - Add AlexBaranowski as contributor (#508) [Aleksander Baranowski]
    - DataSourceOracle: refactor to use only OPC v1 endpoint (#493)
    - .github/workflows/stale.yml: s/Josh/Rick/ (#526)
    - Fix a typo in apt pipelining module (#525) [Xiao Liang]
    - test_util: parametrize devlist tests (#523) [James Falcon]
    - Recognize LABEL_FATBOOT labels (#513) [James Falcon]
    - Handle additional identifier for SLES For HPC (#520) [Robert Schweikert]
    - Revert "test-requirements.txt: pin pytest to <6 (#512)" (#515)
    - test-requirements.txt: pin pytest to <6 (#512)
    - Add "tsanghan" as contributor (#504) [tsanghan]
    - fix brpm building
    - Adding eandersson as a contributor (#502) [Erik Olof Gunnar Andersson]
    - azure: disable bouncing hostname when setting hostname fails (#494)
      [Anh Vo]
    - VMware: Support parsing DEFAULT-RUN-POST-CUST-SCRIPT (#441)
      [xiaofengw-vmware]
    - DataSourceAzure: Use ValueError when JSONDecodeError is not available
      (#490) [Anh Vo]
    - cc_ca_certs.py: fix blank line problem when removing CAs and adding
      new one (#483) [dermotbradley]
    - freebsd: py37-serial is now py37-pyserial (#492) [Gonéri Le Bouder]
    - ssh exit with non-zero status on disabled user (#472) [Eduardo Otubo]
    - cloud...


Changed in cloud-init (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 20.3-2-g371b392c-0ubuntu1~18.04.1

---------------
cloud-init (20.3-2-g371b392c-0ubuntu1~18.04.1) bionic; urgency=medium

  * d/cloud-init.postinst: fix the grub install device for NVMe-rooted
    instances on upgrade. (LP: #1889555)
  * refresh patches:
   + debian/patches/ubuntu-advantage-revert-tip.patch
  * New upstream snapshot. (LP: #1893064)
    - util: remove debug statement (#556) [Joshua Powers]
    - Fix cloud config on chef example (#551) [lucasmoura]
    - Release 20.3 (#547) [James Falcon]
    - tox: bump the pylint version to 2.6.0 in the default run (#544)
      [Paride Legovini]
    - Azure: Add netplan driver filter when using hv_netvsc driver (#539)
      [James Falcon]
    - query: do not handle non-decodable non-gzipped content (#543)
    - DHCP sandboxing failing on noexec mounted /var/tmp (#521) [Eduardo Otubo]
    - Update the list of valid ssh keys. (#487) [Ole-Martin Bratteng]
    - cmd: cloud-init query to handle compressed userdata (#516)
    - Pushing cloud-init log to the KVP (#529) [Moustafa Moustafa]
    - Add Alpine Linux support. (#535) [dermotbradley]
    - Detect kernel version before swap file creation (#428) [Eduardo Otubo]
    - cli: add devel make-mime subcommand (#518)
    - user-data: only verify mime-types for TYPE_NEEDED and x-shellscript
      (#511)
    - DataSourceOracle: retry twice (and document why we retry at all) (#536)
    - Refactor Azure report ready code (#468) [Johnson Shi]
    - tox.ini: pin correct version of httpretty in xenial{,-dev} envs (#531)
    - Support Oracle IMDSv2 API (#528) [James Falcon]
    - .travis.yml: run a doc build during CI (#534)
    - doc/rtd/topics/datasources/ovf.rst: fix doc8 errors (#533)
    - Fix 'Users and Groups' configuration documentation (#530) [sshedi]
    - cloudinit.distros: update docstrings of add_user and create_user (#527)
    - Fix headers for device types in network v2 docs (#532)
      [Caleb Xavier Berger]
    - Add AlexBaranowski as contributor (#508) [Aleksander Baranowski]
    - DataSourceOracle: refactor to use only OPC v1 endpoint (#493)
    - .github/workflows/stale.yml: s/Josh/Rick/ (#526)
    - Fix a typo in apt pipelining module (#525) [Xiao Liang]
    - test_util: parametrize devlist tests (#523) [James Falcon]
    - Recognize LABEL_FATBOOT labels (#513) [James Falcon]
    - Handle additional identifier for SLES For HPC (#520) [Robert Schweikert]
    - Revert "test-requirements.txt: pin pytest to <6 (#512)" (#515)
    - test-requirements.txt: pin pytest to <6 (#512)
    - Add "tsanghan" as contributor (#504) [tsanghan]
    - fix brpm building
    - Adding eandersson as a contributor (#502) [Erik Olof Gunnar Andersson]
    - azure: disable bouncing hostname when setting hostname fails (#494)
      [Anh Vo]
    - VMware: Support parsing DEFAULT-RUN-POST-CUST-SCRIPT (#441)
      [xiaofengw-vmware]
    - DataSourceAzure: Use ValueError when JSONDecodeError is not available
      (#490) [Anh Vo]
    - cc_ca_certs.py: fix blank line problem when removing CAs and adding
      new one (#483) [dermotbradley]
    - freebsd: py37-serial is now py37-pyserial (#492) [Gonéri Le Bouder]
    - ssh e...


Changed in cloud-init (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 20.3-2-g371b392c-0ubuntu1~16.04.1

---------------
cloud-init (20.3-2-g371b392c-0ubuntu1~16.04.1) xenial; urgency=medium

  * d/control: add python3-pytest-catchlog to Build-Depends
  * d/cloud-init.postinst: fix the grub install device for NVMe-rooted
    instances on upgrade. (LP: #1889555)
  * refresh patches:
   + debian/patches/azure-apply-network-config-false.patch
   + debian/patches/ubuntu-advantage-revert-tip.patch
  * New upstream snapshot. (LP: #1893064)
    - util: remove debug statement (#556) [Joshua Powers]
    - Fix cloud config on chef example (#551) [lucasmoura]
    - Release 20.3 (#547) [James Falcon]
    - tox: bump the pylint version to 2.6.0 in the default run (#544)
      [Paride Legovini]
    - Azure: Add netplan driver filter when using hv_netvsc driver (#539)
      [James Falcon]
    - query: do not handle non-decodable non-gzipped content (#543)
    - DHCP sandboxing failing on noexec mounted /var/tmp (#521) [Eduardo Otubo]
    - Update the list of valid ssh keys. (#487) [Ole-Martin Bratteng]
    - cmd: cloud-init query to handle compressed userdata (#516)
    - Pushing cloud-init log to the KVP (#529) [Moustafa Moustafa]
    - Add Alpine Linux support. (#535) [dermotbradley]
    - Detect kernel version before swap file creation (#428) [Eduardo Otubo]
    - cli: add devel make-mime subcommand (#518)
    - user-data: only verify mime-types for TYPE_NEEDED and x-shellscript
      (#511)
    - DataSourceOracle: retry twice (and document why we retry at all) (#536)
    - Refactor Azure report ready code (#468) [Johnson Shi]
    - tox.ini: pin correct version of httpretty in xenial{,-dev} envs (#531)
    - Support Oracle IMDSv2 API (#528) [James Falcon]
    - .travis.yml: run a doc build during CI (#534)
    - doc/rtd/topics/datasources/ovf.rst: fix doc8 errors (#533)
    - Fix 'Users and Groups' configuration documentation (#530) [sshedi]
    - cloudinit.distros: update docstrings of add_user and create_user (#527)
    - Fix headers for device types in network v2 docs (#532)
      [Caleb Xavier Berger]
    - Add AlexBaranowski as contributor (#508) [Aleksander Baranowski]
    - DataSourceOracle: refactor to use only OPC v1 endpoint (#493)
    - .github/workflows/stale.yml: s/Josh/Rick/ (#526)
    - Fix a typo in apt pipelining module (#525) [Xiao Liang]
    - test_util: parametrize devlist tests (#523) [James Falcon]
    - Recognize LABEL_FATBOOT labels (#513) [James Falcon]
    - Handle additional identifier for SLES For HPC (#520) [Robert Schweikert]
    - Revert "test-requirements.txt: pin pytest to <6 (#512)" (#515)
    - test-requirements.txt: pin pytest to <6 (#512)
    - Add "tsanghan" as contributor (#504) [tsanghan]
    - fix brpm building
    - Adding eandersson as a contributor (#502) [Erik Olof Gunnar Andersson]
    - azure: disable bouncing hostname when setting hostname fails (#494)
      [Anh Vo]
    - VMware: Support parsing DEFAULT-RUN-POST-CUST-SCRIPT (#441)
      [xiaofengw-vmware]
    - DataSourceAzure: Use ValueError when JSONDecodeError is not available
      (#490) [Anh Vo]
    - cc_ca_certs.py: fix blank line problem when removing CAs and adding
   ...


Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released