[2.3+] Unable to disk erase if machine is deployed with a non-lts kernel

Bug #1730525 reported by Matt Dirba
This bug affects 3 people
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Lee Trager
2.3       In Progress    High         Unassigned

Bug Description

I have configured MAAS 2.2.2 to erase disks when machines are released. This works great for machines deployed with xenial but fails on machines deployed with artful. Here is what happens.

1) In maasserver/models/node.py, the release_or_erase function calls either start_disk_erasing or release, depending on whether the disks need to be erased. Note: release clears the distro and hwe_kernel fields on the node object, but start_disk_erasing does not.
2) The machine reboots and requests its grub.cfg, which results in a call to get_config in maasserver/rpc/boot.py. From there, the boot purpose is obtained by calling get_boot_purpose, also defined in maasserver/rpc/boot.py. The purpose returned is "commissioning" because "The environment (boot images, kernel options, etc) for erasing is the same as that of commissioning" (as documented in the comments).
3) Since the boot purpose is commissioning, get_config overwrites the osystem and series, but it does not modify the architecture or hwe_kernel.
4) Still in get_config, this combination of an artful kernel with a xenial series fails validation, so the following error is raised and the disk erase is abandoned (see the sketch below):
maas.node: [error] hostname: Marking node failed: Missing boot image ubuntu/amd64/ga-17.10/xenial.
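
To make the mismatch concrete, here is a minimal, hypothetical Python sketch (simplified names, not the actual MAAS code) of how the leftover deployment subarch and the overridden commissioning series combine into an image path that does not exist:

# Hypothetical illustration only; the real lookup happens inside MAAS.
def boot_image_path(osystem, arch, subarch, series):
    return "{}/{}/{}/{}".format(osystem, arch, subarch, series)

# Left over from the artful deployment (architecture/hwe_kernel are not reset):
arch, subarch = "amd64", "ga-17.10"

# Overridden by get_config for the "commissioning" boot purpose:
osystem, series = "ubuntu", "xenial"

print(boot_image_path(osystem, arch, subarch, series))
# ubuntu/amd64/ga-17.10/xenial -> no such image, so the node is marked failed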

My hack/workaround is as follows in case you are interested.

diff --git a/maasserver/rpc/boot.py b/maasserver/rpc/boot.py
index 5ec41bb..83d58d3 100644
--- a/maasserver/rpc/boot.py
+++ b/maasserver/rpc/boot.py
@@ -199,6 +199,9 @@ def get_config(
         if purpose == "commissioning":
             osystem = Config.objects.get_config('commissioning_osystem')
             series = Config.objects.get_config('commissioning_distro_series')
+            subarch = "generic"
+            machine.architecture = '{}/{}'.format(arch, subarch)
+            machine.hwe_kernel = None
         else:
             osystem = machine.get_osystem()
             series = machine.get_distro_series()
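
A possibly cleaner variant (untested sketch, assuming Node.start_disk_erasing in maasserver/models/node.py can safely reset this field the same way release does) would be to drop the kernel selection when erasing starts instead of patching the boot path:

# Hypothetical sketch, not the fix that actually landed.
# In maasserver/models/node.py, inside Node.start_disk_erasing(), before the
# machine is rebooted into the ephemeral environment:
self.hwe_kernel = None   # forget the deployment's ga-17.10 kernel
self.save()              # get_config() can then fall back to the generic kernel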

Revision history for this message
Andres Rodriguez (andreserl) wrote :

To reproduce:

1. Deploy Artful
2. Release with disk erasing
3. The machine will attempt to boot the artful kernel with a Xenial image:

ubuntu/amd64/ga-17.10/xenial

We need to determine whether to:

1. Boot the same kernel/image the machine was deployed with (artful in this case), or
2. Use the default commissioning kernel/image.
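
For discussion, a rough sketch of what each option could look like in get_config (hypothetical; the Config/machine calls below are copied from the diff in the description, the comments are illustrative):

# Option 1: keep the deployment's environment for erasing
osystem = machine.get_osystem()
series = machine.get_distro_series()   # artful, booted with its own ga-17.10 kernel

# Option 2: fall back to the default commissioning environment
osystem = Config.objects.get_config('commissioning_osystem')
series = Config.objects.get_config('commissioning_distro_series')
machine.hwe_kernel = None              # so the generic xenial kernel/image is used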

summary: - maas 2.2.2 unable to disk erase artful deployment
+ [2.3+] maas 2.2.2 unable to disk erase artful deployment
Changed in maas:
milestone: none → 2.4.0beta3
milestone: 2.4.0beta3 → 2.4.0beta2
importance: Undecided → High
status: New → Triaged
summary: - [2.3+] maas 2.2.2 unable to disk erase artful deployment
+ [2.3+] Unable to disk erase if machine is deployed with a non-lts kernel
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
status: Fix Released → Fix Committed
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I'm hitting this in 2.3.3:

Marking node failed - Missing boot image ubuntu/arm64/ga-18.04/xenial.

Changed in maas:
status: Fix Committed → Fix Released
Revision history for this message
Joe Julian (joe.julian) wrote :

Can this be backported to 2.3?

Revision history for this message
Daniel Souza (danielsouzasp) wrote :

Hello, I'm facing the same issue with any node deployed with 14.04 LTS on MAAS version 2.3.5.

When I release with disk erasing, it tries to boot from /var/lib/maas/boot-resources/current/ubuntu/amd64/hwe-t/xenial
and then the node is marked as failed.

My temporary workaround is to create a symbolic link:
cd /var/lib/maas/boot-resources/current/ubuntu/amd64/hwe-t
ln -s /var/lib/maas/boot-resources/current/ubuntu/amd64/generic/xenial/ xenial
