~ce-hyperscale/+git/maas-preseeds:fix_driver_version

Last commit made on 2024-03-01
Get this branch:
git clone -b fix_driver_version https://git.launchpad.net/~ce-hyperscale/+git/maas-preseeds

Branch information

Name: fix_driver_version
Repository: lp:~ce-hyperscale/+git/maas-preseeds

Recent commits

7f8bdbd... by Mitchell Augustin

Install correct linux-modules variant (open/non-open)

Currently, the non-open linux-modules package is installed even when
the open variant of the nvidia drivers is installed.
This change installs the matching variant.
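A minimal sketch of the idea. It assumes the signed-modules package name can be derived from the driver package name by replacing the `nvidia-driver-` prefix with `linux-modules-nvidia-` and appending the kernel flavour. The helper `modules_pkg_for_driver` and this naming scheme are assumptions for illustration, not code taken from late.sh. Because the suffix is copied from the driver package, `-open` carries over only when the open driver was chosen.

```shell
#!/bin/sh
# Hypothetical helper: build the linux-modules-nvidia package name from the
# chosen driver package so the open/non-open variants stay in sync.
# The naming scheme is an assumption for illustration.
modules_pkg_for_driver() {
    driver_pkg=$1   # e.g. nvidia-driver-535-server-open
    flavour=$2      # kernel flavour, e.g. generic
    # Strip the prefix; any "-open" suffix is preserved as-is.
    suffix=${driver_pkg#nvidia-driver-}
    echo "linux-modules-nvidia-${suffix}-${flavour}"
}

modules_pkg_for_driver nvidia-driver-535-server-open generic   # prints linux-modules-nvidia-535-server-open-generic
modules_pkg_for_driver nvidia-driver-535-server generic        # prints linux-modules-nvidia-535-server-generic
```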

0c8b579... by Mitchell Augustin

late.sh: corrected install_gpu_drivers on focal

Since we install the open (free) variant of the Nvidia drivers on all
systems except those with a V100, such systems running Focal fail
deployment, because nvidia-driver-470-server-open does not exist.
This change ensures that any attempt to download the 470 drivers only
searches for nvidia-driver-470-server.
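A sketch of the selection rule described above. It assumes the caller already knows the driver version and whether a V100 is present. `driver_pkg_for` and its arguments are hypothetical. The point is that the 470 branch never asks for an `-open` package.

```shell
#!/bin/sh
# Hypothetical helper: choose the NVIDIA server driver package.
#   $1: driver version (e.g. 535, 470)
#   $2: "yes" if the system has a V100 (detection not shown; assumption)
driver_pkg_for() {
    version=$1
    has_v100=$2
    if [ "$has_v100" = yes ] || [ "$version" = 470 ]; then
        # V100 systems get the non-open driver, and 470 has no -open package.
        echo "nvidia-driver-${version}-server"
    else
        echo "nvidia-driver-${version}-server-open"
    fi
}

driver_pkg_for 535 no   # prints nvidia-driver-535-server-open
driver_pkg_for 470 no   # prints nvidia-driver-470-server
```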

27fb80c... by Mitchell Augustin

Removed systemctl enable fabricmanager

Per discussion in https://warthogs.atlassian.net/browse/NVDGX-314
(regarding LP: #2025614), we determined that fabricmanager is
automatically started on compatible systems at install time,
so this enable call is unnecessary.

a4ded9f... by dann frazier

Disable MOFED installation on hidon again

So that the cert team can run the iperf cert test.
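One way to express this kind of per-platform gate, modeled on the `dmidecode -s system-product-name` check that appears in the hidon console output later in this log (it compares against `DGXH100`). Treating hidon as the `DGXH100` product name is an inference from that log. `should_install_mofed` is a hypothetical helper, not the function in late.sh.

```shell
#!/bin/sh
# Hypothetical gate: skip MOFED on hidon (assumed to report DGXH100) so the
# cert team can run the iperf cert test. A product name can be passed for
# testing; otherwise it is read from DMI via dmidecode (requires root).
should_install_mofed() {
    product=${1:-$(dmidecode -s system-product-name)}
    [ "$product" != "DGXH100" ]
}

# Usage in a late script (install_mofed is a hypothetical function):
# if should_install_mofed; then install_mofed; fi
```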

d301fff... by dann frazier

install_gpu_drivers(): Add additional kernel metapackage patterns

We're currently missing the pattern for linux-nvidia HWE kernels, which
results in us installing the DKMS package instead of the signed modules.
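A sketch of the kind of pattern table this commit extends. It assumes the installed kernel metapackage maps to a signed-modules flavour by dropping the `linux-` prefix. The helper name and the exact patterns (including `linux-nvidia-hwe-*`) are illustrative assumptions. Anything unmatched returns non-zero, so the caller can fall back to the DKMS package, which is what the commit describes happening for linux-nvidia HWE kernels before the fix.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel metapackage to the flavour used in
# linux-modules-nvidia-* package names. Patterns are assumptions.
modules_flavour_for_meta() {
    meta=$1
    case "$meta" in
        linux-generic | linux-generic-hwe-* | linux-nvidia | linux-nvidia-hwe-*)
            # e.g. linux-nvidia-hwe-22.04 -> nvidia-hwe-22.04
            echo "${meta#linux-}"
            ;;
        *)
            # No signed-modules match: caller should fall back to DKMS.
            return 1
            ;;
    esac
}

if flavour=$(modules_flavour_for_meta linux-nvidia-hwe-22.04); then
    echo "signed modules flavour: $flavour"   # prints signed modules flavour: nvidia-hwe-22.04
else
    echo "falling back to DKMS"
fi
```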

d86fc8f... by dann frazier

hidon: don't write to /home/ubuntu, it doesn't exist yet

Seen on the console:
[ 452.036488] cloud-init[4510]: + dmidecode -s system-product-name
[ 452.052372] cloud-init[4510]: + [ DGXH100 = DGXH100 ]
[ 452.068371] cloud-init[4510]: + touch /home/ubuntu/JEFF-THIS-HAS-MOFED-INSTALLED-DONT-IPERF
[ 452.088364] cloud-init[4510]: touch: cannot touch '/home/ubuntu/JEFF-THIS-HAS-MOFED-INSTALLED-DONT-IPERF': No such file or directory
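If a marker file is still wanted, one defensive option is to check that the target directory exists before writing to it, since, per the commit, /home/ubuntu does not exist yet when this code runs. `write_marker` is a hypothetical helper and not necessarily what the commit does; the commit itself only says the script should stop writing to /home/ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: create a marker file only if its directory exists;
# otherwise print a warning to the console and carry on (exit status 0).
write_marker() {
    dir=$1
    name=$2
    if [ -d "$dir" ]; then
        touch "$dir/$name"
    else
        echo "warning: $dir does not exist yet; not creating $name" >&2
    fi
}

tmp=$(mktemp -d)
write_marker "$tmp" MARKER           # creates $tmp/MARKER
write_marker "$tmp/missing" MARKER   # warns on stderr, creates nothing
rm -rf "$tmp"
```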

cb5e8aa... by dann frazier

hidon: Re-enable MOFED installation and try to warn the cert team

9233282... by Mitchell Augustin

Moved update-initramfs to MOFED install function

update-initramfs is called as part of the MOFED installation process, so it has been moved into that function.

90cdf29... by Mitchell Augustin

Change late.sh so fabric manager and MOFED are only installed on x86

This aligns with our test plans.
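A minimal sketch of an architecture gate. It assumes `uname -m` is used to detect the machine type (it reports `x86_64` on 64-bit x86 and `aarch64` on 64-bit Arm). `is_x86_64` and the two install functions in the usage comment are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed only on 64-bit x86. An explicit architecture
# can be passed for testing; otherwise the running kernel's is used.
is_x86_64() {
    arch=${1:-$(uname -m)}
    [ "$arch" = "x86_64" ]
}

# Usage (install_fabricmanager and install_mofed are hypothetical):
# if is_x86_64; then
#     install_fabricmanager
#     install_mofed
# fi
```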

af76449... by Mitchell Augustin

Added || true to "ipmitool sel clear" so MAAS deployment can continue if that command fails

On Hinyari, "ipmitool sel clear" will exit with code 1 and output "Unable to clear SEL: Unspecified error",
which causes the entire deployment to fail.

Since clearing the SEL is not a critical step of the deployment, its failure should not abort the deployment. This change allows the deployment
to continue while still printing the error message to the console.
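The effect of `|| true` can be shown with a stand-in command. This sketch assumes the failure propagates because the script runs under `set -e`. `fake_sel_clear` is hypothetical and only mimics the failure seen on Hinyari.

```shell
#!/bin/sh
# Under "set -e", a failing command aborts the script unless its exit
# status is absorbed, e.g. with "|| true".
set -e

# Hypothetical stand-in for "ipmitool sel clear" on a machine where it fails:
# prints the error to stderr and exits 1.
fake_sel_clear() {
    echo "Unable to clear SEL: Unspecified error" >&2
    return 1
}

# The error message still reaches the console (stderr), but the non-zero
# status is discarded, so "set -e" does not terminate the script.
fake_sel_clear || true

DEPLOY_STATE="continued"
echo "deployment $DEPLOY_STATE"   # prints deployment continued
```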