Merge ~ian-may/+git/autotest-client-tests:ubuntu_nvidia_fs into ~canonical-kernel-team/+git/autotest-client-tests:master

Proposed by Ian May
Status: Superseded
Proposed branch: ~ian-may/+git/autotest-client-tests:ubuntu_nvidia_fs
Merge into: ~canonical-kernel-team/+git/autotest-client-tests:master
Diff against target: 649 lines (+567/-2)
13 files modified
ubuntu_nvidia_fs/control (+12/-0)
ubuntu_nvidia_fs/nvidia-fs/00-vars (+11/-0)
ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh (+163/-0)
ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh (+52/-0)
ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh (+46/-0)
ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh (+41/-0)
ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh (+38/-0)
ubuntu_nvidia_fs/nvidia-fs/README (+17/-0)
ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh (+14/-0)
ubuntu_nvidia_fs/nvidia-module-lib (+96/-0)
ubuntu_nvidia_fs/ubuntu_nvidia_fs.py (+35/-0)
ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh (+41/-0)
ubuntu_nvidia_server_driver/control (+1/-2)
Reviewer          Status
Francis Ginther   Needs Information
Po-Hsu Lin        Approve
Review via email: mp+428555@code.launchpad.net

This proposal has been superseded by a proposal from 2022-09-20.

Commit message

Not all DGX systems need 'nvidia-fs' run, so I'd like to decouple it from the 'nvidia driver load' test. No functional change to the test.

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Hi Ian,
overall it's looking good. +1 on this.

Some cleanup questions:
 * Do you still want to keep the nvidia-fs/ in ubuntu_nvidia_server_driver?
 * Also, these lines in ubuntu_nvidia_server_driver.py:

22     def run_nvidia_fs_in_lxc(self):
23         cmd = os.path.join(p_dir, "./nvidia-fs/a-c-t-entry.sh")
24         utils.system(cmd)

And the test_name if statement for checking nvidia-fs.

These are rather trivial, so I'm fine either way with keeping or removing them.

review: Approve
Revision history for this message
Francis Ginther (fginther) wrote :

As Sam asked, is there any reason to keep nvidia-fs under ubuntu_nvidia_server_driver? And if not, why not use a 'git mv' on these files to preserve their git history?
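
For illustration only, the history-preserving move being suggested could look roughly like this (the destination path is an assumption based on the new test directory in this proposal; this sketch is not part of the proposed branch):

    # create the new test directory, then let git stage the rename
    mkdir -p ubuntu_nvidia_fs
    git mv ubuntu_nvidia_server_driver/nvidia-fs ubuntu_nvidia_fs/nvidia-fs
    git commit -s -m "ubuntu_nvidia_fs: move nvidia-fs out of ubuntu_nvidia_server_driver"

With the files moved via 'git mv', 'git log --follow ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh' can still trace each file's pre-move history.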

Functionally everything looks fine.

review: Needs Information

Unmerged commits

c6eaf91... by Ian May

UBUNTU: SAUCE: ubuntu_nvidia_server_driver: extend test timeout

Increase test timeout to accommodate additional drivers in test.

Signed-off-by: Ian May <email address hidden>

122edec... by Ian May

UBUNTU: SAUCE: ubuntu_nvidia_server_driver: disable nvidia-fs test

With 'ubuntu_nvidia_fs' in place, disable the 'nvidia-fs' job call in
'ubuntu_nvidia_server_driver'.

Signed-off-by: Ian May <email address hidden>

f3d3b4a... by Ian May

UBUNTU: SAUCE: ubuntu_nvidia_fs: create nvidia-fs test

The 'nvidia-fs' test was originally added to
'ubuntu_nvidia_server_driver'. There are situations where
it would be better to have 'nvidia-fs' as a standalone
test. No functional change to the test.

Signed-off-by: Ian May <email address hidden>

Preview Diff

1diff --git a/ubuntu_nvidia_fs/control b/ubuntu_nvidia_fs/control
2new file mode 100644
3index 0000000..75d21a4
4--- /dev/null
5+++ b/ubuntu_nvidia_fs/control
6@@ -0,0 +1,12 @@
7+AUTHOR = 'Taihsiang Ho <taihsiang.ho@canonical.com>'
8+TIME = 'SHORT'
9+NAME = 'nvidia-fs module test'
10+TEST_TYPE = 'client'
11+TEST_CLASS = 'General'
12+TEST_CATEGORY = 'Smoke'
13+
14+DOC = """
15+Perform testing of nvidia-fs module
16+"""
17+
18+job.run_test_detail('ubuntu_nvidia_fs', test_name='nvidia-fs', tag='nvidia-fs', timeout=1500)
19diff --git a/ubuntu_nvidia_fs/nvidia-fs/00-vars b/ubuntu_nvidia_fs/nvidia-fs/00-vars
20new file mode 100644
21index 0000000..ad86f46
22--- /dev/null
23+++ b/ubuntu_nvidia_fs/nvidia-fs/00-vars
24@@ -0,0 +1,11 @@
25+# shellcheck shell=bash
26+# shellcheck disable=SC2034
27+KERNEL_FLAVOR="generic"
28+CUDA_CONTAINER_NAME="nvcr.io/nvidia/cuda"
29+NVIDIA_BRANCH="470-server"
30+LXD_INSTANCE="nvidia-fs-test"
31+MLNX_REPO="https://linux.mellanox.com/public/repo/mlnx_ofed"
32+MLNX_OFED_VER="5.4-1.0.3.0"
33+if [ -f 00-vars.gen ]; then
34+ source ./00-vars.gen
35+fi
36diff --git a/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh
37new file mode 100755
38index 0000000..9d6670c
39--- /dev/null
40+++ b/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh
41@@ -0,0 +1,163 @@
42+#!/usr/bin/env bash
43+
44+set -e
45+set -x
46+set -o pipefail
47+
48+shopt -s nullglob
49+
50+rm -f 00-vars.gen # avoid stale configs from previous runs
51+source 00-vars
52+source ../nvidia-module-lib
53+
54+sudo apt install -y jq xmlstarlet
55+
56+driver_recommended_cuda_version() {
57+ local xmlout
58+ xmlout="$(mktemp)"
59+
60+ sudo nvidia-smi -q -u -x --dtd | tee "$xmlout" > /dev/null
61+ xmlstarlet sel -t -v "/nvidia_smi_log/cuda_version" < "$xmlout"
62+ rm -f "$xmlout"
63+}
64+
65+find_latest_cuda_container_tag_by_branch() {
66+ local branch="$1" # e.g. 11.4
67+ local tmpfile="$(mktemp)"
68+ local url_api_base="https://registry.hub.docker.com/v2/repositories/nvidia/cuda/tags"
69+ source ./00-vars.gen # pick up LXD_OS_VER
70+ local search_tag=devel-ubuntu"${LXD_OS_VER}"
71+ local url=${url_api_base}"?name="-"${search_tag}"
72+
73+ # List all of the available nvidia cuda image tags, filter for
74+ # devel/ubuntu images that match our cuda x.y, and sort numerically
75+ # to find the newest minor (x.y.z) version.
76+ #
77+ # Output is paginated, this loops through each page.
78+ while [ "$url" != "null" ]; do
79+ curl -L -s "$url" > "$tmpfile"
80+ url="$(jq '."next"' < "$tmpfile" | tr -d \")"
81+ jq '."results"[]["name"]' < "$tmpfile" |
82+ tr -d \"
83+ done |
84+ grep -E "^${branch}(\.[0-9]+)*-${search_tag}$" | \
85+ sort -n | tail -1
86+ rm -f "$tmpfile"
87+}
88+
89+gen_vars() {
90+ local cuda_branch
91+ local container_tag
92+
93+ # Match the host OS
94+ echo "LXD_OS_CODENAME=$(lsb_release -cs)" > 00-vars.gen
95+ echo "LXD_OS_VER=$(lsb_release -rs)" >> 00-vars.gen
96+ cuda_branch="$(driver_recommended_cuda_version)"
97+ container_tag="$(find_latest_cuda_container_tag_by_branch "$cuda_branch")"
98+ echo "CUDA_BRANCH=${cuda_branch}" >> 00-vars.gen
99+ echo "CUDA_CONTAINER_TAG=${container_tag}" >> 00-vars.gen
100+}
101+
102+lxd_wait() {
103+ local instance="$1"
104+
105+ for _ in $(seq 300); do
106+ if lxc exec "${instance}" -- /bin/true; then
107+ break
108+ fi
109+ sleep 1
110+ done
111+}
112+
113+is_whole_nvme_dev() {
114+ local dev
115+ dev="$(basename "$1")"
116+ echo "$dev" | grep -Eq '^nvme[0-9]+n[0-9]+$'
117+}
118+
119+find_free_nvme() {
120+ local dev
121+ local children
122+ command -v jq > /dev/null || sudo apt install -y jq 1>&2
123+ for dev in /dev/nvme*; do
124+ is_whole_nvme_dev "$dev" || continue
125+ # Is this device used by another kernel device (RAID/LVM/etc)?
126+ children=$(lsblk -J "$dev" | jq '.["blockdevices"][0]."children"')
127+ if [ "$children" = "null" ]; then
128+ echo "$dev"
129+ return 0
130+ fi
131+ done
132+ return 1
133+}
134+
135+nvme_dev_to_bdf() {
136+ local dev="$1"
137+ local bdf=""
138+
139+ while read -r comp; do
140+ if echo "$comp" | grep -q -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]$'; then
141+ bdf="$comp"
142+ fi
143+ done <<<"$(readlink /sys/block/"$(basename "$dev")" | tr / '\n')"
144+ if [ -z "$bdf" ]; then
145+ echo "ERROR: name_dev_to_bdf: No PCI address found for $dev" 1>&2
146+ return 1
147+ fi
148+ echo "$bdf"
149+ return 0
150+}
151+
152+gen_vars
153+source ./00-vars.gen
154+
155+# 20.04 installs currently get LXD 4.0.7 by default, but we need at least
156+# 4.11 for PCI passthrough support for VMs. latest/stable is new enough.
157+sudo snap refresh lxd --channel=latest/stable
158+sudo lxd init --auto
159+lxc delete --force "$LXD_INSTANCE" || :
160+
161+# FIXME: Should probably dynamically adapt cpu/memory based on host system
162+lxc launch --vm "ubuntu:${LXD_OS_CODENAME}" "$LXD_INSTANCE" \
163+ -t c48-m16 \
164+ -c security.secureboot=false # so we can load untrusted modules
165+
166+# Ran out of space pulling the docker image w/ the default 10GB. Double it.
167+lxc config device override "${LXD_INSTANCE}" root size=20GB
168+lxd_wait "${LXD_INSTANCE}"
169+
170+for file in 00-vars 00-vars.gen 02-inside-vm-update-kernel.sh 03-inside-vm-install-drivers.sh 04-inside-vm-setup-docker-and-run-test.sh 05-inside-docker-run-test.sh; do
171+ lxc file push ${file} "${LXD_INSTANCE}"/root/${file}
172+done
173+lxc exec "${LXD_INSTANCE}" -- /root/02-inside-vm-update-kernel.sh
174+
175+# Reboot to switch to updated kernel, so new drivers will build for it
176+lxc stop "${LXD_INSTANCE}"
177+
178+# Release GPU devices so we can assign them to a VM
179+sudo service nvidia-fabricmanager stop || :
180+recursive_remove_module nvidia
181+
182+## Pass in devices. Note: devices can be assigned only while VM is stopped
183+
184+# Any Nvidia GPU will do, just grab the first one we find
185+gpuaddr="$(lspci | grep '3D controller: NVIDIA Corporation' | cut -d' ' -f1 | head -1)"
186+lxc config device add "${LXD_INSTANCE}" gpu pci "address=${gpuaddr}"
187+
188+# Find an unused NVMe device to pass in
189+nvmedev=$(find_free_nvme) || \
190+ (echo "ERROR: No unused nvme device found" 1>&2 && exit 1)
191+nvmeaddr="$(nvme_dev_to_bdf "$nvmedev")" || \
192+ (echo "ERROR: No PCI device found for $nvmedev" 1>&2 && exit 1)
193+lxc config device add "${LXD_INSTANCE}" nvme pci "address=${nvmeaddr}"
194+
195+lxc start "${LXD_INSTANCE}"
196+lxd_wait "${LXD_INSTANCE}"
197+lxc exec "${LXD_INSTANCE}" -- /root/03-inside-vm-install-drivers.sh
198+
199+# Reboot to switch to new overridden drivers
200+lxc stop "${LXD_INSTANCE}"
201+lxc start "${LXD_INSTANCE}"
202+
203+lxd_wait "${LXD_INSTANCE}"
204+lxc exec "${LXD_INSTANCE}" -- /root/04-inside-vm-setup-docker-and-run-test.sh
205diff --git a/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh b/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh
206new file mode 100755
207index 0000000..021cfc8
208--- /dev/null
209+++ b/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh
210@@ -0,0 +1,52 @@
211+#!/usr/bin/env bash
212+
213+set -e
214+set -x
215+
216+source ./00-vars
217+
218+export DEBIAN_FRONTEND="noninteractive"
219+export DEBIAN_PRIORITY="critical"
220+
221+enable_proposed() {
222+ local arch
223+ local release
224+ local mirror
225+ local pockets
226+ arch="$(dpkg --print-architecture)"
227+ release="$(lsb_release -cs)"
228+ pockets="restricted main universe multiverse"
229+
230+ case $arch in
231+ i386|amd64)
232+ mirror="http://archive.ubuntu.com/ubuntu"
233+ ;;
234+ *)
235+ mirror="http://ports.ubuntu.com/ubuntu-ports"
236+ ;;
237+ esac
238+
239+ echo "deb $mirror ${release}-proposed restricted $pockets" | \
240+ sudo tee "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null
241+ echo "deb-src $mirror ${release}-proposed restricted $pockets" | \
242+ sudo tee -a "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null
243+}
244+
245+enable_proposed
246+apt update
247+apt install -y linux-"${KERNEL_FLAVOR}" \
248+ linux-modules-nvidia-"${NVIDIA_BRANCH}"-"${KERNEL_FLAVOR}" \
249+ nvidia-kernel-source-"${NVIDIA_BRANCH}" \
250+ nvidia-utils-"${NVIDIA_BRANCH}"
251+
252+# Find the latest kernel version that matches our flavor and create "-test"
253+# symlinks to it since they will sort highest, making it the default
254+kver=$(linux-version list | grep -- "-${KERNEL_FLAVOR}$" | \
255+ linux-version sort --reverse | head -1)
256+ln -s "vmlinuz-${kver}" /boot/vmlinuz-test
257+ln -s "initrd.img-${kver}" /boot/initrd.img-test
258+
259+# Workaround LP: #1849563
260+echo "GRUB_CMDLINE_LINUX_DEFAULT=\"\$GRUB_CMDLINE_LINUX_DEFAULT pci=nocrs pci=realloc\"" > /etc/default/grub.d/99-nvidia-fs-test.cfg
261+
262+update-grub
263diff --git a/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh b/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh
264new file mode 100755
265index 0000000..94a6147
266--- /dev/null
267+++ b/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh
268@@ -0,0 +1,46 @@
269+#!/usr/bin/env bash
270+
271+set -e
272+set -x
273+
274+source ./00-vars
275+
276+export DEBIAN_FRONTEND="noninteractive"
277+export DEBIAN_PRIORITY="critical"
278+
279+# Remove headers for all kernels except the one running so DKMS does not
280+# try to build modules against them. Other kernels may not be compatible
281+# with our modules, and we don't want the install to fail because of that.
282+# We need to do this twice because apt will avoid removing a metapackage
283+# (e.g. linux-kvm) if it can instead upgrade it, which may pull in a new
284+# headers package. If that happens, the 2nd time through we'll remove that
285+# updated headers package as well as the metapackage(s) that brought it in.
286+for _ in 1 2; do
287+ for file in /lib/modules/*/build; do
288+ if [ "$file" = "/lib/modules/$(uname -r)/build" ]; then
289+ continue
290+ fi
291+ apt remove --purge "$(dpkg -S "$file" | cut -d":" -f1 | sed 's/, / /g')" -y
292+ done
293+done
294+
295+# Install MOFED stack
296+wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | \
297+ apt-key add -
298+wget -qO - "${MLNX_REPO}/${MLNX_OFED_VER}/ubuntu${LXD_OS_VER}/mellanox_mlnx_ofed.list" | tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
299+apt update
300+apt install -y mlnx-ofed-all mlnx-nvme-dkms mlnx-nfsrdma-dkms
301+
302+# Install nvidia-fs module
303+cuda_os="ubuntu$(echo "$LXD_OS_VER" | tr -d .)"
304+
305+# keyring install instructions from:
306+# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
307+cuda_keyring_deb="$(mktemp)"
308+wget "https://developer.download.nvidia.com/compute/cuda/repos/$cuda_os/x86_64/cuda-keyring_1.0-1_all.deb" -O "$cuda_keyring_deb"
309+sudo dpkg -i "$cuda_keyring_deb"
310+rm -f "$cuda_keyring_deb"
311+
312+add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /"
313+apt install -y nvidia-fs-dkms
314+add-apt-repository -r "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /"
315diff --git a/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh
316new file mode 100755
317index 0000000..3cbd62b
318--- /dev/null
319+++ b/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh
320@@ -0,0 +1,41 @@
321+#!/usr/bin/env bash
322+
323+set -e
324+set -x
325+
326+source ./00-vars
327+
328+install_nvidia_docker() {
329+ local distribution
330+ distribution="$(. /etc/os-release;echo "$ID$VERSION_ID")"
331+ curl --retry 6 --retry-delay 10 --silent --show-error -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
332+ curl --retry 6 --retry-delay 10 --silent --show-error -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
333+ sudo tee /etc/apt/sources.list.d/nvidia-docker.list > /dev/null
334+ sudo apt update
335+ sudo apt install -y nvidia-docker2
336+ sudo systemctl restart docker
337+}
338+
339+umount /mnt/nvme || true
340+parted -s /dev/nvme0n1 -- mklabel gpt
341+parted -s /dev/nvme0n1 -- mkpart primary ext4 0 100%
342+udevadm settle
343+mkfs.ext4 -F "/dev/nvme0n1p1"
344+mkdir -p /mnt/nvme
345+mount "/dev/nvme0n1p1" /mnt/nvme -o data=ordered
346+
347+modprobe nvidia-fs
348+
349+install_nvidia_docker
350+
351+container="${CUDA_CONTAINER_NAME}:${CUDA_CONTAINER_TAG}"
352+
353+docker pull "${container}"
354+docker run --rm --ipc host --name test_gds --gpus device=all \
355+ --volume /run/udev:/run/udev:ro \
356+ --volume /sys/kernel/config:/sys/kernel/config/ \
357+ --volume /dev:/dev:ro \
358+ --volume /mnt/nvme:/data/:rw \
359+ --volume /root:/root/:ro \
360+ --privileged "${container}" \
361+ bash -c 'cd /root && ./05-inside-docker-run-test.sh'
362diff --git a/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh
363new file mode 100755
364index 0000000..652bb55
365--- /dev/null
366+++ b/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh
367@@ -0,0 +1,38 @@
368+#!/usr/bin/env bash
369+
370+set -e
371+set -x
372+
373+source ./00-vars
374+
375+# We want e.g. gds-tools-11-4 if using CUDA 11.4
376+gds_tools="gds-tools-$(echo "$CUDA_BRANCH" | tr "." "-")"
377+
378+apt update
379+apt install "$gds_tools" libssl-dev -y
380+cd /usr/local/cuda/gds/samples
381+make -j "$(nproc)"
382+dd status=none if=/dev/urandom of=/data/file1 iflag=fullblock bs=1M count=1024
383+dd status=none if=/dev/urandom of=/data/file2 iflag=fullblock bs=1M count=1024
384+
385+# Edit cufile.json and set the "allow_compat_mode" property to false.
386+sed -i 's/"allow_compat_mode": true,/"allow_compat_mode": false,/' /etc/cufile.json
387+
388+echo "sample1"
389+./cufile_sample_001 /data/file1 0
390+echo "sample 2"
391+./cufile_sample_002 /data/file1 0
392+echo "sample 3"
393+./cufile_sample_003 /data/file1 /data/file2 0
394+echo "sample 4"
395+./cufile_sample_004 /data/file1 /data/file2 0
396+echo "sample 5"
397+./cufile_sample_005 /data/file1 /data/file2 0
398+echo "sample 6"
399+./cufile_sample_006 /data/file1 /data/file2 0
400+echo "sample 7"
401+./cufile_sample_007 0
402+echo "sample 8"
403+./cufile_sample_008 0
404+echo "sample 14"
405+./cufile_sample_014 /data/file1 /data/file2 0
406diff --git a/ubuntu_nvidia_fs/nvidia-fs/README b/ubuntu_nvidia_fs/nvidia-fs/README
407new file mode 100644
408index 0000000..fb68ce7
409--- /dev/null
410+++ b/ubuntu_nvidia_fs/nvidia-fs/README
411@@ -0,0 +1,17 @@
412+= nvidia-fs testing =
413+The goal of this test is to confirm that the nvidia-fs module continues to
414+build and work properly with new kernel updates.
415+
416+The environment in which this test needs to run requires several 3rd party
417+pieces of software - including other 3rd party modules that require a reboot
418+after installation. To avoid having to handle reboots of the test client,
419+we instead run the test inside a virtual machine that the test client
420+can spin up and reboot on its own. The actual nvidia-fs test runs in a docker
421+container inside that virtual machine.
422+
423+The test is kicked off by running 01-run-test.sh, which will run each of
424+the other scripts in turn to set up the virtual machine and the test
425+docker container within it.
426+
427+
428+
429diff --git a/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh b/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh
430new file mode 100755
431index 0000000..38c3c54
432--- /dev/null
433+++ b/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh
434@@ -0,0 +1,14 @@
435+#!/usr/bin/env bash
436+
437+set -e
438+set -x
439+
440+# make sure a-c-t invokes the script in the right directory context
441+run_test() {
442+ exe_dir=$(dirname "${BASH_SOURCE[0]}")
443+ pushd "${exe_dir}"
444+ ./01-run-test.sh
445+ popd
446+}
447+
448+run_test
449diff --git a/ubuntu_nvidia_fs/nvidia-module-lib b/ubuntu_nvidia_fs/nvidia-module-lib
450new file mode 100644
451index 0000000..06141bf
452--- /dev/null
453+++ b/ubuntu_nvidia_fs/nvidia-module-lib
454@@ -0,0 +1,96 @@
455+# Copyright 2021 Canonical Ltd.
456+# Written by:
457+# Dann Frazier <dann.frazier@canonical.com>
458+# Taihsiang Ho <taihsiang.ho@canonical.com>
459+#
460+# shellcheck shell=bash
461+module_loaded() {
462+ module="$1"
463+ # Check linux/include/linux/module.h for the module_state enumeration.
464+ # There are other states besides Live, such as Loading and Unloading. The
465+ # other states usually last only a few microseconds, but let's check for
466+ # Live explicitly.
467+ grep "^${module} " /proc/modules | grep -q Live
468+}
469+
470+get_module_field() {
471+ local module="$1"
472+ local field="$2"
473+ # shellcheck disable=SC2034
474+ read -r mod size usecnt deps rest < <(grep "^${module} " /proc/modules)
475+ case $field in
476+ usecnt)
477+ echo "$usecnt"
478+ ;;
479+ deps)
480+ if [ "$deps" = "-" ]; then
481+ return 0
482+ fi
483+ echo "$deps" | tr ',' ' '
484+ ;;
485+ *)
486+ return 1
487+ esac
488+}
489+
490+module_in_use() {
491+ module="$1"
492+
493+ usecnt="$(get_module_field "$module" usecnt)"
494+
495+ if [ "$usecnt" -eq 0 ]; then
496+ return 1
497+ fi
498+ return 0
499+}
500+
501+recursive_remove_module() {
502+ local module="$1"
503+
504+ if ! module_loaded "$module"; then
505+ return 0
506+ fi
507+
508+ if ! module_in_use "$module"; then
509+ sudo rmmod "$module"
510+ return 0
511+ fi
512+
513+ if [ "$(get_module_field "$module" deps)" = "" ]; then
514+ echo "ERROR: $module is in use, but has no reverse dependencies"
515+ echo "ERROR: Maybe an application is using it."
516+ exit 1
517+ fi
518+ beforecnt="$(get_module_field "$module" usecnt)"
519+ for dep in $(get_module_field "$module" deps); do
520+ recursive_remove_module "$dep"
521+ done
522+ aftercnt="$(get_module_field "$module" usecnt)"
523+ if [ "$beforecnt" -eq "$aftercnt" ]; then
524+ echo "ERROR: Unable to reduce $module use count"
525+ exit 1
526+ fi
527+ recursive_remove_module "$module"
528+}
529+
530+uninstall_all_nvidia_mod_pkgs() {
531+ for pkg in $(dpkg-query -f "\${Package}\n" -W 'linux-modules-nvidia-*'); do
532+ sudo apt remove --purge "$pkg" -y
533+ done
534+ if sudo modinfo nvidia; then
535+ echo "ERROR: Uninstallation of all nvidia modules failed."
536+ exit 1
537+ fi
538+}
539+
540+product="$(sudo dmidecode -s baseboard-product-name)"
541+pkg_compatible_with_platform() {
542+ local pkg="$1"
543+ branch="$(echo "$pkg" | cut -d- -f4)"
544+
545+ if [ "$product" = "DGXA100" ] && [ "$branch" -le "418" ]; then
546+ return 1
547+ fi
548+
549+ return 0
550+}
551diff --git a/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py
552new file mode 100644
553index 0000000..77ac0bb
554--- /dev/null
555+++ b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py
556@@ -0,0 +1,35 @@
557+import os
558+from autotest.client import test, utils
559+
560+p_dir = os.path.dirname(os.path.abspath(__file__))
561+sh_executable = os.path.join(p_dir, "ubuntu_nvidia_fs.sh")
562+
563+
564+class ubuntu_nvidia_fs(test.test):
565+ version = 1
566+
567+ def initialize(self):
568+ pass
569+
570+ def setup(self):
571+ cmd = "{} setup".format(sh_executable)
572+ utils.system(cmd)
573+
574+ def run_nvidia_fs_in_lxc(self):
575+ #cmd = os.path.join(p_dir, "./nvidia-fs/a-c-t-entry.sh")
576+ #utils.system(cmd)
577+ cmd = "{} test".format(sh_executable)
578+ utils.system(cmd)
579+
580+ def run_once(self, test_name):
581+ print("HELLO WORLD")
582+ if test_name == "nvidia-fs":
583+ self.run_nvidia_fs_in_lxc()
584+
585+ print("")
586+ print("{} has run.".format(test_name))
587+
588+ print("")
589+
590+ def postprocess_iteration(self):
591+ pass
592diff --git a/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh
593new file mode 100755
594index 0000000..62b1a29
595--- /dev/null
596+++ b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh
597@@ -0,0 +1,41 @@
598+#!/usr/bin/env bash
599+#
600+# perform nvidia-fs testing and corresponding pre-setup.
601+#
602+
603+set -eo pipefail
604+
605+setup() {
606+ # pre-set up the testing environment and necessary tools
607+ # currently this does nothing, but it may be used in the future.
608+ echo "beginning test pre-setup"
609+}
610+
611+run_test() {
612+ exe_dir=$(dirname "${BASH_SOURCE[0]}")
613+ pushd "${exe_dir}"
614+ #./test-each-nvidia-server-driver.sh
615+ ./nvidia-fs/a-c-t-entry.sh
616+ popd
617+}
618+
619+case $1 in
620+ setup)
621+ echo ""
622+ echo "On setting up necessary test environment..."
623+ echo ""
624+ setup
625+ echo ""
626+ echo "Setting up necessary test environment..."
627+ echo ""
628+ ;;
629+ test)
630+ echo ""
631+ echo "On running test..."
632+ echo ""
633+ run_test
634+ echo ""
635+ echo "Running test..."
636+ echo ""
637+ ;;
638+esac
639diff --git a/ubuntu_nvidia_server_driver/control b/ubuntu_nvidia_server_driver/control
640index a88eff0..3052a3c 100644
641--- a/ubuntu_nvidia_server_driver/control
642+++ b/ubuntu_nvidia_server_driver/control
643@@ -9,5 +9,4 @@ DOC = """
644 Perform testing of Nvidia server drivers
645 """
646
647-job.run_test_detail('ubuntu_nvidia_server_driver', test_name='nvidia-fs', tag='nvidia-fs', timeout=1500)
648-job.run_test_detail('ubuntu_nvidia_server_driver', test_name='load', tag='load', timeout=600)
649+job.run_test_detail('ubuntu_nvidia_server_driver', test_name='load', tag='load', timeout=1200)
