Merge ~ian-may/+git/autotest-client-tests:ubuntu_nvidia_fs into ~canonical-kernel-team/+git/autotest-client-tests:master
Status: Superseded

Proposed branch: ~ian-may/+git/autotest-client-tests:ubuntu_nvidia_fs
Merge into: ~canonical-kernel-team/+git/autotest-client-tests:master
Diff against target: 649 lines (+567/-2), 13 files modified

- ubuntu_nvidia_fs/control (+12/-0)
- ubuntu_nvidia_fs/nvidia-fs/00-vars (+11/-0)
- ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh (+163/-0)
- ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh (+52/-0)
- ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh (+46/-0)
- ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh (+41/-0)
- ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh (+38/-0)
- ubuntu_nvidia_fs/nvidia-fs/README (+17/-0)
- ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh (+14/-0)
- ubuntu_nvidia_fs/nvidia-module-lib (+96/-0)
- ubuntu_nvidia_fs/ubuntu_nvidia_fs.py (+35/-0)
- ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh (+41/-0)
- ubuntu_nvidia_server_driver/control (+1/-2)
Related bugs:

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Francis Ginther | Needs Information | | |
| Po-Hsu Lin | Approve | | |

Review via email: mp+428555@code.launchpad.net
This proposal has been superseded by a proposal from 2022-09-20.
Commit message
Not all DGX systems need 'nvidia-fs' run, so I'd like to decouple it from the 'nvidia driver load' test. No functional change to the test.
Description of the change
Francis Ginther (fginther) wrote:
As Sam asked, is there any reason to keep nvidia-fs under ubuntu_
Functionally everything looks fine.
Unmerged commits
- c6eaf91... by Ian May

  UBUNTU: SAUCE: ubuntu_nvidia_server_driver: extend test timeout

  Increase test timeout to accommodate additional drivers in test.

  Signed-off-by: Ian May <email address hidden>

- 122edec... by Ian May

  UBUNTU: SAUCE: ubuntu_nvidia_server_driver: disable nvidia-fs test

  With 'ubuntu-nvidia-fs' in place, disable the 'nvidia-fs' job call in 'ubuntu_nvidia_server_driver'.

  Signed-off-by: Ian May <email address hidden>

- f3d3b4a... by Ian May

  UBUNTU: SAUCE: ubuntu_nvidia_fs: create nvidia-fs test

  The 'nvidia-fs' test was originally added to 'ubuntu_nvidia_server_driver'. There are situations where it would be better to have 'nvidia-fs' as a stand-alone test. No functional change to the test.

  Signed-off-by: Ian May <email address hidden>
Preview Diff
1 | diff --git a/ubuntu_nvidia_fs/control b/ubuntu_nvidia_fs/control |
2 | new file mode 100644 |
3 | index 0000000..75d21a4 |
4 | --- /dev/null |
5 | +++ b/ubuntu_nvidia_fs/control |
6 | @@ -0,0 +1,12 @@ |
7 | +AUTHOR = 'Taihsiang Ho <taihsiang.ho@canonical.com>' |
8 | +TIME = 'SHORT' |
9 | +NAME = 'nvidia-fs module test' |
10 | +TEST_TYPE = 'client' |
11 | +TEST_CLASS = 'General' |
12 | +TEST_CATEGORY = 'Smoke' |
13 | + |
14 | +DOC = """ |
15 | +Perform testing of nvidia-fs module |
16 | +""" |
17 | + |
18 | +job.run_test_detail('ubuntu_nvidia_fs', test_name='nvidia-fs', tag='nvidia-fs', timeout=1500) |
19 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/00-vars b/ubuntu_nvidia_fs/nvidia-fs/00-vars |
20 | new file mode 100644 |
21 | index 0000000..ad86f46 |
22 | --- /dev/null |
23 | +++ b/ubuntu_nvidia_fs/nvidia-fs/00-vars |
24 | @@ -0,0 +1,11 @@ |
25 | +# shellcheck shell=bash |
26 | +# shellcheck disable=SC2034 |
27 | +KERNEL_FLAVOR="generic" |
28 | +CUDA_CONTAINER_NAME="nvcr.io/nvidia/cuda" |
29 | +NVIDIA_BRANCH="470-server" |
30 | +LXD_INSTANCE="nvidia-fs-test" |
31 | +MLNX_REPO="https://linux.mellanox.com/public/repo/mlnx_ofed" |
32 | +MLNX_OFED_VER="5.4-1.0.3.0" |
33 | +if [ -f 00-vars.gen ]; then |
34 | + source ./00-vars.gen |
35 | +fi |
36 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh |
37 | new file mode 100755 |
38 | index 0000000..9d6670c |
39 | --- /dev/null |
40 | +++ b/ubuntu_nvidia_fs/nvidia-fs/01-run-test.sh |
41 | @@ -0,0 +1,163 @@ |
42 | +#!/usr/bin/env bash |
43 | + |
44 | +set -e |
45 | +set -x |
46 | +set -o pipefail |
47 | + |
48 | +shopt -s nullglob |
49 | + |
50 | +rm -f 00-vars.gen # avoid stale configs from previous runs |
51 | +source 00-vars |
52 | +source ../nvidia-module-lib |
53 | + |
54 | +sudo apt install -y jq xmlstarlet |
55 | + |
56 | +driver_recommended_cuda_version() { |
57 | + local xmlout |
58 | + xmlout="$(mktemp)" |
59 | + |
60 | + sudo nvidia-smi -q -u -x --dtd | tee "$xmlout" > /dev/null |
61 | + xmlstarlet sel -t -v "/nvidia_smi_log/cuda_version" < "$xmlout" |
62 | + rm -f "$xmlout" |
63 | +} |
64 | + |
65 | +find_latest_cuda_container_tag_by_branch() { |
66 | + local branch="$1" # e.g. 11.4 |
67 | + local tmpfile="$(mktemp)" |
68 | + local url_api_base="https://registry.hub.docker.com/v2/repositories/nvidia/cuda/tags" |
69 | + source ./00-vars.gen # pick up LXD_OS_VER |
70 | + local search_tag=devel-ubuntu"${LXD_OS_VER}" |
71 | + local url=${url_api_base}"?name="-"${search_tag}" |
72 | + |
73 | + # List all of the available nvidia cuda image tags, filter for |
74 | + # devel/ubuntu images that match our cuda x.y, and sort numerically |
75 | + # to find the newest minor (x.y.z) version. |
76 | + # |
77 | + # Output is paginated, this loops through each page. |
78 | + while [ "$url" != "null" ]; do |
79 | + curl -L -s "$url" > "$tmpfile" |
80 | + url="$(jq '."next"' < "$tmpfile" | tr -d \")" |
81 | + jq '."results"[]["name"]' < "$tmpfile" | |
82 | + tr -d \" |
83 | + done | |
84 | + grep -E "^${branch}(\.[0-9]+)*-${search_tag}$" | \ |
85 | + sort -n | tail -1 |
86 | + rm -f "$tmpfile" |
87 | +} |
88 | + |
89 | +gen_vars() { |
90 | + local cuda_branch |
91 | + local container_tag |
92 | + |
93 | + # Match the host OS |
94 | + echo "LXD_OS_CODENAME=$(lsb_release -cs)" > 00-vars.gen |
95 | + echo "LXD_OS_VER=$(lsb_release -rs)" >> 00-vars.gen |
96 | + cuda_branch="$(driver_recommended_cuda_version)" |
97 | + container_tag="$(find_latest_cuda_container_tag_by_branch "$cuda_branch")" |
98 | + echo "CUDA_BRANCH=${cuda_branch}" >> 00-vars.gen |
99 | + echo "CUDA_CONTAINER_TAG=${container_tag}" >> 00-vars.gen |
100 | +} |
101 | + |
102 | +lxd_wait() { |
103 | + local instance="$1" |
104 | + |
105 | + for _ in $(seq 300); do |
106 | + if lxc exec "${instance}" -- /bin/true; then |
107 | + break |
108 | + fi |
109 | + sleep 1 |
110 | + done |
111 | +} |
112 | + |
113 | +is_whole_nvme_dev() { |
114 | + local dev |
115 | + dev="$(basename "$1")" |
116 | + echo "$dev" | grep -Eq '^nvme[0-9]+n[0-9]+$' |
117 | +} |
118 | + |
119 | +find_free_nvme() { |
120 | + local dev |
121 | + local children |
122 | + command -v jq > /dev/null || sudo apt install -y jq 1>&2 |
123 | + for dev in /dev/nvme*; do |
124 | + is_whole_nvme_dev "$dev" || continue |
125 | + # Is this device used by another kernel device (RAID/LVM/etc)? |
126 | + children=$(lsblk -J "$dev" | jq '.["blockdevices"][0]."children"') |
127 | + if [ "$children" = "null" ]; then |
128 | + echo "$dev" |
129 | + return 0 |
130 | + fi |
131 | + done |
132 | + return 1 |
133 | +} |
134 | + |
135 | +nvme_dev_to_bdf() { |
136 | + local dev="$1" |
137 | + local bdf="" |
138 | + |
139 | + while read -r comp; do |
140 | + if echo "$comp" | grep -q -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]$'; then |
141 | + bdf="$comp" |
142 | + fi |
143 | + done <<<"$(readlink /sys/block/"$(basename "$dev")" | tr / '\n')" |
144 | + if [ -z "$bdf" ]; then |
145 | + echo "ERROR: name_dev_to_bdf: No PCI address found for $dev" 1>&2 |
146 | + return 1 |
147 | + fi |
148 | + echo "$bdf" |
149 | + return 0 |
150 | +} |
151 | + |
152 | +gen_vars |
153 | +source ./00-vars.gen |
154 | + |
155 | +# 20.04 installs currently get LXD 4.0.7 by default, but we need at least |
156 | +# 4.11 for PCI passthrough support for VMs. latest/stable is new enough. |
157 | +sudo snap refresh lxd --channel=latest/stable |
158 | +sudo lxd init --auto |
159 | +lxc delete --force "$LXD_INSTANCE" || : |
160 | + |
161 | +# FIXME: Should probably dynamically adapt cpu/memory based on host system |
162 | +lxc launch --vm "ubuntu:${LXD_OS_CODENAME}" "$LXD_INSTANCE" \ |
163 | + -t c48-m16 \ |
164 | + -c security.secureboot=false # so we can load untrusted modules |
165 | + |
166 | +# Ran out of space pulling the docker image w/ the default 10GB. Double it. |
167 | +lxc config device override "${LXD_INSTANCE}" root size=20GB |
168 | +lxd_wait "${LXD_INSTANCE}" |
169 | + |
170 | +for file in 00-vars 00-vars.gen 02-inside-vm-update-kernel.sh 03-inside-vm-install-drivers.sh 04-inside-vm-setup-docker-and-run-test.sh 05-inside-docker-run-test.sh; do |
171 | + lxc file push ${file} "${LXD_INSTANCE}"/root/${file} |
172 | +done |
173 | +lxc exec "${LXD_INSTANCE}" -- /root/02-inside-vm-update-kernel.sh |
174 | + |
175 | +# Reboot to switch to updated kernel, so new drivers will build for it |
176 | +lxc stop "${LXD_INSTANCE}" |
177 | + |
178 | +# Release GPU devices so we can assign them to a VM |
179 | +sudo service nvidia-fabricmanager stop || : |
180 | +recursive_remove_module nvidia |
181 | + |
182 | +## Pass in devices. Note: devices can be assigned only while VM is stopped |
183 | + |
184 | +# Any Nvidia GPU will do, just grab the first one we find |
185 | +gpuaddr="$(lspci | grep '3D controller: NVIDIA Corporation' | cut -d' ' -f1 | head -1)" |
186 | +lxc config device add "${LXD_INSTANCE}" gpu pci "address=${gpuaddr}" |
187 | + |
188 | +# Find an unused NVMe device to pass in |
189 | +nvmedev=$(find_free_nvme) || \ |
190 | + (echo "ERROR: No unused nvme device found" 1>&2 && exit 1) |
191 | +nvmeaddr="$(nvme_dev_to_bdf "$nvmedev")" || \ |
192 | + (echo "ERROR: No PCI device found for $nvmedev" 1>&2 && exit 1) |
193 | +lxc config device add "${LXD_INSTANCE}" nvme pci "address=${nvmeaddr}" |
194 | + |
195 | +lxc start "${LXD_INSTANCE}" |
196 | +lxd_wait "${LXD_INSTANCE}" |
197 | +lxc exec "${LXD_INSTANCE}" -- /root/03-inside-vm-install-drivers.sh |
198 | + |
199 | +# Reboot to switch to new overridden drivers |
200 | +lxc stop "${LXD_INSTANCE}" |
201 | +lxc start "${LXD_INSTANCE}" |
202 | + |
203 | +lxd_wait "${LXD_INSTANCE}" |
204 | +lxc exec "${LXD_INSTANCE}" -- /root/04-inside-vm-setup-docker-and-run-test.sh |
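The tag-selection pipeline in 01-run-test.sh (grep for tags on the CUDA x.y branch, then numeric sort) can be exercised offline by substituting a canned tag list for the paginated registry output; the tag values below are made-up examples, not real Docker Hub results:

```shell
# Same grep/sort logic as find_latest_cuda_container_tag_by_branch,
# fed a hypothetical tag list instead of the Docker Hub API.
branch="11.4"
search_tag="devel-ubuntu20.04"
latest="$(printf '%s\n' \
    "11.3.1-devel-ubuntu20.04" \
    "11.4.0-devel-ubuntu20.04" \
    "11.4.2-devel-ubuntu20.04" \
    "11.4.2-runtime-ubuntu20.04" |
  grep -E "^${branch}(\.[0-9]+)*-${search_tag}$" |
  sort -n | tail -1)"
echo "$latest"   # prints 11.4.2-devel-ubuntu20.04
```

Note that `sort -n` only compares the leading number (11 here) and then falls back to whole-line byte order, which happens to pick the highest x.y.z among same-branch tags; `sort -V` would be a more direct version sort.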
205 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh b/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh |
206 | new file mode 100755 |
207 | index 0000000..021cfc8 |
208 | --- /dev/null |
209 | +++ b/ubuntu_nvidia_fs/nvidia-fs/02-inside-vm-update-kernel.sh |
210 | @@ -0,0 +1,52 @@ |
211 | +#!/usr/bin/env bash |
212 | + |
213 | +set -e |
214 | +set -x |
215 | + |
216 | +source ./00-vars |
217 | + |
218 | +export DEBCONF_FRONTEND="noniteractive" |
219 | +export DEBIAN_PRIORITY="critical" |
220 | + |
221 | +enable_proposed() { |
222 | + local arch |
223 | + local release |
224 | + local mirror |
225 | + local pockets |
226 | + arch="$(dpkg --print-architecture)" |
227 | + release="$(lsb_release -cs)" |
228 | + pockets="restricted main universe multiverse" |
229 | + |
230 | + case $arch in |
231 | + i386|amd64) |
232 | + mirror="http://archive.ubuntu.com/ubuntu" |
233 | + ;; |
234 | + *) |
235 | + mirror="http://ports.ubuntu.com/ubuntu-ports" |
236 | + ;; |
237 | + esac |
238 | + |
239 | + echo "deb $mirror ${release}-proposed restricted $pockets" | \ |
240 | + sudo tee "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null |
241 | + echo "deb-src $mirror ${release}-proposed restricted $pockets" | \ |
242 | + sudo tee -a "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null |
243 | +} |
244 | + |
245 | +enable_proposed |
246 | +apt update |
247 | +apt install -y linux-"${KERNEL_FLAVOR}" \ |
248 | + linux-modules-nvidia-"${NVIDIA_BRANCH}"-"${KERNEL_FLAVOR}" \ |
249 | + nvidia-kernel-source-"${NVIDIA_BRANCH}" \ |
250 | + nvidia-utils-"${NVIDIA_BRANCH}" |
251 | + |
252 | +# Find the latest kernel version that matches our flavor and create "-test" |
253 | +# symlinks to it since they will sort highest, making it the default |
254 | +kver=$(linux-version list | grep -- "-${KERNEL_FLAVOR}$" | \ |
255 | + linux-version sort --reverse | head -1) |
256 | +ln -s "vmlinuz-${kver}" /boot/vmlinuz-test |
257 | +ln -s "initrd.img-${kver}" /boot/initrd.img-test |
258 | + |
259 | +# Workaround LP: #1849563 |
260 | +echo "GRUB_CMDLINE_LINUX_DEFAULT=\"\$GRUB_CMDLINE_LINUX_DEFAULT pci=nocrs pci=realloc\"" > /etc/default/grub.d/99-nvidia-fs-test.cfg |
261 | + |
262 | +update-grub |
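The "-test" symlink step above relies on the commit's stated trick: a name ending in "-test" sorts above the versioned kernel names, making it the boot default. The link mechanics can be sketched in a scratch directory (the kernel version here is a made-up example, and the real script derives it with linux-version):

```shell
# Scratch-directory sketch of the /boot symlink setup from
# 02-inside-vm-update-kernel.sh; no real kernels involved.
boot="$(mktemp -d)"
kver="5.15.0-48-generic"   # hypothetical version for illustration
touch "$boot/vmlinuz-$kver" "$boot/initrd.img-$kver"
ln -s "vmlinuz-$kver" "$boot/vmlinuz-test"
ln -s "initrd.img-$kver" "$boot/initrd.img-test"
readlink "$boot/vmlinuz-test"   # prints vmlinuz-5.15.0-48-generic
```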
263 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh b/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh |
264 | new file mode 100755 |
265 | index 0000000..94a6147 |
266 | --- /dev/null |
267 | +++ b/ubuntu_nvidia_fs/nvidia-fs/03-inside-vm-install-drivers.sh |
268 | @@ -0,0 +1,46 @@ |
269 | +#!/usr/bin/env bash |
270 | + |
271 | +set -e |
272 | +set -x |
273 | + |
274 | +source ./00-vars |
275 | + |
276 | +export DEBCONF_FRONTEND="noniteractive" |
277 | +export DEBIAN_PRIORITY="critical" |
278 | + |
279 | +# Remove headers for all kernels except the one running so DKMS does not |
280 | +# try to build modules against them. Other kernels may not be compatible |
281 | +# with our modules, and we don't want the install to fail because of that. |
282 | +# We need to do this twice because apt will avoid removing a metapackage |
283 | +# (e.g. linux-kvm) if it can instead upgrade it, which may pull in a new |
284 | +# headers package. If that happens, the 2nd time through we'll remove that |
285 | +# updated headers package as well as the metapackage(s) that brung it. |
286 | +for _ in 1 2; do |
287 | + for file in /lib/modules/*/build; do |
288 | + if [ "$file" = "/lib/modules/$(uname -r)/build" ]; then |
289 | + continue |
290 | + fi |
291 | + apt remove --purge "$(dpkg -S "$file" | cut -d":" -f1 | sed 's/, / /g')" -y |
292 | + done |
293 | +done |
294 | + |
295 | +# Install MOFED stack |
296 | +wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | \ |
297 | + apt-key add - |
298 | +wget -qO - "${MLNX_REPO}/${MLNX_OFED_VER}/ubuntu${LXD_OS_VER}/mellanox_mlnx_ofed.list" | tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list |
299 | +apt update |
300 | +apt install -y mlnx-ofed-all mlnx-nvme-dkms mlnx-nfsrdma-dkms |
301 | + |
302 | +# Install nvidia-fs module |
303 | +cuda_os="ubuntu$(echo "$LXD_OS_VER" | tr -d .)" |
304 | + |
305 | +# keyring install instructions from: |
306 | +# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html |
307 | +cuda_keyring_deb="$(mktemp)" |
308 | +wget "https://developer.download.nvidia.com/compute/cuda/repos/$cuda_os/x86_64/cuda-keyring_1.0-1_all.deb" -O "$cuda_keyring_deb" |
309 | +sudo dpkg -i "$cuda_keyring_deb" |
310 | +rm -f "$cuda_keyring_deb" |
311 | + |
312 | +add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /" |
313 | +apt install -y nvidia-fs-dkms |
314 | +add-apt-repository -r "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /" |
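The `cuda_os` string that selects NVIDIA's repository path is simply the Ubuntu release number with the dot stripped; a quick sketch with an example value (at run time `LXD_OS_VER` comes from 00-vars.gen):

```shell
LXD_OS_VER="20.04"   # example value, normally generated by gen_vars
cuda_os="ubuntu$(echo "$LXD_OS_VER" | tr -d .)"
echo "$cuda_os"      # prints ubuntu2004
```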
315 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh |
316 | new file mode 100755 |
317 | index 0000000..3cbd62b |
318 | --- /dev/null |
319 | +++ b/ubuntu_nvidia_fs/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh |
320 | @@ -0,0 +1,41 @@ |
321 | +#!/usr/bin/env bash |
322 | + |
323 | +set -e |
324 | +set -x |
325 | + |
326 | +source ./00-vars |
327 | + |
328 | +install_nvidia_docker() { |
329 | + local distribution |
330 | + distribution="$(. /etc/os-release;echo "$ID$VERSION_ID")" |
331 | + curl --retry 6 --retry-delay 10 --silent --show-error -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - |
332 | + curl --retry 6 --retry-delay 10 --silent --show-error -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \ |
333 | + sudo tee /etc/apt/sources.list.d/nvidia-docker.list > /dev/null |
334 | + sudo apt update |
335 | + sudo apt install -y nvidia-docker2 -y |
336 | + sudo systemctl restart docker |
337 | +} |
338 | + |
339 | +umount /mnt/nvme || true |
340 | +parted -s /dev/nvme0n1 -- mklabel gpt |
341 | +parted -s /dev/nvme0n1 -- mkpart primary ext4 0 100% |
342 | +udevadm settle |
343 | +mkfs.ext4 -F "/dev/nvme0n1p1" |
344 | +mkdir -p /mnt/nvme |
345 | +mount "/dev/nvme0n1p1" /mnt/nvme -o data=ordered |
346 | + |
347 | +modprobe nvidia-fs |
348 | + |
349 | +install_nvidia_docker |
350 | + |
351 | +container="${CUDA_CONTAINER_NAME}:${CUDA_CONTAINER_TAG}" |
352 | + |
353 | +docker pull "${container}" |
354 | +docker run --rm --ipc host --name test_gds --gpus device=all \ |
355 | + --volume /run/udev:/run/udev:ro \ |
356 | + --volume /sys/kernel/config:/sys/kernel/config/ \ |
357 | + --volume /dev:/dev:ro \ |
358 | + --volume /mnt/nvme:/data/:rw \ |
359 | + --volume /root:/root/:ro \ |
360 | + --privileged "${container}" \ |
361 | + bash -c 'cd /root && ./05-inside-docker-run-test.sh' |
362 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh b/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh |
363 | new file mode 100755 |
364 | index 0000000..652bb55 |
365 | --- /dev/null |
366 | +++ b/ubuntu_nvidia_fs/nvidia-fs/05-inside-docker-run-test.sh |
367 | @@ -0,0 +1,38 @@ |
368 | +#!/usr/bin/env bash |
369 | + |
370 | +set -e |
371 | +set -x |
372 | + |
373 | +source ./00-vars |
374 | + |
375 | +# We want e.g. gds-tools-11-4 if using CUDA 11.4 |
376 | +gds_tools="gds-tools-$(echo "$CUDA_BRANCH" | tr "." "-")" |
377 | + |
378 | +apt update |
379 | +apt install "$gds_tools" libssl-dev -y |
380 | +cd /usr/local/cuda/gds/samples |
381 | +make -j "$(nproc)" |
382 | +dd status=none if=/dev/urandom of=/data/file1 iflag=fullblock bs=1M count=1024 |
383 | +dd status=none if=/dev/urandom of=/data/file2 iflag=fullblock bs=1M count=1024 |
384 | + |
385 | +#Edit cufile.json and set "allow_compat" property to "false". |
386 | +sed -i 's/"allow_compat_mode": true,/"allow_compat_mode": false,/' /etc/cufile.json |
387 | + |
388 | +echo "sample1" |
389 | +./cufile_sample_001 /data/file1 0 |
390 | +echo "sample 2" |
391 | +./cufile_sample_002 /data/file1 0 |
392 | +echo "sample 3" |
393 | +./cufile_sample_003 /data/file1 /data/file2 0 |
394 | +echo "sample 4" |
395 | +./cufile_sample_004 /data/file1 /data/file2 0 |
396 | +echo "sample 5" |
397 | +./cufile_sample_005 /data/file1 /data/file2 0 |
398 | +echo "sample 6" |
399 | +./cufile_sample_006 /data/file1 /data/file2 0 |
400 | +echo "sample 7" |
401 | +./cufile_sample_007 0 |
402 | +echo "sample 8" |
403 | +./cufile_sample_008 0 |
404 | +echo "sample 14" |
405 | +./cufile_sample_014 /data/file1 /data/file2 0 |
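The package-name derivation at the top of 05-inside-docker-run-test.sh maps a CUDA branch such as 11.4 onto the matching gds-tools package name; for example (at run time `CUDA_BRANCH` is sourced via 00-vars):

```shell
CUDA_BRANCH="11.4"   # example value, normally taken from 00-vars.gen
gds_tools="gds-tools-$(echo "$CUDA_BRANCH" | tr "." "-")"
echo "$gds_tools"    # prints gds-tools-11-4
```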
406 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/README b/ubuntu_nvidia_fs/nvidia-fs/README |
407 | new file mode 100644 |
408 | index 0000000..fb68ce7 |
409 | --- /dev/null |
410 | +++ b/ubuntu_nvidia_fs/nvidia-fs/README |
411 | @@ -0,0 +1,17 @@ |
412 | += nvidia-fs testing = |
413 | +The goal of this test is to confirm that the nvidia-fs module continues to |
414 | +build and work properly with new kernel updates. |
415 | + |
416 | +The environment in which this test needs to run requires several 3rd party |
417 | +pieces of software - including other 3rd party modules that require a reboot |
418 | +after installation. To avoid having to handle reboots of the test client, |
419 | +we instead do the test inside of a virtual machine that the test client |
420 | +can spin up and reboot itself. The actual nvidia-fs test runs in a docker |
421 | +container inside that virtual machine. |
422 | + |
423 | +The test is kicked off by running 01-run-test.sh, which will run each of |
424 | +the other scripts in turn to set up the virtual machine and the test |
425 | +docker container within it. |
426 | + |
427 | + |
428 | + |
429 | diff --git a/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh b/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh |
430 | new file mode 100755 |
431 | index 0000000..38c3c54 |
432 | --- /dev/null |
433 | +++ b/ubuntu_nvidia_fs/nvidia-fs/a-c-t-entry.sh |
434 | @@ -0,0 +1,14 @@ |
435 | +#!/usr/bin/env bash |
436 | + |
437 | +set -e |
438 | +set -x |
439 | + |
440 | +# make sure a-c-t invoke the script in the right directory context |
441 | +run_test() { |
442 | + exe_dir=$(dirname "${BASH_SOURCE[0]}") |
443 | + pushd "${exe_dir}" |
444 | + ./01-run-test.sh |
445 | + popd |
446 | +} |
447 | + |
448 | +run_test |
449 | diff --git a/ubuntu_nvidia_fs/nvidia-module-lib b/ubuntu_nvidia_fs/nvidia-module-lib |
450 | new file mode 100644 |
451 | index 0000000..06141bf |
452 | --- /dev/null |
453 | +++ b/ubuntu_nvidia_fs/nvidia-module-lib |
454 | @@ -0,0 +1,96 @@ |
455 | +# Copyright 2021 Canonical Ltd. |
456 | +# Written by: |
457 | +# Dann Frazier <dann.frazier@canonical.com> |
458 | +# Taihsiang Ho <taihsiang.ho@canonical.com> |
459 | +# |
460 | +# shellcheck shell=bash |
461 | +module_loaded() { |
462 | + module="$1" |
463 | + # Check linux/include/linux/module.h for module_state enumeration |
464 | + # There are the other states like Loading and Unloading besides Live. The |
465 | + # other states usually only take only few microseconds but let's specify |
466 | + # Live explicitly. |
467 | + grep "^${module} " /proc/modules | grep -q Live |
468 | +} |
469 | + |
470 | +get_module_field() { |
471 | + local module="$1" |
472 | + local field="$2" |
473 | + # shellcheck disable=SC2034 |
474 | + read -r mod size usecnt deps rest < <(grep "^${module} " /proc/modules) |
475 | + case $field in |
476 | + usecnt) |
477 | + echo "$usecnt" |
478 | + ;; |
479 | + deps) |
480 | + if [ "$deps" = "-" ]; then |
481 | + return 0 |
482 | + fi |
483 | + echo "$deps" | tr ',' ' ' |
484 | + ;; |
485 | + *) |
486 | + return 1 |
487 | + esac |
488 | +} |
489 | + |
490 | +module_in_use() { |
491 | + module="$1" |
492 | + |
493 | + usecnt="$(get_module_field "$module" usecnt)" |
494 | + |
495 | + if [ "$usecnt" -eq 0 ]; then |
496 | + return 1 |
497 | + fi |
498 | + return 0 |
499 | +} |
500 | + |
501 | +recursive_remove_module() { |
502 | + local module="$1" |
503 | + |
504 | + if ! module_loaded "$module"; then |
505 | + return 0 |
506 | + fi |
507 | + |
508 | + if ! module_in_use "$module"; then |
509 | + sudo rmmod "$module" |
510 | + return 0 |
511 | + fi |
512 | + |
513 | + if [ "$(get_module_field "$module" deps)" = "" ]; then |
514 | + echo "ERROR: $module is in use, but has no reverse dependencies" |
515 | + echo "ERROR: Maybe an application is using it." |
516 | + exit 1 |
517 | + fi |
518 | + beforecnt="$(get_module_field "$module" usecnt)" |
519 | + for dep in $(get_module_field "$module" deps); do |
520 | + recursive_remove_module "$dep" |
521 | + done |
522 | + aftercnt="$(get_module_field "$module" usecnt)" |
523 | + if [ "$beforecnt" -eq "$aftercnt" ]; then |
524 | + echo "ERROR: Unable to reduce $module use count" |
525 | + exit 1 |
526 | + fi |
527 | + recursive_remove_module "$module" |
528 | +} |
529 | + |
530 | +uninstall_all_nvidia_mod_pkgs() { |
531 | + for pkg in $(dpkg-query -f "\${Package}\n" -W 'linux-modules-nvidia-*'); do |
532 | + sudo apt remove --purge "$pkg" -y |
533 | + done |
534 | + if sudo modinfo nvidia; then |
535 | + echo "ERROR: Uninstallation of all nvidia modules failed." |
536 | + exit 1 |
537 | + fi |
538 | +} |
539 | + |
540 | +product="$(sudo dmidecode -s baseboard-product-name)" |
541 | +pkg_compatible_with_platform() { |
542 | + local pkg="$1" |
543 | + branch="$(echo "$pkg" | cut -d- -f4)" |
544 | + |
545 | + if [ "$product" = "DGXA100" ] && [ "$branch" -le "418" ]; then |
546 | + return 1 |
547 | + fi |
548 | + |
549 | + return 0 |
550 | +} |
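The /proc/modules parsing in nvidia-module-lib can be checked without loading any NVIDIA module by applying the same read/case logic to a canned line (module names and numbers below are made up; the real `get_module_field` greps the live /proc/modules instead of taking the line as an argument):

```shell
# Parse one /proc/modules-style line: name size usecnt deps state address.
field_from_line() {
    local line="$1" field="$2"
    local mod size usecnt deps rest
    read -r mod size usecnt deps rest <<<"$line"
    case $field in
        usecnt)
            echo "$usecnt" ;;
        deps)
            # "-" means no reverse dependencies
            if [ "$deps" = "-" ]; then return 0; fi
            echo "$deps" | tr ',' ' ' ;;
        *)
            return 1 ;;
    esac
}

sample="nvidia 39059456 2 nvidia_uvm,nvidia_fs, Live 0x0000000000000000"
field_from_line "$sample" usecnt   # prints 2
field_from_line "$sample" deps     # prints the deps, space-separated
```

The deps field in /proc/modules ends with a trailing comma, so the `tr` output carries a trailing space; `recursive_remove_module` iterates over it unquoted, so the extra space is harmless there.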
551 | diff --git a/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py |
552 | new file mode 100644 |
553 | index 0000000..77ac0bb |
554 | --- /dev/null |
555 | +++ b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.py |
556 | @@ -0,0 +1,35 @@ |
557 | +import os |
558 | +from autotest.client import test, utils |
559 | + |
560 | +p_dir = os.path.dirname(os.path.abspath(__file__)) |
561 | +sh_executable = os.path.join(p_dir, "ubuntu_nvidia_fs.sh") |
562 | + |
563 | + |
564 | +class ubuntu_nvidia_fs(test.test): |
565 | + version = 1 |
566 | + |
567 | + def initialize(self): |
568 | + pass |
569 | + |
570 | + def setup(self): |
571 | + cmd = "{} setup".format(sh_executable) |
572 | + utils.system(cmd) |
573 | + |
574 | + def run_nvidia_fs_in_lxc(self): |
575 | + #cmd = os.path.join(p_dir, "./nvidia-fs/a-c-t-entry.sh") |
576 | + #utils.system(cmd) |
577 | + cmd = "{} test".format(sh_executable) |
578 | + utils.system(cmd) |
579 | + |
580 | + def run_once(self, test_name): |
581 | + print("HELLO WORLD") |
582 | + if test_name == "nvidia-fs": |
583 | + self.run_nvidia_fs_in_lxc() |
584 | + |
585 | + print("") |
586 | + print("{} has run.".format(test_name)) |
587 | + |
588 | + print("") |
589 | + |
590 | + def postprocess_iteration(self): |
591 | + pass |
592 | diff --git a/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh |
593 | new file mode 100755 |
594 | index 0000000..62b1a29 |
595 | --- /dev/null |
596 | +++ b/ubuntu_nvidia_fs/ubuntu_nvidia_fs.sh |
597 | @@ -0,0 +1,41 @@ |
598 | +#!/usr/bin/env bash |
599 | +# |
600 | +# perform Nvidia driver load testing and corresponding pre-setup. |
601 | +# |
602 | + |
603 | +set -eo pipefail |
604 | + |
605 | +setup() { |
606 | + # pre-setup testing environment and necessary tools |
607 | + # currently there is nothing practically but will be used possibly in the future. |
608 | + echo "begin to pre-setup testing" |
609 | +} |
610 | + |
611 | +run_test() { |
612 | + exe_dir=$(dirname "${BASH_SOURCE[0]}") |
613 | + pushd "${exe_dir}" |
614 | + #./test-each-nvidia-server-driver.sh |
615 | + ./nvidia-fs/a-c-t-entry.sh |
616 | + popd |
617 | +} |
618 | + |
619 | +case $1 in |
620 | + setup) |
621 | + echo "" |
622 | + echo "On setting up necessary test environment..." |
623 | + echo "" |
624 | + setup |
625 | + echo "" |
626 | + echo "Setting up necessary test environment..." |
627 | + echo "" |
628 | + ;; |
629 | + test) |
630 | + echo "" |
631 | + echo "On running test..." |
632 | + echo "" |
633 | + run_test |
634 | + echo "" |
635 | + echo "Running test..." |
636 | + echo "" |
637 | + ;; |
638 | +esac |
639 | diff --git a/ubuntu_nvidia_server_driver/control b/ubuntu_nvidia_server_driver/control |
640 | index a88eff0..3052a3c 100644 |
641 | --- a/ubuntu_nvidia_server_driver/control |
642 | +++ b/ubuntu_nvidia_server_driver/control |
643 | @@ -9,5 +9,4 @@ DOC = """ |
644 | Perform testing of Nvidia server drivers |
645 | """ |
646 | |
647 | -job.run_test_detail('ubuntu_nvidia_server_driver', test_name='nvidia-fs', tag='nvidia-fs', timeout=1500) |
648 | -job.run_test_detail('ubuntu_nvidia_server_driver', test_name='load', tag='load', timeout=600) |
649 | +job.run_test_detail('ubuntu_nvidia_server_driver', test_name='load', tag='load', timeout=1200) |
Hi Ian,

Overall it's looking good. +1 on this.

Some cleanup questions:

* Do you still want to keep the nvidia-fs/ in ubuntu_nvidia_server_driver?
* Also, these lines in ubuntu_nvidia_server_driver.py:

  22 def run_nvidia_fs_in_lxc(self):
  23     cmd = os.path.join(p_dir, "./nvidia-fs/a-c-t-entry.sh")
  24     utils.system(cmd)

  And the test_name if statement for checking nvidia-fs.

It's rather trivial. So I am ok to keep or not to keep these.