Merge ~os369510/plainbox-provider-pc-sanity:gpu-driver-checker into plainbox-provider-pc-sanity:master

Proposed by jeremyszu
Status: Merged
Merged at revision: 796d721e586dfba282eec0737b944a63efd2a17a
Proposed branch: ~os369510/plainbox-provider-pc-sanity:gpu-driver-checker
Merge into: plainbox-provider-pc-sanity:master
Diff against target: 100 lines (+63/-0)
3 files modified
bin/gpu-driver-checker.sh (+50/-0)
units/pc-sanity/pc-sanity-gpu.pxu (+10/-0)
units/pc-sanity/pc-sanity.pxu (+3/-0)
Reviewer                            Status
Cyrus Lien                          Approve
Bin Li                              Approve
Kai-Chuan Hsieh                     Approve
Andy Chi                            Approve
Alex Tu                             Pending
OEM Solutions Group: Engineers      Pending
Review via email: mp+414122@code.launchpad.net

Commit message

Add gpu-driver-checker to check GPU drivers

1. Check whether every GPU has a driver.
2. Check whether the nvidia driver is an LTS version (LP: #1919118).
3. Check whether the nvidia driver is the pre-signed version.
4. Check whether nvidia-dkms is installed.
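Check 1 comes down to this: for every PCI function of class 0x0300 (VGA controller) or 0x0302 (3D controller), check whether a `driver` symlink exists under sysfs. The sketch below reads sysfs directly instead of parsing `lspci` output, so addresses are already domain-qualified and the "0000:" prefixing step is not needed. It is not the merged script. `list_gpu_drivers` and the `SYSFS_PCI` override are hypothetical names, added so the function can be tested against a fake tree.

```shell
# Hypothetical sketch (not the merged script): print "<pci-address> <driver>"
# for every display-class PCI function, or "<pci-address> none" when no driver
# is bound. Class 0x0300xx is a VGA controller, 0x0302xx a 3D controller.
# SYSFS_PCI is an assumed override so the function can be pointed at a fake tree.
list_gpu_drivers() {
    local root dev addr
    root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip anything that is not a PCI function directory.
        [ -r "$dev/class" ] || continue
        case "$(cat "$dev/class")" in
            0x0300*|0x0302*) ;;
            *) continue ;;
        esac
        addr=$(basename "$dev")   # already domain-qualified, e.g. 0000:01:00.0
        if [ -e "$dev/driver" ]; then
            # "driver" is a symlink to the bound driver's directory.
            echo "$addr $(basename "$(readlink "$dev/driver")")"
        else
            echo "$addr none"
        fi
    done
}
```

For example, `list_gpu_drivers | grep ' none$'` would print only the GPUs that have no driver bound.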

Description of the change

This change is motivated by https://bugs.launchpad.net/sutton/+bug/1919118.

Many device IDs were removed from the 470 driver, and no LTS driver is available for them.
At the moment, only 495 (non-LTS) supports them, and we need to wait for 510.

We shouldn't ship the 495 driver unless we get a green light.
This test case catches unexpected driver versions and GPUs without a driver.

Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :
Revision history for this message
jeremyszu (os369510) :
Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :

[autopkgtest]
blame: .
badpkg: rules build failed with exit code 2
erroneous package: rules build failed with exit code 2

https://oem-share.canonical.com/partners/lyoncore/share/artifacts/plainbox-provider-pc-sanity/plainbox-provider-pc-sanity-1.0.1ubuntu1-feac4af-in-linux-container-focal

Revision history for this message
jeremyszu (os369510) wrote :

ubuntu@ubuntu-XPS-15-9510:~$ checkbox-cli run com.canonical.certification::miscellanea/check-gpu-driver
===========================[ Running Selected Jobs ]============================
--------------[ Running job 1 / 1. Estimated time left: unknown ]---------------
----------------------[ Check drivers on each gpu cards. ]----------------------
ID: com.canonical.certification::miscellanea/check-gpu-driver
Category: com.canonical.plainbox::miscellanea
... 8< -------------------------------------------------------------------------
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.86.
Nvidia driver is signed.
Nvidia driver is LTS version.
------------------------------------------------------------------------- >8 ---
Outcome: job passed
Finalizing session that hasn't been submitted anywhere: checkbox-run-2022-02-18T13.34.12
==================================[ Results ]===================================
 ☑ : Check drivers on each gpu cards.

Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :
Revision history for this message
jeremyszu (os369510) wrote :

Tested on HP-ZBook-Studio-16.0-Inch-Mobile-Worksta_202201-29882 10.102.183.127

Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.86.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470-5.14.0-1015-oem.
Nvidia driver is LTS version.

Revision history for this message
jeremyszu (os369510) wrote :

Tested on WNT5-DVT1-C2_202103-28834 10.102.183.124

# In powersaving mode:
Your GPU 0000:00:02.0 is using i915.
E: Your GPU 0000:01:00.0 (0x10de:0x1c94) haven't driver.
01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce MX350] [10de:1c94] (rev a1)
 Subsystem: Dell GP107M [GeForce MX350] [1028:0ab0]
 Flags: bus master, fast devsel, latency 0, IRQ 255
 Memory at 71000000 (32-bit, non-prefetchable) [size=16M]
 Memory at 6000000000 (64-bit, prefetchable) [size=256M]
 Memory at 6010000000 (64-bit, prefetchable) [size=32M]
 I/O ports at 3000 [size=128]
 Expansion ROM at <ignored> [disabled]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [100] Virtual Channel
 Capabilities: [250] Latency Tolerance Reporting
 Capabilities: [258] L1 PM Substates
 Capabilities: [128] Power Budgeting <?>
 Capabilities: [420] Advanced Error Reporting
 Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
 Capabilities: [900] Secondary PCI Express
 Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Nvidia version is 470.74.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470-5.14.0-1005-oem.
Nvidia driver is LTS version.

# In on-demand mode:
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.74.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470-5.14.0-1005-oem.
Nvidia driver is LTS version.

Revision history for this message
jeremyszu (os369510) wrote :

Tested on BMM4-DVT1.1-C4X_202111-29673 10.102.182.50

Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.86.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470-5.14.0-1012-oem.
Nvidia driver is LTS version.

Revision history for this message
jeremyszu (os369510) wrote :

Tested on 4 Dell machines and 1 HP machine, and the results are as I expected.

Anyway, I noticed that all the Dell machines are using dkms?
That doesn't make sense.

Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :
Revision history for this message
jeremyszu (os369510) wrote (last edit ):

Tested on HP-ZBook-Fury-17-G7-Mobile-Workstation_202012-28490 10.102.180.43

The UMA platform:

Your GPU 0000:00:02.0 is using i915.

Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :
Revision history for this message
Andy Chi (andch) wrote :

Ran on SIF-MLK-DVT2-C2 and it passed.

--------------[ Running job 1 / 1. Estimated time left: unknown ]---------------
----------------------[ Check drivers on each gpu cards. ]----------------------
ID: com.canonical.certification::miscellanea/check-gpu-driver
Category: com.canonical.plainbox::miscellanea
... 8< -------------------------------------------------------------------------
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.86.
Nvidia driver is signed.
Nvidia driver is LTS version.
------------------------------------------------------------------------- >8 ---
Outcome: job passed
Finalizing session that hasn't been submitted anywhere: checkbox-run-2022-02-21T02.24.52
==================================[ Results ]===================================
 ☑ : Check drivers on each gpu cards.

LGTM.

Revision history for this message
Andy Chi (andch) :
review: Approve
Revision history for this message
Kai-Chuan Hsieh (kchsieh) wrote :

The script runs a command that needs root privileges (sudo lspci), so it might need "user: root" in the pxu file.

Revision history for this message
Cyrus Lien (cyruslien) wrote :

1. Need to make sure this test case runs before the test case that switches to the iGPU,
   e.g. com.canonical.certification::after-suspend-graphics/1_auto_switch_card_PCI_ID_0x4628

2. Something is weird about the message "N: Unable to locate package nvidia-driver-470.103".
   On my test machine:
   ii nvidia-driver-470 470.103.01-0ubuntu0.20.04.1 amd64 NVIDIA driver metapackage

Test machine: 202202-29951 (WMNA5-DVT2-C1) 10.102.182.90

BTW, WNT5-DVT1-C2_202103-28834 (10.102.183.124) is for Stanly's testing, and its image is alloem-init.

review: Needs Information
Revision history for this message
Bin Li (binli) wrote :

I tried the script on stock Ubuntu 20.04.3, and the apt-cache result looks wrong; it should use 'apt-cache madison nvidia-driver-470'.

$ ./testj.sh
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.103.01.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470.103-5.13.0-30-generic.
E: This nvidia is not LTS version, please check.
N: Unable to locate package nvidia-driver-470.103

review: Needs Fixing
Revision history for this message
jeremyszu (os369510) wrote :

Hi KC,

> might need "user: root" in pxu file.
Fixed, thanks.

Hi Cyrus,

1. I think it should already run before card switching (although I think card switching is useless now, the cert team keeps it):
...
 39 com.canonical.certification::miscellanea/check-gpu-driver
...
103 com.canonical.certification::graphics/1_auto_switch_card_TigerLake-H_GT1__UHD_Graphics_
...
112 com.canonical.certification::graphics/2_auto_switch_card_GA107M__GeForce_RTX_3050_Ti_Mobile_

May I know which test plan you mean?
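Cyrus's concern is about job order within a test plan. Assume the job IDs can be obtained one per line in execution order, as in the numbered listing above. A small awk filter can then check the ordering. `gpu_check_runs_first` is a hypothetical name, and how the list is produced is left open.

```shell
# Hypothetical helper: read job IDs in execution order on stdin (one per line,
# optionally prefixed by a number) and succeed only if
# miscellanea/check-gpu-driver appears before any *_auto_switch_card_* job.
gpu_check_runs_first() {
    awk '
        /miscellanea\/check-gpu-driver$/ { seen = 1 }
        /_auto_switch_card_/ && !seen    { bad = 1; exit }
        END                              { exit (bad || !seen) }
    '
}
```

The filter also fails if check-gpu-driver is missing from the list altogether, so a plan that silently drops the job is caught as well.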

> 2. Something weird of message "N: Unable to locate package nvidia-driver-470.103"
Thanks! I fixed it, but this machine is using dkms, so the other check failed.
Could you please check it again? Many thanks!
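The name "nvidia-driver-470.103" in that message, and "linux-modules-nvidia-470.103-..." in Bin's report, are consistent with stripping only the last dot-component of a three-component version such as 470.103.01. With a two-component version like 470.86, both forms of stripping give 470, which may explain why the earlier test machines did not hit the problem. The merged script uses `${nvidia_version%%.*}`. The sketch below shows the two expansions side by side. The helper names are hypothetical.

```shell
# Hypothetical helpers showing how package names are derived from the
# modinfo version string. ${v%%.*} removes the longest suffix matching ".*"
# (everything from the first dot); ${v%.*} removes only the shortest one.
nvidia_major() { local v="$1"; echo "${v%%.*}"; }
nvidia_pkg() { echo "nvidia-driver-$(nvidia_major "$1")"; }
signed_nvidia_pkg() { echo "linux-modules-nvidia-$(nvidia_major "$1")-$2"; }

v=470.103.01
echo "${v%.*}"    # prints 470.103 (the bad package suffix seen in the reports)
echo "${v%%.*}"   # prints 470
```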

Hi Bin,

I tried stock Ubuntu 20.04.3, and it works after the latest commit.
Could you please update the script and provide the output of:
`bash -x ./testj.sh`
`modinfo nvidia`

Revision history for this message
Bin Li (binli) wrote :

$ modinfo nvidia | grep sig
sig_id: PKCS#7
signer: binli-ThinkPad-P17-Gen-1 Secure Boot Module Signature key
sig_key: 1C:1F:49:5D:E4:BB:EB:41:35:73:9C:40:61:7B:85:06:6A:F9:3D:4A
sig_hashalgo: sha512
signature: 76:E9:8A:71:A1:0E:48:EE:5E:3A:6B:92:BD:7C:C7:B9:0D:2B:7E:39:
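The signer above is a machine-local Secure Boot key, not Canonical's, which explains why the script reports "not signed by Canonical" on this machine. modinfo's `-F` (`--field`) option prints a single field, so the script would not need to grep the full modinfo output. The sketch below applies the same pattern the script already greps for. `is_canonical_signed` is a hypothetical name.

```shell
# Hypothetical helper: succeed if a module signer string (for example the
# output of `modinfo -F signer nvidia`) names Canonical's kernel module
# signing key, using the same pattern gpu-driver-checker.sh greps for.
is_canonical_signed() {
    case "$1" in
        *"Canonical Ltd. Kernel Module Signing"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would look like `is_canonical_signed "$(modinfo -F signer nvidia 2>/dev/null)"`. An unsigned module yields an empty string, so the check fails.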

Revision history for this message
OEM Taipei Bot (oem-taipei-bot) wrote :
Revision history for this message
Kai-Chuan Hsieh (kchsieh) wrote :

LGTM

review: Approve
Revision history for this message
Bin Li (binli) wrote :

LGTM, thanks!

$ ./testj.sh
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.103.01.
E: Your nvidia driver is not signed by Canonical.
E: Expecting linux-modules-nvidia-470-5.13.0-30-generic.
Nvidia driver is LTS version

review: Approve
Revision history for this message
Cyrus Lien (cyruslien) wrote :

u@ubuntu:/tmp$ ./gpu-driver-checker.sh
Your GPU 0000:00:02.0 is using i915.
Your GPU 0000:01:00.0 is using nvidia.
Nvidia version is 470.103.01.
Nvidia driver is signed.
Nvidia driver is LTS version.

Test machine: 202202-29951 (WMNA5-DVT2-C1) 10.102.182.90

review: Approve
Revision history for this message
Bin Li (binli) wrote :

+ set -e
+ result=0
++ lspci -n -d ::0x0300
++ awk '{print $1}'
++ lspci -n -d ::0x0302
++ awk '{print $1}'
+ for gpu in $(lspci -n -d ::0x0300| awk '{print $1}') $(lspci -n -d ::0x0302| awk '{print $1}')
+ [[ 00:02.0 != \0\0\0\0* ]]
+ gpu=0000:00:02.0
++ cat /sys/bus/pci/devices/0000:00:02.0/vendor
+ vendor=0x8086
++ cat /sys/bus/pci/devices/0000:00:02.0/device
+ device=0x9bc4
+ '[' '!' -d /sys/bus/pci/devices/0000:00:02.0/driver ']'
+++ readlink /sys/bus/pci/devices/0000:00:02.0/driver
++ basename ../../../bus/pci/drivers/i915
+ driver=i915
+ echo 'Your GPU 0000:00:02.0 is using i915.'
Your GPU 0000:00:02.0 is using i915.
+ for gpu in $(lspci -n -d ::0x0300| awk '{print $1}') $(lspci -n -d ::0x0302| awk '{print $1}')
+ [[ 01:00.0 != \0\0\0\0* ]]
+ gpu=0000:01:00.0
++ cat /sys/bus/pci/devices/0000:01:00.0/vendor
+ vendor=0x10de
++ cat /sys/bus/pci/devices/0000:01:00.0/device
+ device=0x1eb6
+ '[' '!' -d /sys/bus/pci/devices/0000:01:00.0/driver ']'
+++ readlink /sys/bus/pci/devices/0000:01:00.0/driver
++ basename ../../../../bus/pci/drivers/nvidia
+ driver=nvidia
+ echo 'Your GPU 0000:01:00.0 is using nvidia.'
Your GPU 0000:01:00.0 is using nvidia.
++ modinfo nvidia
++ grep '^version'
++ awk '{print $2}'
+ nvidia_version=470.103.01
+ '[' -n 470.103.01 ']'
+ nvidia_pkg_prefix=nvidia-driver-
+ signed_nvidia_prefix=linux-modules-nvidia
+ echo 'Nvidia version is 470.103.01.'
Nvidia version is 470.103.01.
+ modinfo nvidia
+ grep -q '^signer:.*Canonical Ltd. Kernel Module Signing'
+ echo 'E: Your nvidia driver is not signed by Canonical.'
E: Your nvidia driver is not signed by Canonical.
++ uname -r
+ echo 'E: Expecting linux-modules-nvidia-470-5.13.0-30-generic.'
E: Expecting linux-modules-nvidia-470-5.13.0-30-generic.
+ result=255
+ pkg=nvidia-driver-470
++ apt show nvidia-driver-470
++ grep '^Support:'
++ awk '{print $2}'
+ support=LTSB
+ '[' LTSB '!=' LTSB ']'
+ echo 'Nvidia driver is LTS version.'
Nvidia driver is LTS version.
+ exit 255
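The trace makes the LTS decision explicit: the script only compares the `Support:` field of `apt show nvidia-driver-470` with LTSB. It also shows why the job still fails on this machine. `result` was already set to 255 by the signature check, and the script exits with that value after the LTS check passes. Below is a minimal sketch of the LTS part that can be exercised without apt. `support_level` and `is_lts` are hypothetical names.

```shell
# Hypothetical helpers: read `apt show <pkg>` output on stdin, print the
# value of its Support: field (empty if there is none), and compare it with
# LTSB as gpu-driver-checker.sh does. One awk call replaces grep | awk.
support_level() {
    awk '/^Support:/ { print $2; exit }'
}
is_lts() {
    [ "$(support_level)" = "LTSB" ]
}
```

Usage would look like `apt show nvidia-driver-470 2>/dev/null | is_lts`.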

Revision history for this message
Bin Li (binli) wrote :

filename: /lib/modules/5.13.0-30-generic/updates/dkms/nvidia.ko
firmware: nvidia/470.103.01/gsp.bin
alias: char-major-195-*
version: 470.103.01
supported: external
license: NVIDIA
srcversion: DA38FB2932B7F54B41FC6D0
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: drm
retpoline: Y
name: nvidia
vermagic: 5.13.0-30-generic SMP mod_unload modversions
sig_id: PKCS#7
signer: binli-ThinkPad-P17-Gen-1 Secure Boot Module Signature key
sig_key: 1C:1F:49:5D:E4:BB:EB:41:35:73:9C:40:61:7B:85:06:6A:F9:3D:4A
sig_hashalgo: sha512
signature: 76:E9:8A:71:A1:0E:48:EE:5E:3A:6B:92:BD:7C:C7:B9:0D:2B:7E:39:
                13:95:20:CB:EF:8A:F2:7F:D3:F5:0A:23:7C:E9:3F:65:F6:02:72:A4:
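The `filename:` line shows that this module was built by DKMS (it lives under `updates/dkms`), which fits its locally generated signature. One possible approach to item 4 of the commit message is to inspect the loaded module's path. The sketch below does that. `is_dkms_module` is a hypothetical name, and the path layout is taken only from this output.

```shell
# Hypothetical helper: succeed if a module path (for example the output of
# `modinfo -F filename nvidia`) points into a DKMS build tree. Assumes DKMS
# installs under /lib/modules/<kver>/updates/dkms/, as in the output above.
is_dkms_module() {
    case "$1" in
        */updates/dkms/*) return 0 ;;
        *) return 1 ;;
    esac
}
```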

Revision history for this message
jeremyszu (os369510) wrote :

Thank you all guys!

Preview Diff

diff --git a/bin/gpu-driver-checker.sh b/bin/gpu-driver-checker.sh
new file mode 100755
index 0000000..ebbfa7d
--- /dev/null
+++ b/bin/gpu-driver-checker.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+
+set -e
+
+result=0
+
+# Available driver check in each GPU
+for gpu in $(lspci -n -d ::0x0300| awk '{print $1}') \
+    $(lspci -n -d ::0x0302| awk '{print $1}'); do
+    if [[ ${gpu} != "0000"* ]]; then
+        gpu="0000:${gpu}"
+    fi
+    vendor=$(cat /sys/bus/pci/devices/"${gpu}"/vendor)
+    device=$(cat /sys/bus/pci/devices/"${gpu}"/device)
+    if [ ! -d "/sys/bus/pci/devices/${gpu}/driver" ]; then
+        echo "E: Your GPU ${gpu} (${vendor}:${device}) hasn't driver."
+        sudo lspci -nnvk -s "$gpu"
+        result=255
+    else
+        driver=$(basename "$(readlink /sys/bus/pci/devices/"${gpu}"/driver)")
+        echo "Your GPU ${gpu} is using ${driver}."
+    fi
+done
+
+
+# Check nvidia driver
+nvidia_version=$(modinfo nvidia 2>/dev/null| grep "^version"| awk '{print $2}')
+if [ -n "$nvidia_version" ]; then
+    nvidia_pkg_prefix="nvidia-driver-"
+    signed_nvidia_prefix="linux-modules-nvidia"
+    echo "Nvidia version is ${nvidia_version}."
+    if ! modinfo nvidia| grep -q "^signer:.*Canonical Ltd. Kernel Module Signing"; then
+        echo "E: Your nvidia driver is not signed by Canonical."
+        echo "E: Expecting ${signed_nvidia_prefix}-${nvidia_version%%.*}-$(uname -r)."
+        result=255
+    else
+        echo "Nvidia driver is signed."
+    fi
+    pkg="${nvidia_pkg_prefix}${nvidia_version%%.*}"
+    support=$(apt show "${pkg}" 2>/dev/null| grep "^Support:"| awk '{print $2}')
+    if [ "$support" != "LTSB" ]; then
+        echo "E: ${pkg} is not LTS version, please check."
+        apt-cache madison "$pkg"
+        result=255
+    else
+        echo "Nvidia driver is LTS version."
+    fi
+fi
+
+exit $result
diff --git a/units/pc-sanity/pc-sanity-gpu.pxu b/units/pc-sanity/pc-sanity-gpu.pxu
new file mode 100644
index 0000000..6ccf00d
--- /dev/null
+++ b/units/pc-sanity/pc-sanity-gpu.pxu
@@ -0,0 +1,10 @@
+plugin: shell
+category_id: com.canonical.plainbox::miscellanea
+id: miscellanea/check-gpu-driver
+user: root
+command:
+ gpu-driver-checker.sh
+_summary: Check drivers on each gpu cards.
+_description:
+ All GPUs should have corresponding driver and those drivers should be
+ LTS/formal version.
diff --git a/units/pc-sanity/pc-sanity.pxu b/units/pc-sanity/pc-sanity.pxu
index f7f669e..b6f006a 100644
--- a/units/pc-sanity/pc-sanity.pxu
+++ b/units/pc-sanity/pc-sanity.pxu
@@ -12,6 +12,7 @@ include:
  com.canonical.certification::misc/generic/grub_boothole
  com.canonical.certification::miscellanea/cvescan
  com.canonical.certification::miscellanea/check-nvidia
+ com.canonical.certification::miscellanea/check-gpu-driver
  com.canonical.certification::miscellanea/debsums
  com.canonical.certification::miscellanea/install_kernel_tools_testing
  com.canonical.certification::power-management/check-turbostat-long-idle-cpu-residency
@@ -73,6 +74,7 @@ include:
  com.canonical.certification::misc/generic/grub_boothole
  com.canonical.certification::miscellanea/cvescan
  com.canonical.certification::miscellanea/check-nvidia
+ com.canonical.certification::miscellanea/check-gpu-driver
  com.canonical.certification::miscellanea/debsums
  com.canonical.certification::miscellanea/install_kernel_tools_testing
  com.canonical.certification::power-management/check-turbostat-long-idle-cpu-residency
@@ -132,6 +134,7 @@ _name: A test plan to confirm dgpu auto switch works well.
 _description: Be as a unit test. To confirm there's no regression on dgpu automatic switching.
 include:
  com.canonical.certification::miscellanea/check-nvidia
+ com.canonical.certification::miscellanea/check-gpu-driver
  com.canonical.certification::graphics/2_auto_switch_card_.*
  com.canonical.certification::graphics/2_valid_opengl_renderer_.*
  com.canonical.certification::graphics/1_auto_switch_card_.*
