amdgpu kernel errors in Linux 5.4

Bug #1871248 reported by Kirt Runolfson
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Focal            Fix Released   Undecided    Unassigned

Bug Description

[Impact]

amdgpu generates ASSERT warning messages when the retimer is not supported.
This behaviour can confuse users.

[Fix]

This is fixed by commit a0e40018dcc3f59a. It silences the ASSERTs by
outputting a debug message and exiting when the retimer is not supported by
the hardware.
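
The idea behind the fix can be illustrated with a small user-space sketch. This is not the upstream patch: the struct, the function names, the fake bus callbacks and the log text are all hypothetical, chosen only to show the pattern of stopping at the first failed write and logging once instead of asserting after every failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register/value pair; not the driver's real data layout. */
struct retimer_setting {
    unsigned char reg;
    unsigned char value;
};

/* Hypothetical stand-in for the driver's I2C write routine. */
typedef bool (*i2c_write_fn)(unsigned char reg, unsigned char value);

/* Number of write attempts made through the fake bus functions below. */
int write_attempts = 0;

/* Fake bus with no retimer behind it: every write fails. */
bool fake_write_no_retimer(unsigned char reg, unsigned char value)
{
    (void)reg;
    (void)value;
    write_attempts++;
    return false;
}

/* Fake bus with a responding retimer: every write succeeds. */
bool fake_write_ok(unsigned char reg, unsigned char value)
{
    (void)reg;
    (void)value;
    write_attempts++;
    return true;
}

/*
 * Write the settings in order. On the first failed write, emit one
 * debug-style line and stop, instead of asserting after each failure.
 * Returns the number of successful writes.
 */
size_t write_retimer_settings(i2c_write_fn write_reg,
                              const struct retimer_setting *settings,
                              size_t count)
{
    size_t done = 0;

    for (size_t i = 0; i < count; i++) {
        if (!write_reg(settings[i].reg, settings[i].value)) {
            fprintf(stderr, "debug: retimer write failed, skipping\n");
            break;
        }
        done++;
    }
    return done;
}
```

With the failing fake bus, only one write is attempted and a single debug line is printed, which mirrors why the repeated warning blocks disappear once the driver stops after the first failed write.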

[Test]

Verified on B450 I AORUS PRO WIFI with AMD Ryzen 3 3200G as discussed
on LP:1871248.

[Regression Potential]

Low. The change only affects the code path for an unsupported hardware
feature, and the patch was cherry-picked from upstream kernel 5.7-rc1.

======================================================

Linux atlas 5.4.0-21-generic #25-Ubuntu SMP Sat Mar 28 13:10:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 20.04 beta fresh installation. Upgrades applied.
AMD Ryzen 3 3200G with Radeon Vega Graphics
Motherboard: Gigabyte B450 I AORUS PRO WIFI/B450 I AORUS PRO WIFI-CF, BIOS F50 11/27/2019

These errors keep appearing in the kernel messages.
[ 3.258890] ------------[ cut here ]------------
[ 3.258944] WARNING: CPU: 1 PID: 621 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:1737 write_i2c_retimer_setting+0xc8/0x3e0 [amdgpu]
[ 3.258945] Modules linked in: amd64_edac_mod(-) cmac algif_hash algif_skcipher af_alg bnep edac_mce_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul ghash_clmulni_intel snd_seq_midi nls_iso8859_1 snd_seq_midi_event snd_rawmidi snd_seq aesni_intel iwlmvm crypto_simd mac80211 amdgpu cryptd libarc4 glue_helper snd_seq_device amd_iommu_v2 btusb iwlwifi gpu_sched btrtl wmi_bmof ttm btbcm k10temp snd_timer btintel input_leds drm_kms_helper bluetooth snd fb_sys_fops syscopyarea cfg80211 sysfillrect soundcore sysimgblt ecdh_generic ecc mac_hid sch_fq_codel parport_pc ppdev lp drm parport ip_tables x_tables autofs4 hid_generic usbhid hid nvme crc32_pclmul i2c_piix4 ahci igb nvme_core i2c_algo_bit dca libahci wmi video gpio_amdpt gpio_generic
[ 3.258966] CPU: 1 PID: 621 Comm: gpu-manager Not tainted 5.4.0-21-generic #25-Ubuntu
[ 3.258967] Hardware name: Gigabyte Technology Co., Ltd. B450 I AORUS PRO WIFI/B450 I AORUS PRO WIFI-CF, BIOS F50 11/27/2019
[ 3.259012] RIP: 0010:write_i2c_retimer_setting+0xc8/0x3e0 [amdgpu]
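
When comparing kernels, it can help to count how many of these warnings a given boot produced. The following is a minimal sketch, assuming the kernel log has already been read into a string (for example from saved `dmesg` output); the function names are hypothetical and the matching is a plain substring search, not a full log parser.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first `len` bytes of `line` contain `needle`. */
int line_contains(const char *line, size_t len, const char *needle)
{
    size_t nlen = strlen(needle);

    if (nlen > len)
        return 0;
    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(line + i, needle, nlen) == 0)
            return 1;
    }
    return 0;
}

/*
 * Counts the lines in `log` that contain both "WARNING:" and `symbol`,
 * e.g. symbol = "write_i2c_retimer_setting". Lines are split on '\n'.
 */
size_t count_warnings_for(const char *log, const char *symbol)
{
    size_t hits = 0;
    const char *line = log;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (line_contains(line, len, "WARNING:") &&
            line_contains(line, len, symbol))
            hits++;

        /* Step past the newline, or stop at the terminating NUL. */
        line = end ? end + 1 : line + len;
    }
    return hits;
}
```

Only the `WARNING:` line of each block is counted, so the `RIP:` line that repeats the symbol name does not inflate the total.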

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-21-generic 5.4.0-21.25
ProcVersionSignature: Ubuntu 5.4.0-21.25-generic 5.4.27
Uname: Linux 5.4.0-21-generic x86_64
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: kirt 1293 F.... pulseaudio
 /dev/snd/controlC0: kirt 1293 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 6 15:52:30 2020
InstallationDate: Installed on 2020-04-06 (0 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Beta amd64 (20200402)
MachineType: Gigabyte Technology Co., Ltd. B450 I AORUS PRO WIFI
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-21-generic root=UUID=9940ba5b-7683-413e-92a2-854b3816799b ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-21-generic N/A
 linux-backports-modules-5.4.0-21-generic N/A
 linux-firmware 1.187
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/27/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F50
dmi.board.asset.tag: Default string
dmi.board.name: B450 I AORUS PRO WIFI-CF
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF50:bd11/27/2019:svnGigabyteTechnologyCo.,Ltd.:pnB450IAORUSPROWIFI:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB450IAORUSPROWIFI-CF:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B450 I AORUS PRO WIFI
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Alex Hung (alexhung) wrote :

The errors (or warnings) are generated by an ASSERT such as the one in the following statement.

if (!i2c_success)
   /* Write failure */
   ASSERT(i2c_success);

Details @ https://elixir.bootlin.com/linux/v5.4.31/source/drivers/gpu/drm/amd/display/dc/core/dc_link.c#L1737

I am not an amdgpu expert, but I don't think they are harmful; they indicate that one feature in amdgpu does not work.

If this really bugs you, I can backport a fix from the upstream kernel that silences the ASSERTs and outputs a simple log message instead. See the following for details.

commit a0e40018dcc3f59a10ca21d58f8ea8ceb1b035ac
Author: Rodrigo Siqueira <email address hidden>
Date: Mon Feb 24 10:13:37 2020 -0500

    drm/amd/display: Stop if retimer is not available

    Raven provides retimer feature support that requires i2c interaction in
    order to make it work well, all settings required for this configuration
    are loaded from the Atom bios which include the i2c address. If the
    retimer feature is not available, we should abort the attempt to set
    this feature, otherwise, it makes the following line return
    I2C_CHANNEL_OPERATION_NO_RESPONSE:

    ... // skipped

Alex Hung (alexhung) wrote :

A kernel, based on Ubuntu-5.4.0-21.25 + a0e40018dcc3f, is available for testing @ https://people.canonical.com/~alexhung/LP1871248/

Kirt Runolfson (kirtr) wrote : Re: [Bug 1871248] Re: amdgpu kernel errors in Linux 5.4

Interesting. Thank you, I'll try that kernel!


Kirt Runolfson (kirtr) wrote :

This kernel silences those errors/warnings nicely. Good find and thank you! So nice not to have it dump 8 blocks of warnings that look like errors.

Is it possible to have that patch backported into the 5.4 series going forward? I would guess that most users that can read kernel messages are going to be a bit alarmed by those.

Alex Hung (alexhung) wrote :

An SRU was sent for review. Your help will be needed to test the proposed kernel before the next kernel update.

This patch is included in kernel 5.7-rc1, so it is likely to be in the next Ubuntu release without backporting.

Kirt Runolfson (kirtr) wrote :

Hi Alex,

I'd be more than happy to test. I'll jump on it and report as soon as I see
a kernel with the SRU applied.

Thank you again!

-Kirt


Alex Hung (alexhung)
description: updated
Changed in linux (Ubuntu Focal):
status: New → Fix Committed
Kirt Runolfson (kirtr) wrote :

Confirmed that these errors do not appear in a fresh 20.04 release installation.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Kirt Runolfson (kirtr) wrote :

Received an automated message from the kernel bot asking to change the tag to "verification-done-focal". I've done this. This patch makes kernel messages much more readable for this motherboard/APU combo.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-31.35

---------------
linux (5.4.0-31.35) focal; urgency=medium

  * focal/linux: 5.4.0-31.35 -proposed tracker (LP: #1877253)

  * Intermittent display blackouts on event (LP: #1875254)
    - drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only

  * Unable to handle kernel pointer dereference in virtual kernel address space
    on Eoan (LP: #1876645)
    - SAUCE: overlayfs: fix shitfs special-casing

linux (5.4.0-30.34) focal; urgency=medium

  * focal/linux: 5.4.0-30.34 -proposed tracker (LP: #1875385)

  * ubuntu/focal64 fails to mount Vagrant shared folders (LP: #1873506)
    - [Packaging] Move virtualbox modules to linux-modules
    - [Packaging] Remove vbox and zfs modules from generic.inclusion-list

  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay

  * shiftfs: broken shiftfs nesting (LP: #1872094)
    - SAUCE: shiftfs: record correct creator credentials

  * Add debian/rules targets to compile/run kernel selftests (LP: #1874286)
    - [Packaging] add support to compile/run selftests

  * shiftfs: O_TMPFILE reports ESTALE (LP: #1872757)
    - SAUCE: shiftfs: fix dentry revalidation

  * LIO hanging in iscsit_free_session and iscsit_stop_session (LP: #1871688)
    - scsi: target: iscsi: calling iscsit_stop_session() inside
      iscsit_close_session() has no effect

  * [ICL] TC port in legacy/static mode can't be detected due TCCOLD
    (LP: #1868936)
    - SAUCE: drm/i915: Align power domain names with port names
    - SAUCE: drm/i915/display: Move out code to return the digital_port of the aux
      ch
    - SAUCE: drm/i915/display: Add intel_legacy_aux_to_power_domain()
    - SAUCE: drm/i915/display: Split hsw_power_well_enable() into two
    - SAUCE: drm/i915/tc/icl: Implement TC cold sequences
    - SAUCE: drm/i915/tc: Skip ref held check for TC legacy aux power wells
    - SAUCE: drm/i915/tc/tgl: Implement TC cold sequences
    - SAUCE: drm/i915/tc: Catch TC users accessing FIA registers without enable
      aux
    - SAUCE: drm/i915/tc: Do not warn when aux power well of static TC ports
      timeout

  * alsa/sof: external mic can't be deteced on Lenovo and HP laptops
    (LP: #1872569)
    - SAUCE: ASoC: intel/skl/hda - set autosuspend timeout for hda codecs

  * amdgpu kernel errors in Linux 5.4 (LP: #1871248)
    - drm/amd/display: Stop if retimer is not available

  * Focal update: v5.4.34 upstream stable release (LP: #1874111)
    - amd-xgbe: Use __napi_schedule() in BH context
    - hsr: check protocol version in hsr_newlink()
    - l2tp: Allow management of tunnels and session in user namespace
    - net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
    - net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
    - net: ipv6: do not consider routes via gateways for anycast address check
    - net: phy: micrel: use genphy_read_status for KSZ9131
    - net: qrtr: send msgs from local of same id as broadcast
    - net: revert default NAPI poll timeout to 2 jiffies
    - net: tun: record RX queue in skb before do_xdp_gener...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-42.46

---------------
linux (5.4.0-42.46) focal; urgency=medium

  * focal/linux: 5.4.0-42.46 -proposed tracker (LP: #1887069)

  * linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
    - SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

linux (5.4.0-41.45) focal; urgency=medium

  * focal/linux: 5.4.0-41.45 -proposed tracker (LP: #1885855)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2019-19642
    - kernel/relay.c: handle alloc_percpu returning NULL in relay_open

  * CVE-2019-16089
    - SAUCE: nbd_genl_status: null check for nla_nest_start

  * CVE-2020-11935
    - aufs: do not call i_readcount_inc()

  * ip_defrag.sh in net from ubuntu_kernel_selftests failed with 5.0 / 5.3 / 5.4
    kernel (LP: #1826848)
    - selftests: net: ip_defrag: ignore EPERM

  * Update lockdown patches (LP: #1884159)
    - SAUCE: acpi: disallow loading configfs acpi tables when locked down

  * seccomp_bpf fails on powerpc (LP: #1885757)
    - SAUCE: selftests/seccomp: fix ptrace tests on powerpc

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [packaging] add signed modules for the 418-server and the 440-server
      flavours

 -- Khalid Elmously <email address hidden> Thu, 09 Jul 2020 19:50:26 -0400

Changed in linux (Ubuntu):
status: Confirmed → Fix Released