i40e: general protection fault in i40e_config_vf_promiscuous_mode

Bug #1852663 reported by gerald.yang
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
gerald.yang
Eoan
Fix Released
Undecided
gerald.yang

Bug Description

SRU Justification

[Impact]
Assign some VFs to VMs, when deleting VMs, a general protection fault occurs in i40e_config_vf_promiscuous_mode

general protection fault: 0000 [#1] SMP PTI
CPU: 54 PID: 6200 Comm: libvirtd Not tainted 5.3.0-21-generic #22~18.04.1-UbuntuHardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/21/2019
RIP: 0010:i40e_config_vf_promiscuous_mode+0x172/0x330 [i40e]
Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 49 83 c4 08 4c 39 e6 75 dd 85 c9 74 73 0f b6 45 c0 45 31 d2 89 45 d0 4d 8b 3e 4d 85 ff 74 53 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 3f 0f b7 b3 ea 0c 00 00 8b 55 d0
RSP: 0018:ffffb987b5c77760 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff9bb5df5a9000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bace4bce350
RBP: ffffb987b5c777b0 R08: 0000000000000000 R09: ffff9bace56a9da0
R10: 0000000000000000 R11: 0000000000000100 R12: ffff9bb5df5a9a28
R13: ffff9bace4bce008 R14: ffff9bb5df5a9338 R15: 26c2b975d54f5980
FS: 00007f9f07fff700(0000) GS:ffff9bfcff480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa73c9c0e10 CR3: 000000f6ab37a002 CR4: 00000000007626e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
i40e_ndo_set_vf_port_vlan+0x1a2/0x440[i40e]
do_setlink+0x53f/0xee0
?update_load_avg+0x596/0x620
?update_curr+0x7a/0x1d0
?__switch_to_asm+0x40/0x70
?__switch_to_asm+0x34/0x70
?__switch_to_asm+0x40/0x70
?__switch_to_asm+0x34/0x70
rtnl_setlink+0x113/0x150
rtnetlink_rcv_msg+0x296/0x340
?aa_label_sk_perm.part.4+0x10f/0x160
?_cond_resched+0x19/0x40
?rtnl_calcit.isra.30+0x120/0x120
netlink_rcv_skb+0x51/0x120
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x1a4/0x250
netlink_sendmsg+0x2d7/0x3d0
sock_sendmsg+0x63/0x70
___sys_sendmsg+0x2a9/0x320
?aa_label_sk_perm.part.4+0x10f/0x160
?_raw_spin_unlock_bh+0x1e/0x20
?release_sock+0x8f/0xa0
__sys_sendmsg+0x63/0xa0
?__sys_sendmsg+0x63/0xa0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x5a/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9

This issue also happens when deleting k8s pod if VF is used by k8s pod, there was a bug reported in the e1000-devel mailing list
https://sourceforge.net/p/e1000/mailman/message/36766306/

The fix is suggested by Billy McFall, to add protection when accessing the hash list(vsi->mac_filter_hash), but it's not upstream yet

[Test Case]
Spin up some VMs with VFs, then delete all VMs

[Regression Potential]
Low, the fix is to add a protection for a hash list, shouldn't have potential regression

Changed in linux (Ubuntu):
assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in linux (Ubuntu Eoan):
assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Eoan):
status: New → In Progress
tags: added: sts
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Changed in linux (Ubuntu Eoan):
status: Fix Committed → Invalid
status: Invalid → Fix Committed
Revision history for this message
Nobuto Murata (nobuto) wrote :

The general protection fault is reproducible with the current 5.3 kernel as follows by creating 10 SR-IOV enabled VMs with OpenStack and deleting 5 VMs sequentially. After updating it to the -proposed one as
 linux-image-5.3.0-25-generic 5.3.0-25.27, there is no such general protection fault happened with the same operations. So we can consider the fix is verified.

tags: added: verification-done-eoan
removed: verification-needed-eoan
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (27.4 KiB)

This bug was fixed in the package linux - 5.3.0-26.28

---------------
linux (5.3.0-26.28) eoan; urgency=medium

  * eoan/linux: 5.3.0-26.28 -proposed tracker (LP: #1856807)

  * nvidia-435 is in eoan, linux-restricted-modules only builds against 430,
    ubiquity gives me the self-signed modules experience instead of using the
    Canonical-signed modules (LP: #1856407)
    - Add nvidia-435 dkms build

linux (5.3.0-25.27) eoan; urgency=medium

  * eoan/linux: 5.3.0-25.27 -proposed tracker (LP: #1854762)

  * CVE-2019-14901
    - SAUCE: mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()

  * CVE-2019-14896 // CVE-2019-14897
    - SAUCE: libertas: Fix two buffer overflows at parsing bss descriptor

  * CVE-2019-14895
    - SAUCE: mwifiex: fix possible heap overflow in mwifiex_process_country_ie()

  * [CML] New device id's for CMP-H (LP: #1846335)
    - mmc: sdhci-pci: Add another Id for Intel CML
    - i2c: i801: Add support for Intel Comet Lake PCH-H
    - mtd: spi-nor: intel-spi: Add support for Intel Comet Lake-H SPI serial flash
    - mfd: intel-lpss: Add Intel Comet Lake PCH-H PCI IDs

  * i915: Display flickers (monitor loses signal briefly) during "flickerfree"
    boot, while showing the BIOS logo on a black background (LP: #1836858)
    - [Config] FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y

  * Please add patch fixing RK818 ID detection (LP: #1853192)
    - SAUCE: mfd: rk808: Fix RK818 ID template

  * Kernel build log filled with "/bin/bash: line 5: warning: command
    substitution: ignored null byte in input" (LP: #1853843)
    - [Debian] Fix warnings when checking for modules signatures

  * Lenovo dock MAC Address pass through doesn't work in Ubuntu (LP: #1827961)
    - r8152: Add macpassthru support for ThinkPad Thunderbolt 3 Dock Gen 2

  * Dell XPS 13 9350/9360 headphone audio hiss (LP: #1654448) // [XPS 13 9360,
    Realtek ALC3246, Black Headphone Out, Front] High noise floor (LP: #1845810)
    - ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360

  * no HDMI video output since GDM greeter after linux-oem-osp1 version
    5.0.0-1026 (LP: #1852386)
    - drm/i915: Add new CNL PCH ID seen on a CML platform
    - SAUCE: drm/i915: Fix detection for a CMP-V PCH

  * [broadwell-rt286, playback] Since Linux 5.2rc2 audio playback no longer
    works on Dell Venue 11 Pro 7140 (LP: #1846539)
    - [Config] Drop snd-sof-intel-bdw build
    - SAUCE: ASoC: SOF: Intel: Broadwell: clarify mutual exclusion with legacy
      driver

  * [CML-S62] Need enable turbostat patch support for Comet lake- S 6+2
    (LP: #1847451)
    - SAUCE: tools/power turbostat: Add Cometlake support

  * External microphone can't work on some dell machines with the codec alc256
    or alc236 (LP: #1853791)
    - SAUCE: ALSA: hda/realtek - Move some alc256 pintbls to fallback table
    - SAUCE: ALSA: hda/realtek - Move some alc236 pintbls to fallback table

  * Memory leak in net/xfrm/xfrm_state.c - 8 pages per ipsec connection
    (LP: #1853197)
    - xfrm: Fix memleak on xfrm state destroy

  * CVE-2019-18660: patches for Ubuntu (LP: #1853142) // CVE-2019-18660
    - powerpc/64s: support nospectre_v2 cmdline option
    - powerp...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (8.6 KiB)

This bug was fixed in the package linux - 5.4.0-9.12

---------------
linux (5.4.0-9.12) focal; urgency=medium

  * alsa/hda/realtek: the line-out jack doens't work on a dell AIO
    (LP: #1855999)
    - SAUCE: ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO

  * scsi: hisi_sas: Check sas_port before using it (LP: #1855952)
    - scsi: hisi_sas: Check sas_port before using it

  * CVE-2019-19078
    - ath10k: fix memory leak

  * cifs: DFS Caching feature causing problems traversing multi-tier DFS setups
    (LP: #1854887)
    - cifs: Fix retrieval of DFS referrals in cifs_mount()

  * Support DPCD aux brightness control (LP: #1856134)
    - SAUCE: drm/i915: Fix eDP DPCD aux max backlight calculations
    - SAUCE: drm/i915: Assume 100% brightness when not in DPCD control mode
    - SAUCE: drm/i915: Fix DPCD register order in intel_dp_aux_enable_backlight()
    - SAUCE: drm/i915: Auto detect DPCD backlight support by default
    - SAUCE: drm/i915: Force DPCD backlight mode on X1 Extreme 2nd Gen 4K AMOLED
      panel
    - USUNTU: SAUCE: drm/i915: Force DPCD backlight mode on Dell Precision 4K sku

  * The system cannot resume from S3 if user unplugs the TB16 during suspend
    state (LP: #1849269)
    - PCI: pciehp: Do not disable interrupt twice on suspend
    - PCI: pciehp: Prevent deadlock on disconnect

  * change kconfig of the soundwire bus driver from y to m (LP: #1855685)
    - [Config]: SOUNDWIRE=m

  * alsa/sof: change to use hda hdmi codec driver to make hdmi audio on the
    docking station work (LP: #1855666)
    - ALSA: hda/hdmi - implement mst_no_extra_pcms flag
    - ASoC: hdac_hda: add support for HDMI/DP as a HDA codec
    - ASoC: Intel: skl-hda-dsp-generic: use snd-hda-codec-hdmi
    - ASoC: Intel: skl-hda-dsp-generic: fix include guard name
    - ASoC: SOF: Intel: add support for snd-hda-codec-hdmi
    - ASoC: Intel: bxt-da7219-max98357a: common hdmi codec support
    - ASoC: Intel: glk_rt5682_max98357a: common hdmi codec support
    - ASoC: intel: sof_rt5682: common hdmi codec support
    - ASoC: Intel: bxt_rt298: common hdmi codec support
    - ASoC: SOF: enable sync_write in hdac_bus
    - [config]: SND_SOC_SOF_HDA_COMMON_HDMI_CODEC=y

  * Fix unusable USB hub on Dell TB16 after S3 (LP: #1855312)
    - SAUCE: USB: core: Make port power cycle a seperate helper function
    - SAUCE: USB: core: Attempt power cycle port when it's in eSS.Disabled state

  * Focal update: v5.4.3 upstream stable release (LP: #1856583)
    - rsi: release skb if rsi_prepare_beacon fails
    - arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
    - arm64: tegra: Fix 'active-low' warning for Jetson Xavier regulator
    - perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite
    - sparc64: implement ioremap_uc
    - lp: fix sparc64 LPSETTIMEOUT ioctl
    - time: Zero the upper 32-bits in __kernel_timespec on 32-bit
    - mailbox: tegra: Fix superfluous IRQ error message
    - staging/octeon: Use stubs for MIPS && !CAVIUM_OCTEON_SOC
    - usb: gadget: u_serial: add missing port entry locking
    - serial: 8250-mtk: Use platform_get_irq_optional() for optional irq
    - tty: serial: fsl_lpuart: use the sg ...

Read more...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.