Xenial: data corruption when using i40e with iommu

Bug #1802421 reported by Daniel Axtens
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned
Xenial
Fix Released
Undecided
Unassigned

Bug Description

A user reports that using an i40e with intel_iommu=on with the Xenial GA kernel causes data corruption. Using the Xenial HWE kernel or an out-of-tree driver more recent than the version shipped with Xenial solves the issue.

[Impact]
Corrupted data is returned from the network card intermittently. This is often noticeable when using apt, as the checksums are verified. If often leads to failure of apt operations. When there are no checksums done, this could lead to silent data corruption.

[Fix]
This was fixed somewhere post-4.4. Testing identified b32bfa17246d ("i40e: Drop packet split receive routine") which is part of a broader refactor. Picking this patch alone is sufficient to fix the issue. My theory is that iommu exposes an issue in the packet split receive routine and so removing it is sufficient to prevent the problem from occurring.

[Test]
A user tested a Xenial 4.4 kernel with this patch applied and it fixed their issue - no data corruption was observed. (The test repeatedly deletes the apt cache and then does apt update.)

[Regression Potential]
It's a messy change inside i40e, so the risk is that i40e will be broken in some subtle way we haven't noticed, or have performance issues. None of these have been observed so far.

Daniel Axtens (daxtens)
description: updated
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Daniel Axtens (daxtens) wrote :

The user has verified that the -proposed kernel resolves their issue.

Regards,
Daniel

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (33.2 KiB)

This bug was fixed in the package linux - 4.4.0-142.168

---------------
linux (4.4.0-142.168) xenial; urgency=medium

  * linux: 4.4.0-142.168 -proposed tracker (LP: #1811846)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * iptables connlimit allows more connections than the limit when using
    multiple CPUs (LP: #1811094)
    - netfilter: xt_connlimit: don't store address in the conn nodes
    - SAUCE: netfilter: xt_connlimit: remove the 'addr' parameter in add_hlist()
    - netfilter: nf_conncount: expose connection list interface
    - netfilter: nf_conncount: Fix garbage collection with zones
    - netfilter: nf_conncount: fix garbage collection confirm race
    - netfilter: nf_conncount: don't skip eviction when age is negative

  * CVE-2017-5715
    - SAUCE: x86/speculation: Cleanup IBPB runtime control handling
    - SAUCE: x86/speculation: Cleanup IBRS runtime control handling
    - SAUCE: x86/speculation: Use x86_spec_ctrl_base in entry/exit code
    - SAUCE: x86/speculation: Move RSB_CTXSW hunk

  * Xenial update: 4.4.167 upstream stable release (LP: #1811077)
    - media: em28xx: Fix use-after-free when disconnecting
    - Revert "wlcore: Add missing PM call for
      wlcore_cmd_wait_for_event_or_timeout()"
    - rapidio/rionet: do not free skb before reading its length
    - s390/qeth: fix length check in SNMP processing
    - usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
    - kvm: mmu: Fix race in emulated page table writes
    - xtensa: enable coprocessors that are being flushed
    - xtensa: fix coprocessor context offset definitions
    - Btrfs: ensure path name is null terminated at btrfs_control_ioctl
    - ALSA: wss: Fix invalid snd_free_pages() at error path
    - ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
    - ALSA: control: Fix race between adding and removing a user element
    - ALSA: sparc: Fix invalid snd_free_pages() at error path
    - ext2: fix potential use after free
    - dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
    - dmaengine: at_hdmac: fix module unloading
    - btrfs: release metadata before running delayed refs
    - USB: usb-storage: Add new IDs to ums-realtek
    - usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series
    - misc: mic/scif: fix copy-paste error in scif_create_remote_lookup
    - Kbuild: suppress packed-not-aligned warning for default setting only
    - exec: avoid gcc-8 warning for get_task_comm
    - disable stringop truncation warnings for now
    - kobject: Replace strncpy with memcpy
    - unifdef: use memcpy instead of strncpy
    - kernfs: Replace strncpy with memcpy
    - ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
    - drm: gma500: fix logic error
    - scsi: bfa: convert to strlcpy/strlcat
    - staging: rts5208: fix gcc-8 logic error warning
    - kdb: use memmove instead of overlapping memcpy
    - iser: set sector for ambiguous mr status errors
    - uprobes: Fix handle_swbp() vs. unregister() + register() race once more
    - MIPS: ralink: Fix mt7620 nd_sd pinmux
    - mips: fix mips_get_syscall_arg o32 check
    - drm/ast: Fix incorrect free on ioregs
 ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.