skb_warn_bad_offload Crash

Bug #1558025 reported by Richard Laager
This bug affects 1 person
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  Fix Released  Medium      Joseph Salisbury
Xenial          Fix Released  Medium      Joseph Salisbury

Bug Description

We're setting up a new Dovecot server on Xenial. When we start the migration of emails, either the IMAP traffic or the NFS traffic immediately reproduces the crash below.

I bisected using the mainline kernels. Everything up to and including v4.5-rc7-wily fails, but v4.5-wily works.

[35439.677840] WARNING: CPU: 1 PID: 1209 at /build/linux-chsvUo/linux-4.4.0/net/core/dev.c:2422 skb_warn_bad_offload+0xd1/0x120()
[35439.677843] virtio_net: caps=(0x00000804001f4a29, 0x0000000000000000) len=1622 data_len=1452 gso_size=1480 gso_type=2 ip_summed=0
[35439.677844] Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache ppdev kvm_intel kvm irqbypass input_leds joydev serio_raw pvpanic parport_pc 8250_fintek par
v6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 xt_comment xt_multiport xt_recent xt_limit ib_iser xt_tcpudp rdma_cm iw_cm xt_addrtype i
ipv4 nf_defrag_ipv4 iscsi_tcp xt_conntrack libiscsi_tcp libiscsi scsi_transport_iscsi ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_n
able_filter ip_tables x_tables sunrpc autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops floppy drm pata_acpi
[35439.677893] CPU: 1 PID: 1209 Comm: kworker/1:1H Tainted: G W 4.4.0-12-generic #28-Ubuntu
[35439.677895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[35439.677920] Workqueue: rpciod rpc_async_schedule [sunrpc]
[35439.677922] 0000000000000286 00000000989a4fbd ffff880038c13798 ffffffff813e1ec3
[35439.677924] ffff880038c137e0 ffffffff81d5f900 ffff880038c137d0 ffffffff8107fe12
[35439.677926] ffff8800382f2600 ffff880035cdb000 0000000000000002 ffff880038264aac
[35439.677928] Call Trace:
[35439.677934] [<ffffffff813e1ec3>] dump_stack+0x63/0x90
[35439.677938] [<ffffffff8107fe12>] warn_slowpath_common+0x82/0xc0
[35439.677940] [<ffffffff8107feac>] warn_slowpath_fmt+0x5c/0x80
[35439.677943] [<ffffffff813e7f32>] ? ___ratelimit+0xa2/0xe0
[35439.677946] [<ffffffff81711011>] skb_warn_bad_offload+0xd1/0x120
[35439.677948] [<ffffffff8171469e>] __skb_gso_segment+0x7e/0xd0
[35439.677950] [<ffffffff81714a4d>] validate_xmit_skb.isra.97.part.98+0x10d/0x2b0
[35439.677951] [<ffffffff81714ffb>] validate_xmit_skb_list+0x3b/0x60
[35439.677956] [<ffffffff817397eb>] sch_direct_xmit+0x16b/0x210
[35439.677957] [<ffffffff8171533d>] __dev_queue_xmit+0x23d/0x590
[35439.677959] [<ffffffff817156a0>] dev_queue_xmit+0x10/0x20
[35439.677962] [<ffffffff8171e098>] neigh_resolve_output+0x118/0x1c0
[35439.677966] [<ffffffff817553b6>] ip_finish_output2+0x146/0x340
[35439.677968] [<ffffffff81756316>] ip_finish_output+0x136/0x1f0
[35439.677971] [<ffffffff81749b93>] ? nf_hook_slow+0x73/0xd0
[35439.677973] [<ffffffff81756d0e>] ip_output+0x6e/0xe0
[35439.677975] [<ffffffff817561e0>] ? __ip_flush_pending_frames.isra.39+0x90/0x90
[35439.677977] [<ffffffff817564d5>] ip_local_out+0x35/0x40
[35439.677979] [<ffffffff817576b9>] ip_send_skb+0x19/0x40
[35439.677981] [<ffffffff8177ed26>] udp_send_skb+0x176/0x270
[35439.677983] [<ffffffff8177ee5e>] udp_push_pending_frames+0x3e/0x60
[35439.677985] [<ffffffff81780581>] udp_sendpage+0x121/0x1a0
[35439.677990] [<ffffffff816f6458>] ? sock_sendmsg+0x38/0x50
[35439.677992] [<ffffffff816f658b>] ? kernel_sendmsg+0x2b/0x30
[35439.678001] [<ffffffffc0131625>] ? xs_send_kvec+0xa5/0xb0 [sunrpc]
[35439.678004] [<ffffffff8178d133>] inet_sendpage+0x73/0xd0
[35439.678013] [<ffffffffc01317b9>] xs_sendpages+0x189/0x1d0 [sunrpc]
[35439.678022] [<ffffffffc0131acb>] xs_udp_send_request+0x7b/0x1b0 [sunrpc]
[35439.678031] [<ffffffffc012f616>] xprt_transmit+0x66/0x340 [sunrpc]
[35439.678039] [<ffffffffc012bd89>] call_transmit+0x1b9/0x2a0 [sunrpc]
[35439.678047] [<ffffffffc012bbd0>] ? call_decode+0x800/0x800 [sunrpc]
[35439.678055] [<ffffffffc012bbd0>] ? call_decode+0x800/0x800 [sunrpc]
[35439.678065] [<ffffffffc0136461>] __rpc_execute+0x91/0x470 [sunrpc]
[35439.678074] [<ffffffffc0136855>] rpc_async_schedule+0x15/0x20 [sunrpc]
[35439.678078] [<ffffffff81098eb2>] process_one_work+0x162/0x480
[35439.678080] [<ffffffff8109921b>] worker_thread+0x4b/0x4c0
[35439.678082] [<ffffffff810991d0>] ? process_one_work+0x480/0x480
[35439.678084] [<ffffffff810991d0>] ? process_one_work+0x480/0x480
[35439.678086] [<ffffffff8109f3e8>] kthread+0xd8/0xf0
[35439.678088] [<ffffffff8109f310>] ? kthread_create_on_node+0x1e0/0x1e0
[35439.678091] [<ffffffff8181cbcf>] ret_from_fork+0x3f/0x70
[35439.678093] [<ffffffff8109f310>] ? kthread_create_on_node+0x1e0/0x1e0
[35439.678094] ---[ end trace f958a4523480bfd6 ]---

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-12-generic 4.4.0-12.28
ProcVersionSignature: Ubuntu 4.4.0-12.28-generic 4.4.4
Uname: Linux 4.4.0-12-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 16 06:04 seq
 crw-rw---- 1 root audio 116, 33 Mar 16 06:04 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Mar 16 06:04:42 2016
HibernationDevice: RESUME=UUID=5bd13db0-5643-4ab4-9755-6f699d2bb5e5
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-12-generic root=UUID=b7f56990-c269-4bc6-872c-7d7982f384bf ro console=ttyS0,115200n8 console=tty1 transparent_hugepage=always elevator=noop
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-12-generic N/A
 linux-backports-modules-4.4.0-12-generic N/A
 linux-firmware 1.156
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-trusty:cvnBochs:ct1:cvr:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-trusty
dmi.sys.vendor: QEMU

Richard Laager (rlaager) wrote :
description: updated
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
tags: added: performing-bisect
Richard Laager (rlaager) wrote :

This is the relevant commit which *fixes* the bug:
https://github.com/torvalds/linux/commit/a8c4a2522a0808c5c2143612909717d1115c40cf

The fact that this occurs with UDP+IPv4 traffic lines up with my NFS usage, as we use UDP for NFS.
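For anyone reading the warning line in the trace, here is a minimal Python sketch that pulls the numeric fields out of the `skb_warn_bad_offload` message and names the `gso_type` and `ip_summed` values. The constant tables are an assumption: they are transcribed from memory of `include/linux/skbuff.h` in 4.4-era kernels and should be checked against the source of the kernel in question. The helper name `decode_bad_offload` is made up for illustration.

```python
import re

# ASSUMPTION: values as in include/linux/skbuff.h for 4.4-era kernels.
# Only the lowest gso_type bits are listed; verify against your own tree.
GSO_TYPE_BITS = {
    0x1: "SKB_GSO_TCPV4",
    0x2: "SKB_GSO_UDP",
    0x4: "SKB_GSO_DODGY",
}
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

# \b keeps "len" from matching inside "data_len".
FIELD_RE = re.compile(r"\b(len|data_len|gso_size|gso_type|ip_summed)=(\d+)")


def decode_bad_offload(line):
    """Return the numeric fields of a warning line plus symbolic names."""
    fields = {name: int(value) for name, value in FIELD_RE.findall(line)}

    gso_type = fields.get("gso_type", 0)
    names = [name for bit, name in GSO_TYPE_BITS.items() if gso_type & bit]
    unknown = gso_type & ~sum(GSO_TYPE_BITS)
    if unknown:
        # Bits not covered by the table above are reported in hex.
        names.append(hex(unknown))

    fields["gso_type_names"] = names
    fields["ip_summed_name"] = IP_SUMMED.get(fields.get("ip_summed"), "unknown")
    return fields


if __name__ == "__main__":
    line = ("virtio_net: caps=(0x00000804001f4a29, 0x0000000000000000) "
            "len=1622 data_len=1452 gso_size=1480 gso_type=2 ip_summed=0")
    print(decode_bad_offload(line))
```

If those constants are right, the logged packet is a UDP GSO skb whose checksum state is CHECKSUM_NONE rather than CHECKSUM_PARTIAL, which is the combination the commit title refers to.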

Tim Gardner (timg-tpi) wrote :

commit a8c4a2522a0808c5c2143612909717d1115c40cf ('ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL') applied to Xenial.

jsalisbury - perhaps you could build a test kernel from Xenial master-next to verify?

Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel from master-next. The kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1558025/

Can you test this kernel and see if it resolves this bug?

Richard Laager (rlaager) wrote :

I tested that build. It works. Thanks!

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-15.31

---------------
linux (4.4.0-15.31) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1559252

  * Xilinx KU3 Capi card does not show up in Ubuntu 16.04 (LP: #1557001)
    - SAUCE: (noup) cxl: Allow initialization on timebase sync failures

  * policy namespace stacking (LP: #1379535)
    - Revert "UBUNTU: SAUCE: Move replacedby allocation into label_alloc"
    - Revert "UBUNTU: SAUCE: Fixup: __label_update() still doesn't handle some cases correctly."
    - Revert "UBUNTU: SAUCE: fix: audit "no_new_privs" case for exec failure"
    - Revert "UBUNTU: SAUCE: fixup: warning about aa_label_vec_find_or_create not being static"
    - Revert "UBUNTU: SAUCE: apparmor: fix refcount race when finding a child profile"
    - Revert "UBUNTU: SAUCE: fixup: cast poison values to remove warnings"
    - Revert "UBUNTU: SAUCE: fixup: get rid of unused var build warning"
    - Revert "UBUNTU: SAUCE: fixup: 20/23 locking issue around in __label_update"
    - Revert "UBUNTU: SAUCE: fixup: make __share_replacedby private to get rid of build warning"
    - Revert "UBUNTU: SAUCE: fix: replacedby forwarding is not being properly update when ns is destroyed"
    - Revert "UBUNTU: SAUCE: apparmor: fix log of apparmor audit message when kern_path() fails"
    - Revert "UBUNTU: SAUCE: fixup: cleanup return handling of labels"
    - Revert "UBUNTU: SAUCE: apparmor: fix: ref count leak when profile sha1 hash is read"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: query label file permission"
    - Revert "UBUNTU: SAUCE: apparmor: Don't remove label on rcu callback if the label has already been removed"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: break circular refcount for label that is directly freed."
    - Revert "UBUNTU: SAUCE: apparmor: Fix: refcount bug when inserting label update that transitions ns"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: now that insert can force replacement use it instead of remove_and_insert"
    - Revert "UBUNTU: SAUCE: apparmor Fix: refcount bug in pivotroot mediation"
    - Revert "UBUNTU: SAUCE: apparmor: ensure that repacedby sharing is done correctly"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: update replacedby allocation to take a gfp parameter"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: convert replacedby update to be protected by the labelset lock"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: add required locking of __aa_update_replacedby on merge path"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: deadlock in aa_put_label() call chain"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: label_vec_merge insertion"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: ensure new labels resulting from merge have a replacedby"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: refcount leak in aa_label_merge"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: refcount race between locating in labelset and get"
    - Revert "UBUNTU: SAUCE: apparmor: Fix: label merge handling of marking unconfined and stale"
    - Revert "UBUNTU: SAUCE: apparmor: add underscores to indicate aa_label_next_not_in_set() use needs locking"
    - Revert "UBUNTU: SAUCE: apparmor: debug: POISON label and replaceby ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
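
To check whether a given Xenial kernel package already carries the fix, its version can be compared against 4.4.0-15.31, the release named above. The sketch below is a rough illustration only: it compares the numeric components as a tuple, which works for plain versions such as `4.4.0-12.28` but is not a full Debian version comparison (no epochs, tildes, or letter suffixes). The function names are hypothetical.

```python
import re

# Package version in which this bug was reported fixed (from the changelog).
FIXED_IN = "4.4.0-15.31"


def version_key(version):
    """Turn '4.4.0-15.31' into (4, 4, 0, 15, 31) for simple comparison."""
    return tuple(int(number) for number in re.findall(r"\d+", version))


def has_fix(package_version, fixed_in=FIXED_IN):
    """True if package_version is at or after the fixed release."""
    return version_key(package_version) >= version_key(fixed_in)


if __name__ == "__main__":
    # 4.4.0-12.28 is the version the bug was reported against.
    for candidate in ("4.4.0-12.28", "4.4.0-15.31"):
        print(candidate, has_fix(candidate))
```

On a real system, `dpkg --compare-versions` is the more robust tool for this comparison.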