~mreed8855/ubuntu/+source/linux/+git/jammy:lp_2023185_iostat

Last commit made on 2023-09-25
Get this branch:
git clone -b lp_2023185_iostat https://git.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy
Only Michael Reed can upload to this branch. If you are Michael Reed please log in for upload directions.

Branch merges

Branch information

Name:
lp_2023185_iostat
Repository:
lp:~mreed8855/ubuntu/+source/linux/+git/jammy

Recent commits

c5b8307... by Sagi Grimberg <email address hidden>

nvme-multipath: support io stats on the mpath device

BugLink: https://bugs.launchpad.net/bugs/2023185

Our mpath stack device is just a shim that selects a bottom namespace
and submits the bio to it without any fancy splitting. This also means
that we don't clone the bio or have any context to the bio beyond
submission. However it really sucks that we don't see the mpath device
io stats.

Given that the mpath device can't do that without adding some context
to it, we let the bottom device do it on its behalf (somewhat similar
to the approach taken in nvme_trace_bio_complete).

When the IO starts, we account the request for multipath IO stats using
REQ_NVME_MPATH_IO_STATS nvme_request flag to avoid queue io stats disable
in the middle of the request.

Signed-off-by: Sagi Grimberg <email address hidden>
Signed-off-by: Christoph Hellwig <email address hidden>
Reviewed-by: Keith Busch <email address hidden>
(backported from commit d4d957b53d91eebc8c681c480edfdc697e55231e)
[Michael Reed - Adjusted location of code from patch]
Signed-off-by: Michael Reed <email address hidden>

d2c14f3... by Sagi Grimberg <email address hidden>

nvme: introduce nvme_start_request

BugLink: https://bugs.launchpad.net/bugs/2023185

In preparation for nvme-multipath IO stats accounting, we want the
accounting to happen in a centralized place. The request completion
is already centralized, but we need a common helper to request I/O
start.

Signed-off-by: Sagi Grimberg <email address hidden>
Signed-off-by: Christoph Hellwig <email address hidden>
Reviewed-by: Keith Busch <email address hidden>
Reviewed-by: Hannes Reinecke <email address hidden>
(backported from commit 6887fc6495f2dfd55e088c982e983815278ee453)
[Michael Reed - Did not remove apple.c and adjusted for missing code from patch]
Signed-off-by: Michael Reed <email address hidden>

7e7e452... by Jordan Rife <email address hidden>

net: Avoid address overwrite in kernel_connect

BugLink: https://bugs.launchpad.net/bugs/2035163

BPF programs that run on connect can rewrite the connect address. For
the connect system call this isn't a problem, because a copy of the address
is made when it is moved into kernel space. However, kernel_connect
simply passes through the address it is given, so the caller may observe
its address value unexpectedly change.

A practical example where this is problematic is where NFS is combined
with a system such as Cilium which implements BPF-based load balancing.
A common pattern in software-defined storage systems is to have an NFS
mount that connects to a persistent virtual IP which in turn maps to an
ephemeral server IP. This is usually done to achieve high availability:
if your server goes down you can quickly spin up a replacement and remap
the virtual IP to that endpoint. With BPF-based load balancing, mounts
will forget the virtual IP address when the address rewrite occurs
because a pointer to the only copy of that address is passed down the
stack. Server failover then breaks, because clients have forgotten the
virtual IP address. Reconnects fail and mounts remain broken. This patch
was tested by setting up a scenario like this and ensuring that NFS
reconnects worked after applying the patch.

Signed-off-by: Jordan Rife <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(backported from commit 0bdf399342c5acbd817c9098b6c7ed21f1974312)
[ kmously: adjusted for lack of READ_ONCE() ]
Signed-off-by: Khalid Elmously <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

4d40519... by Sean Christopherson <email address hidden>

KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual pages

BugLink: https://bugs.launchpad.net/bugs/2035166

Track the number of TDP MMU "shadow" pages instead of tracking the pages
themselves. With the NX huge page list manipulation moved out of the common
linking flow, elminating the list-based tracking means the happy path of
adding a shadow page doesn't need to acquire a spinlock and can instead
inc/dec an atomic.

Keep the tracking as the WARN during TDP MMU teardown on leaked shadow
pages is very, very useful for detecting KVM bugs.

Tracking the number of pages will also make it trivial to expose the
counter to userspace as a stat in the future, which may or may not be
desirable.

Note, the TDP MMU needs to use a separate counter (and stat if that ever
comes to be) from the existing n_used_mmu_pages. The TDP MMU doesn't bother
supporting the shrinker nor does it honor KVM_SET_NR_MMU_PAGES (because the
TDP MMU consumes so few pages relative to shadow paging), and including TDP
MMU pages in that counter would break both the shrinker and shadow MMUs,
e.g. if a VM is using nested TDP.

Cc: Yan Zhao <email address hidden>
Reviewed-by: Mingwei Zhang <email address hidden>
Reviewed-by: David Matlack <email address hidden>
Signed-off-by: Sean Christopherson <email address hidden>
Reviewed-by: Yan Zhao <email address hidden>
Message-Id: <email address hidden>
Signed-off-by: Paolo Bonzini <email address hidden>
(backported from commit d25ceb9264364dca0683748b301340097fdab6c7)
[chengen - Apply atomic64_t-related changes following
the upstream patch's execution order]
Signed-off-by: Chengen Du <email address hidden>
Acked-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

90fb919... by Ying Hsu <email address hidden>

igb: Fix igb_down hung on surprise removal

BugLink: https://bugs.launchpad.net/bugs/2034479

In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
    __schedule+0x3b0/0xddb
    ? __mod_timer+0x164/0x5d3
    schedule+0x44/0xa8
    schedule_timeout+0xb2/0x2a4
    ? run_local_timers+0x4e/0x4e
    msleep+0x31/0x38
    igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
    __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
    igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
    __dev_close_many+0x95/0xec
    dev_close_many+0x6e/0x103
    unregister_netdevice_many+0x105/0x5b1
    unregister_netdevice_queue+0xc2/0x10d
    unregister_netdev+0x1c/0x23
    igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
    pci_device_remove+0x3f/0x9c
    device_release_driver_internal+0xfe/0x1b4
    pci_stop_bus_device+0x5b/0x7f
    pci_stop_bus_device+0x30/0x7f
    pci_stop_bus_device+0x30/0x7f
    pci_stop_and_remove_bus_device+0x12/0x19
    pciehp_unconfigure_device+0x76/0xe9
    pciehp_disable_slot+0x6e/0x131
    pciehp_handle_presence_or_link_change+0x7a/0x3f7
    pciehp_ist+0xbe/0x194
    irq_thread_fn+0x22/0x4d
    ? irq_thread+0x1fd/0x1fd
    irq_thread+0x17b/0x1fd
    ? irq_forced_thread_fn+0x5f/0x5f
    kthread+0x142/0x153
    ? __irq_get_irqchip_state+0x46/0x46
    ? kthread_associate_blkcg+0x71/0x71
    ret_from_fork+0x1f/0x30

In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.

Signed-off-by: Ying Hsu <email address hidden>
Tested-by: Pucha Himasekhar Reddy <email address hidden> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <email address hidden>
Reviewed-by: Simon Horman <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
(cherry picked from commit 004d25060c78fc31f66da0fa439c544dda1ac9d5)
Signed-off-by: Aaron Ma <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Roxana Nicolescu <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

54cefb5... by Feng Tang

x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4

BugLink: https://bugs.launchpad.net/bugs/2034745

0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:

   arch_cpu_finalize_init
       identify_boot_cpu
    identify_cpu
        generic_identify
                   get_cpu_cap --> setup cpu capability
       ...
       fpu__init_cpu
           fpu__init_cpu_xstate
               cr4_set_bits(X86_CR4_OSXSAVE);

As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.

Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.

[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]

Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <email address hidden>
Signed-off-by: Feng Tang <email address hidden>
Signed-off-by: Thomas Gleixner <email address hidden>
Cc: <email address hidden>
Link: https://<email address hidden>
Link: https://<email address hidden>
(cherry picked from commit 2c66ca3949dc701da7f4c9407f2140ae425683a5)
Signed-off-by: Tim Gardner <email address hidden>
Acked-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Roxana Nicolescu <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

c3c5aa8... by Florian Westphal <email address hidden>

netfilter: nftables: exthdr: fix 4-byte stack OOB write

If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.

This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.

The bug was added in 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.

Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).

Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <email address hidden>
(cherry picked from commit fd94d9dadee58e09b49075240fe83423eb1dcd36)
CVE-2023-4881
Signed-off-by: Yuxuan Luo <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Roxana Nicolescu <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

6384b38... by Kuniyuki Iwashima <email address hidden>

af_unix: Fix null-ptr-deref in unix_stream_sendpage().

Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage()
with detailed analysis and a nice repro.

unix_stream_sendpage() tries to add data to the last skb in the peer's
recv queue without locking the queue.

If the peer's FD is passed to another socket and the socket's FD is
passed to the peer, there is a loop between them. If we close both
sockets without receiving FD, the sockets will be cleaned up by garbage
collection.

The garbage collection iterates such sockets and unlinks skb with
FD from the socket's receive queue under the queue's lock.

So, there is a race where unix_stream_sendpage() could access an skb
locklessly that is being released by garbage collection, resulting in
use-after-free.

To avoid the issue, unix_stream_sendpage() must lock the peer's recv
queue.

Note the issue does not exist in 6.5+ thanks to the recent sendpage()
refactoring.

This patch is originally written by Linus Torvalds.

BUG: unable to handle page fault for address: ffff988004dd6870
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP PTI
CPU: 4 PID: 297 Comm: garbage_uaf Not tainted 6.1.46 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc_node+0xa2/0x1e0
Code: c0 0f 84 32 01 00 00 41 83 fd ff 74 10 48 8b 00 48 c1 e8 3a 41 39 c5 0f 85 1c 01 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 40 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 a1 41 8b 44
RSP: 0018:ffffc9000079fac0 EFLAGS: 00000246
RAX: 0000000000000070 RBX: 0000000000000005 RCX: 000000000001a284
RDX: 000000000001a244 RSI: 0000000000400cc0 RDI: 000000000002eee0
RBP: 0000000000400cc0 R08: 0000000000400cc0 R09: 0000000000000003
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888003970f00
R13: 00000000ffffffff R14: ffff988004dd6800 R15: 00000000000000e8
FS: 00007f174d6f3600(0000) GS:ffff88807db00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff988004dd6870 CR3: 00000000092be000 CR4: 00000000007506e0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xa9/0x1e0
 ? fixup_exception+0x1d/0x310
 ? exc_page_fault+0xa8/0x150
 ? asm_exc_page_fault+0x22/0x30
 ? kmem_cache_alloc_node+0xa2/0x1e0
 ? __alloc_skb+0x16c/0x1e0
 __alloc_skb+0x16c/0x1e0
 alloc_skb_with_frags+0x48/0x1e0
 sock_alloc_send_pskb+0x234/0x270
 unix_stream_sendmsg+0x1f5/0x690
 sock_sendmsg+0x5d/0x60
 ____sys_sendmsg+0x210/0x260
 ___sys_sendmsg+0x83/0xd0
 ? kmem_cache_alloc+0xc6/0x1c0
 ? avc_disable+0x20/0x20
 ? percpu_counter_add_batch+0x53/0xc0
 ? alloc_empty_file+0x5d/0xb0
 ? alloc_file+0x91/0x170
 ? alloc_file_pseudo+0x94/0x100
 ? __fget_light+0x9f/0x120
 __sys_sendmsg+0x54/0xa0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x69/0xd3
RIP: 0033:0x7f174d639a7d
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 8a c1 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 de c1 f4 ff 48
RSP: 002b:00007ffcb563ea50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f174d639a7d
RDX: 0000000000000000 RSI: 00007ffcb563eab0 RDI: 0000000000000007
RBP: 00007ffcb563eb10 R08: 0000000000000000 R09: 00000000ffffffff
R10: 00000000004040a0 R11: 0000000000000293 R12: 00007ffcb563ec28
R13: 0000000000401398 R14: 0000000000403e00 R15: 00007f174d72c000
 </TASK>

Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support")
Reported-by: Bing-Jhong Billy Jheng <email address hidden>
Reviewed-by: Bing-Jhong Billy Jheng <email address hidden>
Co-developed-by: Linus Torvalds <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
Signed-off-by: Kuniyuki Iwashima <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
(cherry picked from commit 790c2f9d15b594350ae9bca7b236f2b1859de02c
stable/linux-6.1.y)
CVE-2023-4622
Signed-off-by: Yuxuan Luo <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Thadeu Lima de Souza Cascardo <email address hidden>
Signed-off-by: Roxana Nicolescu <email address hidden>

d40e4e1... by Kamal Mostafa

UBUNTU: Upstream stable to v5.15.124

BugLink: https://bugs.launchpad.net/bugs/2035400

Ignore: yes
Signed-off-by: Kamal Mostafa <email address hidden>
Signed-off-by: Stefan Bader <email address hidden>

a62f853... by Greg Kroah-Hartman <email address hidden>

Linux 5.15.124

BugLink: https://bugs.launchpad.net/bugs/2035400

Link: https://<email address hidden>
Tested-by: Jon Hunter <email address hidden>
Tested-by: SeongJae Park <email address hidden>
Tested-by: Florian Fainelli <email address hidden>
Tested-by: Shuah Khan <email address hidden>
Tested-by: Harshit Mogalapalli <email address hidden>
Link: https://<email address hidden>
Tested-by: Chris Paterson (CIP) <email address hidden>
Tested-by: SeongJae Park <email address hidden>
Tested-by: Ron Economos <email address hidden>
Tested-by: Florian Fainelli <email address hidden>
Tested-by: Guenter Roeck <email address hidden>
Tested-by: Linux Kernel Functional Testing <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
Signed-off-by: Stefan Bader <email address hidden>