~mreed8855/ubuntu/+source/linux/+git/jammy:lp_2008745_config_numa_emu_2

Last commit made on 2023-06-14
Get this branch:
git clone -b lp_2008745_config_numa_emu_2 https://git.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy

Branch information

Name: lp_2008745_config_numa_emu_2
Repository: lp:~mreed8855/ubuntu/+source/linux/+git/jammy

Recent commits

843a9cc... by Michael Reed

UBUNTU: [Config] Intel Sapphire Rapids HBM support needs CONFIG_NUMA_EMU

BugLink: https://bugs.launchpad.net/bugs/2008745

Currently the Ubuntu kernel has this config option disabled, but in some
cases Intel's Sapphire Rapids High Bandwidth Memory (SPR-HBM) needs it.

Memory bandwidth has been a bottleneck for increasingly memory-bound
workloads. Sapphire Rapids plus HBM is specifically targeted at these
workloads, which were traditionally served by overprovisioning memory
devices.

Signed-off-by: Michael Reed <email address hidden>
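
The config change itself is one line; as a quick illustration of what it unlocks, CONFIG_NUMA_EMU gates the x86 NUMA-emulation boot parameter, so once the option is built in the system can be split into fake NUMA nodes from the kernel command line (the node count below is illustrative only):

  # kernel config
  CONFIG_NUMA_EMU=y

  # boot-time usage, e.g. emulate 8 NUMA nodes (illustrative value)
  numa=fake=8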

bc2e133... by Michal Koutný <email address hidden>

x86/mm: Do not shuffle CPU entry areas without KASLR

The commit 97e3d26b5e5f ("x86/mm: Randomize per-cpu entry area") fixed
an omission of KASLR on CPU entry areas. It does not take KASLR switches
into account, though, which may result in unintended non-determinism
when a user wants to avoid it (e.g. debugging, benchmarking).

Generate only a single combination of CPU entry area offsets when KASLR
is turned off -- the linear array that existed prior to randomization.

Since we have 3f148f331814 ("x86/kasan: Map shadow for percpu pages on
demand") and follow-ups, we can use the more relaxed guard
kaslr_enabled() (in contrast to kaslr_memory_enabled()).

Fixes: 97e3d26b5e5f ("x86/mm: Randomize per-cpu entry area")
Signed-off-by: Michal Koutný <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Cc: <email address hidden>
Link: https://lore.kernel.org/all/20230306193144.24605-1-mkoutny%40suse.com
CVE-2023-0597
(cherry picked from commit a3f547addcaa10df5a226526bc9e2d9a94542344)
Signed-off-by: Cengiz Can <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Andrei Gherzan <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
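
A minimal sketch of the guard described above, using the helper names from the upstream patch (kaslr_enabled() and the per-CPU _cea_offset array); with KASLR off, every CPU keeps its linear slot and no shuffling happens:

  /* Sketch: skip shuffling CPU entry areas when KASLR is disabled. */
  static void __init init_cea_offsets(void)
  {
          unsigned int i;

          if (!kaslr_enabled()) {
                  for_each_possible_cpu(i)
                          per_cpu(_cea_offset, i) = i;    /* linear layout */
                  return;
          }

          /* ... existing randomization of the offsets ... */
  }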

2c1a2ce... by Sean Christopherson <email address hidden>

x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area

Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT. The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on demand was to avoid
mapping the entire 512 GiB range that's reserved for the CPU entry area;
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <asm/sigcontext.h>

  int main(void)
  {
    struct sigcontext regs;

    /* flags 0x32 = MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE */
    syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

    memset(&regs, 0, sizeof(regs));
    regs.cs = 0x1d0;  /* garbage code segment selector */
    syscall(__NR_rt_sigreturn);
    return 0;
  }

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

  BUG: unable to handle page fault for address: fffffbc8379ace00
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kasan_check_range+0xdf/0x190
  Call Trace:
   <TASK>
   get_desc+0xb0/0x1d0
   insn_get_seg_base+0x104/0x270
   insn_fetch_from_user+0x66/0x80
   fixup_umip_exception+0xb1/0x530
   exc_general_protection+0x181/0x210
   asm_exc_general_protection+0x22/0x30
  RIP: 0003:0x0
  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  RSP: 0003:0000000000000000 EFLAGS: 00000202
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: <email address hidden>
Suggested-by: Andrey Ryabinin <email address hidden>
Signed-off-by: Sean Christopherson <email address hidden>
Signed-off-by: Peter Zijlstra (Intel) <email address hidden>
Reviewed-by: Andrey Ryabinin <email address hidden>
Link: https://<email address hidden>
CVE-2023-0597
(cherry picked from commit 97650148a15e0b30099d6175ffe278b9f55ec66a)
Signed-off-by: Cengiz Can <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Andrei Gherzan <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
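
A rough sketch of the fix, assuming a helper along the lines of kasan_populate_shadow_for_vaddr() (the helper name and exact call site are illustrative): the shadow is created once for a CPU's whole entry-area slot, so the individual mapping calls no longer have to do it:

  /* Sketch only: populate KASAN shadow for the CPU's entire entry-area
   * slot up front instead of per mapped chunk. */
  static void __init setup_cpu_entry_area(unsigned int cpu)
  {
          struct cpu_entry_area *cea = get_cpu_entry_area(cpu);

          kasan_populate_shadow_for_vaddr(cea, CPU_ENTRY_AREA_SIZE,
                                          early_cpu_to_node(cpu));

          /* ... map GDT, entry stack, exception stacks, etc. ... */
  }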

1623168... by Sean Christopherson <email address hidden>

x86/mm: Recompute physical address for every page of per-CPU CEA mapping

Recompute the physical address for each per-CPU page in the CPU entry
area; a recent commit inadvertently modified cea_map_percpu_pages() such
that every PTE is mapped to the physical address of the first page.

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Signed-off-by: Sean Christopherson <email address hidden>
Signed-off-by: Peter Zijlstra (Intel) <email address hidden>
Reviewed-by: Andrey Ryabinin <email address hidden>
Link: https://<email address hidden>
CVE-2023-0597
(cherry picked from commit 80d72a8f76e8f3f0b5a70b8c7022578e17bde8e7)
Signed-off-by: Cengiz Can <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Andrei Gherzan <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
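
The essence of the fix, as a sketch of cea_map_percpu_pages() (function and helper names as in the kernel source): the physical address has to be computed per page inside the loop, not hoisted out of it:

  /* Sketch: recompute the physical address for every page; reusing the
   * first page's address would map all PTEs to the same frame. */
  static void __init
  cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
  {
          for (; pages; pages--, cea_vaddr += PAGE_SIZE, ptr += PAGE_SIZE)
                  cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
  }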

6bdab74... by Peter Zijlstra <email address hidden>

x86/mm: Randomize per-cpu entry area

Seth found that the CPU-entry-area, the piece of per-cpu data that is
mapped into the userspace page-tables for kPTI, is not subject to any
randomization -- irrespective of kASLR settings.

On x86_64 a whole P4D (512 GB) of virtual address space is reserved for
this structure, which is plenty large enough to randomize things a
little.

As such, use a straightforward randomization scheme that avoids
duplicates to spread the existing CPUs over the available space.

  [ bp: Fix le build. ]

Reported-by: Seth Jenkins <email address hidden>
Reviewed-by: Kees Cook <email address hidden>
Signed-off-by: Peter Zijlstra (Intel) <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Signed-off-by: Borislav Petkov <email address hidden>
CVE-2023-0597
(backported from commit 97e3d26b5e5f371b3ee223d94dd123e6c442ba80)
[cengizcan: include random.h for newly introduced `prandom_u32_max` call]
Signed-off-by: Cengiz Can <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Andrei Gherzan <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
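
A sketch of the scheme, closely following the upstream patch (helper names as used there): each possible CPU draws a random slot index and redraws on collision with an already-assigned CPU, so the offsets stay unique:

  /* Sketch: give each possible CPU a unique random entry-area slot. */
  static void __init init_cea_offsets(void)
  {
          unsigned int max_cea, i, j;

          max_cea = (CPU_ENTRY_AREA_MAP_SIZE - PAGE_SIZE) / CPU_ENTRY_AREA_SIZE;

          for_each_possible_cpu(i) {
                  unsigned int cea;
  again:
                  cea = prandom_u32_max(max_cea);

                  /* Redraw if another CPU already owns this slot. */
                  for_each_possible_cpu(j) {
                          if (cea_offset(j) == cea)
                                  goto again;
                          if (i == j)
                                  break;
                  }
                  per_cpu(_cea_offset, i) = cea;
          }
  }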

9862993... by Andrey Ryabinin <email address hidden>

x86/kasan: Map shadow for percpu pages on demand

KASAN maps shadow for the entire CPU-entry-area:
  [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE]

This will explode once the per-cpu entry areas are randomized, since
that increases CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to
allocate shadow for such a big area.

Fix this by allocating KASAN shadow only for the CPU entry area
addresses actually mapped by cea_map_percpu_pages().

Thanks to the 0day folks for finding and reporting this to be an issue.

[ dhansen: tweak changelog since this will get committed before peterz's
    actual cpu-entry-area randomization ]

Signed-off-by: Andrey Ryabinin <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Tested-by: Yujie Liu <email address hidden>
Cc: kernel test robot <email address hidden>
Link: https://<email address hidden>
CVE-2023-0597
(cherry picked from commit 3f148f3318140035e87decc1214795ff0755757b)
[cengizcan: prerequisite commit]
Signed-off-by: Cengiz Can <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Andrei Gherzan <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
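
For contrast with the later fix above, a rough sketch of the on-demand variant (helper name illustrative): the shadow is populated only for the chunk being wired up by cea_map_percpu_pages():

  /* Sketch: allocate KASAN shadow only for the pages actually being
   * mapped into the CPU entry area. */
  static void __init
  cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
  {
          phys_addr_t pa = per_cpu_ptr_to_phys(ptr);

          kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
                                          early_pfn_to_nid(PFN_DOWN(pa)));

          for (; pages; pages--, cea_vaddr += PAGE_SIZE, ptr += PAGE_SIZE)
                  cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
  }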

881a5b1... by Hangyu Hua

net/sched: flower: fix possible OOB write in fl_set_geneve_opt()

BugLink: https://bugs.launchpad.net/bugs/2023577

If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes (key->enc_opts.len = 252), then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt(). This
bypasses the next bounds check and results in an out-of-bounds write.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <email address hidden>
Reviewed-by: Simon Horman <email address hidden>
Reviewed-by: Pieter Jansen van Vuuren <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Paolo Abeni <email address hidden>

(cherry picked from commit 4d56304e5827c8cc8cc18c75343d283af7c4825c)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Khaled Elmously <email address hidden>
Acked-by: Ian May <email address hidden>
Signed-off-by: Stefan Bader <email address hidden>

77680ec... by Dave Hansen <email address hidden>

x86/mm: Avoid incomplete Global INVLPG flushes

BugLink: https://bugs.launchpad.net/bugs/2023220

The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)

Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.

As a workaround, never enable PCIDs on affected processors.

I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.

Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.

Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Thomas Gleixner <email address hidden>
Cc: <email address hidden>
(cherry picked from commit ce0b15d11ad837fbacc5356941712218e38a0a83)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Luke Nowakowski-Krijger <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>
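
For anyone taking the 'nopcid' route on an Ubuntu system, a minimal sketch assuming the stock GRUB setup (existing GRUB_CMDLINE_LINUX_DEFAULT contents will vary):

  # /etc/default/grub: append 'nopcid' to the kernel command line
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nopcid"

  # regenerate the boot configuration, then reboot
  sudo update-grub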

6e98058... by Kamal Mostafa

UBUNTU: Upstream stable to v5.15.108

BugLink: https://bugs.launchpad.net/bugs/2023328

Ignore: yes
Signed-off-by: Kamal Mostafa <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>

acabbc6... by Greg Kroah-Hartman <email address hidden>

Linux 5.15.108

BugLink: https://bugs.launchpad.net/bugs/2023328

Link: https://<email address hidden>
Tested-by: Florian Fainelli <email address hidden>
Tested-by: Shuah Khan <email address hidden>
Link: https://<email address hidden>
Tested-by: Chris Paterson (CIP) <email address hidden>
Tested-by: Jon Hunter <email address hidden>
Link: https://<email address hidden>
Tested-by: Ron Economos <email address hidden>
Link: https://<email address hidden>
Tested-by: Florian Fainelli <email address hidden>
Tested-by: Jon Hunter <email address hidden>
Tested-by: Guenter Roeck <email address hidden>
Tested-by: Bagas Sanjaya <email address hidden>
Tested-by: Ron Economos <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
Signed-off-by: Luke Nowakowski-Krijger <email address hidden>