~timg-tpi/ubuntu/+source/linux-azure/+git/lunar:lunar-azure-Add-PCI-pass-thru-to-cvm-lp2015369

Last commit made on 2023-05-31
Get this branch:
git clone -b lunar-azure-Add-PCI-pass-thru-to-cvm-lp2015369 https://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar

Branch information

Name:
lunar-azure-Add-PCI-pass-thru-to-cvm-lp2015369
Repository:
lp:~timg-tpi/ubuntu/+source/linux-azure/+git/lunar

Recent commits

3aec786... by Michael Kelley

arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing

BugLink: https://bugs.launchpad.net/bugs/2015369

State CPUHP_AP_HYPERV_ONLINE has been introduced to correctly sequence the
initialization of hyperv_pcpu_input_arg. Use this new state for Hyper-V
initialization so that hyperv_pcpu_input_arg is allocated early enough.
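
As an illustration only (not the literal diff), moving the Hyper-V per-CPU setup to the new hotplug state is essentially a change to the cpuhp_setup_state() registration; the callback names below follow the common hv_common_* helpers and are assumptions here:

    ret = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "arm64/hyperv_init:online",
                            hv_common_cpu_init, hv_common_cpu_die);
    if (ret < 0)
            return ret;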

Signed-off-by: Michael Kelley <email address hidden>
(cherry picked from commit 2cd655fda96ea765db6b4145f7828280a0cfa2a1 https://github.com/kelleymh/linux.git)
Signed-off-by: Tim Gardner <email address hidden>

441920a... by Michael Kelley

x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline

BugLink: https://bugs.launchpad.net/bugs/2015369

These commits

a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")

update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.

When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.

When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.

Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.

Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.
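
A simplified sketch of the retain-and-reuse idea (not the literal patch; error handling and the output page are omitted):

    int hv_common_cpu_init(unsigned int cpu)
    {
            void **inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);

            /* Memory survives a previous offline; only allocate the first time. */
            if (!*inputarg)
                    *inputarg = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);

            return *inputarg ? 0 : -ENOMEM;
    }

    int hv_common_cpu_die(unsigned int cpu)
    {
            /* Do not free hyperv_pcpu_input_arg; it is reused if the CPU
             * comes back online later.
             */
            return 0;
    }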

Signed-off-by: Michael Kelley <email address hidden>
(backported from commit d2d07a52edd573b7a884d30f90f347a10d83fba0 https://github.com/kelleymh/linux.git)
[rtg - conflicts with existing TDX patches with much help from Dexuan Cui <email address hidden>]
Signed-off-by: Tim Gardner <email address hidden>

018a77f... by Dexuan Cui

PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg

BugLink: https://bugs.launchpad.net/bugs/2015369

4 commits are involved here:
A (2016): commit 0de8ce3ee8e3 ("PCI: hv: Allocate physically contiguous hypercall params buffer")
B (2017): commit be66b6736591 ("PCI: hv: Use page allocation for hbus structure")
C (2019): commit 877b911a5ba0 ("PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer")
D (2018): commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")

Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg"
in 2018. With patch D, the per-Hyper-V-PCI-bus hypercall input page
"hbus->retarget_msi_interrupt_params" added in patch A is no longer needed,
the issue addressed by patch B no longer applies, and patch C can be dropped
as well.

The change here is required for PCI device assignment to work for
Confidential VMs (CVMs) running without a paravisor, because otherwise we
would have to call set_memory_decrypted() for
"hbus->retarget_msi_interrupt_params" before calling the hypercall
HVCALL_RETARGET_INTERRUPT.
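
The resulting pattern looks roughly like the sketch below (simplified; the real code fills in the full retarget parameters and computes a variable-size hypercall):

    struct hv_retarget_device_interrupt *params;
    unsigned long flags;
    u64 res;

    /* The per-CPU input page must be used with interrupts disabled. */
    local_irq_save(flags);
    params = *this_cpu_ptr(hyperv_pcpu_input_arg);
    memset(params, 0, sizeof(*params));
    /* ... fill in the interrupt entry, device ID, vector and target CPUs ... */
    res = hv_do_hypercall(HVCALL_RETARGET_INTERRUPT, params, NULL);
    local_irq_restore(flags);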

Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Acked-by: Lorenzo Pieralisi <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(cherry picked from commit a494aef23dfc732945cb42e22246a5c31174e4a5)
Signed-off-by: Tim Gardner <email address hidden>

48323d6... by Dexuan Cui

Drivers: hv: vmbus: Remove the per-CPU post_msg_page

BugLink: https://bugs.launchpad.net/bugs/2015369

The post_msg_page was introduced in 2014 in
commit b29ef3546aec ("Drivers: hv: vmbus: Cleanup hv_post_message()")

Commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") introduced
the hyperv_pcpu_input_arg in 2018, which can be used in hv_post_message().

Remove post_msg_page to simplify the code a little bit.
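
Roughly, hv_post_message() then takes the shape of this sketch (simplified, with error handling trimmed):

    int hv_post_message(union hv_connection_id connection_id,
                        enum hv_message_type message_type,
                        void *payload, size_t payload_size)
    {
            struct hv_input_post_message *aligned_msg;
            unsigned long flags;
            u64 status;

            if (payload_size > HV_MESSAGE_PAYLOAD_BYTE_COUNT)
                    return -EMSGSIZE;

            /* Reuse the per-CPU hypercall input page instead of post_msg_page. */
            local_irq_save(flags);
            aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
            aligned_msg->connectionid = connection_id;
            aligned_msg->reserved = 0;
            aligned_msg->message_type = message_type;
            aligned_msg->payload_size = payload_size;
            memcpy((void *)aligned_msg->payload, payload, payload_size);
            status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
            local_irq_restore(flags);

            return hv_result(status);
    }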

Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Jinank Jain <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(backported from commit 9a6b1a170ca8627d401fa76112e4920ba8c6314c)
[rtg - context adjustment]
Signed-off-by: Dexuan Cui <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>

8ba9049... by Dexuan Cui

UBUNTU: SAUCE: x86/hyperv: Support hypercalls for TDX guests (part 2)

BugLink: https://bugs.launchpad.net/bugs/2015369

Add the missing patch from
https://github.com/dcui/tdx/commit/1c59d1571537c077052140d61589c8851b3e4fd2

Signed-off-by: Dexuan Cui <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>

39b5499... by Dexuan Cui

UBUNTU: SAUCE: Drivers: hv: vmbus: Hardcode MMIO resources in vmbus_walk_resources() when necessary

BugLink: https://bugs.launchpad.net/bugs/2015369

Currently Hyper-V does not provide valid MMIO resources for TDX VMs (without a paravisor).
Hardcode the resources as a temporary workaround.

This patch is tested on a local Hyper-V host for TDX VMs.

These hardcoded MMIO ranges match the default values for a regular non-CVM VM
on a local Hyper-V host. On Azure, the default ranges are
[c0000000, ffffffff] and [fe0000000, fffffffff]. The hardcoded ranges should
also work when a TDX VM runs on Azure, though the patch uses a smaller 32-bit
MMIO range that is a subset of [c0000000, ffffffff].

The patch should not cause any regression for non-TDX VMs because the 'start' in
those VMs should be non-zero, so it should be safe to include this patch in an
official kernel.
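
For illustration only (the addresses below follow the defaults quoted above; the variable names and the exact 32-bit subset chosen by the patch are assumptions, not taken from the patch):

    /* Fallback MMIO windows used only when VMBus reports no valid ranges
     * (TDX VM without a paravisor).  Values are illustrative.
     */
    static struct resource hv_mmio_fallback_low = {
            .start = 0xc0000000,
            .end   = 0xdfffffff,            /* subset of [c0000000, ffffffff] */
            .flags = IORESOURCE_MEM,
    };

    static struct resource hv_mmio_fallback_high = {
            .start = 0xfe0000000ULL,
            .end   = 0xfffffffffULL,
            .flags = IORESOURCE_MEM,
    };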

Signed-off-by: Dexuan Cui <email address hidden>
(cherry picked from commit 41bc034ec095b46e13c59749736550aff451d011 https://github.com/dcui/tdx/commit/41bc034ec095b46e13c59749736550aff451d011)
Signed-off-by: Dexuan Cui <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>

a9d9436... by Michael Kelley

PCI: hv: Enable PCI pass-thru devices in Confidential VMs

BugLink: https://bugs.launchpad.net/bugs/2015369

For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.

Add functions to make the new MMIO read and MMIO write hypercalls.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].

[1] https://<email address hidden>/
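
A rough sketch of the read path (simplified; the real code validates sizes and the hypercall result, and has a matching write helper using HVCALL_MMIO_WRITE):

    static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa,
                                 int size, u32 *val)
    {
            struct hv_mmio_read_input *in;
            struct hv_mmio_read_output *out;
            u64 ret;

            /* Must be called with interrupts disabled so the per-CPU
             * hypercall arguments are not clobbered.
             */
            in = *this_cpu_ptr(hyperv_pcpu_input_arg);
            out = (void *)in + sizeof(*in);
            in->gpa = gpa;
            in->size = size;

            ret = hv_do_hypercall(HVCALL_MMIO_READ, in, out);
            if (hv_result_success(ret))
                    memcpy(val, out->data, size);
            else
                    dev_err(dev, "MMIO read hypercall failed: 0x%llx\n", ret);
    }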

Co-developed-by: Dexuan Cui <email address hidden>
Signed-off-by: Dexuan Cui <email address hidden>
Signed-off-by: Michael Kelley <email address hidden>
Reviewed-by: Boqun Feng <email address hidden>
Reviewed-by: Haiyang Zhang <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(cherry picked from commit 2c6ba4216844ca7918289b49ed5f3f7138ee2402)
Signed-off-by: Tim Gardner <email address hidden>

9aa6302... by Michael Kelley

Drivers: hv: Don't remap addresses that are above shared_gpa_boundary

BugLink: https://bugs.launchpad.net/bugs/2015369

With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.

While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.
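
Conceptually the change has this shape (a sketch, not the literal diff; mon_gpa and monitor_page are illustrative names):

    /* Before: remap the address with the vTOM boundary folded in. */
    monitor_page = memremap(mon_gpa + ms_hyperv.shared_gpa_boundary,
                            HV_HYP_PAGE_SIZE, MEMREMAP_WB);

    /* After: map the real physical address and let ioremap_cache()
     * provide a decrypted mapping, which applies vTOM as a protection flag.
     */
    monitor_page = (void *)ioremap_cache(mon_gpa, HV_HYP_PAGE_SIZE);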

While here, fix typos in two error messages.

Signed-off-by: Michael Kelley <email address hidden>
Reviewed-by: Tianyu Lan <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(cherry picked from commit 6afd9dc1a4b158456c072580f0851b4dbaaa02f1)
Signed-off-by: Tim Gardner <email address hidden>

86bf315... by Jinank Jain

Drivers: hv: Enable vmbus driver for nested root partition

BugLink: https://bugs.launchpad.net/bugs/2015369

Currently the VMBus driver is not initialized for the root partition, but it
needs to be enabled for a nested root partition so that the L2 root can use
VMBus devices.

Signed-off-by: Jinank Jain <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Link: https://lore.kernel.org/r/c3cdd2cf2bffeba388688640eb61bc182e4c041d<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(cherry picked from commit 8536290f0011c582df08657825cf2fee65aac9ed)
Signed-off-by: Tim Gardner <email address hidden>

59454e8... by Jinank Jain

x86/hyperv: Add an interface to do nested hypercalls

BugLink: https://bugs.launchpad.net/bugs/2015369

According to the TLFS, communicating with the L0 hypervisor requires an
additional bit to be set in the hypercall control value. This communication
is needed to perform privileged operations that only the L0 hypervisor can
carry out, such as setting up the VMBus infrastructure.
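
A minimal sketch of such an interface (the nested bit lives in the hypercall control value per the TLFS; the helper and macro names below mirror common upstream naming but are shown only as an illustration):

    /* Hypercall control bit that routes the call to the L0 hypervisor. */
    #define HV_HYPERCALL_NESTED         BIT_ULL(31)

    static inline u64 hv_do_nested_hypercall(u64 control, void *input,
                                             void *output)
    {
            return hv_do_hypercall(control | HV_HYPERCALL_NESTED,
                                   input, output);
    }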

Signed-off-by: Jinank Jain <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Link: https://lore.kernel.org/r/24f9d46d5259a688113e6e5e69e21002647f4949<email address hidden>
Signed-off-by: Wei Liu <email address hidden>
(cherry picked from commit f0d2f5c2c000c03aa6b6a29954042174b59a0d1c)
Signed-off-by: Tim Gardner <email address hidden>