~john-cabaj/ubuntu/+source/linux-gcp/+git/mantic-gcp-tdx:tdx

Last commit made on 2024-02-07
Get this branch:
git clone -b tdx https://git.launchpad.net/~john-cabaj/ubuntu/+source/linux-gcp/+git/mantic-gcp-tdx

Recent commits

ff41aa9... by John Cabaj

UBUNTU: [Config] gcp: Updates for TDX

BugLink: https://bugs.launchpad.net/bugs/2052576

Signed-off-by: John Cabaj <email address hidden>

5ea6d41... by "Kirill A. Shutemov" <email address hidden>

x86/kvm: Do not try to disable kvmclock if it was not enabled

BugLink: https://bugs.launchpad.net/bugs/2052576

kvm_guest_cpu_offline() tries to disable kvmclock regardless of whether
it is present in the VM. This leads to a write to an MSR that doesn't
exist on some configurations, namely in a TDX guest:

 unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
 at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)

kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.

Do not disable kvmclock if it was not enabled.

Signed-off-by: Kirill A. Shutemov <email address hidden>
Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <email address hidden>
Reviewed-by: Vitaly Kuznetsov <email address hidden>
Cc: Paolo Bonzini <email address hidden>
Cc: Wanpeng Li <email address hidden>
Cc: <email address hidden>
Message-Id: <email address hidden>
Signed-off-by: Paolo Bonzini <email address hidden>
(cherry picked from commit 1c6d984f523f67ecfad1083bb04c55d91977bb15)
Signed-off-by: John Cabaj <email address hidden>

63362df... by Kuppuswamy Sathyanarayanan <email address hidden>

virt: tdx-guest: Add Quote generation support using TSM_REPORTS

BugLink: https://bugs.launchpad.net/bugs/2052576

In TDX guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest. The first step in the attestation process is TDREPORT
generation, which involves getting the guest measurement data in the
format of TDREPORT, which is further used to validate the authenticity
of the TDX guest. TDREPORT by design is integrity-protected and can
only be verified on the local machine.

To support remote verification of the TDREPORT in an SGX-based
attestation, the TDREPORT needs to be sent to the SGX Quoting Enclave
(QE) to convert it to a remotely verifiable Quote. SGX QE by design can
only run outside of the TDX guest (i.e. in a host process or in a
normal VM) and the guest can use communication channels like vsock or
TCP/IP to send the TDREPORT to the QE. But due to security concerns, the
TDX guest may not support these communication channels. To handle such
cases, TDX defines a GetQuote hypercall which can be used by the guest
to request the host VMM to communicate with the SGX QE. More details
about GetQuote hypercall can be found in TDX Guest-Host Communication
Interface (GHCI) for Intel TDX 1.0, section titled
"TDG.VP.VMCALL<GetQuote>".

Trusted Security Module (TSM) [1] exposes a common ABI for Confidential
Computing Guest platforms to get the measurement data via ConfigFS.
Extend the TSM framework and add support to allow an attestation agent
to get the TDX Quote data (included usage example below).

  report=/sys/kernel/config/tsm/report/report0
  mkdir $report
  dd if=/dev/urandom bs=64 count=1 > $report/inblob
  hexdump -C $report/outblob
  rmdir $report

GetQuote TDVMCALL requires the TD guest to pass a 4K-aligned shared
buffer with TDREPORT data as input; the VMM then copies the TD Quote
result into the same buffer after successful Quote generation. To create
the shared buffer, allocate a large enough region of memory and mark it
shared using set_memory_decrypted() in tdx_guest_init(). This buffer is
reused for GetQuote requests in the TDX TSM handler.

Although this method reserves a fixed chunk of memory for GetQuote
requests, such a one-time allocation helps avoid allocation failures
caused by memory fragmentation later in the uptime of the guest.

Since the Quote generation process is not time-critical or frequently
used, the current version uses a polling model for Quote requests and
it also does not support parallel GetQuote requests.

Link: https://lore<email address hidden>/ [1]
Signed-off-by: Kuppuswamy Sathyanarayanan <email address hidden>
Reviewed-by: Erdem Aktas <email address hidden>
Tested-by: Kuppuswamy Sathyanarayanan <email address hidden>
Tested-by: Peter Gonda <email address hidden>
Reviewed-by: Tom Lendacky <email address hidden>
Signed-off-by: Dan Williams <email address hidden>
(cherry picked from commit f4738f56d1dc62aaba69b33702a5ab098f1b8c63)
[john-cabaj: commit 2a74ba8fe46d added tdx_mcall_extend_rtmr(), which is not
yet upstream, so adjusting context]
Signed-off-by: John Cabaj <email address hidden>

658291b... by "Kirill A. Shutemov" <email address hidden>

x86/tdx: Allow 32-bit emulation by default

BugLink: https://bugs.launchpad.net/bugs/2052576

32-bit emulation was disabled on TDX to prevent a possible attack by
a VMM injecting an interrupt on vector 0x80.

Now that int80_emulation() has a check for external interrupts the
limitation can be lifted.

To distinguish software interrupts from external ones, int80_emulation()
checks the APIC ISR bit relevant to the 0x80 vector. For
software interrupts, this bit will be 0.

On TDX, the VAPIC state (including ISR) is protected and cannot be
manipulated by the VMM. The ISR bit is set by the microcode flow during
the handling of posted interrupts.

[ dhansen: more changelog tweaks ]

Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Thomas Gleixner <email address hidden>
Reviewed-by: Borislav Petkov (AMD) <email address hidden>
Cc: <email address hidden> # v6.0+
(cherry picked from commit f4116bfc44621882556bbf70f5284fbf429a5cf6)
Signed-off-by: John Cabaj <email address hidden>

6dfe097... by tglx

x86/entry: Do not allow external 0x80 interrupts

BugLink: https://bugs.launchpad.net/bugs/2052576

The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
also triggers the same codepath.

An external interrupt on vector 0x80 is currently interpreted as a
32-bit system call, on the assumption that it came from user context.

Panic on external interrupts on the vector.

To distinguish software interrupts from external ones, the kernel checks
the APIC ISR bit relevant to the 0x80 vector. For software interrupts,
this bit will be 0.
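
As a rough illustration of the ISR lookup, the sketch below computes
which 32-bit ISR register and which bit cover a given vector, assuming
the usual layout of 256 vectors spread over eight 32-bit registers. The
isr[] array is a hypothetical snapshot of those registers; real code
would read them from the local APIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISR_REGS 8 /* 256 vectors / 32 bits per register */

/* Index of the 32-bit ISR register that covers a vector. */
static unsigned int isr_reg_index(unsigned int vector)
{
    assert(vector < 256);
    return vector / 32;
}

/* Bit mask of the vector inside its ISR register. */
static uint32_t isr_bit_mask(unsigned int vector)
{
    assert(vector < 256);
    return UINT32_C(1) << (vector % 32);
}

/* True if the vector is marked in-service in the snapshot. */
static bool vector_in_service(const uint32_t isr[ISR_REGS], unsigned int vector)
{
    return (isr[isr_reg_index(vector)] & isr_bit_mask(vector)) != 0;
}

/* A software INT 0x80 leaves the bit clear; an external interrupt sets it. */
static bool int80_is_external(const uint32_t isr[ISR_REGS])
{
    return vector_in_service(isr, 0x80);
}
```

For vector 0x80 this selects register index 4, bit 0.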

Signed-off-by: Thomas Gleixner <email address hidden>
Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Borislav Petkov (AMD) <email address hidden>
Cc: <email address hidden> # v6.0+
(cherry picked from commit 55617fb991df535f953589586468612351575704)
Signed-off-by: John Cabaj <email address hidden>

407b5ca... by tglx

x86/entry: Convert INT 0x80 emulation to IDTENTRY

BugLink: https://bugs.launchpad.net/bugs/2052576

There is no real reason to have a separate ASM entry point implementation
for the legacy INT 0x80 syscall emulation on 64-bit.

IDTENTRY provides all the functionality needed with the only difference
that it does not:

  - save the syscall number (AX) into pt_regs::orig_ax
  - set pt_regs::ax to -ENOSYS

Both can be done safely in the C code of an IDTENTRY before invoking any of
the syscall related functions which depend on this convention.
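
The two steps listed above can be modelled in plain C as follows. The
struct is a hypothetical two-field stand-in for pt_regs, and the
function name is made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for the two pt_regs slots involved. */
struct regs_model {
    long ax;      /* syscall number on entry, return value on exit */
    long orig_ax; /* where syscall code expects the syscall number */
};

/* Establish the syscall convention before any syscall code runs. */
static void int80_prepare_regs(struct regs_model *regs)
{
    assert(regs != NULL);
    regs->orig_ax = regs->ax; /* save the syscall number */
    regs->ax = -ENOSYS;       /* default result if nothing handles it */
}
```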

Aside from the ASM code reduction, this prepares for detecting and
handling a local APIC injected vector 0x80.

[ kirill.shutemov: More verbose comments ]
Suggested-by: Linus Torvalds <email address hidden>
Signed-off-by: Thomas Gleixner <email address hidden>
Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Borislav Petkov (AMD) <email address hidden>
Cc: <email address hidden> # v6.0+
(backported from commit be5341eb0d43b1e754799498bd2e8756cc167a41)
[john-cabaj: context changes]
Signed-off-by: John Cabaj <email address hidden>

5401bd8... by "Kirill A. Shutemov" <email address hidden>

x86/coco: Disable 32-bit emulation by default on TDX and SEV

BugLink: https://bugs.launchpad.net/bugs/2052576

The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
triggers the same handler.

The kernel interprets an external interrupt on vector 0x80 as a 32-bit
system call that came from userspace.

A VMM can inject external interrupts on any arbitrary vector at any
time. This remains true even for TDX and SEV guests where the VMM is
untrusted.

Put together, this allows an untrusted VMM to trigger int80 syscall
handling at any given point. The content of the guest register file at
that moment defines what syscall is triggered and its arguments. It
opens the guest OS to manipulation from the VMM side.

Disable 32-bit emulation by default for TDX and SEV. User can override
it with the ia32_emulation=y command line option.

[ dhansen: reword the changelog ]

Reported-by: Supraja Sridhara <email address hidden>
Reported-by: Benedict Schlüter <email address hidden>
Reported-by: Mark Kuhne <email address hidden>
Reported-by: Andrin Bertschi <email address hidden>
Reported-by: Shweta Shinde <email address hidden>
Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Thomas Gleixner <email address hidden>
Reviewed-by: Borislav Petkov (AMD) <email address hidden>
Cc: <email address hidden> # v6.0+: 1da5c9b x86: Introduce ia32_enabled()
Cc: <email address hidden> # v6.0+
(cherry picked from commit b82a8dbd3d2f4563156f7150c6f2ecab6e960b30)
Signed-off-by: John Cabaj <email address hidden>

2b9f83b... by Nikolay Borisov <email address hidden>

x86: Introduce ia32_enabled()

BugLink: https://bugs.launchpad.net/bugs/2052576

IA32 support on 64-bit kernels depends on whether CONFIG_IA32_EMULATION
is selected or not. As it is a compile-time option, it doesn't
provide the flexibility to have distributions set their own policy for
IA32 support and give the user the flexibility to override it.

As a first step introduce ia32_enabled() which abstracts whether IA32
compat is turned on or off. Upcoming patches will implement
the ability to set IA32 compat state at boot time.
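
One way to picture the abstraction is a helper that hides whether the
answer comes from a build-time switch or from runtime state. The sketch
below is a userspace model only: MODEL_IA32_EMULATION is a hypothetical
macro standing in for the config option, and the setter stands in for a
future boot-time override.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch standing in for CONFIG_IA32_EMULATION. */
#ifndef MODEL_IA32_EMULATION
#define MODEL_IA32_EMULATION 1
#endif

static_assert(MODEL_IA32_EMULATION == 0 || MODEL_IA32_EMULATION == 1,
              "MODEL_IA32_EMULATION must be 0 or 1");

#if MODEL_IA32_EMULATION
/* Compiled in: the state is a variable that policy code may change. */
static bool model_ia32_state = true;

static bool model_ia32_enabled(void) { return model_ia32_state; }
static void model_ia32_set(bool on) { model_ia32_state = on; }
#else
/* Compiled out: callers get a constant answer and the setter is a no-op. */
static bool model_ia32_enabled(void) { return false; }
static void model_ia32_set(bool on) { (void)on; }
#endif
```

Callers only ever ask model_ia32_enabled(), so the policy behind it can
change without touching them.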

Signed-off-by: Nikolay Borisov <email address hidden>
Signed-off-by: Thomas Gleixner <email address hidden>
Link: https://<email address hidden>

(cherry picked from commit 1da5c9bc119d3a749b519596b93f9b2667e93c4a)
Signed-off-by: John Cabaj <email address hidden>

a81548b... by Kai Huang <email address hidden>

x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()

BugLink: https://bugs.launchpad.net/bugs/2052576

LKP reported below build warning:

  vmlinux.o: warning: objtool: __tdx_hypercall+0x128: __tdx_hypercall_failed() is missing a __noreturn annotation

The __tdx_hypercall_failed() function definition already has the
__noreturn annotation, but it turns out that __noreturn must also be
applied to the function declaration.

PeterZ explains:

  "FWIW, the reason being that...

   The point of noreturn is that the caller should know to stop generating
   code. For that the declaration needs the attribute, because call sites
   typically do not have access to the function definition in C."

Add __noreturn annotation to the declaration of __tdx_hypercall_failed()
to fix. It's not a bad idea to document the __noreturn nature at the
definition site either, so keep the annotation at the definition.
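
A small standalone example of the same pattern, using the GCC/Clang
attribute spelling directly rather than the kernel's __noreturn macro.
The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* The declaration carries the attribute so every call site can see it. */
__attribute__((noreturn)) void model_hypercall_failed(const char *why);

static int parse_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    model_hypercall_failed("not a digit");
    /* No return needed: the declaration says control stops at the call. */
}

/* Keeping the attribute on the definition documents the intent there too. */
__attribute__((noreturn)) void model_hypercall_failed(const char *why)
{
    assert(why != NULL);
    fprintf(stderr, "fatal: %s\n", why);
    abort();
}
```

If the attribute is dropped from the declaration, compilers typically
warn that parse_digit() can reach its end without returning a value.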

Note that <asm/shared/tdx.h> is also included by TDX-related assembly
files. Include <linux/compiler_attributes.h> only in the !__ASSEMBLY__
case; otherwise compiling an assembly file would trigger a build error.

Also, following the objtool documentation, add __tdx_hypercall_failed()
to "tools/objtool/noreturns.h".

Fixes: c641cfb5c157 ("x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL")
Reported-by: kernel test robot <email address hidden>
Signed-off-by: Kai Huang <email address hidden>
Signed-off-by: Ingo Molnar <email address hidden>
Link: https://<email address hidden>
Closes: https://<email address hidden>/
(cherry picked from commit 518755a7eeae77a399430eaf211a1e71f6b87d4a)
Signed-off-by: John Cabaj <email address hidden>

c6a91b3... by Kai Huang <email address hidden>

x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP

BugLink: https://bugs.launchpad.net/bugs/2052576

The SEAMCALL instruction causes #UD if the CPU isn't in VMX operation.
Currently the TDX_MODULE_CALL assembly doesn't handle #UD, so making a
SEAMCALL when VMX is disabled would cause an Oops.

Unfortunately, there are legal cases where SEAMCALL can be made when VMX
is disabled. For instance, VMX can be disabled due to emergency reboot
while there are still TDX guests running.

Extend the TDX_MODULE_CALL assembly to return an error code for #UD to
handle this case gracefully, e.g., KVM can then quietly eat all SEAMCALL
errors caused by emergency reboot.

SEAMCALL instruction also causes #GP when TDX isn't enabled by the BIOS.
Use _ASM_EXTABLE_FAULT() to catch both exceptions with the trap number
recorded, and define two new error codes by XORing the trap number to
the TDX_SW_ERROR. This opportunistically handles #GP too while using
the same simple assembly code.
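
The error-code construction can be illustrated as below. The trap
numbers are the architectural x86 vectors for #UD (6) and #GP (13). The
base value is an assumption made for this sketch (bit 63 plus bits
47:40, with the low bits zero), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TRAP_UD 6u  /* x86 invalid-opcode exception vector */
#define TRAP_GP 13u /* x86 general-protection exception vector */

/* Assumed software-error base: bit 63 plus bits 47:40, low bits zero. */
#define MODEL_SW_ERROR ((UINT64_C(1) << 63) | (UINT64_C(0xff) << 40))

/* Fold the trap number into the software error code. */
static uint64_t seamcall_trap_error(unsigned int trapnr)
{
    assert(trapnr < 32); /* x86 exception vectors are 0..31 */
    return MODEL_SW_ERROR ^ trapnr;
}

/* True if err is the code produced for the given trap. */
static int error_is_trap(uint64_t err, unsigned int trapnr)
{
    return err == seamcall_trap_error(trapnr);
}
```

Because the low bits of the assumed base are zero, XOR and OR give the
same result here, and the trap number can be read straight back out of
the low bits of the error code.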

A bonus is that when the kernel mistakenly calls SEAMCALL when the CPU
isn't in VMX operation, or when TDX isn't enabled by the BIOS, or when
the BIOS is buggy, the kernel gets a meaningful error code rather than
a less understandable Oops.

This is basically based on Peter's code.

Suggested-by: Peter Zijlstra <email address hidden>
Signed-off-by: Kai Huang <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Kirill A. Shutemov <email address hidden>
Acked-by: Peter Zijlstra (Intel) <email address hidden>
Link: https://lore.kernel.org/all/de975832a367f476aab2d0eb0d9de66019a16b54.1692096753.git.kai.huang%40intel.com
(cherry picked from commit 7b804135d4d1f0a2b9dda69c6303d3f2dcbe9d37)
Signed-off-by: John Cabaj <email address hidden>