~john-cabaj/ubuntu/+source/linux/+git/lunar:tdx-getquote-1

Last commit made on 2023-06-28
Get this branch:
git clone -b tdx-getquote-1 https://git.launchpad.net/~john-cabaj/ubuntu/+source/linux/+git/lunar

Branch information

Name:
tdx-getquote-1
Repository:
lp:~john-cabaj/ubuntu/+source/linux/+git/lunar

Recent commits

16dc571... by Khaled El Mously

SAUCE: Fix __tdx_hypercall call

Signed-off-by: Khalid Elmously <email address hidden>

19ecbc7... by Kuppuswamy Sathyanarayanan <email address hidden>

x86/tdx: Add Quote generation support

In a TDX guest, the second stage of the attestation process is to send
the TDREPORT to the QE/QGS to generate the TD Quote. For platforms that
do not support communication channels like vsock or TCP/IP, implement
support to get the TD Quote using a hypercall. The GetQuote hypercall
can be used by the TD guest to request that the VMM facilitate Quote
generation via the QE/QGS. More details about the GetQuote hypercall
can be found in the TDX Guest-Host Communication Interface (GHCI) for
Intel TDX 1.0, section titled "TDG.VP.VMCALL<GetQuote>".

Since GetQuote is an asynchronous request hypercall, it does not block
until the TD Quote is generated. Instead, the VMM uses the callback
interrupt vector configured by the SetupEventNotifyInterrupt hypercall
to notify the guest about Quote generation completion or failure.

GetQuote TDVMCALL requires the TD guest to pass a 4K-aligned shared
buffer with TDREPORT data as input; the VMM then uses the same buffer
to return the TD Quote after successful Quote generation. To create the
shared buffer without breaking the direct map, use the
cc_decrypted_alloc/free() APIs.
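
As a rough userspace illustration of the 4K alignment requirement
described above, the sketch below rounds a payload length up to whole
4K pages and allocates a buffer on a 4K boundary. It is not the kernel
code: the patch uses cc_decrypted_alloc(), which is not shown, and
nothing here shares memory with a VMM. The names QUOTE_BUF_ALIGN,
quote_buf_len(), is_4k_aligned() and quote_buf_alloc() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant: the GetQuote buffer must be 4K aligned. */
#define QUOTE_BUF_ALIGN 4096u

/* Round a payload length up to a whole number of 4K pages. */
size_t quote_buf_len(size_t payload_len)
{
	return (payload_len + QUOTE_BUF_ALIGN - 1) &
	       ~((size_t)QUOTE_BUF_ALIGN - 1);
}

/* Return nonzero if ptr sits on a 4K boundary. */
int is_4k_aligned(const void *ptr)
{
	return ((uintptr_t)ptr & (QUOTE_BUF_ALIGN - 1)) == 0;
}

/*
 * Allocate a zeroed, 4K-aligned buffer large enough for payload_len
 * bytes. Userspace stand-in only. Returns NULL on failure or when
 * payload_len is 0; release the buffer with free().
 */
void *quote_buf_alloc(size_t payload_len, size_t *out_len)
{
	size_t len = quote_buf_len(payload_len);
	void *buf;

	if (len == 0)
		return NULL;
	buf = aligned_alloc(QUOTE_BUF_ALIGN, len);
	if (!buf)
		return NULL;
	assert(is_4k_aligned(buf));
	memset(buf, 0, len);
	if (out_len)
		*out_len = len;
	return buf;
}
```

For example, quote_buf_alloc(1024, &len) would be expected to return a
4096-byte buffer, since the length is always rounded up to a full page.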

Also note that shared buffer allocation is currently handled in the
IOCTL handler. Although this increases the TDX_CMD_GET_QUOTE IOCTL
response time, the increase is negligible compared to the time required
for Quote generation to complete, so IOCTL performance optimization is
not considered at this time.

Add support for the TDX_CMD_GET_QUOTE IOCTL to allow an attestation
agent to submit GetQuote requests from user space. Since Quote
generation is an asynchronous request, the IOCTL blocks indefinitely
for the VMM response in a wait_for_completion_interruptible() call.
Using this call also lets the user end the current request prematurely
by raising a signal. An attestation agent that knows how long it can
validly wait for the QE/QGS response can use this to implement a Quote
generation timeout in the user application. A timeout is currently not
implemented in the driver because the current TDX specification does
not have any recommendation for it.

After the GetQuote request is submitted using the hypercall, the shared
buffer allocated for the current request is owned by the VMM. So, if
the user terminates the request during this wait window by raising a
signal or by terminating the application, add logic to do the memory
cleanup later, after the VMM response is received.

To support parallel GetQuote requests, use a linked list to track the
active GetQuote requests and, upon receiving the callback IRQ, loop
through the active requests and mark the processed requests complete.
Users can open multiple instances of the attestation device and send
GetQuote requests in parallel.
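
The request tracking and deferred cleanup described above can be
modelled in plain C as follows. This is a simplified, single-threaded
sketch, not the driver code: the kernel would use its own list helpers,
locking and completions. All names (struct quote_req, quote_req_submit(),
quote_irq_handler(), the status values) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status values; the real ones come from the GHCI spec. */
enum quote_status { QUOTE_IN_FLIGHT, QUOTE_DONE };

/* One outstanding request. */
struct quote_req {
	enum quote_status status;	/* written by the "VMM" in this model */
	bool completed;			/* set once the waiter may proceed */
	bool abandoned;			/* waiter gave up (e.g. got a signal) */
	struct quote_req *next;
};

static struct quote_req *active_reqs;	/* head of the active list */

/* Track a new in-flight request. Returns NULL if allocation fails. */
struct quote_req *quote_req_submit(void)
{
	struct quote_req *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;
	req->status = QUOTE_IN_FLIGHT;
	req->next = active_reqs;
	active_reqs = req;
	return req;
}

/*
 * Model of the callback handler: walk the list, unlink every request
 * the VMM has finished, and either mark it complete for its waiter or,
 * if the waiter already left, free it. Returns the number processed.
 */
int quote_irq_handler(void)
{
	struct quote_req **link = &active_reqs;
	int processed = 0;

	while (*link) {
		struct quote_req *req = *link;

		if (req->status == QUOTE_IN_FLIGHT) {
			link = &req->next;	/* still owned by the VMM */
			continue;
		}
		*link = req->next;		/* unlink finished request */
		req->next = NULL;
		if (req->abandoned)
			free(req);		/* deferred cleanup */
		else
			req->completed = true;	/* waiter frees it later */
		processed++;
	}
	return processed;
}
```

Requests that are still in flight stay on the list, so one callback can
complete any subset of the parallel requests.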

Reviewed-by: Tony Luck <email address hidden>
Reviewed-by: Andi Kleen <email address hidden>
Acked-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Kuppuswamy Sathyanarayanan <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

e003cf7... by Kuppuswamy Sathyanarayanan <email address hidden>

x86/coco: Add cc_decrypted_alloc/free() interfaces

Confidential computing platforms, such as AMD SEV and Intel TDX,
protect memory from VMM access. Any memory that is required for
communication with the VMM must be explicitly shared. This involves
adjusting page table entries to indicate that the memory is shared and
notifying the VMM about the change.

set_memory_decrypted() converts memory to shared. Before freeing
memory it has to be converted back with set_memory_encrypted().

The interface works fine for long-term allocations, but for frequent
short-lived allocations it causes problems. Conversion takes time and
direct mapping modification leads to its fracturing and performance
degradation over time.

Direct mapping modifications can be avoided by creating a vmap that
maps allocated pages as shared while direct mapping is untouched.

But having a private mapping of shared memory causes problems too.
Any access to such memory via the private mapping in a TDX guest would
trigger an unrecoverable SEPT violation and termination of the virtual
machine. It is known that load_unaligned_zeropad() can issue such
unwanted loads across page boundaries that can trigger the issue.

This can be fixed by allocating a guard page in front of any memory
that has to be converted to shared, so load_unaligned_zeropad() will
roll off to the guard page instead. But it is wasteful and does not
address the cost of the memory conversion.

The next logical step is to introduce a pool of shared memory that can
share a single guard page and make conversion less frequent.

Fortunately, the kernel already has such a pool of memory: the SWIOTLB
buffer is used by the DMA API to allocate memory for I/O. The buffer is
allocated once during boot, so direct mapping fracturing is not an
issue and there is no need for vmap tricks.

Tapping into the SWIOTLB pool requires a device structure and using DMA
API. Provide a couple of simple helpers to allocate and free shared
memory that hide required plumbing.

Signed-off-by: Kuppuswamy Sathyanarayanan <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

850315f... by Kuppuswamy Sathyanarayanan <email address hidden>

x86/tdx: Add TDX Guest event notify interrupt support

Host-guest event notification via configured interrupt vector is useful
in cases where a guest makes an asynchronous request and needs a
callback from the host to indicate the completion or to let the host
notify the guest about events like device removal. One usage example
is the callback requirement of the asynchronous GetQuote hypercall.

In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
guest to specify which interrupt vector to use as an event-notify
vector to the VMM. Details about the SetupEventNotifyInterrupt
hypercall can be found in TDX Guest-Host Communication Interface
(GHCI) Specification, sec 3.5 "VP.VMCALL<SetupEventNotifyInterrupt>".
Add a tdx_hcall_set_notify_intr() helper function to implement the
SetupEventNotifyInterrupt hypercall.

As per the design, the VMM posts the event completion IRQ to the same
CPU on which the SetupEventNotifyInterrupt hypercall request is
received. So allocate an IRQ vector from "x86_vector_domain", and set
the CPU affinity of the IRQ vector to the current CPU.

Please note that this patch only reserves the IRQ number. It is
expected that the user of event notify IRQ (like GetQuote handler) will
directly register the handler for "tdx_notify_irq" IRQ by using
request_irq() with IRQF_SHARED flag set. Using IRQF_SHARED will enable
multiple users to use the same IRQ for event notification.

Reviewed-by: Tony Luck <email address hidden>
Reviewed-by: Andi Kleen <email address hidden>
Acked-by: Kirill A. Shutemov <email address hidden>
Acked-by: Wander Lairson Costa <email address hidden>
Signed-off-by: Kuppuswamy Sathyanarayanan <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

2500ca2... by Kuppuswamy Sathyanarayanan <email address hidden>

x86/tdx: Add TDX Guest attestation interface driver

In TDX guest, attestation is used to verify the trustworthiness of a TD
to other entities before provisioning secrets to the TD.

During the TD launch, the initial contents and configuration of the TD
are recorded by the Intel TDX module in build time measurement register
(MRTD). It is a SHA384 digest created using data from TD private pages
(including TD firmware) and the configuration of the TD.

After TD build, run-time measurement registers (RTMRs) can be used by
the guest TD software to extend the TD measurements. TDX supports 4
RTMR registers, and TDG.MR.RTMR.EXTEND TDCALL is used to update the
RTMR registers securely. RTMRs are mainly used to record measurements
related to sections like the kernel image, command line parameters,
initrd, ACPI tables, firmware data, configuration firmware volume (CFV)
of TDVF, etc. For more details, please refer to TDX Virtual Firmware
design specification, sec titled "TD Measurement".

At TD runtime, the Intel TDX module reuses the Intel SGX attestation
infrastructure to provide support for attesting to these measurements
as described below.

The attestation process consists of two steps: TDREPORT generation and
Quote generation.

TDREPORT (TDREPORT_STRUCT) is a fixed-size data structure generated by
the TDX module which contains TD-specific information (such as TD
measurements), platform security version, and the MAC to protect the
integrity of the TDREPORT. The TD kernel uses TDCALL[TDG.MR.REPORT] to
get the TDREPORT from the TDX module. A user-provided 64-Byte
REPORTDATA is used as input and included in the TDREPORT. Typically it
can be some nonce provided by attestation service so the TDREPORT can
be verified uniquely. More details about TDREPORT can be found in
Intel TDX Module specification, section titled "TDG.MR.REPORT Leaf".
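
A userspace request for the TDREPORT might be sketched as below. The
64-byte REPORTDATA size comes from the text above; everything else is an
assumption made for illustration: the request layout, the 1024-byte
report size, the ioctl number and the device path are modelled loosely
on the later upstream tdx-guest interface and may not match this branch,
so check the uapi header of the kernel actually in use. The names
struct report_req, GET_REPORT_CMD, TDX_GUEST_DEV, report_req_init() and
report_get() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define REPORTDATA_LEN 64	/* stated in the commit text */
#define TDREPORT_LEN   1024	/* assumption: size of TDREPORT_STRUCT */

/* Assumed request layout, for illustration only. */
struct report_req {
	uint8_t reportdata[REPORTDATA_LEN];	/* input: caller-chosen data */
	uint8_t tdreport[TDREPORT_LEN];		/* output: filled by driver */
};

/* Assumed ioctl number and device path; both may differ in this branch. */
#define GET_REPORT_CMD _IOWR('T', 1, struct report_req)
#define TDX_GUEST_DEV  "/dev/tdx_guest"

/*
 * Copy a nonce (at most 64 bytes) into REPORTDATA and zero-pad the
 * rest. Returns 0 on success, -1 on bad arguments.
 */
int report_req_init(struct report_req *req, const void *nonce, size_t len)
{
	if (!req || (!nonce && len) || len > REPORTDATA_LEN)
		return -1;
	memset(req, 0, sizeof(*req));
	if (len)
		memcpy(req->reportdata, nonce, len);
	return 0;
}

/* Ask the driver for a TDREPORT. Returns 0 on success, -1 on failure. */
int report_get(struct report_req *req)
{
	int fd, ret;

	assert(req != NULL);
	fd = open(TDX_GUEST_DEV, O_RDWR);
	if (fd < 0)
		return -1;	/* not a TD guest, or driver not loaded */
	ret = ioctl(fd, GET_REPORT_CMD, req);
	close(fd);
	return ret == 0 ? 0 : -1;
}
```

An attestation agent would typically call report_req_init() with the
nonce from the attestation service, then report_get(), and forward
req.tdreport to the QE over whichever channel the deployment provides.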

TDREPORT can only be verified on the local platform, as the MAC key is
bound to the platform. To support remote verification of the TDREPORT,
TDX leverages the Intel SGX Quoting Enclave (QE) to verify the TDREPORT
locally and convert it to a remotely verifiable Quote.

After getting the TDREPORT, the second step of the attestation process
is to send it to the QE to generate the Quote. TDX doesn't support SGX
inside the TD, so the QE can be deployed in the host, or in another
legacy VM with SGX support. How to send the TDREPORT to QE and receive
the Quote is implementation and deployment specific.

Implement a basic TD guest misc driver to allow TD userspace to get the
TDREPORT. The TD userspace attestation software can get the TDREPORT
and then choose whatever communication channel is available (e.g. vsock
or hypercall) to send the TDREPORT to the QE and receive the Quote.

Also note that explicit access permissions are not enforced in this
driver because the quote and measurements are not secret. However,
the access permissions of the device node can be used to set any
desired access policy. The udev default is usually root access
only.

Operations like getting the TDREPORT or generating a Quote involve
sending a blob of data as input and getting another blob of data as
output. A sysfs interface was considered for this, but it doesn't fit
well into the standard sysfs model for configuring values. It would be
possible to do read/write on files, but it would need multiple file
descriptors, which would be somewhat messy. IOCTLs seem to be the best
fitting and simplest model for this use case. This is similar to the
AMD SEV platform, which also uses an IOCTL interface to support
attestation.

Reviewed-by: Tony Luck <email address hidden>
Reviewed-by: Andi Kleen <email address hidden>
Acked-by: Kirill A. Shutemov <email address hidden>
Acked-by: Kai Huang <email address hidden>
Acked-by: Wander Lairson Costa <email address hidden>
Signed-off-by: Kuppuswamy Sathyanarayanan <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

acff3bc... by Khaled El Mously

UBUNTU: Ubuntu-gcp-6.2.0-1009.9

Signed-off-by: Khalid Elmously <email address hidden>

a940445... by Khaled El Mously

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Khalid Elmously <email address hidden>

ea216af... by "Kirill A. Shutemov" <email address hidden>

x86/tdx: Drop flags from __tdx_hypercall()

After TDX_HCALL_ISSUE_STI got dropped, the only flag left is
TDX_HCALL_HAS_OUTPUT. The flag indicates if the caller wants to see
tdx_hypercall_args updated based on the hypercall output.

Drop the flags and provide __tdx_hypercall_ret() that matches
TDX_HCALL_HAS_OUTPUT semantics.

Suggested-by: Peter Zijlstra <email address hidden>
Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Acked-by: Peter Zijlstra (Intel) <email address hidden>
Link: https://lore.kernel.org/all/20230321003511.9469-1-kirill.shutemov%40linux.intel.com
(cherry picked from commit 7a3a401874bea02f568aa416ac29170d8cde0dc2)
Signed-off-by: Khalid Elmously <email address hidden>

a96b2ad... by "Kirill A. Shutemov" <email address hidden>

x86/tdx: Do not corrupt frame-pointer in __tdx_hypercall()

If compiled with CONFIG_FRAME_POINTER=y, objtool is not happy that
__tdx_hypercall() messes up RBP:

  objtool: __tdx_hypercall+0x7f: return with modified stack frame

Rework the function to store TDX_HCALL_ flags on stack instead of RBP.

[ dhansen: minor changelog tweaks ]

Fixes: c30c4b2555ba ("x86/tdx: Refactor __tdx_hypercall() to allow pass down more arguments")
Reported-by: kernel test robot <email address hidden>
Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Link: https://<email address hidden>
Link: https://lore.kernel.org/all/20230130135354.27674-1-kirill.shutemov%40linux.intel.com
(backported from commit 1e70c680375aa33cca97bff0bca68c0f82f5023c)
[ kmously: Context adjust again due to addition of "x86/tdx: Remove TDX_HCALL_ISSUE_STI" ]
Signed-off-by: Khalid Elmously <email address hidden>

b1841e4... by "Kirill A. Shutemov" <email address hidden>

x86/tdx: Disable NOTIFY_ENABLES

== Background ==

There is a class of side-channel attacks against SGX enclaves called
"SGX Step"[1]. These attacks create lots of exceptions inside of
enclaves. Basically, run an in-enclave instruction, cause an exception.
Over and over.

There is a concern that a VMM could attack a TDX guest in the same way
by causing lots of #VE's. The TDX architecture includes new
countermeasures for these attacks. It basically counts the number of
exceptions and can send another *special* exception once the number of
VMM-induced #VE's hits a critical threshold[2].

== Problem ==

But, these special exceptions are independent of any action that the
guest takes. They can occur anywhere that the guest executes. This
includes sensitive areas like the entry code. The (non-paranoid) #VE
handler is incapable of handling exceptions in these areas.

== Solution ==

Fortunately, the special exceptions can be disabled by the guest via a
write to the NOTIFY_ENABLES TDCS field. NOTIFY_ENABLES is disabled by
default, but might be enabled by a bootloader, firmware or an earlier
kernel before the current kernel runs.

Disable NOTIFY_ENABLES feature explicitly and unconditionally. Any
NOTIFY_ENABLES-based #VE's that occur before this point will end up
in the early #VE exception handler and die due to unexpected exit
reason.

[1] https://github.com/jovanbulck/sgx-step
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#safety-against-ve-in-kernel-code

Signed-off-by: Kirill A. Shutemov <email address hidden>
Signed-off-by: Dave Hansen <email address hidden>
Reviewed-by: Dave Hansen <email address hidden>
Link: https://lore.kernel.org/all/20230126221159.8635-8-kirill.shutemov%40linux.intel.com
(cherry picked from commit 8de62af018cc9262649d7688f7eb1409b2d8f594)
Signed-off-by: Khalid Elmously <email address hidden>