98dd980...
by
"Kirill A. Shutemov" <email address hidden>
x86/tdx: Add unaccepted memory support
Hookup TDX-specific code to accept memory.
Accepting the memory is done with ACCEPT_PAGE module call on every page
in the range. MAP_GPA hypercall is not required as the unaccepted memory
is considered private already.
Extract the part of tdx_enc_status_changed() that does memory acceptance
in a new helper. Move the helper tdx-shared.c. It is going to be used by
both main kernel and decompressor.
b936cb1...
by
"Kirill A. Shutemov" <email address hidden>
x86/tdx: Refactor try_accept_one()
Rework try_accept_one() to return accepted size instead of modifying
'start' inside the helper. It makes 'start' in-only argument and
streamlines code on the caller side.
7401e17...
by
"Kirill A. Shutemov" <email address hidden>
x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub
Memory acceptance requires a hypercall and one or multiple module calls.
Make helpers for the calls available in boot stub. It has to accept
memory where kernel image and initrd are placed.
Signed-off-by: Kirill A. Shutemov <email address hidden>
Reviewed-by: Dave Hansen <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
27433c5...
by
"Kirill A. Shutemov" <email address hidden>
efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.
But, this approach does not work for unaccepted memory. For TDX, a load
from unaccepted memory will not lead to a recoverable exception within
the guest. The guest will exit to the VMM where the only recourse is to
terminate the guest.
There are two parts to fix this issue and comprehensively avoid access
to unaccepted memory. Together these ensure that an extra "guard" page
is accepted in addition to the memory that needs to be used.
1. Implicitly extend the range_contains_unaccepted_memory(start, end)
checks up to end+unit_size if 'end' is aligned on a unit_size
boundary.
2. Implicitly extend accept_memory(start, end) to end+unit_size if 'end'
is aligned on a unit_size boundary.
Side note: This leads to something strange. Pages which were accepted
at boot, marked by the firmware as accepted and will never
_need_ to be accepted might be on unaccepted_pages list
This is a cue to ensure that the next page is accepted
before 'page' can be used.
This is an actual, real-world problem which was discovered during TDX
testing.
32bb0fb...
by
"Kirill A. Shutemov" <email address hidden>
x86/boot/compressed: Handle unaccepted memory
The firmware will pre-accept the memory used to run the stub. But, the
stub is responsible for accepting the memory into which it decompresses
the main kernel. Accept memory just before decompression starts.
The stub is also responsible for choosing a physical address in which to
place the decompressed kernel image. The KASLR mechanism will randomize
this physical address. Since the accepted memory region is relatively
small, KASLR would be quite ineffective if it only used the pre-accepted
area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
entire physical address space by also including EFI_UNACCEPTED_MEMORY.
3cc1863...
by
"Kirill A. Shutemov" <email address hidden>
efi/libstub: Implement support for unaccepted memory
UEFI Specification version 2.9 introduces the concept of memory
acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.
Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.
The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.
Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 (or whatever the arch uses) has to be modified on every
page acceptance. It leads to table fragmentation and there's a limited
number of entries in the e820 table.
Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents a
naturally aligned power-2-sized region of address space -- unit.
For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or
physical address space.
In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.
Any unaccepted memory that is not aligned to unit_size gets accepted
upfront.
The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via EFI configuration table. allocate_e820() allocates the
bitmap if unaccepted memory is present, according to the size of
unaccepted region.