RE: [PATCH v7 10/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
From: Dan Williams
Date: Wed Nov 23 2022 - 20:50:57 EST
Kai Huang wrote:
> TDX reports a list of "Convertible Memory Regions" (CMRs) to indicate all
> memory regions that can possibly be used by the TDX module, but they are
> not automatically usable to the TDX module. As a step of initializing
> the TDX module, the kernel needs to choose a list of memory regions (out
> of the convertible memory regions) that the TDX module can use and pass
> those regions to the TDX module. Once this is done, those "TDX-usable"
> memory regions are fixed during module's lifetime. No more TDX-usable
> memory can be added to the TDX module after that.
>
> The initial support of TDX guests will only allocate TDX guest memory
> from the global page allocator. To keep things simple, this initial
> implementation simply guarantees all pages in the page allocator are TDX
> memory. To achieve this, use all system memory in the core-mm at the
> time of initializing the TDX module as TDX memory, and in the meantime,
> refuse to add any non-TDX-memory in the memory hotplug.
>
> Specifically, walk through all memory regions managed by memblock and
> add them to a global list of "TDX-usable" memory regions, which is a
> fixed list after the module initialization (or empty if initialization
> fails). To reject non-TDX-memory in memory hotplug, add an additional
> check in arch_add_memory() to check whether the new region is covered by
> any region in the "TDX-usable" memory region list.
>
> Note this requires all memory regions in memblock are TDX convertible
> memory when initializing the TDX module. This is true in practice if no
> new memory has been hot-added before initializing the TDX module, since
> in practice all boot-time present DIMMs are TDX convertible memory. If
> any new memory has been hot-added, then initializing the TDX module will
> fail because that memory region is not covered by any CMR.
>
> This can be enhanced in the future, e.g. by allowing adding non-TDX
> memory to a separate NUMA node. In this case, the "TDX-capable" nodes
> and the "non-TDX-capable" nodes can co-exist, but the kernel/userspace
> needs to guarantee memory pages for TDX guests are always allocated from
> the "TDX-capable" nodes.
>
> Note TDX assumes convertible memory is always physically present during
> machine's runtime. A non-buggy BIOS should never support hot-removal of
> any convertible memory. This implementation doesn't handle ACPI memory
> removal but depends on the BIOS to behave correctly.
>
> Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> ---
>
> v6 -> v7:
> - Changed to use all system memory in memblock at the time of
> initializing the TDX module as TDX memory
> - Added memory hotplug support
>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/tdx.h | 3 +
> arch/x86/mm/init_64.c | 10 ++
> arch/x86/virt/vmx/tdx/tdx.c | 183 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 197 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index dd333b46fafb..b36129183035 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1959,6 +1959,7 @@ config INTEL_TDX_HOST
> depends on X86_64
> depends on KVM_INTEL
> depends on X86_X2APIC
> + select ARCH_KEEP_MEMBLOCK
> help
> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. This option enables necessary TDX
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index d688228f3151..71169ecefabf 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -111,9 +111,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> #ifdef CONFIG_INTEL_TDX_HOST
> bool platform_tdx_enabled(void);
> int tdx_enable(void);
> +bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn);
> #else /* !CONFIG_INTEL_TDX_HOST */
> static inline bool platform_tdx_enabled(void) { return false; }
> static inline int tdx_enable(void) { return -ENODEV; }
> +static inline bool tdx_cc_memory_compatible(unsigned long start_pfn,
> + unsigned long end_pfn) { return true; }
> #endif /* CONFIG_INTEL_TDX_HOST */
>
> #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f040c6e5d13..900341333d7e 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -55,6 +55,7 @@
> #include <asm/uv/uv.h>
> #include <asm/setup.h>
> #include <asm/ftrace.h>
> +#include <asm/tdx.h>
>
> #include "mm_internal.h"
>
> @@ -968,6 +969,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
> unsigned long start_pfn = start >> PAGE_SHIFT;
> unsigned long nr_pages = size >> PAGE_SHIFT;
>
> + /*
> + * For now if TDX is enabled, all pages in the page allocator
> + * must be TDX memory, which is a fixed set of memory regions
> + * that are passed to the TDX module. Reject the new region
> + * if it is not TDX memory to guarantee above is true.
> + */
> + if (!tdx_cc_memory_compatible(start_pfn, start_pfn + nr_pages))
> + return -EINVAL;
arch_add_memory() does not add memory to the page allocator. For
example, memremap_pages() uses arch_add_memory() and explicitly does not
release the memory to the page allocator. This check belongs in
add_memory_resource() to prevent new memory that is not TDX memory from
being onlined. Hopefully there is also an option to disable TDX from the
kernel boot command line to recover memory-hotplug without needing to
boot into the BIOS to toggle TDX.
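
To make the distinction concrete, here is a rough userspace model of what I mean. It is not kernel code and not the code from this patch. The struct, the region values, and the model_*() helpers are all made up for illustration, and treating an empty list as "TDX not initialized, allow everything" is my assumption based on the changelog's "(or empty if initialization fails)". The point is only that the coverage check gates the path that can end up in the page allocator, while the memremap_pages()-style path is left alone.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the "TDX-usable" region list. */
struct tdx_region {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

/* Made-up regions, as if recorded when the TDX module was initialized. */
static const struct tdx_region tdx_regions[] = {
	{ 0x000000, 0x080000 },
	{ 0x100000, 0x180000 },
};
#define NR_TDX_REGIONS (sizeof(tdx_regions) / sizeof(tdx_regions[0]))

/*
 * Return true if [start_pfn, end_pfn) is fully covered by a single
 * region in the list. An empty list is treated as "TDX was never
 * initialized", so nothing is rejected (assumption, see above).
 */
static bool range_is_tdx_memory(const struct tdx_region *regions, size_t nr,
				unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	if (nr == 0)
		return true;

	for (i = 0; i < nr; i++) {
		if (start_pfn >= regions[i].start_pfn &&
		    end_pfn <= regions[i].end_pfn)
			return true;
	}
	return false;
}

/*
 * Model of a memremap_pages()-style caller: the range is mapped but never
 * released to the page allocator, so no TDX check is applied here.
 */
static int model_memremap_pages(unsigned long start_pfn, unsigned long nr_pages)
{
	(void)start_pfn;
	(void)nr_pages;
	return 0;
}

/*
 * Model of an add_memory_resource()-style caller: the range may later be
 * onlined into the page allocator, so this is where it gets rejected.
 */
static int model_add_memory_resource(unsigned long start_pfn,
				     unsigned long nr_pages)
{
	if (!range_is_tdx_memory(tdx_regions, NR_TDX_REGIONS,
				 start_pfn, start_pfn + nr_pages))
		return -EINVAL;
	return 0;
}
```

With the made-up regions above, a hot-added range at PFN 0x200000 is refused by model_add_memory_resource() but still accepted by model_memremap_pages(), which is the behavior I would expect from the real paths.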