Re: [PATCH v7 04/20] x86/virt/tdx: Add skeleton to initialize TDX on demand
From: Huang, Kai
Date: Wed Nov 23 2022 - 05:33:18 EST
On Tue, 2022-11-22 at 10:05 -0800, Dave Hansen wrote:
> On 11/20/22 16:26, Kai Huang wrote:
> > 2) It is more flexible to support TDX module runtime updating in the
> > future (after updating the TDX module, it needs to be initialized
> > again).
>
> I hate this generic blabber about "more flexible". There's a *REASON*
> it's more flexible, so let's talk about the reasons, please.
>
> It's really something like this, right?
>
> The TDX module design allows it to be updated while the system
> is running. The update procedure shares quite a few steps with
> this "on demand" loading mechanism. The hope is that much of
> this "on demand" mechanism can be shared with a future "update"
> mechanism. A boot-time TDX module implementation would not be
> able to share much code with the update mechanism.
Yes. Thanks.
>
>
> > 3) It avoids having to do a "temporary" solution to handle VMXON in the
> > core (non-KVM) kernel for now. This is because SEAMCALL requires CPU
> > being in VMX operation (VMXON is done), but currently only KVM handles
> > VMXON. Adding VMXON support to the core kernel isn't trivial. More
> > importantly, from long-term a reference-based approach is likely needed
> > in the core kernel as more kernel components are likely needed to
> > support TDX as well. Allow KVM to initialize the TDX module avoids
> > having to handle VMXON during kernel boot for now.
>
> There are a lot of words in there.
>
> 3) Loading the TDX module requires VMX to be enabled. Currently, only
> the kernel KVM code mucks with VMX enabling. If the TDX module were
> to be initialized separately from KVM (like at boot), the boot code
> would need to be taught how to muck with VMX enabling and KVM would
> need to be taught how to cope with that. Making KVM itself
> responsible for TDX initialization lets the rest of the kernel stay
> blissfully unaware of VMX.
Thanks.
>
> > Add a placeholder tdx_enable() to detect and initialize the TDX module
> > on demand, with a state machine protected by mutex to support concurrent
> > calls from multiple callers.
>
> As opposed to concurrent calls from one caller? ;)
How about below?
"
Add a placeholder tdx_enable() to initialize the TDX module on demand. So far
KVM will be the only caller, but other kernel components will need to use it too
in the future. Just use a mutex protected state machine to make sure the module
initialization can only be done once.
"
>
> > The TDX module will be initialized in multi-steps defined by the TDX
> > module:
> >
> > 1) Global initialization;
> > 2) Logical-CPU scope initialization;
> > 3) Enumerate the TDX module capabilities and platform configuration;
> > 4) Configure the TDX module about TDX usable memory ranges and global
> > KeyID information;
> > 5) Package-scope configuration for the global KeyID;
> > 6) Initialize usable memory ranges based on 4).
>
> This would actually be a nice place to call out the SEAMCALL names and
> mention that each of these steps involves a set of SEAMCALLs.
How about below?
"
The TDX module will be initialized in multi-steps defined by the TDX module and
each of those steps involves a specific SEAMCALL:
1) Global initialization using TDH.SYS.INIT.
2) Logical-CPU scope initialization using TDH.SYS.LP.INIT.
3) Enumerate the TDX module capabilities and TDX-capable memory information
using TDH.SYS.INFO.
4) Configure the TDX module with TDX-usable memory regions and the global
KeyID information using TDH.SYS.CONFIG.
5) Package-scope configuration for the global KeyID using TDH.SYS.KEY.CONFIG.
6) Initialize TDX-usable memory regions using TDH.SYS.TDMR.INIT.
Before step 4), the kernel needs to build a set of TDX-usable memory regions,
and construct data structures to cover those regions.
"
>
> > The TDX module can also be shut down at any time during its lifetime.
> > In case of any error during the initialization process, shut down the
> > module. It's pointless to leave the module in any intermediate state
> > during the initialization.
> >
> > Both logical CPU scope initialization and shutting down the TDX module
> > require calling SEAMCALL on all boot-time present CPUs. For simplicity
> > just temporarily disable CPU hotplug during the module initialization.
>
> You might want to more precisely define "boot-time present CPUs". The
> boot of *what*?
How about use "BIOS-enabled CPUs" instead? If OK I'll use it consistently across
this series.
>
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 8d943bdc8335..28c187b8726f 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -10,15 +10,34 @@
> > #include <linux/types.h>
> > #include <linux/init.h>
> > #include <linux/printk.h>
> > +#include <linux/mutex.h>
> > +#include <linux/cpu.h>
> > +#include <linux/cpumask.h>
> > #include <asm/msr-index.h>
> > #include <asm/msr.h>
> > #include <asm/apic.h>
> > #include <asm/tdx.h>
> > #include "tdx.h"
> >
> > +/* TDX module status during initialization */
> > +enum tdx_module_status_t {
> > + /* TDX module hasn't been detected and initialized */
> > + TDX_MODULE_UNKNOWN,
> > + /* TDX module is not loaded */
> > + TDX_MODULE_NONE,
> > + /* TDX module is initialized */
> > + TDX_MODULE_INITIALIZED,
> > + /* TDX module is shut down due to initialization error */
> > + TDX_MODULE_SHUTDOWN,
> > +};
>
> Are these part of the ABI or just a purely OS-side construct?
Purely OS-side construct. I'll explicitly call out in the comment.
>
> > static u32 tdx_keyid_start __ro_after_init;
> > static u32 tdx_keyid_num __ro_after_init;
> >
> > +static enum tdx_module_status_t tdx_module_status;
> > +/* Prevent concurrent attempts on TDX detection and initialization */
> > +static DEFINE_MUTEX(tdx_module_lock);
> > +
> > /*
> > * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> > * BIOS. Both initializing the TDX module and running TDX guest require
> > @@ -104,3 +123,134 @@ bool platform_tdx_enabled(void)
> > {
> > return !!tdx_keyid_num;
> > }
> > +
> > +/*
> > + * Detect and initialize the TDX module.
> > + *
> > + * Return -ENODEV when the TDX module is not loaded, 0 when it
> > + * is successfully initialized, or other error when it fails to
> > + * initialize.
> > + */
> > +static int init_tdx_module(void)
> > +{
> > + /* The TDX module hasn't been detected */
> > + return -ENODEV;
> > +}
> > +
> > +static void shutdown_tdx_module(void)
> > +{
> > + /* TODO: Shut down the TDX module */
> > +}
> > +
> > +static int __tdx_enable(void)
> > +{
> > + int ret;
> > +
> > + /*
> > + * Initializing the TDX module requires doing SEAMCALL on all
> > + * boot-time present CPUs. For simplicity temporarily disable
> > + * CPU hotplug to prevent any CPU from going offline during
> > + * the initialization.
> > + */
> > + cpus_read_lock();
> > +
> > + /*
> > + * Check whether all boot-time present CPUs are online and
> > + * return early with a message so the user can be aware.
> > + *
> > + * Note a non-buggy BIOS should never support physical (ACPI)
> > + * CPU hotplug when TDX is enabled, and all boot-time present
> > + * CPU should be enabled in MADT, so there should be no
> > + * disabled_cpus and num_processors won't change at runtime
> > + * either.
> > + */
>
> Again, there are a lot of words in that comment, but I'm not sure why
> it's here. Despite all the whinging about ACPI, doesn't it boil down to:
>
> The TDX module itself establishes its own concept of how many
> logical CPUs there are in the system when it is loaded.
>
This isn't accurate. TDX MCHECK records the total number of logical CPUs when
the BIOS enables TDX. This happens before the TDX module is loaded. In fact
the TDX module only gets this information from a secret location.
> The
> module will reject initialization attempts unless the kernel
> runs TDX initialization code on every last CPU.
>
> Ensure that the kernel is able to run code on all known logical
> CPUs.
How about:
TDX itself establishes its own concept of how many logical CPUs there
are in the system when it gets enabled by the BIOS. The module will
reject initialization attempts unless the kernel runs TDX
initialization
code on every last CPU.
Ensure that the kernel is able to run code on all known logical CPUs.
>
> and these checks are just to see if the kernel has shot itself in the
> foot and is *KNOWS* that it is currently unable to run code on some
> logical CPU?
Yes.
>
> > + if (disabled_cpus || num_online_cpus() != num_processors) {
> > + pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + ret = init_tdx_module();
> > + if (ret == -ENODEV) {
>
> Why check for -ENODEV exclusively? Is there some other error nonzero
> code that indicates success?
The idea is to print out "TDX module not loaded" to separate it from other
errors, so that the user can get a better idea when something goes wrong.
>
> > + pr_info("TDX module is not loaded.\n");
> > + tdx_module_status = TDX_MODULE_NONE;
> > + goto out;
> > + }
> > +
> > + /*
> > + * Shut down the TDX module in case of any error during the
> > + * initialization process. It's meaningless to leave the TDX
> > + * module in any middle state of the initialization process.
> > + *
> > + * Shutting down the module also requires doing SEAMCALL on all
> > + * MADT-enabled CPUs. Do it while CPU hotplug is disabled.
> > + *
> > + * Return all errors during the initialization as -EFAULT as the
> > + * module is always shut down.
> > + */
> > + if (ret) {
> > + pr_info("Failed to initialize TDX module. Shut it down.\n");
>
> "Shut it down" seems wrong here. That could be interpreted as "I have
> already shut it down". "Shutting down" seems better.
Will change to "Shutting down" if we still want to keep the shut down patch
(please see my another reply to Sean).
>
> > + shutdown_tdx_module();
> > + tdx_module_status = TDX_MODULE_SHUTDOWN;
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > +
> > + pr_info("TDX module initialized.\n");
> > + tdx_module_status = TDX_MODULE_INITIALIZED;
> > +out:
> > + cpus_read_unlock();
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * tdx_enable - Enable TDX by initializing the TDX module
> > + *
> > + * Caller to make sure all CPUs are online and in VMX operation before
> > + * calling this function. CPU hotplug is temporarily disabled internally
> > + * to prevent any cpu from going offline.
>
> "cpu" or "CPU"?
>
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * 0: The TDX module has been successfully initialized.
> > + * * -ENODEV: The TDX module is not loaded, or TDX is not supported.
> > + * * -EINVAL: The TDX module cannot be initialized due to certain
> > + * conditions are not met (i.e. when not all MADT-enabled
> > + * CPUs are not online).
> > + * * -EFAULT: Other internal fatal errors, or the TDX module is in
> > + * shutdown mode due to it failed to initialize in previous
> > + * attempts.
> > + */
>
> I honestly don't think all these error codes mean anything. They're
> plumbed nowhere and the use of -EFAULT is just plain wrong.
>
> Nobody can *DO* anything with these anyway.
>
> Just give one error code and make sure that you have pr_info()'s around
> to make it clear what went wrong. Then just do -EINVAL universally.
> Remove all the nonsense comments.
OK.
> > +int tdx_enable(void)
> > +{
> > + int ret;
> > +
> > + if (!platform_tdx_enabled())
> > + return -ENODEV;
> > +
> > + mutex_lock(&tdx_module_lock);
> > +
> > + switch (tdx_module_status) {
> > + case TDX_MODULE_UNKNOWN:
> > + ret = __tdx_enable();
> > + break;
> > + case TDX_MODULE_NONE:
> > + ret = -ENODEV;
> > + break;
>
> TDX_MODULE_NONE should probably be called TDX_MODULE_NOT_LOADED. A
> comment would also be nice:
>
> /* The BIOS did not load the module. No way to fix that. */
>
> > + case TDX_MODULE_INITIALIZED:
>
> /* Already initialized, great, tell the caller: */
Will do.
>
> > + ret = 0;
> > + break;
> > + default:
> > + WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
> > + ret = -EFAULT;
> > + break;
> > + }
>
> I don't get what that default: is for or what it has to do with
> TDX_MODULE_SHUTDOWN.
I meant we can only have 4 possible status, and the default case must be the
TDX_MODULE_SHUTDOWN state.
I think I can just remove that WARN()?
>
>
> > + mutex_unlock(&tdx_module_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_enable);
>