Re: [PATCH v7 04/20] x86/virt/tdx: Add skeleton to initialize TDX on demand
From: Huang, Kai
Date: Wed Nov 23 2022 - 07:22:29 EST
On Wed, 2022-11-23 at 12:05 +0100, Thomas Gleixner wrote:
> Kai!
>
> On Wed, Nov 23 2022 at 00:30, Kai Huang wrote:
> > On Tue, 2022-11-22 at 21:03 +0100, Thomas Gleixner wrote:
> > > Clearly that's nowhere spelled out in the documentation, but I don't
> > > buy the 'architecturaly required' argument not at all. It's an
> > > implementation detail of the TDX module.
> >
> > I agree on hardware level there shouldn't be such requirement (not 100% sure
> > though), but I guess from kernel's perspective, "the implementation detail of
> > the TDX module" is sort of "architectural requirement"
>
> Sure, but then it needs to be clearly documented so.
>
> > -- at least Intel arch guys think so I guess.
>
> Intel "arch" guys? You might look up the meanings of "arch" in a
> dictionary. LKML is not twatter.
>
> > > Technically there is IMO ZERO requirement to do so.
> > >
> > > 1) The TDX module is global
> > >
> > > 2) Seam-root and Seam-non-root operation are strictly a LP property.
> > >
> > > The only architectural prerequisite for using Seam on a LP is that
> > > obviously the encryption/decryption mechanics have been initialized
> > > on the package to which the LP belongs.
> > >
> > > I can see why it might be complicated to add/remove an LP after
> > > initialization fact, but technically it should be possible.
> >
> > "kernel soft offline" actually isn't an issue. We can bring down a logical cpu
> > after it gets initialized and then bring it up again.
>
> That's the whole point where this discussion started: _AFTER_ it gets
> initialized.
>
> Which means that, e.g. adding "nosmt" to the kernel command line will
> make TDX fail hard because at the point where TDX is initialized the
> hyperthreads are not 'soft' online and cannot respond to anything, but
> the BIOS already accounted them.
>
> This is just wrong as we all know that "nosmt" is sadly one of the
> obvious counter measures for the never ending flood of speculation
> issues.
Agree. As said in my other replies, I'll follow up with TDX module guys on
this.
>
> > Only add/removal of physical cpu will cause problem:
>
> You wish.
>
> > TDX MCHECK verifies all boot-time present cpus to make sure they are TDX-
> > compatible before it enables TDX in hardware. MCHECK cannot run on hot-added
> > CPU, so TDX cannot support physical CPU hotplug.
>
> TDX can rightfully impose the limitation that it only executes on CPUs,
> which are known at boot/initialization time, and only utilizes "known"
> memory. That's it, but that does not enforce to prevent physical hotplug
> in general.
Adding physical CPUs should have no problem I guess, they just cannot run TDX.
TDX architecture doesn't expect they can run TDX code anyway.
But would physical removal of boot-time present CPU cause problem? TDX MCHECK
checks/records boot-time present CPUs. If a CPU is removed and then a new one
is added, then TDX still treats it is TDX-compatible, but it may actually not.
So if this happens, from functionality's point of view, it can break. I think
TDX still wants to guarantee TDX code can work correctly on "TDX recorded" CPUs.
Also, I am not sure whether there's any security issue if a malicious kernel
tries to run TDX code on such removed-then-added CPU.
This seems a TDX architecture problem rather than kernel policy issue.
>
> > We tried to get it clarified in the specification, and below is what TDX/module
> > arch guys agreed to put to the TDX module spec (just checked it's not in latest
> > public spec yet, but they said it will be in next release):
> >
> > "
> > 4.1.3.2. CPU Configuration
> >
> > During platform boot, MCHECK verifies all logical CPUs to ensure they
> > meet TDX’s
>
> That MCHECK falls into the category of security voodoo.
>
> It needs to verify _ALL_ logical CPUs to ensure that Intel did not put
> different models and steppings into a package or what?
I am guessing so.
>
> > security and certain functionality requirements, and MCHECK passes the following
> > CPU configuration information to the NP-SEAMLDR, P-SEAMLDR and the TDX Module:
> >
> > · Total number of logical processors in the platform.
>
> You surely need MCHECK for this
>
> > · Total number of installed packages in the platform.
>
> and for this...
>
> > · A table of per-package CPU family, model and stepping etc.
> > identification, as enumerated by CPUID(1).EAX.
> > The above information is static and does not change after platform boot and
> > MCHECK run.
> >
> > Note: TDX doesn’t support adding or removing CPUs from TDX security
> > perimeter, as checked my MCHECK.
>
> More security voodoo. The TDX security perimeter has nothing to do with
> adding or removing CPUs on a system. If that'd be true then TDX is a
> complete fail.
No argument here.
> > BIOS should prevent CPUs from being hot-added or hot-removed after
> > platform boots.
>
> If the BIOS does not prevent it, then TDX and the Seam module will not
> even notice unless the OS tries to invoke seamcall() on a newly plugged
> CPU.
>
> A newly added CPU and newly added memory should not have any impact on
> the TDX integrity of the already running system. If they have then
> again, TDX is broken by design.
No argument here either.
>
> > The TDX module performs additional checks of the CPU’s configuration and
> > supported features, by reading MSRs and CPUID information as described in the
> > following sections.
>
> to ensure that the MCHECK generated table is still valid at the point
> where TDX is initialized?
I think it is trying to say:
MCHECK doesn't do all the verifications. Some verfications are deferred to the
TDX module to check when it gets initialized.
>
> That said, this documentation is at least better than the existing void,
> but that does not make it technically correct.
>
> Thanks,
>
> tglx