Re: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error
From: Dave Hansen
Date: Wed Nov 23 2022 - 13:19:10 EST
On 11/23/22 09:37, Sean Christopherson wrote:
> On Wed, Nov 23, 2022, Dave Hansen wrote:
>> There's no way we can guarantee _that_. For one, the PAMT* allocations
>> can always fail. I guess we could ask sysadmins to fire up a guest to
>> "prime" things, but that seems a little silly. Maybe that would work as
>> the initial implementation that we merge, but I suspect our users will
>> demand more determinism, maybe a boot or module parameter.
> Oh, you mean all of TDX initialization? I thought "initialization" here meant just
> doing tdx_enable().
Yes, but the first call to tdx_enable() does TDH_SYS_INIT and all the
subsequent work to get the module going.
> Yeah, that's not going to be a viable option. Aside from lacking determinism,
> it would be all too easy to end up on a system with fragmented memory that can't
> allocate the PAMTs post-boot.
For now, the post-boot runtime PAMT allocations are the one and only way
that TDX can be initialized. I pushed for it to be done this way.
Here's why:
Doing tdx_enable() is relatively slow and it eats up a non-zero amount
of physically contiguous RAM for metadata (~1/256th or ~0.4% of RAM).
Systems that support TDX but will never run TDX guests should not pay
that cost.
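To put a rough number on that cost, here is a minimal sketch of the
arithmetic. It assumes only the ~1/256th ratio quoted above; the helper
name and the divisor macro are hypothetical and are not taken from the
patch series or the TDX module spec. The real PAMT size depends on how
the TDMRs end up being laid out.

```c
#include <stdint.h>

/* Assumption: metadata costs roughly 1/256th of RAM, per the estimate above. */
#define PAMT_RATIO_DIVISOR 256ULL

/*
 * Hypothetical helper: estimated bytes of physically contiguous
 * metadata needed to cover 'ram_bytes' of system memory.
 */
uint64_t pamt_estimate_bytes(uint64_t ram_bytes)
{
	return ram_bytes / PAMT_RATIO_DIVISOR;
}
```

By this estimate a 256 GiB system gives up about 1 GiB and a 1 TiB
system about 4 GiB (1/256 = 0.39%), which is a lot to ask of a machine
that never runs a TDX guest.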
That means that we either make folks opt-in at boot-time or we try to
make a best effort at runtime to do the metadata allocations.
From my perspective, the best-effort stuff is absolutely needed. Users
are going to forget the command-line opt in and there's no harm in
_trying_ the big allocations even if they fail.
Second, in reality, the "real" systems that can run TDX guests are
probably not going to sit around fragmenting memory for a month before
they run their first guest. They're going to run one shortly after they
boot when memory isn't fragmented and the best-effort allocation will
work really well.
Third, if anyone *REALLY* cared to make it reliable *and* wanted to sit
around fragmenting memory for a month, they could just start a TDX guest
and kill it to get TDX initialized. This isn't ideal. But, to me, it
beats defining some new, separate ABI (or boot/module option) to do it.
So, let's have those discussions. Long-term, what *is* the most
reliable way to get the TDX module loaded with 100% determinism? What
new ABI or interfaces are needed? Also, is that 100% determinism
required the moment this series is merged? Or, can we work up to it?
I think it can wait until this particular series is farther along.