Re: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory
From: Huang, Kai
Date: Wed Nov 23 2022 - 17:54:55 EST
On Wed, 2022-11-23 at 08:44 -0800, Dave Hansen wrote:
> > On 11/23/22 03:40, Huang, Kai wrote:
> > > > On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> > > > > > That last sentence is kinda goofy. I think there's a way to distill this
> > > > > > whole thing down more effectively.
> > > > > >
> > > > > > CMRs tell the kernel which memory is TDX compatible. The kernel
> > > > > > takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs
> > > > > > let the kernel grant TDX protections to some or all of the CMR
> > > > > > areas.
> > > >
> > > > Will do.
> > > >
> > > > But it seems we should still mention "Constructing TDMRs requires information of
> > > > both the TDX module (TDSYSINFO_STRUCT) and the CMRs"? The reason is to justify
> > > > "use static to avoid having to pass them as function arguments when constructing
> > > > TDMRs" below.
> >
> > In a changelog, no. You do *NOT* use super technical language in
> > changelogs if not super necessary. Mentioning "TDSYSINFO_STRUCT" here
> > is useless. The *MOST* you would do for a good changelog is:
> >
> > The kernel takes CMRs (plus a little more metadata) and
> > constructs "TD Memory Regions" (TDMRs).
> >
> > You just need to talk about things at a high level in mostly
> > non-technical language so that folks know the structure of the code
> > below. It's not a replacement for the code, the comments, *OR* the TDX
> > module specification.
> >
> > I'm also not quite sure that this justifies the static variables anyway.
> > They could be dynamically allocated and passed around, for instance.
I see. Thanks for explaining.
> >
> > > > > > > > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
> > > > > >
> > > > > > I find it very useful to be precise when referring to code. Your code
> > > > > > says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the
> > > > > > difference?
> > > >
> > > > Here I actually didn't intend to refer to any code. In the above paragraph
> > > > (that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
> > > > explain what "information of the TDX module" actually refers to, since
> > > > TDSYSINFO_STRUCT is used in the spec.
> > > >
> > > > What's your preference?
> >
> > Kill all mentions to TDSYSINFO_STRUCT whatsoever in the changelog.
> > Write comprehensible English.
OK.
> >
> > > > > > > > having to pass them as function arguments when constructing the TDMR
> > > > > > > > array. And they are too big to be put to the stack anyway. Also, KVM
> > > > > > > > needs to use the TDSYSINFO_STRUCT to create TDX guests.
> > > > > >
> > > > > > This is also a great place to mention that the tdsysinfo_struct contains
> > > > > > a *lot* of gunk which will not be used for a bit or that may never get
> > > > > > used.
> > > >
> > > > Perhaps below?
> > > >
> > > > "Note many members in 'tdsysinfo_struct' are not used by the kernel".
> > > >
> > > > Btw, may I ask why it matters?
> >
> > Because you're adding a massive structure with all kinds of fields.
> > Those fields mostly aren't used. That could be from an error in this
> > series, or because they will be used later or because they will *never*
> > be used.
OK.
> >
> > > > > > > > + cmr = &cmr_array[0];
> > > > > > > > + /* There must be at least one valid CMR */
> > > > > > > > + if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> > > > > > > > + goto err;
> > > > > > > > +
> > > > > > > > + cmr_num = *actual_cmr_num;
> > > > > > > > + for (i = 1; i < cmr_num; i++) {
> > > > > > > > + struct cmr_info *cmr = &cmr_array[i];
> > > > > > > > + struct cmr_info *prev_cmr = NULL;
> > > > > > > > +
> > > > > > > > + /* Skip further empty CMRs */
> > > > > > > > + if (is_cmr_empty(cmr))
> > > > > > > > + break;
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * Do sanity check anyway to make sure CMRs:
> > > > > > > > + * - are 4K aligned
> > > > > > > > + * - don't overlap
> > > > > > > > + * - are in address ascending order.
> > > > > > > > + */
> > > > > > > > + if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> > > > > > > > + goto err;
> > > > > >
> > > > > > Why does cmr_array[0] get a pass on the empty and sanity checks?
> > > >
> > > > TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
> > > > valid CMR.
> > > >
> > > > And cmr_array[0] is checked before this loop.
> >
> > I think you're confusing two separate things. MCHECK ensures that there
> > is convertible memory. The CMRs that this code looks at are software
> > (TD module) defined and created structures that the OS and the module share.
I'm not sure I fully understood you, but the CMRs are generated by the BIOS,
then verified and stored by MCHECK. So the CMR structure is also meaningful to
the BIOS and MCHECK; it is not defined and created by the TDX module.
There are a couple of places in the TDX module spec which say this. Two examples
are "Table 3.1: Typical Intel TDX Module Platform-Scope Initialization Sequence"
and "13.1.1. Initialization and Configuration Flow". They both mention:
"BIOS configures Convertible Memory Regions (CMRs); MCHECK checks them and
securely stores the information."
Also, "20.8.3 CMR_INFO":
"CMR_INFO is designed to provide information about a Convertible Memory Range
(CMR), as configured by BIOS and checked and stored securely by MCHECK."
> >
> > This cmr_array[] structure is not created by MCHECK.
Right.
But all TDH.SYS.INFO does is "Retrieve Intel TDX module information and
convertible memory (CMR) information.", writing the CMRs to the buffer provided
by the kernel (cmr_array[]).
So my understanding is that the entries in cmr_array[] are just the same CMRs
that were verified by MCHECK.
> >
> > Go look at your code. Consider what will happen if cmr_array[0] is
> > empty or !is_cmr_ok(). Then consider what will happen if cmr_array[1]
> > has the same happen.
> >
> > Does that end result really justify having separate code for
> > cmr_array[0] and cmr_array[>0]?
One slight difference is that cmr_array[0] must be valid, while cmr_array[>0]
can be empty. And for cmr_array[>0] we also have an additional check against
the previous one.
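That said, both differences can be folded into a single loop. Below is a
userspace sketch of the idea only, not the patch code: the struct layout, the
helper names (cmr_empty(), cmr_aligned(), count_valid_cmrs()) and the 4K
constant are my assumptions for illustration. Tracking the end of the previous
CMR (starting from 0) covers the ordering/overlap check for every entry
including the first, and "at least one valid CMR" becomes a check on the final
count:

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct cmr_info (layout assumed). */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

#define CMR_ALIGN 4096ULL	/* assumed 4K alignment requirement */

/* Hypothetical helper: an empty CMR is assumed to have zero size. */
static bool cmr_empty(const struct cmr_info *cmr)
{
	return cmr->size == 0;
}

/* Hypothetical helper: both base and size must be 4K aligned. */
static bool cmr_aligned(const struct cmr_info *cmr)
{
	return !(cmr->base % CMR_ALIGN) && !(cmr->size % CMR_ALIGN);
}

/*
 * Return the number of leading non-empty CMRs, or -1 if any of them is
 * misaligned, overlaps the previous one, is out of address order, or if
 * the very first entry is already empty.
 */
static int count_valid_cmrs(const struct cmr_info *cmrs, int nr)
{
	uint64_t prev_end = 0;
	int i;

	for (i = 0; i < nr; i++) {
		const struct cmr_info *cmr = &cmrs[i];

		/* Stop at the first empty CMR; the rest are not inspected */
		if (cmr_empty(cmr))
			break;

		if (!cmr_aligned(cmr) || cmr->base < prev_end)
			return -1;

		prev_end = cmr->base + cmr->size;
	}

	/* There must be at least one valid CMR */
	return i ? i : -1;
}
```

Like the loop in the patch, this stops at the first empty entry and does not
verify that everything after it is empty too.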
> >
> > > > > > > > + prev_cmr = &cmr_array[i - 1];
> > > > > > > > + if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> > > > > > > > + cmr->base))
> > > > > > > > + goto err;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + /* Update the actual number of CMRs */
> > > > > > > > + *actual_cmr_num = i;
> > > > > >
> > > > > > That comment is not helpful. Yes, this is literally updating the number
> > > > > > of CMRs. Literally. That's the "what". But, the "why" is important.
> > > > > > Why is it doing this?
> > > >
> > > > When building the list of "TDX-usable" memory regions, the kernel verifies those
> > > > regions against CMRs to see whether they are truly convertible memory.
> > > >
> > > > How about adding a comment like below:
> > > >
> > > > /*
> > > > * When the kernel builds the TDX-usable memory regions, it verifies
> > > > * they are truly convertible memory by checking them against CMRs.
> > > > * Update the actual number of CMRs to skip those empty CMRs.
> > > > */
> > > >
> > > > Also, I think printing CMRs in the dmesg is helpful. Printing empty (zero) CMRs
> > > > will put meaningless log to the dmesg.
> >
> > So it's just about printing them?
> >
> > Then put a dang switch to the print function that says "print them all"
> > or not.
Yes, can do. Currently "print them all" is only done when the CMR sanity check
fails. We can unconditionally "print valid CMRs" if we don't need that check.
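For the switch, I am thinking of something along the lines of the userspace
sketch below. The function name print_cmrs(), the 'print_all' parameter, the
output format and the struct layout are all made up for illustration; the
kernel version would use pr_info() rather than fprintf():

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct cmr_info (layout assumed). */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Print CMRs to @out. With @print_all false, empty (zero-size) CMRs are
 * skipped so they don't clutter the log. Returns the number of entries
 * printed.
 */
static int print_cmrs(FILE *out, const struct cmr_info *cmrs, int nr,
		      bool print_all)
{
	int i, printed = 0;

	for (i = 0; i < nr; i++) {
		if (!cmrs[i].size && !print_all)
			continue;

		fprintf(out, "CMR[%d]: [0x%llx, 0x%llx)\n", i,
			(unsigned long long)cmrs[i].base,
			(unsigned long long)(cmrs[i].base + cmrs[i].size));
		printed++;
	}

	return printed;
}
```

With this, the caller no longer needs a trimmed count just for printing.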
> >
> > ...
> > > > > > Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now,
> > > > > > it'll print an 'actual_cmr_num=1' number of CMRs as being
> > > > > > "kernel-checked". Why? That makes zero sense.
> > > >
> > > > The loop quits when it sees an empty CMR. I think there's no need to check
> > > > further CMRs as they must be empty (TDX MCHECK verifies CMRs).
> >
> > OK, so you're going to get some more homework here. Please explain to
> > me how MCHECK and the CMR array that comes out of the TDX module are
> > related. How does the output from MCHECK get turned into the in-memory
> > cmr_array[], step by step?
> >
(Please also see my above reply)
1. BIOS generates the CMRs and passes them to MCHECK
2. MCHECK verifies CMRs and stores the "CMR table in a pre-defined location in
SEAMRR’s SEAMCFG region so it can be read later and trusted by the Intel TDX
module" (13.1.4.1 Intel TDX ISA Background: Convertible Memory Ranges (CMRs)).
3. TDH.SYS.INFO copies the CMRs to the buffer provided by the kernel
(cmr_array[]).
> > At this point, I fear that you're offering up MCHECK like it's a bag of
> > magic beans rather than really truly thinking about the cmr_array[] data
> > structure. How is it generated? How might it be broken? Who might
> > break it? If so, what should the kernel do about it?
Only a kernel bug can break cmr_array[], I think. As described in "13.1.4.1
Intel TDX ISA Background: Convertible Memory Ranges (CMRs)", MCHECK should have
guaranteed that:
- there is at least one CMR
- each CMR is page aligned
- CMRs don't overlap and are in address ascending order
The only irregularity that is still legal is that there might be empty CMRs at
the tail of cmr_array[], following one or more valid CMRs.
> >
> >
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * trim_empty_cmrs() updates the actual number of CMRs by
> > > > > > > > + * dropping all tail empty CMRs.
> > > > > > > > + */
> > > > > > > > + return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > > > > > > > +}
> > > > > >
> > > > > > Why does this both need to respect the "tdx_cmr_num = out.r9" value
> > > > > > *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
> > > > > > out.r9" value and just trim the empty ones either way? It's not like
> > > > > > there is a billion of them. It would simplify the code for sure.
> > > >
> > > > OK. Since the spec says MAX_CMRs is 32, I can use 32 instead of reading it
> > > > from R9.
> >
> > But then you still have the "trimming" code. Why not just trust "r9"
> > and then axe all the trimming code? Heck, and most of the sanity checks.
> >
> > This code could be a *lot* smaller.
As I said, the only problem is that there might be empty CMRs at the tail of
cmr_array[], following one or more valid CMRs.
But we can also do nothing here and just skip empty CMRs when comparing a
memory region against them (in the next patch).
Or, we don't even need to explicitly check memory regions against CMRs. If the
memory regions that we provide in the TDMRs don't fall into CMRs, then
TDH.SYS.CONFIG will fail. We can just depend on the SEAMCALL to do that.
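If we keep the explicit check, the "skip empty CMRs" option could look roughly
like the userspace sketch below. The function name region_in_cmrs() and the
struct layout are assumptions for illustration, not what the next patch
actually has:

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct cmr_info (layout assumed). */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) is fully covered by a single CMR.
 * Empty (zero-size) CMRs are skipped, so the array does not need to
 * be trimmed beforehand.
 */
static bool region_in_cmrs(const struct cmr_info *cmrs, int nr,
			   uint64_t start, uint64_t end)
{
	int i;

	for (i = 0; i < nr; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		/* Skip empty CMRs instead of trimming them up front */
		if (!cmrs[i].size)
			continue;

		if (start >= cmrs[i].base && end <= cmr_end)
			return true;
	}

	return false;
}
```

Note this sketch treats a region that spans two back-to-back CMRs as not
covered; whether that case needs handling is a separate question.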