Re: [PATCH v7 10/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory

From: Huang, Kai
Date: Wed Nov 23 2022 - 21:27:38 EST


On Wed, 2022-11-23 at 17:22 -0800, Hansen, Dave wrote:
> On 11/23/22 17:04, Huang, Kai wrote:
> > On Tue, 2022-11-22 at 16:21 -0800, Dave Hansen wrote:
> > > > +struct tdx_memblock {
> > > > + struct list_head list;
> > > > + unsigned long start_pfn;
> > > > + unsigned long end_pfn;
> > > > + int nid;
> > > > +};
> > >
> > > Why does the nid matter?
> >
> > It is used to find the node for the PAMT allocation for a given TDMR.
>
> ... which is in this patch?
>
> You can't just plop unused and unmentioned nuggets in the code. Remove
> it until it is needed.

OK. I'll move it to the PAMT allocation patch.
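
The struct in this patch would then shrink to the below ('nid' would come back
in the PAMT allocation patch):

struct tdx_memblock {
        struct list_head list;
        unsigned long start_pfn;
        unsigned long end_pfn;
};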

>
>
> > > > +/* Check whether the given pfn range is covered by any CMR or not. */
> > > > +static bool pfn_range_covered_by_cmr(unsigned long start_pfn,
> > > > + unsigned long end_pfn)
> > > > +{
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < tdx_cmr_num; i++) {
> > > > + struct cmr_info *cmr = &tdx_cmr_array[i];
> > > > + unsigned long cmr_start_pfn;
> > > > + unsigned long cmr_end_pfn;
> > > > +
> > > > + cmr_start_pfn = cmr->base >> PAGE_SHIFT;
> > > > + cmr_end_pfn = (cmr->base + cmr->size) >> PAGE_SHIFT;
> > > > +
> > > > + if (start_pfn >= cmr_start_pfn && end_pfn <= cmr_end_pfn)
> > > > + return true;
> > > > + }
> > >
> > > What if the pfn range overlaps two CMRs? It will never pass any
> > > individual overlap test and will return false.
> >
> > We can only return true if the two CMRs are contiguous.
> >
> > I cannot think of a reason why a reasonable BIOS would generate contiguous
> > CMRs.
>
> Because it can?
>
> We don't just try and randomly assign what we think is reasonable or
> not. First and foremost, we need to ask whether the configuration in
> question is allowed by the spec.
>
> Would it be a *valid* thing to have two adjacent CMRs? Does the TDX
> module spec disallow it?

No, the TDX module spec doesn't disallow it, IIUC. The spec only says CMRs
don't overlap.

>
> > Perhaps one reason is two contiguous NUMA nodes? For this case, memblock
> > has made sure no memory region could cross NUMA nodes, so the start_pfn/end_pfn
> > here should always be within one node. Perhaps we can add a comment for this
> > case?
>
> <cough> numa=off <cough>
>
> > Anyway I am not sure whether it is worth considering the "contiguous CMRs" case.
>
> I am sure. You need to consider it.

OK.

Also, as mentioned in another reply to the patch "Get information about TDX
module and TDX-capable memory", we can depend on TDH.SYS.CONFIG to return
failure and don't necessarily need to sanity-check that all memory regions are
CMR memory ourselves. That way we can simply remove the above sanity check
here.

What do you think?
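
If we do keep the check, one way to handle a range that spans multiple
contiguous CMRs could be something like the below (untested sketch; it assumes
tdx_cmr_array is sorted by base address and that CMRs don't overlap):

static bool pfn_range_covered_by_cmrs(unsigned long start_pfn,
                                      unsigned long end_pfn)
{
        int i;

        for (i = 0; i < tdx_cmr_num; i++) {
                struct cmr_info *cmr = &tdx_cmr_array[i];
                unsigned long cmr_start_pfn = cmr->base >> PAGE_SHIFT;
                unsigned long cmr_end_pfn = (cmr->base + cmr->size) >> PAGE_SHIFT;

                /* Skip CMRs entirely below the (remaining) range. */
                if (cmr_end_pfn <= start_pfn)
                        continue;

                /* A hole below this CMR means the range isn't covered. */
                if (start_pfn < cmr_start_pfn)
                        return false;

                /* This CMR covers the rest of the range. */
                if (end_pfn <= cmr_end_pfn)
                        return true;

                /* Covered up to the end of this CMR; check the next one. */
                start_pfn = cmr_end_pfn;
        }

        return false;
}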

>
> > > > + * and don't overlap.
> > > > + */
> > > > +static int add_tdx_memblock(unsigned long start_pfn, unsigned long end_pfn,
> > > > + int nid)
> > > > +{
> > > > + struct tdx_memblock *tmb;
> > > > +
> > > > + tmb = kmalloc(sizeof(*tmb), GFP_KERNEL);
> > > > + if (!tmb)
> > > > + return -ENOMEM;
> > > > +
> > > > + INIT_LIST_HEAD(&tmb->list);
> > > > + tmb->start_pfn = start_pfn;
> > > > + tmb->end_pfn = end_pfn;
> > > > + tmb->nid = nid;
> > > > +
> > > > + list_add_tail(&tmb->list, &tdx_memlist);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void free_tdx_memory(void)
> > >
> > > This is named a bit too generically. How about free_tdx_memlist() or
> > > something?
> >
> > Will use free_tdx_memlist(). Do you want to also change build_tdx_memory() to
> > build_tdx_memlist()?
>
> Does it build a memlist?

Yes.


[...]

>
> I actually wasn't asking about the for_each_mem_pfn_range() use.
>
> > And here, before skipping the first 1MB, we add the comment below:
> >
> > /*
> > * The first 1MB is not reported as TDX covertible memory.
> > * Although the first 1MB is always reserved and won't end up
> > * to the page allocator, it is still in memblock's memory
> > * regions. Skip them manually to exclude them as TDX memory.
> > */
>
> That looks OK, with the spelling fixed.

Yes "covertible" -> "convertible".


[...]

> > > > out:
> > > > + /*
> > > > + * Memory hotplug checks the hot-added memory region against the
> > > > + * @tdx_memlist to see if the region is TDX memory.
> > > > + *
> > > > + * Do put_online_mems() here to make sure any modification to
> > > > + * @tdx_memlist is done while holding the memory hotplug read
> > > > + * lock, so that the memory hotplug path can just check the
> > > > + * @tdx_memlist w/o holding the @tdx_module_lock which may cause
> > > > + * deadlock.
> > > > + */
> > >
> > > I'm honestly not following any of that.
> >
> > How about:
> >
> > /*
> > * Make sure tdx_cc_memory_compatible() either sees a fixed set of
> > * memory regions in @tdx_memlist, or an empty list.
> > */
>
> That's a comment for the lock side, not the unlock side. It should be:
>
> /*
> * @tdx_memlist is written here and read at memory hotplug time.
> * Lock out memory hotplug code while building it.
> */

Thanks.
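
So with that comment, the relevant part of the init path would read roughly
like the below sketch, with get_online_mems() as the counterpart taken before
building the list:

        get_online_mems();

        ret = build_tdx_memlist();
        if (ret)
                goto out;

        /* ... rest of module initialization ... */

out:
        /*
         * @tdx_memlist is written here and read at memory hotplug time.
         * Lock out memory hotplug code while building it.
         */
        put_online_mems();
        return ret;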

>
> > > > + put_online_mems();
> > > > return ret;
> > > > }
> > > >
> > > > @@ -485,3 +645,26 @@ int tdx_enable(void)
> > > > return ret;
> > > > }
> > > > EXPORT_SYMBOL_GPL(tdx_enable);
> > > > +
> > > > +/*
> > > > + * Check whether the given range is TDX memory. Must be called between
> > > > + * mem_hotplug_begin()/mem_hotplug_done().
> > > > + */
> > > > +bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn)
> > > > +{
> > > > + struct tdx_memblock *tmb;
> > > > +
> > > > + /* Empty list means TDX isn't enabled successfully */
> > > > + if (list_empty(&tdx_memlist))
> > > > + return true;
> > > > +
> > > > + list_for_each_entry(tmb, &tdx_memlist, list) {
> > > > + /*
> > > > + * The new range is TDX memory if it is fully covered
> > > > + * by any TDX memory block.
> > > > + */
> > > > + if (start_pfn >= tmb->start_pfn && end_pfn <= tmb->end_pfn)
> > > > + return true;
> > >
> > > Same bug. What if the start/end_pfn range is covered by more than one
> > > tdx_memblock?
> >
> > We may want to return true if tdx_memblocks are contiguous.
> >
> > However I don't think this will happen?
> >
> > tdx_memblock comes from memblock, and when two contiguous memory regions are
> > kept separate in memblock, they must have different nodes or different flags
> > (otherwise memblock would have merged them).
> >
> > My understanding is the hot-added memory region here cannot cross NUMA nodes,
> > nor have different flags, correct?
>
> I'm not sure what flags are in this context.
>

The flags in 'struct memblock_region':

enum memblock_flags {
        MEMBLOCK_NONE           = 0x0,  /* No special request */
        MEMBLOCK_HOTPLUG        = 0x1,  /* hotpluggable region */
        MEMBLOCK_MIRROR         = 0x2,  /* mirrored region */
        MEMBLOCK_NOMAP          = 0x4,  /* don't add to kernel direct mapping */
        MEMBLOCK_DRIVER_MANAGED = 0x8,  /* always detected via a driver */
};

/**
 * struct memblock_region - represents a memory region
 * @base: base address of the region
 * @size: size of the region
 * @flags: memory region attributes
 * @nid: NUMA node id
 */
struct memblock_region {
        phys_addr_t base;
        phys_addr_t size;
        enum memblock_flags flags;
#ifdef CONFIG_NUMA
        int nid;
#endif
};
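
And for completeness, if we did want tdx_cc_memory_compatible() to handle a
range that spans contiguous tdx_memblocks anyway, the same walking pattern as
the CMR check could work (untested sketch; it assumes @tdx_memlist stays sorted
by address, which building it from memblock in order should give us):

bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn)
{
        struct tdx_memblock *tmb;

        /* Empty list means TDX isn't enabled successfully */
        if (list_empty(&tdx_memlist))
                return true;

        list_for_each_entry(tmb, &tdx_memlist, list) {
                /* Skip blocks entirely below the (remaining) range. */
                if (tmb->end_pfn <= start_pfn)
                        continue;

                /* A hole below this block means the range isn't covered. */
                if (start_pfn < tmb->start_pfn)
                        return false;

                /* This block covers the rest of the range. */
                if (end_pfn <= tmb->end_pfn)
                        return true;

                /* Covered up to here; check the next (contiguous) block. */
                start_pfn = tmb->end_pfn;
        }

        return false;
}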