Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Nov 16 2022 - 18:43:11 EST


On Wed, Nov 16, 2022 at 04:28:11AM -0600, Kalra, Ashish wrote:
> On 11/15/2022 11:19 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Nov 15, 2022 at 04:14:42PM +0100, Vlastimil Babka wrote:
> > > Cc'ing memory failure folks, the beginning of this subthread is here:
> > >
> > > https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@amd.com/
> > >
> > > On 11/15/22 00:36, Kalra, Ashish wrote:
> > > > Hello Boris,
> > > >
> > > > On 11/2/2022 6:22 AM, Borislav Petkov wrote:
> > > > > On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
> > > > > >       if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> > > > > >              do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> > > > > >              return RMP_PF_RETRY;
> > > > >
> > > > > Does this issue some halfway understandable error message why the
> > > > > process got killed?
> > > > >
> > > > > > Will look at adding our own recovery function for the same, but that will
> > > > > > again mark the pages as poisoned, right ?
> > > > >
> > > > > Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
> > > > > Semantically, it'll be handled the same way, ofc.
> > > >
> > > > Added a new PG_offlimits flag and a simple corresponding handler for it.
> > >
> > > One thing is, there's not enough page flags to be adding more (except
> > > aliases for existing) for cases that can avoid it, but as Boris says, if
> > > using alias to PG_hwpoison it depends what will become confused with the
> > > actual hwpoison.
> >
> > I agree with this. Just defining PG_offlimits as an alias of PG_hwpoison
> > could break current hwpoison workloads. So if you do decide to go
> > forward in this direction, you will need some indicator to
> > distinguish the new kind of leaked pages from hwpoisoned pages.
> >
> > I don't remember the exact thread, but I have seen a similar suggestion
> > to use memory_failure() to make pages inaccessible in a non-memory-error
> > use case. I feel that it should be possible to generalize
> > memory_failure() into general-purpose page offlining (by renaming it with
>
> But doesn't memory_failure() also mark the pages as PG_hwpoison, so that
> using it for these leaked pages will again cause confusion with actual
> hwpoison?

Yes, so this approach would need changes to the memory_failure() code, such
as renaming PG_hwpoison to something more generic (although some candidate
names like PageOffline and PageIsolated are already used) and/or somehow
exposing "which kind of leaked page" information.

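To make the idea a bit more concrete, here is a rough userspace model (not
kernel code) of one generic "off-limits" bit plus a separate reason field
that tells hwpoisoned pages apart from leaked ones. All names in it
(model_page, offline_reason, model_page_offline(), ...) are hypothetical and
exist only for this illustration; where such a reason would actually be
stored in the kernel is an open question.

```c
/* Userspace model only: none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "which kind" indicator, kept outside the page flags. */
enum offline_reason {
	OFFLINE_NONE = 0,
	OFFLINE_HWPOISON,	/* real memory error */
	OFFLINE_LEAKED,		/* page that could not be returned to the host */
};

struct model_page {
	bool offlimits;			/* stands in for one generic page flag */
	enum offline_reason reason;	/* why the page was taken offline */
};

/*
 * Mark a page off-limits for the given reason. Returns false if the page
 * was already off-limits, in which case the first reason is kept.
 */
static bool model_page_offline(struct model_page *page,
			       enum offline_reason reason)
{
	assert(page != NULL);
	assert(reason != OFFLINE_NONE);

	if (page->offlimits)
		return false;

	page->offlimits = true;
	page->reason = reason;
	return true;
}

/* Existing hwpoison users would only match genuine memory errors. */
static bool model_page_is_hwpoison(const struct model_page *page)
{
	return page->offlimits && page->reason == OFFLINE_HWPOISON;
}

/* The new users would only match leaked pages. */
static bool model_page_is_leaked(const struct model_page *page)
{
	return page->offlimits && page->reason == OFFLINE_LEAKED;
}
```

With something along these lines, existing hwpoison checks would keep their
current meaning, and only the shared "make the page inaccessible" step would
be common to both cases.
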
Thanks,
Naoya Horiguchi