On Tue, Nov 15, 2022 at 04:14:42PM +0100, Vlastimil Babka wrote:
Cc'ing the memory-failure folks; the beginning of this subthread is here:
https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@amd.com/
On 11/15/22 00:36, Kalra, Ashish wrote:
Hello Boris,
On 11/2/2022 6:22 AM, Borislav Petkov wrote:
On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
if (snp_lookup_rmpentry(pfn, &rmp_level)) {
	do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
	return RMP_PF_RETRY;
}
Does this issue a halfway understandable error message explaining why the
process got killed?
I will look at adding our own recovery function for this, but that will
again mark the pages as poisoned, right?
Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
Semantically, it'll be handled the same way, ofc.
Added a new PG_offlimits flag and a simple corresponding handler for it.
One thing is that there are not enough page flags left to keep adding new
ones (other than aliases of existing flags) for cases that can avoid it. But
as Boris says, if PG_hwpoison is aliased, it depends on what could be
confused with actual hwpoison.
I agree with this. Simply defining PG_offlimits as an alias of PG_hwpoison
could break current hwpoison workloads. So if you finally decide to go
forward in this direction, you had better add some indicator to
distinguish the new kind of leaked pages from hwpoisoned pages.
I don't remember the exact thread, but I have read someone making a similar
suggestion: using memory_failure() to make pages inaccessible in a
non-memory-error use case. I think it could be possible to generalize
memory_failure() into general-purpose page offlining (by renaming it to
hard_offline_page() and making memory_failure() one of its users).
Thanks,
Naoya Horiguchi