Re: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property
From: Sean Christopherson
Date: Tue Nov 22 2022 - 13:59:19 EST
On Mon, Nov 21, 2022, David Woodhouse wrote:
> On Mon, 2022-11-21 at 19:11 +0000, Sean Christopherson wrote:
> > On Mon, Nov 21, 2022, David Woodhouse wrote:
> > > I won't fight for it, but I quite liked the idea that each user of a
> > > GPC would know how much space *it* is going to access, and provide that
> > > length as a required parameter. I do note you've added a WARN_ON to one
> > > such user, and that's great — but overall, this patch makes that
> > > checking *optional* instead of mandatory.
> >
> > I honestly don't see a meaningful difference in this case. The only practical
> > difference is that shoving @len into the cache makes the check a one-time thing.
> > The "mandatory" check at use time still relies on a human to not make a mistake.
> > If the check were derived from the actual access, a la get_user(), then I would
> > feel differently.
> >
> > Case in point, the mandatory check didn't prevent KVM from screwing up bounds
> > checking in kvm_xen_schedop_poll(). The PAGE_SIZE passed in for @len is in no
> > way tied to actual access that's being performed, the code is simply regurgitating
> > the size of the cache.
>
> True, but that's a different class of bug, and the human needs to make
> a more *egregious* mistake.
>
> If the function itself writes outside the size that *it* thinks *it's*
> going to write, right there and then in that function, that's utterly
> hosed (and the SCHEDOP_poll thing was indeed so hosed).
Yes, such mistakes are more egregious in the sense they are harder to find and
have more severe consequences, but I don't think the mistakes are necessarily
harder to make. Bugs in simple usage patterns are easy to spot, but at the same
time they're also less likely to be buggy because they're simpler.
> The mandatory check *did* save us from configuring a 32-bit runstate
> area at the end of a page, then *writing* to it in 64-bit mode (where
> it's larger) and running off the end of the page.
Only because the length/capacity wasn't immutable, i.e. that particilar bug couldn't
have been introduced in the first place if kvm_gpc_activate() was the only "public"
API that allowed "changing" the length.
That's really what I dislike. I have no objection to adding a sanity check, what
I think is broken and dangerous is allowing a gpc->gpa to effectively become valid
by refreshing with a smaller length.
The gfn_to_hva_cache APIs have the same problem, but they get away with it because
they don't support concurrent usage and don't have to deal with invalidation events.
Lastly, if we keep "length" then we also need to keep "gpa", otherwise the resulting
API is all kinds of funky.
E.g. I'd be totally ok with something like this that would allow users to opt-in
to sanity checking their usage.
int __kvm_gpc_lock(struct gfn_to_pfn_cache *gpc)
{
int r;
read_lock_irqsave(&gpc->lock, gpc->flags);
while (kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, gpc->flags);
r = kvm_gpc_refresh(gpc);
if (r)
return r;
read_lock_irqsave(&gpc->lock, gpc->flags);
}
return 0;
}
int kvm_gpc_lock(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
{
if (WARN_ON_ONCE(gpa < gpc->gpa || (gpa + len > PAGE_SIZE) ||
((gpa & PAGE_MASK) != (gpc->gpa & PAGE_MASK)))
return -EINVAL;
return __kvm_gpc_lock(gpc);
}