RE: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection

From: Li, Xin3
Date: Tue Nov 22 2022 - 08:01:19 EST


> > > > > > And yes, the current code appears to suffer the same defect.
> >
> > That defect isn't going to be fixed simply by changing how KVM
> > forwards NMIs though. IIUC, _everything_ between VM-Exit and the
> > invocation of the NMI handler needs to be noinstr. On VM-Exit due to
> > NMI, NMIs are blocked. If a #BP/#DB/#PF occurs before KVM gets to
> > kvm_x86_handle_exit_irqoff(), the subsequent IRET will unblock NMIs
> > before the original NMI is serviced, i.e. a second NMI could come in
> > at anytime regardless of how KVM forwards the NMI to the kernel.
> >
> > Is there any way to solve this without tagging everything noinstr?
> > There is a metric shit ton of code between VM-Exit and the handling of
> > NMIs, and much of that code is common helpers. It might be possible
> > to hoist NMI handler much earlier, though we'd need to do a super
> > thorough audit to ensure all necessary host state is restored.
>
> As NMI is the only vector with this potential issue, it sounds a good idea to only
> promote its handling.
>

Hi Peter/Sean,

I prefer to move _everything_ between VM-Exit and the invocation of the NMI
handler into the noinstr section in the next patch set, how do you think?

Xin

>
> > > > > With FRED, ERETS/ERETU replace IRET, and use bit 28 of the
> > > > > popped CS field to control whether to unblock NMI. If bit 28 of
> > > > > the field (above the selector) is 1, ERETS/ERETU unblocks NMIs.
> >
> > Side topic, there's a bug in the ISE docs. Section "9.4.2 NMI
> > Blocking" states that bit 16 holds the "unblock NMI" magic, which I'm
> > guessing is a holdover from an earlier revision of FRED.
>
> Good catch, the latest 3.0 draft spec changed it to bit 28, but section 9.4.2
> didn't get a proper update.
>
> >
> > As specified in Section 6.1.3 and Section 6.2.3, ERETS and ERETU
> > each unblocks NMIs
> > if bit 16 of the popped CS field is 1. The following items detail
> > how this behavior may be
> > changed in VMX non-root operation, depending on the settings of
> > certain VM- execution
> > controls:
> >
> > > > Yes, I know that. It is one of the many improvements FRED brings.
> > > > Ideally the IBT WAIT-FOR-ENDBR state also gets squirreled away in
> > > > the hardware exception frame, but that's still up in the air I
> > > > believe :/
> > > >
> > > > Anyway.. given there is interrupt priority and NMI is pretty much
> > > > on top of everything else the reinject crap *should* run NMI first.
> > > > That way NMI runs with the latch disabled and whatever other
> > > > pending
> > interrupts will run later.
> > > >
> > > > But that all is still broken because afaict the current code also
> > > > leaves noinstr -- and once you leave noinstr (or use a static_key,
> > > > static_call or anything else that
> > > > *can* change at runtime) you can't guarantee nothing.
> > >
> > > For NMI, HPA asked me to use "int $2", as it switches to the NMI IST
> > > stack to execute the NMI handler, essentially like how HW deals with
> > > a NMI in host. and I tested it with NMI watchdog, it looks working fine.
> >
> > Heh, well yeah, because that's how KVM used to handle NMIs back before
> > I reworked NMI handling to use the direct call method. Ironically,
> > that original change was done in part to try and make it _easier_ to
> > deal with FRED (back before FRED was publicly disclosed).
> >
> > If KVM reverts to INTn, the fix to route KVM=>NMI through the non-IST
> > entry can be reverted too.
> >
> > a217a6593cec ("KVM/VMX: Invoke NMI non-IST entry instead of IST entry")
> > 1a5488ef0dcf ("KVM: VMX: Invoke NMI handler via indirect call
> > instead of
> > INTn")
>
> Sure, I'm just thinking to put asm("int $2") in a new function exc_raise_nmi()
> defined in arch/x86/kernel/traps.c. Thus we will need to change it only
> whenever we have any better facility, and no KVM VMX change required.