Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
From: Sean Christopherson
Date: Tue Nov 22 2022 - 15:52:54 EST
On Tue, Nov 22, 2022, Li, Xin3 wrote:
> > > > > > > And yes, the current code appears to suffer the same defect.
> > >
> > > That defect isn't going to be fixed simply by changing how KVM
> > > forwards NMIs though. IIUC, _everything_ between VM-Exit and the
> > > invocation of the NMI handler needs to be noinstr. On VM-Exit due to
> > > NMI, NMIs are blocked. If a #BP/#DB/#PF occurs before KVM gets to
> > > kvm_x86_handle_exit_irqoff(), the subsequent IRET will unblock NMIs
> > > before the original NMI is serviced, i.e. a second NMI could come in
> > > at anytime regardless of how KVM forwards the NMI to the kernel.
> > >
> > > Is there any way to solve this without tagging everything noinstr?
> > > There is a metric shit ton of code between VM-Exit and the handling of
> > > NMIs, and much of that code is common helpers. It might be possible
> > > to hoist NMI handler much earlier, though we'd need to do a super
> > > thorough audit to ensure all necessary host state is restored.
> >
> > As NMI is the only vector with this potential issue, it sounds like a good
> > idea to promote only its handling.
> >
>
> Hi Peter/Sean,
>
> I prefer to move _everything_ between VM-Exit and the invocation of the NMI
> handler into the noinstr section in the next patch set, what do you think?
That's likely going to be beyond painful and will have a _lot_ of collateral
damage in the sense that other paths will end up calling noinstr functions just
because of VMX. E.g. hw_breakpoint_restore(), fpu_sync_guest_vmexit_xfd_state(),
kvm_get_apic_mode(), multiple static calls in KVM... the list goes on and on and on.
The ongoing maintenance for that would also be quite painful.
Actually, SVM already enables NMIs far earlier, which means the probability of
breaking something by moving NMI handling earlier is lower. Not zero, as I don't
trust that SVM gets all the corner cases right, but definitely low.
E.g. this should be doable
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cea8c07f5229..2fec93f38960 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7249,6 +7249,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(vmx->exit_reason.failed_vmentry))
return EXIT_FASTPATH_NONE;
+ <handle NMIs here>
+
vmx->loaded_vmcs->launched = 1;
vmx_recover_nmi_blocking(vmx);
though we'd likely want a fair bit of refactoring so that all of vmx_vcpu_run() and
svm_vcpu_run() don't need to be noinstr.
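For illustration only, a rough sketch of what the "<handle NMIs here>" placeholder
might expand to; the helper names (vmx_do_nmi_irqoff(), vmx_get_intr_info()) and
the exact guard condition are assumptions for this sketch, not code from this
patch set:

```c
	/*
	 * Hypothetical sketch: if the VM-Exit was due to an NMI, invoke
	 * the host NMI handler right away, while hardware still blocks
	 * further NMIs, so that no instrumentable code (and no #BP/#DB/#PF
	 * plus IRET) can unblock NMIs before the original one is serviced.
	 * vmx_do_nmi_irqoff() is an assumed noinstr-safe helper that
	 * vectors to the kernel's NMI entry point.
	 */
	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
	    is_nmi(vmx_get_intr_info(vcpu)))
		vmx_do_nmi_irqoff();
```

Everything reachable from that snippet would itself need to be noinstr, which is
why the refactoring mentioned above matters.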
Another wart that needs to be addressed is trace_kvm_exit(). IIRC, tracepoints
must be outside of noinstr, though maybe I'm misremembering that. And conversely,
SVM doesn't trace exits that are handled in the fastpath. Best option is probably
to move VMX's trace_kvm_exit() call to vmx_handle_exit(), and then figure out a
common way to trace exits that are handled in the fastpath (which can reside outside
of the noinstr section).