Re: [PATCH 09/13] KVM: SVM: allow NMI window with vNMI
From: Maxim Levitsky
Date: Mon Nov 21 2022 - 08:42:21 EST
On Thu, 2022-11-17 at 18:21 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > When vNMI is enabled, the only case in which KVM will use an NMI
> > window is when a vNMI injection is pending.
> >
> > In this case, on the next IRET/RSM/STGI the injection completes and
> > a new NMI can be injected.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index cfec4c98bb589b..eaa30f8ace518d 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
> > struct vcpu_svm *svm = to_svm(vcpu);
> >
> > ++vcpu->stat.nmi_window_exits;
> > - vcpu->arch.hflags |= HF_IRET_MASK;
> > +
> > + if (!is_vnmi_enabled(svm))
> > + vcpu->arch.hflags |= HF_IRET_MASK;
>
> Ugh, HF_IRET_MASK is such a terrible name/flag. Given that it lives with GIF
> and NMI, one would naturally think that it means "IRET is intercepted", but it
> really means "KVM just intercepted an IRET and is waiting for NMIs to become
> unblocked".
>
> And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
> They are 100% SVM concepts. IMO, this code would be much easier to follow
> if they were made bools in vcpu_svm with more descriptive names.
>
> > +
> > if (!sev_es_guest(vcpu->kvm)) {
> > svm_clr_intercept(svm, INTERCEPT_IRET);
> > svm->nmi_iret_rip = kvm_rip_read(vcpu);
>
> The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
> to be captured for the vNMI case. SEV-ES actually has unrelated reasons for not
> reading RIP vs. not intercepting IRET; they just got bundled together here
> for convenience.
Yes, this can be cleaned up; again, I didn't want to change too much of the code.
>
> This is also an opportunity to clean up the SEV-ES interaction with IRET interception,
> which is splattered all over the place and isn't documented anywhere.
>
> E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)
>
> /*
> * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
> * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
> * the guest is ready for a new NMI. Architecturally, KVM is 100% correct to
> * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
> * that KVM must wait for an explicit "NMI Complete" from the guest.
> */
> static void svm_disable_iret_interception(struct vcpu_svm *svm)
> {
> if (!sev_es_guest(svm->vcpu.kvm))
> svm_clr_intercept(svm, INTERCEPT_IRET);
> }
>
> static void svm_enable_iret_interception(struct vcpu_svm *svm)
> {
> if (!sev_es_guest(svm->vcpu.kvm))
> svm_set_intercept(svm, INTERCEPT_IRET);
> }
This makes sense, but doesn't have to be done in this patch series IMHO.
>
> static int iret_interception(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> ++vcpu->stat.nmi_window_exits;
>
> /*
> * No need to wait for the IRET to complete if vNMIs are enabled as
> * hardware will automatically process the pending NMI when NMIs are
> * unblocked from the guest's perspective.
> */
> if (!is_vnmi_enabled(svm)) {
> svm->awaiting_iret_completion = true;
>
> /*
> * The guest's RIP is inaccessible for SEV-ES guests, just
> * assume forward progress was made on the next VM-Exit.
> */
> if (!sev_es_guest(vcpu->kvm))
> svm->nmi_iret_rip = kvm_rip_read(vcpu);
> }
>
> svm_disable_iret_interception(svm);
>
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> return 1;
> }
> > @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> > {
> > struct vcpu_svm *svm = to_svm(vcpu);
> >
> > - if (is_vnmi_enabled(svm))
> > - return;
> > -
> > if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> > return; /* IRET will cause a vm exit */
>
> As much as I like incremental patches, in this case I'm having a hell of a time
> reviewing the code as the vNMI logic ends up being split across four patches.
> E.g. in this particular case, the above requires knowing that svm_inject_nmi()
> never sets HF_NMI_MASK when vNMI is enabled.
>
> In the next version, any objection to squashing patches 7-10 into a single "Add
> non-nested vNMI support" patch?
No objection at all - again, since this is not my patch series, I didn't want
to make too many invasive changes to it.
>
> As for this code, IMO some pre-work to change the flow would help with the vNMI
> case. The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
> And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
> on INTERCEPT_IRET to already be set by svm_inject_nmi().
>
> That would yield this as a final result:
>
> static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> /*
> * GIF=0 blocks NMIs irrespective of legacy NMI blocking. No need to
> * intercept or single-step IRET if GIF=0, just intercept STGI.
> */
> if (!gif_set(svm)) {
> if (vgif)
> svm_set_intercept(svm, INTERCEPT_STGI);
> return;
> }
>
> /*
> * NMI is blocked, either because an NMI is in service or because KVM
> * just injected an NMI. If KVM is waiting for an intercepted IRET to
> * complete, single-step the IRET to wait for NMIs to become unblocked.
> * Otherwise, intercept the guest's next IRET.
> */
> if (svm->awaiting_iret_completion) {
> svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> svm->nmi_singlestep = true;
> svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> } else {
> svm_set_intercept(svm, INTERCEPT_IRET);
> }
> }
>
> >
> > @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> > * Something prevents NMI from being injected. Single step over possible
> > * problem (IRET or exception injection or interrupt shadow)
> > */
> > - svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > - svm->nmi_singlestep = true;
> > - svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +
> > + if (is_vnmi_enabled(svm)) {
> > + svm_set_intercept(svm, INTERCEPT_IRET);
>
> This will break SEV-ES. Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
> an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
> NMI unblocking for SEV-ES guests. As above, I think we should provide helpers to
> toggle NMI interception to reduce the probability of breaking SEV-ES.
Yes, one more reason for the helpers; I hadn't noticed that I missed that 'if'.
>
> > + } else {
> > + svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > + svm->nmi_singlestep = true;
> > + svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > + }
> > }
> >
> > static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> > --
> > 2.34.3
> >
>
Best regards,
Maxim Levitsky