linux/arch/x86/kvm/vmx
Sean Christopherson cd0e615c49 KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required
Synthesize a triple fault if L2 guest state is invalid at the time of
VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
invalid guest state, since from L1's perspective, it's architecturally
impossible for L2 to have invalid state while L2 is running in hardware.
E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
or #GP.

Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
can trigger this scenario, as nested VM-Enter correctly rejects any
attempt to enter L2 with invalid state.

RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
Following AMD's SMRAM layout is important as AMD's layout saves/restores
the descriptor cache information, including CS.RPL and SS.RPL, and also
defines all the fields relevant to invalid guest state as read-only, i.e.
so long as the vCPU had valid state before the SMI, which is guaranteed
for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
layout saves/restores only the selector, which means that scenarios where
the selector and cached RPL don't match, e.g. conforming code segments,
would yield invalid guest state.  Intel CPUs fudge around this issue by
stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
Treatment of RSM", paraphrasing for brevity:

  IF internal storage indicates that the [CPU was post-VMXON]
  THEN
     enter VMX operation (root or non-root);
     restore VMX-critical state as defined in Section 34.14.1;
     set to their fixed values any bits in CR0 and CR4 whose values must
     be fixed in VMX operation [unless coming from an unrestricted guest];
     IF RFLAGS.VM = 0 AND (in VMX root operation OR the
        “unrestricted guest” VM-execution control is 0)
     THEN
       CS.RPL := SS.DPL;
       SS.RPL := SS.DPL;
     FI;
     restore current VMCS pointer;
  FI;

Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
only way for CR0 and/or CR4 to have illegal values is if they were
modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
section states "modifying these registers will result in unpredictable
behavior".

KVM's ioctl() behavior is less straightforward.  Because KVM allows
ioctls() to be executed in any order, rejecting an ioctl() if it would
result in invalid L2 guest state is not an option as KVM cannot know if
a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
reject KVM_RUN if L2 contained invalid guest state, but that carries the
risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
exited to userspace.  Setting a flag/request to detect such a scenario is
undesirable because (a) it's extremely unlikely to add value to KVM as a
whole, and (b) KVM would need to consider ioctl() interactions with such
a flag, e.g. if userspace migrated the vCPU while the flag were set.
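
To make the ordering concern concrete, an illustrative userspace snippet is
shown below; vcpu_fd, nested_state and sregs are assumed to be set up by the
VMM elsewhere, and the point is only that the second ioctl() may legalize
state that looked invalid after the first:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /*
   * Illustrative only: userspace may restore vCPU state in any order, so a
   * transiently invalid combination after one ioctl() can be resolved by a
   * later one.  KVM therefore cannot reject the first call in isolation.
   */
  static void restore_vcpu_sketch(int vcpu_fd,
                                  struct kvm_nested_state *nested_state,
                                  struct kvm_sregs *sregs)
  {
          ioctl(vcpu_fd, KVM_SET_NESTED_STATE, nested_state); /* enter L2 */
          ioctl(vcpu_fd, KVM_SET_SREGS, sregs);               /* then fix segments */
  }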

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-3-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20 08:06:54 -05:00
capabilities.h KVM: x86: Use common 'enable_apicv' variable for both APICv and AVIC 2021-06-17 13:09:33 -04:00
evmcs.c KVM: nVMX: Filter out all unsupported controls when eVMCS was activated 2021-09-22 10:33:15 -04:00
evmcs.h x86/kvm: Always inline evmcs_write64() 2021-09-15 15:51:46 +02:00
nested.c KVM: VMX: Set failure code in prepare_vmcs02() 2021-12-02 04:12:11 -05:00
nested.h KVM: nVMX: Introduce 'EVMPTR_MAP_PENDING' post-migration state 2021-06-17 13:09:49 -04:00
pmu_intel.c kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool 2021-11-11 10:56:19 -05:00
posted_intr.c KVM: x86: Use a stable condition around all VT-d PI paths 2021-11-30 03:53:14 -05:00
posted_intr.h KVM: VMX: update vcpu posted-interrupt descriptor when assigning device 2021-05-27 07:58:23 -04:00
sgx.c KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol 2021-10-25 06:48:25 -04:00
sgx.h KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC 2021-04-20 04:18:56 -04:00
vmcs.h KVM: x86: Clean up redundant ROL16(val, n) macro definition 2021-08-13 03:35:16 -04:00
vmcs12.c KVM: x86: Clean up redundant ROL16(val, n) macro definition 2021-08-13 03:35:16 -04:00
vmcs12.h KVM: x86: Clean up redundant ROL16(val, n) macro definition 2021-08-13 03:35:16 -04:00
vmcs_shadow_fields.h KVM: Fix some out-dated function names in comment 2020-01-21 13:57:27 +01:00
vmenter.S KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw 2021-02-04 05:27:32 -05:00
vmx.c KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required 2021-12-20 08:06:54 -05:00
vmx.h KVM: nVMX: Use a gfn_to_hva_cache for vmptrld 2021-11-18 02:03:43 -05:00
vmx_ops.h KVM: x86: Move declaration of kvm_spurious_fault() to x86.h 2021-08-13 03:35:16 -04:00