linux/Documentation/virt/kvm/x86/errata.rst
Sean Christopherson 0e3b70aa13 KVM: x86: Document an erratum in KVM_SET_VCPU_EVENTS on Intel CPUs
Document a flaw in KVM's ABI which lets userspace attempt to inject a
"bad" hardware exception event, and thus induce VM-Fail on Intel CPUs.
Fixing the flaw is a fool's errand, as AMD doesn't sanity check the
validity of the error code, Intel CPUs that support CET relax the check
for Protected Mode, userspace can change the mode after queueing an
exception, KVM ignores the error code when emulating Real Mode exceptions,
and so on and so forth.

The VM-Fail itself doesn't harm KVM or the kernel beyond triggering a
ratelimited pr_warn(), so just document the oddity.

Link: https://lore.kernel.org/r/20240802200420.330769-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01 09:22:28 -07:00

80 lines
No EOL
3.2 KiB
ReStructuredText

.. SPDX-License-Identifier: GPL-2.0
=======================================
Known limitations of CPU virtualization
=======================================
Whenever perfect emulation of a CPU feature is impossible or too hard, KVM
has to choose between not implementing the feature at all or introducing
behavioral differences between virtual machines and bare metal systems.
This file documents some of the known limitations that KVM has in
virtualizing CPU features.
x86
===
``KVM_GET_SUPPORTED_CPUID`` issues
----------------------------------
x87 features
~~~~~~~~~~~~
Unlike most other CPUID feature bits, CPUID[EAX=7,ECX=0]:EBX[6]
(FDP_EXCPTN_ONLY) and CPUID[EAX=7,ECX=0]:EBX]13] (ZERO_FCS_FDS) are
clear if the features are present and set if the features are not present.
Clearing these bits in CPUID has no effect on the operation of the guest;
if these bits are set on hardware, the features will not be present on
any virtual machine that runs on that hardware.
**Workaround:** It is recommended to always set these bits in guest CPUID.
Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.
``KVM_SET_VCPU_EVENTS`` issue
-----------------------------
Invalid KVM_SET_VCPU_EVENTS input with respect to error codes *may* result in
failed VM-Entry on Intel CPUs. Pre-CET Intel CPUs require that exception
injection through the VMCS correctly set the "error code valid" flag, e.g.
require the flag be set when injecting a #GP, clear when injecting a #UD,
clear when injecting a soft exception, etc. Intel CPUs that enumerate
IA32_VMX_BASIC[56] as '1' relax VMX's consistency checks, and AMD CPUs have no
restrictions whatsoever. KVM_SET_VCPU_EVENTS doesn't sanity check the vector
versus "has_error_code", i.e. KVM's ABI follows AMD behavior.
Nested virtualization features
------------------------------
TBD
x2APIC
------
When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
allows sending events to a single vCPU using its x2APIC ID even if the target
vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
have the same physical APIC ID, KVM will deliver events targeting that APIC ID
only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
matching the target APIC ID receive the interrupt).
MTRRs
-----
KVM does not virtualize guest MTRR memory types. KVM emulates accesses to MTRR
MSRs, i.e. {RD,WR}MSR in the guest will behave as expected, but KVM does not
honor guest MTRRs when determining the effective memory type, and instead
treats all of guest memory as having Writeback (WB) MTRRs.
CR0.CD
------
KVM does not virtualize CR0.CD on Intel CPUs. Similar to MTRR MSRs, KVM
emulates CR0.CD accesses so that loads and stores from/to CR0 behave as
expected, but setting CR0.CD=1 has no impact on the cachaeability of guest
memory.
Note, this erratum does not affect AMD CPUs, which fully virtualize CR0.CD in
hardware, i.e. put the CPU caches into "no fill" mode when CR0.CD=1, even when
running in the guest.