Commit graph

473 commits

Author SHA1 Message Date
Paolo Bonzini
c0623f5e5d Merge branch 'kvm-fixes' into 'next'
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21 18:05:58 -04:00
Sean Christopherson
2ed41aa631 KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault
Intercept CR4 bits that are guest reserved so that KVM correctly injects
a #GP fault if the guest attempts to set a reserved bit.  If a feature
is supported by the CPU but is not exposed to the guest, and its
associated CR4 bit is not intercepted by KVM by default, then KVM will
fail to inject a #GP if the guest sets the CR4 bit without triggering
an exit, e.g. by toggling only the bit in question.

Note, KVM doesn't give the guest direct access to any CR4 bits that are
also dependent on guest CPUID.  Yet.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:50 -04:00
Sean Christopherson
a6337a3542 KVM: x86: Move call to update_exception_bitmap() into VMX code
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
back-to-back, subsume the exception bitmap update into the common CPUID
update.  Drop the SVM invocation entirely as SVM's exception bitmap
doesn't vary with respect to guest CPUID.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:50 -04:00
Maxim Levitsky
72f211ecaa KVM: x86: allow kvm_x86_ops.set_efer to return an error value
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Sean Christopherson
9389b9d5d3 KVM: VMX: Ignore userspace MSR filters for x2APIC
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace
filtering.  Allowing userspace to intercept reads to x2APIC MSRs when
APICV is fully enabled for the guest simply can't work; the LAPIC and thus
virtual APIC is in-kernel and cannot be directly accessed by userspace.
To keep things simple we will in fact forbid intercepting x2APIC MSRs
altogether, independent of the default_allow setting.

Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com>
[Modified to operate even if APICv is disabled, adjust documentation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:19 -04:00
Peter Xu
628ade2d08 KVM: VMX: Fix x2APIC MSR intercept handling on !APICV platforms
Fix an inverted flag for intercepting x2APIC MSRs and intercept writes
by default, even when APICV is enabled.

Fixes: 3eb900173c ("KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied")
Co-developed-by: Peter Xu <peterx@redhat.com>
[sean: added changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-19 12:22:52 -04:00
Paolo Bonzini
b502e6ecdc KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of
the #PF intercept bit in the exception bitmap when they do not match.
This means that, if PFEC_MASK and/or PFEC_MATCH are set, the
hypervisor can get a vmexit for #PF exceptions even when the
corresponding bit is clear in the exception bitmap.

This is unexpected and is promptly detected by a WARN_ON_ONCE.
To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept
is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr).

Reported-by: Qian Cai <cai@redhat.com>>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-03 05:07:40 -04:00
kernel test robot
6a2e0923b2 KVM: VMX: vmx_uret_msrs_list[] can be static
Fixes: 14a61b642d ("KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"")
Signed-off-by: kernel test robot <lkp@intel.com>
Message-Id: <20200928153714.GA6285@a3a878002045>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-29 05:44:37 -04:00
Alexander Graf
3eb900173c KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied
We will introduce the concept of MSRs that may not be handled in kernel
space soon. Some MSRs are directly passed through to the guest, effectively
making them handled by KVM from user space's point of view.

This patch introduces all logic required to ensure that MSRs that
user space wants trapped are not marked as direct access for guests.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-7-graf@amazon.com>
[Replace "_idx" with "_slot". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:07 -04:00
Aaron Lewis
476c9bd8e9 KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
Prepare vmx and svm for a subsequent change that ensures the MSR permission
bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit
in the guest.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
[agraf: rebase, adapt SVM scheme to nested changes that came in between]
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-5-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:05 -04:00
Sean Christopherson
802145c56a KVM: VMX: Rename vmx_uret_msr's "index" to "slot"
Rename "index" to "slot" in struct vmx_uret_msr to align with the
terminology used by common x86's kvm_user_return_msrs, and to avoid
conflating "MSR's ECX index" with "MSR's index into an array".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-16-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:02 -04:00
Sean Christopherson
14a61b642d KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"
Rename "vmx_msr_index" to "vmx_uret_msrs_list" to associate it with the
uret MSRs array, and to avoid conflating "MSR's ECX index" with "MSR's
index into an array".  Similarly, don't use "slot" in the name as that
terminology is claimed by the common x86 "user_return_msrs" mechanism.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:01 -04:00
Sean Christopherson
7bf662bb5e KVM: VMX: Rename "vmx_set_guest_msr" to "vmx_set_guest_uret_msr"
Add "uret" to vmx_set_guest_msr() to explicitly associate it with the
guest_uret_msrs array, and to differentiate it from vmx_set_msr() as
well as VMX's load/store MSRs.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-14-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:01 -04:00
Sean Christopherson
d85a8034c0 KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"
Rename "find_msr_entry" to scope it to VMX and to associate it with
guest_uret_msrs.  Drop the "entry" so that the function name pairs with
the existing __vmx_find_uret_msr(), which intentionally uses a double
underscore prefix instead of appending "index" or "slot" as those names
are already claimed by other pieces of the user return MSR stack.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-13-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:00 -04:00
Sean Christopherson
bd65ba82b3 KVM: VMX: Add vmx_setup_uret_msr() to handle lookup and swap
Add vmx_setup_uret_msr() to wrap the lookup and manipulation of the uret
MSRs array during setup_msrs().  In addition to consolidating code, this
eliminates move_msr_up(), which while being a very literally description
of the function, isn't exacly helpful in understanding the net effect of
the code.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-12-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:00 -04:00
Sean Christopherson
86e3e494fe KVM: VMX: Move uret MSR lookup into update_transition_efer()
Move checking for the existence of MSR_EFER in the uret MSR array into
update_transition_efer() so that the lookup and manipulation of the
array in setup_msrs() occur back-to-back.  This paves the way toward
adding a helper to wrap the lookup and manipulation.

To avoid unnecessary overhead, defer the lookup until the uret array
would actually be modified in update_transition_efer().  EFER obviously
exists on CPUs that support the dedicated VMCS fields for switching
EFER, and EFER must exist for the guest and host EFER.NX value to
diverge, i.e. there is no danger of attempting to read/write EFER when
it doesn't exist.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:59 -04:00
Sean Christopherson
ef1d2ee12e KVM: VMX: Check guest support for RDTSCP before processing MSR_TSC_AUX
Check for RDTSCP support prior to checking if MSR_TSC_AUX is in the uret
MSRs array so that the array lookup and manipulation are back-to-back.
This paves the way toward adding a helper to wrap the lookup and
manipulation.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:59 -04:00
Sean Christopherson
1e7a483037 KVM: VMX: Rename "__find_msr_index" to "__vmx_find_uret_msr"
Rename "__find_msr_index" to scope it to VMX, associate it with
guest_uret_msrs, and to avoid conflating "MSR's ECX index" with "MSR's
array index".  Similarly, don't use "slot" in the name so as to avoid
colliding the common x86's half of "user_return_msrs" (the slot in
kvm_user_return_msrs is not the same slot in guest_uret_msrs).

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:58 -04:00
Sean Christopherson
658ece84f5 KVM: VMX: Rename vcpu_vmx's "guest_msrs_ready" to "guest_uret_msrs_loaded"
Add "uret" to "guest_msrs_ready" to explicitly associate it with the
"guest_uret_msrs" array, and replace "ready" with "loaded" to more
precisely reflect what it tracks, e.g. "ready" could be interpreted as
meaning ready for processing (setup_msrs() has run), which is wrong.
"loaded" also aligns with the similar "guest_state_loaded" field.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:58 -04:00
Sean Christopherson
e9bb1ae92d KVM: VMX: Rename vcpu_vmx's "save_nmsrs" to "nr_active_uret_msrs"
Add "uret" into the name of "save_nmsrs" to explicitly associate it with
the guest_uret_msrs array, and replace "save" with "active" (for lack of
a better word) to better describe what is being tracked.  While "save"
is more or less accurate when viewed as a literal description of the
field, e.g. it holds the number of MSRs that were saved into the array
the last time setup_msrs() was invoked, it can easily be misinterpreted
by the reader, e.g. as meaning the number of MSRs that were saved from
hardware at some point in the past, or as the number of MSRs that need
to be saved at some point in the future, both of which are wrong.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:57 -04:00
Sean Christopherson
fbc1800738 KVM: VMX: Rename vcpu_vmx's "nmsrs" to "nr_uret_msrs"
Rename vcpu_vmx.nsmrs to vcpu_vmx.nr_uret_msrs to explicitly associate
it with the guest_uret_msrs array.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:57 -04:00
Sean Christopherson
eb3db1b137 KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"
Rename struct "shared_msr_entry" to "vmx_uret_msr" to align with x86's
rename of "shared_msrs" to "user_return_msrs", and to call out that the
struct is specific to VMX, i.e. not part of the generic "shared_msrs"
framework.  Abbreviate "user_return" as "uret" to keep line lengths
marginally sane and code more or less readable.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:56 -04:00
Sean Christopherson
a128a934f2 KVM: VMX: Rename "vmx_find_msr_index" to "vmx_find_loadstore_msr_slot"
Add "loadstore" to vmx_find_msr_index() to differentiate it from the so
called shared MSRs helpers (which will soon be renamed), and replace
"index" with "slot" to better convey that the helper returns slot in the
array, not the MSR index (the value that gets stuffed into ECX).

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:55 -04:00
Sean Christopherson
ce833b2324 KVM: VMX: Prepend "MAX_" to MSR array size defines
Add "MAX" to the LOADSTORE and so called SHARED MSR defines to make it
more clear that the define controls the array size, as opposed to the
actual number of valid entries that are in the array.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:55 -04:00
Sean Christopherson
7e34fbd05c KVM: x86: Rename "shared_msrs" to "user_return_msrs"
Rename the "shared_msrs" mechanism, which is used to defer restoring
MSRs that are only consumed when running in userspace, to a more banal
but less likely to be confusing "user_return_msrs".

The "shared" nomenclature is confusing as it's not obvious who is
sharing what, e.g. reasonable interpretations are that the guest value
is shared by vCPUs in a VM, or that the MSR value is shared/common to
guest and host, both of which are wrong.

"shared" is also misleading as the MSR value (in hardware) is not
guaranteed to be shared/reused between VMs (if that's indeed the correct
interpretation of the name), as the ability to share values between VMs
is simply a side effect (albiet a very nice side effect) of deferring
restoration of the host value until returning from userspace.

"user_return" avoids the above confusion by describing the mechanism
itself instead of its effects.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:54 -04:00
Sean Christopherson
235ba74f00 KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepoint
Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in
terms of what information is captured.  On SVM, add interrupt info and
error code, while on VMX it add IDT vectoring and error code.  This
sets the stage for macrofying the kvm_exit tracepoint definition so that
it can be reused for kvm_nested_vmexit without loss of information.

Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter
failed, as the field is guaranteed to be invalid.  Note, it'd be
possible to further filter the interrupt/exception fields based on the
VM-Exit reason, but the helper is intended only for tracepoints, i.e.
an extra VMREAD or two is a non-issue, the failed VM-Enter case is just
low hanging fruit.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:51 -04:00
Sean Christopherson
2ba4493a8b KVM: nVMX: Explicitly check for valid guest state for !unrestricted guest
Call guest_state_valid() directly instead of querying emulation_required
when checking if L1 is attempting VM-Enter with invalid guest state.
If emulate_invalid_guest_state is false, KVM will fixup segment regs to
avoid emulation and will never set emulation_required, i.e. KVM will
incorrectly miss the associated consistency checks because the nested
path stuffs segments directly into vmcs02.

Opportunsitically add Consistency Check tracing to make future debug
suck a little less.

Fixes: 2bb8cafea8 ("KVM: vVMX: signal failure for nested VMEntry if emulation_required")
Fixes: 3184a995f7 ("KVM: nVMX: fix vmentry failure code when L2 state would require emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923184452.980-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:45 -04:00
Sean Christopherson
5a085326d5 KVM: VMX: Rename ops.h to vmx_ops.h
Rename ops.h to vmx_ops.h to allow adding a tdx_ops.h in the future
without causing massive confusion.

Trust Domain Extensions (TDX) is built on VMX, but KVM cannot directly
access the VMCS(es) for a TDX guest, thus TDX will need its own "ops"
implementation for wrapping the low level operations.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183112.3030-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:38 -04:00
Xiaoyao Li
8888cdd099 KVM: VMX: Extract posted interrupt support to separate files
Extract the posted interrupt code so that it can be reused for Trust
Domain Extensions (TDX), which requires posted interrupts and can use
KVM VMX's implementation almost verbatim.  TDX is different enough from
raw VMX that it is highly desirable to implement the guts of TDX in a
separate file, i.e. reusing posted interrupt code by shoving TDX support
into vmx.c would be a mess.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923183112.3030-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:38 -04:00
Sean Christopherson
8b50b92f9f KVM: VMX: Add a helper and macros to reduce boilerplate for sec exec ctls
Add a helper function and several wrapping macros to consolidate the
copy-paste code in vmx_compute_secondary_exec_control() for adjusting
controls that are dependent on guest CPUID bits.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200925003011.21016-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:31 -04:00
Sean Christopherson
7f3603b631 KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE"
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in
preparation for consolidating the logic for adjusting secondary exec
controls based on the guest CPUID model.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:30 -04:00
Sean Christopherson
b936d3eb92 KVM: VMX: Unconditionally clear CPUID.INVPCID if !CPUID.PCID
If PCID is not exposed to the guest, clear INVPCID in the guest's CPUID
even if the VMCS INVPCID enable is not supported.  This will allow
consolidating the secondary execution control adjustment code without
having to special case INVPCID.

Technically, this fixes a bug where !CPUID.PCID && CPUID.INVCPID would
result in unexpected guest behavior (#UD instead of #GP/#PF), but KVM
doesn't support exposing INVPCID if it's not supported in the VMCS, i.e.
such a config is broken/bogus no matter what.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923165048.20486-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:29 -04:00
Sean Christopherson
becdad8592 KVM: VMX: Rename vmx_*_supported() helpers to cpu_has_vmx_*()
Rename helpers for a few controls to conform to the more prevelant style
of cpu_has_vmx_<feature>().  Consistent names will allow adding macros
to consolidate the boilerplate code for adjusting secondary execution
controls.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923165048.20486-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:29 -04:00
Sean Christopherson
7096cbfb6c KVM: VMX: Use "illegal GPA" helper for PT/RTIT output base check
Use kvm_vcpu_is_illegal_gpa() to check for a legal GPA when validating a
PT output base instead of open coding a clever, but difficult to read,
variant.  Code readability is far more important than shaving a few uops
in a slow path.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200924194250.19137-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:27 -04:00
Sean Christopherson
dc46515cf8 KVM: x86: Move illegal GPA helper out of the MMU code
Rename kvm_mmu_is_illegal_gpa() to kvm_vcpu_is_illegal_gpa() and move it
to cpuid.h so that's it's colocated with cpuid_maxphyaddr().  The helper
is not MMU specific and will gain a user that is completely unrelated to
the MMU in a future patch.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200924194250.19137-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:27 -04:00
Sean Christopherson
1cc6cbc3e4 KVM: VMX: Replace MSR_IA32_RTIT_OUTPUT_BASE_MASK with helper function
Replace the subtly not-a-constant MSR_IA32_RTIT_OUTPUT_BASE_MASK with a
proper helper function to check whether or not the specified base is
valid.  Blindly referencing the local 'vcpu' is especially nasty.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200924194250.19137-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:26 -04:00
Sean Christopherson
8d921acf98 KVM: VMX: Use precomputed MAXPHYADDR for RTIT base MSR check
Use cpuid_maxphyaddr() instead of cpuid_query_maxphyaddr() for the
RTIT base MSR check.  There is no reason to recompute MAXPHYADDR as the
precomputed version is synchronized with CPUID updates, and
MSR_IA32_RTIT_OUTPUT_BASE is not written between stuffing CPUID and
refreshing vcpu->arch.maxphyaddr.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200924194250.19137-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:25 -04:00
Tom Lendacky
28e2b2f1a4 KVM: VMX: Do not perform emulation for INVD intercept
The INVD instruction is emulated as a NOP, just skip the instruction
instead.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <addd41be2fbf50f5f4059e990a2a0cff182d2136.1600972918.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:25 -04:00
Sean Christopherson
1a5488ef0d KVM: VMX: Invoke NMI handler via indirect call instead of INTn
Rework NMI VM-Exit handling to invoke the kernel handler by function
call instead of INTn.  INTn microcode is relatively expensive, and
aligning the IRQ and NMI handling will make it easier to update KVM
should some newfangled method for invoking the handlers come along.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915191505.10355-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:20 -04:00
Sean Christopherson
535f7ef2ab KVM: VMX: Move IRQ invocation to assembly subroutine
Move the asm blob that invokes the appropriate IRQ handler after VM-Exit
into a proper subroutine.  Unconditionally create a stack frame in the
subroutine so that, as objtool sees things, the function has standard
stack behavior.  The dynamic stack adjustment makes using unwind hints
problematic.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915191505.10355-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:20 -04:00
Sean Christopherson
09e3e2a1cc KVM: x86: Add kvm_x86_ops hook to short circuit emulation
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a
more generic is_emulatable(), and unconditionally call the new function
in x86_emulate_instruction().

KVM will use the generic hook to support multiple security related
technologies that prevent emulation in one way or another.  Similar to
the existing AMD #NPF case where emulation of the current instruction is
not possible due to lack of information, AMD's SEV-ES and Intel's SGX
and TDX will introduce scenarios where emulation is impossible due to
the guest's register state being inaccessible.  And again similar to the
existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(),
i.e. outside of the control of vendor-specific code.

While the cause and architecturally visible behavior of the various
cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean
resume or complete shutdown, and SEV-ES and TDX "return" an error, the
impact on the common emulation code is identical: KVM must stop
emulation immediately and resume the guest.

Query is_emulatable() in handle_ud() as well so that the
force_emulation_prefix code doesn't incorrectly modify RIP before
calling emulate_instruction() in the absurdly unlikely scenario that
KVM encounters forced emulation in conjunction with "do not emulate".

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:19 -04:00
Krish Sadhukhan
bddd82d19e KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it
Currently, prepare_vmcs02_early() does not check if the "unrestricted guest"
VM-execution control in vmcs12 is turned off and leaves the corresponding
bit on in vmcs02. Due to this setting, vmentry checks which are supposed to
render the nested guest state as invalid when this VM-execution control is
not set, are passing in hardware.

This patch turns off the "unrestricted guest" VM-execution control in vmcs02
if vmcs12 has turned it off.

Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200921081027.23047-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:18 -04:00
Babu Moger
9715092f8d KVM: X86: Move handling of INVPCID types to x86
INVPCID instruction handling is mostly same across both VMX and
SVM. So, move the code to common x86.c.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:17 -04:00
Babu Moger
3f3393b3ce KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:16 -04:00
Wanpeng Li
010fd37fdd KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns
All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and
these function calls waste cpu cycles when the timer mode is not tscdeadline.
We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat
vmcall testing on AMD server. This patch reduces the world switch latency
caused by timer_advance_ns feature when the timer mode is not tscdeadline by
simpling move the check against apic->lapic_timer.expired_tscdeadline much
earlier.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:10 -04:00
Chenyi Qiang
b9757a4b6f KVM: nVMX: Simplify the initialization of nested_vmx_msrs
The nested VMX controls MSRs can be initialized by the global capability
values stored in vmcs_config.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200828085622.8365-6-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:07 -04:00
Mohammed Gamal
b96e6506c2 KVM: x86: VMX: Make smaller physical guest address space support user-configurable
This patch exposes allow_smaller_maxphyaddr to the user as a module parameter.
Since smaller physical address spaces are only supported on VMX, the
parameter is only exposed in the kvm_intel module.

For now disable support by default, and let the user decide if they want
to enable it.

Modifications to VMX page fault and EPT violation handling will depend
on whether that parameter is enabled.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Message-Id: <20200903141122.72908-1-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-23 09:47:24 -04:00
Paolo Bonzini
bf3c0e5e71 Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD 2020-09-22 06:43:17 -04:00
Wanpeng Li
99b82a1437 KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
Don't report internal error and freeze guest when event delivery causes
an APIC-access exit, it is handleable and the event will be re-injected
during the next vmentry.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 02:21:17 -04:00
Peter Shier
43fea4e425 KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
load_pdptrs to read the possibly updated PDPTEs from the guest
physical address referenced by CR3.  It loads them into
vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
vcpu->arch.regs_dirty.

At the subsequent assumed reentry into L2, the mmu will call
vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
if the L2 CRn write intercept always resumes L2.

The resume path calls vmx_check_nested_events which checks for
exceptions, MTF, and expired VMX preemption timers. If
vmx_check_nested_events finds any of these conditions pending it will
reflect the corresponding exit into L1. Live migration at this point
would also cause a missed immediate reentry into L2.

After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty.  When L2 next
resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
values stored at last L2 exit. A repro of this bug showed L2 entering
triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.

When L2 is in PAE paging mode add a call to ept_load_pdptrs before
leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
vcpu->arch.walk_mmu->pdptrs[].

Tested:
kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
Verified that test fails without the fix.

Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
failures.

Signed-off-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200820230545.2411347-1-pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:15:09 -04:00