License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2011-11-23 16:30:32 +02:00
#ifndef ARCH_X86_KVM_CPUID_H
#define ARCH_X86_KVM_CPUID_H

2021-04-21 17:56:22 -07:00
#include "reverse_cpuid.h"
2015-11-23 11:12:22 +01:00
#include <asm/cpu.h>
2017-08-05 00:12:49 +02:00
#include <asm/processor.h>
2020-08-18 15:24:28 +00:00
#include <uapi/asm/kvm_para.h>
2011-11-23 16:30:32 +02:00

2021-04-12 16:21:35 +12:00
extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking
Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
what is effectively a KVM-adjusted copy of boot_cpu_data, or more
precisely, the x86_capability array in boot_cpu_data.
In terms of KVM support, the vast majority of CPUID feature bits are
constant, and *all* feature support is known at KVM load time. Rather
than apply boot_cpu_data, which is effectively read-only after init,
at runtime, copy it into a KVM-specific array and use *that* to mask
CPUID registers.
In addition to consolidating the masking, kvm_cpu_caps can be adjusted
by SVM/VMX at load time and thus eliminate all feature bit manipulation
in ->set_supported_cpuid().
Opportunistically clean up a few warts:
- Replace bare "unsigned" with "unsigned int" when a feature flag is
captured in a local variable, e.g. f_nx.
- Sort the CPUID masks by function, index and register (alphabetically
for registers, i.e. EBX comes before ECX/EDX).
- Remove the superfluous /* cpuid 7.0.ecx */ comments.
No functional change intended.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Call kvm_set_cpu_caps from kvm_x86_ops->hardware_setup due to fixed
GBPAGES patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02 15:56:41 -08:00
void kvm_set_cpu_caps(void);

KVM: x86: Do all post-set CPUID processing during vCPU creation
During vCPU creation, process KVM's default, empty CPUID as if userspace
set an empty CPUID to ensure consistent and correct behavior with respect
to guest CPUID. E.g. if userspace never sets guest CPUID, KVM will never
configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest-
visible behavior due to letting the guest set any KVM-supported CR4 bits
despite the features not being allowed per guest CPUID.
Note! This changes KVM's ABI, as lack of full CPUID processing allowed
userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a
guest-unsupported value via KVM_SET_SREGS. But it's extremely unlikely
that this is a breaking change, as KVM already has many flows that require
userspace to set guest CPUID before loading vCPU state. E.g. multiple MSR
flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already
relies on guest CPUID being up-to-date, as KVM's validity check on CR3
consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR).
Furthermore, the plan is to commit to enforcing guest CPUID for userspace
writes to MSRs, at which point bypassing sregs CPUID checks is even more
nonsensical.
Link: https://lore.kernel.org/r/20241128013424.4096668-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-27 17:33:30 -08:00
void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
2025-01-22 06:31:31 -05:00
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries,
					       int nent, u32 function, u64 index);
/*
 * Magic value used by KVM when querying userspace-provided CPUID entries and
 * doesn't care about the CPUID index because the index of the function in
 * question is not significant. Note, this magic value must have at least one
 * bit set in bits[63:32] and must be consumed as a u64 by kvm_find_cpuid_entry2()
 * to avoid false positives when processing guest CPUID input.
 *
 * KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used directly outside of
 * kvm_find_cpuid_entry2() and kvm_find_cpuid_entry().
 */
#define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull

static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
								  u32 function, u32 index)
{
	return kvm_find_cpuid_entry2(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
				     function, index);
}

static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
							    u32 function)
{
	return kvm_find_cpuid_entry2(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
				     function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
}

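The role of the 64-bit "index not significant" magic value can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct cpuid_entry`, its `index_significant` flag, `INDEX_NOT_SIGNIFICANT` and `find_entry()` are simplified stand-ins, not the kernel's `kvm_find_cpuid_entry2()` implementation. The point it tries to show is that a guest-supplied 32-bit index, once zero-extended to 64 bits, can never compare equal to a magic value that has bits set in [63:32].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	int index_significant;	/* hypothetical: does this leaf have sub-leaves? */
	uint32_t eax;
};

/* Bits [63:32] are set, so no zero-extended u32 index can equal this value. */
#define INDEX_NOT_SIGNIFICANT (~0ull)

static const struct cpuid_entry *find_entry(const struct cpuid_entry *entries,
					    size_t nent, uint32_t function,
					    uint64_t index)
{
	assert(entries != NULL || nent == 0);

	for (size_t i = 0; i < nent; i++) {
		const struct cpuid_entry *e = &entries[i];

		if (e->function != function)
			continue;
		/*
		 * Match when the leaf ignores the index, when the caller
		 * explicitly does not care, or when the indices agree.
		 */
		if (!e->index_significant || index == INDEX_NOT_SIGNIFICANT ||
		    e->index == index)
			return e;
	}
	return NULL;
}
```

With a 32-bit magic value of -1, a guest asking for index 0xffffffff would be indistinguishable from a caller that does not care about the index; widening the parameter to 64 bits removes that ambiguity.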
2013-09-22 16:44:50 +02:00
int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
			    struct kvm_cpuid_entry2 __user *entries,
			    unsigned int type);
2011-11-23 16:30:32 +02:00
int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
			     struct kvm_cpuid *cpuid,
			     struct kvm_cpuid_entry __user *entries);
int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
			      struct kvm_cpuid2 *cpuid,
			      struct kvm_cpuid_entry2 __user *entries);
int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
			      struct kvm_cpuid2 *cpuid,
			      struct kvm_cpuid_entry2 __user *entries);
2017-08-24 20:27:52 +08:00
bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
2020-03-04 17:34:37 -08:00
	       u32 *ecx, u32 *edx, bool exact_only);
2011-11-23 16:30:32 +02:00

KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initialization to
avoid the overhead of CPUID during runtime. The offset, size, and metadata
for CPUID.0xD.[1..n] sub-leaves do not depend on XCR0 or XSS values, i.e.
are constant for a given CPU, and thus can be cached during module load.
On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point where
recomputing XSAVE offsets and sizes results in a 4x increase in latency of
nested VM-Enter and VM-Exit (nested transitions can trigger
xstate_required_size() multiple times per transition), relative to using
cached values. The issue is easily visible by running `perf top` while
triggering nested transitions: kvm_update_cpuid_runtime() shows up at a
whopping 50%.
As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test
and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested
roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald
Rapids (EMR) takes:
SKX 11650
ICX 22350
EMR 28850
Using cached values, the latency drops to:
SKX 6850
ICX 9000
EMR 7900
The underlying issue is that CPUID itself is slow on ICX, and comically
slow on EMR. The problem is exacerbated on CPUs which support XSAVES
and/or XSAVEC, as KVM invokes xstate_required_size() twice on each
runtime CPUID update, and because there are more supported XSAVE features
(CPUID for supported XSAVE feature sub-leafs is significantly slower).
SKX:
CPUID.0xD.2 = 348 cycles
CPUID.0xD.3 = 400 cycles
CPUID.0xD.4 = 276 cycles
CPUID.0xD.5 = 236 cycles
<other sub-leaves are similar>
EMR:
CPUID.0xD.2 = 1138 cycles
CPUID.0xD.3 = 1362 cycles
CPUID.0xD.4 = 1068 cycles
CPUID.0xD.5 = 910 cycles
CPUID.0xD.6 = 914 cycles
CPUID.0xD.7 = 1350 cycles
CPUID.0xD.8 = 734 cycles
CPUID.0xD.9 = 766 cycles
CPUID.0xD.10 = 732 cycles
CPUID.0xD.11 = 718 cycles
CPUID.0xD.12 = 734 cycles
CPUID.0xD.13 = 1700 cycles
CPUID.0xD.14 = 1126 cycles
CPUID.0xD.15 = 898 cycles
CPUID.0xD.16 = 716 cycles
CPUID.0xD.17 = 748 cycles
CPUID.0xD.18 = 776 cycles
Note, updating runtime CPUID information multiple times per nested
transition is itself a flaw, especially since CPUID is a mandatory
intercept on both Intel and AMD. E.g. KVM doesn't need to ensure emulated
CPUID state is up-to-date while running L2. That flaw will be fixed in a
future patch, as deferring runtime CPUID updates is more subtle than it
appears at first glance, the benefits aren't super critical to have once
the XSAVE issue is resolved, and caching CPUID output is desirable even if
KVM's updates are deferred.
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241211013302.1347853-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-10 17:32:58 -08:00
void __init kvm_init_xstate_sizes(void);
2022-01-05 04:35:29 -08:00
u32 xstate_required_size(u64 xstate_bv, bool compacted);

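The caching idea described in the commit message above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel's code: `fake_cpuid_0xd()` is a hypothetical stand-in for executing CPUID.0xD, `NR_XFEATURES` assumes sub-leaves 0..18 as in the EMR list, and the sub-leaf 2 values (size 256 at offset 576) are sample inputs. The size computation follows the usual XSAVE layout rule: a 512-byte legacy region plus a 64-byte header, then each enabled component at either its fixed offset or, in compacted form, packed with optional 64-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

#define NR_XFEATURES	 19	/* assumption: sub-leaves 0..18 */
#define XSAVE_HDR_OFFSET 512u	/* size of the legacy region */
#define XSAVE_HDR_SIZE	 64u	/* size of the XSAVE header */

/* Raw CPUID.0xD.<i> output: EAX = size, EBX = offset, ECX bit 1 = 64B align. */
struct xstate_info {
	uint32_t eax, ebx, ecx;
};

static struct xstate_info xstate_cache[NR_XFEATURES];
static unsigned int cpuid_calls;	/* counts simulated CPUID executions */

/* Hypothetical stand-in for the CPUID instruction; only sub-leaf 2 is populated. */
static void fake_cpuid_0xd(uint32_t subleaf, struct xstate_info *out)
{
	assert(subleaf < NR_XFEATURES);

	cpuid_calls++;
	out->eax = out->ebx = out->ecx = 0;
	if (subleaf == 2) {	/* sample values, not measured data */
		out->eax = 256;
		out->ebx = 576;
	}
}

/* Run once at "module init": pay the CPUID cost a single time per sub-leaf. */
static void init_xstate_cache(void)
{
	for (uint32_t i = 2; i < NR_XFEATURES; i++)
		fake_cpuid_0xd(i, &xstate_cache[i]);
}

/* Runtime path: consults only the cache, never CPUID. */
static uint32_t required_size(uint64_t xstate_bv, int compacted)
{
	uint32_t ret = XSAVE_HDR_OFFSET + XSAVE_HDR_SIZE;

	for (uint32_t i = 2; i < NR_XFEATURES; i++) {
		const struct xstate_info *xs = &xstate_cache[i];
		uint32_t offset;

		if (!(xstate_bv & (1ull << i)))
			continue;
		if (compacted)
			offset = (xs->ecx & 0x2) ? (ret + 63u) & ~63u : ret;
		else
			offset = xs->ebx;
		if (offset + xs->eax > ret)
			ret = offset + xs->eax;
	}
	return ret;
}
```

After `init_xstate_cache()` has run, repeated calls to `required_size()` leave `cpuid_calls` unchanged, which is the property the commit relies on to remove CPUID from the nested-transition path.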
2015-03-29 23:56:12 +03:00
int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
2024-10-30 12:00:38 -07:00
int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu);
2021-02-03 16:01:15 -08:00
u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu);
2015-03-29 23:56:12 +03:00

static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.maxphyaddr;
}
2011-11-23 16:30:32 +02:00

2021-02-03 16:01:08 -08:00
static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
{
KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode
Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA
legality checks. AMD's APM states:
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Thus, any access that can conceivably be run through NPT should ignore
the C-bit when checking for validity.
For features that KVM emulates in software, e.g. MTRRs, there is no
clear direction in the APM for how the C-bit should be handled. For
such cases, follow the SME behavior inasmuch as possible, since SEV is
essentially a VM-specific variant of SME. For SME, the APM states:
In this case the upper physical address bits are treated as reserved
when the feature is enabled except where otherwise indicated.
Collecting the various relevant SME snippets in the APM and cross-
referencing the omissions with Linux kernel code, this leaves MTRRs and
APIC_BASE as the only flows that KVM emulates that should _not_ ignore
the C-bit.
Note, this means the reserved bit checks in the page tables are
technically broken. This will be remedied in a future patch.
Although the page table checks are technically broken, in practice, it's
all but guaranteed to be irrelevant. NPT is required for SEV, i.e.
shadowing page tables isn't needed in the common case. Theoretically,
the checks could be in play for nested NPT, but it's extremely unlikely
that anyone is running nested VMs on SEV, as doing so would require L1
to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is
running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1
would need to put its NPT in shared memory, in which case the C-bit will
never be set. Or, L1 could use shadow paging, but again, if L0 needs to
read page tables, e.g. to load PDPTRs, the memory can't be encrypted if
L1 has any expectation of L0 doing the right thing.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03 16:01:12 -08:00
	return !(gpa & vcpu->arch.reserved_gpa_bits);
2021-02-03 16:01:08 -08:00
}

2021-02-03 16:01:09 -08:00
static inline bool kvm_vcpu_is_legal_aligned_gpa(struct kvm_vcpu *vcpu,
						 gpa_t gpa, gpa_t alignment)
{
	return IS_ALIGNED(gpa, alignment) && kvm_vcpu_is_legal_gpa(vcpu, gpa);
}

2021-02-03 16:01:08 -08:00
static inline bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
{
2021-02-03 16:01:09 -08:00
	return kvm_vcpu_is_legal_aligned_gpa(vcpu, gpa, PAGE_SIZE);
2020-09-24 12:42:49 -07:00
}

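The GPA legality helpers above reduce to a mask test, which the following userspace sketch tries to make concrete. The names `reserved_gpa_bits()`, `allow_c_bit()`, `is_legal_gpa()` and `is_legal_aligned_gpa()` are hypothetical, and deriving the reserved mask from a MAXPHYADDR value and then clearing the C-bit is an assumption about how such a mask could be built; the header itself only shows the final `gpa & reserved_gpa_bits` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gpa_t;

/* All bits at and above maxphyaddr are treated as reserved. */
static uint64_t reserved_gpa_bits(unsigned int maxphyaddr)
{
	assert(maxphyaddr > 0 && maxphyaddr < 64);
	return ~((1ull << maxphyaddr) - 1);
}

/* Assumption: for an SEV guest, the C-bit position is removed from the mask. */
static uint64_t allow_c_bit(uint64_t reserved, unsigned int c_bit_pos)
{
	assert(c_bit_pos < 64);
	return reserved & ~(1ull << c_bit_pos);
}

static bool is_legal_gpa(uint64_t reserved, gpa_t gpa)
{
	return !(gpa & reserved);
}

/* alignment is expected to be a power of two, e.g. a 4096-byte page. */
static bool is_legal_aligned_gpa(uint64_t reserved, gpa_t gpa, gpa_t alignment)
{
	return !(gpa & (alignment - 1)) && is_legal_gpa(reserved, gpa);
}
```

Keeping a single precomputed mask means every legality check is one AND, and treating the C-bit as legal only requires clearing one bit when the mask is built rather than special-casing each caller.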
2020-03-02 15:56:53 -08:00
static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
2021-04-20 18:08:50 -07:00
						 unsigned int leaf)
2020-03-02 15:56:32 -08:00
{
	u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);

2020-03-02 15:56:41 -08:00
	BUILD_BUG_ON(leaf >= ARRAY_SIZE(kvm_cpu_caps));
2020-03-02 15:56:53 -08:00
	*reg = kvm_cpu_caps[leaf];
2020-03-02 15:56:32 -08:00
}

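The load-time masking that kvm_cpu_caps enables can be sketched as follows. This is an illustrative model only: `host_caps`, `kvm_caps`, `NR_CAP_WORDS`, `caps_init()`, `cap_clear()`, `cap_has()` and `entry_override()` are hypothetical names, with `host_caps` standing in for the x86_capability array in boot_cpu_data and `kvm_caps` for kvm_cpu_caps. The feature encoding (word = feature / 32, bit = feature % 32) mirrors how x86 feature numbers are commonly laid out, but is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CAP_WORDS 4	/* hypothetical number of capability words */

static uint32_t host_caps[NR_CAP_WORDS];	/* stand-in for boot_cpu_data.x86_capability */
static uint32_t kvm_caps[NR_CAP_WORDS];		/* stand-in for kvm_cpu_caps */

/* Load time: snapshot host capabilities, masked by what the hypervisor supports. */
static void caps_init(const uint32_t supported[NR_CAP_WORDS])
{
	for (unsigned int i = 0; i < NR_CAP_WORDS; i++)
		kvm_caps[i] = host_caps[i] & supported[i];
}

/* Vendor code can drop a feature once, at load time. */
static void cap_clear(unsigned int feature)
{
	assert(feature / 32 < NR_CAP_WORDS);
	kvm_caps[feature / 32] &= ~(1u << (feature % 32));
}

static int cap_has(unsigned int feature)
{
	assert(feature / 32 < NR_CAP_WORDS);
	return !!(kvm_caps[feature / 32] & (1u << (feature % 32)));
}

/* Runtime: reporting a CPUID register is a plain copy of the precomputed word. */
static void entry_override(uint32_t *reg, unsigned int leaf)
{
	assert(leaf < NR_CAP_WORDS);
	*reg = kvm_caps[leaf];
}
```

The design choice being illustrated is that all masking decisions happen once when the module loads, so the runtime path only copies a word instead of recomputing masks per query.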
KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
Switch all queries (except XSAVES) of guest features from guest CPUID to
guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls
to guest_cpu_cap_has().
Keep guest_cpuid_has() around for XSAVES, but subsume its helper
guest_cpuid_get_register() and add a compile-time assertion to prevent
using guest_cpuid_has() for any other feature. Add yet another comment
for XSAVE to explain why KVM is allowed to query its raw guest CPUID.
Opportunistically drop the unused guest_cpuid_clear(), as there should be
no circumstance in which KVM needs to _clear_ a guest CPUID feature now
that everything is tracked via cpu_caps. E.g. KVM may need to _change_
a feature to emulate dynamic CPUID flags, but KVM should never need to
clear a feature in guest CPUID to prevent it from being used by the guest.
Delete the last remnants of the governed features framework, as the lone
holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for
governed vs. ungoverned features.
Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when
computing reserved CR4 bits is a nop when viewed as a whole, as KVM's
capabilities are already incorporated into the calculation, i.e. if a
feature is present in guest CPUID but unsupported by KVM, its CR4 bit
was already being marked as reserved, checking guest_cpu_cap_has() simply
double-stamps that it's a reserved bit.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20241128013424.4096668-51-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-27 17:34:17 -08:00
static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
					    unsigned int x86_feature)
2020-03-02 15:56:30 -08:00
{
	const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
	struct kvm_cpuid_entry2 *entry;
2024-11-27 17:34:17 -08:00
	u32 *reg;

	/*
	 * XSAVES is a special snowflake. Due to lack of a dedicated intercept
	 * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
	 * the guest if the host supports XSAVES and *XSAVE* is exposed to the
	 * guest. Because the guest can execute XSAVES and XRSTORS, i.e. can
	 * indirectly consume XSS, KVM must ensure XSS is zeroed when running
	 * the guest, i.e. must set XSAVES in vCPU capabilities. But to reject
	 * direct XSS reads and writes (to minimize the virtualization hole and
	 * honor userspace's CPUID), KVM needs to check the raw guest CPUID,
	 * not KVM's view of guest capabilities.
	 *
	 * For all other features, guest capabilities are accurate. Expand
	 * this allowlist with extreme vigilance.
	 */
	BUILD_BUG_ON(x86_feature != X86_FEATURE_XSAVES);
2020-03-02 15:56:30 -08:00

2022-07-12 02:06:45 +02:00
	entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
2020-03-02 15:56:30 -08:00
	if (!entry)
		return false;

2024-11-27 17:34:17 -08:00
	reg = __cpuid_entry_get_reg(entry, cpuid.reg);
2017-08-05 00:12:49 +02:00
	if (!reg)
		return false;
2014-01-24 16:48:44 +01:00

2019-12-17 13:32:42 -08:00
	return *reg & __feature_bit(x86_feature);
2014-01-24 16:48:44 +01:00
}

KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
Add kvm_vcpu_arch.is_amd_compatible to cache if a vCPU's vendor model is
compatible with AMD, i.e. if the vCPU vendor is AMD or Hygon, along with
helpers to check if a vCPU is compatible with AMD vs. Intel. To handle Intel
vs. AMD behavior related to masking the LVTPC entry, KVM will need to
check for vendor compatibility on every PMI injection, i.e. querying for
AMD will soon be a moderately hot path.
Note! This subtly (or maybe not-so-subtly) makes "Intel compatible" KVM's
default behavior, both if userspace omits (or never sets) CPUID 0x0 and if
userspace sets a completely unknown vendor. One could argue that KVM
should treat such vCPUs as not being compatible with Intel *or* AMD, but
that would add useless complexity to KVM.
KVM needs to do *something* in the face of vendor specific behavior, and
so unless KVM conjured up a magic third option, choosing to treat unknown
vendors as neither Intel nor AMD means that checks on AMD compatibility
would yield Intel behavior, and checks for Intel compatibility would yield
AMD behavior. And that's far worse as it would effectively yield random
behavior depending on whether KVM checked for AMD vs. Intel vs. !AMD vs.
!Intel. And practically speaking, all x86 CPUs follow either Intel or AMD
architecture, i.e. "supporting" an unknown third architecture adds no
value.
Deliberately don't convert any of the existing guest_cpuid_is_intel()
checks, as the Intel side of things is messier due to some flows explicitly
checking for exactly vendor==Intel, versus some flows assuming anything
that isn't "AMD compatible" gets Intel behavior. The Intel code will be
cleaned up in the future.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240405235603.1173076-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-05 16:55:54 -07:00
static inline bool guest_cpuid_is_amd_compatible(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.is_amd_compatible;
}

static inline bool guest_cpuid_is_intel_compatible(struct kvm_vcpu *vcpu)
{
	return !guest_cpuid_is_amd_compatible(vcpu);
}

2015-11-23 11:12:22 +01:00
static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *best;

2022-07-12 02:06:45 +02:00
	best = kvm_find_cpuid_entry(vcpu, 0x1);
2015-11-23 11:12:22 +01:00
	if (!best)
		return -1;

	return x86_family(best->eax);
}

static inline int guest_cpuid_model(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *best;

2022-07-12 02:06:45 +02:00
	best = kvm_find_cpuid_entry(vcpu, 0x1);
2015-11-23 11:12:22 +01:00
	if (!best)
		return -1;

	return x86_model(best->eax);
}

2022-04-11 18:19:45 +08:00
static inline bool cpuid_model_is_consistent(struct kvm_vcpu *vcpu)
{
	return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
}

2015-11-23 11:12:22 +01:00
static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *best;

2022-07-12 02:06:45 +02:00
	best = kvm_find_cpuid_entry(vcpu, 0x1);
2015-11-23 11:12:22 +01:00
	if (!best)
		return -1;

	return x86_stepping(best->eax);
}

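The three helpers above all decode fields of CPUID.0x1 EAX. The sketch below shows the conventional x86 decoding of that register as a standalone illustration; `cpuid1_family()`, `cpuid1_model()`, `cpuid1_stepping()` and `cpuid1_example()` are hypothetical names and are not the kernel's `x86_family()`, `x86_model()` and `x86_stepping()`, though they are intended to follow the same architectural rules (extended family is added when the base family is 0xF, extended model is prepended when the family is 6 or higher).

```c
#include <assert.h>
#include <stdint.h>

/* Family: bits 11:8, plus extended family (bits 27:20) when the base is 0xF. */
static unsigned int cpuid1_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/* Model: bits 7:4, with extended model (bits 19:16) as the high nibble. */
static unsigned int cpuid1_model(uint32_t eax)
{
	unsigned int model = (eax >> 4) & 0xf;

	if (cpuid1_family(eax) >= 0x6)
		model += ((eax >> 16) & 0xf) << 4;
	return model;
}

/* Stepping: bits 3:0. */
static unsigned int cpuid1_stepping(uint32_t eax)
{
	return eax & 0xf;
}

/* Usage example with a sample signature value (family 6, model 0x8e, stepping 0xc). */
static void cpuid1_example(void)
{
	const uint32_t eax = 0x000806ec;

	assert(cpuid1_family(eax) == 6);
	assert(cpuid1_model(eax) == 0x8e);
	assert(cpuid1_stepping(eax) == 0xc);
}
```

Unlike the guest_cpuid_*() helpers, this sketch has no "entry not found" case, so it returns unsigned values rather than using -1 as an error marker.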
2017-03-20 01:16:28 -07:00
static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT;
}

static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.msr_misc_features_enables &
		  MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
}

2020-03-02 15:56:41 -08:00
static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
{
2021-04-12 16:21:35 +12:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);
2020-03-02 15:56:41 -08:00
	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
}

static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
{
2021-04-12 16:21:35 +12:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);
2020-03-02 15:56:41 -08:00
	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
}
2020-03-02 15:56:46 -08:00
static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
{
2021-04-12 16:21:35 +12:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);
2020-03-02 15:56:46 -08:00
	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
}

static __always_inline bool kvm_cpu_cap_has(unsigned int x86_feature)
{
	return !!kvm_cpu_cap_get(x86_feature);
}
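
The kvm_cpu_cap_*() helpers above all follow the same pattern: split a feature number into a word index and a bit mask, then set, clear or test that bit in a capability array. The following is a minimal standalone sketch of that pattern, assuming the common "word * 32 + bit" encoding. Every sketch_* name and the array size are hypothetical stand-ins for illustration; they are not the kernel's __feature_leaf()/__feature_bit() definitions, which may perform additional translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_NR_LEAVES		4u
#define SKETCH_FEATURE(leaf, bit)	((leaf) * 32u + (bit))

/* One 32-bit capability word per tracked CPUID output register. */
static uint32_t sketch_caps[SKETCH_NR_LEAVES];

/* The upper part of the feature number selects the word. */
static unsigned int sketch_feature_leaf(unsigned int feature)
{
	return feature / 32u;
}

/* The low five bits select the bit inside that word. */
static uint32_t sketch_feature_bit(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31u);
}

static void sketch_cap_set(unsigned int feature)
{
	assert(sketch_feature_leaf(feature) < SKETCH_NR_LEAVES);
	sketch_caps[sketch_feature_leaf(feature)] |= sketch_feature_bit(feature);
}

static void sketch_cap_clear(unsigned int feature)
{
	assert(sketch_feature_leaf(feature) < SKETCH_NR_LEAVES);
	sketch_caps[sketch_feature_leaf(feature)] &= ~sketch_feature_bit(feature);
}

static bool sketch_cap_has(unsigned int feature)
{
	assert(sketch_feature_leaf(feature) < SKETCH_NR_LEAVES);
	return (sketch_caps[sketch_feature_leaf(feature)] &
		sketch_feature_bit(feature)) != 0;
}
```

Because the array is filled once and then only read, a lookup reduces to an index and a mask, which is presumably why the commit message favors a KVM-specific copy over applying boot_cpu_data at runtime.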
2020-03-02 15:56:45 -08:00
static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature)
{
	if (boot_cpu_has(x86_feature))
		kvm_cpu_cap_set(x86_feature);
}
2020-08-18 15:24:28 +00:00
static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
					 unsigned int kvm_feature)
{
	if (!vcpu->arch.pv_cpuid.enforce)
		return true;

	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
}
2024-11-27 17:34:06 -08:00
static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
					      unsigned int x86_feature)
2023-08-15 13:36:39 -07:00
{
2024-11-27 17:34:07 -08:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);
2023-08-15 13:36:39 -07:00
2024-11-27 17:34:07 -08:00
	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
2023-08-15 13:36:39 -07:00
}
2024-11-27 17:34:08 -08:00
static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
						unsigned int x86_feature)
2023-08-15 13:36:39 -07:00
{
2024-11-27 17:34:08 -08:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);

	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
}

static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
						 unsigned int x86_feature,
						 bool guest_has_cap)
{
	if (guest_has_cap)
2024-11-27 17:34:06 -08:00
		guest_cpu_cap_set(vcpu, x86_feature);
2024-11-27 17:34:08 -08:00
	else
		guest_cpu_cap_clear(vcpu, x86_feature);
}
2024-11-27 17:34:06 -08:00
static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
					      unsigned int x86_feature)
2023-08-15 13:36:39 -07:00
{
2024-11-27 17:34:07 -08:00
	unsigned int x86_leaf = __feature_leaf(x86_feature);
2023-08-15 13:36:39 -07:00
KVM: x86: Defer runtime updates of dynamic CPUID bits until CPUID emulation
Defer runtime CPUID updates until the next non-faulting CPUID emulation
or KVM_GET_CPUID2, which are the only paths in KVM that consume the
dynamic entries. Deferring the updates is especially beneficial to
nested VM-Enter/VM-Exit, as KVM will almost always detect multiple state
changes, not to mention the updates don't need to be realized while L2 is
active if CPUID is being intercepted by L1 (CPUID is a mandatory intercept
on Intel, but not AMD).
Deferring CPUID updates shaves several hundred cycles from nested VMX
roundtrips, as measured from L2 executing CPUID in a tight loop:
SKX 6850 => 6450
ICX 9000 => 8800
EMR 7900 => 7700
Alternatively, KVM could update only the CPUID leaves that are affected
by the state change, e.g. update XSAVE info only if XCR0 or XSS changes,
but that adds non-trivial complexity and doesn't solve the underlying
problem of nested transitions potentially changing both XCR0 and XSS, on
both nested VM-Enter and VM-Exit.
Skipping updates entirely if L2 is active and CPUID is being intercepted
by L1 could work for the common case. However, simply skipping updates if
L2 is active is *very* subtly dangerous and complex. Most KVM updates are
triggered by changes to the current vCPU state, which may be L2 state,
whereas performing updates only for L1 would require detecting changes
to L1 state. KVM would need to either track relevant L1 state, or defer
runtime CPUID updates until the next nested VM-Exit. The former is ugly
and complex, while the latter comes with similar dangers to deferring all
CPUID updates, and would only address the nested VM-Enter path.
To guard against using stale data, disallow querying dynamic CPUID feature
bits, i.e. features that KVM updates at runtime, via a compile-time
assertion in guest_cpu_cap_has(). Exempt MWAIT from the rule, as the
MISC_ENABLE_NO_MWAIT means that MWAIT is _conditionally_ a dynamic CPUID
feature.
Note, the rule could be enforced for MWAIT as well, e.g. by querying guest
CPUID in kvm_emulate_monitor_mwait, but there's no obvious advantage to
doing so, and allowing MWAIT for guest_cpuid_has() opens up a different can
of worms. MONITOR/MWAIT can't be virtualized (for a reasonable definition),
and the nature of the MWAIT_NEVER_UD_FAULTS and MISC_ENABLE_NO_MWAIT quirks
means checking X86_FEATURE_MWAIT outside of kvm_emulate_monitor_mwait() is
wrong for other reasons.
Beyond the aforementioned feature bits, the only other dynamic CPUID
(sub)leaves are the XSAVE sizes, and similar to MWAIT, consuming those
CPUID entries in KVM is all but guaranteed to be a bug. The layout for an
actual XSAVE buffer depends on the format (compacted or not) and
potentially the features that are actually enabled. E.g. see the logic in
fpstate_clear_xstate_component() needed to poke into the guest's effective
XSAVE state to clear MPX state on INIT. KVM does consume
CPUID.0xD.0.{EAX,EDX} in kvm_check_cpuid() and cpuid_get_supported_xcr0(),
but not EBX, which is the only dynamic output register in the leaf.
Link: https://lore.kernel.org/r/20241211013302.1347853-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-10 17:33:02 -08:00
	/*
	 * Except for MWAIT, querying dynamic feature bits is disallowed, so
	 * that KVM can defer runtime updates until the next CPUID emulation.
	 */
	BUILD_BUG_ON(x86_feature == X86_FEATURE_APIC ||
		     x86_feature == X86_FEATURE_OSXSAVE ||
		     x86_feature == X86_FEATURE_OSPKE);
2024-11-27 17:34:07 -08:00
	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
2023-08-15 13:36:39 -07:00
}
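
The commit message attached to guest_cpu_cap_has() describes deferring dynamic CPUID updates until something actually consumes them. A rough standalone sketch of that idea, assuming a simple dirty flag, might look like the following. All lazy_* names, the struct layout and the size formula are hypothetical placeholders chosen for illustration; none of them are taken from KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; field names are illustrative only. */
struct lazy_vcpu {
	uint64_t xcr0;
	uint64_t xss;
	uint32_t xsave_size;		/* cached "dynamic" CPUID output */
	bool cpuid_dirty;		/* set when the cache may be stale */
	unsigned int nr_recomputes;	/* counts the expensive updates */
};

/* Placeholder formula: a fixed base plus 64 bytes per enabled bit. */
static uint32_t lazy_compute_size(uint64_t mask)
{
	uint32_t size = 576;

	for (; mask; mask &= mask - 1)
		size += 64;
	return size;
}

/* State changes only mark the cached value stale. */
static void lazy_set_xcr0(struct lazy_vcpu *v, uint64_t val)
{
	v->xcr0 = val;
	v->cpuid_dirty = true;
}

static void lazy_set_xss(struct lazy_vcpu *v, uint64_t val)
{
	v->xss = val;
	v->cpuid_dirty = true;
}

/* The consumer pays for at most one recompute per batch of changes. */
static uint32_t lazy_emulate_cpuid(struct lazy_vcpu *v)
{
	if (v->cpuid_dirty) {
		v->xsave_size = lazy_compute_size(v->xcr0 | v->xss);
		v->nr_recomputes++;
		v->cpuid_dirty = false;
	}
	assert(!v->cpuid_dirty);	/* cached value is current here */
	return v->xsave_size;
}
```

In this sketch several back-to-back state changes collapse into a single recompute, which is the effect the commit message reports for nested VM-Enter/VM-Exit. The trade-off is that any reader bypassing lazy_emulate_cpuid() could observe stale data, which is the risk the BUILD_BUG_ON() in guest_cpu_cap_has() guards against.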
2023-09-13 20:42:17 +08:00
static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
2024-11-27 17:34:06 -08:00
	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
KVM: x86: Virtualize LAM for user pointer
Add support to allow guests to set the new CR3 control bits for Linear
Address Masking (LAM) and add implementation to get untagged address for
user pointers.
LAM modifies the canonical check for 64-bit linear addresses, allowing
software to use the masked/ignored address bits for metadata. Hardware
masks off the metadata bits before using the linear addresses to access
memory. LAM uses two new CR3 non-address bits, LAM_U48 (bit 62) and
LAM_U57 (bit 61), to configure LAM for user pointers. LAM also changes
VMENTER to allow both bits to be set in VMCS's HOST_CR3 and GUEST_CR3 for
virtualization.
When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
any of the two LAM control bits. However, when EPT is off, the actual CR3
used by the guest is generated from the shadow MMU root which is different
from the CR3 that is *set* by the guest, and KVM needs to manually apply
any active control bits to VMCS's GUEST_CR3 based on the cached CR3 *seen*
by the guest.
KVM manually checks guest's CR3 to make sure it points to a valid guest
physical address (i.e. to support smaller MAXPHYSADDR in the guest). Extend
this check to allow the two LAM control bits to be set. After check, LAM
bits of guest CR3 will be stripped off to extract guest physical address.
In case of nested, for a guest which supports LAM, both VMCS12's HOST_CR3
and GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when
L0 enters L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2
directly. KVM also manually checks VMCS12's HOST_CR3 and GUEST_CR3 being
valid physical address. Extend such check to allow the new LAM control bits
too.
Note, LAM doesn't have a global control bit to turn on/off LAM completely,
but purely depends on hardware's CPUID to determine it can be enabled or
not. That means, when EPT is on, even when KVM doesn't expose LAM to guest,
the guest can still set LAM control bits in CR3 w/o causing problem. This
is an unfortunate virtualization hole. KVM could choose to intercept CR3 in
this case and inject fault but this would hurt performance when running a
normal VM w/o LAM support. This is undesirable. Just choose to let the
guest do such an illegal thing, as the worst case is the guest being killed
when KVM eventually finds out about such illegal behaviour and that the
guest is misbehaving.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-12-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-13 20:42:22 +08:00
		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
2023-09-13 20:42:17 +08:00
	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
}
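
The CR3 check above can be sketched in isolation. The bit positions (LAM_U57 at bit 61, LAM_U48 at bit 62) come from the commit message; everything else here is an assumption made for illustration, including the sketch_* names, the struct, and the simplified "no bits at or above MAXPHYADDR" legality test, which may not match what kvm_vcpu_is_legal_gpa() actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define SKETCH_CR3_LAM_U57	(UINT64_C(1) << 61)
#define SKETCH_CR3_LAM_U48	(UINT64_C(1) << 62)

/* Hypothetical, heavily reduced view of the guest's capabilities. */
struct sketch_vcpu {
	bool has_lam;			/* guest was offered LAM */
	unsigned int maxphyaddr;	/* guest physical address width */
};

/* Assumed rule: an address is legal if no bit at or above the width is set. */
static bool sketch_is_legal_gpa(const struct sketch_vcpu *vcpu, uint64_t gpa)
{
	assert(vcpu->maxphyaddr > 0 && vcpu->maxphyaddr < 64);
	return (gpa >> vcpu->maxphyaddr) == 0;
}

/* Strip the LAM control bits only if the guest may use them, then validate. */
static bool sketch_is_legal_cr3(const struct sketch_vcpu *vcpu, uint64_t cr3)
{
	if (vcpu->has_lam)
		cr3 &= ~(SKETCH_CR3_LAM_U48 | SKETCH_CR3_LAM_U57);

	return sketch_is_legal_gpa(vcpu, cr3);
}
```

With this shape, a guest without LAM that sets either control bit fails the address check, because the bits are left in place and land above the physical address width.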
2024-11-27 17:34:16 -08:00
static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
{
KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
Switch all queries (except XSAVES) of guest features from guest CPUID to
guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls
to guest_cpu_cap_has().
Keep guest_cpuid_has() around for XSAVES, but subsume its helper
guest_cpuid_get_register() and add a compile-time assertion to prevent
using guest_cpuid_has() for any other feature. Add yet another comment
for XSAVE to explain why KVM is allowed to query its raw guest CPUID.
Opportunistically drop the unused guest_cpuid_clear(), as there should be
no circumstance in which KVM needs to _clear_ a guest CPUID feature now
that everything is tracked via cpu_caps. E.g. KVM may need to _change_
a feature to emulate dynamic CPUID flags, but KVM should never need to
clear a feature in guest CPUID to prevent it from being used by the guest.
Delete the last remnants of the governed features framework, as the lone
holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for
governed vs. ungoverned features.
Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when
computing reserved CR4 bits is a nop when viewed as a whole, as KVM's
capabilities are already incorporated into the calculation, i.e. if a
feature is present in guest CPUID but unsupported by KVM, its CR4 bit
was already being marked as reserved, checking guest_cpu_cap_has() simply
double-stamps that it's a reserved bit.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20241128013424.4096668-51-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-27 17:34:17 -08:00
	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_STIBP) ||
		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS) ||
		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_SSBD));
2024-11-27 17:34:16 -08:00
}

static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
{
2024-11-27 17:34:17 -08:00
	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB) ||
		guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB));
2024-11-27 17:34:16 -08:00
}
2011-11-23 16:30:32 +02:00
#endif