KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
|
|
|
#include <linux/moduleparam.h>
|
|
|
|
|
|
|
|
#include "x86_ops.h"
|
|
|
|
#include "vmx.h"
|
2024-11-12 15:37:20 +08:00
|
|
|
#include "mmu.h"
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
#include "nested.h"
|
|
|
|
#include "pmu.h"
|
ARM:
* Move a lot of state that was previously stored on a per vcpu
basis into a per-CPU area, because it is only pertinent to the
host while the vcpu is loaded. This results in better state
tracking, and a smaller vcpu structure.
* Add full handling of the ERET/ERETAA/ERETAB instructions in
nested virtualisation. The last two instructions also require
emulating part of the pointer authentication extension.
As a result, the trap handling of pointer authentication has
been greatly simplified.
* Turn the global (and not very scalable) LPI translation cache
into a per-ITS, scalable cache, making non directly injected
LPIs much cheaper to make visible to the vcpu.
* A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
* Allocate PPIs and SGIs outside of the vcpu structure, allowing
for smaller EL2 mapping and some flexibility in implementing
more or less than 32 private IRQs.
* Purge stale mpidr_data if a vcpu is created after the MPIDR
map has been created.
* Preserve vcpu-specific ID registers across a vcpu reset.
* Various minor cleanups and improvements.
LoongArch:
* Add ParaVirt IPI support.
* Add software breakpoint support.
* Add mmio trace events support.
RISC-V:
* Support guest breakpoints using ebreak
* Introduce per-VCPU mp_state_lock and reset_cntx_lock
* Virtualize SBI PMU snapshot and counter overflow interrupts
* New selftests for SBI PMU and Guest ebreak
* Some preparatory work for both TDX and SNP page fault handling.
This also cleans up the page fault path, so that the priorities
of various kinds of fauls (private page, no memory, write
to read-only slot, etc.) are easier to follow.
x86:
* Minimize amount of time that shadow PTEs remain in the special
REMOVED_SPTE state. This is a state where the mmu_lock is held for
reading but concurrent accesses to the PTE have to spin; shortening
its use allows other vCPUs to repopulate the zapped region while
the zapper finishes tearing down the old, defunct page tables.
* Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field,
which is defined by hardware but left for software use. This lets KVM
communicate its inability to map GPAs that set bits 51:48 on hosts
without 5-level nested page tables. Guest firmware is expected to
use the information when mapping BARs; this avoids that they end up at
a legal, but unmappable, GPA.
* Fixed a bug where KVM would not reject accesses to MSR that aren't
supposed to exist given the vCPU model and/or KVM configuration.
* As usual, a bunch of code cleanups.
x86 (AMD):
* Implement a new and improved API to initialize SEV and SEV-ES VMs, which
will also be extendable to SEV-SNP. The new API specifies the desired
encryption in KVM_CREATE_VM and then separately initializes the VM.
The new API also allows customizing the desired set of VMSA features;
the features affect the measurement of the VM's initial state, and
therefore enabling them cannot be done tout court by the hypervisor.
While at it, the new API includes two bugfixes that couldn't be
applied to the old one without a flag day in userspace or without
affecting the initial measurement. When a SEV-ES VM is created with
the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are
rejected once the VMSA has been encrypted. Also, the FPU and AVX
state will be synchronized and encrypted too.
* Support for GHCB version 2 as applicable to SEV-ES guests. This, once
more, is only accessible when using the new KVM_SEV_INIT2 flow for
initialization of SEV-ES VMs.
x86 (Intel):
* An initial bunch of prerequisite patches for Intel TDX were merged.
They generally don't do anything interesting. The only somewhat user
visible change is a new debugging mode that checks that KVM's MMU
never triggers a #VE virtualization exception in the guest.
* Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to
L1, as per the SDM.
Generic:
* Use vfree() instead of kvfree() for allocations that always use vcalloc()
or __vcalloc().
* Remove .change_pte() MMU notifier - the changes to non-KVM code are
small and Andrew Morton asked that I also take those through the KVM
tree. The callback was only ever implemented by KVM (which was also the
original user of MMU notifiers) but it had been nonfunctional ever since
calls to set_pte_at_notify were wrapped with invalidate_range_start
and invalidate_range_end... in 2012.
Selftests:
* Enhance the demand paging test to allow for better reporting and stressing
of UFFD performance.
* Convert the steal time test to generate TAP-friendly output.
* Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed
time across two different clock domains.
* Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT.
* Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell
script, to play nice with running in a minimal userspace environment.
* Allow skipping the RSEQ test's sanity check that the vCPU was able to
complete a reasonable number of KVM_RUNs, as the assert can fail on a
completely valid setup. If the test is run on a large-ish system that is
otherwise idle, and the test isn't affined to a low-ish number of CPUs, the
vCPU task can be repeatedly migrated to CPUs that are in deep sleep states,
which results in the vCPU having very little net runtime before the next
migration due to high wakeup latencies.
* Define _GNU_SOURCE for all selftests to fix a warning that was introduced by
a change to kselftest_harness.h late in the 6.9 cycle, and because forcing
every test to #define _GNU_SOURCE is painful.
* Provide a global pseudo-RNG instance for all tests, so that library code can
generate random, but determinstic numbers.
* Use the global pRNG to randomly force emulation of select writes from guest
code on x86, e.g. to help validate KVM's emulation of locked accesses.
* Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception
handlers at VM creation, instead of forcing tests to manually trigger the
related setup.
Documentation:
* Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmZE878UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOukQf+LcvZsWtrC7Wd5K9SQbYXaS4Rk6P6
JHoQW2d0hUN893J2WibEw+l1J/0vn5JumqHXyZgJ7CbaMtXkWWQTwDSDLuURUKpv
XNB3Sb17G87NH+s1tOh0tA9h5upbtlHVHvrtIwdbb9+XHgQ6HTL4uk+HdfO/p9fW
cWBEZAKoWcCIa99Numv3pmq5vdrvBlNggwBugBS8TH69EKMw+V1Vu1SFkIdNDTQk
NJJ28cohoP3wnwlIHaXSmU4RujipPH3Lm/xupyA5MwmzO713eq2yUqV49jzhD5/I
MA4Ruvgrdm4wpp89N9lQMyci91u6q7R9iZfMu0tSg2qYI3UPKIdstd8sOA==
=2lED
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- Move a lot of state that was previously stored on a per vcpu basis
into a per-CPU area, because it is only pertinent to the host while
the vcpu is loaded. This results in better state tracking, and a
smaller vcpu structure.
- Add full handling of the ERET/ERETAA/ERETAB instructions in nested
virtualisation. The last two instructions also require emulating
part of the pointer authentication extension. As a result, the trap
handling of pointer authentication has been greatly simplified.
- Turn the global (and not very scalable) LPI translation cache into
a per-ITS, scalable cache, making non directly injected LPIs much
cheaper to make visible to the vcpu.
- A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
- Allocate PPIs and SGIs outside of the vcpu structure, allowing for
smaller EL2 mapping and some flexibility in implementing more or
less than 32 private IRQs.
- Purge stale mpidr_data if a vcpu is created after the MPIDR map has
been created.
- Preserve vcpu-specific ID registers across a vcpu reset.
- Various minor cleanups and improvements.
LoongArch:
- Add ParaVirt IPI support
- Add software breakpoint support
- Add mmio trace events support
RISC-V:
- Support guest breakpoints using ebreak
- Introduce per-VCPU mp_state_lock and reset_cntx_lock
- Virtualize SBI PMU snapshot and counter overflow interrupts
- New selftests for SBI PMU and Guest ebreak
- Some preparatory work for both TDX and SNP page fault handling.
This also cleans up the page fault path, so that the priorities of
various kinds of fauls (private page, no memory, write to read-only
slot, etc.) are easier to follow.
x86:
- Minimize amount of time that shadow PTEs remain in the special
REMOVED_SPTE state.
This is a state where the mmu_lock is held for reading but
concurrent accesses to the PTE have to spin; shortening its use
allows other vCPUs to repopulate the zapped region while the zapper
finishes tearing down the old, defunct page tables.
- Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID
field, which is defined by hardware but left for software use.
This lets KVM communicate its inability to map GPAs that set bits
51:48 on hosts without 5-level nested page tables. Guest firmware
is expected to use the information when mapping BARs; this avoids
that they end up at a legal, but unmappable, GPA.
- Fixed a bug where KVM would not reject accesses to MSR that aren't
supposed to exist given the vCPU model and/or KVM configuration.
- As usual, a bunch of code cleanups.
x86 (AMD):
- Implement a new and improved API to initialize SEV and SEV-ES VMs,
which will also be extendable to SEV-SNP.
The new API specifies the desired encryption in KVM_CREATE_VM and
then separately initializes the VM. The new API also allows
customizing the desired set of VMSA features; the features affect
the measurement of the VM's initial state, and therefore enabling
them cannot be done tout court by the hypervisor.
While at it, the new API includes two bugfixes that couldn't be
applied to the old one without a flag day in userspace or without
affecting the initial measurement. When a SEV-ES VM is created with
the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected
once the VMSA has been encrypted. Also, the FPU and AVX state will
be synchronized and encrypted too.
- Support for GHCB version 2 as applicable to SEV-ES guests.
This, once more, is only accessible when using the new
KVM_SEV_INIT2 flow for initialization of SEV-ES VMs.
x86 (Intel):
- An initial bunch of prerequisite patches for Intel TDX were merged.
They generally don't do anything interesting. The only somewhat
user visible change is a new debugging mode that checks that KVM's
MMU never triggers a #VE virtualization exception in the guest.
- Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig
VM-Exit to L1, as per the SDM.
Generic:
- Use vfree() instead of kvfree() for allocations that always use
vcalloc() or __vcalloc().
- Remove .change_pte() MMU notifier - the changes to non-KVM code are
small and Andrew Morton asked that I also take those through the
KVM tree.
The callback was only ever implemented by KVM (which was also the
original user of MMU notifiers) but it had been nonfunctional ever
since calls to set_pte_at_notify were wrapped with
invalidate_range_start and invalidate_range_end... in 2012.
Selftests:
- Enhance the demand paging test to allow for better reporting and
stressing of UFFD performance.
- Convert the steal time test to generate TAP-friendly output.
- Fix a flaky false positive in the xen_shinfo_test due to comparing
elapsed time across two different clock domains.
- Skip the MONITOR/MWAIT test if the host doesn't actually support
MWAIT.
- Avoid unnecessary use of "sudo" in the NX hugepage test wrapper
shell script, to play nice with running in a minimal userspace
environment.
- Allow skipping the RSEQ test's sanity check that the vCPU was able
to complete a reasonable number of KVM_RUNs, as the assert can fail
on a completely valid setup.
If the test is run on a large-ish system that is otherwise idle,
and the test isn't affined to a low-ish number of CPUs, the vCPU
task can be repeatedly migrated to CPUs that are in deep sleep
states, which results in the vCPU having very little net runtime
before the next migration due to high wakeup latencies.
- Define _GNU_SOURCE for all selftests to fix a warning that was
introduced by a change to kselftest_harness.h late in the 6.9
cycle, and because forcing every test to #define _GNU_SOURCE is
painful.
- Provide a global pseudo-RNG instance for all tests, so that library
code can generate random, but determinstic numbers.
- Use the global pRNG to randomly force emulation of select writes
from guest code on x86, e.g. to help validate KVM's emulation of
locked accesses.
- Allocate and initialize x86's GDT, IDT, TSS, segments, and default
exception handlers at VM creation, instead of forcing tests to
manually trigger the related setup.
Documentation:
- Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits)
selftests/kvm: remove dead file
KVM: selftests: arm64: Test vCPU-scoped feature ID registers
KVM: selftests: arm64: Test that feature ID regs survive a reset
KVM: selftests: arm64: Store expected register value in set_id_regs
KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope
KVM: arm64: Only reset vCPU-scoped feature ID regs once
KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
KVM: arm64: Rename is_id_reg() to imply VM scope
KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
KVM: arm64: Fix hvhe/nvhe early alias parsing
KVM: SEV: Allow per-guest configuration of GHCB protocol version
KVM: SEV: Add GHCB handling for termination requests
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: x86: Explicitly zero kvm_caps during vendor module load
KVM: x86: Fully re-initialize supported_mce_cap on vendor module load
KVM: x86: Fully re-initialize supported_vm_types on vendor module load
KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns
KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
...
2024-05-15 14:46:43 -07:00
|
|
|
#include "posted_intr.h"
|
KVM: VMX: Initialize TDX during KVM module load
Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) TDX module must be initialized
properly to a working state; 2) A per-cpu TDX initialization, a.k.a the
TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can
run any other TDX SEAMCALLs.
The TDX host core-kernel provides two functions to do the above two
respectively: tdx_enable() and tdx_cpu_enable().
There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.
Choose to initialize TDX during KVM module loading time:
Initializing TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size(~1/256) of system memory
as metadata used by TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.
Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.
alloc_contig_pages() can fail. If initializing TDX when creating the
first TDX guest, then there's chance that KVM won't be able to run any
TDX guests albeit KVM _declares_ to be able to support TDX.
This isn't good for the user.
On the other hand, initializing TDX at KVM module loading time can make
sure KVM is providing a consistent view of whether KVM can support TDX
to the user.
Always only try to initialize TDX after VMX has been initialized. TDX
is based on VMX, and if VMX fails to initialize then TDX is likely to be
broken anyway. Also, in practice, supporting TDX will require part of
VMX and common x86 infrastructure in working order, so TDX cannot be
enabled alone w/o VMX support.
There are two cases that can result in failure to initialize TDX: 1) TDX
cannot be supported (e.g., because of TDX is not supported or enabled by
hardware, or module is not loaded, or missing some dependency in KVM's
configuration); 2) Any unexpected error during TDX bring-up. For the
first case only mark TDX is disabled but still allow KVM module to be
loaded. For the second case just fail to load the KVM module so that
the user can be aware.
Because TDX costs additional memory, don't enable TDX by default. Add a
new module parameter 'enable_tdx' to allow the user to opt-in.
Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually teardown TDX). They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc but it's not end of the world.
Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel. After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining CPU, or during
suspend/resume. TDX host core-kernel code internally tracks TDX status
and can handle "multiple enabling" scenario.
Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guide KVM TDX
code. Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST
because in the longer term there's a use case that requires making
SEAMCALLs w/o KVM as mentioned by Dan [1].
Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <162f9dee05c729203b9ad6688db1ca2960b4b502.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-15 22:52:41 +13:00
|
|
|
#include "tdx.h"
|
KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check
Change to report the KVM_CAP_MAX_VCPUS extension from globally to per-VM
to allow userspace to be able to query maximum vCPUs for TDX guest via
checking the KVM_CAP_MAX_VCPU extension on per-VM basis.
Today KVM x86 reports KVM_MAX_VCPUS as guest's maximum vCPUs for all
guests globally, and userspace, i.e. Qemu, queries the KVM_MAX_VCPUS
extension globally but not on per-VM basis.
TDX has its own limit of maximum vCPUs it can support for all TDX guests
in addition to KVM_MAX_VCPUS. TDX module reports this limit via the
MAX_VCPU_PER_TD global metadata. Different modules may report different
values. In practice, the reported value reflects the maximum logical
CPUs that ALL the platforms that the module supports can possibly have.
Note some old modules may also not support this metadata, in which case
the limit is U16_MAX.
The current way to always report KVM_MAX_VCPUS in the KVM_CAP_MAX_VCPUS
extension is not enough for TDX. To accommodate TDX, change to report
the KVM_CAP_MAX_VCPUS extension on per-VM basis.
Specifically, override kvm->max_vcpus in tdx_vm_init() for TDX guest,
and report kvm->max_vcpus in the KVM_CAP_MAX_VCPUS extension check.
Change to report "the number of logical CPUs the platform has" as the
maximum vCPUs for TDX guest. Simply forwarding the MAX_VCPU_PER_TD
reported by the TDX module would result in an unpredictable ABI because
the reported value to userspace would be depending on whims of TDX
modules.
This works in practice because of the MAX_VCPU_PER_TD reported by the
TDX module will never be smaller than the one reported to userspace.
But to make sure KVM never reports an unsupported value, sanity check
the MAX_VCPU_PER_TD reported by TDX module is not smaller than the
number of logical CPUs the platform has, otherwise refuse to use TDX.
Note, when creating a TDX guest, TDX actually requires the "maximum
vCPUs for _this_ TDX guest" as an input to initialize the TDX guest.
But TDX guest's maximum vCPUs is not part of TDREPORT thus not part of
attestation, thus there's no need to allow userspace to explicitly
_configure_ the maximum vCPUs on per-VM basis. KVM will simply use
kvm->max_vcpus as input when initializing the TDX guest.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:31 -07:00
|
|
|
#include "tdx_arch.h"
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-03-14 14:06:48 -04:00
|
|
|
#ifdef CONFIG_KVM_INTEL_TDX
|
|
|
|
static_assert(offsetof(struct vcpu_vmx, vt) == offsetof(struct vcpu_tdx, vt));
|
|
|
|
|
KVM: TDX: Handle vCPU dissociation
Handle vCPUs dissociations by invoking SEAMCALL TDH.VP.FLUSH which flushes
the address translation caches and cached TD VMCS of a TD vCPU in its
associated pCPU.
In TDX, a vCPUs can only be associated with one pCPU at a time, which is
done by invoking SEAMCALL TDH.VP.ENTER. For a successful association, the
vCPU must be dissociated from its previous associated pCPU.
To facilitate vCPU dissociation, introduce a per-pCPU list
associated_tdvcpus. Add a vCPU into this list when it's loaded into a new
pCPU (i.e. when a vCPU is loaded for the first time or migrated to a new
pCPU).
vCPU dissociations can happen under below conditions:
- On the op hardware_disable is called.
This op is called when virtualization is disabled on a given pCPU, e.g.
when hot-unplug a pCPU or machine shutdown/suspend.
In this case, dissociate all vCPUs from the pCPU by iterating its
per-pCPU list associated_tdvcpus.
- On vCPU migration to a new pCPU.
Before adding a vCPU into associated_tdvcpus list of the new pCPU,
dissociation from its old pCPU is required, which is performed by issuing
an IPI and executing SEAMCALL TDH.VP.FLUSH on the old pCPU.
On a successful dissociation, the vCPU will be removed from the
associated_tdvcpus list of its previously associated pCPU.
- On tdx_mmu_release_hkid() is called.
TDX mandates that all vCPUs must be disassociated prior to the release of
an hkid. Therefore, dissociation of all vCPUs is a must before executing
the SEAMCALL TDH.MNG.VPFLUSHDONE and subsequently freeing the hkid.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073858.22312-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-12 15:38:58 +08:00
|
|
|
static void vt_disable_virtualization_cpu(void)
|
|
|
|
{
|
|
|
|
/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
|
|
|
|
if (enable_tdx)
|
|
|
|
tdx_disable_virtualization_cpu();
|
|
|
|
vmx_disable_virtualization_cpu();
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
static __init int vt_hardware_setup(void)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = vmx_hardware_setup();
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2025-05-22 17:11:35 -07:00
|
|
|
if (enable_tdx)
|
|
|
|
tdx_hardware_setup();
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2025-02-25 12:45:13 -05:00
|
|
|
static int vt_vm_init(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return tdx_vm_init(kvm);
|
|
|
|
|
|
|
|
return vmx_vm_init(kvm);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_vm_pre_destroy(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return tdx_mmu_release_hkid(kvm);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_vm_destroy(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return tdx_vm_destroy(kvm);
|
|
|
|
|
|
|
|
vmx_vm_destroy(kvm);
|
|
|
|
}
|
|
|
|
|
2024-10-30 12:00:35 -07:00
|
|
|
static int vt_vcpu_precreate(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_vcpu_precreate(kvm);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_vcpu_create(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return tdx_vcpu_create(vcpu);
|
|
|
|
|
|
|
|
return vmx_vcpu_create(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_vcpu_free(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_vcpu_free(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_vcpu_free(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
|
|
|
|
{
|
2025-02-22 09:47:50 +08:00
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_vcpu_reset(vcpu, init_event);
|
2024-10-30 12:00:35 -07:00
|
|
|
return;
|
2025-02-22 09:47:50 +08:00
|
|
|
}
|
2024-10-30 12:00:35 -07:00
|
|
|
|
|
|
|
vmx_vcpu_reset(vcpu, init_event);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Handle vCPU dissociation
Handle vCPUs dissociations by invoking SEAMCALL TDH.VP.FLUSH which flushes
the address translation caches and cached TD VMCS of a TD vCPU in its
associated pCPU.
In TDX, a vCPUs can only be associated with one pCPU at a time, which is
done by invoking SEAMCALL TDH.VP.ENTER. For a successful association, the
vCPU must be dissociated from its previous associated pCPU.
To facilitate vCPU dissociation, introduce a per-pCPU list
associated_tdvcpus. Add a vCPU into this list when it's loaded into a new
pCPU (i.e. when a vCPU is loaded for the first time or migrated to a new
pCPU).
vCPU dissociations can happen under below conditions:
- On the op hardware_disable is called.
This op is called when virtualization is disabled on a given pCPU, e.g.
when hot-unplug a pCPU or machine shutdown/suspend.
In this case, dissociate all vCPUs from the pCPU by iterating its
per-pCPU list associated_tdvcpus.
- On vCPU migration to a new pCPU.
Before adding a vCPU into associated_tdvcpus list of the new pCPU,
dissociation from its old pCPU is required, which is performed by issuing
an IPI and executing SEAMCALL TDH.VP.FLUSH on the old pCPU.
On a successful dissociation, the vCPU will be removed from the
associated_tdvcpus list of its previously associated pCPU.
- On tdx_mmu_release_hkid() is called.
TDX mandates that all vCPUs must be disassociated prior to the release of
an hkid. Therefore, dissociation of all vCPUs is a must before executing
the SEAMCALL TDH.MNG.VPFLUSHDONE and subsequently freeing the hkid.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073858.22312-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-12 15:38:58 +08:00
|
|
|
static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_vcpu_load(vcpu, cpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_vcpu_load(vcpu, cpu);
|
|
|
|
}
|
|
|
|
|
2025-02-19 07:43:51 -05:00
|
|
|
static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Basic TDX does not support feature PML. KVM does not enable PML in
|
|
|
|
* TD's VMCS, nor does it allocate or flush PML buffer for TDX.
|
|
|
|
*/
|
|
|
|
if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_update_cpu_dirty_logging(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-01-29 11:58:55 +02:00
|
|
|
static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_prepare_switch_to_guest(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_prepare_switch_to_guest(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_vcpu_put(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_vcpu_put(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_vcpu_put(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-01-29 11:58:54 +02:00
|
|
|
static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return tdx_vcpu_pre_run(vcpu);
|
|
|
|
|
|
|
|
return vmx_vcpu_pre_run(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-06-10 16:20:04 -07:00
|
|
|
static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
|
2025-01-29 11:58:54 +02:00
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
2025-06-10 16:20:04 -07:00
|
|
|
return tdx_vcpu_run(vcpu, run_flags);
|
2025-01-29 11:58:54 +02:00
|
|
|
|
2025-06-10 16:20:04 -07:00
|
|
|
return vmx_vcpu_run(vcpu, run_flags);
|
2025-01-29 11:58:54 +02:00
|
|
|
}
|
|
|
|
|
KVM: TDX: Add a place holder to handle TDX VM exit
Introduce the wiring for handling TDX VM exits by implementing the
callbacks .get_exit_info(), .get_entry_info(), and .handle_exit().
Additionally, add error handling during the TDX VM exit flow, and add a
place holder to handle various exit reasons.
Store VMX exit reason and exit qualification in struct vcpu_vt for TDX,
so that TDX/VMX can use the same helpers to get exit reason and exit
qualification. Store extended exit qualification and exit GPA info in
struct vcpu_tdx because they are used by TDX code only.
Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
operations due to secure EPT or TD EPOCH. If the contention occurs,
the return value will have TDX_OPERAND_BUSY set, prompting the vCPU to
attempt re-entry into the guest with EXIT_FASTPATH_EXIT_HANDLED,
not EXIT_FASTPATH_REENTER_GUEST, so that the interrupts pending during
IN_GUEST_MODE can be delivered for sure. Otherwise, the requester of
KVM_REQ_OUTSIDE_GUEST_MODE may be blocked endlessly.
Error Handling:
- TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the
CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX
isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM
firmware is not loaded or disabled.
- TDX_ERROR: This indicates some check failed in the TDX module, preventing
the vCPU from running.
- Failed VM Entry: Exit to userspace with KVM_EXIT_FAIL_ENTRY. Handle it
separately before handling TDX_NON_RECOVERABLE because when off-TD debug
is not enabled, TDX_NON_RECOVERABLE is set.
- TDX_NON_RECOVERABLE: Set by the TDX module when the error is
non-recoverable, indicating that the TDX guest is dead or the vCPU is
disabled.
A special case is triple fault, which also sets TDX_NON_RECOVERABLE but
exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX case.
- Any unhandled VM exit reason will also return to userspace with
KVM_EXIT_INTERNAL_ERROR.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Message-ID: <20250222014225.897298-4-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-06 13:27:04 -05:00
|
|
|
static int vt_handle_exit(struct kvm_vcpu *vcpu,
|
|
|
|
enum exit_fastpath_completion fastpath)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return tdx_handle_exit(vcpu, fastpath);
|
|
|
|
|
|
|
|
return vmx_handle_exit(vcpu, fastpath);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Implement callbacks for MSR operations
Add functions to implement MSR related callbacks, .set_msr(), .get_msr(),
and .has_emulated_msr(), for preparation of handling hypercalls from TDX
guest for PV RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX.
There are three classes of MSR virtualization for TDX.
- Non-configurable: TDX module directly virtualizes it. VMM can't configure
it, the value set by KVM_SET_MSRS is ignored.
- Configurable: TDX module directly virtualizes it. VMM can configure it at
VM creation time. The value set by KVM_SET_MSRS is used.
- #VE case: TDX guest would issue TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>
and VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used.
For the MSRs belonging to the #VE case, the TDX module injects #VE to the
TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in
TDX Module ABI Spec.
Upon #VE, the TDX guest may call TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>,
which are defined in GHCI (Guest-Host Communication Interface) so that the
host VMM (e.g. KVM) can virtualize the MSRs.
TDX doesn't allow VMM to configure interception of MSR accesses. Ignore
KVM_REQ_MSR_FILTER_CHANGED for TDX guest. If the userspace has set any
MSR filters, it will be applied when handling
TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}> in a later patch.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250227012021.1778144-9-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-27 09:20:09 +08:00
|
|
|
static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
|
|
|
{
|
|
|
|
if (unlikely(is_td_vcpu(vcpu)))
|
|
|
|
return tdx_set_msr(vcpu, msr_info);
|
|
|
|
|
|
|
|
return vmx_set_msr(vcpu, msr_info);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The kvm parameter can be NULL (module initialization, or invocation before
|
|
|
|
* VM creation). Be sure to check the kvm parameter before using it.
|
|
|
|
*/
|
|
|
|
static bool vt_has_emulated_msr(struct kvm *kvm, u32 index)
|
|
|
|
{
|
|
|
|
if (kvm && is_td(kvm))
|
|
|
|
return tdx_has_emulated_msr(index);
|
|
|
|
|
|
|
|
return vmx_has_emulated_msr(kvm, index);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
|
|
|
{
|
|
|
|
if (unlikely(is_td_vcpu(vcpu)))
|
|
|
|
return tdx_get_msr(vcpu, msr_info);
|
|
|
|
|
|
|
|
return vmx_get_msr(vcpu, msr_info);
|
|
|
|
}
|
|
|
|
|
2025-06-10 15:57:25 -07:00
|
|
|
static void vt_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
|
KVM: TDX: Implement callbacks for MSR operations
Add functions to implement MSR related callbacks, .set_msr(), .get_msr(),
and .has_emulated_msr(), for preparation of handling hypercalls from TDX
guest for PV RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX.
There are three classes of MSR virtualization for TDX.
- Non-configurable: TDX module directly virtualizes it. VMM can't configure
it, the value set by KVM_SET_MSRS is ignored.
- Configurable: TDX module directly virtualizes it. VMM can configure it at
VM creation time. The value set by KVM_SET_MSRS is used.
- #VE case: TDX guest would issue TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>
and VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used.
For the MSRs belonging to the #VE case, the TDX module injects #VE to the
TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in
TDX Module ABI Spec.
Upon #VE, the TDX guest may call TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>,
which are defined in GHCI (Guest-Host Communication Interface) so that the
host VMM (e.g. KVM) can virtualize the MSRs.
TDX doesn't allow VMM to configure interception of MSR accesses. Ignore
KVM_REQ_MSR_FILTER_CHANGED for TDX guest. If the userspace has set any
MSR filters, it will be applied when handling
TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}> in a later patch.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250227012021.1778144-9-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-27 09:20:09 +08:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* TDX doesn't allow VMM to configure interception of MSR accesses.
|
|
|
|
* TDX guest requests MSR accesses by calling TDVMCALL. The MSR
|
|
|
|
* filters will be applied when handling the TDVMCALL for RDMSR/WRMSR
|
|
|
|
* if the userspace has set any.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
2025-06-10 15:57:25 -07:00
|
|
|
vmx_recalc_msr_intercepts(vcpu);
|
KVM: TDX: Implement callbacks for MSR operations
Add functions to implement MSR related callbacks, .set_msr(), .get_msr(),
and .has_emulated_msr(), for preparation of handling hypercalls from TDX
guest for PV RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX.
There are three classes of MSR virtualization for TDX.
- Non-configurable: TDX module directly virtualizes it. VMM can't configure
it, the value set by KVM_SET_MSRS is ignored.
- Configurable: TDX module directly virtualizes it. VMM can configure it at
VM creation time. The value set by KVM_SET_MSRS is used.
- #VE case: TDX guest would issue TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>
and VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used.
For the MSRs belonging to the #VE case, the TDX module injects #VE to the
TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in
TDX Module ABI Spec.
Upon #VE, the TDX guest may call TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>,
which are defined in GHCI (Guest-Host Communication Interface) so that the
host VMM (e.g. KVM) can virtualize the MSRs.
TDX doesn't allow VMM to configure interception of MSR accesses. Ignore
KVM_REQ_MSR_FILTER_CHANGED for TDX guest. If the userspace has set any
MSR filters, it will be applied when handling
TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}> in a later patch.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250227012021.1778144-9-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-27 09:20:09 +08:00
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:10 +08:00
|
|
|
static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return tdx_complete_emulated_msr(vcpu, err);
|
|
|
|
|
2025-03-18 00:35:08 -06:00
|
|
|
return vmx_complete_emulated_msr(vcpu, err);
|
2025-02-27 09:20:10 +08:00
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:49 +08:00
|
|
|
#ifdef CONFIG_KVM_SMM
|
|
|
|
static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
|
|
|
|
{
|
|
|
|
if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_smi_allowed(vcpu, for_injection);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
|
|
|
|
{
|
|
|
|
if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_enter_smm(vcpu, smram);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
|
|
|
|
{
|
|
|
|
if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_leave_smm(vcpu, smram);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* RSM will cause a vmexit anyway. */
|
|
|
|
vmx_enable_smi_window(vcpu);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2025-02-27 09:20:14 +08:00
|
|
|
static int vt_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
|
|
|
|
void *insn, int insn_len)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* For TDX, this can only be triggered for MMIO emulation. Let the
|
|
|
|
* guest retry after installing the SPTE with suppress #VE bit cleared,
|
|
|
|
* so that the guest will receive #VE when retry. The guest is expected
|
|
|
|
* to call TDG.VP.VMCALL<MMIO> to request VMM to do MMIO emulation on
|
|
|
|
* #VE.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return X86EMUL_RETRY_INSTR;
|
|
|
|
|
|
|
|
return vmx_check_emulate_instruction(vcpu, emul_type, insn, insn_len);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:50 +08:00
|
|
|
static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* INIT and SIPI are always blocked for TDX, i.e., INIT handling and
|
|
|
|
* the OP vcpu_deliver_sipi_vector() won't be called.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return vmx_apic_init_signal_blocked(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:53 +08:00
|
|
|
static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* Only x2APIC mode is supported for TD. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
return vmx_set_virtual_apic_mode(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
return vmx_hwapic_isr_update(vcpu, max_isr);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:45 +08:00
|
|
|
static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
return vmx_sync_pir_to_irr(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
|
|
|
|
int trig_mode, int vector)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(apic->vcpu)) {
|
|
|
|
tdx_deliver_interrupt(apic, delivery_mode, trig_mode,
|
|
|
|
vector);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:13 +08:00
|
|
|
static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_vcpu_after_set_cpuid(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_update_exception_bitmap(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_segment_base(vcpu, seg);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
|
|
|
|
int seg)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
memset(var, 0, sizeof(*var));
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_get_segment(vcpu, var, seg);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
|
|
|
|
int seg)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_segment(vcpu, var, seg);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_get_cpl(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_cpl(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_get_cpl_no_cache(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_cpl_no_cache(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
*db = 0;
|
|
|
|
*l = 0;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_get_cs_db_l_bits(vcpu, db, l);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool vt_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return vmx_is_valid_cr0(vcpu, cr0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_cr0(vcpu, cr0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool vt_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return vmx_is_valid_cr4(vcpu, cr4);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_cr4(vcpu, cr4);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_set_efer(vcpu, efer);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
memset(dt, 0, sizeof(*dt));
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_get_idt(vcpu, dt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_idt(vcpu, dt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
memset(dt, 0, sizeof(*dt));
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_get_gdt(vcpu, dt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_gdt(vcpu, dt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_dr7(vcpu, val);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* MOV-DR exiting is always cleared for TD guest, even in debug mode.
|
|
|
|
* Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never
|
|
|
|
* reach here for TD vcpu.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_sync_dirty_debug_regs(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
|
|
|
|
{
|
|
|
|
if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_cache_reg(vcpu, reg);
|
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_rflags(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_rflags(vcpu, rflags);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool vt_get_if_flag(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return vmx_get_if_flag(vcpu);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Handle TLB tracking for TDX
Handle TLB tracking for TDX by introducing function tdx_track() for private
memory TLB tracking and implementing flush_tlb* hooks to flush TLBs for
shared memory.
Introduce function tdx_track() to do TLB tracking on private memory, which
basically does two things: calling TDH.MEM.TRACK to increase TD epoch and
kicking off all vCPUs. The private EPT will then be flushed when each vCPU
re-enters the TD. This function is unused temporarily in this patch and
will be called on a page-by-page basis on removal of private guest page in
a later patch.
In earlier revisions, tdx_track() relied on an atomic counter to coordinate
the synchronization between the actions of kicking off vCPUs, incrementing
the TD epoch, and the vCPUs waiting for the incremented TD epoch after
being kicked off.
However, the core MMU only actually needs to call tdx_track() while
aleady under a write mmu_lock. So this sychnonization can be made to be
unneeded. vCPUs are kicked off only after the successful execution of
TDH.MEM.TRACK, eliminating the need for vCPUs to wait for TDH.MEM.TRACK
completion after being kicked off. tdx_track() is therefore able to send
requests KVM_REQ_OUTSIDE_GUEST_MODE rather than KVM_REQ_TLB_FLUSH.
Hooks for flush_remote_tlb and flush_remote_tlbs_range are not necessary
for TDX, as tdx_track() will handle TLB tracking of private memory on
page-by-page basis when private guest pages are removed. There is no need
to invoke tdx_track() again in kvm_flush_remote_tlbs() even after changes
to the mirrored page table.
For hooks flush_tlb_current and flush_tlb_all, which are invoked during
kvm_mmu_load() and vcpu load for normal VMs, let VMM to flush all EPTs in
the two hooks for simplicity, since TDX does not depend on the two
hooks to notify TDX module to flush private EPT in those cases.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073753.22228-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-12 15:37:53 +08:00
|
|
|
static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_flush_tlb_all(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_flush_tlb_all(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_flush_tlb_current(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_flush_tlb_current(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_flush_tlb_current(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_flush_tlb_gva(vcpu, addr);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_flush_tlb_guest(vcpu);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Implement methods to inject NMI
Inject NMI to TDX guest by setting the PEND_NMI TDVPS field to 1, i.e. make
the NMI pending in the TDX module. If there is a further pending NMI in
KVM, collapse it to the one pending in the TDX module.
VMM can request the TDX module to inject a NMI into a TDX vCPU by setting
the PEND_NMI TDVPS field to 1. Following that, VMM can call TDH.VP.ENTER
to run the vCPU and the TDX module will attempt to inject the NMI as soon
as possible.
KVM has the following 3 cases to inject two NMIs when handling simultaneous
NMIs and they need to be injected in a back-to-back way. Otherwise, OS
kernel may fire a warning about the unknown NMI [1]:
K1. One NMI is being handled in the guest and one NMI pending in KVM.
KVM requests NMI window exit to inject the pending NMI.
K2. Two NMIs are pending in KVM.
KVM injects the first NMI and requests NMI window exit to inject the
second NMI.
K3. A previous NMI needs to be rejected and one NMI pending in KVM.
KVM first requests force immediate exit followed by a VM entry to
complete the NMI rejection. Then, during the force immediate exit, KVM
requests NMI window exit to inject the pending NMI.
For TDX, PEND_NMI TDVPS field is a 1-bit field, i.e. KVM can only pend one
NMI in the TDX module. Also, the vCPU state is protected, KVM doesn't know
the NMI blocking states of TDX vCPU, KVM has to assume NMI is always
unmasked and allowed. When KVM sees PEND_NMI is 1 after a TD exit, it
means the previous NMI needs to be re-injected.
Based on KVM's NMI handling flow, there are following 6 cases:
In NMI handler TDX module KVM
T1. No PEND_NMI=0 1 pending NMI
T2. No PEND_NMI=0 2 pending NMIs
T3. No PEND_NMI=1 1 pending NMI
T4. Yes PEND_NMI=0 1 pending NMI
T5. Yes PEND_NMI=0 2 pending NMIs
T6. Yes PEND_NMI=1 1 pending NMI
K1 is mapped to T4.
K2 is mapped to T2 or T5.
K3 is mapped to T3 or T6.
Note: KVM doesn't know whether NMI is blocked by a NMI or not, case T5 and
T6 can happen.
When handling pending NMI in KVM for TDX guest, what KVM can do is to add a
pending NMI in TDX module when PEND_NMI is 0. T1 and T4 can be handled by
this way. However, TDX doesn't allow KVM to request NMI window exit
directly, if PEND_NMI is already set and there is still pending NMI in KVM,
the only way KVM could try is to request a force immediate exit. But for
case T5 and T6, force immediate exit will result in infinite loop because
force immediate exit makes it no progress in the NMI handler, so that the
pending NMI in the TDX module can never be injected.
Considering on X86 bare metal, multiple NMIs could collapse into one NMI,
e.g. when NMI is blocked by SMI. It's OS's responsibility to poll all NMI
sources in the NMI handler to avoid missing handling of some NMI events.
Based on that, for the above 3 cases (K1-K3), only case K1 must inject the
second NMI because the guest NMI handler may have already polled some of
the NMI sources, which could include the source of the pending NMI, the
pending NMI must be injected to avoid the lost of NMI. For case K2 and K3,
the guest OS will poll all NMI sources (including the sources caused by the
second NMI and further NMI collapsed) when the delivery of the first NMI,
KVM doesn't have the necessity to inject the second NMI.
To handle the NMI injection properly for TDX, there are two options:
- Option 1: Modify the KVM's NMI handling common code, to collapse the
second pending NMI for K2 and K3.
- Option 2: Do it in TDX specific way. When the previous NMI is still
pending in the TDX module, i.e. it has not been delivered to TDX guest
yet, collapse the pending NMI in KVM into the previous one.
This patch goes with option 2 because it is simple and doesn't impact other
VM types. Option 1 may need more discussions.
This is the first need to access vCPU scope metadata in the "management"
class. Make needed accessors available.
[1] https://lore.kernel.org/all/1317409584-23662-5-git-send-email-dzickus@redhat.com/
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250222014757.897978-8-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-22 09:47:48 +08:00
|
|
|
static void vt_inject_nmi(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_inject_nmi(vcpu);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_inject_nmi(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* The TDX module manages NMI windows and NMI reinjection, and hides NMI
|
|
|
|
* blocking, all KVM can do is throw an NMI over the wall.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return vmx_nmi_allowed(vcpu, for_injection);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* KVM can't get NMI blocking status for TDX guest, assume NMIs are
|
|
|
|
* always unmasked.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return vmx_get_nmi_mask(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_nmi_mask(vcpu, masked);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_enable_nmi_window(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* Refer to the comments in tdx_inject_nmi(). */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_enable_nmi_window(vcpu);
|
|
|
|
}
|
|
|
|
|
2024-11-12 15:36:01 +08:00
|
|
|
static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
|
|
|
|
int pgd_level)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:45 +08:00
|
|
|
static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_interrupt_shadow(vcpu, mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_interrupt_shadow(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:18 +08:00
|
|
|
static void vt_patch_hypercall(struct kvm_vcpu *vcpu,
|
|
|
|
unsigned char *hypercall)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Because guest memory is protected, guest can't be patched. TD kernel
|
|
|
|
* is modified to use TDG.VP.VMCALL for hypercall.
|
|
|
|
*/
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_patch_hypercall(vcpu, hypercall);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:45 +08:00
|
|
|
static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_inject_irq(vcpu, reinjected);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:13 +08:00
|
|
|
static void vt_inject_exception(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_inject_exception(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:45 +08:00
|
|
|
static void vt_cancel_injection(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_cancel_injection(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
2025-02-27 09:20:07 +08:00
|
|
|
return tdx_interrupt_allowed(vcpu);
|
2025-02-22 09:47:45 +08:00
|
|
|
|
|
|
|
return vmx_interrupt_allowed(vcpu, for_injection);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_enable_irq_window(vcpu);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Add a place holder to handle TDX VM exit
Introduce the wiring for handling TDX VM exits by implementing the
callbacks .get_exit_info(), .get_entry_info(), and .handle_exit().
Additionally, add error handling during the TDX VM exit flow, and add a
place holder to handle various exit reasons.
Store VMX exit reason and exit qualification in struct vcpu_vt for TDX,
so that TDX/VMX can use the same helpers to get exit reason and exit
qualification. Store extended exit qualification and exit GPA info in
struct vcpu_tdx because they are used by TDX code only.
Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
operations due to secure EPT or TD EPOCH. If the contention occurs,
the return value will have TDX_OPERAND_BUSY set, prompting the vCPU to
attempt re-entry into the guest with EXIT_FASTPATH_EXIT_HANDLED,
not EXIT_FASTPATH_REENTER_GUEST, so that the interrupts pending during
IN_GUEST_MODE can be delivered for sure. Otherwise, the requester of
KVM_REQ_OUTSIDE_GUEST_MODE may be blocked endlessly.
Error Handling:
- TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the
CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX
isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM
firmware is not loaded or disabled.
- TDX_ERROR: This indicates some check failed in the TDX module, preventing
the vCPU from running.
- Failed VM Entry: Exit to userspace with KVM_EXIT_FAIL_ENTRY. Handle it
separately before handling TDX_NON_RECOVERABLE because when off-TD debug
is not enabled, TDX_NON_RECOVERABLE is set.
- TDX_NON_RECOVERABLE: Set by the TDX module when the error is
non-recoverable, indicating that the TDX guest is dead or the vCPU is
disabled.
A special case is triple fault, which also sets TDX_NON_RECOVERABLE but
exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX case.
- Any unhandled VM exit reason will also return to userspace with
KVM_EXIT_INTERNAL_ERROR.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Message-ID: <20250222014225.897298-4-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-06 13:27:04 -05:00
|
|
|
static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code)
|
|
|
|
{
|
|
|
|
*intr_info = 0;
|
|
|
|
*error_code = 0;
|
|
|
|
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_get_entry_info(vcpu, intr_info, error_code);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
|
|
|
|
u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
tdx_get_exit_info(vcpu, reason, info1, info2, intr_info,
|
|
|
|
error_code);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:13 +08:00
|
|
|
static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_update_cr8_intercept(vcpu, tpr, irr);
|
|
|
|
}
|
|
|
|
|
2025-02-22 09:47:53 +08:00
|
|
|
static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_set_apic_access_page_addr(vcpu);
|
|
|
|
}
|
|
|
|
|
KVM: TDX: Force APICv active for TDX guest
Force APICv active for TDX guests in KVM because APICv is always enabled
by TDX module.
From the view of KVM, whether APICv state is active or not is decided by:
1. APIC is hw enabled
2. VM and vCPU have no inhibit reasons set.
After TDX vCPU init, APIC is set to x2APIC mode. KVM_SET_{SREGS,SREGS2} are
rejected due to has_protected_state for TDs and guest_state_protected
for TDX vCPUs are set. Reject KVM_{GET,SET}_LAPIC from userspace since
migration is not supported yet, so that userspace cannot disable APIC.
For various APICv inhibit reasons:
- APICV_INHIBIT_REASON_DISABLED is impossible after checking enable_apicv
in tdx_bringup(). If !enable_apicv, TDX support will be disabled.
- APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED is impossible since x2APIC is
mandatory, KVM emulates APIC_ID as read-only for x2APIC mode. (Note:
APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED could be set if the memory
allocation fails for KVM apic_map.)
- APICV_INHIBIT_REASON_HYPERV is impossible since TDX doesn't support
HyperV guest yet.
- APICV_INHIBIT_REASON_ABSENT is impossible since in-kernel LAPIC is
checked in tdx_vcpu_create().
- APICV_INHIBIT_REASON_BLOCKIRQ is impossible since TDX doesn't support
KVM_SET_GUEST_DEBUG.
- APICV_INHIBIT_REASON_APIC_ID_MODIFIED is impossible since x2APIC is
mandatory.
- APICV_INHIBIT_REASON_APIC_BASE_MODIFIED is impossible since KVM rejects
userspace to set APIC base.
- The rest inhibit reasons are relevant only to AMD's AVIC, including
APICV_INHIBIT_REASON_NESTED, APICV_INHIBIT_REASON_IRQWIN,
APICV_INHIBIT_REASON_PIT_REINJ, APICV_INHIBIT_REASON_SEV, and
APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED.
(For APICV_INHIBIT_REASON_PIT_REINJ, similar to AVIC, KVM can't intercept
EOI for TDX guests neither, but KVM enforces KVM_IRQCHIP_SPLIT for TDX
guests, which eliminates the in-kernel PIT.)
Implement vt_refresh_apicv_exec_ctrl() to call KVM_BUG_ON() if APICv is
disabled for TDX guests.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014757.897978-12-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-22 09:47:52 +08:00
|
|
|
static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu)) {
|
|
|
|
KVM_BUG_ON(!kvm_vcpu_apicv_active(vcpu), vcpu->kvm);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
vmx_refresh_apicv_exec_ctrl(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:13 +08:00
|
|
|
static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_set_tss_addr(kvm, addr);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_set_identity_map_addr(kvm, ident_addr);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:16 +08:00
|
|
|
static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* TDX doesn't support L2 guest at the moment. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_l2_tsc_offset(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* TDX doesn't support L2 guest at the moment. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return vmx_get_l2_tsc_multiplier(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_write_tsc_offset(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* In TDX, tsc offset can't be changed. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_write_tsc_offset(vcpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* In TDX, tsc multiplier can't be changed. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_write_tsc_multiplier(vcpu);
|
|
|
|
}
|
|
|
|
|
2025-02-27 09:20:15 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
|
|
|
|
bool *expired)
|
|
|
|
{
|
|
|
|
/* VMX-preemption timer isn't available for TDX. */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
/* VMX-preemption timer can't be set. See vt_set_hv_timer(). */
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_cancel_hv_timer(vcpu);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2025-02-27 09:20:17 +08:00
|
|
|
static void vt_setup_mce(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (is_td_vcpu(vcpu))
|
|
|
|
return;
|
|
|
|
|
|
|
|
vmx_setup_mce(vcpu);
|
|
|
|
}
|
|
|
|
|
2024-10-30 12:00:28 -07:00
|
|
|
static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
|
|
|
|
{
|
|
|
|
if (!is_td(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
return tdx_vm_ioctl(kvm, argp);
|
|
|
|
}
|
|
|
|
|
2024-10-30 12:00:36 -07:00
|
|
|
static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
|
|
|
|
{
|
|
|
|
if (!is_td_vcpu(vcpu))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return tdx_vcpu_ioctl(vcpu, argp);
|
|
|
|
}
|
|
|
|
|
2024-11-12 15:38:16 +08:00
|
|
|
static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
|
|
|
|
{
|
|
|
|
if (is_td(kvm))
|
|
|
|
return tdx_gmem_private_max_mapping_level(kvm, pfn);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
#define vt_op(name) vt_##name
|
|
|
|
#define vt_op_tdx_only(name) vt_##name
|
|
|
|
#else /* CONFIG_KVM_INTEL_TDX */
|
|
|
|
#define vt_op(name) vmx_##name
|
|
|
|
#define vt_op_tdx_only(name) NULL
|
|
|
|
#endif /* CONFIG_KVM_INTEL_TDX */
|
|
|
|
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
#define VMX_REQUIRED_APICV_INHIBITS \
|
2024-05-06 22:53:21 +00:00
|
|
|
(BIT(APICV_INHIBIT_REASON_DISABLED) | \
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
BIT(APICV_INHIBIT_REASON_ABSENT) | \
|
|
|
|
BIT(APICV_INHIBIT_REASON_HYPERV) | \
|
|
|
|
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) | \
|
|
|
|
BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) | \
|
|
|
|
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) | \
|
|
|
|
BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED))
|
|
|
|
|
|
|
|
struct kvm_x86_ops vt_x86_ops __initdata = {
|
|
|
|
.name = KBUILD_MODNAME,
|
|
|
|
|
|
|
|
.check_processor_compatibility = vmx_check_processor_compat,
|
|
|
|
|
|
|
|
.hardware_unsetup = vmx_hardware_unsetup,
|
|
|
|
|
2024-08-29 21:35:56 -07:00
|
|
|
.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
|
2025-03-18 00:35:09 -06:00
|
|
|
.disable_virtualization_cpu = vt_op(disable_virtualization_cpu),
|
2024-08-29 21:36:00 -07:00
|
|
|
.emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.has_emulated_msr = vt_op(has_emulated_msr),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
.vm_size = sizeof(struct kvm_vmx),
|
2025-02-25 12:45:13 -05:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.vm_init = vt_op(vm_init),
|
|
|
|
.vm_destroy = vt_op(vm_destroy),
|
|
|
|
.vm_pre_destroy = vt_op_tdx_only(vm_pre_destroy),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.vcpu_precreate = vt_op(vcpu_precreate),
|
|
|
|
.vcpu_create = vt_op(vcpu_create),
|
|
|
|
.vcpu_free = vt_op(vcpu_free),
|
|
|
|
.vcpu_reset = vt_op(vcpu_reset),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.prepare_switch_to_guest = vt_op(prepare_switch_to_guest),
|
|
|
|
.vcpu_load = vt_op(vcpu_load),
|
|
|
|
.vcpu_put = vt_op(vcpu_put),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-06-26 09:14:20 -07:00
|
|
|
.HOST_OWNED_DEBUGCTL = VMX_HOST_OWNED_DEBUGCTL_BITS,
|
KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-10 16:20:10 -07:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.update_exception_bitmap = vt_op(update_exception_bitmap),
|
2024-08-02 11:19:30 -07:00
|
|
|
.get_feature_msr = vmx_get_feature_msr,
|
2025-03-18 00:35:09 -06:00
|
|
|
.get_msr = vt_op(get_msr),
|
|
|
|
.set_msr = vt_op(set_msr),
|
|
|
|
|
|
|
|
.get_segment_base = vt_op(get_segment_base),
|
|
|
|
.get_segment = vt_op(get_segment),
|
|
|
|
.set_segment = vt_op(set_segment),
|
|
|
|
.get_cpl = vt_op(get_cpl),
|
|
|
|
.get_cpl_no_cache = vt_op(get_cpl_no_cache),
|
|
|
|
.get_cs_db_l_bits = vt_op(get_cs_db_l_bits),
|
|
|
|
.is_valid_cr0 = vt_op(is_valid_cr0),
|
|
|
|
.set_cr0 = vt_op(set_cr0),
|
|
|
|
.is_valid_cr4 = vt_op(is_valid_cr4),
|
|
|
|
.set_cr4 = vt_op(set_cr4),
|
|
|
|
.set_efer = vt_op(set_efer),
|
|
|
|
.get_idt = vt_op(get_idt),
|
|
|
|
.set_idt = vt_op(set_idt),
|
|
|
|
.get_gdt = vt_op(get_gdt),
|
|
|
|
.set_gdt = vt_op(set_gdt),
|
|
|
|
.set_dr7 = vt_op(set_dr7),
|
|
|
|
.sync_dirty_debug_regs = vt_op(sync_dirty_debug_regs),
|
|
|
|
.cache_reg = vt_op(cache_reg),
|
|
|
|
.get_rflags = vt_op(get_rflags),
|
|
|
|
.set_rflags = vt_op(set_rflags),
|
|
|
|
.get_if_flag = vt_op(get_if_flag),
|
|
|
|
|
|
|
|
.flush_tlb_all = vt_op(flush_tlb_all),
|
|
|
|
.flush_tlb_current = vt_op(flush_tlb_current),
|
|
|
|
.flush_tlb_gva = vt_op(flush_tlb_gva),
|
|
|
|
.flush_tlb_guest = vt_op(flush_tlb_guest),
|
|
|
|
|
|
|
|
.vcpu_pre_run = vt_op(vcpu_pre_run),
|
|
|
|
.vcpu_run = vt_op(vcpu_run),
|
|
|
|
.handle_exit = vt_op(handle_exit),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.skip_emulated_instruction = vmx_skip_emulated_instruction,
|
|
|
|
.update_emulated_instruction = vmx_update_emulated_instruction,
|
2025-03-18 00:35:09 -06:00
|
|
|
.set_interrupt_shadow = vt_op(set_interrupt_shadow),
|
|
|
|
.get_interrupt_shadow = vt_op(get_interrupt_shadow),
|
|
|
|
.patch_hypercall = vt_op(patch_hypercall),
|
|
|
|
.inject_irq = vt_op(inject_irq),
|
|
|
|
.inject_nmi = vt_op(inject_nmi),
|
|
|
|
.inject_exception = vt_op(inject_exception),
|
|
|
|
.cancel_injection = vt_op(cancel_injection),
|
|
|
|
.interrupt_allowed = vt_op(interrupt_allowed),
|
|
|
|
.nmi_allowed = vt_op(nmi_allowed),
|
|
|
|
.get_nmi_mask = vt_op(get_nmi_mask),
|
|
|
|
.set_nmi_mask = vt_op(set_nmi_mask),
|
|
|
|
.enable_nmi_window = vt_op(enable_nmi_window),
|
|
|
|
.enable_irq_window = vt_op(enable_irq_window),
|
|
|
|
.update_cr8_intercept = vt_op(update_cr8_intercept),
|
2024-07-19 16:51:00 -07:00
|
|
|
|
|
|
|
.x2apic_icr_is_split = false,
|
2025-03-18 00:35:09 -06:00
|
|
|
.set_virtual_apic_mode = vt_op(set_virtual_apic_mode),
|
|
|
|
.set_apic_access_page_addr = vt_op(set_apic_access_page_addr),
|
|
|
|
.refresh_apicv_exec_ctrl = vt_op(refresh_apicv_exec_ctrl),
|
|
|
|
.load_eoi_exitmap = vt_op(load_eoi_exitmap),
|
2025-03-18 00:35:07 -06:00
|
|
|
.apicv_pre_state_restore = pi_apicv_pre_state_restore,
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
|
2025-03-18 00:35:09 -06:00
|
|
|
.hwapic_isr_update = vt_op(hwapic_isr_update),
|
|
|
|
.sync_pir_to_irr = vt_op(sync_pir_to_irr),
|
|
|
|
.deliver_interrupt = vt_op(deliver_interrupt),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.set_tss_addr = vt_op(set_tss_addr),
|
|
|
|
.set_identity_map_addr = vt_op(set_identity_map_addr),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.get_mt_mask = vmx_get_mt_mask,
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.get_exit_info = vt_op(get_exit_info),
|
|
|
|
.get_entry_info = vt_op(get_entry_info),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.vcpu_after_set_cpuid = vt_op(vcpu_after_set_cpuid),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.get_l2_tsc_offset = vt_op(get_l2_tsc_offset),
|
|
|
|
.get_l2_tsc_multiplier = vt_op(get_l2_tsc_multiplier),
|
|
|
|
.write_tsc_offset = vt_op(write_tsc_offset),
|
|
|
|
.write_tsc_multiplier = vt_op(write_tsc_multiplier),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.load_mmu_pgd = vt_op(load_mmu_pgd),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
.check_intercept = vmx_check_intercept,
|
|
|
|
.handle_exit_irqoff = vmx_handle_exit_irqoff,
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.update_cpu_dirty_logging = vt_op(update_cpu_dirty_logging),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
.nested_ops = &vmx_nested_ops,
|
|
|
|
|
|
|
|
.pi_update_irte = vmx_pi_update_irte,
|
2025-06-11 15:45:56 -07:00
|
|
|
.pi_start_bypass = vmx_pi_start_bypass,
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
#ifdef CONFIG_X86_64
|
2025-03-18 00:35:09 -06:00
|
|
|
.set_hv_timer = vt_op(set_hv_timer),
|
|
|
|
.cancel_hv_timer = vt_op(cancel_hv_timer),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
#endif
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.setup_mce = vt_op(setup_mce),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
#ifdef CONFIG_KVM_SMM
|
2025-03-18 00:35:09 -06:00
|
|
|
.smi_allowed = vt_op(smi_allowed),
|
|
|
|
.enter_smm = vt_op(enter_smm),
|
|
|
|
.leave_smm = vt_op(leave_smm),
|
|
|
|
.enable_smi_window = vt_op(enable_smi_window),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
#endif
|
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.check_emulate_instruction = vt_op(check_emulate_instruction),
|
|
|
|
.apic_init_signal_blocked = vt_op(apic_init_signal_blocked),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.migrate_timers = vmx_migrate_timers,
|
|
|
|
|
2025-06-10 15:57:25 -07:00
|
|
|
.recalc_msr_intercepts = vt_op(recalc_msr_intercepts),
|
2025-03-18 00:35:09 -06:00
|
|
|
.complete_emulated_msr = vt_op(complete_emulated_msr),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
|
|
|
|
.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
|
|
|
|
|
|
|
|
.get_untagged_addr = vmx_get_untagged_addr,
|
2024-10-30 12:00:28 -07:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.mem_enc_ioctl = vt_op_tdx_only(mem_enc_ioctl),
|
|
|
|
.vcpu_mem_enc_ioctl = vt_op_tdx_only(vcpu_mem_enc_ioctl),
|
2024-11-12 15:38:16 +08:00
|
|
|
|
2025-03-18 00:35:09 -06:00
|
|
|
.private_max_mapping_level = vt_op_tdx_only(gmem_private_max_mapping_level)
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
};
|
|
|
|
|
|
|
|
struct kvm_x86_init_ops vt_init_ops __initdata = {
|
2025-03-18 00:35:09 -06:00
|
|
|
.hardware_setup = vt_op(hardware_setup),
|
KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX
KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions
to operate on VM. TDX doesn't allow VMM to operate VMCS directly.
Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to
indirectly operate those data structures. This means we must have a TDX
version of kvm_x86_ops.
The existing global struct kvm_x86_ops already defines an interface which
can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM
structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks
will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or
TDX at run time.
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX. Use 'vt' for the naming scheme as a nod to
VT-x and as a concatenation of VmxTdx.
The eventually converted code will look like this:
vmx.c:
vmx_op() { ... }
VMX initialization
tdx.c:
tdx_op() { ... }
TDX initialization
x86_ops.h:
vmx_op();
tdx_op();
main.c:
static vt_op() { if (tdx) tdx_op() else vmx_op() }
static struct kvm_x86_ops vt_x86_ops = {
.op = vt_op,
initialization functions call both VMX and TDX initialization
Opportunistically, fix the name inconsistency from vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-18 11:39:16 -04:00
|
|
|
.handle_intel_pt_intr = NULL,
|
|
|
|
|
|
|
|
.runtime_ops = &vt_x86_ops,
|
|
|
|
.pmu_ops = &intel_pmu_ops,
|
|
|
|
};
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
|
|
|
|
static void __exit vt_exit(void)
|
|
|
|
{
|
|
|
|
kvm_exit();
|
KVM: VMX: Initialize TDX during KVM module load
Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) TDX module must be initialized
properly to a working state; 2) A per-cpu TDX initialization, a.k.a the
TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can
run any other TDX SEAMCALLs.
The TDX host core-kernel provides two functions to do the above two
respectively: tdx_enable() and tdx_cpu_enable().
There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.
Choose to initialize TDX during KVM module loading time:
Initializing TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size(~1/256) of system memory
as metadata used by TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.
Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.
alloc_contig_pages() can fail. If initializing TDX when creating the
first TDX guest, then there's chance that KVM won't be able to run any
TDX guests albeit KVM _declares_ to be able to support TDX.
This isn't good for the user.
On the other hand, initializing TDX at KVM module loading time can make
sure KVM is providing a consistent view of whether KVM can support TDX
to the user.
Always only try to initialize TDX after VMX has been initialized. TDX
is based on VMX, and if VMX fails to initialize then TDX is likely to be
broken anyway. Also, in practice, supporting TDX will require part of
VMX and common x86 infrastructure in working order, so TDX cannot be
enabled alone w/o VMX support.
There are two cases that can result in failure to initialize TDX: 1) TDX
cannot be supported (e.g., because of TDX is not supported or enabled by
hardware, or module is not loaded, or missing some dependency in KVM's
configuration); 2) Any unexpected error during TDX bring-up. For the
first case only mark TDX is disabled but still allow KVM module to be
loaded. For the second case just fail to load the KVM module so that
the user can be aware.
Because TDX costs additional memory, don't enable TDX by default. Add a
new module parameter 'enable_tdx' to allow the user to opt-in.
Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually teardown TDX). They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc but it's not end of the world.
Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel. After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining CPU, or during
suspend/resume. TDX host core-kernel code internally tracks TDX status
and can handle "multiple enabling" scenario.
Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guide KVM TDX
code. Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST
because in the longer term there's a use case that requires making
SEAMCALLs w/o KVM as mentioned by Dan [1].
Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <162f9dee05c729203b9ad6688db1ca2960b4b502.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-15 22:52:41 +13:00
|
|
|
tdx_cleanup();
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
vmx_exit();
|
|
|
|
}
|
|
|
|
module_exit(vt_exit);
|
|
|
|
|
|
|
|
static int __init vt_init(void)
|
|
|
|
{
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
unsigned vcpu_size, vcpu_align;
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
int r;
|
|
|
|
|
|
|
|
r = vmx_init();
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
KVM: VMX: Initialize TDX during KVM module load
Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) TDX module must be initialized
properly to a working state; 2) A per-cpu TDX initialization, a.k.a the
TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can
run any other TDX SEAMCALLs.
The TDX host core-kernel provides two functions to do the above two
respectively: tdx_enable() and tdx_cpu_enable().
There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.
Choose to initialize TDX during KVM module loading time:
Initializing TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size(~1/256) of system memory
as metadata used by TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.
Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.
alloc_contig_pages() can fail. If initializing TDX when creating the
first TDX guest, then there's chance that KVM won't be able to run any
TDX guests albeit KVM _declares_ to be able to support TDX.
This isn't good for the user.
On the other hand, initializing TDX at KVM module loading time can make
sure KVM is providing a consistent view of whether KVM can support TDX
to the user.
Always only try to initialize TDX after VMX has been initialized. TDX
is based on VMX, and if VMX fails to initialize then TDX is likely to be
broken anyway. Also, in practice, supporting TDX will require part of
VMX and common x86 infrastructure in working order, so TDX cannot be
enabled alone w/o VMX support.
There are two cases that can result in failure to initialize TDX: 1) TDX
cannot be supported (e.g., because of TDX is not supported or enabled by
hardware, or module is not loaded, or missing some dependency in KVM's
configuration); 2) Any unexpected error during TDX bring-up. For the
first case only mark TDX is disabled but still allow KVM module to be
loaded. For the second case just fail to load the KVM module so that
the user can be aware.
Because TDX costs additional memory, don't enable TDX by default. Add a
new module parameter 'enable_tdx' to allow the user to opt-in.
Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually teardown TDX). They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc but it's not end of the world.
Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel. After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining CPU, or during
suspend/resume. TDX host core-kernel code internally tracks TDX status
and can handle "multiple enabling" scenario.
Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guide KVM TDX
code. Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST
because in the longer term there's a use case that requires making
SEAMCALLs w/o KVM as mentioned by Dan [1].
Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <162f9dee05c729203b9ad6688db1ca2960b4b502.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-15 22:52:41 +13:00
|
|
|
/* tdx_init() has been taken */
|
|
|
|
r = tdx_bringup();
|
|
|
|
if (r)
|
|
|
|
goto err_tdx_bringup;
|
|
|
|
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
/*
|
|
|
|
* TDX and VMX have different vCPU structures. Calculate the
|
|
|
|
* maximum size/align so that kvm_init() can use the larger
|
|
|
|
* values to create the kmem_vcpu_cache.
|
|
|
|
*/
|
|
|
|
vcpu_size = sizeof(struct vcpu_vmx);
|
|
|
|
vcpu_align = __alignof__(struct vcpu_vmx);
|
|
|
|
if (enable_tdx) {
|
|
|
|
vcpu_size = max_t(unsigned, vcpu_size,
|
|
|
|
sizeof(struct vcpu_tdx));
|
|
|
|
vcpu_align = max_t(unsigned, vcpu_align,
|
|
|
|
__alignof__(struct vcpu_tdx));
|
2024-12-10 08:49:43 +08:00
|
|
|
kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM);
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
}
|
|
|
|
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
/*
|
|
|
|
* Common KVM initialization _must_ come last, after this, /dev/kvm is
|
|
|
|
* exposed to userspace!
|
|
|
|
*/
|
KVM: TDX: Add placeholders for TDX VM/vCPU structures
Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.
TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.
In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.
Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.
Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.
For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.
Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.
Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
- Make to_kvm_tdx() and to_tdx() private to tdx.c (Francesco, Tony)
- Add pragma poison for to_vmx() (Paolo)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-30 12:00:24 -07:00
|
|
|
r = kvm_init(vcpu_size, vcpu_align, THIS_MODULE);
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
if (r)
|
|
|
|
goto err_kvm_init;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_kvm_init:
|
KVM: VMX: Initialize TDX during KVM module load
Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) TDX module must be initialized
properly to a working state; 2) A per-cpu TDX initialization, a.k.a the
TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can
run any other TDX SEAMCALLs.
The TDX host core-kernel provides two functions to do the above two
respectively: tdx_enable() and tdx_cpu_enable().
There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.
Choose to initialize TDX during KVM module loading time:
Initializing TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size(~1/256) of system memory
as metadata used by TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.
Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.
alloc_contig_pages() can fail. If initializing TDX when creating the
first TDX guest, then there's chance that KVM won't be able to run any
TDX guests albeit KVM _declares_ to be able to support TDX.
This isn't good for the user.
On the other hand, initializing TDX at KVM module loading time can make
sure KVM is providing a consistent view of whether KVM can support TDX
to the user.
Always only try to initialize TDX after VMX has been initialized. TDX
is based on VMX, and if VMX fails to initialize then TDX is likely to be
broken anyway. Also, in practice, supporting TDX will require part of
VMX and common x86 infrastructure in working order, so TDX cannot be
enabled alone w/o VMX support.
There are two cases that can result in failure to initialize TDX: 1) TDX
cannot be supported (e.g., because of TDX is not supported or enabled by
hardware, or module is not loaded, or missing some dependency in KVM's
configuration); 2) Any unexpected error during TDX bring-up. For the
first case only mark TDX is disabled but still allow KVM module to be
loaded. For the second case just fail to load the KVM module so that
the user can be aware.
Because TDX costs additional memory, don't enable TDX by default. Add a
new module parameter 'enable_tdx' to allow the user to opt-in.
Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually teardown TDX). They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc but it's not end of the world.
Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel. After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining CPU, or during
suspend/resume. TDX host core-kernel code internally tracks TDX status
and can handle "multiple enabling" scenario.
Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guide KVM TDX
code. Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST
because in the longer term there's a use case that requires making
SEAMCALLs w/o KVM as mentioned by Dan [1].
Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <162f9dee05c729203b9ad6688db1ca2960b4b502.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-15 22:52:41 +13:00
|
|
|
tdx_cleanup();
|
|
|
|
err_tdx_bringup:
|
KVM: VMX: Refactor VMX module init/exit functions
Add vt_init() and vt_exit() as the new module init/exit functions and
refactor existing vmx_init()/vmx_exit() as helper to make room for TDX
specific initialization and teardown.
To support TDX, KVM will need to enable TDX during KVM module loading
time. Enabling TDX requires enabling hardware virtualization first so
that all online CPUs (and the new CPU going online) are in post-VMXON
state.
Currently, the vmx_init() flow is:
1) hv_init_evmcs(),
2) kvm_x86_vendor_init(),
3) Other VMX specific initialization,
4) kvm_init()
The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to
do VMX specific hardware setup and calls kvm_update_ops() to initialize
kvm_x86_ops to VMX's version.
TDX will have its own version for most of kvm_x86_ops callbacks. It
would be nice if kvm_x86_init_ops::hardware_setup() could also be used
for TDX, but in practice it cannot. The reason is, as mentioned above,
TDX initialization requires hardware virtualization having been enabled,
which must happen after kvm_update_ops(), but hardware_setup() is done
before that.
Also, TDX is based on VMX, and it makes sense to only initialize TDX
after VMX has been initialized. If VMX fails to initialize, TDX is
likely broken anyway.
So the new flow of KVM module init function will be:
1) Current VMX initialization code in vmx_init() before kvm_init(),
2) TDX initialization,
3) kvm_init()
Split vmx_init() into two parts based on above 1) and 3) so that TDX
initialization can fit in between. Make part 1) as the new helper
vmx_init(). Introduce vt_init() as the new module init function which
calls vmx_init() and kvm_init(). TDX initialization will be added
later.
Do the same thing for vmx_exit()/vt_exit().
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-22 05:40:25 -05:00
|
|
|
vmx_exit();
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
module_init(vt_init);
|