linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-08-05 16:54:27 +00:00

Author	SHA1	Message	Date
Sean Christopherson	3338c639da	KVM: SVM: Add helper to deduplicate code for getting AVIC backing page Add a helper to get the physical address of the AVIC backing page, both to deduplicate code and to prepare for getting the address directly from apic->regs, at which point it won't be all that obvious that the address in question is what SVM calls the AVIC backing page. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:57 -07:00
Sean Christopherson	2e002ddc89	KVM: SVM: Drop pointless masking of kernel page pa's with AVIC HPA masks Drop AVIC_HPA_MASK and all its users; the mask is just the 4KiB-aligned maximum theoretical physical address for x86-64 CPUs, as x86-64 is currently defined (going beyond PA52 would require an entirely new paging mode, which would arguably create a new, different architecture). All usage in KVM masks the result of page_to_phys(), which on x86-64 is guaranteed to be 4KiB aligned and a legal physical address; if either of those requirements doesn't hold true, KVM has far bigger problems. Drop masking the avic_backing_page with AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK for all the same reasons, but keep the macro even though it's unused in functional code. It's a distinct architectural define, and having the definition in software helps visualize the layout of an entry. And to be hyper-paranoid about MAXPA going beyond 52, add a compile-time assert to ensure the kernel's maximum supported physical address stays in bounds. The unnecessary masking in avic_init_vmcb() also incorrectly assumes that SME's C-bit resides between bits 51:11; that holds true for current CPUs, but isn't required by AMD's architecture: "In some implementations, the bit used may be a physical address bit." Key word being "may". Opportunistically use the GENMASK_ULL() version for AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK, which is far more readable than a set of repeating Fs. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:57 -07:00
Sean Christopherson	4305795778	KVM: SVM: Drop pointless masking of default APIC base when setting V_APIC_BAR Drop VMCB_AVIC_APIC_BAR_MASK; it's just a regurgitation of the maximum theoretical 4KiB-aligned physical address, i.e. is not novel in any way, and its only usage is to mask the default APIC base, which is 4KiB aligned and (obviously) a legal physical address. No functional change intended. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:56 -07:00
Sean Christopherson	a0ca34bb1a	KVM: SVM: Delete IRTE link from previous vCPU irrespective of new routing Delete the IRTE link from the previous vCPU irrespective of the new routing state, i.e. even if the IRTE won't be configured to post IRQs to a vCPU. Whether or not the new route is postable has no bearing on the old route. Failure to delete the link can result in KVM incorrectly updating the IRTE, e.g. if the "old" vCPU is scheduled in/out. Fixes: `411b44ba80` ("svm: Implements update_pi_irte hook to setup posted interrupt") Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:56 -07:00
Sean Christopherson	1da19c5ce0	iommu/amd: KVM: SVM: Delete now-unused cached/previous GA tag fields Delete the amd_ir_data.prev_ga_tag field now that all usage is superfluous. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:55 -07:00
Sean Christopherson	0a917e9d4b	KVM: SVM: Delete IRTE link from previous vCPU before setting new IRTE Delete the previous per-vCPU IRTE link prior to modifying the IRTE. If forcing the IRTE back to remapped mode fails, the IRQ is already broken; keeping stale metadata won't change that, and the IOMMU should be sufficiently paranoid to sanitize the IRTE when the IRQ is freed and reallocated. This will allow hoisting the vCPU tracking to common x86, which in turn will allow most of the IRTE update code to be deduplicated. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:54 -07:00
Sean Christopherson	05c5e23657	KVM: SVM: Track per-vCPU IRTEs using kvm_kernel_irqfd structure Track the IRTEs that are posting to an SVM vCPU via the associated irqfd structure and GSI routing instead of dynamically allocating a separate data structure. In addition to eliminating an atomic allocation, this will allow hoisting much of the IRTE update logic to common x86. Cc: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:54 -07:00
Sean Christopherson	cb21073767	KVM: Pass new routing entries and irqfd when updating IRTEs When updating IRTEs in response to a GSI routing or IRQ bypass change, pass the new/current routing information along with the associated irqfd. This will allow KVM x86 to harden, simplify, and deduplicate its code. Since adding/removing a bypass producer is now conveniently protected with irqfds.lock, i.e. can't run concurrently with kvm_irq_routing_update(), use the routing information cached in the irqfd instead of looking up the information in the current GSI routing tables. Opportunistically convert an existing printk() to pr_info() and put its string onto a single line (old code that strictly adhered to 80 chars). Link: https://lore.kernel.org/r/20250611224604.313496-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:52:53 -07:00
Tom Lendacky	24be2b7956	KVM: SVM: Allow SNP guest policy to specify SINGLE_SOCKET KVM currently returns -EINVAL when it attempts to create an SNP guest if the SINGLE_SOCKET guest policy bit is set. The reason for this action is that KVM would need specific support (SNP_ACTIVATE_EX command support) to achieve this when running on a system with more than one socket. However, the SEV firmware will make the proper check and return POLICY_FAILURE during SNP_ACTIVATE if the single socket guest policy bit is set and the system has more than one socket: - System with one socket - Guest policy SINGLE_SOCKET == 0 ==> SNP_ACTIVATE succeeds - Guest policy SINGLE_SOCKET == 1 ==> SNP_ACTIVATE succeeds - System with more than one socket - Guest policy SINGLE_SOCKET == 0 ==> SNP_ACTIVATE succeeds - Guest policy SINGLE_SOCKET == 1 ==> SNP_ACTIVATE fails with POLICY_FAILURE Remove the check for the SINGLE_SOCKET policy bit from snp_launch_start() and allow the firmware to perform the proper checking. This does have the effect of allowing an SNP guest with the SINGLE_SOCKET policy bit set to run on a single socket system, but fail when run on a system with more than one socket. However, this should not affect existing SNP guests as setting the SINGLE_SOCKET policy bit is not allowed today. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/4c51018dd3e4f2c543935134d2c4f47076f109f6.1748553480.git.thomas.lendacky@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:33:45 -07:00
Tom Lendacky	9f4701e05f	KVM: SVM: Allow SNP guest policy to disallow running with SMT enabled KVM currently returns -EINVAL when it attempts to create an SNP guest if the SMT guest policy bit is not set. However, there is no reason to check this, as there is no specific support in KVM that is required to support this. The SEV firmware will determine if SMT has been enabled or disabled in the BIOS and process the policy in the proper way: - SMT enabled in BIOS - Guest policy SMT == 0 ==> SNP_LAUNCH_START fails with POLICY_FAILURE - Guest policy SMT == 1 ==> SNP_LAUNCH_START succeeds - SMT disabled in BIOS - Guest policy SMT == 0 ==> SNP_LAUNCH_START succeeds - Guest policy SMT == 1 ==> SNP_LAUNCH_START succeeds Remove the check for the SMT policy bit from snp_launch_start() and allow the firmware to perform the proper checking. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/71043abdd9ef23b6f98fffa9c5c6045ac3a50187.1748553480.git.thomas.lendacky@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:33:45 -07:00
Sean Christopherson	bea44d1992	KVM: x86: Simplify userspace filter logic when disabling MSR interception Refactor {svm,vmx}_disable_intercept_for_msr() to simplify the handling of userspace filters that disallow access to an MSR. The more complicated logic is no longer needed or justified now that KVM recalculates all MSR intercepts on a userspace MSR filter change, i.e. now that KVM doesn't need to also update shadow bitmaps. No functional change intended. Suggested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:37 -07:00
Sean Christopherson	73be81b3bb	KVM: SVM: Add a helper to allocate and initialize permissions bitmaps Add a helper to allocate and initialize an MSR or I/O permissions map, as the logic is identical between the two map types; the only difference is the size of the bitmap. Opportunistically add a comment to explain why the bitmaps are initialized with 0xff, e.g. instead of the more common zero-initialized behavior, which is the main motivation for deduplicating the code. No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:37 -07:00
Sean Christopherson	54f1c77061	KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels When merging L0 and L1 MSRPMs as part of nested VMRUN emulation, access the bitmaps using "unsigned long" chunks, i.e. use 8-byte access for 64-bit kernels instead of arbitrarily working on 4-byte chunks. Opportunistically rename local variables in nested_svm_merge_msrpm() to more precisely/accurately reflect their purpose ("offset" in particular is extremely ambiguous). Link: https://lore.kernel.org/r/20250610225737.156318-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:36 -07:00
Sean Christopherson	5904ba5172	KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR Return -EINVAL instead of MSR_INVALID from svm_msrpm_bit_nr() to indicate that the MSR isn't covered by one of the (currently) three MSRPM ranges, and delete the MSR_INVALID macro now that all users are gone. Link: https://lore.kernel.org/r/20250610225737.156318-29-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:35 -07:00
Sean Christopherson	52f8217742	KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps Access the MSRPM using u32/4-byte chunks (and appropriately adjusted offsets) only when merging L0 and L1 bitmaps as part of emulating VMRUN. The only reason to batch accesses to MSRPMs is to avoid the overhead of uaccess operations (e.g. STAC/CLAC and bounds checks) when reading L1's bitmap pointed at by vmcb12. For all other uses, either per-bit accesses are more than fast enough (no uaccess), or KVM is only accessing a single bit (nested_svm_exit_handled_msr()) and so there's nothing to batch. In addition to (hopefully) documenting the uniqueness of the merging code, restricting chunked access to _just_ the merging code will allow for increasing the chunk size (to unsigned long) with minimal risk. Link: https://lore.kernel.org/r/20250610225737.156318-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:34 -07:00
Sean Christopherson	7fe0578041	KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *" Store KVM's MSRPM pointers as "void *" instead of "u32 *" to guard against directly accessing the bitmaps outside of code that is explicitly written to access the bitmaps with a specific type. Opportunistically use svm_vcpu_free_msrpm() in svm_vcpu_free() instead of open coding an equivalent. Link: https://lore.kernel.org/r/20250610225737.156318-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:34 -07:00
Sean Christopherson	5c9c084763	KVM: SVM: Move svm_msrpm_offset() to nested.c Move svm_msrpm_offset() from svm.c to nested.c now that all usage of the u32-index offsets is nested virtualization specific. No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:33 -07:00
Sean Christopherson	2f89888434	KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses Now that msr_write_intercepted() defaults to true, i.e. accurately reflects hardware behavior for out-of-range MSRs, and doesn't WARN (or BUG) on an out-of-range MSR, drop sev_es_prevent_msr_access()'s svm_msrpm_offset() check that guarded against calling msr_write_intercepted() with a "bad" index. Opportunistically clean up the helper's formatting. Link: https://lore.kernel.org/r/20250610225737.156318-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:32 -07:00
Sean Christopherson	4880919aaf	KVM: SVM: Merge "after set CPUID" intercept recalc helpers Merge svm_recalc_intercepts_after_set_cpuid() and svm_recalc_instruction_intercepts() such that the "after set CPUID" helper simply invokes the type-specific helpers (MSRs vs. instructions), i.e. make svm_recalc_intercepts_after_set_cpuid() a single entry point for all intercept updates that need to be performed after a CPUID change. No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:32 -07:00
Sean Christopherson	40ba80e4b0	KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller Fold svm_vcpu_init_msrpm() into svm_recalc_msr_intercepts() now that there is only the one caller (and because the "init" misnomer is even more misleading than it was in the past). No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:31 -07:00
Sean Christopherson	049dff172b	KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific Rename init_vmcb_after_set_cpuid() to svm_recalc_intercepts_after_set_cpuid() to more precisely describe its role. Strictly speaking, the name isn't perfect as toggling virtual VM{LOAD,SAVE} is arguably not recalculating an intercept, but practically speaking it's close enough. No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:30 -07:00
Sean Christopherson	4ceca57e3f	KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts() Rename msr_filter_changed() to recalc_msr_intercepts() and drop the trampoline wrapper now that both SVM and VMX use a filter-agnostic recalc helper to react to the new userspace filter. No functional change intended. Reviewed-by: Xin Li (Intel) <xin@zytor.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:30 -07:00
Sean Christopherson	160f143cc1	KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change On a userspace MSR filter change, recalculate all MSR intercepts using the filter-agnostic logic instead of maintaining a "shadow copy" of KVM's desired intercepts. The shadow bitmaps add yet another point of failure, are confusing (e.g. what does "handled specially" mean!?!?), an eyesore, and a maintenance burden. Given that KVM must be able to recalculate the correct intercepts at any given time, and that MSR filter updates are not hot paths, there is zero benefit to maintaining the shadow bitmaps. Opportunistically switch from boot_cpu_has() to cpu_feature_enabled() as appropriate. Link: https://lore.kernel.org/all/aCdPbZiYmtni4Bjs@google.com Link: https://lore.kernel.org/all/20241126180253.GAZ0YNTdXH1UGeqsu6@fat_crate.local Cc: Francesco Lavra <francescolavra.fl@gmail.com> Link: https://lore.kernel.org/r/20250610225737.156318-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:29 -07:00
Sean Christopherson	405a63d4d3	KVM: x86: Move definition of X2APIC_MSR() to lapic.h Dedup the definition of X2APIC_MSR and put it in the local APIC code where it belongs. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:28 -07:00
Sean Christopherson	cb53d07948	KVM: SVM: Drop "always" flag from list of possible passthrough MSRs Drop the "always" flag from the array of possible passthrough MSRs, and instead manually initialize the permissions for the handful of MSRs that KVM passes through by default. In addition to cutting down on boilerplate copy+paste code and eliminating a misleading flag (the MSRs aren't always passed through, e.g. thanks to MSR filters), this will allow for removing the direct_access_msrs array entirely. Link: https://lore.kernel.org/r/20250610225737.156318-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:27 -07:00
Sean Christopherson	3a0f09b361	KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest Disable interception of the GHCB MSR if and only if the VM is an SEV-ES guest. While the exact behavior is completely undocumented in the APM, common sense and testing on SEV-ES capable CPUs says that accesses to the GHCB from non-SEV-ES guests will #GP. I.e. from the guest's perspective, no functional change intended. Fixes: `376c6d2850` ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Link: https://lore.kernel.org/r/20250610225737.156318-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:26 -07:00
Sean Christopherson	6b7315fe54	KVM: SVM: Implement and adopt VMX style MSR intercepts APIs Add and use SVM MSR interception APIs (in most paths) to match VMX's APIs and nomenclature. Specifically, add SVM variants of: vmx_disable_intercept_for_msr(vcpu, msr, type) vmx_enable_intercept_for_msr(vcpu, msr, type) vmx_set_intercept_for_msr(vcpu, msr, type, intercept) to eventually replace SVM's single helper: set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write) which is awkward to use (in all cases, KVM either applies the same logic for both reads and writes, or intercepts one of read or write), and is unintuitive due to using '0' to indicate interception should be set. Keep the guts of the old API for the moment to avoid churning the MSR filter code, as that mess will be overhauled in the near future. Leave behind a temporary comment to call out that the shadow bitmaps have inverted polarity relative to the bitmaps consumed by hardware. No functional change intended. Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:26 -07:00
Sean Christopherson	c38595ad69	KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets Add macro-built helpers for testing, setting, and clearing MSRPM entries without relying on precomputed offsets. This sets the stage for eventually removing general KVM use of precomputed offsets, which are quite confusing and rather inefficient for the vast majority of KVM's usage. Outside of merging L0 and L1 bitmaps for nested SVM, using u32-indexed offsets and accesses is at best unnecessary, and at worst introduces extra operations to retrieve the individual bit from within the offset u32 value. And simply calling them "offsets" is very confusing, as the "unit" of the offset isn't immediately obvious. Use the new helpers in set_msr_interception_bitmap() and msr_write_intercepted() to verify the math and operations, but keep the existing offset-based logic in set_msr_interception_bitmap() to sanity check the "clear" and "set" operations. Manipulating MSR interceptions isn't a hot path and no kernel release is ever expected to contain this specific version of set_msr_interception_bitmap() (it will be removed entirely in the near future). Link: https://lore.kernel.org/r/20250610225737.156318-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:25 -07:00
Sean Christopherson	4879dc9469	KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough" Don't initialize vmcb02's MSRPM with KVM's set of "always passthrough" MSRs, as KVM always needs to consult L1's intercepts, i.e. needs to merge vmcb01 with vmcb12 and write the result to vmcb02. This will eventually allow for the removal of svm_vcpu_init_msrpm(). Note, the bitmaps are truly initialized by svm_vcpu_alloc_msrpm() (default to intercepting all MSRs), e.g. if there is a bug lurking elsewhere, the worst case scenario from dropping the call to svm_vcpu_init_msrpm() should be that KVM would fail to passthrough MSRs to L2. Link: https://lore.kernel.org/r/20250610225737.156318-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:24 -07:00
Sean Christopherson	f21ff2c8c9	KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge Don't merge bitmaps on nested VMRUN for MSRs that KVM passes through only for SEV-ES guests. KVM doesn't support nested virtualization for SEV-ES, and likely never will. Link: https://lore.kernel.org/r/20250610225737.156318-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:24 -07:00
Sean Christopherson	9b72c3d59f	KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap. This will eventually allow for the removal of direct_access_msrs, as the only path where tracking the offsets is truly justified is the merge for nested SVM, where merging in chunks is an easy way to batch uaccess reads/writes. Opportunistically omit the x2APIC MSRs from the merge-specific array instead of filtering them out at runtime. Note, disabling interception of DEBUGCTL, XSS, EFER, PAT, GHCB, and TSC_AUX is mutually exclusive with nested virtualization, as KVM passes through those MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization for SEV+ guests. Defer removing those MSRs to a future cleanup in order to make this refactoring as benign as possible. Link: https://lore.kernel.org/r/20250610225737.156318-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:23 -07:00
Sean Christopherson	16e9584cc0	KVM: SVM: Clean up macros related to architectural MSRPM definitions Move SVM's MSR Permissions Map macros to svm.h in anticipation of adding helpers that are available to SVM code, and opportunistically replace a variety of open-coded literals with (hopefully) informative macros. Opportunistically open code ARRAY_SIZE(msrpm_ranges) instead of wrapping it as NUM_MSR_MAPS, which is an ambiguous name even if it were qualified with "SVM_MSRPM". Deliberately leave the ranges as open coded literals, as using macros to define the ranges actually introduces more potential failure points, since both the definitions and the usage have to be careful to use the correct index. The lack of clear intent behind the ranges will be addressed in future patches. No functional change intended. Link: https://lore.kernel.org/r/20250610225737.156318-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:07:11 -07:00
Sean Christopherson	925149b6d0	KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs Rename nested_svm_vmrun_msrpm() to nested_svm_merge_msrpm() to better capture its role, and opportunistically feed it @vcpu instead of @svm, as grabbing "svm" only to turn around and grab svm->vcpu is rather silly. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	b1bccf7883	KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts Manipulate the MSR bitmaps using non-atomic bit ops APIs (two underscores), as the bitmaps are per-vCPU and are only ever accessed while vcpu->mutex is held. Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	6353cd685c	KVM: SVM: Kill the VM instead of the host if MSR interception is buggy WARN and kill the VM instead of panicking the host if KVM attempts to set or query MSR interception for an unsupported MSR. Accessing the MSR interception bitmaps only meaningfully affects post-VMRUN behavior, and KVM_BUG_ON() is guaranteed to prevent the current vCPU from doing VMRUN, i.e. there is no need to panic the entire host. Opportunistically move the sanity checks above their use to index into the MSRPM, e.g. so that bugs only WARN and terminate the VM, as opposed to doing that _and_ generating an out-of-bounds load. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	b241c50c4e	KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs Drop the unnecessary and dangerous value-terminated behavior of direct_access_msrs, and simply iterate over the actual size of the array. The use in svm_set_x2apic_msr_interception() is especially sketchy, as it relies on unused capacity being zero-initialized, and '0' being outside the range of x2APIC MSRs. To ensure the array and shadow_msr_intercept stay synchronized, simply assert that their sizes are identical (note the six 64-bit-only MSRs). Note, direct_access_msrs will soon be removed entirely; keeping the assert synchronized with the array isn't expected to be a long-term maintenance burden. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	f886515f9b	KVM: SVM: Tag MSR bitmap initialization helpers with __init Tag init_msrpm_offsets() and add_msr_offset() with __init, as they're used only during hardware setup to map potential passthrough MSRs to offsets in the bitmap. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	5ebd737308	KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails WARN and reject module loading if there is a problem with KVM's MSR interception bitmaps. Panicking the host in this situation is inexcusable since it is trivially easy to propagate the error up the stack. Link: https://lore.kernel.org/r/20250610225737.156318-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:40 -07:00
Sean Christopherson	fb96d5cf0f	KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup() Allocate pages for the IOPM after initial setup has been completed in svm_hardware_setup(), so that sanity checks can be added in the setup flow without needing to free the IOPM pages. The IOPM is only referenced (via iopm_base) in init_vmcb() and svm_hardware_unsetup(), so there's no need to allocate it early on. No functional change intended (beyond the obvious ordering differences, e.g. if the allocation fails). Link: https://lore.kernel.org/r/20250610225737.156318-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:39 -07:00
Sean Christopherson	674ffc6503	KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest Disable interception of SPEC_CTRL when the CPU virtualizes (i.e. context switches) SPEC_CTRL if and only if the MSR exists according to the vCPU's CPUID model. Letting the guest access SPEC_CTRL is generally benign, but the guest would see inconsistent behavior if KVM happened to emulate an access to the MSR. Fixes: `d00b99c514` ("KVM: SVM: Add support for Virtual SPEC_CTRL") Reported-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:05:39 -07:00
Sean Christopherson	80c64c7afe	KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag Instruct vendor code to load the guest's DR6 into hardware via a new KVM_RUN flag, and remove kvm_x86_ops.set_dr6(), whose sole purpose was to load vcpu->arch.dr6 into hardware when DR6 can be read/written directly by the guest. Note, TDX already WARNs on any run_flag being set, i.e. will yell if KVM thinks DR6 needs to be reloaded. TDX vCPUs force KVM_DEBUGREG_AUTO_SWITCH and never clear the flag, i.e. should never observe KVM_RUN_LOAD_GUEST_DR6. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250610232010.162191-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:04:24 -07:00
Sean Christopherson	2478b1b220	KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap Convert kvm_x86_ops.vcpu_run()'s "force_immediate_exit" boolean parameter into a generic bitmap so that similar "take action" information can be passed to vendor code without creating a pile of boolean parameters. This will allow dropping kvm_x86_ops.set_dr6() in favor of a new flag, and will also allow for adding similar functionality for re-loading debugctl in the active VMCS. Opportunistically massage the TDX WARN and comment to prepare for adding more run_flags, all of which are expected to be mutually exclusive with TDX, i.e. should be WARNed on. No functional change intended. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250610232010.162191-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-06-20 13:04:24 -07:00
Borislav Petkov (AMD)	d8010d4ba4	x86/bugs: Add a Transient Scheduler Attacks mitigation Add the required features detection glue to bugs.c et al. in order to support the TSA mitigation. Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>	2025-06-17 17:17:02 +02:00
Ashish Kalra	179a8427fc	KVM: SEV: Disable SEV-SNP support on initialization failure During platform init, SNP initialization may fail for several reasons, such as firmware command failures and incompatible versions. However, the KVM capability may continue to advertise support for it. The platform may have SNP enabled but if SNP_INIT fails then SNP is not supported by KVM. During KVM module initialization query the SNP platform status to obtain the SNP initialization state and use it as an additional condition to determine support for SEV-SNP. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Pratik R. Sampat <prsampat@amd.com> Signed-off-by: Pratik R. Sampat <prsampat@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Pavan Kumar Paluri <papaluri@amd.com> Message-ID: <20250512221634.12045-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-06-11 14:27:06 -04:00
Linus Torvalds	7f9039c524	Generic: * Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra. * Add MGLRU support to the access tracking perf test. ARM fixes: * Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change. * Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated. * Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum. * Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping. * Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet. s390: * Fix interaction between some filesystems and Secure Execution * Some cleanups and refactorings, preparing for an upcoming big series x86: * Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. * Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. * Refine and harden handling of spurious faults. * Add support for ALLOWED_SEV_FEATURES. * Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. * Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. * Don't account temporary allocations in sev_send_update_data(). * Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold. * Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. * Advertise support to userspace for WRMSRNS and PREFETCHI.
* Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. * Add a module param to control and enumerate support for device posted interrupts. * Fix a potential overflow with nested virt on Intel systems running 32-bit kernels. * Flush shadow VMCSes on emergency reboot. * Add support for SNP to the various SEV selftests. * Add a selftest to verify fastops instructions via forced emulation. * Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more kvm updates from Paolo Bonzini: "Generic: - Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding stale vLPI mappings being left behind.
The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted 
interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...	2025-06-02 12:24:58 -07:00
Linus Torvalds	43db111107	ARM: * Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. * Enable nested virtualisation support on systems that support it, though it is disabled by default. * Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. * Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps deal with the evolution of the architecture. * Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. * New selftest checking the pKVM ownership transition rules * Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. * Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. * Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. * Add a new selftest for the SVE host state being corrupted by a guest. * Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. * Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. * Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. * and the usual random cleanups. LoongArch: * Don't flush TLB if the host supports hardware page table walks. * Add KVM selftests support.
RISC-V: * Add vector registers to get-reg-list selftest * VCPU reset related improvements * Remove scounteren initialization from VCPU reset * Support VCPU reset from userspace using set_mpstate() ioctl x86: * Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit `7bcf7246c4` ("Merge branch 'kvm-tdx-finish-initial' into HEAD"). Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps deal with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush TLB if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM.
This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit `7bcf7246c4` ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ...	2025-05-29 08:10:01 -07:00
Paolo Bonzini	8e86e73626	Merge branch 'kvm-lockdep-common' into HEAD Introduce new mutex locking functions mutex_trylock_nest_lock() and mutex_lock_killable_nest_lock() and use them to clean up locking of all vCPUs for a VM. For x86, this removes some complex code that was used instead of lockdep's "nest_lock" feature. For ARM and RISC-V, this removes a lockdep warning when the VM is configured to have more than MAX_LOCK_DEPTH vCPUs, and removes a fair amount of duplicate code by sharing the logic across all architectures. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-28 06:29:17 -04:00
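The shared "lock all vCPUs" logic described above boils down to all-or-nothing acquisition. Lockdep's nest_lock annotation is a kernel debugging feature with no userspace equivalent, so the sketch below shows only the acquisition pattern. It uses POSIX pthread mutexes in place of kernel mutexes. The `sketch_*` names are hypothetical analogs of helpers such as `kvm_trylock_all_vcpus()`, not their implementation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one vCPU's mutex. */
struct sketch_vcpu {
	pthread_mutex_t mutex;
};

/* Release vCPUs [0, n) in reverse order of acquisition. */
static void sketch_unlock_all_vcpus(struct sketch_vcpu *vcpus, size_t n)
{
	while (n-- > 0)
		pthread_mutex_unlock(&vcpus[n].mutex);
}

/*
 * All-or-nothing acquisition: take every vCPU mutex in index order; if one
 * is busy, drop the ones already held and return -EBUSY so the caller
 * never ends up holding a partial set.
 */
static int sketch_trylock_all_vcpus(struct sketch_vcpu *vcpus, size_t n)
{
	assert(vcpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&vcpus[i].mutex) != 0) {
			sketch_unlock_all_vcpus(vcpus, i);
			return -EBUSY;
		}
	}
	return 0;
}
```

Acquiring in a fixed index order keeps two concurrent "lock everything" callers from deadlocking against each other. Unwinding on failure means a busy vCPU only costs a retry, never a half-locked VM.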
Maxim Levitsky	c560bc9286	x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation Use kvm_lock_all_vcpus instead of sev's own implementation. Because kvm_lock_all_vcpus uses the _nest_lock feature of lockdep, which ignores subclasses, there is no longer a need to use separate subclasses for source and target VMs. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2025-05-27 12:16:41 -04:00
Paolo Bonzini	4e02d4f973	KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. - Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold. Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEAD KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits.
- Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.	2025-05-27 12:15:49 -04:00
Paolo Bonzini	ebd38b26ec	KVM x86 misc changes for 6.16: - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. - Advertise support to userspace for WRMSRNS and PREFETCHI. - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. - Add a module param to control and enumerate support for device posted interrupts. - Misc cleanups. Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEAD KVM x86 misc changes for 6.16: - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. - Advertise support to userspace for WRMSRNS and PREFETCHI. - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. - Add a module param to control and enumerate support for device posted interrupts. - Misc cleanups.	2025-05-27 12:14:36 -04:00

1311 commits