mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00
313 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
63eb28bb14 |
ARM:
- Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes. LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups. RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo. - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone. - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list). - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC. - Various cleanups and fixes. x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still). - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care. - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance. - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data. Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique. - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions. - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL. - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely. - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation. Selftests: - Fix a comment typo. - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing). - Skip tests that hit EACCES when attempting to access a file, and rpint a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw== =j2At -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ... |
||
![]() |
98e8f2c0e0 |
x86/platform changes for v6.17:
This tree adds support for the AMD hardware feedback interface (HFI), by Perry Yuan. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIdwERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gYIRAAueF4R6gXv2luxwZc5xDHUp0z3W2nkFWB mpD329mqewy79fPusGRMjwxwuT7/EzIIfVkjOAi21RnwkoU/L6kBdIsn/ZGMTUpG +8zFnjBsdAb2Nm27LZ+W6nTxHn+ZiMs1U2AUqYC+F/sTtILA7LKt0aufgRrw028I aZKHOvHoYRYxgQEgcaCjBXXyUsw4zetIV8i2fj+rflutXL5tLSSgg9Pyh/dNU8JX MfWHxC2J79mYc8sLj2Pc0H6wuMPDCZ7uzBq/uieuZjhDouSdV9oDFoc3bNwYpxZI 7CBqg8qcnDY3HFnbT7Hwv9O0+Kbl3+714N5P1J0Lb1+ovaRazdLCA9jenRu+lXIp 7OehfOQoxw0XbWfESvEsSmMEjiyxIbcuwyu9X+9asY8bybD21s1gzbQYOGcBD4sf YjbGObnGqqS2GWe6LayU6HKs6kMWQfS+8xLlvWtnT4KNPwVOT1D41Fiy8lYQMXd1 MFX1vRxLdtMtjVrotq4UmENdK/t7fd2BRvDw3wOjSfNynIf55XFRuqR7RTCLQivJ Z8ZF390kb1JSmBJdNZ1Mj4XUT5fTcimLra2qSTZ2dTDO23b0IiTlgoDP1LLbYYDL lRGfMViwO4Kinm/nerzELv/VAFMFm/DvHmWuGj/vaRTS3r7iGb3+A5SFe4JS0Utz BWoqrbrAWCg= =h01B -----END PGP SIGNATURE----- Merge tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "This adds support for the AMD hardware feedback interface (HFI), by Perry Yuan" * tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/itmt: Add debugfs file to show core priorities platform/x86/amd: hfi: Add debugfs support platform/x86/amd: hfi: Set ITMT priority from ranking data cpufreq/amd-pstate: Disable preferred cores on designs with workload classification x86/process: Clear hardware feedback history for AMD processors platform/x86: hfi: Add power management callback platform/x86: hfi: Add online and offline callback support platform/x86: hfi: Init per-cpu scores for each class platform/x86: hfi: Parse CPU core ranking data from shared memory platform/x86: hfi: Introduce AMD Hardware Feedback Interface Driver x86/msr-index: Add AMD workload classification MSRs MAINTAINERS: Add maintainer entry for AMD Hardware Feedback Driver Documentation/x86: Add AMD Hardware Feedback Interface documentation |
||
![]() |
1a14928e2e |
Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.17 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from the "source" on MSR filter changes, and drop the dedicated "shadow" bitmaps (and their awful "max" size defines). - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED, a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use the same approach KVM uses for dealing with "impossible" emulation when running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU has architecturally impossible state. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can "virtualize" APERF/MPERF (with many caveats). - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default" frequency is unsupported for VMs with a "secure" TSC, and there's no known use case for changing the default frequency for other VM types. |
||
![]() |
5bf2f5119b |
Linux 6.16
-----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmiGmY4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGJtgH/3Ao101Cd+qBIQk6 Ad1TrV6JfSnPMnbafjEYU6VjXyXKT4iIIxYDqndyxSt8fij5EIWMFveOVELkZhe9 jDqlaODFrV7VYowVMAcVZCLEkQneommnbNaUaJK+odF+tPhkEbcHfubf/6Fd009q H0QB/EsExdjlLpwtkzM6yWnKss+1HA+wonw5TNbdj7+OE5vFemaonP2+lmO6EnCC PeiwkawLWA67hGonJFD4S6gp7KseqRUyQKLAbICDf6+86GueXI8QsIHhELyznfBG nN1DyhfxiPz1ICfvX7sIdPU+CuyQG0KCLX0iepNc0IGcJ40x8tLeXiHKNy/k+3Sf FBD2gmg= =HI0k -----END PGP SIGNATURE----- Merge tag 'v6.16' into x86/cpu, to resolve conflict Resolve overlapping context conflict between this upstream fix: |
||
![]() |
5b937a1ed6 |
x86/rdrand: Disable RDSEED on AMD Cyan Skillfish
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an error that causes RDSEED to always return 0xffffffff, while RDRAND works correctly. Mask the RDSEED cap for this CPU so that both /proc/cpuinfo and direct CPUID read report RDSEED as unavailable. [ bp: Move to amd.c, massage. ] Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250524145319.209075-1-me@mixaill.net |
||
![]() |
a3c4f3396b |
x86/msr-index: Add AMD workload classification MSRs
Introduce new MSR registers for AMD hardware feedback support. They provide workload classification and configuration capabilities. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/20250609200518.3616080-4-superm1@kernel.org |
||
![]() |
65f55a3017 |
x86/CPU/AMD: Add CPUID faulting support
Add CPUID faulting support on AMD using the same user interface. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/20250528213105.1149-1-bp@kernel.org |
||
![]() |
17ec2f9653 |
KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes:
|
||
![]() |
785cdec46e |
Core x86 updates for v6.16:
Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | |
||
![]() |
ddddf9d64f |
Performance events updates for v6.16:
Core & generic-arch updates: - Add support for dynamic constraints and propagate it to the Intel driver (Kan Liang) - Fix & enhance driver-specific throttling support (Kan Liang) - Record sample last_period before updating on the x86 and PowerPC platforms (Mark Barnett) - Make perf_pmu_unregister() usable (Peter Zijlstra) - Unify perf_event_free_task() / perf_event_exit_task_context() (Peter Zijlstra) - Simplify perf_event_release_kernel() and perf_event_free_task() (Peter Zijlstra) - Allocate non-contiguous AUX pages by default (Yabin Cui) Uprobes updates: - Add support to emulate NOP instructions (Jiri Olsa) - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa) x86 Intel PMU enhancements: - Support Intel Auto Counter Reload [ACR] (Kan Liang) - Add PMU support for Clearwater Forest (Dapeng Mi) - Arch-PEBS preparatory changes: (Dapeng Mi) - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs - Decouple BTS initialization from PEBS initialization - Introduce pairs of PEBS static calls x86 AMD PMU enhancements: - Use hrtimer for handling overflows in the AMD uncore driver (Sandipan Das) - Prevent UMC counters from saturating (Sandipan Das) Fixes and cleanups: - Fix put_ctx() ordering (Frederic Weisbecker) - Fix irq work dereferencing garbage (Frederic Weisbecker) - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan Das, Thorsten Blum) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy4zoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j6QRAAvQ4GBPrdJLb8oXkLjCmWSp9PfM1h2IW0 reUrcV0BPRAwz4T60QEU2KyiEjvKxNghR6bNw4i3slAZ8EFwP9eWE/0ZYOo5+W/N wv8vsopv/oZd2L2G5TgxDJf+tLPkqnTvp651LmGAbquPFONN1lsya9UHVPnt2qtv fvFhjW6D828VoevRcUCsdoEUNlFDkUYQ2c3M1y5H2AI6ILDVxLsp5uYtuVUP+2lQ 7UI/elqRIIblTGT7G9LvTGiXZMm8T58fe1OOLekT6NdweJ3XEt1kMdFo/SCRYfzU eDVVVLSextZfzBXNPtAEAlM3aSgd8+4m5sACiD1EeOUNjo5J9Sj1OOCa+bZGF/Rl XNv5Kcp6Kh1T4N5lio8DE/NabmHDqDMbUGfud+VTS8uLLku4kuOWNMxJTD1nQ2Zz BMfJhP89G9Vk07F9fOGuG1N6mKhIKNOgXh0S92tB7XDHcdJegueu2xh4ZszBL1QK JVXa4DbnDj+y0LvnV+A5Z6VILr5RiCAipDb9ascByPja6BbN10Nf9Aj4nWwRTwbO ut5OK/fDKmSjEHn1+a42d4iRxdIXIWhXCyxEhH+hJXEFx9htbQ3oAbXAEedeJTlT g9QYGAjL96QEd0CqviorV8KyU59nVkEPoLVCumXBZ0WWhNwU6GdAmsW1hLfxQdLN sp+XHhfxf8M= =tPRs -----END PGP SIGNATURE----- Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "Core & generic-arch updates: - Add support for dynamic constraints and propagate it to the Intel driver (Kan Liang) - Fix & enhance driver-specific throttling support (Kan Liang) - Record sample last_period before updating on the x86 and PowerPC platforms (Mark Barnett) - Make perf_pmu_unregister() usable (Peter Zijlstra) - Unify perf_event_free_task() / perf_event_exit_task_context() (Peter Zijlstra) - Simplify perf_event_release_kernel() and perf_event_free_task() (Peter Zijlstra) - Allocate non-contiguous AUX pages by default (Yabin Cui) Uprobes updates: - Add support to emulate NOP instructions (Jiri Olsa) - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa) x86 Intel PMU enhancements: - Support Intel Auto Counter Reload [ACR] (Kan Liang) - Add PMU support for Clearwater Forest (Dapeng Mi) - Arch-PEBS preparatory changes: (Dapeng Mi) - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs - Decouple BTS initialization from PEBS initialization - Introduce pairs of PEBS static calls x86 AMD PMU enhancements: - Use hrtimer for handling overflows in the AMD uncore driver (Sandipan Das) - Prevent UMC counters from saturating (Sandipan Das) Fixes and cleanups: - Fix put_ctx() ordering (Frederic Weisbecker) - Fix irq work dereferencing garbage (Frederic Weisbecker) - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan Das, Thorsten Blum)" * tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) perf/headers: Clean up <linux/perf_event.h> a bit perf/uapi: Clean up <uapi/linux/perf_event.h> a bit perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h> mips/perf: Remove driver-specific throttle support xtensa/perf: Remove driver-specific throttle support sparc/perf: Remove driver-specific throttle support loongarch/perf: Remove driver-specific throttle support csky/perf: Remove driver-specific throttle support arc/perf: Remove driver-specific throttle support alpha/perf: Remove driver-specific throttle support perf/apple_m1: Remove driver-specific throttle support perf/arm: Remove driver-specific throttle support s390/perf: Remove driver-specific throttle support powerpc/perf: Remove driver-specific throttle support perf/x86/zhaoxin: Remove driver-specific throttle support perf/x86/amd: Remove driver-specific throttle support perf/x86/intel: Remove driver-specific throttle support perf: Only dump the throttle log for the leader perf: Fix the throttle logic for a group perf/core: Add the is_event_in_freq_mode() helper to simplify the code ... |
||
![]() |
c4070e1996 |
Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflicts
Conflicts: Documentation/admin-guide/hw-vuln/index.rst arch/x86/include/asm/cpufeatures.h arch/x86/kernel/alternative.c arch/x86/kernel/cpu/bugs.c arch/x86/kernel/cpu/common.c drivers/base/cpu.c include/linux/cpu.h Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
![]() |
159013a7ca |
x86/its: Enumerate Indirect Target Selection (ITS) bug
ITS bug in some pre-Alderlake Intel CPUs may allow indirect branches in the first half of a cache line get predicted to a target of a branch located in the second half of the cache line. Set X86_BUG_ITS on affected CPUs. Mitigation to follow in later commits. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> |
||
![]() |
f4138de5e4 |
x86/msr: Standardize on u64 in <asm/msr-index.h>
Also fix some nearby whitespace damage while at it. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> |
||
![]() |
ec980e4fac |
perf/x86/intel: Support auto counter reload
The relative rates among two or more events are useful for performance analysis, e.g., a high branch miss rate may indicate a performance issue. Usually, the samples with a relative rate that exceeds some threshold are more useful. However, the traditional sampling takes samples of events separately. To get the relative rates among two or more events, a high sample rate is required, which can bring high overhead. Many samples taken in the non-hotspot area are also dropped (useless) in the post-process. The auto counter reload (ACR) feature takes samples when the relative rate of two or more events exceeds some threshold, which provides the fine-grained information at a low cost. To support the feature, two sets of MSRs are introduced. For a given counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s) can cause a reload of that counter. The reload value is stored in the IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C. The details can be found at Intel SDM (085), Volume 3, 21.9.11 Auto Counter Reload. In the hw_config(), an ACR event is specially configured, because the cause/reloadable counter mask has to be applied to the dyn_constraint. Besides the HW limit, e.g., not support perf metrics, PDist and etc, a SW limit is applied as well. ACR events in a group must be contiguous. It facilitates the later conversion from the event idx to the counter idx. Otherwise, the intel_pmu_acr_late_setup() has to traverse the whole event list again to find the "cause" event. Also, add a new flag PERF_X86_EVENT_ACR to indicate an ACR group, which is set to the group leader. The late setup() is also required for an ACR group. It's to convert the event idx to the counter idx, and saved it in hw.config1. The ACR configuration MSRs are only updated in the enable_event(). The disable_event() doesn't clear the ACR CFG register. Add acr_cfg_b/acr_cfg_c in the struct cpu_hw_events to cache the MSR values. It can avoid a MSR write if the value is not changed. Expose an acr_mask to the sysfs. The perf tool can utilize the new format to configure the relation of events in the group. The bit sequence of the acr_mask follows the events enabled order of the group. Example: Here is the snippet of the mispredict.c. Since the array has a random numbers, jumps are random and often mispredicted. The mispredicted rate depends on the compared value. For the Loop1, ~11% of all branches are mispredicted. For the Loop2, ~21% of all branches are mispredicted. main() { ... for (i = 0; i < N; i++) data[i] = rand() % 256; ... /* Loop 1 */ for (k = 0; k < 50; k++) for (i = 0; i < N; i++) if (data[i] >= 64) sum += data[i]; ... ... /* Loop 2 */ for (k = 0; k < 50; k++) for (i = 0; i < N; i++) if (data[i] >= 128) sum += data[i]; ... } Usually, a code with a high branch miss rate means a bad performance. To understand the branch miss rate of the codes, the traditional method usually samples both branches and branch-misses events. E.g., perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}" -c 1000000 -- ./mispredict [ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ] The 5106 samples are from both events and spread in both Loops. In the post-process stage, a user can know that the Loop 2 has a 21% branch miss rate. Then they can focus on the samples of branch-misses events for the Loop 2. With this patch, the user can generate the samples only when the branch miss rate > 20%. For example, perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu, cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}" -- ./mispredict (Two different periods are applied to branch-misses and branch-instructions. The ratio is set to 20%. If the branch-instructions is overflowed first, the branch-miss rate < 20%. No samples should be generated. All counters should be automatically reloaded. If the branch-misses is overflowed first, the branch-miss rate > 20%. A sample triggered by the branch-misses event should be generated. Just the counter of the branch-instructions should be automatically reloaded. The branch-misses event should only be automatically reloaded when the branch-instructions is overflowed. So the "cause" event is the branch-instructions event. The acr_mask is set to 0x2, since the event index in the group of branch-instructions is 1. The branch-instructions event is automatically reloaded no matter which events are overflowed. So the "cause" events are the branch-misses and the branch-instructions event. The acr_mask should be set to 0x3.) [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ] $perf report Percent │154: movl $0x0,-0x14(%rbp) │ ↓ jmp 1af │ for (i = j; i < N; i++) │15d: mov -0x10(%rbp),%eax │ mov %eax,-0x18(%rbp) │ ↓ jmp 1a2 │ if (data[i] >= 128) │165: mov -0x18(%rbp),%eax │ cltq │ lea 0x0(,%rax,4),%rdx │ mov -0x8(%rbp),%rax │ add %rdx,%rax │ mov (%rax),%eax │ ┌──cmp $0x7f,%eax 100.00 0.00 │ ├──jle 19e │ │sum += data[i]; The 2498 samples are all from the branch-misses events for the Loop 2. The number of samples and overhead is significantly reduced without losing any information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lkml.kernel.org/r/20250327195217.2683619-6-kan.liang@linux.intel.com |
||
![]() |
7d20aa5c32 |
Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar). - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian). - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai). - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski). - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng). - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar). - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki). - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan). - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki). - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki). - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai). - Clean up the Energy Model handling code somewhat (Rafael Wysocki). - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing). - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba). - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki). - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao). - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki). - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki). - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King). - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki). - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu). - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang). - Remove needless return in three void functions related to system wakeup (Zijun Hu). - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver). - Remove unused helper functions related to system sleep (David Alan Gilbert). - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson). - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven). - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI= =LqVy -----END PGP SIGNATURE----- Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are dominated by cpufreq updates which in turn are dominated by updates related to boost support in the core and drivers and amd-pstate driver optimizations. Apart from the above, there are some cpuidle updates including a rework of the most recent idle intervals handling in the venerable menu governor that leads to significant improvements in some performance benchmarks, as the governor is now more likely to predict a shorter idle duration in some cases, and there are updates of the core device power management code, mostly related to system suspend and resume, that should help to avoid potential issues arising when the drivers of devices depending on one another want to use different optimizations. There is also a usual collection of assorted fixes and cleanups, including removal of some unused code. Specifics: - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar) - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian) - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai) - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski) - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng) - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar) - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki) - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan) - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki) - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki) - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai) - Clean up the Energy Model handling code somewhat (Rafael Wysocki) - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing) - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba) - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki) - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao) - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki) - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki) - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King) - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki) - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu) - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang) - Remove needless return in three void functions related to system wakeup (Zijun Hu) - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver) - Remove unused helper functions related to system sleep (David Alan Gilbert) - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson) - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven) - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han)" * tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits) PM: sleep: Fix bit masking operation dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650 dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1 dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible cpufreq: Init cpufreq only for present CPUs PM: sleep: Fix handling devices with direct_complete set on errors cpuidle: Init cpuidle only for present CPUs PM: clk: Remove unused pm_clk_remove() PM: sleep: core: Fix indentation in dpm_wait_for_children() PM: s2idle: Extend comment in s2idle_enter() PM: s2idle: Drop redundant locks when entering s2idle PM: sleep: Remove unused pm_generic_ wrappers cpufreq: tegra186: Share policy per cluster cpupower: Make lib versioning scheme more obvious and fix version link PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL PM: EM: Address RCU-related sparse warnings cpupower: Implement CPU physical core querying pm: cpupower: remove hard-coded topology depth values pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology ... |
||
![]() |
906174776c |
- Some preparatory work to convert the mitigations machinery to mitigating
attack vectors instead of single vulnerabilities - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag - Add support for a Zen5-specific SRSO mitigation - Cleanups and minor improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixS0ACgkQEsHwGGHe VUpi1xAAgvH2u8Eo8ibT5dABQpD65w3oQiykO+9aDpObG9w9beDVGlld8DJE61Rz 6tcE0Clp2H/tMcCbn8zXIJ92TQ3wIX/85uZwLi1VEM1Tx7A6VtAbPv8WKfZE3FCX 9v92HRKnK3ql+A2ZR+oyy+/8RedUmia7y7/bXH1H7Zf2uozoKkmq5cQnwfq5iU4A qNiKuvSlQwjZ8Zz6Ax1ugHUkE4R7mlKh8rccLXl4+mVr63/lkPHSY3OFTjcYf4HW Ir92N86Spfo0/l0vsOOsWoYKmoaiVP7ouJh7YbKR3B0BGN0pt2MT476mehkEs427 m4J6XhRKhIrsYmzEkLvvpsg12zO4/PKk8BEYNS7YPYlRaOwjV4ivyFS2aY6e55rh yUHyo9s+16f/Mp+/fNFXll3mdMxYBioPWh3M191nJkdfyKMrtf0MdKPRibaJB8wH yMF4D1gMx+hFbs0/VOS6dtqD9DKW7VgPg0LW+RysfhnLTuFFb5iBcH6Of7l7Z/Ca vVK+JxrhB1EDVI1+MKnESKPF9c6j3DRa2xrQHi/XYje1TGqnQ1v4CmsEObYBuJDN 9M9t4QLzNuA/DA5tS7cxxtQ3YUthuJjPLcO4EVHOCvnqCAxkzp0i3dVMUr+YISl+ 2yFqaZdTt8s8FjTI21LOyuloCo30ZLlzaorFa0lp2cIyYup+1vg= =btX/ -----END PGP SIGNATURE----- Merge tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 speculation mitigation updates from Borislav Petkov: - Some preparatory work to convert the mitigations machinery to mitigating attack vectors instead of single vulnerabilities - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag - Add support for a Zen5-specific SRSO mitigation - Cleanups and minor improvements * tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2 x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds x86/bugs: Relocate mds/taa/mmio/rfds defines x86/bugs: Add X86_BUG_SPECTRE_V2_USER x86/bugs: Remove X86_FEATURE_USE_IBPB KVM: nVMX: Always use IBPB to properly virtualize IBRS x86/bugs: Use a static branch to guard IBPB on vCPU switch x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set() x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation() x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers x86/bugs: KVM: Add support for SRSO_MSR_FIX |
||
![]() |
440a65b7d2 |
x86/mm: Enable AMD translation cache extensions
With AMD TCE (translation cache extensions) only the intermediate mappings that cover the address range zapped by INVLPG / INVLPGB get invalidated, rather than all intermediate mappings getting zapped at every TLB invalidation. This can help reduce the TLB miss rate, by keeping more intermediate mappings in the cache. From the AMD manual: Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on TLB entries. When this bit is 0, these instructions remove the target PTE from the TLB as well as all upper-level table entries that are cached in the TLB, whether or not they are associated with the target PTE. When this bit is set, these instructions will remove the target PTE and only those upper-level entries that lead to the target PTE in the page table hierarchy, leaving unrelated upper-level entries intact. [ bp: use cpu_has()... I know, it is a mess. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-13-riel@surriel.com |
||
![]() |
b4cc466b97 |
cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
Bitfield masks are easier to follow and less error prone. Reviewed-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> |
||
![]() |
8442df2b49 |
x86/bugs: KVM: Add support for SRSO_MSR_FIX
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO. Enable BpSpecReduce to mitigate SRSO across guest/host boundaries. Switch back to enabling the bit when virtualization is enabled and to clear the bit when virtualization is disabled because using a MSR slot would clear the bit when the guest is exited and any training the guest has done, would potentially influence the host kernel when execution enters the kernel and hasn't VMRUN the guest yet. More detail on the public thread in Link below. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org |
||
![]() |
c631a2de7a |
perf/x86/intel: Ensure LBRs are disabled when a CPU is starting
Explicitly clear DEBUGCTL.LBR when a CPU is starting, prior to purging the LBR MSRs themselves, as at least one system has been found to transfer control to the kernel with LBRs enabled (it's unclear whether it's a BIOS flaw or a CPU goof). Because the kernel preserves the original DEBUGCTL, even when toggling LBRs, leaving DEBUGCTL.LBR as is results in running with LBRs enabled at all times. Closes: https://lore.kernel.org/all/c9d8269bff69f6359731d758e3b1135dedd7cc61.camel@redhat.com Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250131010721.470503-1-seanjc@google.com |
||
![]() |
38cc6495cd |
x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
The hypervisor should not be intercepting GUEST_TSC_FREQ MSR(0xcOO10134) when Secure TSC is enabled. A #VC exception will be generated otherwise. If this should occur and Secure TSC is enabled, terminate guest execution. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250106124633.1418972-8-nikunj@amd.com |
||
![]() |
8ae3291f77 |
x86/sev: Add full support for a segmented RMP table
A segmented RMP table allows for improved locality of reference between the memory protected by the RMP and the RMP entries themselves. Add support to detect and initialize a segmented RMP table with multiple segments as configured by the system BIOS. While the RMPREAD instruction will be used to read an RMP entry in a segmented RMP, initialization and debugging capabilities will require the mapping of the segments. The RMP_CFG MSR indicates if segmented RMP support is enabled and, if enabled, the amount of memory that an RMP segment covers. When segmented RMP support is enabled, the RMP_BASE MSR points to the start of the RMP bookkeeping area, which is 16K in size. The RMP Segment Table (RST) is located immediately after the bookkeeping area and is 4K in size. The RST contains up to 512 8-byte entries that identify the location of the RMP segment and amount of memory mapped by the segment (which must be less than or equal to the configured segment size). The physical address that is covered by a segment is based on the segment size and the index of the segment in the RST. The RMP entry for a physical address is based on the offset within the segment. For example, if the segment size is 64GB (0x1000000000 or 1 << 36), then physical address 0x9000800000 is RST entry 9 (0x9000800000 >> 36) and RST entry 9 covers physical memory 0x9000000000 to 0x9FFFFFFFFF. The RMP entry index within the RMP segment is the physical address AND-ed with the segment mask, 64GB - 1 (0xFFFFFFFFF), and then right-shifted 12 bits or PHYS_PFN(0x9000800000 & 0xFFFFFFFFF), which is 0x800. CPUID 0x80000025_EBX[9:0] describes the number of RMP segments that can be cached by the hardware. Additionally, if CPUID 0x80000025_EBX[10] is set, then the number of actual RMP segments defined cannot exceed the number of RMP segments that can be cached and can be used as a maximum RST index. [ bp: Unify printk hex format specifiers. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/02afd0ffd097a19cb6e5fb1bb76eb110496c5b11.1734101742.git.thomas.lendacky@amd.com |
||
![]() |
3efc57369a |
x86:
* KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. The former does not have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon). * Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs. This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work). * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset. * Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler. * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit. * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) * Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop. * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU. * Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM. * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB. * Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area. * Remove unnecessary accounting of temporary nested VMCB related allocations. * Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical. * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2. * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures. * Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible). * Minor SGX fix and a cleanup. * Misc cleanups Generic: * Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. * Enable virtualization when KVM is loaded, not right before the first VM is created. Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled. * Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase. * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs. Selftests: * Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM. * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest. * Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries. * Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath. * Misc cleanups -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw 7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw== =Q7kz -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull x86 kvm updates from Paolo Bonzini: "x86: - KVM currently invalidates the entirety of the page tables, not just those for the memslot being touched, when a memslot is moved or deleted. This does not traditionally have particularly noticeable overhead, but Intel's TDX will require the guest to re-accept private pages if they are dropped from the secure EPT, which is a non starter. Actually, the only reason why this is not already being done is a bug which was never fully investigated and caused VM instability with assigned GeForce GPUs, so allow userspace to opt into the new behavior. - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon) - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work) - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value at the ICR offset - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit - Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the nested guest) - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding multi generation LRU support in KVM - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled, i.e. when the CPU has already flushed the RSB - Trace the per-CPU host save area as a VMCB pointer to improve readability and cleanup the retrieval of the SEV-ES host save area - Remove unnecessary accounting of temporary nested VMCB related allocations - Set FINAL/PAGE in the page fault error code for EPT violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2 - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible) - Minor SGX fix and a cleanup - Misc cleanups Generic: - Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed - Enable virtualization when KVM is loaded, not right before the first VM is created Together with the previous change, this simplifies a lot the logic of the callbacks, because their very existence implies virtualization is enabled - Fix a bug that results in KVM prematurely exiting to userspace for coalesced MMIO/PIO in many cases, clean up the related code, and add a testcase - Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_ the gpa+len crosses a page boundary, which thankfully is guaranteed to not happen in the current code base. Add WARNs in more helpers that read/write guest memory to detect similar bugs Selftests: - Fix a goof that caused some Hyper-V tests to be skipped when run on bare metal, i.e. NOT in a VM - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest - Explicitly include one-off assets in .gitignore. Past Sean was completely wrong about not being able to detect missing .gitignore entries - Verify userspace single-stepping works when KVM happens to handle a VM-Exit in its fastpath - Misc cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) Documentation: KVM: fix warning in "make htmldocs" s390: Enable KVM_S390_UCONTROL config in debug_defconfig selftests: kvm: s390: Add VM run test case KVM: SVM: let alternatives handle the cases when RSB filling is required KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range() KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps() KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range() KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure() KVM: x86: Update retry protection fields when forcing retry on emulation failure KVM: x86: Apply retry protection to "unprotect on failure" path KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn ... |
||
![]() |
dc1e67f70f |
KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h
Move the handful of MSR_IA32_VMX_MISC bit defines that are currently in msr-indx.h to vmx.h so that all of the VMX_MISC defines and wrappers can be found in a single location. Opportunistically use BIT_ULL() instead of open coding hex values, add defines for feature bits that are architecturally defined, and move the defines down in the file so that they are colocated with the helpers for getting fields from VMX_MISC. No functional change intended. Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
![]() |
d7bfc9ffd5 |
KVM: VMX: Move MSR_IA32_VMX_BASIC bit defines to asm/vmx.h
Move the bit defines for MSR_IA32_VMX_BASIC from msr-index.h to vmx.h so that they are colocated with other VMX MSR bit defines, and with the helpers that extract specific information from an MSR_IA32_VMX_BASIC value. Opportunistically use BIT_ULL() instead of open coding hex values. Opportunistically rename VMX_BASIC_64 to VMX_BASIC_32BIT_PHYS_ADDR_ONLY, as "VMX_BASIC_64" is widly misleading. The flag enumerates that addresses are limited to 32 bits, not that 64-bit addresses are allowed. Last but not least, opportunistically #define DUAL_MONITOR_TREATMENT so that all known single-bit feature flags are defined (this will allow replacing open-coded literals in the future). Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
![]() |
beb2e44604 |
x86/cpu: KVM: Move macro to encode PAT value to common header
Move pat/memtype.c's PAT() macro to msr-index.h as PAT_VALUE(), and use it in KVM to define the default (Power-On / RESET) PAT value instead of open coding an inscrutable magic number. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240605231918.2915961-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
![]() |
e7e80b66fb |
x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)
Add defines for the architectural memory types that can be shoved into various MSRs and registers, e.g. MTRRs, PAT, VMX capabilities MSRs, EPTPs, etc. While most MSRs/registers support only a subset of all memory types, the values themselves are architectural and identical across all users. Leave the goofy MTRR_TYPE_* definitions as-is since they are in a uapi header, but add compile-time assertions to connect the dots (and sanity check that the msr-index.h values didn't get fat-fingered). Keep the VMX_EPTP_MT_* defines so that it's slightly more obvious that the EPTP holds a single memory type in 3 of its 64 bits; those bits just happen to be 2:0, i.e. don't need to be shifted. Opportunistically use X86_MEMTYPE_WB instead of an open coded '6' in setup_vmcs_config(). No functional change intended. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
||
![]() |
0a3e4e94d1 |
platform/x86/intel/ifs: Add SBAF test image loading support
Structural Based Functional Test at Field (SBAF) is a new type of testing that provides comprehensive core test coverage complementing existing IFS tests like Scan at Field (SAF) or ArrayBist. SBAF device will appear as a new device instance (intel_ifs_2) under /sys/devices/virtual/misc. The user interaction necessary to load the test image and test a particular core is the same as the existing scan test (intel_ifs_0). During the loading stage, the driver will look for a file named ff-mm-ss-<batch02x>.sbft in the /lib/firmware/intel/ifs_2 directory. The hardware interaction needed for loading the image is similar to SAF, with the only difference being the MSR addresses used. Reuse the SAF image loading code, passing the SBAF-specific MSR addresses via struct ifs_test_msrs in the driver device data. Unlike SAF, the SBAF test image chunks are further divided into smaller logical entities called bundles. Since the SBAF test is initiated per bundle, cache the maximum number of bundles in the current image, which is used for iterating through bundles during SBAF test execution. Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20240801051814.1935149-3-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
![]() |
576a997c63 |
Performance events changes for v6.11:
- Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmaWjncRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iZyg//TSafjCK4N9fyXrPdPqf8L7ntX5uYf0rd uVZpEo/+VGvuFhznHnZIV2DLetvuwYZcUWszCqQMYfokGGi6WI1/k4MeZkSpN5QE p5mFk6gW3cmpHT9bECg7mKQH+w7Qna/b6mnA0HYTFxPGmQKdQDl1/S+ZsgWedxpC 4V3re7/FzenFVS45DwSMPi9s7uZzZhVhTSgb4XLy+0Da4S0iRULItBa8HT8HmqE5 v5aQlw3mmwKPUWvyPMi3Sw6RRWK3C+n5ZxWswSYoLSM3dsp1ZD+YYqtOv2GqAx8v JoL0SOnGnNCfxGHh0kz5D2hztDvq61Enotih2gz7HxvdWh2DasNp4yS1USGQhu5h VJnKNA0TfOUaYqWFVj0EgRVhDX79lMwSHTkR1DZd4vM2GDigHeRPh0zGSn2w/koV oCRxFfBoktHBnX0Te1NE2BhojbuKp25vTGK6GriVcHt/RNpuz6hTxsjdJzHCAlVX M349l0EpUJafvfaIN9zF22uw22J8P9y9JYqI6ebkUIKiuoT9LuafVYhQupSE9H4u IqlozPCTNw6eAQcUo03gkl3n+SY/DZH6eU2ycKgEp3r7TDGYbJPwxY1BgOHbwi4U lySM07leso2accSVAz7GDMI3ejj6Sx64asWS1FSwbajDflouaIK2jtey+1IOdXfv hHY65tomV8U= =gguT -----END PGP SIGNATURE----- Merge tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ... |
||
![]() |
41906248d0 |
Power management updates for 6.11-rc1
- Add Loongson-3 CPUFreq driver support (Huacai Chen). - Add support for the Arrow Lake and Lunar Lake platforms and the out-of-band (OOB) mode on Emerald Rapids to the intel_pstate cpufreq driver, make it support the highest performance change interrupt and clean it up (Srinivas Pandruvada). - Switch cpufreq to new Intel CPU model defines (Tony Luck). - Simplify the cpufreq driver interface by switching the .exit() driver callback to the void return data type (Lizhe, Viresh Kumar). - Make cpufreq_boost_enabled() return bool (Dhruva Gole). - Add fast CPPC support to the amd-pstate cpufreq driver, address multiple assorted issues in it and clean it up (Perry Yuan, Mario Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du). - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan Walklin). - Fix memory leaks and of_node_put() usage in the sun50i and qcom-nvmem cpufreq drivers (Javier Carrasco). - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson, Raphael Gallais-Pou). - Fix deferred probe handling in the TI cpufreq driver and wrong return values of ti_opp_supply_probe(), and add OPP tables for the AM62Ax and AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser). - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq driver (Jagadeesh Kona). - Use dev_err_probe() in every error path in probe in the Mediatek cpufreq driver (Nícolas Prado). - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq driver (Yang Li). - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu). - Improve the teo cpuidle governor and clean up leftover comments from the menu cpuidle governor (Christian Loehle). - Clean up a comment typo in the teo cpuidle governor (Atul Kumar Pant). - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff Johnson). - Switch the intel_idle driver to new Intel CPU model defines (Tony Luck). - Switch the Intel RAPL driver new Intel CPU model defines (Tony Luck). - Simplify if condition in the idle_inject driver (Thorsten Blum). - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar). - Introduce an OF helper function to inform if required-opps is used and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson). - Update pm-graph to v5.12 which includes fixes and major code revamp for python3.12 (Todd Brandt). - Address several assorted issues in the cpupower utility (Roman Storozhenko). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmaVb+8SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxXIUQALFhNTO+wo8uPWUmsp0SV81Sbf17zM0f 9IDpzJTUZLK0stTdLtxY4khcClPE4MrwS/LjSJlvkEVZChHpUw6vFezHmx0O42Ti Tmv3ezABSAmx6QVRSpyVhE3Hb0BmXW9V+3dtoefofV0JWenN7mqk4Hbb2Jx1Cvbh zyerUeWWl97yqVMM2l5owKHSvk7SYO6cfML73XcdXQ6pBfQePfekG87i1+r40l+d qEzdyh6JjqGbdkvZKtI4zO1Hdai9FdlLWSqYmVZGS5XRN8RVvDaHDIDlSijNXAei DFPFoBVAvl8CymBXXnzDyJJhCCkEb2aX3xD6WzthoCygZt5W+tqfGxyZfViBfb55 kvpyiWZUVaDyX4Hfz1PLnJ7Xg9kPUKUcDDrsV5vKA7W0Sq2T0RbORsVkaP2nIhlY 4Xspp9nEv+78DG0UjT7jT0Py2Oq9I6BTG+pmMTxcgA7G/U5H2uAvvIM/kwQ+30vi yUxO3W5o9TQmvJF1klHgp3YsCNWZG3IYacHZzUIoPbPusEbevYrCuUNriT+zlANc Pv/FMfBfHDmU2lHWyLzuoKhlzQosNi9NajMANBJgd55zACWKzgNzFV4P5gIMd1KR moJYfosbT2RWetEH8Zrh7xA5dewUphe6tibshElbKJHilnP0iFjYhhdb6aQRcuPd q/RECFYT7z0r =imBx -----END PGP SIGNATURE----- Merge tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new cpufreq driver for Loongson-3, add support for new features in the intel_pstate (Lunar Lake and Arrow Lake platforms, OOB mode for Emerald Rapids, highest performance change interrupt), amd-pstate (fast CPPC) and sun50i (Allwinner H700 speed bin) cpufreq drivers, simplify the cpufreq driver interface, simplify the teo cpuidle governor, adjust the pm-graph utility for a new version of Python, address issues and clean up code. Specifics: - Add Loongson-3 CPUFreq driver support (Huacai Chen) - Add support for the Arrow Lake and Lunar Lake platforms and the out-of-band (OOB) mode on Emerald Rapids to the intel_pstate cpufreq driver, make it support the highest performance change interrupt and clean it up (Srinivas Pandruvada) - Switch cpufreq to new Intel CPU model defines (Tony Luck) - Simplify the cpufreq driver interface by switching the .exit() driver callback to the void return data type (Lizhe, Viresh Kumar) - Make cpufreq_boost_enabled() return bool (Dhruva Gole) - Add fast CPPC support to the amd-pstate cpufreq driver, address multiple assorted issues in it and clean it up (Perry Yuan, Mario Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du) - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan Walklin) - Fix memory leaks and of_node_put() usage in the sun50i and qcom-nvmem cpufreq drivers (Javier Carrasco) - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson, Raphael Gallais-Pou) - Fix deferred probe handling in the TI cpufreq driver and wrong return values of ti_opp_supply_probe(), and add OPP tables for the AM62Ax and AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser) - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq driver (Jagadeesh Kona) - Use dev_err_probe() in every error path in probe in the Mediatek cpufreq driver (Nícolas Prado) - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq driver (Yang Li) - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu) - Improve the teo cpuidle governor and clean up leftover comments from the menu cpuidle governor (Christian Loehle) - Clean up a comment typo in the teo cpuidle governor (Atul Kumar Pant) - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff Johnson) - Switch the intel_idle driver to new Intel CPU model defines (Tony Luck) - Switch the Intel RAPL driver new Intel CPU model defines (Tony Luck) - Simplify if condition in the idle_inject driver (Thorsten Blum) - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar) - Introduce an OF helper function to inform if required-opps is used and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson) - Update pm-graph to v5.12 which includes fixes and major code revamp for python3.12 (Todd Brandt) - Address several assorted issues in the cpupower utility (Roman Storozhenko)" * tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (77 commits) cpufreq: sti: fix build warning cpufreq: mediatek: Use dev_err_probe in every error path in probe cpufreq: Add Loongson-3 CPUFreq driver support cpufreq: Make cpufreq_driver->exit() return void cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems cpufreq/amd-pstate-ut: Convert nominal_freq to khz during comparisons cpufreq: pcc: Remove empty exit() callback cpufreq: loongson2: Remove empty exit() callback cpufreq: nforce2: Remove empty exit() callback cpupower: fix lib default installation path cpufreq: docs: Add missing scaling_available_frequencies description cpuidle: teo: Don't count non-existent intercepts cpupower: Disable direct build of the 'bench' subproject cpuidle: teo: Remove recent intercepts metric Revert: "cpuidle: teo: Introduce util-awareness" cpufreq: make cpufreq_boost_enabled() return bool cpufreq: intel_pstate: Support highest performance change interrupt x86/cpufeatures: Add HWP highest perf change feature flag Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off ... |
||
![]() |
408323581b |
- Add support for running the kernel in a SEV-SNP guest, over a Secure
VM Service Module (SVSM). When running over a SVSM, different services can run at different protection levels, apart from the guest OS but still within the secure SNP environment. They can provide services to the guest, like a vTPM, for example. This series adds the required facilities to interface with such a SVSM module. - The usual fixlets, refactoring and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaWQuoACgkQEsHwGGHe VUrmEw/+KqM5DK5cfpue3gn0RfH6OYUoFxOdYhGkG53qUMc3c3ka5zPVqLoHPkzp WPXha0Z5pVdrcD9mKtVUW9RIuLjInCM/mnoNc3tIUL+09xxemAjyG1+O+4kodiU7 sZ5+HuKUM2ihoC4Rrm+ApRrZfH4+WcgQNvFky77iObWVBo4yIscS7Pet/MYFvuuz zNaGp2SGGExDeoX/pMQNI3S9FKYD26HR17AUI3DHpS0teUl2npVi4xDjFVYZh0dQ yAhTKbSX3Q6ekDDkvAQUbxvWTJw9qoIsvLO9dvZdx6SSWmzF9IbuECpQKGQwYcp+ pVtcHb+3MwfB+nh5/fHyssRTOZp1UuI5GcmLHIQhmhQwCqPgzDH6te4Ud1ovkxOu 3GoBre7KydnQIyv12I+56/ZxyPbjHWmn8Fg106nAwGTdGbBJhfcVYfPmPvwpI4ib nXpjypvM8FkLzLAzDK6GE9QiXqJJlxOn7t66JiH/FkXR4gnY3eI8JLMfnm5blAb+ 97LC7oyeqtstWth9/4tpCILgPR2tirrMQGjUXttgt+2VMzqnEamnFozsKvR95xok 4j6ulKglZjdpn0ixHb2vAzAcOJvD7NP147jtCmXH7M6/f9H1Lih3MKdxX98MVhWB wSp16udXHzu5lF45J0BJG8uejSgBI2y51jc92HLX7kRULOGyaEo= =u15r -----END PGP SIGNATURE----- Merge tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add support for running the kernel in a SEV-SNP guest, over a Secure VM Service Module (SVSM). When running over a SVSM, different services can run at different protection levels, apart from the guest OS but still within the secure SNP environment. They can provide services to the guest, like a vTPM, for example. This series adds the required facilities to interface with such a SVSM module. - The usual fixlets, refactoring and cleanups [ And as always: "SEV" is AMD's "Secure Encrypted Virtualization". I can't be the only one who gets all the newer x86 TLA's confused, can I? - Linus ] * tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly x86/sev: Do RMP memory coverage check after max_pfn has been set x86/sev: Move SEV compilation units virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch x86/sev: Allow non-VMPL0 execution when an SVSM is present x86/sev: Extend the config-fs attestation support for an SVSM x86/sev: Take advantage of configfs visibility support in TSM fs/configfs: Add a callback to determine attribute visibility sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated virt: sev-guest: Choose the VMPCK key based on executing VMPL x86/sev: Provide guest VMPL level to userspace x86/sev: Provide SVSM discovery support x86/sev: Use the SVSM to create a vCPU when not in VMPL0 x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0 x86/sev: Use kernel provided SVSM Calling Areas x86/sev: Check for the presence of an SVSM in the SNP secrets page x86/irqflags: Provide native versions of the local_irq_save()/restore() |
||
![]() |
149fd4712b |
perf/x86/intel: Support Perfmon MSRs aliasing
The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com |
||
![]() |
21b362cc76 |
x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199 on node 1. This isn't compatible with Linux resctrl usage. On this example system a process using RMID 5 would only update monitor counters while running on SNC node 0. The other mode is "RMID Sharing Mode". This is enabled by clearing bit 0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode the number of logical RMIDs is the number of physical RMIDs (from CPUID leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A process can use the same RMID across different SNC nodes. See the "Intel Resource Director Technology Architecture Specification" for additional details. When SNC is enabled, update the MSR when a monitor domain is marked online. Technically this is overkill. It only needs to be done once per L3 cache instance rather than per SNC domain. But there is no harm in doing it more than once, and this is not in a critical path. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com |
||
![]() |
4f460bff7b |
cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
There are some other drivers also need to use the MSR_K7_HWCR_CPB_DIS_BIT for CPB control bit, so it makes sense to move the definition to a common header file to allow other driver to use it. No intentional functional impact. Suggested-by: Gautham Ranjal Shenoy <gautham.shenoy@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Huang Rui <ray.huang@amd.com> Link: https://lore.kernel.org/r/78b6c75e6cffddce3e950dd543af6ae9f8eeccc3.1718988436.git.perry.yuan@amd.com Link: https://lore.kernel.org/r/20240626042733.3747-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> |
||
![]() |
1beb348d5c |
x86/sev: Provide SVSM discovery support
The SVSM specification documents an alternative method of discovery for the SVSM using a reserved CPUID bit and a reserved MSR. This is intended for guest components that do not have access to the secrets page in order to be able to call the SVSM (e.g. UEFI runtime services). For the MSR support, a new reserved MSR 0xc001f000 has been defined. A #VC should be generated when accessing this MSR. The #VC handler is expected to ignore writes to this MSR and return the physical calling area address (CAA) on reads of this MSR. While the CPUID leaf is updated, allowing the creation of a CPU feature, the code will continue to use the VMPL level as an indication of the presence of an SVSM. This is because the SVSM can be called well before the CPU feature is in place and a non-zero VMPL requires that an SVSM be present. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/4f93f10a2ff3e9f368fd64a5920d51bf38d0c19e.1717600736.git.thomas.lendacky@amd.com |
||
![]() |
c4273a6692 |
x86/cleanups changes for v6.10:
- Fix function prototypes to address clang function type cast warnings in the math-emu code - Reorder definitions in <asm/msr-index.h> - Remove unused code - Fix typos - Simplify #include sections Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBvHQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jeSBAAqPMBFEYc5nge52ONZ8bzADEPQ6pBohgO xfONNuUpjtQ/Xtnhc8FGoFf+C9pnOlf2eX2VfusqvA6M9XJDgZxu1M6QZSOHuILo 4T4opzTj7VYLbo1DQGLcPMymW/rhJNwKdRwhHr4SNIk9YcIJS7uyxtnLNvqjcCsB /iMw2/mhlXRXN1MP1Eg4YM6BXJ4qYkjx79gzKEGbq6tJgUahR37LGvw1aq+GAiap Wbo0o2jLgu8ByZXKEfUmUnW5jMR02LeUBg1OqDjaziO48df6eUi4ngaCoSA5qIew SDKZ1uq3qTOlDtGlxIGlBznM/HjvPejr+XQXKukCn+B9N62PMtR4fOS5q/4ODTD+ wQttK0rg/fLpp1zgv33ey2N0qpbUxbtxC4JkA4DPfqstO/uiQXTNJM6H68Pqr9p/ 6TuW+HYrsgUdi54X4KTEHIAGOSUP0bjJrtSP6Tzxt9+epOQl+ymHaR07a4rRn2cw SnK7CQcWsjv90PUkCsb3F7gZtYVOkb4C0ZCPn2AlSPo+y0YnBadG+S6uQ6suFwxA kX5QNf+OPmqJZz/muqGQ+c7Swc9ONPdv6RSt35nqp2vz0ugp4Q1FNUciQGfOLj2V O0KaFVcdFvlkLGgxgYlGZJKxWKeuhh+L5IHyaL5fy7nOUhJtI+djoF5ZaCfR0Ofp Piqz80R6w9I= =6pkd -----END PGP SIGNATURE----- Merge tag 'x86-cleanups-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Fix function prototypes to address clang function type cast warnings in the math-emu code - Reorder definitions in <asm/msr-index.h> - Remove unused code - Fix typos - Simplify #include sections * tag 'x86-cleanups-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/ce4100: Remove unused 'struct sim_reg_op' x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place x86/math-emu: Fix function cast warnings x86/extable: Remove unused fixup type EX_TYPE_COPY x86/rtc: Remove unused intel-mid.h x86/32: Remove unused IA32_STACK_TOP and two externs x86/head: Simplify relative include path to xen-head.S x86/fred: Fix typo in Kconfig description x86/syscall/compat: Remove ia32_unistd.h x86/syscall/compat: Remove unused macro __SYSCALL_ia32_NR x86/virt/tdx: Remove duplicate include x86/xen: Remove duplicate #include |
||
![]() |
53bc516ade |
x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place
The ARCH_CAP_XAPIC_DISABLE bit of MSR_IA32_ARCH_CAP is not in the correct sorted order. Move it where it belongs. No functional change. [ bp: Massage commit message. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/243317ff6c8db307b7701a45f71e5c21da80194b.1705632532.git.pawan.kumar.gupta@linux.intel.com |
||
![]() |
be482ff950 |
x86/bhi: Enumerate Branch History Injection (BHI) bug
Mitigation for BHI is selected based on the bug enumeration. Add bits needed to enumerate BHI bug. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
![]() |
0f4a837615 |
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
Newer processors supports a hardware control BHI_DIS_S to mitigate Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel from userspace BHI attacks without having to manually overwrite the branch history. Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL. Mitigation is enabled later. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
![]() |
0e33cf955f |
* Mitigate RFDS vulnerability
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmXvZgoACgkQaDWVMHDJ krC2Eg//aZKBp97/DSzRqXKDwJzVUr0sGJ9cii0gVT1sI+1U6ZZCh/roVH4xOT5/ HqtOOnQ+X0mwUx2VG3Yv2VPI7VW68sJ3/y9D8R4tnMEsyQ4CmDw96Pre3NyKr/Av jmW7SK94fOkpNFJOMk3zpk7GtRUlCsVkS1P61dOmMYduguhel/V20rWlx83BgnAY Rf/c3rBjqe8Ri3rzBP5icY/d6OgwoafuhME31DD/j6oKOh+EoQBvA4urj46yMTMX /mrK7hCm/wqwuOOvgGbo7sfZNBLCYy3SZ3EyF4beDERhPF1DaSvCwOULpGVJroqu SelFsKXAtEbYrDgsan+MYlx3bQv43q7PbHska1gjkH91plO4nAsssPr5VsusUKmT sq8jyBaauZb40oLOSgooL4RqAHrfs8q5695Ouwh/DB/XovMezUI1N/BkpGFmqpJI o2xH9P5q520pkB8pFhN9TbRuFSGe/dbWC24QTq1DUajo3M3RwcwX6ua9hoAKLtDF pCV5DNcVcXHD3Cxp0M5dQ5JEAiCnW+ZpUWgxPQamGDNW5PEvjDmFwql2uWw/qOuW lkheOIffq8ejUBQFbN8VXfIzzeeKQNFiIcViaqGITjIwhqdHAzVi28OuIGwtdh3g ywLzSC8yvyzgKrNBgtFMr3ucKN0FoPxpBro253xt2H7w8srXW64= =5V9t -----END PGP SIGNATURE----- Merge tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RFDS mitigation from Dave Hansen: "RFDS is a CPU vulnerability that may allow a malicious userspace to infer stale register values from kernel space. Kernel registers can have all kinds of secrets in them so the mitigation is basically to wait until the kernel is about to return to userspace and has user values in the registers. At that point there is little chance of kernel secrets ending up in the registers and the microarchitectural state can be cleared. This leverages some recent robustness fixes for the existing MDS vulnerability. Both MDS and RFDS use the VERW instruction for mitigation" * tag 'rfds-for-linus-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests x86/rfds: Mitigate Register File Data Sampling (RFDS) Documentation/hw-vuln: Add documentation for RFDS x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set |
||
![]() |
38b334fc76 |
- Add the x86 part of the SEV-SNP host support. This will allow the
kernel to be used as a KVM hypervisor capable of running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal of the AMD confidential computing side, providing the most comprehensive confidential computing environment up to date. This is the x86 part and there is a KVM part which did not get ready in time for the merge window so latter will be forthcoming in the next cycle. - Rework the early code's position-dependent SEV variable references in order to allow building the kernel with clang and -fPIE/-fPIC and -mcmodel=kernel - The usual set of fixes, cleanups and improvements all over the place -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvH0wACgkQEsHwGGHe VUrzmA//VS/n6dhHRnm/nAGngr4PeegkgV1OhyKYFfiZ272rT6P9QvblQrgcY0dc Ij1DOhEKlke51pTHvMOQ33B3P4Fuc0mx3dpCLY0up5V26kzQiKCjRKEkC4U1bcw8 W4GqMejaR89bE14bYibmwpSib9T/uVsV65eM3xf1iF5UvsnoUaTziymDoy+nb43a B1pdd5vcl4mBNqXeEvt0qjg+xkMLpWUI9tJDB8mbMl/cnIFGgMZzBaY8oktHSROK QpuUnKegOgp1RXpfLbNjmZ2Q4Rkk4MNazzDzWq3EIxaRjXL3Qp507ePK7yeA2qa0 J3jCBQc9E2j7lfrIkUgNIzOWhMAXM2YH5bvH6UrIcMi1qsWJYDmkp2MF1nUedjdf Wj16/pJbeEw1aKKIywJGwsmViSQju158vY3SzXG83U/A/Iz7zZRHFmC/ALoxZptY Bi7VhfcOSpz98PE3axnG8CvvxRDWMfzBr2FY1VmQbg6VBNo1Xl1aP/IH1I8iQNKg /laBYl/qP+1286TygF1lthYROb1lfEIJprgi2xfO6jVYUqPb7/zq2sm78qZRfm7l 25PN/oHnuidfVfI/H3hzcGubjOG9Zwra8WWYBB2EEmelf21rT0OLqq+eS4T6pxFb GNVfc0AzG77UmqbrpkAMuPqL7LrGaSee4NdU3hkEdSphlx1/YTo= =c1ps -----END PGP SIGNATURE----- Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add the x86 part of the SEV-SNP host support. This will allow the kernel to be used as a KVM hypervisor capable of running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal of the AMD confidential computing side, providing the most comprehensive confidential computing environment up to date. This is the x86 part and there is a KVM part which did not get ready in time for the merge window so latter will be forthcoming in the next cycle. - Rework the early code's position-dependent SEV variable references in order to allow building the kernel with clang and -fPIE/-fPIC and -mcmodel=kernel - The usual set of fixes, cleanups and improvements all over the place * tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/sev: Disable KMSAN for memory encryption TUs x86/sev: Dump SEV_STATUS crypto: ccp - Have it depend on AMD_IOMMU iommu/amd: Fix failure return from snp_lookup_rmpentry() x86/sev: Fix position dependent variable references in startup code crypto: ccp: Make snp_range_list static x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT Documentation: virt: Fix up pre-formatted text block for SEV ioctls crypto: ccp: Add the SNP_SET_CONFIG command crypto: ccp: Add the SNP_COMMIT command crypto: ccp: Add the SNP_PLATFORM_STATUS command x86/cpufeatures: Enable/unmask SEV-SNP CPU feature KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown crypto: ccp: Handle legacy SEV commands when SNP is enabled crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled crypto: ccp: Handle the legacy TMR allocation when SNP is enabled x86/sev: Introduce an SNP leaked pages list crypto: ccp: Provide an API to issue SEV and SNP commands ... |
||
![]() |
8076fcde01 |
x86/rfds: Mitigate Register File Data Sampling (RFDS)
RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors. Intel released a microcode update that uses VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT. Add RFDS bug infrastructure and enable the VERW based mitigation by default, that clears the affected buffers just before exiting to userspace. Also add sysfs reporting and cmdline parameter "reg_file_data_sampling" to control the mitigation. For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> |
||
![]() |
d7b69b590b |
x86/sev: Dump SEV_STATUS
It is, and will be even more useful in the future, to dump the SEV features enabled according to SEV_STATUS. Do so: [ 0.542753] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP [ 0.544425] SEV: Status: SEV SEV-ES SEV-SNP DebugSwap Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/20240219094216.GAZdMieDHKiI8aaP3n@fat_crate.local |
||
![]() |
cd6df3f378 |
x86/cpu: Add MSR numbers for FRED configuration
Add MSR numbers for the FRED configuration registers per FRED spec 5.0. Originally-by: Megha Dey <megha.dey@intel.com> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-13-xin3.li@intel.com |
||
![]() |
216d106c7f |
x86/sev: Add SEV-SNP host initialization support
The memory integrity guarantees of SEV-SNP are enforced through a new structure called the Reverse Map Table (RMP). The RMP is a single data structure shared across the system that contains one entry for every 4K page of DRAM that may be used by SEV-SNP VMs. The APM Volume 2 section on Secure Nested Paging (SEV-SNP) details a number of steps needed to detect/enable SEV-SNP and RMP table support on the host: - Detect SEV-SNP support based on CPUID bit - Initialize the RMP table memory reported by the RMP base/end MSR registers and configure IOMMU to be compatible with RMP access restrictions - Set the MtrrFixDramModEn bit in SYSCFG MSR - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR - Configure IOMMU RMP table entry format is non-architectural and it can vary by processor. It is defined by the PPR document for each respective CPU family. Restrict SNP support to CPU models/families which are compatible with the current RMP table entry format to guard against any undefined behavior when running on other system types. Future models/support will handle this through an architectural mechanism to allow for broader compatibility. SNP host code depends on CONFIG_KVM_AMD_SEV config flag which may be enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV instead of CONFIG_AMD_MEM_ENCRYPT. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://lore.kernel.org/r/20240126041126.1927228-5-michael.roth@amd.com |
||
![]() |
b4442cadca |
- Add support managing TDX host hardware
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmWfCRQACgkQaDWVMHDJ krDUqQ//VCvkpf0mAbYDJa1oTXFW8O5cVTusBtPi8k7cFbtjQpjno/9AqKol+sK8 AKg+y5iHHl7QJmDmEcpS+O9OBbmFOpvDzm3QZhk8RkWS5pe0B108dnINYtS0eP9R MkzZwfrI2yC6NX4hvHGdD8WGHjrt+oxY0bojehX87JZsyRU+xqc/g1OO7a5bUPQe 3Ip0kKiCeqFv0y+Q1pFMEd9RdZ8XxqzUHCJT3hfgZ6FajJ2eVy6jNrPOm6LozycB eOtYYNapSgw3k/WhJCOYWHX7kePXibLxBRONLpi6P3U6pMVk4n8wrgl7qPtdW1Qx nR2UHX5P6eFkxNCuU1BzvmPBROe37C51MFVw29eRnigvuX3j/vfCH1+17xQOVKVv 5JyxYA0rJWqoOz6mX7YaNJHlmrxHzeKXudICyOFuu1j5c8CuGjh8NQsOSCq16XfZ hPzfYDUS8I7/kHYQPJlnB+kF9pmbyjTM70h74I8D6ZWvXESHJZt+TYPyWfkBXP/P L9Pwx1onAyoBApGxCWuvgGTLonzNredgYG4ABbqhUqxqncJS9M7Y/yJa+f+3SOkR T6LxoByuDVld5cIfbOzRwIaRezZDe/NL7rkHm/DWo98OaV3zILsr20Hx1lPZ1Vce ryZ9lCdZGGxm2jmpzr/VymPQz/E+ezahRHE1+F3su8jpCU41txg= =1EJI -----END PGP SIGNATURE----- Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "This contains the initial support for host-side TDX support so that KVM can run TDX-protected guests. This does not include the actual KVM-side support which will come from the KVM folks. The TDX host interactions with kexec also needs to be ironed out before this is ready for prime time, so this code is currently Kconfig'd off when kexec is on. The majority of the code here is the kernel telling the TDX module which memory to protect and handing some additional memory over to it to use to store TDX module metadata. That sounds pretty simple, but the TDX architecture is rather flexible and it takes quite a bit of back-and-forth to say, "just protect all memory, please." There is also some code tacked on near the end of the series to handle a hardware erratum. The erratum can make software bugs such as a kernel write to TDX-protected memory cause a machine check and masquerade as a real hardware failure. The erratum handling watches out for these and tries to provide nicer user errors" * tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/virt/tdx: Make TDX host depend on X86_MCE x86/virt/tdx: Disable TDX host support when kexec is enabled Documentation/x86: Add documentation for TDX host support x86/mce: Differentiate real hardware #MCs from TDX erratum ones x86/cpu: Detect TDX partial write machine check erratum x86/virt/tdx: Handle TDX interaction with sleep and hibernation x86/virt/tdx: Initialize all TDMRs x86/virt/tdx: Configure global KeyID on all packages x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID x86/virt/tdx: Designate reserved areas for all TDMRs x86/virt/tdx: Allocate and set up PAMTs for TDMRs x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions x86/virt/tdx: Get module global metadata for module initialization x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory x86/virt/tdx: Add skeleton to enable TDX on demand x86/virt/tdx: Add SEAMCALL error printing for module initialization x86/virt/tdx: Handle SEAMCALL no entropy error in common code x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC x86/virt/tdx: Define TDX supported page sizes as macros ... |
||
![]() |
765a0542fd |
x86/virt/tdx: Detect TDX during kernel boot
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. A CPU-attested software module called 'the TDX module' runs inside a new isolated memory range as a trusted hypervisor to manage and run protected VMs. Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture for crypto-protection to VMs. The BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short. During machine boot, TDX microcode verifies that the BIOS programmed TDX private KeyIDs consistently and correctly programmed across all CPU packages. The MSRs are locked in this state after verification. This is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration: it indicates not just that the hardware supports TDX, but that all the boot-time security checks passed. The TDX module is expected to be loaded by the BIOS when it enables TDX, but the kernel needs to properly initialize it before it can be used to create and run any TDX guests. The TDX module will be initialized by the KVM subsystem when KVM wants to use TDX. Detect platform TDX support by detecting TDX private KeyIDs. The TDX module itself requires one TDX KeyID as the 'TDX global KeyID' to protect its metadata. Each TDX guest also needs a TDX KeyID for its own protection. Just use the first TDX KeyID as the global KeyID and leave the rest for TDX guests. If no TDX KeyID is left for TDX guests, disable TDX as initializing the TDX module alone is useless. [ dhansen: add X86_FEATURE, replace helper function ] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com |
||
![]() |
5d2d4a9f60 |
Merge branch 'tip/perf/urgent'
Avoid conflicts, base on fixes. Signed-off-by: Peter Zijlstra <peterz@infradead.org> |
||
![]() |
6803bd7956 |
ARM:
* Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest * Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table * Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM * Guest support for memory operation instructions (FEAT_MOPS) * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code * Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations * Miscellaneous kernel and selftest fixes LoongArch: * New architecture. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: * Support for the Smstateen and Zicond extensions * Support for virtualizing senvcfg * Support for virtualized SBI debug console (DBCN) S390: * Nested page table management can be monitored through tracepoints and statistics x86: * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ * Avoid WARN on systems with Intel IPI virtualization * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. * Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. * Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. * Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. * Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. * Optimize injection of PMU interrupts that are simultaneous with NMIs. * Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: * Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. * Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y. This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. * Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: * Various updates and fixes, mostly for x86 * MTRR and PAT fixes and optimizations: -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g== =UIxm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ... |
||
![]() |
59fff63cc2 |
platform-drivers-x86 for v6.7-1
Highlights: - asus-wmi: Support for screenpad and solve brightness key press duplication - int3472: Eliminate the last use of deprecated GPIO functions - mlxbf-pmc: New HW support - msi-ec: Support new EC configurations - thinkpad_acpi: Support reading aux MAC address during passthrough - wmi: Fixes & improvements - x86-android-tablets: Detection fix and avoid use of GPIO private APIs - Debug & metrics interface improvements - Miscellaneous cleanups / fixes / improvements The following is an automated shortlog grouped by driver: acer-wmi: - Remove void function return amd/hsmp: - add support for metrics tbl - create plat specific struct - Fix iomem handling - improve the error log amd/pmc: - Add dump_custom_stb module parameter - Add PMFW command id to support S2D force flush - Handle overflow cases where the num_samples range is higher - Use flex array when calling amd_pmc_stb_debugfs_open_v2() asus-wireless: - Replace open coded acpi_match_acpi_device() asus-wmi: - add support for ASUS screenpad - Do not report brightness up/down keys when also reported by acpi_video gpiolib: acpi: - Add a ignore interrupt quirk for Peaq C1010 - Check if a GPIO is listed in ignore_interrupt earlier hp-bioscfg: - Annotate struct bios_args with __counted_by inspur-platform-profile: - Add platform profile support int3472: - Add new skl_int3472_fill_gpiod_lookup() helper - Add new skl_int3472_gpiod_get_from_temp_lookup() helper - Stop using gpiod_toggle_active_low() - Switch to devm_get_gpiod() intel: bytcrc_pwrsrc: - Convert to platform remove callback returning void intel/ifs: - Add new CPU support - Add new error code - ARRAY BIST for Sierra Forest - Gen2 scan image loading - Gen2 Scan test support - Metadata validation for start_chunk - Refactor image loading code - Store IFS generation number - Validate image size intel_speed_select_if: - Remove hardcoded map size - Use devm_ioremap_resource intel/tpmi: - Add debugfs support for read/write blocked - Add defines to get version information intel-uncore-freq: - Ignore minor version change ISST: - Allow level 0 to be not present - Ignore minor version change - Use fuse enabled mask instead of allowed levels mellanox: - Fix misspelling error in routine name - Rename some init()/exit() functions for consistent naming mlxbf-bootctl: - Convert to platform remove callback returning void mlxbf-pmc: - Add support for BlueField-3 mlxbf-tmfifo: - Convert to platform remove callback returning void mlx-Convert to platform remove callback returning void: - mlx-Convert to platform remove callback returning void mlxreg-hotplug: - Convert to platform remove callback returning void mlxreg-io: - Convert to platform remove callback returning void mlxreg-lc: - Convert to platform remove callback returning void msi-ec: - Add more EC configs - rename fn_super_swap nvsw-sn2201: - Convert to platform remove callback returning void sel3350-Convert to platform remove callback returning void: - sel3350-Convert to platform remove callback returning void siemens: simatic-ipc-batt-apollolake: - Convert to platform remove callback returning void siemens: simatic-ipc-batt: - Convert to platform remove callback returning void siemens: simatic-ipc-batt-elkhartlake: - Convert to platform remove callback returning void siemens: simatic-ipc-batt-f7188x: - Convert to platform remove callback returning void siemens: simatic-ipc-batt: - Simplify simatic_ipc_batt_remove() surface: acpi-notify: - Convert to platform remove callback returning void surface: aggregator: - Annotate struct ssam_event with __counted_by surface: aggregator-cdev: - Convert to platform remove callback returning void surface: aggregator-registry: - Convert to platform remove callback returning void surface: dtx: - Convert to platform remove callback returning void surface: gpe: - Convert to platform remove callback returning void surface: hotplug: - Convert to platform remove callback returning void surface: surface3-wmi: - Convert to platform remove callback returning void think-lmi: - Add bulk save feature - Replace kstrdup() + strreplace() with kstrdup_and_replace() - Use strreplace() to replace a character by nul thinkpad_acpi: - Add battery quirk for Thinkpad X120e - replace deprecated strncpy with memcpy - sysfs interface to auxmac tools/power/x86/intel-speed-select: - Display error for core-power support - Increase max CPUs in one request - No TRL for non compute domains - Sanitize integer arguments - turbo-mode enable disable swapped - Update help for TRL - Use cgroup isolate for CPU 0 - v1.18 release wmi: - Decouple probe deferring from wmi_block_list - Decouple WMI device removal from wmi_block_list - Fix opening of char device - Fix probe failure when failing to register WMI devices - Fix refcounting of WMI devices in legacy functions x86-android-tablets: - Add a comment about x86_android_tablet_get_gpiod() - Create a platform_device from module_init() - Drop "linux,power-supply-name" from lenovo_yt3_bq25892_0_props[] - Fix Lenovo Yoga Tablet 2 830F/L vs 1050F/L detection - Remove invalid_aei_gpiochip from Peaq C1010 - Remove invalid_aei_gpiochip support - Stop using gpiolib private APIs - Use platform-device as gpio-keys parent xo15-ebook: - Replace open coded acpi_match_acpi_device() Merges: - Merge branch 'pdx86/platform-drivers-x86-int3472' into review-ilpo - Merge branch 'pdx86/platform-drivers-x86-mellanox-init' into review-ilpo - Merge remote-tracking branch 'intel-speed-select/intel-sst' into review-ilpo - Merge remote-tracking branch 'pdx86/platform-drivers-x86-android-tablets' into review-hans -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZT+lBwAKCRBZrE9hU+XO Mck0AQCFU7dYLCF4d1CXtHf1eZhSXLpYdhcO+C08JGGoM+MqSgD+Jyb9KJHk4pxE FvKG51I9neyAne9lvNrLodHRzxCYgAo= =duM8 -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - asus-wmi: Support for screenpad and solve brightness key press duplication - int3472: Eliminate the last use of deprecated GPIO functions - mlxbf-pmc: New HW support - msi-ec: Support new EC configurations - thinkpad_acpi: Support reading aux MAC address during passthrough - wmi: Fixes & improvements - x86-android-tablets: Detection fix and avoid use of GPIO private APIs - Debug & metrics interface improvements - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (80 commits) platform/x86: inspur-platform-profile: Add platform profile support platform/x86: thinkpad_acpi: Add battery quirk for Thinkpad X120e platform/x86: wmi: Decouple WMI device removal from wmi_block_list platform/x86: wmi: Fix opening of char device platform/x86: wmi: Fix probe failure when failing to register WMI devices platform/x86: wmi: Fix refcounting of WMI devices in legacy functions platform/x86: wmi: Decouple probe deferring from wmi_block_list platform/x86/amd/hsmp: Fix iomem handling platform/x86: asus-wmi: Do not report brightness up/down keys when also reported by acpi_video platform/x86: thinkpad_acpi: replace deprecated strncpy with memcpy tools/power/x86/intel-speed-select: v1.18 release tools/power/x86/intel-speed-select: Use cgroup isolate for CPU 0 tools/power/x86/intel-speed-select: Increase max CPUs in one request tools/power/x86/intel-speed-select: Display error for core-power support tools/power/x86/intel-speed-select: No TRL for non compute domains tools/power/x86/intel-speed-select: turbo-mode enable disable swapped tools/power/x86/intel-speed-select: Update help for TRL tools/power/x86/intel-speed-select: Sanitize integer arguments platform/x86: acer-wmi: Remove void function return platform/x86/amd/pmc: Add dump_custom_stb module parameter ... |