linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-18 22:14:16 +00:00

Author	SHA1	Message	Date
Marc Zyngier	cd08d3216f	KVM: arm64: Unify UNDEF injection helpers We currently have two helpers (undef_access() and trap_undef()) that do exactly the same thing: inject an UNDEF and return 'false' (as an indication that PC should not be incremented). We definitely could do with one less. Given that undef_access() is used 80ish times, while trap_undef() is only used 30 times, the latter loses the battle and is immediately sacrificed. We also have a large number of instances where undef_access() is open-coded. Let's also convert those. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:56 +01:00
Marc Zyngier	4a999a1d7a	KVM: arm64: Make most GICv3 accesses UNDEF if they trap We don't expect to trap any GICv3 register for host handling, apart from ICC_SRE_EL1 and the SGI registers. If they trap, that's because the guest is playing with us despite being told it doesn't have a GICv3. If it does, UNDEF is what it will get. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:56 +01:00
Marc Zyngier	59af011d00	KVM: arm64: Honor guest requested traps in GICv3 emulation On platforms that require emulation of the CPU interface, we still need to honor the traps requested by the guest (ICH_HCR_EL2 as well as the FGTs for ICC_IGRPEN{0,1}_EL1. Check for these bits early and lail out if any trap applies. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	15a1ba8d04	KVM: arm64: Add trap routing information for ICH_HCR_EL2 The usual song and dance. Anything that is a trap, any register it traps. Note that we don't handle the registers added by FEAT_NMI for now. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	9f5deace58	KVM: arm64: Add ICH_HCR_EL2 to the vcpu state As we are about to describe the trap routing for ICH_HCR_EL2, add the register to the vcpu state in its VNCR form, as well as reset Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	5cb57a1aff	KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest In order to be consistent, we shouldn't advertise a GICv3 when none is actually usable by the guest. Wipe the feature when these conditions apply, and allow the field to be written from userspace. This now allows us to rewrite the kvm_has_gicv3 helper() in terms of kvm_has_feat(), given that it is always evaluated at runtime. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	795a0bbaee	KVM: arm64: Add helper for last ditch idreg adjustments We already have to perform a set of last-chance adjustments for NV purposes. We will soon have to do the same for the GIC, so introduce a helper for that exact purpose. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	8d917e0a86	KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE On a VHE system, no GICv3 traps get configured when no irqchip is present. This is not quite matching the "no GICv3" semantics that we want to present. Force such traps to be configured in this case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	5739a961b5	KVM: arm64: Force SRE traps when SRE access is not enabled We so far only write the ICH_HCR_EL2 config in two situations: - when we need to emulate the GICv3 CPU interface due to HW bugs - when we do direct injection, as the virtual CPU interface needs to be enabled This is all good. But it also means that we don't do anything special when we emulate a GICv2, or that there is no GIC at all. What happens in this case when the guest uses the GICv3 system registers? The guest gets a trap for a sysreg access (EC=0x18) while we'd really like it to get an UNDEF. Fixing this is a bit involved: - we need to set all the required trap bits (TC, TALL0, TALL1, TDIR) - for these traps to take effect, we need to (counter-intuitively) set ICC_SRE_EL1.SRE to 1 so that the above traps take priority. Note that doesn't fully work when GICv2 emulation is enabled, as we cannot set ICC_SRE_EL1.SRE to 1 (it breaks Group0 delivery as IRQ). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	d2137ba8d8	KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps() Follow the pattern introduced with vcpu_set_hcr(), and introduce vcpu_set_ich_hcr(), which configures the GICv3 traps at the same point. This will allow future changes to introduce trap configuration on a per-VM basis. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:32:55 +01:00
Marc Zyngier	0d56099ed5	Merge branch kvm-arm64/tlbi-fixes-6.12 into kvmarm-master/next * kvm-arm64/tlbi-fixes-6.12: : . : A couple of TLB invalidation fixes, only affecting pKVM : out of tree, courtesy of Will Deacon. : . KVM: arm64: Ensure TLBI uses correct VMID after changing context KVM: arm64: Invalidate EL1&0 TLB entries for all VMIDs in nvhe hyp init Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 18:26:47 +01:00
Marc Zyngier	13c7a51eeb	KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guests Everything is now in place for a guest to "enjoy" FP8 support. Expose ID_AA64PFR2_EL1 to both userspace and guests, with the explicit restriction of only being able to clear FPMR. All other features (MTE* at the time of writing) are hidden and not writable. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	c9150a8ad9	KVM: arm64: Enable FP8 support when available and configured If userspace has enabled FP8 support (by setting ID_AA64PFR2_EL1.FPMR to 1), let's enable the feature by setting HCRX_EL2.EnFPM for the vcpu. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	6d7307651a	KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID reg ID_AA64FPFR0_EL1 contains all sort of bits that contain a description of which FP8 subfeatures are implemented. We don't really care about them, so let's just expose that register and allow userspace to disable subfeatures at will. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	b8f669b491	KVM: arm64: Honor trap routing for FPMR HCRX_EL2.EnFPM controls the trapping of FPMR (as well as the validity of any FP8 instruction, but we don't really care about this last part). Describe the trap bit so that the exception can be reinjected in a NV guest. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	ef3be86021	KVM: arm64: Add save/restore support for FPMR Just like the rest of the FP/SIMD state, FPMR needs to be context switched. The only interesting thing here is that we need to treat the pKVM part a bit differently, as the host FP state is never written back to the vcpu thread, but instead stored locally and eagerly restored. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	7d9c1ed6f4	KVM: arm64: Move FPMR into the sysreg array Just like SVCR, FPMR is currently stored at the wrong location. Let's move it where it belongs. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Marc Zyngier	b556889435	KVM: arm64: Move SVCR into the sysreg array SVCR is just a system register, and has no purpose being outside of the sysreg array. If anything, it only makes it more difficult to eventually support SME one day. If ever. Move it into the array with its little friends, and associate it with a visibility predicate. Although this is dead code, it at least paves the way for the next set of FP-related extensions. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-27 07:59:27 +01:00
Shaoqin Huang	78c4446b5f	KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1 Allow userspace to change the guest-visible value of the register with different way of handling: - Since the RAS and MPAM is not writable in the ID_AA64PFR0_EL1 register, RAS_frac and MPAM_frac are also not writable in the ID_AA64PFR1_EL1 register. - The MTE is controlled by a separate UAPI (KVM_CAP_ARM_MTE) with an internal flag (KVM_ARCH_FLAG_MTE_ENABLED). So it's not writable. - For those fields which KVM doesn't know how to handle, they are not exposed to the guest (being disabled in the register read accessor), those fields value will always be 0. Those fields don't have a known behavior now, so don't advertise them to the userspace. Thus still not writable. Those fields include SME, RNDR_trap, NMI, GCS, THE, DF2, PFAR, MTE_frac, MTEX. - The BT, SSBS, CSV2_frac don't introduce any new registers which KVM doesn't know how to handle, they can be written without ill effect. So let them writable. Besides, we don't do the crosscheck in KVM about the CSV2_frac even if it depends on the value of CSV2, it should be made sure by the VMM instead of KVM. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-4-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-25 17:48:44 +01:00
Shaoqin Huang	e8d164974c	KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest Currently KVM use cpus_have_final_cap() to check if FEAT_SSBS is advertised to the guest. But if FEAT_SSBS is writable and isn't advertised to the guest, this is wrong. Update it to use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest, thus the KVM can do the right thing if FEAT_SSBS isn't advertised to the guest. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-3-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-25 17:48:44 +01:00
Shaoqin Huang	ffe68b2d19	KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1 For some of the fields in the ID_AA64PFR1_EL1 register, KVM doesn't know how to handle them right now. So explicitly disable them in the register accessor, then those fields value will be masked to 0 even if on the hardware the field value is 1. This is safe because from a UAPI point of view that read_sanitised_ftr_reg() doesn't yet return a nonzero value for any of those fields. This will benifit the migration if the host and VM have different values when restoring a VM. Those fields include RNDR_trap, NMI, MTE_frac, GCS, THE, MTEX, DF2, PFAR. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240723072004.1470688-2-shahuang@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-25 17:48:43 +01:00
Shameer Kolothum	980c41f554	KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace KVM exposes the OS double lock feature bit to Guests but returns RAZ/WI on Guest OSDLR_EL1 access. This breaks Guest migration between systems where this feature differ. Add support to make this feature writable from userspace by setting the mask bit. While at it, set the mask bits for the exposed WRPs(Number of Watchpoints) as well. Also update the selftest to cover these fields. However we still can't make BRPs and CTX_CMPs fields writable, because as per ARM ARM DDI 0487K.a, section D2.8.3 Breakpoint types and linking of breakpoints, highest numbered breakpoints(BRPs) must be context aware breakpoints(CTX_CMPs). KVM does not trap + emulate the breakpoint registers, and as such cannot support a layout that misaligns with the underlying hardware. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240816132819.34316-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-22 18:05:37 +01:00
Marc Zyngier	3e6245ebe7	KVM: arm64: Make ICC_SGI_EL1 undef in the absence of a vGICv3 On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_SGI_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-22 08:08:37 +00:00
Oliver Upton	1d8c3c23a6	KVM: arm64: Ensure canonical IPA is hugepage-aligned when handling fault Zenghui reports that VMs backed by hugetlb pages are no longer booting after commit `fd276e71d1` ("KVM: arm64: nv: Handle shadow stage 2 page faults"). Support for shadow stage-2 MMUs introduced the concept of a fault IPA and canonical IPA to stage-2 fault handling. These are identical in the non-nested case, as the hardware stage-2 context is always that of the canonical IPA space. Both addresses need to be hugepage-aligned when preparing to install a hugepage mapping to ensure that KVM uses the correct GFN->PFN translation and installs that at the correct IPA for the current stage-2. And now I'm feeling thirsty after all this talk of IPAs... Fixes: `fd276e71d1` ("KVM: arm64: nv: Handle shadow stage 2 page faults") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240822071710.2291690-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-22 07:41:00 +00:00
Marc Zyngier	f616506754	KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors We recently moved the teardown of the vgic part of a vcpu inside a critical section guarded by the config_lock. This teardown phase involves calling into kvm_io_bus_unregister_dev(), which takes the kvm->srcu lock. However, this violates the established order where kvm->srcu is taken on a memory fault (such as an MMIO access), possibly followed by taking the config_lock if the GIC emulation requires mutual exclusion from the other vcpus. It therefore results in a bad lockdep splat, as reported by Zenghui. Fix this by moving the call to kvm_io_bus_unregister_dev() outside of the config_lock critical section. At this stage, there shouln't be any need to hold the config_lock. As an additional bonus, document the ordering between kvm->slots_lock, kvm->srcu and kvm->arch.config_lock so that I cannot pretend I didn't know about those anymore. Fixes: `9eb18136af` ("KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240819125045.3474845-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-19 17:05:21 +00:00
Zenghui Yu	2240a50e62	KVM: arm64: vgic-debug: Don't put unmarked LPIs If there were LPIs being mapped behind our back (i.e., between .start() and .stop()), we would put them at iter_unmark_lpis() without checking if they were actually marked, which is obviously not good. Switch to use the xa_for_each_marked() iterator to fix it. Cc: stable@vger.kernel.org Fixes: `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-19 17:04:36 +00:00
Rob Herring (Arm)	d8226d8cfb	perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter Armv9.4/8.9 PMU adds optional support for a fixed instruction counter similar to the fixed cycle counter. Support for the feature is indicated in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not accessible in AArch32. Existing userspace using direct counter access won't know how to handle the fixed instruction counter, so we have to avoid using the counter when user access is requested. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-7-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	2f62701fa5	KVM: arm64: Refine PMU defines for number of counters There are 2 defines for the number of PMU counters: ARMV8_PMU_MAX_COUNTERS and ARMPMU_MAX_HWEVENTS. Both are the same currently, but Armv9.4/8.9 increases the number of possible counters from 32 to 33. With this change, the maximum number of counters will differ for KVM's PMU emulation which is PMUv3.4. Give KVM PMU emulation its own define to decouple it from the rest of the kernel's number PMU counters. The VHE PMU code needs to match the PMU driver, so switch it to use ARMPMU_MAX_HWEVENTS instead. Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-6-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	126d7d7cce	arm64: perf/kvm: Use a common PMU cycle counter define The PMUv3 and KVM code each have a define for the PMU cycle counter index. Move KVM's define to a shared location and use it for PMUv3 driver. Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-5-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	f9b11aa007	KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL access ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register field. Make that clear by adding a standard sysreg definition for the register, and using it instead. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-4-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	741ee52845	KVM: arm64: pmu: Use arm_pmuv3.h register accessors Commit `df29ddf4f0` ("arm64: perf: Abstract system register accesses away") split off PMU register accessor functions to a standalone header. Let's use it for KVM PMU code and get rid one copy of the ugly switch macro. Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-3-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	a4a6e2078d	perf: arm_pmuv3: Prepare for more than 32 counters Various PMUv3 registers which are a mask of counters are 64-bit registers, but the accessor functions take a u32. This has been fine as the upper 32-bits have been RES0 as there has been a maximum of 32 counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is added. Update the accessor functions to use a u64 instead. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-2-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:12 +01:00
Rob Herring (Arm)	bf5ffc8c80	perf: arm_pmu: Remove event index to counter remapping Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters starting at 1 and had 1:1 event index to counter numbering. On Armv7 and later, this changed the cycle counter to 31 and event counters start at 0. The drivers for Armv7 and PMUv3 kept the old event index numbering and introduced an event index to counter conversion. The conversion uses masking to convert from event index to a counter number. This operation relies on having at most 32 counters so that the cycle counter index 0 can be transformed to counter number 31. Armv9.4 adds support for an additional fixed function counter (instructions) which increases possible counters to more than 32, and the conversion won't work anymore as a simple subtract and mask. The primary reason for the translation (other than history) seems to be to have a contiguous mask of counters 0-N. Keeping that would result in more complicated index to counter conversions. Instead, store a mask of available counters rather than just number of events. That provides more information in addition to the number of events. No (intended) functional changes. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-1-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-08-16 13:09:11 +01:00
Will Deacon	ed49fe5a6f	KVM: arm64: Ensure TLBI uses correct VMID after changing context When the target context passed to enter_vmid_context() matches the current running context, the function returns early without manipulating the registers of the stage-2 MMU. This can result in a stale VMID due to the lack of an ISB instruction in exit_vmid_context() after writing the VTTBR when ARM64_WORKAROUND_SPECULATIVE_AT is not enabled. For example, with pKVM enabled: // Initially running in host context enter_vmid_context(guest); -> __load_stage2(guest); isb // Writes VTCR & VTTBR exit_vmid_context(guest); -> __load_stage2(host); // Restores VTCR & VTTBR enter_vmid_context(host); -> Returns early as we're already in host context tlbi vmalls12e1is // !!! Can use the stale VMID as we // haven't performed context // synchronisation since restoring // VTTBR.VMID Add an unconditional ISB instruction to exit_vmid_context() after restoring the VTTBR. This already existed for the ARM64_WORKAROUND_SPECULATIVE_AT path, so we can simply hoist that onto the common path. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Fixes: `58f3b0fc3b` ("KVM: arm64: Support TLB invalidation in guest context") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240814123429.20457-3-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-15 14:05:02 +01:00
Will Deacon	dc0dddb1d6	KVM: arm64: Invalidate EL1&0 TLB entries for all VMIDs in nvhe hyp init When initialising the nVHE hypervisor, we invalidate potentially stale TLB entries for the EL1&0 regime using a 'vmalls12e1' invalidation. However, this invalidation operation applies only to the active VMID and therefore we could proceed with stale TLB entries for other VMIDs. Replace the operation with an 'alle1' which applies to all entries for the EL1&0 regime, regardless of the VMID. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Fixes: `1025c8c0c6` ("KVM: arm64: Wrap the host with a stage 2") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240814123429.20457-2-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-15 14:05:02 +01:00
Sean Christopherson	e0b7de4fd1	KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging Disallow copying MTE tags to guest memory while KVM is dirty logging, as writing guest memory without marking the gfn as dirty in the memslot could result in userspace failing to migrate the updated page. Ideally (maybe?), KVM would simply mark the gfn as dirty, but there is no vCPU to work with, and presumably the only use case for copy MTE tags _to_ the guest is when restoring state on the target. Fixes: `f0376edb1d` ("KVM: arm64: Add ioctl to fetch/store tags in a guest") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240726235234.228822-3-seanjc@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-13 19:29:17 +01:00
Sean Christopherson	ae41d7dbae	KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE Put the page reference acquired by gfn_to_pfn_prot() if kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less- than-stellar heuristics for dealing with pfn-mapped memory means that KVM can get a page reference to ZONE_DEVICE memory. Fixes: `f0376edb1d` ("KVM: arm64: Add ioctl to fetch/store tags in a guest") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240726235234.228822-2-seanjc@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-13 19:28:00 +01:00
Colton Lewis	38753cbc4d	KVM: arm64: Move data barrier to end of split walk This DSB guarantees page table updates have been made visible to the hardware table walker. Moving the DSB from stage2_split_walker() to after the walk is finished in kvm_pgtable_stage2_split() results in a roughly 70% reduction in Clear Dirty Log Time in dirty_log_perf_test (modified to use eager page splitting) when using huge pages. This gain holds steady through a range of vcpus used (tested 1-64) and memory used (tested 1-64GB). This is safe to do because nothing else is using the page tables while they are still being mapped and this is how other page table walkers already function. None of them have a data barrier in the walker itself because relative ordering of table PTEs to table contents comes from the release semantics of stage2_make_pte(). Signed-off-by: Colton Lewis <coltonlewis@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240808174243.2836363-1-coltonlewis@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2024-08-13 19:25:38 +01:00
Marc Zyngier	9eb18136af	KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface Tearing down a vcpu CPU interface involves freeing the private interrupt array. If we don't hold the lock, we may race against another thread trying to configure it. Yeah, fuzzers do wonderful things... Taking the lock early solves this particular problem. Fixes: `03b3d00a70` ("KVM: arm64: vgic: Allocate private interrupts on demand") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240808091546.3262111-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-08 16:58:22 +00:00
Fuad Tabba	7e814a20f6	KVM: arm64: Tidying up PAuth code in KVM Tidy up some of the PAuth trapping code to clear up some comments and avoid clang/checkpatch warnings. Also, don't bother setting PAuth HCR_EL2 bits in pKVM, since it's handled by the hypervisor. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240722163311.1493879-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-07 19:11:05 +00:00
Zenghui Yu	01ab08cafe	KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI In case the guest doesn't have any LPI, we previously relied on the iterator setting 'intid = nr_spis + VGIC_NR_PRIVATE_IRQS' && 'lpi_idx = 1' to exit the iterator. But it was broken with commit `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") -- the intid remains at 'nr_spis + VGIC_NR_PRIVATE_IRQS - 1', and we end up endlessly printing the last SPI's state. Consider that it's meaningless to search the LPI xarray and populate lpi_idx when there is no LPI, let's just skip the process for that case. The result is that * If there's no LPI, we focus on the intid and exit the iterator when it runs out of the valid SPI range. * Otherwise we keep the current logic and let the xarray drive the iterator. Fixes: `85d3ccc8b7` ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240807052024.2084-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-07 19:10:22 +00:00
Marc Zyngier	10f2ad032d	KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain With the NV support of TLBI-range operations, KVM makes use of instructions that are only supported by binutils versions >= 2.30. This breaks the build for very old toolchains. Make KVM support conditional on having ARMv8.4 support in the assembler, side-stepping the issue. Fixes: `5d476ca57d` ("KVM: arm64: nv: Add handling of range-based TLBI operations") Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Suggested-by: Arnd Bergmann <arnd@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240807115144.3237260-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-07 19:09:35 +00:00
Sebastian Ott	19d837bc88	KVM: arm64: vgic: fix unexpected unlock sparse warnings Get rid of unexpected unlock sparse warnings in vgic code by adding an annotation to vgic_queue_irq_unlock(). arch/arm64/kvm/vgic/vgic.c:334:17: warning: context imbalance in 'vgic_queue_irq_unlock' - unexpected unlock arch/arm64/kvm/vgic/vgic.c:419:5: warning: context imbalance in 'kvm_vgic_inject_irq' - different lock contexts for basic block Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-4-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:58:03 +00:00
Sebastian Ott	0aa34b37a7	KVM: arm64: fix kdoc warnings in W=1 builds Fix kdoc warnings by adding missing function parameter descriptions or by conversion to a normal comment. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-3-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:58:03 +00:00
Sebastian Ott	963a08e586	KVM: arm64: fix override-init warnings in W=1 builds Add -Wno-override-init to the build flags for sys_regs.c, handle_exit.c, and switch.c to fix warnings like the following: arch/arm64/kvm/hyp/vhe/switch.c:271:43: warning: initialized field overwritten [-Woverride-init] 271 \| [ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32, \| Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-2-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:58:03 +00:00
Danilo Krummrich	32b9a52f88	KVM: arm64: free kvm->arch.nested_mmus with kvfree() kvm->arch.nested_mmus is allocated with kvrealloc(), hence free it with kvfree() instead of kfree(). Fixes: `4f128f8e1a` ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Signed-off-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723142204.758796-1-dakr@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2024-08-02 18:57:30 +00:00
Linus Torvalds	2c9b351240	ARM: * Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement * Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol * FPSIMD/SVE support for nested, including merged trap configuration and exception routing * New command-line parameter to control the WFx trap behavior under KVM * Introduce kCFI hardening in the EL2 hypervisor * Fixes + cleanups for handling presence/absence of FEAT_TCRX * Miscellaneous fixes + documentation updates LoongArch: * Add paravirt steal time support. * Add support for KVM_DIRTY_LOG_INITIALLY_SET. * Add perf kvm-stat support for loongarch. RISC-V: * Redirect AMO load/store access fault traps to guest * perf kvm stat support * Use guest files for IMSIC virtualization, when available ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA extensions is coming through the RISC-V tree. s390: * Assortment of tiny fixes which are not time critical x86: * Fixes for Xen emulation. * Add a global struct to consolidate tracking of host values, e.g. EFER * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. * Print the name of the APICv/AVIC inhibits in the relevant tracepoint. * Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. * Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop. * Update to the newfangled Intel CPU FMS infrastructure. * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored. * Misc cleanups x86 - MMU: * Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support. * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs. * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages. * Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards. x86 - AMD: * Make per-CPU save_area allocations NUMA-aware. * Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code. * Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges. This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification. There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data. x86 - Intel: * Remove an unnecessary EPT TLB flush when enabling hardware. * Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1). * KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation. Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed. See commit `0dc902267c` ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows. Generic: * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them). * New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl. * Enable halt poll shrinking by default, as Intel found it to be a clear win. * Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. * Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. Selftests: * Remove dead code in the memslot modification stress test. * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs. * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs. * Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ== =zepX -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit `0dc902267c` ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...	2024-07-20 12:41:03 -07:00
Paolo Bonzini	86014c1e20	KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmaRuOYACgkQOlYIJqCj N/1UnQ/8CI5Qfr+/0gzYgtWmtEMczGG+rMNpzD3XVqPjJjXcMcBiQnplnzUVLhha vlPdYVK7vgmEt003XGzV55mik46LHL+DX/v4hI3HEdblfyCeNLW3fKEWVRB44qJe o+YUQwSK42SORUp9oXuQINxhA//U9EnI7CQxlJ8w8wenv5IJKfIGr01DefmfGPAV PKm9t6WLcNqvhZMEyy/zmzM3KVPCJL0NcwI97x6sHxFpQYIDtL0E/VexA4AFqMoT QK7cSDC/2US41Zvem/r/GzM/ucdF6vb9suzZYBohwhxtVhwJe2CDeYQZvtNKJ1U7 GOHPaKL6nBWdZCm/yyWbbX2nstY1lHqxhN3JD0X8wqU5rNcwm2b8Vfyav0Ehc7H+ jVbDTshOx4YJmIgajoKjgM050rdBK59TdfVL+l+AAV5q/TlHocalYtvkEBdGmIDg 2td9UHSime6sp20vQfczUEz4bgrQsh4l2Fa/qU2jFwLievnBw0AvEaMximkSGMJe b8XfjmdTjlOesWAejANKtQolfrq14+1wYw0zZZ8PA+uNVpKdoovmcqSOcaDC9bT8 GO/NFUvoG+lkcvJcIlo1SSl81SmGLosijwxWfGvFAqsgpR3/3l3dYp0QtztoCNJO d3+HnjgYn5o5FwufuTD3eUOXH4AFjG108DH0o25XrIkb2Kymy0o= =BalU -----END PGP SIGNATURE----- Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEAD KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups	2024-07-16 09:51:36 -04:00
Paolo Bonzini	1c5a0b55ab	KVM/arm64 changes for 6.11 - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZpTCAxccb2xpdmVyLnVw dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFjChAQCWs9ucJag4USgvXpg5mo9sxzly kBZZ1o49N/VLxs4cagEAtq3KVNQNQyGXelYH6gr20aI85j6VnZW5W5z+sy5TAgk= =sSOt -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.11 - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates	2024-07-16 09:50:44 -04:00
Linus Torvalds	c89d780cc1	arm64 updates for 6.11: * Virtual CPU hotplug support for arm64 ACPI systems * cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits visible to guests * CPU errata: expand the speculative SSBS workaround to more CPUs * arm64 ACPI: - acpi=nospcr option to disable SPCR as default console for arm64 - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/ * GICv3, use compile-time PMR values: optimise the way regular IRQs are masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for a static key in fast paths by using a priority value chosen dynamically at boot time * arm64 perf updates: - Rework of the IMX PMU driver to enable support for I.MX95 - Enable support for tertiary match groups in the CMN PMU driver - Initial refactoring of the CPU PMU code to prepare for the fixed instruction counter introduced by Arm v9.4 - Add missing PMU driver MODULE_DESCRIPTION() strings - Hook up DT compatibles for recent CPU PMUs * arm64 kselftest updates: - Kernel mode NEON fp-stress - Cleanups, spelling mistakes * arm64 Documentation update with a minor clarification on TBI * Miscellaneous: - Fix missing IPI statistics - Implement raw_smp_processor_id() using thread_info rather than a per-CPU variable (better code generation) - Make MTE checking of in-kernel asynchronous tag faults conditional on KASAN being enabled - Minor cleanups, typos -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmaQKN4ACgkQa9axLQDI XvE0Nw/+JZ6OEQ+DMUHXZfbWanvn1p0nVOoEV3MYVpOeQK1ILYCoDapatLNIlet0 wcja7tohKbL1ifc7GOqlkitu824LMlotncrdOBycRqb/4C5KuJ+XhygFv5hGfX0T Uh2zbo4w52FPPEUMICfEAHrKT3QB9tv7f66xeUNbWWFqUn3rY02/ZVQVVdw6Zc0e fVYWGUUoQDR7+9hRkk6tnYw3+9YFVAUAbLWk+DGrW7WsANi6HuJ/rBMibwFI6RkG SZDZHum6vnwx0Dj9H7WrYaQCvUMm7AlckhQGfPbIFhUk6pWysfJtP5Qk49yiMl7p oRk/GrSXpiKumuetgTeOHbokiE1Nb8beXx0OcsjCu4RrIaNipAEpH1AkYy5oiKoT 9vKZErMDtQgd96JHFVaXc+A3D2kxVfkc1u7K3TEfVRnZFV7CN+YL+61iyZ+uLxVi d9xrAmwRsWYFVQzlZG3NWvSeQBKisUA1L8JROlzWc/NFDwTqDGIt/zS4pZNL3+OM EXW0LyKt7Ijl6vPXKCXqrODRrPlcLc66VMZxofZOl0/dEqyJ+qLL4GUkWZu8lTqO BqydYnbTSjiDg/ntWjTrD0uJ8c40Qy7KTPEdaPqEIQvkDEsUGlOnhAQjHrnGNb9M psZtpDW2xm7GykEOcd6rgSz4Xeky2iLsaR4Wc7FTyDS0YRmeG44= =ob2k -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The biggest part is the virtual CPU hotplug that touches ACPI, irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has been queued via the arm64 tree. Otherwise the usual perf updates, kselftest, various small cleanups. Core: - Virtual CPU hotplug support for arm64 ACPI systems - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits visible to guests - CPU errata: expand the speculative SSBS workaround to more CPUs - GICv3, use compile-time PMR values: optimise the way regular IRQs are masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for a static key in fast paths by using a priority value chosen dynamically at boot time ACPI: - 'acpi=nospcr' option to disable SPCR as default console for arm64 - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/ Perf updates: - Rework of the IMX PMU driver to enable support for I.MX95 - Enable support for tertiary match groups in the CMN PMU driver - Initial refactoring of the CPU PMU code to prepare for the fixed instruction counter introduced by Arm v9.4 - Add missing PMU driver MODULE_DESCRIPTION() strings - Hook up DT compatibles for recent CPU PMUs Kselftest updates: - Kernel mode NEON fp-stress - Cleanups, spelling mistakes Miscellaneous: - arm64 Documentation update with a minor clarification on TBI - Fix missing IPI statistics - Implement raw_smp_processor_id() using thread_info rather than a per-CPU variable (better code generation) - Make MTE checking of in-kernel asynchronous tag faults conditional on KASAN being enabled - Minor cleanups, typos" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits) selftests: arm64: tags: remove the result script selftests: arm64: tags_test: conform test to TAP output perf: add missing MODULE_DESCRIPTION() macros arm64: smp: Fix missing IPI statistics irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64 Documentation: arm64: Update memory.rst for TBI arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1 KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1 perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h perf: arm_v6/7_pmu: Drop non-DT probe support perf/arm: Move 32-bit PMU drivers to drivers/perf/ perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU perf: imx_perf: add support for i.MX95 platform perf: imx_perf: fix counter start and config sequence perf: imx_perf: refactor driver for imx93 perf: imx_perf: let the driver manage the counter usage rather the user perf: imx_perf: add macro definitions for parsing config attr ...	2024-07-15 17:06:19 -07:00

... 2 3 4 5 6 ...

2731 commits