MDCR_EL2.HPME is the 'global' enable bit for event counters reserved for
EL2. Give the PMU a kick when it's changed to ensure events are
reprogrammed before returning to the guest.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241217175550.3658212-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Having separate helpers for enabling/disabling counters provides the
wrong abstraction, as the state of each counter needs to be evaluated
independently and, in some cases, use a different global enable bit.
Collapse the enable/disable accessors into a single, common helper that
reconfigures every counter set in @mask, leaving the complexity of
determining if an event is actually enabled in
kvm_pmu_counter_is_enabled().
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241217175513.3658056-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Catalin reports that a hypervisor lying to a guest about the size
of the ASID field may result in unexpected issues:
- if the underlying HW does only supports 8 bit ASIDs, the ASID
field in a TLBI VAE1* operation is only 8 bits, and the HW will
ignore the other 8 bits
- if on the contrary the HW is 16 bit capable, the ASID field
in the same TLBI operation is always 16 bits, irrespective of
the value of TCR_ELx.AS.
This could lead to missed invalidations if the guest was lead to
assume that the HW had 8 bit ASIDs while they really are 16 bit wide.
In order to avoid any potential disaster that would be hard to debug,
prenent the migration between a host with 8 bit ASIDs to one with
wider ASIDs (the converse was obviously always forbidden). This is
also consistent with what we already do for VMIDs.
If it becomes absolutely mandatory to support such a migration path
in the future, we will have to trap and emulate all TLBIs, something
that nobody should look forward to.
Fixes: d5a32b60dc ("KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20241203190236.505759-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/nv-pmu:
: Support for vEL2 PMU controls
:
: Align the vEL2 PMU support with the current state of non-nested KVM,
: including:
:
: - Trap routing, with the annoying complication of EL2 traps that apply
: in Host EL0
:
: - PMU emulation, using the correct configuration bits depending on
: whether a counter falls in the hypervisor or guest range of PMCs
:
: - Perf event swizzling across nested boundaries, as the event filtering
: needs to be remapped to cope with vEL2
KVM: arm64: nv: Reprogram PMU events affected by nested transition
KVM: arm64: nv: Apply EL2 event filtering when in hyp context
KVM: arm64: nv: Honor MDCR_EL2.HLP
KVM: arm64: nv: Honor MDCR_EL2.HPME
KVM: arm64: Add helpers to determine if PMC counts at a given EL
KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
KVM: arm64: Rename kvm_pmu_valid_counter_mask()
KVM: arm64: nv: Advertise support for FEAT_HPMN0
KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
KVM: arm64: nv: Reinject traps that take effect in Host EL0
KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
arm64: sysreg: Migrate MDCR_EL2 definition to table
arm64: sysreg: Describe ID_AA64DFR2_EL1 fields
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/mpam-ni:
: Hiding FEAT_MPAM from KVM guests, courtesy of James Morse + Joey Gouly
:
: Fix a longstanding bug where FEAT_MPAM was accidentally exposed to KVM
: guests + the EL2 trap configuration was not explicitly configured. As
: part of this, bring in skeletal support for initialising the MPAM CPU
: context so KVM can actually set traps for its guests.
:
: Be warned -- if this series leads to boot failures on your system,
: you're running on turd firmware.
:
: As an added bonus (that builds upon the infrastructure added by the MPAM
: series), allow userspace to configure CTR_EL0.L1Ip, courtesy of Shameer
: Kolothum.
KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace
KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored
KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
KVM: arm64: Add a macro for creating filtered sys_reg_descs entries
KVM: arm64: Fix missing traps of guest accesses to the MPAM registers
arm64: cpufeature: discover CPU support for MPAM
arm64: head.S: Initialise MPAM EL2 registers and disable traps
arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Only allow userspace to set VIPT(0b10) or PIPT(0b11) for L1Ip based on
what hardware reports as both AIVIVT (0b01) and VPIPT (0b00) are
documented as reserved.
Using a VIPT for Guest where hardware reports PIPT may lead to over
invalidation, but is still correct. Hence, we can allow downgrading
PIPT to VIPT, but not the other way around.
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The value of MDCR_EL2.HPMN controls the number of event counters made
visible to EL0 and EL1. This means it is possible for the guest
hypervisor to allow direct access to event counters to the L2.
Rework KVM's PMU register emulation to take the effects of HPMN into
account when handling a trap. For bitmask-style registers, writes only
affect accessible registers.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-14-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Nested PMU support requires dynamically changing the visible range of
PMU counters based on the exception level and value of MDCR_EL2.HPMN. At
the same time, the PMU emulation code needs to know the absolute number
of implemented counters, regardless of context.
Rename the existing helper to make it obvious that it returns the number
of implemented counters and not anything else.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-13-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
commit 011e5f5bf5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling. A previous patch supplied the missing trap
handling.
Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to
be migratable, but there is little point enabling the MPAM CPU
interface on new VMs until there is something a guest can do with it.
Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and on hardware
that supports MPAM, politely ignore the VMMs attempts to set this bit.
Guests exposed to this bug have the sanitised value of the MPAM field,
so only the correct value needs to be ignored. This means the field
can continue to be used to block migration to incompatible hardware
(between MPAM=1 and MPAM=5), and the VMM can't rely on the field
being ignored.
Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The sys_reg_descs array holds function pointers and reset value for
managing the user-space and guest view of system registers. These
are mostly created by a set of macro's as only some combinations
of behaviour are needed.
If a register needs special treatment, its sys_reg_descs entry is
open-coded. This is true of some id registers where the value provided
by user-space is validated by some helpers.
Before adding another one of these, add a helper that covers the
existing special cases. 'ID_FILTERED' expects helpers to set the
user-space value, and retrieve the modified reset value.
Like ID_WRITABLE() this uses id_visibility(), which should have no
functional change for the registers converted to use ID_FILTERED().
read_sanitised_id_aa64dfr0_el1() and read_sanitised_id_aa64pfr0_el1()
have been refactored to be called from kvm_read_sanitised_id_reg(), to
try be consistent with ID_WRITABLE().
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-6-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
commit 011e5f5bf5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.
If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },
Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
| test_has_mpam+0x18/0x30
| update_cpu_capabilities+0x7c/0x11c
| setup_cpu_features+0x14/0xd8
| smp_cpus_done+0x24/0xb8
| smp_init+0x7c/0x8c
| kernel_init_freeable+0xf8/0x280
| kernel_init+0x24/0x1e0
| ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c00 54000080 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Add the support to enable the traps, and handle the three guest accessible
registers by injecting an UNDEF. This stops KVM from spamming the host
log, but doesn't yet hide the feature from the id registers.
With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.
Fixes: 011e5f5bf5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200925160102.118858-1-james.morse@arm.com/
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-5-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
S1POE support implies support for POR_EL2, which we provide by
- adding it to the vcpu_sysreg enum
- advertising it as mapped to its EL1 counterpart in get_el2_to_el1_mapping
- wiring it in the sys_reg_desc table with the correct visibility
- handling POR_EL1 in __vcpu_{read,write}_sys_reg_from_cpu()
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-32-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Just like we have kvm_has_s1pie(), add its S1POE counterpart,
making the code slightly more readable.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
With a visibility defined for these registers, there is no need
to check again for S1PIE or TCRX being implemented as perform_access()
already handles it.
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-27-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
When the guest does not support S1PIE we should not allow any access
to the system registers it adds in order to ensure that we do not create
spurious issues with guest migration. Add a visibility operation for these
registers.
Fixes: 86f9de9db1 ("KVM: arm64: Save/restore PIE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-3-376624fa829c@kernel.org
[maz: simplify by using __el2_visibility(), kvm_has_s1pie() throughout]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-26-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We are starting to have a bunch of visibility helpers checking
for EL2 + something else, and we are going to add more.
Simplify things somehow by introducing a helper that implement
extractly that by taking a visibility helper as a parameter,
and convert the existing ones to that.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-23-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Add the FEAT_S1PIE EL2 registers the sysreg descriptor array so that
they can be handled as a trap.
Access to these registers is conditional based on ID_AA64MMFR3_EL1.S1PIE
being advertised.
Similarly to other other changes, PIRE0_EL2 is guaranteed to trap
thanks to the D22677 update to the architecture.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We currently only use the masking (RES0/RES1) facility for VNCR
registers, as they are memory-based and thus easy to sanitise.
But we could apply the same thing to other registers if we:
- split the sanitisation from __VNCR_START__
- apply the sanitisation when reading from a HW register
This involves a new "marker" in the vcpu_sysreg enum, which
defines the point at which the sanitisation applies (the VNCR
registers being of course after this marker).
Whle we are at it, rename kvm_vcpu_sanitise_vncr_reg() to
kvm_vcpu_apply_reg_masks(), which is vaguely more explicit,
and harden set_sysreg_masks() against setting masks for
random registers...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241023145345.1613824-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Add the TCR2_EL2 register to the per-vcpu sysreg register array,
the sysreg descriptor array, and advertise it as mapped to TCR2_EL1
for NV purposes.
Access to this register is conditional based on ID_AA64MMFR3_EL1.TCRX
being advertised.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-12-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Accessing CNTHCTL_EL2 is fraught with danger if running with
HCR_EL2.E2H=1: half of the bits are held in CNTKCTL_EL1, and
thus can be changed behind our back, while the rest lives
in the CNTHCTL_EL2 shadow copy that is memory-based.
Yes, this is a lot of fun!
Make sure that we merge the two on read access, while we can
write to CNTKCTL_EL1 in a more straightforward manner.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
As KVM has grown a bunch of new system register for NV, it appears
that we are missing them in the get_el2_to_el1_mapping() list.
Most of them are not crucial as they don't tend to be accessed via
vcpu_read_sys_reg() and vcpu_write_sys_reg().
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Prior to commit 70ed723829 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
we just exposed the santised view of ID_AA64MMFR3_EL1 to guests, meaning
that they saw both TCRX and S1PIE if present on the host machine. That
commit added VMM control over the contents of the register and exposed
S1POE but removed S1PIE, meaning that the extension is no longer visible
to guests. Reenable support for S1PIE with VMM control.
Fixes: 70ed723829 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
There's been a decent amount of attention around unmaps of nested MMUs,
and TLBI handling is no exception to this. Add a comment clarifying why
it is safe to reschedule during a TLBI unmap, even without a reference
on the MMU in progress.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
Right now the nested code allows unmap operations on a shadow stage-2 to
block unconditionally. This is wrong in a couple places, such as a
non-blocking MMU notifier or on the back of a sched_in() notifier as
part of shadow MMU recycling.
Carry through whether or not blocking is allowed to
kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU
reclaim would precipitate a stack overflow from a pile of kvm_sched_in()
callbacks, all trying to recycle a stage-2 MMU.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/idregs-6.12:
: .
: Make some fields of ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1
: writable from userspace, so that a VMM can influence the
: set of guest-visible features.
:
: - for ID_AA64DFR0_EL1: DoubleLock, WRPs, PMUVer and DebugVer
: are writable (courtesy of Shameer Kolothum)
:
: - for ID_AA64PFR1_EL1: BT, SSBS, CVS2_frac are writable
: (courtesy of Shaoqin Huang)
: .
KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1
KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1
KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest
KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1
KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace
Signed-off-by: Marc Zyngier <maz@kernel.org>
* New Stage-2 page table dumper, reusing the main ptdump infrastructure
* FP8 support
* Nested virtualization now supports the address translation (FEAT_ATS1A)
family of instructions
* Add selftest checks for a bunch of timer emulation corner cases
* Fix multiple cases where KVM/arm64 doesn't correctly handle the guest
trying to use a GICv3 that wasn't advertised
* Remove REG_HIDDEN_USER from the sysreg infrastructure, making
things little simpler
* Prevent MTE tags being restored by userspace if we are actively
logging writes, as that's a recipe for disaster
* Correct the refcount on a page that is not considered for MTE tag
copying (such as a device)
* When walking a page table to split block mappings, synchronize only
at the end the walk rather than on every store
* Fix boundary check when transfering memory using FFA
* Fix pKVM TLB invalidation, only affecting currently out of tree
code but worth addressing for peace of mind
LoongArch:
* Revert qspinlock to test-and-set simple lock on VM.
* Add Loongson Binary Translation extension support.
* Add PMU support for guest.
* Enable paravirt feature control from VMM.
* Implement function kvm_para_has_feature().
RISC-V:
* Fix sbiret init before forwarding to userspace
* Don't zero-out PMU snapshot area before freeing data
* Allow legacy PMU access from guest
* Fix to allow hpmcounter31 from the guest
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmbmghAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPFQgf+Ijeqlx90BGy96pyzo/NkYKPeEc8G
gKhlm8PdtdZYaRdJ53MVRLLpzbLuzqbwrn0ZX2tvoDRLzuAqTt2GTFoT6e2HtY5B
Sf7KQMFwHWGtGklC1EmZ1fXsCocswpuAcexCLKLRBoWUcKABlgwV3N3vJo5gx/Ag
8XXhYpcLTh+p7bjMdJShQy019pTwEDE68pPVnL2NPzla1G6Qox7ZJIdOEMZXuyJA
MJ4jbFWE/T8vLFUf/8MGQ/+bo+4140kzB8N9wkazNcBRoodY6Hx+Lm1LiZjNudO1
ilIdB4P3Ht+D8UuBv2DO5XTakfJz9T9YsoRcPlwrOWi/8xBRbt236gFB3Q==
=sHTI
-----END PGP SIGNATURE-----
Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"These are the non-x86 changes (mostly ARM, as is usually the case).
The generic and x86 changes will come later"
ARM:
- New Stage-2 page table dumper, reusing the main ptdump
infrastructure
- FP8 support
- Nested virtualization now supports the address translation
(FEAT_ATS1A) family of instructions
- Add selftest checks for a bunch of timer emulation corner cases
- Fix multiple cases where KVM/arm64 doesn't correctly handle the
guest trying to use a GICv3 that wasn't advertised
- Remove REG_HIDDEN_USER from the sysreg infrastructure, making
things little simpler
- Prevent MTE tags being restored by userspace if we are actively
logging writes, as that's a recipe for disaster
- Correct the refcount on a page that is not considered for MTE tag
copying (such as a device)
- When walking a page table to split block mappings, synchronize only
at the end the walk rather than on every store
- Fix boundary check when transfering memory using FFA
- Fix pKVM TLB invalidation, only affecting currently out of tree
code but worth addressing for peace of mind
LoongArch:
- Revert qspinlock to test-and-set simple lock on VM.
- Add Loongson Binary Translation extension support.
- Add PMU support for guest.
- Enable paravirt feature control from VMM.
- Implement function kvm_para_has_feature().
RISC-V:
- Fix sbiret init before forwarding to userspace
- Don't zero-out PMU snapshot area before freeing data
- Allow legacy PMU access from guest
- Fix to allow hpmcounter31 from the guest"
* tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (64 commits)
LoongArch: KVM: Implement function kvm_para_has_feature()
LoongArch: KVM: Enable paravirt feature control from VMM
LoongArch: KVM: Add PMU support for guest
KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier
KVM: arm64: Simplify visibility handling of AArch32 SPSR_*
KVM: arm64: Simplify handling of CNTKCTL_EL12
LoongArch: KVM: Add vm migration support for LBT registers
LoongArch: KVM: Add Binary Translation extension support
LoongArch: KVM: Add VM feature detection function
LoongArch: Revert qspinlock to test-and-set simple lock on VM
KVM: arm64: Register ptdump with debugfs on guest creation
arm64: ptdump: Don't override the level when operating on the stage-2 tables
arm64: ptdump: Use the ptdump description from a local context
arm64: ptdump: Expose the attribute parsing functionality
KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer
KVM: arm64: Move pagetable definitions to common header
KVM: arm64: nv: Add support for FEAT_ATS1A
KVM: arm64: nv: Plumb handling of AT S1* traps from EL2
KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3
KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration
...
ACPI:
* Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
* Ensure arm64-specific IORT header is covered by MAINTAINERS.
CPU Errata:
* Enable workaround for hardware access/dirty issue on Ampere-1A cores.
Memory management:
* Define PHYSMEM_END to fix a crash in the amdgpu driver.
* Avoid tripping over invalid kernel mappings on the kexec() path.
* Userspace support for the Permission Overlay Extension (POE) using
protection keys.
Perf and PMUs:
* Add support for the "fixed instruction counter" extension in the CPU
PMU architecture.
* Extend and fix the event encodings for Apple's M1 CPU PMU.
* Allow LSM hooks to decide on SPE permissions for physical profiling.
* Add support for the CMN S3 and NI-700 PMUs.
Confidential Computing:
* Add support for booting an arm64 kernel as a protected guest under
Android's "Protected KVM" (pKVM) hypervisor.
Selftests:
* Fix vector length issues in the SVE/SME sigreturn tests
* Fix build warning in the ptrace tests.
Timers:
* Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
non-determinism arising from the architected counter.
Miscellaneous:
* Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
don't succeed.
* Minor fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmbkVNEQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKeIB/9YtbN7JMgsXktM94GP03r3tlFF36Y1S51S
+zdDZclAVZCTCZN+PaFeAZ/+ah2EQYrY6rtDoHUSEMQdF9kH+ycuIPDTwaJ4Qkam
QKXMpAgtY/4yf2rX4lhDF8rEvkhLDsu7oGDhqUZQsA33GrMBHfgA3oqpYwlVjvGq
gkm7olTo9LdWAxkPpnjGrjB6Mv5Dq8dJRhW+0Q5AntI5zx3RdYGJZA9GUSzyYCCt
FIYOtMmWPkQ0kKxIVxOxAOm/ubhfyCs2sjSfkaa3vtvtt+Yjye1Xd81rFciIbPgP
QlK/Mes2kBZmjhkeus8guLI5Vi7tx3DQMkNqLXkHAAzOoC4oConE
=6osL
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights are support for Arm's "Permission Overlay Extension"
using memory protection keys, support for running as a protected guest
on Android as well as perf support for a bunch of new interconnect
PMUs.
Summary:
ACPI:
- Enable PMCG erratum workaround for HiSilicon HIP10 and 11
platforms.
- Ensure arm64-specific IORT header is covered by MAINTAINERS.
CPU Errata:
- Enable workaround for hardware access/dirty issue on Ampere-1A
cores.
Memory management:
- Define PHYSMEM_END to fix a crash in the amdgpu driver.
- Avoid tripping over invalid kernel mappings on the kexec() path.
- Userspace support for the Permission Overlay Extension (POE) using
protection keys.
Perf and PMUs:
- Add support for the "fixed instruction counter" extension in the
CPU PMU architecture.
- Extend and fix the event encodings for Apple's M1 CPU PMU.
- Allow LSM hooks to decide on SPE permissions for physical
profiling.
- Add support for the CMN S3 and NI-700 PMUs.
Confidential Computing:
- Add support for booting an arm64 kernel as a protected guest under
Android's "Protected KVM" (pKVM) hypervisor.
Selftests:
- Fix vector length issues in the SVE/SME sigreturn tests
- Fix build warning in the ptrace tests.
Timers:
- Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
non-determinism arising from the architected counter.
Miscellaneous:
- Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
don't succeed.
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
perf: arm-ni: Fix an NULL vs IS_ERR() bug
arm64: hibernate: Fix warning for cast from restricted gfp_t
arm64: esr: Define ESR_ELx_EC_* constants as UL
arm64: pkeys: remove redundant WARN
perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
MAINTAINERS: List Arm interconnect PMUs as supported
perf: Add driver for Arm NI-700 interconnect PMU
dt-bindings/perf: Add Arm NI-700 PMU
perf/arm-cmn: Improve format attr printing
perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
arm64/mm: use lm_alias() with addresses passed to memblock_free()
mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
arm64: Expose the end of the linear map in PHYSMEM_END
arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
arm64/mm: Delete __init region from memblock.reserved
perf/arm-cmn: Support CMN S3
dt-bindings: perf: arm-cmn: Add CMN S3
perf/arm-cmn: Refactor DTC PMU register access
perf/arm-cmn: Make cycle counts less surprising
perf/arm-cmn: Improve build-time assertion
...
* for-next/poe: (31 commits)
arm64: pkeys: remove redundant WARN
kselftest/arm64: Add test case for POR_EL0 signal frame records
kselftest/arm64: parse POE_MAGIC in a signal frame
kselftest/arm64: add HWCAP test for FEAT_S1POE
selftests: mm: make protection_keys test work on arm64
selftests: mm: move fpregs printing
kselftest/arm64: move get_header()
arm64: add Permission Overlay Extension Kconfig
arm64: enable PKEY support for CPUs with S1POE
arm64: enable POE and PIE to coexist
arm64/ptrace: add support for FEAT_POE
arm64: add POE signal support
arm64: implement PKEYS support
arm64: add pte_access_permitted_no_overlay()
arm64: handle PKEY/POE faults
arm64: mask out POIndex when modifying a PTE
arm64: convert protection key into vm_flags and pgprot values
arm64: add POIndex defines
arm64: re-order MTE VM_ flags
arm64: enable the Permission Overlay Extension for EL0
...
* kvm-arm64/visibility-cleanups:
: .
: Remove REG_HIDDEN_USER from the sysreg infrastructure, making things
: a little more simple. From the cover letter:
:
: "Since 4d4f52052b ("KVM: arm64: nv: Drop EL12 register traps that are
: redirected to VNCR") and the admission that KVM would never be supporting
: the original FEAT_NV, REG_HIDDEN_USER only had a few users, all of which
: could either be replaced by a more ad-hoc mechanism, or removed altogether."
: .
KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier
KVM: arm64: Simplify visibility handling of AArch32 SPSR_*
KVM: arm64: Simplify handling of CNTKCTL_EL12
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/nv-at-pan:
: .
: Add NV support for the AT family of instructions, which mostly results
: in adding a page table walker that deals with most of the complexity
: of the architecture.
:
: From the cover letter:
:
: "Another task that a hypervisor supporting NV on arm64 has to deal with
: is to emulate the AT instruction, because we multiplex all the S1
: translations on a single set of registers, and the guest S2 is never
: truly resident on the CPU.
:
: So given that we lie about page tables, we also have to lie about
: translation instructions, hence the emulation. Things are made
: complicated by the fact that guest S1 page tables can be swapped out,
: and that our shadow S2 is likely to be incomplete. So while using AT
: to emulate AT is tempting (and useful), it is not going to always
: work, and we thus need a fallback in the shape of a SW S1 walker."
: .
KVM: arm64: nv: Add support for FEAT_ATS1A
KVM: arm64: nv: Plumb handling of AT S1* traps from EL2
KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3
KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration
KVM: arm64: nv: Add SW walker for AT S1 emulation
KVM: arm64: nv: Make ps_to_output_size() generally available
KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}
KVM: arm64: nv: Add basic emulation of AT S1E2{R,W}
KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P
KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W}
KVM: arm64: nv: Honor absence of FEAT_PAN2
KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor
KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set
arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper
arm64: Add system register encoding for PSTATE.PAN
arm64: Add PAR_EL1 field description
arm64: Add missing APTable and TCR_ELx.HPD masks
KVM: arm64: Make kvm_at() take an OP_AT_*
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/nested.c
* kvm-arm64/vgic-sre-traps:
: .
: Fix the multiple of cases where KVM/arm64 doesn't correctly
: handle the guest trying to use a GICv3 that isn't advertised.
:
: From the cover letter:
:
: "It recently appeared that, when running on a GICv3-equipped platform
: (which is what non-ancient arm64 HW has), *not* configuring a GICv3
: for the guest could result in less than desirable outcomes.
:
: We have multiple issues to fix:
:
: - for registers that *always* trap (the SGI registers) or that *may*
: trap (the SRE register), we need to check whether a GICv3 has been
: instantiated before acting upon the trap.
:
: - for registers that only conditionally trap, we must actively trap
: them even in the absence of a GICv3 being instantiated, and handle
: those traps accordingly.
:
: - finally, ID registers must reflect the absence of a GICv3, so that
: we are consistent.
:
: This series goes through all these requirements. The main complexity
: here is to apply a GICv3 configuration on the host in the absence of a
: GICv3 in the guest. This is pretty hackish, but I don't have a much
: better solution so far.
:
: As part of making wider use of of the trap bits, we fully define the
: trap routing as per the architecture, something that we eventually
: need for NV anyway."
: .
KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs
KVM: arm64: Add selftest checking how the absence of GICv3 is handled
KVM: arm64: Unify UNDEF injection helpers
KVM: arm64: Make most GICv3 accesses UNDEF if they trap
KVM: arm64: Honor guest requested traps in GICv3 emulation
KVM: arm64: Add trap routing information for ICH_HCR_EL2
KVM: arm64: Add ICH_HCR_EL2 to the vcpu state
KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest
KVM: arm64: Add helper for last ditch idreg adjustments
KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE
KVM: arm64: Force SRE traps when SRE access is not enabled
KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps()
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/fpmr:
: .
: Add FP8 support to the KVM/arm64 floating point handling.
:
: This includes new ID registers (ID_AA64PFR2_EL1 ID_AA64FPFR0_EL1)
: being made visible to guests, as well as a new confrol register
: (FPMR) which gets context-switched.
: .
KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guests
KVM: arm64: Enable FP8 support when available and configured
KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID reg
KVM: arm64: Honor trap routing for FPMR
KVM: arm64: Add save/restore support for FPMR
KVM: arm64: Move FPMR into the sysreg array
KVM: arm64: Add predicate for FPMR support in a VM
KVM: arm64: Move SVCR into the sysreg array
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that REG_HIDDEN_USER has no direct user anymore, remove it
entirely and update all users of sysreg_hidden_user() to call
sysreg_hidden() instead.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since SPSR_* are not associated with any register in the sysreg array,
nor do they have .get_user()/.set_user() helpers, they are invisible to
userspace with that encoding.
Therefore hidden_user_visibility() serves no purpose here, and can be
safely removed.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
We go trough a great deal of effort to map CNTKCTL_EL12 to CNTKCTL_EL1
while hidding this mapping from userspace via a special visibility helper.
However, it would be far simpler to just provide an accessor doing the
mapping job, removing the need for a visibility helper.
With that done, we can also remove the EL12_REG() macro which serves
no purpose.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add the missing sanitisation of ID_AA64MMFR3_EL1, making sure we
solely expose S1POE and TCRX (we currently don't support anything
else).
[joey: Took Marc's patch for S1PIE, and changed it for S1POE]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-11-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Define the new system registers that POE introduces and context switch them.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-8-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Handling FEAT_ATS1A (which provides the AT S1E{1,2}A instructions)
is pretty easy, as it is just the usual AT without the permission
check.
This basically amounts to plumbing the instructions in the various
dispatch tables, and handling FEAT_ATS1A being disabled in the
ID registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
If our guest has been configured without PAN2, make sure that
AT S1E1{R,W}P will generate an UNDEF.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
We currently have two helpers (undef_access() and trap_undef()) that
do exactly the same thing: inject an UNDEF and return 'false' (as an
indication that PC should not be incremented).
We definitely could do with one less. Given that undef_access() is
used 80ish times, while trap_undef() is only used 30 times, the
latter loses the battle and is immediately sacrificed.
We also have a large number of instances where undef_access() is
open-coded. Let's also convert those.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
We don't expect to trap any GICv3 register for host handling,
apart from ICC_SRE_EL1 and the SGI registers. If they trap,
that's because the guest is playing with us despite being
told it doesn't have a GICv3.
If it does, UNDEF is what it will get.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we are about to describe the trap routing for ICH_HCR_EL2, add
the register to the vcpu state in its VNCR form, as well as reset
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
In order to be consistent, we shouldn't advertise a GICv3 when none
is actually usable by the guest.
Wipe the feature when these conditions apply, and allow the field
to be written from userspace.
This now allows us to rewrite the kvm_has_gicv3 helper() in terms
of kvm_has_feat(), given that it is always evaluated at runtime.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
We already have to perform a set of last-chance adjustments for
NV purposes. We will soon have to do the same for the GIC, so
introduce a helper for that exact purpose.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Follow the pattern introduced with vcpu_set_hcr(), and introduce
vcpu_set_ich_hcr(), which configures the GICv3 traps at the same
point.
This will allow future changes to introduce trap configuration on
a per-VM basis.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240827152517.3909653-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>