Commit graph

100 commits

Author SHA1 Message Date
Paolo Bonzini
314b40b3b6 KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
    arm64, including support for interrupt routing, MSIs, interrupt
    translation and wired interrupts.
 
  - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
    GICv5 hardware, leveraging the legacy VGIC interface.
 
  - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
    userspace to disable support for SGIs w/o an active state on hardware
    that previously advertised it unconditionally.
 
  - Map supporting endpoints with cacheable memory attributes on systems
    with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
    maintenance on the address range.
 
  - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
    hypervisor to inject external aborts into an L2 VM and take traps of
    masked external aborts to the hypervisor.
 
  - Convert more system register sanitization to the config-driven
    implementation.
 
  - Fixes to the visibility of EL2 registers, namely making VGICv3 system
    registers accessible through the VGIC device instead of the ONE_REG
    vCPU ioctls.
 
  - Various cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou
 pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk=
 =r+sp
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Oliver Upton
d9b9fa2c32 Merge branch 'kvm-arm64/config-masks' into kvmarm/next
* kvm-arm64/config-masks:
  : More config-driven mask computation, courtesy of Marc Zyngier
  :
  : Converts more system registers to the config-driven computation of RESx
  : masks based on the advertised feature set
  KVM: arm64: Tighten the definition of FEAT_PMUv3p9
  KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
  KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
  KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
  arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:03:08 -07:00
Marc Zyngier
cd64587f10 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting MDCR_EL2 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
6bd4a274b0 KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Marc Zyngier
001e032c0f KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
As for other registers, convert the determination of the RES0 bits
affecting TCR2_EL2 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:42 -07:00
Oliver Upton
abc693fef3 KVM: arm64: Describe SCTLR2_ELx RESx masks
External abort injection will soon rely on a sanitised view of
SCTLR2_ELx to determine exception routing. Compute the RESx masks.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:35 -07:00
Oliver Upton
a99456abd8 KVM: arm64: nv: Advertise support for FEAT_RAS
Now that the missing bits for vSError injection/deferral have been added
we can merrily claim support for FEAT_RAS.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:34 -07:00
Oliver Upton
77ee70a073 KVM: arm64: nv: Honor SError exception routing / masking
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.

This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:

 - Using a flag (instead of HCR_EL2.VSE) to track the pending SError
   state when SErrors are unconditionally masked for the current context

 - Resampling the routing / masking of a pending SError on every guest
   entry/exit

 - Emulating exception entry when SError routing implies a translation
   regime change

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:31 -07:00
Marc Zyngier
105485a182 KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
Booting an EL2 guest on a system only supporting a subset of the
possible page sizes leads to interesting situations.

For example, on a system that only supports 4kB and 64kB, and is
booted with a 4kB kernel, we end-up advertising 16kB support at
stage-2, which is pretty weird.

That's because we consider that any S2 bigger than our base granule
is fair game, irrespective of what the HW actually supports. While this
is not impossible to support (KVM would happily handle it), it is likely
to be confusing for the guest.

Add new checks that will verify that this granule size is actually
supported before publishing it to the guest.

Fixes: e7ef6ed458 ("KVM: arm64: Enforce NV limits on a per-idregs basis")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-03 10:39:24 +01:00
Marc Zyngier
8800b7c4bb KVM: arm64: Add RMW specific sysreg accessor
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.

Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
Marc Zyngier
6673047405 KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
When handling a TLBI VA* instruction that potentially targets a
VNCR page mapping, we fail to mask out the top bits that contain
the ASID and TTL fields, hence potentially failing the VA check
in the TLB code.

An additional wrinkle is that we fail to sign extend the VA,
again leading to failed VA checks.

Fix both in one go by sign-extending the VA from bit 48, making
it comparable to the way we interpret VNCR_EL2.BADDR.

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:09:16 +01:00
Marc Zyngier
7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Marc Zyngier
538fbac740 KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier
beab7d0583 KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier
d43548f422 KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.

A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().

Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 09:53:08 +01:00
Marc Zyngier
4bc0fe0898 KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large  update, but a fairly mechanical one.

The config dependencies are extracted from the 2025-03 JSON drop.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:36:10 +01:00
Marc Zyngier
b2a324ff01 KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

An additional complexity stems from the status of some bits such
as E2H and RW, which do not had a RESx status, but still take
a fixed value due to implementation choices in KVM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier
beed444841 KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier
c6cbe6a4c1 KVM: arm64: Use FGT feature maps to drive RES0 bits
Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.

Let's apply this principle to the guest's view of the FGT registers.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:00 +01:00
Marc Zyngier
4ffa72ad8f KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.

For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs and that this counter
is non-zero, we take the long road all the way back to the core code.

There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.

Yes, this is costly.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
7270cc9157 KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
During an invalidation triggered by an MMU notifier, we need to
make sure we can drop the *host* mapping that would have been
translated by the stage-2 mapping being invalidated.

For the moment, the invalidation is pretty brutal, as we nuke
the full IPA range, and therefore any VNCR_EL2 mapping.

At some point, we'll be more light-weight, and the code is able
to deal with something more targetted.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
2a359e0725 KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
Now that we can handle faults triggered through VNCR_EL2, we need
to map the corresponding page at EL2. But where, you'll ask?

Since each CPU in the system can run a vcpu, we need a per-CPU
mapping. For that, we carve a NR_CPUS range in the fixmap, giving
us a per-CPU va at which to map the guest's VNCR's page.

The mapping occurs both on vcpu load and on the back of a fault,
both generating a request that will take care of the mapping.
That mapping will also get dropped on vcpu put.

Yes, this is a bit heavy handed, but it is simple. Eventually,
we may want to have a per-VM, per-CPU mapping, which would avoid
all the TLBI overhead.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
069a05e535 KVM: arm64: nv: Handle VNCR_EL2-triggered faults
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults.

These faults can have multiple source:

- We haven't mapped anything on the host: we need to compute the
  resulting translation, populate a TLB, and eventually map
  the corresponding page

- The permissions are out of whack: we need to tell the guest about
  this state of affairs

Note that the kernel doesn't support S1POE for itself yet, so
the particular case of a VNCR page mapped with no permissions
or with write-only permissions is not correctly handled yet.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier
6fb75733f1 KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits,
and make it accessible to userspace when the VM is configured to
support FEAT_NV2.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
ea8d3cf46d KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.

As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.

It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.

All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.

No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.

Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
469c4713d4 KVM: arm64: nv: Allocate VNCR page when required
If running a NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier
ef6d7d2682 KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10 11:04:35 +01:00
Marc Zyngier
7ed43d84c1 KVM: arm64: Use computed masks as sanitisers for FGT registers
Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:25 +01:00
Marc Zyngier
0f013a524b arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.

Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.

Yes, this is painful.

The registers themselves are generated from the JSON file in
an automated way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:03 +01:00
Oliver Upton
13f64f6d21 Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next
* kvm-arm64/nv-idregs:
  : Changes to exposure of NV features, courtesy of Marc Zyngier
  :
  : Apply NV-specific feature restrictions at reset rather than at the point
  : of KVM_RUN. This makes the true feature set visible to userspace, a
  : necessary step towards save/restore support or NV VMs.
  :
  : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
  : such that the VHE-ness of the VM can be applied to the feature set.
  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
  KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
  KVM: arm64: Advertise FEAT_ECV when possible
  KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
  KVM: arm64: Allow userspace to limit NV support to nVHE
  KVM: arm64: Move NV-specific capping to idreg sanitisation
  KVM: arm64: Enforce NV limits on a per-idregs basis
  KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
  KVM: arm64: Consolidate idreg callbacks
  KVM: arm64: Advertise NV2 in the boot messages
  KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
  KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
  KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
  arm64: cpufeature: Handle NV_frac as a synonym of NV2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:52:26 -07:00
Marc Zyngier
4b1b97f0d7 KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.

This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:57:10 -08:00
Marc Zyngier
21d29cd814 KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
As ICH_HCR_EL2 is a VNCR accessor when runnintg NV, add some
sanitising to what gets written. Crucially, mark TDIR as RES0
if the HW doesn't support it (unlikely, but hey...), as well
as anything GICv4 related, since we only expose a GICv3 to the
uest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:55:10 -08:00
Marc Zyngier
8b0b98ebf3 KVM: arm64: Advertise FEAT_ECV when possible
We can advertise support for FEAT_ECV if supported on the HW as
long as we limit it to the basic trap bits, and not advertise
CNTPOFF_EL2 support, even if the host has it (the short story
being that CNTPOFF_EL2 is not virtualisable).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:33:34 -08:00
Marc Zyngier
f83c41fb3d KVM: arm64: Allow userspace to limit NV support to nVHE
NV is hard. No kidding.

In order to make things simpler, we have established that NV would
support two mutually exclusive configurations:

- VHE-only, and supporting recursive virtualisation

- nVHE-only, and not supporting recursive virtualisation

For that purpose, introduce a new vcpu feature flag that denotes
the second configuration. We use this flag to limit the idregs
further.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:30:17 -08:00
Marc Zyngier
94f296dcd6 KVM: arm64: Move NV-specific capping to idreg sanitisation
Instead of applying the NV idreg limits at run time, switch to
doing it at the same time as the reset of the VM initialisation.

This will make things much simpler once we introduce vcpu-driven
variants of NV.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:28:43 -08:00
Marc Zyngier
e7ef6ed458 KVM: arm64: Enforce NV limits on a per-idregs basis
As we are about to change the way the idreg reset values are computed,
move all the NV limits into a function that initialises one register
at a time.

This will be most useful in the upcoming patches. We take this opportunity
to remove the NV_FTR() macro and rely on the generated names instead.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:28:37 -08:00
Marc Zyngier
8f8d6084f5 KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
Enforce HCR_EL2.{NV*,AT} being RES0 when NV2 is disabled, so that
we can actually rely on these bits never being flipped behind our back.

This of course relies on our earlier ID reg sanitising.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:06:55 -08:00
Marc Zyngier
d9f943f765 KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
Enforce HCR_EL2.E2H being RES0 when VHE is disabled, so that we can
actually rely on that bit never being flipped behind our back.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:06:55 -08:00
Marc Zyngier
5417a2e9b1 KVM: arm64: Fix nested S2 MMU structures reallocation
For each vcpu that userspace creates, we allocate a number of
s2_mmu structures that will eventually contain our shadow S2
page tables.

Since this is a dynamically allocated array, we reallocate
the array and initialise the newly allocated elements. Once
everything is correctly initialised, we adjust pointer and size
in the kvm structure, and move on.

But should that initialisation fail *and* the reallocation triggered
a copy to another location, we end-up returning early, with the
kvm structure still containing the (now stale) old pointer. Weeee!

Cure it by assigning the pointer early, and use this to perform
the initialisation. If everything succeeds, we adjust the size.
Otherwise, we just leave the size as it was, no harm done, and the
new memory is as good as the ol' one (we hope...).

Fixes: 4f128f8e1a ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04 15:02:16 +00:00
Marc Zyngier
fa5e4043e9 Merge branch kvm-arm64/misc-6.14 into kvmarm-master/next
* kvm-arm64/misc-6.14:
  : .
  : Misc KVM/arm64 changes for 6.14
  :
  : - Don't expose AArch32 EL0 capability when NV is enabled
  :
  : - Update documentation to reflect the full gamut of kvm-arm.mode
  :   behaviours
  :
  : - Use the hypervisor VA bit width when dumping stacktraces
  :
  : - Decouple the hypervisor stack size from PAGE_SIZE, at least
  :   on the surface...
  :
  : - Make use of str_enabled_disabled() when advertising GICv4.1 support
  :
  : - Explicitly handle BRBE traps as UNDEFINED
  : .
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
  arm64: kvm: Introduce nvhe stack size constants
  KVM: arm64: Fix nVHE stacktrace VA bits mask
  Documentation: Update the behaviour of "kvm-arm.mode"
  KVM: arm64: nv: Advertise the lack of AArch32 EL0 support

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17 11:06:50 +00:00
Marc Zyngier
3643b334aa Merge branch kvm-arm64/nv-resx-fixes-6.14 into kvmarm-master/next
* kvm-arm64/nv-resx-fixes-6.14:
  : .
  : Fixes for NV sysreg accessors. From the cover letter:
  :
  : "Joey recently reported that some rather basic tests were failing on
  : NV, and managed to track it down to critical register fields (such as
  : HCR_EL2.E2H) not having their expect value.
  :
  : Further investigation has outlined a couple of critical issues:
  :
  : - Evaluating HCR_EL2.E2H must always be done with a sanitising
  :   accessor, no ifs, no buts. Given that KVM assumes a fixed value for
  :   this bit, we cannot leave it to the guest to mess with.
  :
  : - Resetting the sysreg file must result in the RESx bits taking
  :   effect. Otherwise, we may end-up making the wrong decision (see
  :   above), and we definitely expose invalid values to the guest. Note
  :   that because we compute the RESx masks very late in the VM setup, we
  :   need to apply these masks at that particular point as well.
  : [...]"
  : .
  KVM: arm64: nv: Apply RESx settings to sysreg reset values
  KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors

Signed-off-by: Marc Zyngier <maz@kernel.org>

# Conflicts:
#	arch/arm64/kvm/nested.c
2025-01-17 11:06:33 +00:00
Marc Zyngier
080612b294 Merge branch kvm-arm64/nv-timers into kvmarm-master/next
* kvm-arm64/nv-timers:
  : .
  : Nested Virt support for the EL2 timers. From the initial cover letter:
  :
  : "Here's another batch of NV-related patches, this time bringing in most
  : of the timer support for EL2 as well as nested guests.
  :
  : The code is pretty convoluted for a bunch of reasons:
  :
  : - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
  :   memory, meaning that a guest could setup a timer and never see it
  :   firing until the next exit
  :
  : - We go try hard to reflect the timer state in memory, but that's not
  :   great.
  :
  : - With FEAT_ECV, we can finally correctly emulate the virtual timer,
  :   but this emulation is pretty costly
  :
  : - As a way to make things suck less, we handle timer reads as early as
  :   possible, and only defer writes to the normal trap handling
  :
  : - Finally, some implementations are badly broken, and require some
  :   hand-holding, irrespective of NV support. So we try and reuse the NV
  :   infrastructure to make them usable. This could be further optimised,
  :   but I'm running out of patience for this sort of HW.
  :
  : [...]"
  : .
  KVM: arm64: nv: Fix doc header layout for timers
  KVM: arm64: nv: Document EL2 timer API
  KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
  KVM: arm64: nv: Sanitise CNTHCTL_EL2
  KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
  KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
  KVM: arm64: Handle counter access early in non-HYP context
  KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
  KVM: arm64: nv: Sync nested timer state with FEAT_NV2
  KVM: arm64: nv: Add handling of EL2-specific timer registers

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17 11:04:53 +00:00
Marc Zyngier
36f998de85 KVM: arm64: nv: Apply RESx settings to sysreg reset values
While we have sanitisation in place for the guest sysregs, we lack
that sanitisation out of reset. So some of the fields could be
evaluated and not reflect their RESx status, which sounds like
a very bad idea.

Apply the RESx masks to the the sysreg file in two situations:

- when going via a reset of the sysregs

- after having computed the RESx masks

Having this separate reset phase from the actual reset handling is
a bit grotty, but we need to apply this after the ID registers are
final.

Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-14 11:33:09 +00:00
Marc Zyngier
d1e37a50e1 KVM: arm64: nv: Sanitise CNTHCTL_EL2
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle
more than we advertise to the guest.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02 19:19:10 +00:00
Marc Zyngier
e891432cf7 KVM: arm64: nv: Advertise the lack of AArch32 EL0 support
Although we never supported 32bit anywhere in NV, we fail to
advertise so for EL0, probably owing to the relative lack of
hardware supporting both NV2 and 32bit EL0.

Add some sanitising to ID_AA64PFR0_EL1.EL0, and reaffirm that
"in 64bit-only we trust".

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-22 10:09:43 +00:00
Fuad Tabba
aac64ad369 KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
Now that we have introduced kvm_vcpu_has_feature(), use it in the
remaining code that checks for features in struct kvm, instead of
using the __vcpu_has_feature() helper.

No functional change intended.

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:54:12 +00:00
Marc Zyngier
0f3a0f23f5 KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
When compiling with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, set_sysreg_masks()
fails to compile thanks to:

	BUILD_BUG_ON(!__builtin_constant_p(sr));

as the compiler doesn't identify sr as a constant, despite all the
callers passing constants.

Fix the issue by always inlining this function, which allows GCC to
do the right thing.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411201857.ZNudtGJl-lkp@intel.com/
Fixes: a016202009 ("KVM: arm64: Extend masking facility to arbitrary registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241120111516.304250-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:22:00 -08:00
Oliver Upton
6d4b81e2e7 Merge branch kvm-arm64/nv-pmu into kvmarm/next
* kvm-arm64/nv-pmu:
  : Support for vEL2 PMU controls
  :
  : Align the vEL2 PMU support with the current state of non-nested KVM,
  : including:
  :
  :  - Trap routing, with the annoying complication of EL2 traps that apply
  :    in Host EL0
  :
  :  - PMU emulation, using the correct configuration bits depending on
  :    whether a counter falls in the hypervisor or guest range of PMCs
  :
  :  - Perf event swizzling across nested boundaries, as the event filtering
  :    needs to be remapped to cope with vEL2
  KVM: arm64: nv: Reprogram PMU events affected by nested transition
  KVM: arm64: nv: Apply EL2 event filtering when in hyp context
  KVM: arm64: nv: Honor MDCR_EL2.HLP
  KVM: arm64: nv: Honor MDCR_EL2.HPME
  KVM: arm64: Add helpers to determine if PMC counts at a given EL
  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
  KVM: arm64: Rename kvm_pmu_valid_counter_mask()
  KVM: arm64: nv: Advertise support for FEAT_HPMN0
  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
  KVM: arm64: nv: Reinject traps that take effect in Host EL0
  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
  arm64: sysreg: Migrate MDCR_EL2 definition to table
  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:49:02 +00:00
Oliver Upton
166b77a2f4 KVM: arm64: nv: Advertise support for FEAT_HPMN0
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not
only that, the emulation is capable of doing FEAT_HPMN0. Advertise
support for the feature in the VM's ID registers. It is possible to
emulate FEAT_HPMN0 on hardware that doesn't support it since KVM
currently traps all PMU registers. Having said that, let's only
advertise the feature on supporting hardware in case KVM ever provides
'direct' PMU support to VMs w/o involving host perf.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 19:00:40 +00:00
Oliver Upton
eb609638da KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits
according to the feature set exposed to the VM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 19:00:39 +00:00