Commit graph

279 commits

Author SHA1 Message Date
Mikołaj Lenczewski
5aa4b62576 arm64: Add BBM Level 2 cpu feature
The Break-Before-Make cpu feature supports multiple levels (levels 0-2),
and this commit adds a dedicated BBML2 cpufeature to test against
support for.

To support BBML2 in as wide a range of contexts as we can, we want not
only the architectural guarantees that BBML2 makes, but additionally
want BBML2 to not create TLB conflict aborts. Not causing aborts avoids
us having to prove that no recursive faults can be induced in any path
that uses BBML2, allowing its use for arbitrary kernel mappings.

This feature builds on the previous ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE,
as all early cpus must support BBML2 for us to enable it (and any later
cpus must also support it to be onlined).

Not onlining late cpus that do not support BBML2 is unavoidable, as we
might currently be using BBML2 semantics for kernel memory regions. This
could cause faults in the late cpus, and would be difficult to unwind,
so let us avoid the case altogether.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250625113435.26849-3-miko.lenczewski@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-30 18:09:05 +01:00
Catalin Marinas
3eb06f6ce3 arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type
For system-wide capabilities, the kernel has the SCOPE_SYSTEM type. Such
capabilities are checked once the SMP boot has completed using the
sanitised ID registers. However, there is a need for a new capability
type similar in scope to the system one but with checking performed
locally on each CPU during boot (e.g. based on MIDR_EL1 which is not a
sanitised register).

Introduce ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS which, together with
ARM64_CPUCAP_SCOPE_LOCAL_CPU, ensures that such capability is enabled
only if all early CPUs have it. For ease of use, define
ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE which combines SCOPE_LOCAL_CPU,
PERMITTED_FOR_LATE_CPUS and MATCH_ALL_EARLY_CPUS.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Link: https://lore.kernel.org/r/20250625113435.26849-2-miko.lenczewski@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-30 17:42:03 +01:00
Oliver Upton
56290316a4 KVM: arm64: Move PMUVer filtering into KVM code
The supported guest PMU version on a particular platform is ultimately a
KVM decision. Move PMUVer filtering into KVM code.

Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11 12:54:29 -07:00
Oliver Upton
6f34024d18 KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3
KVM is about to learn some new tricks to virtualize PMUv3 on IMPDEF
hardware. As part of that, we now need to differentiate host support
from guest support for PMUv3.

Add a cpucap to determine if an architectural PMUv3 is present to guard
host usage of PMUv3 controls.

Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11 12:54:23 -07:00
Linus Torvalds
1d6d399223 Kthreads affinity follow either of 4 existing different patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never execute
    relevant code on any other CPU. This is currently handled by smpboot
    code which takes care of CPU-hotplug operations. Affinity here is
    a correctness constraint.
 
 2) Some kthreads _have_ to be affine to a specific set of CPUs and can't
    run anywhere else. The affinity is set through kthread_bind_mask()
    and the subsystem takes care by itself to handle CPU-hotplug
    operations. Affinity here is assumed to be a correctness constraint.
 
 3) Per-node kthreads _prefer_ to be affine to a specific NUMA node. This
    is not a correctness constraint but merely a preference in terms of
    memory locality. kswapd and kcompactd both fall into this category.
    The affinity is set manually like for any other task and CPU-hotplug
    is supposed to be handled by the relevant subsystem so that the task
    is properly reaffined whenever a given CPU from the node comes up.
    Also care should be taken so that the node affinity doesn't cross
    isolated (nohz_full) cpumask boundaries.
 
 4) Similar to the previous point except kthreads have a _preferred_
    affinity different than a node. Both RCU boost kthreads and RCU
    exp kworkers fall into this category as they refer to "RCU nodes"
    from a distinctly distributed tree.
 
 Currently the preferred affinity patterns (3 and 4) have at least 4
 identified users, with more or less success when it comes to handle
 CPU-hotplug operations and CPU isolation. Each of which do it in its own
 ad-hoc way.
 
 This is an infrastructure proposal to handle this with the following API
 changes:
 
 _ kthread_create_on_node() automatically affines the created kthread to
   its target node unless it has been set as per-cpu or bound with
   kthread_bind[_mask]() before the first wake-up.
 
 - kthread_affine_preferred() is a new function that can be called right
   after kthread_create_on_node() to specify a preferred affinity
   different than the specified node.
 
 When the preferred affinity can't be applied because the possible
 targets are offline or isolated (nohz_full), the kthread is affine
 to the housekeeping CPUs (which means to all online CPUs most of the
 time or only the non-nohz_full CPUs when nohz_full= is set).
 
 kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
 converted, along with a few old drivers.
 
 Summary of the changes:
 
 * Consolidate a bunch of ad-hoc implementations of kthread_run_on_cpu()
 
 * Introduce task_cpu_fallback_mask() that defines the default last
   resort affinity of a task to become nohz_full aware
 
 * Add some correctness check to ensure kthread_bind() is always called
   before the first kthread wake up.
 
 * Default affine kthread to its preferred node.
 
 * Convert kswapd / kcompactd and remove their halfway working ad-hoc
   affinity implementation
 
 * Implement kthreads preferred affinity
 
 * Unify kthread worker and kthread API's style
 
 * Convert RCU kthreads to the new API and remove the ad-hoc affinity
   implementation.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmeNf8gACgkQhSRUR1CO
 jHedQQ/+IxTjjqQiItzrq41TES2S0desHDq8lNJFb7rsR/DtKFyLx3s67cOYV+cM
 Yx54QHg2m/Fz4nXMQ7Po5ygOtJGCKBc5C5QQy7y0lVKeTQK+daDfEtBSa3oG7j3C
 u+E3tTY6qxkbCzymUyaKkHN4/ay2vLvjFS50luV7KMyI3x47Aji+t7VdCX4LCPP2
 eAwOALWD0+7qLJ/VF6gsmQLKA4Qx7PQAzBa3KSBmUN9UcN8Gk1bQHCTIQKDHP9LQ
 v8BXrNZtYX1o2+snNYpX2z6/ECjxkdwriOgqqZY5306hd9RAQ1u46Dx3byrIqjGn
 ULG/XQ2istPyhTqb/h+RbrobdOcwEUIeqk8hRRbBXE8bPpqUz9EMuaCMxWDbQjgH
 NTuKG4ifKJ/IqstkkuDkdOiByE/ysMmwqrTXgSnu2ITNL9yY3BEgFbvA95hgo42s
 f7QCxEfZb1MHcNEMENSMwM3xw5lLMGMpxVZcMQ3gLwyotMBRrhFZm1qZJG7TITYW
 IDIeCbH4JOMdQwLs3CcWTXio0N5/85NhRNFV+IDn96OrgxObgnMtV8QwNgjXBAJ5
 wGeJWt8s34W1Zo3qS9gEuVzEhW4XaxISQQMkHe8faKkK6iHmIB/VjSQikDwwUNQ/
 AspYj82RyWBCDZsqhiYh71kpxjvS6Xp0bj39Ce1sNsOnuksxKkQ=
 =g8In
 -----END PGP SIGNATURE-----

Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks

Pull kthread updates from Frederic Weisbecker:
 "Kthreads affinity follow either of 4 existing different patterns:

   1) Per-CPU kthreads must stay affine to a single CPU and never
      execute relevant code on any other CPU. This is currently handled
      by smpboot code which takes care of CPU-hotplug operations.
      Affinity here is a correctness constraint.

   2) Some kthreads _have_ to be affine to a specific set of CPUs and
      can't run anywhere else. The affinity is set through
      kthread_bind_mask() and the subsystem takes care by itself to
      handle CPU-hotplug operations. Affinity here is assumed to be a
      correctness constraint.

   3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
      This is not a correctness constraint but merely a preference in
      terms of memory locality. kswapd and kcompactd both fall into this
      category. The affinity is set manually like for any other task and
      CPU-hotplug is supposed to be handled by the relevant subsystem so
      that the task is properly reaffined whenever a given CPU from the
      node comes up. Also care should be taken so that the node affinity
      doesn't cross isolated (nohz_full) cpumask boundaries.

   4) Similar to the previous point except kthreads have a _preferred_
      affinity different than a node. Both RCU boost kthreads and RCU
      exp kworkers fall into this category as they refer to "RCU nodes"
      from a distinctly distributed tree.

  Currently the preferred affinity patterns (3 and 4) have at least 4
  identified users, with more or less success when it comes to handle
  CPU-hotplug operations and CPU isolation. Each of which do it in its
  own ad-hoc way.

  This is an infrastructure proposal to handle this with the following
  API changes:

   - kthread_create_on_node() automatically affines the created kthread
     to its target node unless it has been set as per-cpu or bound with
     kthread_bind[_mask]() before the first wake-up.

   - kthread_affine_preferred() is a new function that can be called
     right after kthread_create_on_node() to specify a preferred
     affinity different than the specified node.

  When the preferred affinity can't be applied because the possible
  targets are offline or isolated (nohz_full), the kthread is affine to
  the housekeeping CPUs (which means to all online CPUs most of the time
  or only the non-nohz_full CPUs when nohz_full= is set).

  kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
  converted, along with a few old drivers.

  Summary of the changes:

   - Consolidate a bunch of ad-hoc implementations of
     kthread_run_on_cpu()

   - Introduce task_cpu_fallback_mask() that defines the default last
     resort affinity of a task to become nohz_full aware

   - Add some correctness check to ensure kthread_bind() is always
     called before the first kthread wake up.

   - Default affine kthread to its preferred node.

   - Convert kswapd / kcompactd and remove their halfway working ad-hoc
     affinity implementation

   - Implement kthreads preferred affinity

   - Unify kthread worker and kthread API's style

   - Convert RCU kthreads to the new API and remove the ad-hoc affinity
     implementation"

* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
  kthread: modify kernel-doc function name to match code
  rcu: Use kthread preferred affinity for RCU exp kworkers
  treewide: Introduce kthread_run_worker[_on_cpu]()
  kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
  rcu: Use kthread preferred affinity for RCU boost
  kthread: Implement preferred affinity
  mm: Create/affine kswapd to its preferred node
  mm: Create/affine kcompactd to its preferred node
  kthread: Default affine kthread to its preferred NUMA node
  kthread: Make sure kthread hasn't started while binding it
  sched,arm64: Handle CPU isolation on last resort fallback rq selection
  arm64: Exclude nohz_full CPUs from 32bits el0 support
  lib: test_objpool: Use kthread_run_on_cpu()
  kallsyms: Use kthread_run_on_cpu()
  soc/qman: test: Use kthread_run_on_cpu()
  arm/bL_switcher: Use kthread_run_on_cpu()
2025-01-21 17:10:05 -08:00
Frederic Weisbecker
3a5446612a sched,arm64: Handle CPU isolation on last resort fallback rq selection
When a kthread or any other task has an affinity mask that is fully
offline or unallowed, the scheduler reaffines the task to all possible
CPUs as a last resort.

This default decision doesn't mix up very well with nohz_full CPUs that
are part of the possible cpumask but don't want to be disturbed by
unbound kthreads or even detached pinned user tasks.

Make the fallback affinity setting aware of nohz_full.

Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-01-08 18:14:23 +01:00
Mark Rutland
264a593da6 arm64: cpufeature: Add HAFT to cpucap_is_possible()
For consistency with other cpucaps, handle the configuration check for
ARM64_HAFT in cpucap_is_possible() rather than this being explicit in
system_supports_haft(). The configuration check will now happen
implicitly as cpus_have_final_cap() uses cpucap_is_possible() via
alternative_has_cap_unlikely().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241209155948.2124393-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-12-10 12:10:09 +00:00
Robin Murphy
f00b53f161 arm64: cpufeature: Add GCS to cpucap_is_possible()
Since system_supports_gcs() ends up referring to cpucap_is_possible(),
teach the latter about GCS for consistency with similar features.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/416c7369fcdce4ebb2a8f12daae234507be27e38.1733406275.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05 17:15:38 +00:00
Linus Torvalds
9f16d5e6f2 The biggest change here is eliminating the awful idea that KVM had, of
essentially guessing which pfns are refcounted pages.  The reason to
 do so was that KVM needs to map both non-refcounted pages (for example
 BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
 refcounted pages.  However, the result was security issues in the past,
 and more recently the inability to map VM_IO and VM_PFNMAP memory
 that _is_ backed by struct page but is not refcounted.  In particular
 this broke virtio-gpu blob resources (which directly map host graphics
 buffers into the guest as "vram" for the virtio-gpu device) with the
 amdgpu driver, because amdgpu allocates non-compound higher order pages
 and the tail pages could not be mapped into KVM.
 
 This requires adjusting all uses of struct page in the per-architecture
 code, to always work on the pfn whenever possible.  The large series that
 did this, from David Stevens and Sean Christopherson, also cleaned up
 substantially the set of functions that provided arch code with the
 pfn for a host virtual addresses.  The previous maze of twisty little
 passages, all different, is replaced by five functions (__gfn_to_page,
 __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
 saving almost 200 lines of code.
 
 ARM:
 
 * Support for stage-1 permission indirection (FEAT_S1PIE) and
   permission overlays (FEAT_S1POE), including nested virt + the
   emulated page table walker
 
 * Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
   was introduced in PSCIv1.3 as a mechanism to request hibernation,
   similar to the S4 state in ACPI
 
 * Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
   part of it, introduce trivial initialization of the host's MPAM
   context so KVM can use the corresponding traps
 
 * PMU support under nested virtualization, honoring the guest
   hypervisor's trap configuration and event filtering when running a
   nested guest
 
 * Fixes to vgic ITS serialization where stale device/interrupt table
   entries are not zeroed when the mapping is invalidated by the VM
 
 * Avoid emulated MMIO completion if userspace has requested synchronous
   external abort injection
 
 * Various fixes and cleanups affecting pKVM, vCPU initialization, and
   selftests
 
 LoongArch:
 
 * Add iocsr and mmio bus simulation in kernel.
 
 * Add in-kernel interrupt controller emulation.
 
 * Add support for virtualization extensions to the eiointc irqchip.
 
 PPC:
 
 * Drop lingering and utterly obsolete references to PPC970 KVM, which was
   removed 10 years ago.
 
 * Fix incorrect documentation references to non-existing ioctls
 
 RISC-V:
 
 * Accelerate KVM RISC-V when running as a guest
 
 * Perf support to collect KVM guest statistics from host side
 
 s390:
 
 * New selftests: more ucontrol selftests and CPU model sanity checks
 
 * Support for the gen17 CPU model
 
 * List registers supported by KVM_GET/SET_ONE_REG in the documentation
 
 x86:
 
 * Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
   documentation, harden against unexpected changes.  Even if the hardware
   A/D tracking is disabled, it is possible to use the hardware-defined A/D
   bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
   of special cases.
 
 * Elide TLB flushes when aging secondary PTEs, as has been done in x86's
   primary MMU for over 10 years.
 
 * Recover huge pages in-place in the TDP MMU when dirty page logging is
   toggled off, instead of zapping them and waiting until the page is
   re-accessed to create a huge mapping.  This reduces vCPU jitter.
 
 * Batch TLB flushes when dirty page logging is toggled off.  This reduces
   the time it takes to disable dirty logging by ~3x.
 
 * Remove the shrinker that was (poorly) attempting to reclaim shadow page
   tables in low-memory situations.
 
 * Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
 
 * Advertise CPUIDs for new instructions in Clearwater Forest
 
 * Quirk KVM's misguided behavior of initialized certain feature MSRs to
   their maximum supported feature set, which can result in KVM creating
   invalid vCPU state.  E.g. initializing PERF_CAPABILITIES to a non-zero
   value results in the vCPU having invalid state if userspace hides PDCM
   from the guest, which in turn can lead to save/restore failures.
 
 * Fix KVM's handling of non-canonical checks for vCPUs that support LA57
   to better follow the "architecture", in quotes because the actual
   behavior is poorly documented.  E.g. most MSR writes and descriptor
   table loads ignore CR4.LA57 and operate purely on whether the CPU
   supports LA57.
 
 * Bypass the register cache when querying CPL from kvm_sched_out(), as
   filling the cache from IRQ context is generally unsafe; harden the
   cache accessors to try to prevent similar issues from occuring in the
   future.  The issue that triggered this change was already fixed in 6.12,
   but was still kinda latent.
 
 * Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
   over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
 
 * Minor cleanups
 
 * Switch hugepage recovery thread to use vhost_task.  These kthreads can
   consume significant amounts of CPU time on behalf of a VM or in response
   to how the VM behaves (for example how it accesses its memory); therefore
   KVM tried to place the thread in the VM's cgroups and charge the CPU
   time consumed by that work to the VM's container.  However the kthreads
   did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
   instances inside could not complete freezing.  Fix this by replacing the
   kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
   Another 100+ lines removed, with generally better behavior too like
   having these threads properly parented in the process tree.
 
 * Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
   really work; there was really nothing to work around anyway: the broken
   patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
   MSR is virtualized and therefore unaffected by the erratum.
 
 * Fix 6.12 regression where CONFIG_KVM will be built as a module even
   if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
 
 x86 selftests:
 
 * x86 selftests can now use AVX.
 
 Documentation:
 
 * Use rST internal links
 
 * Reorganize the introduction to the API document
 
 Generic:
 
 * Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter long
   due to having to wait for all CPUs become quiescent.  In general both reads
   and writes are rare, but userspace that supports confidential computing is
   introducing the use of "helper" vCPUs that may jump from one host processor
   to another.  Those will be very happy to trigger a synchronize_rcu(), and
   the effect on performance is quite the disaster.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
 dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
 letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
 1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
 t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
 ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
 =lWpE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The biggest change here is eliminating the awful idea that KVM had of
  essentially guessing which pfns are refcounted pages.

  The reason to do so was that KVM needs to map both non-refcounted
  pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
  VMAs that contain refcounted pages.

  However, the result was security issues in the past, and more recently
  the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
  struct page but is not refcounted. In particular this broke virtio-gpu
  blob resources (which directly map host graphics buffers into the
  guest as "vram" for the virtio-gpu device) with the amdgpu driver,
  because amdgpu allocates non-compound higher order pages and the tail
  pages could not be mapped into KVM.

  This requires adjusting all uses of struct page in the
  per-architecture code, to always work on the pfn whenever possible.
  The large series that did this, from David Stevens and Sean
  Christopherson, also cleaned up substantially the set of functions
  that provided arch code with the pfn for a host virtual addresses.

  The previous maze of twisty little passages, all different, is
  replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
  non-__ versions of these two, and kvm_prefetch_pages) saving almost
  200 lines of code.

  ARM:

   - Support for stage-1 permission indirection (FEAT_S1PIE) and
     permission overlays (FEAT_S1POE), including nested virt + the
     emulated page table walker

   - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
     call was introduced in PSCIv1.3 as a mechanism to request
     hibernation, similar to the S4 state in ACPI

   - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
     part of it, introduce trivial initialization of the host's MPAM
     context so KVM can use the corresponding traps

   - PMU support under nested virtualization, honoring the guest
     hypervisor's trap configuration and event filtering when running a
     nested guest

   - Fixes to vgic ITS serialization where stale device/interrupt table
     entries are not zeroed when the mapping is invalidated by the VM

   - Avoid emulated MMIO completion if userspace has requested
     synchronous external abort injection

   - Various fixes and cleanups affecting pKVM, vCPU initialization, and
     selftests

  LoongArch:

   - Add iocsr and mmio bus simulation in kernel.

   - Add in-kernel interrupt controller emulation.

   - Add support for virtualization extensions to the eiointc irqchip.

  PPC:

   - Drop lingering and utterly obsolete references to PPC970 KVM, which
     was removed 10 years ago.

   - Fix incorrect documentation references to non-existing ioctls

  RISC-V:

   - Accelerate KVM RISC-V when running as a guest

   - Perf support to collect KVM guest statistics from host side

  s390:

   - New selftests: more ucontrol selftests and CPU model sanity checks

   - Support for the gen17 CPU model

   - List registers supported by KVM_GET/SET_ONE_REG in the
     documentation

  x86:

   - Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
     improve documentation, harden against unexpected changes.

     Even if the hardware A/D tracking is disabled, it is possible to
     use the hardware-defined A/D bits to track if a PFN is Accessed
     and/or Dirty, and that removes a lot of special cases.

   - Elide TLB flushes when aging secondary PTEs, as has been done in
     x86's primary MMU for over 10 years.

   - Recover huge pages in-place in the TDP MMU when dirty page logging
     is toggled off, instead of zapping them and waiting until the page
     is re-accessed to create a huge mapping. This reduces vCPU jitter.

   - Batch TLB flushes when dirty page logging is toggled off. This
     reduces the time it takes to disable dirty logging by ~3x.

   - Remove the shrinker that was (poorly) attempting to reclaim shadow
     page tables in low-memory situations.

   - Clean up and optimize KVM's handling of writes to
     MSR_IA32_APICBASE.

   - Advertise CPUIDs for new instructions in Clearwater Forest

   - Quirk KVM's misguided behavior of initialized certain feature MSRs
     to their maximum supported feature set, which can result in KVM
     creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
     a non-zero value results in the vCPU having invalid state if
     userspace hides PDCM from the guest, which in turn can lead to
     save/restore failures.

   - Fix KVM's handling of non-canonical checks for vCPUs that support
     LA57 to better follow the "architecture", in quotes because the
     actual behavior is poorly documented. E.g. most MSR writes and
     descriptor table loads ignore CR4.LA57 and operate purely on
     whether the CPU supports LA57.

   - Bypass the register cache when querying CPL from kvm_sched_out(),
     as filling the cache from IRQ context is generally unsafe; harden
     the cache accessors to try to prevent similar issues from occuring
     in the future. The issue that triggered this change was already
     fixed in 6.12, but was still kinda latent.

   - Advertise AMD_IBPB_RET to userspace, and fix a related bug where
     KVM over-advertises SPEC_CTRL when trying to support cross-vendor
     VMs.

   - Minor cleanups

   - Switch hugepage recovery thread to use vhost_task.

     These kthreads can consume significant amounts of CPU time on
     behalf of a VM or in response to how the VM behaves (for example
     how it accesses its memory); therefore KVM tried to place the
     thread in the VM's cgroups and charge the CPU time consumed by that
     work to the VM's container.

     However the kthreads did not process SIGSTOP/SIGCONT, and therefore
     cgroups which had KVM instances inside could not complete freezing.

     Fix this by replacing the kthread with a PF_USER_WORKER thread, via
     the vhost_task abstraction. Another 100+ lines removed, with
     generally better behavior too like having these threads properly
     parented in the process tree.

   - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
     didn't really work; there was really nothing to work around anyway:
     the broken patch was meant to fix nested virtualization, but the
     PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
     erratum.

   - Fix 6.12 regression where CONFIG_KVM will be built as a module even
     if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
     'y'.

  x86 selftests:

   - x86 selftests can now use AVX.

  Documentation:

   - Use rST internal links

   - Reorganize the introduction to the API document

  Generic:

   - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
     instead of RCU, so that running a vCPU on a different task doesn't
     encounter long due to having to wait for all CPUs become quiescent.

     In general both reads and writes are rare, but userspace that
     supports confidential computing is introducing the use of "helper"
     vCPUs that may jump from one host processor to another. Those will
     be very happy to trigger a synchronize_rcu(), and the effect on
     performance is quite the disaster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
  KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
  KVM: x86: add back X86_LOCAL_APIC dependency
  Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
  KVM: x86: switch hugepage recovery thread to vhost_task
  KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
  x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
  Documentation: KVM: fix malformed table
  irqchip/loongson-eiointc: Add virt extension support
  LoongArch: KVM: Add irqfd support
  LoongArch: KVM: Add PCHPIC user mode read and write functions
  LoongArch: KVM: Add PCHPIC read and write functions
  LoongArch: KVM: Add PCHPIC device support
  LoongArch: KVM: Add EIOINTC user mode read and write functions
  LoongArch: KVM: Add EIOINTC read and write functions
  LoongArch: KVM: Add EIOINTC device support
  LoongArch: KVM: Add IPI user mode read and write function
  LoongArch: KVM: Add IPI read and write function
  LoongArch: KVM: Add IPI device support
  LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
  KVM: arm64: Pass on SVE mapping failures
  ...
2024-11-23 16:00:50 -08:00
Catalin Marinas
5a4332062e Merge branches 'for-next/gcs', 'for-next/probes', 'for-next/asm-offsets', 'for-next/tlb', 'for-next/misc', 'for-next/mte', 'for-next/sysreg', 'for-next/stacktrace', 'for-next/hwcap3', 'for-next/kselftest', 'for-next/crc32', 'for-next/guest-cca', 'for-next/haft' and 'for-next/scs', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: Switch back to struct platform_driver::remove()
  perf: arm_pmuv3: Add support for Samsung Mongoose PMU
  dt-bindings: arm: pmu: Add Samsung Mongoose core compatible
  perf/dwc_pcie: Fix typos in event names
  perf/dwc_pcie: Add support for Ampere SoCs
  ARM: pmuv3: Add missing write_pmuacr()
  perf/marvell: Marvell PEM performance monitor support
  perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control
  perf/dwc_pcie: Convert the events with mixed case to lowercase
  perf/cxlpmu: Support missing events in 3.1 spec
  perf: imx_perf: add support for i.MX91 platform
  dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible
  drivers perf: remove unused field pmu_node

* for-next/gcs: (42 commits)
  : arm64 Guarded Control Stack user-space support
  kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
  arm64/gcs: Fix outdated ptrace documentation
  kselftest/arm64: Ensure stable names for GCS stress test results
  kselftest/arm64: Validate that GCS push and write permissions work
  kselftest/arm64: Enable GCS for the FP stress tests
  kselftest/arm64: Add a GCS stress test
  kselftest/arm64: Add GCS signal tests
  kselftest/arm64: Add test coverage for GCS mode locking
  kselftest/arm64: Add a GCS test program built with the system libc
  kselftest/arm64: Add very basic GCS test program
  kselftest/arm64: Always run signals tests with GCS enabled
  kselftest/arm64: Allow signals tests to specify an expected si_code
  kselftest/arm64: Add framework support for GCS to signal handling tests
  kselftest/arm64: Add GCS as a detected feature in the signal tests
  kselftest/arm64: Verify the GCS hwcap
  arm64: Add Kconfig for Guarded Control Stack (GCS)
  arm64/ptrace: Expose GCS via ptrace and core files
  arm64/signal: Expose GCS state in signal frames
  arm64/signal: Set up and restore the GCS context for signal handlers
  arm64/mm: Implement map_shadow_stack()
  ...

* for-next/probes:
  : Various arm64 uprobes/kprobes cleanups
  arm64: insn: Simulate nop instruction for better uprobe performance
  arm64: probes: Remove probe_opcode_t
  arm64: probes: Cleanup kprobes endianness conversions
  arm64: probes: Move kprobes-specific fields
  arm64: probes: Fix uprobes for big-endian kernels
  arm64: probes: Fix simulate_ldr*_literal()
  arm64: probes: Remove broken LDR (literal) uprobe support

* for-next/asm-offsets:
  : arm64 asm-offsets.c cleanup (remove unused offsets)
  arm64: asm-offsets: remove PREEMPT_DISABLE_OFFSET
  arm64: asm-offsets: remove DMA_{TO,FROM}_DEVICE
  arm64: asm-offsets: remove VM_EXEC and PAGE_SZ
  arm64: asm-offsets: remove MM_CONTEXT_ID
  arm64: asm-offsets: remove COMPAT_{RT_,SIGFRAME_REGS_OFFSET
  arm64: asm-offsets: remove VMA_VM_*
  arm64: asm-offsets: remove TSK_ACTIVE_MM

* for-next/tlb:
  : TLB flushing optimisations
  arm64: optimize flush tlb kernel range
  arm64: tlbflush: add __flush_tlb_range_limit_excess()

* for-next/misc:
  : Miscellaneous patches
  arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
  arm64/ptrace: Clarify documentation of VL configuration via ptrace
  acpi/arm64: remove unnecessary cast
  arm64/mm: Change protval as 'pteval_t' in map_range()
  arm64: uprobes: Optimize cache flushes for xol slot
  acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
  arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
  arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings
  arm64/mm: Sanity check PTE address before runtime P4D/PUD folding
  arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont()
  ACPI: GTDT: Tighten the check for the array of platform timer structures
  arm64/fpsimd: Fix a typo
  arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers
  arm64: Return early when break handler is found on linked-list
  arm64/mm: Re-organize arch_make_huge_pte()
  arm64/mm: Drop _PROT_SECT_DEFAULT
  arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV
  arm64: head: Drop SWAPPER_TABLE_SHIFT
  arm64: cpufeature: add POE to cpucap_is_possible()
  arm64/mm: Change pgattr_change_is_safe() arguments as pteval_t

* for-next/mte:
  : Various MTE improvements
  selftests: arm64: add hugetlb mte tests
  hugetlb: arm64: add mte support

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Update ID_AA64MMFR1_EL1 to DDI0601 2024-09

* for-next/stacktrace:
  : arm64 stacktrace improvements
  arm64: preserve pt_regs::stackframe during exec*()
  arm64: stacktrace: unwind exception boundaries
  arm64: stacktrace: split unwind_consume_stack()
  arm64: stacktrace: report recovered PCs
  arm64: stacktrace: report source of unwind data
  arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
  arm64: use a common struct frame_record
  arm64: pt_regs: swap 'unused' and 'pmr' fields
  arm64: pt_regs: rename "pmr_save" -> "pmr"
  arm64: pt_regs: remove stale big-endian layout
  arm64: pt_regs: assert pt_regs is a multiple of 16 bytes

* for-next/hwcap3:
  : Add AT_HWCAP3 support for arm64 (also wire up AT_HWCAP4)
  arm64: Support AT_HWCAP3
  binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4

* for-next/kselftest: (30 commits)
  : arm64 kselftest fixes/cleanups
  kselftest/arm64: Try harder to generate different keys during PAC tests
  kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
  kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
  kselftest/arm64: Add FPMR coverage to fp-ptrace
  kselftest/arm64: Expand the set of ZA writes fp-ptrace does
  kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
  kselftest/arm64: Enable build of PAC tests with LLVM=1
  kselftest/arm64: Check that SVCR is 0 in signal handlers
  kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
  kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
  kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
  kselftest/arm64: Fix build with stricter assemblers
  kselftest/arm64: Test signal handler state modification in fp-stress
  kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test
  kselftest/arm64: Implement irritators for ZA and ZT
  kselftest/arm64: Remove unused ADRs from irritator handlers
  kselftest/arm64: Correct misleading comments on fp-stress irritators
  kselftest/arm64: Poll less often while waiting for fp-stress children
  kselftest/arm64: Increase frequency of signal delivery in fp-stress
  kselftest/arm64: Fix encoding for SVE B16B16 test
  ...

* for-next/crc32:
  : Optimise CRC32 using PMULL instructions
  arm64/crc32: Implement 4-way interleave using PMULL
  arm64/crc32: Reorganize bit/byte ordering macros
  arm64/lib: Handle CRC-32 alternative in C code

* for-next/guest-cca:
  : Support for running Linux as a guest in Arm CCA
  arm64: Document Arm Confidential Compute
  virt: arm-cca-guest: TSM_REPORT support for realms
  arm64: Enable memory encrypt for Realms
  arm64: mm: Avoid TLBI when marking pages as valid
  arm64: Enforce bounce buffers for realm DMA
  efi: arm64: Map Device with Prot Shared
  arm64: rsi: Map unprotected MMIO as decrypted
  arm64: rsi: Add support for checking whether an MMIO is protected
  arm64: realm: Query IPA size from the RMM
  arm64: Detect if in a realm and set RIPAS RAM
  arm64: rsi: Add RSI definitions

* for-next/haft:
  : Support for arm64 FEAT_HAFT
  arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()
  arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
  arm64: Add support for FEAT_HAFT
  arm64: setup: name 'tcr2' register
  arm64/sysreg: Update ID_AA64MMFR1_EL1 register

* for-next/scs:
  : Dynamic shadow call stack fixes
  arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
  arm64/scs: Deal with 64-bit relative offsets in FDE frames
  arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames
2024-11-14 12:07:16 +00:00
Yicong Yang
efe7254135 arm64: Add support for FEAT_HAFT
Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
for Table descriptors (FEAT_HAFT). The feature is indicated by
ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.

Adds the Kconfig for FEAT_HAFT and support detecting and enabling
the feature. The feature is enabled in __cpu_setup() before MMU on
just like HA. A CPU capability is added to notify the user of the
feature.

Add definition of P{G,4,U,M}D_TABLE_AF bit and set the AF bit
when creating the page table, which will save the hardware
from having to update them at runtime. This will be ignored if
FEAT_HAFT is not enabled.

The AF bit of table descriptors cannot be managed by the software
per spec, unlike the HA. So this should be used only if it's supported
system wide by system_supports_haft().

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241102104235.62560-4-yangyicong@huawei.com
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[catalin.marinas@arm.com: added the ID check back to __cpu_setup in case of future CPU errata]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-05 13:18:35 +00:00
James Morse
31ff96c38e KVM: arm64: Fix missing traps of guest accesses to the MPAM registers
commit 011e5f5bf5 ("arm64/cpufeature: Add remaining feature bits in
ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests,
but didn't add trap handling.

If you are unlucky, this results in an MPAM aware guest being delivered
an undef during boot. The host prints:
| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:
| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c00 54000080 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest accessible
registers by injecting an UNDEF. This stops KVM from spamming the host
log, but doesn't yet hide the feature from the id registers.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if
ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps
MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of
these if either is supported. If neither is supported, the guest can
discover that the CPU has MPAM support, and how many PARTID etc the
host has ... but it can't influence anything, so its harmless.

Fixes: 011e5f5bf5 ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200925160102.118858-1-james.morse@arm.com/
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-5-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 18:09:38 +00:00
James Morse
09e6b306f3 arm64: cpufeature: discover CPU support for MPAM
ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM)
which describes an interface to cache and bandwidth controls wherever
they appear in the system.

Add support to detect MPAM. Like SVE, MPAM has an extra id register that
describes some more properties, including the virtualisation support,
which is optional. Detect this separately so we can detect
mismatched/insane systems, but still use MPAM on the host even if the
virtualisation support is missing.

MPAM needs enabling at the highest implemented exception level, otherwise
the register accesses trap. The 'enabled' flag is accessible to lower
exception levels, but its in a register that traps when MPAM isn't enabled.
The cpufeature 'matches' hook is extended to test this on one of the
CPUs, so that firmware can emulate MPAM as disabled if it is reserved
for use by secure world.

Secondary CPUs that appear late could trip cpufeature's 'lower safe'
behaviour after the MPAM properties have been advertised to user-space.
Add a verify call to ensure late secondaries match the existing CPUs.

(If you have a boot failure that bisects here its likely your CPUs
advertise MPAM in the id registers, but firmware failed to either enable
or MPAM, or emulate the trap as if it were disabled)

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 18:09:38 +00:00
Mark Brown
ddadbcdaae arm64: Support AT_HWCAP3
We have filled all 64 bits of AT_HWCAP2 so in order to support discovery of
further features provide the framework to use the already defined AT_HWCAP3
for further CPU features.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241004-arm64-elf-hwcap3-v2-2-799d1daad8b0@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17 18:38:50 +01:00
Joey Gouly
9c4a25140d arm64: cpufeature: add POE to cpucap_is_possible()
Since de66cb37ab ("arm64: Add cpucap_is_possible()"),
alternative_has_cap_unlikely() includes the IS_ENABLED() check.

Add CONFIG_ARM64_POE to cpucap_is_possible() to avoid the explicit check.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008140121.2774348-1-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16 15:30:53 +01:00
Mark Brown
6487c96308 arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
Add a cpufeature for GCS, allowing other code to conditionally support it
at runtime.

Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-12-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04 12:04:35 +01:00
Joey Gouly
160a8e13de arm64: context switch POR_EL0 register
POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-7-joey.gouly@arm.com
[will: Dropped unnecessary isb()s]
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 12:52:18 +01:00
Anshuman Khandual
056600ff73 arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
This replaces custom macros usage (i.e ID_AA64PFR0_EL1_ELx_64BIT_ONLY and
ID_AA64PFR0_EL1_ELx_32BIT_64BIT) and instead directly uses register fields
from ID_AA64PFR0_EL1 sysreg definition. Finally let's drop off both these
custom macros as they are now redundant.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240613102710.3295108-3-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-04 12:21:12 +01:00
Linus Torvalds
4f712ee0cb S390:
* Changes to FPU handling came in via the main s390 pull request
 
 * Only deliver to the guest the SCLP events that userspace has
   requested.
 
 * More virtual vs physical address fixes (only a cleanup since
   virtual and physical address spaces are currently the same).
 
 * Fix selftests undefined behavior.
 
 x86:
 
 * Fix a restriction that the guest can't program a PMU event whose
   encoding matches an architectural event that isn't included in the
   guest CPUID.  The enumeration of an architectural event only says
   that if a CPU supports an architectural event, then the event can be
   programmed *using the architectural encoding*.  The enumeration does
   NOT say anything about the encoding when the CPU doesn't report support
   the event *in general*.  It might support it, and it might support it
   using the same encoding that made it into the architectural PMU spec.
 
 * Fix a variety of bugs in KVM's emulation of RDPMC (more details on
   individual commits) and add a selftest to verify KVM correctly emulates
   RDMPC, counter availability, and a variety of other PMC-related
   behaviors that depend on guest CPUID and therefore are easier to
   validate with selftests than with custom guests (aka kvm-unit-tests).
 
 * Zero out PMU state on AMD if the virtual PMU is disabled, it does not
   cause any bug but it wastes time in various cases where KVM would check
   if a PMC event needs to be synthesized.
 
 * Optimize triggering of emulated events, with a nice ~10% performance
   improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
   guest.
 
 * Tighten the check for "PMI in guest" to reduce false positives if an NMI
   arrives in the host while KVM is handling an IRQ VM-Exit.
 
 * Fix a bug where KVM would report stale/bogus exit qualification information
   when exiting to userspace with an internal error exit code.
 
 * Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
 
 * Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
   read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
 
 * Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB).  KVM
   doesn't support yielding in the middle of processing a zap, and 1GiB
   granularity resulted in multi-millisecond lags that are quite impolite
   for CONFIG_PREEMPT kernels.
 
 * Allocate write-tracking metadata on-demand to avoid the memory overhead when
   a kernel is built with i915 virtualization support but the workloads use
   neither shadow paging nor i915 virtualization.
 
 * Explicitly initialize a variety of on-stack variables in the emulator that
   triggered KMSAN false positives.
 
 * Fix the debugregs ABI for 32-bit KVM.
 
 * Rework the "force immediate exit" code so that vendor code ultimately decides
   how and when to force the exit, which allowed some optimization for both
   Intel and AMD.
 
 * Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
   vCPU creation ultimately failed, causing extra unnecessary work.
 
 * Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
 
 * Harden against underflowing the active mmu_notifier invalidation
   count, so that "bad" invalidations (usually due to bugs elsehwere in the
   kernel) are detected earlier and are less likely to hang the kernel.
 
 x86 Xen emulation:
 
 * Overlay pages can now be cached based on host virtual address,
   instead of guest physical addresses.  This removes the need to
   reconfigure and invalidate the cache if the guest changes the
   gpa but the underlying host virtual address remains the same.
 
 * When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.
 
 * Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).
 
 * Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.
 
 RISC-V:
 
 * Support exception and interrupt handling in selftests
 
 * New self test for RISC-V architectural timer (Sstc extension)
 
 * New extension support (Ztso, Zacas)
 
 * Support userspace emulation of random number seed CSRs.
 
 ARM:
 
 * Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers
 
 * Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it
 
 * Conversion of KVM's representation of LPIs to an xarray, utilized to
   address serialization some of the serialization on the LPI injection
   path
 
 * Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register
 
 * Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
 
 LoongArch:
 
 * Set reserved bits as zero in CPUCFG.
 
 * Start SW timer only when vcpu is blocking.
 
 * Do not restart SW timer when it is expired.
 
 * Remove unnecessary CSR register saving during enter guest.
 
 * Misc cleanups and fixes as usual.
 
 Generic:
 
 * cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always
   true on all architectures except MIPS (where Kconfig determines the
   available depending on CPU capabilities).  It is replaced either by
   an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM)
   everywhere else.
 
 * Factor common "select" statements in common code instead of requiring
   each architecture to specify it
 
 * Remove thoroughly obsolete APIs from the uapi headers.
 
 * Move architecture-dependent stuff to uapi/asm/kvm.h
 
 * Always flush the async page fault workqueue when a work item is being
   removed, especially during vCPU destruction, to ensure that there are no
   workers running in KVM code when all references to KVM-the-module are gone,
   i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
 
 * Grab a reference to the VM's mm_struct in the async #PF worker itself instead
   of gifting the worker a reference, so that there's no need to remember
   to *conditionally* clean up after the worker.
 
 Selftests:
 
 * Reduce boilerplate especially when utilize selftest TAP infrastructure.
 
 * Add basic smoke tests for SEV and SEV-ES, along with a pile of library
   support for handling private/encrypted/protected memory.
 
 * Fix benign bugs where tests neglect to close() guest_memfd files.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
 eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
 6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
 j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
 Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
 5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
 =mqOV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction that the guest can't program a PMU event whose
     encoding matches an architectural event that isn't included in the
     guest CPUID. The enumeration of an architectural event only says
     that if a CPU supports an architectural event, then the event can
     be programmed *using the architectural encoding*. The enumeration
     does NOT say anything about the encoding when the CPU doesn't
     report support the event *in general*. It might support it, and it
     might support it using the same encoding that made it into the
     architectural PMU spec

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDMPC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled, it does
     not cause any bug but it wastes time in various cases where KVM
     would check if a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Cleanup the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsehwere
     in the kernel) are detected earlier and are less likely to hang the
     kernel

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address,
     instead of guest physical addresses. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New self test for RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address serialization some of the serialization on the LPI
     injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits as zero in CPUCFG

   - Start SW timer only when vcpu is blocking

   - Do not restart SW timer when it is expired

   - Remove unnecessary CSR register saving during enter guest

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines the available depending on CPU capabilities). It is
     replaced either by an architecture-dependent symbol for MIPS, and
     IS_ENABLED(CONFIG_KVM) everywhere else

   - Factor common "select" statements in common code instead of
     requiring each architecture to specify it

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker

  Selftests:

   - Reduce boilerplate especially when utilize selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...
2024-03-15 13:03:13 -07:00
Linus Torvalds
6d75c6f40a arm64 updates for 6.9:
* Reorganise the arm64 kernel VA space and add support for LPA2 (at
   stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address range
   with 4KB and 16KB pages
 
 * Enable Rust on arm64
 
 * Support for the 2023 dpISA extensions (data processing ISA), host only
 
 * arm64 perf updates:
 
   - StarFive's StarLink (integrates one or more CPU cores with a shared
     L3 memory system) PMU support
 
   - Enable HiSilicon Erratum 162700402 quirk for HIP09
 
   - Several updates for the HiSilicon PCIe PMU driver
 
   - Arm CoreSight PMU support
 
   - Convert all drivers under drivers/perf/ to use .remove_new()
 
 * Miscellaneous:
 
   - Don't enable workarounds for "rare" errata by default
 
   - Clean up the DAIF flags handling for EL0 returns (in preparation for
     NMI support)
 
   - Kselftest update for ptrace()
 
   - Update some of the sysreg field definitions
 
   - Slight improvement in the code generation for inline asm I/O
     accessors to permit offset addressing
 
   - kretprobes: acquire regs via a BRK exception (previously done via a
     trampoline handler)
 
   - SVE/SME cleanups, comment updates
 
   - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously disabled
     due to gcc silently ignoring -falign-functions=N)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmXxiSgACgkQa9axLQDI
 XvHd7hAAjQrQqxJogPT2ahM5/gxct8qTrXpIgX0B1Y7bb5R8ztvOUN9MJNuDyRsj
 0s28SSZw387LReM5OUu+U6G/iahcuNAyP/8d9qeac32Tidd255fV3KPEh4C4eC+u
 0HeOqLBZ+stmNoa71tBC2K6SmchizhYyYduvRnri8km8K4OMDawHWqWRTXl0PNRT
 RMVJvZTDJMPfMBFeD4+B7EnSFOoP14tKCw9MZvlbpT2PEV0kINjhCQiojW2jJgqv
 w36vm/dhwsg1avSzT1xhy3KE+m+7n28+IC/wr1HB7c1WumvYKv7Z84ieCp3PlO3Z
 owvVO7dKJC6X3RkoY6Kge5p2RHU6poDerDVHYiAvG+Zi57nrDmHyAubskThsGTGR
 AibSEeJ5nQ0yM6hx7zAIQa5XEo4l0svD1ZM7NynY+5JR44W9cdAH3SnEsvIBMGIf
 /ja+iZ1W4ZQnIESQXD5uDPSxILfqQ8Ebhdorpw+Qg3rB7OhdTdGSSGQCi6V2PcJH
 d/ErFO+i0lFRBPJtBbUAN4EEu3HJcVYEoEnVJYQahC+6KyNGLxO+7L6sH0YO7Pag
 P1LRa6h8ktuBMrbCrOPWdmJYNDYCbb5rRtmcCwO0ItZ4g5tYWp9djFc8pyctCaNB
 MZxxRrUCNwXTOcFTDiYzyk+JCvpf3EvXfvj8AH+P8BMjFWgqHqw=
 =KTD/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K
  pages), the dpISA extension and Rust enabled on arm64. The changes are
  mostly contained within the usual arch/arm64/, drivers/perf, the arm64
  Documentation and kselftests. The exception is the Rust support which
  touches some generic build files.

  Summary:

   - Reorganise the arm64 kernel VA space and add support for LPA2 (at
     stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address
     range with 4KB and 16KB pages

   - Enable Rust on arm64

   - Support for the 2023 dpISA extensions (data processing ISA), host
     only

   - arm64 perf updates:

      - StarFive's StarLink (integrates one or more CPU cores with a
        shared L3 memory system) PMU support

      - Enable HiSilicon Erratum 162700402 quirk for HIP09

      - Several updates for the HiSilicon PCIe PMU driver

      - Arm CoreSight PMU support

      - Convert all drivers under drivers/perf/ to use .remove_new()

   - Miscellaneous:

      - Don't enable workarounds for "rare" errata by default

      - Clean up the DAIF flags handling for EL0 returns (in preparation
        for NMI support)

      - Kselftest update for ptrace()

      - Update some of the sysreg field definitions

      - Slight improvement in the code generation for inline asm I/O
        accessors to permit offset addressing

      - kretprobes: acquire regs via a BRK exception (previously done
        via a trampoline handler)

      - SVE/SME cleanups, comment updates

      - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously
        disabled due to gcc silently ignoring -falign-functions=N)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits)
  Revert "mm: add arch hook to validate mmap() prot flags"
  Revert "arm64: mm: add support for WXN memory translation attribute"
  Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  arm64/ptrace: Expose FPMR via ptrace
  arm64/signal: Add FPMR signal handling
  arm64/fpsimd: Support FEAT_FPMR
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/cpufeature: Hook new identification registers up to cpufeature
  docs: perf: Fix build warning of hisi-pcie-pmu.rst
  perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
  MAINTAINERS: Add entry for StarFive StarLink PMU
  docs: perf: Add description for StarFive's StarLink PMU
  dt-bindings: perf: starfive: Add JH8100 StarLink PMU
  perf: starfive: Add StarLink PMU support
  docs: perf: Update usage for target filter of hisi-pcie-pmu
  ...
2024-03-14 15:35:42 -07:00
Catalin Marinas
69ebc01824 Revert "arm64: mm: add support for WXN memory translation attribute"
This reverts commit 50e3ed0f93.

The SCTLR_EL1.WXN control forces execute-never when a page has write
permissions. While the idea of hardening such write/exec combinations is
good, with permissions indirection enabled (FEAT_PIE) this control
becomes RES0. FEAT_PIE introduces a slightly different form of WXN which
only has an effect when the base permission is RWX and the write is
toggled by the permission overlay (FEAT_POE, not yet supported by the
arm64 kernel). Revert the patch for now.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/ZfGESD3a91lxH367@arm.com
2024-03-13 10:53:20 +00:00
Paolo Bonzini
961e2bfcf3 KVM/arm64 updates for 6.9
- Infrastructure for building KVM's trap configuration based on the
    architectural features (or lack thereof) advertised in the VM's ID
    registers
 
  - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
    x86's WC) at stage-2, improving the performance of interacting with
    assigned devices that can tolerate it
 
  - Conversion of KVM's representation of LPIs to an xarray, utilized to
    address serialization some of the serialization on the LPI injection
    path
 
  - Support for _architectural_ VHE-only systems, advertised through the
    absence of FEAT_E2H0 in the CPU's ID register
 
  - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
    selftests
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZepBjgAKCRCivnWIJHzd
 FnngAP93VxjCkJ+5qSmYpFNG6r0ECVIbLHFQ59nKn0+GgvbPEgEAwt8svdLdW06h
 njFTpdzvl4Po+aD/V9xHgqVz3kVvZwE=
 =1FbW
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.9

 - Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers

 - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it

 - Conversion of KVM's representation of LPIs to an xarray, utilized to
   address serialization some of the serialization on the LPI injection
   path

 - Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register

 - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
2024-03-11 10:02:32 -04:00
Catalin Marinas
88f0912253 Merge branch 'for-next/stage1-lpa2' into for-next/core
* for-next/stage1-lpa2: (48 commits)
  : Add support for LPA2 and WXN and stage 1
  arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed
  arm64/mm: Use generic __pud_free() helper in pud_free() implementation
  arm64: gitignore: ignore relacheck
  arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange
  arm64: mm: Make PUD folding check in set_pud() a runtime check
  arm64: mm: add support for WXN memory translation attribute
  mm: add arch hook to validate mmap() prot flags
  arm64: defconfig: Enable LPA2 support
  arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
  arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
  arm64: ptdump: Deal with translation levels folded at runtime
  arm64: ptdump: Disregard unaddressable VA space
  arm64: mm: Add support for folding PUDs at runtime
  arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
  arm64: mm: Add 5 level paging support to fixmap and swapper handling
  arm64: Enable LPA2 at boot if supported by the system
  arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
  arm64: mm: Add definitions to support 5 levels of paging
  arm64: mm: Add LPA2 support to phys<->pte conversion routines
  arm64: mm: Wire up TCR.DS bit to PTE shareability fields
  ...
2024-03-07 19:05:29 +00:00
Mark Brown
203f2b95a8 arm64/fpsimd: Support FEAT_FPMR
FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the
FP8 related features added to the architecture at the same time. Detect
support for this register and context switch it for EL0 when present.

Due to the sharing of responsibility for saving floating point state
between the host kernel and KVM FP8 support is not yet implemented in KVM
and a stub similar to that used for SVCR is provided for FPMR in order to
avoid bisection issues. To make it easier to share host state with the
hypervisor we store FPMR as a hardened usercopy field in uw (along with
some padding).

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07 17:14:53 +00:00
Ard Biesheuvel
50e3ed0f93 arm64: mm: add support for WXN memory translation attribute
The AArch64 virtual memory system supports a global WXN control, which
can be enabled to make all writable mappings implicitly no-exec. This is
a useful hardening feature, as it prevents mistakes in managing page
table permissions from being exploited to attack the system.

When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is
completely under our control, and has been cleaned up to allow WXN to be
enabled from boot onwards. EL0 is not under our control, but given that
widely deployed security features such as selinux or PaX already limit
the ability of user space to create mappings that are writable and
executable at the same time, the impact of enabling this for EL0 is
expected to be limited. (For this reason, common user space libraries
that have a legitimate need for manipulating executable code already
carry fallbacks such as [0].)

If enabled at compile time, the feature can still be disabled at boot if
needed, by passing arm64.nowxn on the kernel command line.

[0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-88-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:43 +00:00
Ard Biesheuvel
9684ec186f arm64: Enable LPA2 at boot if supported by the system
Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.

To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.

Enabling LPA2 involves more than setting TCR.T1SZ to a lower value,
there is also a DS bit in TCR that needs to be set, and which changes
the meaning of bits [9:8] in all page table descriptors. Since we cannot
enable DS and every live page table descriptor at the same time, let's
pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.

To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-78-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:40 +00:00
Ard Biesheuvel
68aec33f8f arm64: mm: Add feature override support for LVA
Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.

Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So override the ID_AA64MMFR0.TGran field of the current
page size as well if it advertises support for 52-bit addressing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-71-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel
9cce9c6c2c arm64: mm: Handle LVA support as a CPU feature
Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:36 +00:00
Ard Biesheuvel
a669c6a493 arm64: Add helpers to probe local CPU for PAC and BTI support
Add some helpers that will be used by the early kernel mapping code to
check feature support on the local CPU. This permits the early kernel
mapping to be created with the right attributes, removing the need for
tearing it down and recreating it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-58-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
9ddd9baa42 arm64: idreg-override: Create a pseudo feature for rodata=off
Add rodata=off to the set of kernel command line options that is parsed
early using the CPU feature override detection code, so we can easily
refer to it when creating the kernel mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-57-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
af73b9a2dd arm64: kaslr: Use feature override instead of parsing the cmdline again
The early kaslr code open codes the detection of 'nokaslr' on the kernel
command line, and this is no longer necessary now that the feature
detection code, which also looks for the same string, executes before
this code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-56-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:31 +00:00
Ard Biesheuvel
35876f35f4 arm64: cpufeature: Add helper to test for CPU feature overrides
Add some helpers to extract and apply feature overrides to the bare
idreg values. This involves inspecting the value and mask of the
specific field that we are interested in, given that an override
value/mask pair might be invalid for one field but valid for another.

Then, wire up the new helper for the hVHE test - note that we can drop
the sysreg test here, as the override will be invalid when trying to
enable hVHE on non-VHE capable hardware.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-55-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16 12:42:30 +00:00
Seongsu Park
c0b26c06ed arm64: fix typo in comments
fix typo in comments

thath -> that

Signed-off-by: Seongsu Park <sgsu.park@samsung.com>
Link: https://lore.kernel.org/r/20240202013306.883777-1-sgsu.park@samsung.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-02-09 16:54:13 +00:00
Marc Zyngier
d9a065914d arm64: cpufeatures: Correctly handle signed values
Although we've had signed values for some features such as PMUv3
and FP, the code that handles the comparaison with some limit
has a couple of annoying issues:

- the min_field_value is always unsigned, meaning that we cannot
  easily compare it with a negative value

- it is not possible to have a range of values, let alone a range
  of negative values

Fix this by:

- adding an upper limit to the comparison, defaulting to all bits
  being set to the maximum positive value

- ensuring that the signess of the min and max values are taken into
  account

A ARM64_CPUID_FIELDS_NEG() macro is provided for signed features, but
nothing is using it yet.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240122181344.258974-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-08 15:12:44 +00:00
Will Deacon
ccaeeec529 Merge branch 'for-next/lpa2-prep' into for-next/core
* for-next/lpa2-prep:
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: mm: Take potential load offset into account when KASLR is off
  arm64: kernel: Disable latent_entropy GCC plugin in early C runtime
  arm64: Add ARM64_HAS_LPA2 CPU capability
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  arm64/mm: Add lpa2_is_enabled() kvm_lpa2_is_enabled() stubs
  arm64/mm: Modify range-based tlbi to decrement scale
2024-01-04 12:27:42 +00:00
Mark Rutland
eb15d707c2 arm64: Align boot cpucap handling with system cpucap handling
Currently the detection+enablement of boot cpucaps is separate from the
patching of boot cpucap alternatives, which means there's a period where
cpus_have_cap($CAP) and alternative_has_cap($CAP) may be mismatched.

It would be preferable to manage the boot cpucaps in the same way as the
system cpucaps, both for clarity and to minimize the risk of accidental
usage of code relying upon an alternative which has not yet been
patched.

This patch aligns the handling of boot cpucaps with the handling of
system cpucaps:

* The existing setup_boot_cpu_capabilities() function is moved to be
  closer to the setup_system_capabilities() and setup_system_features()
  functions so that they're more clearly related and more likely to be
  updated together in future.

* The patching of boot cpucap alternatives is moved into
  setup_boot_cpu_capabilities(), immediately after boot cpucaps are
  detected and enabled.

* A new setup_boot_cpu_features() function is added to mirror
  setup_system_features(); this handles initialization of cpucap data
  structures and calls setup_boot_cpu_capabilities(). This makes
  init_cpu_features() a closer mirror to update_cpu_features(), and
  makes smp_prepare_boot_cpu() a closer mirror to smp_cpus_done().

Importantly, while these changes alter the structure of the code, they
retain the existing order of calls to:

  init_cpu_features(); // prefix initializing feature regs
  init_cpucap_indirect_list();
  detect_system_supports_pseudo_nmi();
  update_cpu_capabilities(SCOPE_BOOT_CPU | SCOPE_LOCAL_CPU);
  enable_cpu_capabilities(SCOPE_BOOT_CPU);
  apply_boot_alternatives();

... and hence there should be no functional change as a result of this
patch; this is purely a structural cleanup.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231212170910.3745497-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-12-13 16:02:01 +00:00
Ryan Roberts
b1366d21da arm64: Add ARM64_HAS_LPA2 CPU capability
Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in the hypervisor.

Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, we combine both into a single
capability, allowing us to simplify the implementation. KVM requires
support in both stages in order to use LPA2 since the same library is
used for hyp stage 1 and guest stage 2 pgtables.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-6-ryan.roberts@arm.com
2023-11-27 15:03:50 +00:00
Catalin Marinas
14dcf78a6c Merge branch 'for-next/cpus_have_const_cap' into for-next/core
* for-next/cpus_have_const_cap: (38 commits)
  : cpus_have_const_cap() removal
  arm64: Remove cpus_have_const_cap()
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
  arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
  arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
  arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
  arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
  arm64: Avoid cpus_have_const_cap() for ARM64_MTE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
  ...
2023-10-26 17:10:18 +01:00
Jeremy Linton
23b727dc20 arm64: cpufeature: Display the set of cores with a feature
The AMU feature can be enabled on a subset of the cores in a system.
Because of that, it prints a message for each core as it is detected.
This becomes tedious when there are hundreds of cores. Instead, for
CPU features which can be enabled on a subset of the present cores,
lets wait until update_cpu_capabilities() and print the subset of cores
the feature was enabled on.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-23 16:40:37 +01:00
Mark Rutland
e8d4006dc2 arm64: Remove cpus_have_const_cap()
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.

Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.

The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
a76521d160 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to
check for the relevant cpucaps, but this is only necessary so that
sve_setup() and sme_setup() can run prior to alternatives being patched,
and otherwise alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

All of system_supports_{sve,sme,sme2,fa64}() will return false prior to
system cpucaps being detected. In the window between system cpucaps being
detected and patching alternatives, we need system_supports_sve() and
system_supports_sme() to run to initialize SVE and SME properties, but
all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on
the relevant cpucap becoming true until alternatives are patched:

* No KVM code runs until after alternatives are patched, and so this can
  safely use cpus_have_final_cap() or alternative_has_cap_*().

* The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is
  registered later from cpuinfo_regs_init() as a device_initcall, and so
  this can safely use cpus_have_final_cap() or alternative_has_cap_*().

* The entry, signal, and ptrace code isn't reachable until userspace has
  run, and so this can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG
  pseudo-register before alternatives are patched, and before
  sve_setup() has run. If a sampling event is created early enough, this
  would allow perf_ext_reg_value() to sample (the as-yet uninitialized)
  thread_struct::vl[] prior to alternatives being patched.

  It would be preferable to defer this until alternatives are patched,
  and this can safely use alternative_has_cap_*().

* The context-switch code will run during this window as part of
  stop_machine() used during alternatives_patch_all(), and potentially
  for other work if other kernel threads are created early. No threads
  require the use of SVE/SME/SME2/FA64 prior to alternatives being
  patched, and it would be preferable for the related context-switch
  logic to take effect after alternatives are patched so that ths is
  guaranteed to see a consistent system-wide state (e.g. anything
  initialized by sve_setup() and sme_setup().

  This can safely ues alternative_has_cap_*().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The sve_setup() and sme_setup() functions are modified to
use cpus_have_cap() directly so that they can observe the cpucaps being
set prior to alternatives being patched.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:06 +01:00
Mark Rutland
94324bcbc9 arm64: Avoid cpus_have_const_cap() for ARM64_MTE
In system_supports_mte() we use cpus_have_const_cap() to check for
ARM64_MTE, but this is not necessary and cpus_have_final_boot_cap()
would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_MTE cpucap is a boot cpu feature which is detected and patched
early on the boot CPU under smp_prepare_boot_cpu(). In the window
between detecting the ARM64_MTE cpucap and patching alternatives,
nothing depends on the ARM64_MTE cpucap:

* The kasan_hw_tags_enabled() helper depends upon the kasan_flag_enabled
  static key, which is initialized later in kasan_init_hw_tags() after
  alternatives have been applied.

* No KVM code is called during this window, and KVM is not initialized
  until after system cpucaps have been detected and patched. KVM code
  can safely use cpus_have_final_cap() or alternative_has_cap_*().

* We don't context-switch prior to patching boot alternatives, and thus
  mte_thread_switch() is not reachable during this window. Thus, we can
  safely use cpus_have_final_boot_cap() or alternative_has_cap_*() in
  the context-switch code.

* IRQ and FIQ are masked during this window, and we can only take SError
  and Debug exceptions. SError exceptions are fatal at this point in
  time, and we do not expect to take Debug exceptions, thus:

  - It's fine to lave TCO set for exceptions taken during this window,
    and mte_disable_tco_entry() doesn't need to do anything.

  - We don't need to detect and report asynchronous tag cehck faults
    during this window, and neither mte_check_tfsr_entry() nor
    mte_check_tfsr_exit() need to do anything.

  Since we want to report any SErrors taken during thiw window, these
  cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(),
  but these can safely use alternative_has_cap_*().

* The __set_pte_at() function is not used during this window. It is
  possible for this to be used on kernel mappings prior to boot cpucaps
  being finalized, so this cannot safely use cpus_have_final_boot_cap()
  or cpus_have_final_cap(), but this can safely use
  alternative_has_cap_*().

* No userspace translation tables have been created yet, and swap has
  not been initialized yet. Thus swapping is not possible and none of
  the following are called:

  - arch_thp_swp_supported()
  - arch_prepare_to_swap()
  - arch_swap_invalidate_page()
  - arch_swap_invalidate_area()
  - arch_swap_restore()

  These can safely use system_has_final_cap() or
  alternative_has_cap_*().

* The elfcore functions are only reachable after userspace is brought
  up, which happens after system cpucaps have been detected and patched.
  Thus the elfcore code can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* Hibernation is only possible after userspace is brought up, which
  happens after system cpucaps have been detected and patched. Thus the
  hibernate code can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The set_tagged_addr_ctrl() function is only reachable after userspace
  is brought up, which happens after system cpucaps have been detected
  and patched. Thus this can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The copy_user_highpage() and copy_highpage() functions are not used
  during this window, and can safely use alternative_has_cap_*().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:05 +01:00
Mark Rutland
b54b525764 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
We use cpus_have_const_cap() to check for ARM64_HAS_TLB_RANGE, but this
is not necessary and alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

In the window between detecting the ARM64_HAS_TLB_RANGE cpucap and
patching alternative branches, we do not perform any TLB invalidation,
and even if we were to perform TLB invalidation here it would not be
functionally necessary to optimize this by using range invalidation.
Hence there's no need to use cpus_have_const_cap(), and
alternative_has_cap_unlikely() is sufficient.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:05 +01:00
Mark Rutland
53d62e995d arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
In system_uses_hw_pan() we use cpus_have_const_cap() to check for
ARM64_HAS_PAN, but this is only necessary so that the
system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to
alternatives being patched, and otherwise this is not necessary and
alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and
system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN
is selected, and:

* We only use system_uses_hw_pan() directly in __sdei_handler(), which
  isn't reachable until after alternatives have been patched, and for
  this it is safe to use alternative_has_cap_*().

* We use system_uses_ttbr0_pan() in a few places:

  - In check_and_switch_context() and cpu_uninstall_idmap(), which will
    defer installing a translation table into TTBR0 when the
    ARM64_HAS_PAN cpucap is not detected.

    Prior to patching alternatives, all CPUs will be using init_mm with
    the reserved ttbr0 translation tables install in TTBR0, so these can
    safely use alternative_has_cap_*().

  - In update_saved_ttbr0(), which will only save the active TTBR0 into
    a per-thread variable when the ARM64_HAS_PAN cpucap is not detected.

    Prior to patching alternatives, all CPUs will be using init_mm with
    the reserved ttbr0 translation tables install in TTBR0, so these can
    safely use alternative_has_cap_*().

  - In efi_set_pgd(), which will handle check_and_switch_context()
    deferring the installation of TTBR0 when TTBR0 PAN is detected.

    The EFI runtime services are not initialized until after
    alternatives have been patched, and so this can safely use
    alternative_has_cap_*() or cpus_have_final_cap().

  - In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll
    avoid installing/uninstalling a translation table in TTBR0 when
    ARM64_HAS_PAN is detected.

    Prior to patching alternatives we will not perform any uaccess and
    will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and
    so these can safely use alternative_has_cap_*() or
    cpus_have_final_cap().

  - In is_el1_permission_fault() where we will consider a translation
    fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN
    is not detected *and* we have set the PAN bit in the SPSR (which
    tells us that in the interrupted context, TTBR0 pointed at the
    reserved zero ttbr).

    In the window between detecting system cpucaps and patching
    alternatives we should not perform any accesses to TTBR0 addresses,
    and no userspace translation tables exist until after patching
    alternatives. Thus it is safe for this to use alternative_has_cap*().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.

So that the check for TTBR0 PAN in setup_cpu_features() can run prior to
alternatives being patched, the call to system_uses_ttbr0_pan() is
replaced with an explicit check of the ARM64_HAS_PAN bit in the
system_cpucaps bitmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:04 +01:00
Mark Rutland
20af807d80 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
In system_uses_irq_prio_masking() we use cpus_have_const_cap() to check
for ARM64_HAS_GIC_PRIO_MASKING, but this is not necessary and
alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

When CONFIG_ARM64_PSEUDO_NMI=y the ARM64_HAS_GIC_PRIO_MASKING cpucap is
a strict boot cpu feature which is detected and patched early on the
boot cpu, which both happen in smp_prepare_boot_cpu(). In the window
between the ARM64_HAS_GIC_PRIO_MASKING cpucap is detected and
alternatives are patched we don't run any code that depends upon the
ARM64_HAS_GIC_PRIO_MASKING cpucap:

* We leave DAIF.IF set until after boot alternatives are patched, and
  interrupts are unmasked later in init_IRQ(), so we cannot reach
  IRQ/FIQ entry code and will not use irqs_priority_unmasked().

* We don't call any code which uses arm_cpuidle_save_irq_context() and
  arm_cpuidle_restore_irq_context() during this window.

* We don't call start_thread_common() during this window.

* The local_irq_*() code in <asm/irqflags.h> depends solely on an
  alternative branch since commit:

  a5f61cc636 ("arm64: irqflags: use alternative branches for pseudo-NMI logic")

  ... and hence will use the default (DAIF-only) masking behaviour until
  alternatives are patched.

* Secondary CPUs are brought up later after alternatives are patched,
  and alternatives are patched on the boot CPU immediately prior to
  calling init_gic_priority_masking(), so we'll correctly initialize
  interrupt masking regardless.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. As this makes system_uses_irq_prio_masking() equivalent to
__irqflags_uses_pmr(), the latter is removed and replaced with the
former for consistency.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:04 +01:00
Mark Rutland
54c8818aa2 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNP
In system_supports_cnp() we use cpus_have_const_cap() to check for
ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp()
callback can run prior to alternatives being patched, and otherwise this
is not necessary and alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP
cpucap is detected system-wide under setup_system_capabilities(), prior
to alternatives being patched. During this window cpu_enable_cnp() uses
cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1.
No other users of the ARM64_HAS_CNP cpucap need the up-to-date value
during this window:

* As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable.

* As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't
  reachable.

* At this point all CPUs are using the swapper_pg_dir with a reserved
  ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither
  check_and_switch_context() nor cpu_do_switch_mm() need to do anything
  special.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. To allow cpu_enable_cnp() to function prior to alternatives
being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and
cpu_enable_swapper_cnp(), with the former only used for early TTBR1
replacement, and the latter used by both cpu_enable_cnp() and
__cpu_suspend_exit().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:04 +01:00
Mark Rutland
bbbb65770b arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTI
In system_supports_bti() we use cpus_have_const_cap() to check for
ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or
cpus_have_final_*cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict
boot cpu feature which is detected and patched early on the boot cpu.
All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU
has detected ARM64_HAS_BTI and patched boot alternatives, and hence can
safely use alternative_has_cap_*() or cpus_have_final_boot_cap().

Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI
happen after system capabilities have been finalized and alternatives
have been patched. Hence these can safely use alternative_has_cap_*) or
cpus_have_final_cap().

This patch splits system_supports_bti() into system_supports_bti() and
system_supports_bti_kernel(), with the former handling where the cpucap
affects userspace functionality, and ther latter handling where the
cpucap affects kernel functionality. The use of cpus_have_const_cap() is
replaced by cpus_have_final_cap() in cpus_have_const_cap, and
cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will
avoid generating code to test the system_cpucaps bitmap and should be
better for all subsequent calls at runtime. The use of
cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier
to spot if code is chaanged such that these run before the ARM64_HAS_BTI
cpucap is guaranteed to have been finalized.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:04 +01:00
Mark Rutland
7f0387cf76 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_{ADDRESS,GENERIC}_AUTH
In system_supports_address_auth() and system_supports_generic_auth() we
use cpus_have_const_cap to check for ARM64_HAS_ADDRESS_AUTH and
ARM64_HAS_GENERIC_AUTH respectively, but this is not necessary and
alternative_has_cap_*() would bre preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_HAS_ADDRESS_AUTH cpucap is a boot cpu feature which is
detected and patched early on the boot CPU before any pointer
authentication keys are enabled via their respective SCTLR_ELx.EN* bits.
Nothing which uses system_supports_address_auth() is called before the
boot alternatives are patched. Thus it is safe for
system_supports_address_auth() to use cpus_have_final_boot_cap() to
check for ARM64_HAS_ADDRESS_AUTH.

The ARM64_HAS_GENERIC_AUTH cpucap is a system feature which is detected
on all CPUs, then finalized and patched under
setup_system_capabilities(). We use system_supports_generic_auth() in a
few places:

* The pac_generic_keys_get() and pac_generic_keys_set() functions are
  only reachable from system calls once userspace is up and running. As
  cpucaps are finalzied long before userspace runs, these can safely use
  alternative_has_cap_*() or cpus_have_final_cap().

* The ptrauth_prctl_reset_keys() function is only reachable from system
  calls once userspace is up and running. As cpucaps are finalized long
  before userspace runs, this can safely use alternative_has_cap_*() or
  cpus_have_final_cap().

* The ptrauth_keys_install_user() function is used during
  context-switch. This is called prior to alternatives being applied,
  and so cannot use cpus_have_final_cap(), but as this only needs to
  switch the APGA key for userspace tasks, it's safe to use
  alternative_has_cap_*().

* The ptrauth_keys_init_user() function is used to initialize userspace
  keys, and is only reachable after system cpucaps have been finalized
  and patched. Thus this can safely use alternative_has_cap_*() or
  cpus_have_final_cap().

* The system_has_full_ptr_auth() helper function is only used by KVM
  code, which is only reachable after system cpucaps have been finalized
  and patched. Thus this can safely use alternative_has_cap_*() or
  cpus_have_final_cap().

This patch modifies system_supports_address_auth() to use
cpus_have_final_boot_cap() to check ARM64_HAS_ADDRESS_AUTH, and modifies
system_supports_generic_auth() to use alternative_has_cap_unlikely() to
check ARM64_HAS_GENERIC_AUTH. In either case this will avoid generating
code to test the system_cpucaps bitmap and should be better for all
subsequent calls at runtime. The use of cpus_have_final_boot_cap() will
make it easier to spot if code is chaanged such that these run before
the relevant cpucap is guaranteed to have been finalized.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:03 +01:00
Mark Rutland
34f66c4c4d arm64: Use a positive cpucap for FP/SIMD
Currently we have a negative cpucap which describes the *absence* of
FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
somewhat awkward relative to other cpucaps that describe the presence of
a feature, and it would be nicer to have a cpucap which describes the
presence of FP/SIMD:

* This will allow the cpucap to be treated as a standard
  ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
  has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.

* This ensures that the cpucap will only transition from not-present to
  present, reducing the risk of unintentional and/or unsafe usage of
  FP/SIMD before cpucaps are finalized.

* This will allow using arm64_cpu_capabilities::cpu_enable() to enable
  the use of FP/SIMD later, with FP/SIMD being disabled at boot time
  otherwise. This will ensure that any unintentional and/or unsafe usage
  of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
  never unintentionally enabled for userspace in mismatched big.LITTLE
  systems.

This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
Note that as FP/SIMD will now be trapped when not supported system-wide,
do_fpsimd_acc() must handle these traps in the same way as for SVE and
SME. The commentary in fpsimd_restore_current_state() is updated to
describe the new scheme.

No users of system_supports_fpsimd() need to know that FP/SIMD is
available prior to alternatives being patched, so this is updated to
use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
cpucap, without generating code to test the system_cpucaps bitmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:03 +01:00
Mark Rutland
075f48c924 arm64: Rework setup_cpu_features()
Currently setup_cpu_features() handles a mixture of one-time kernel
feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF
hwcaps). Subsequent patches will rework other one-time setup and expand
the logic currently in setup_cpu_features(), and in preparation for this
it would be helpful to split the kernel and user setup into separate
functions.

This patch splits setup_user_features() out of setup_cpu_features(),
with a few additional cleanups of note:

* setup_cpu_features() is renamed to setup_system_features() to make it
  clear that it handles system-wide feature setup rather than cpu-local
  feature setup.

* setup_system_capabilities() is folded into setup_system_features().

* Presence of TTBR0 pan is logged immediately after
  update_cpu_capabilities(), so that this is guaranteed to appear
  alongside all the other detected system cpucaps.

* The 'cwg' variable is removed as its value is only consumed once and
  it's simpler to use cache_type_cwg() directly without assigning its
  return value to a variable.

* The call to setup_user_features() is moved after alternatives are
  patched, which will allow user feature setup code to depend on
  alternative branches and allow for simplifications in subsequent
  patches.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 12:57:51 +01:00