The GICv5 ITS will reuse some GICv3 ITS MSI parent functions therefore
it makes sense to keep the code functionality in a compilation unit
shared by the two drivers.
Rename the GICv3 ITS MSI parent file and update the related
Kconfig/Makefile entries to pave the way for code sharing.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-26-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
- Switch the MSI decriptor locking to lock guards
- Replace a broken and naive implementation of PCI/MSI-X control word
updates in the PCI/TPH driver with a properly serialized variant in the
PCI/MSI core code.
- Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
replacing the direct access to the MSI descriptors with the proper API
function calls. People will never understand that APIs exist for a
reason...
- Provide core infrastructre for the upcoming PCI endpoint library
extensions. Currently limited to ARM GICv3+, but in theory extensible
to other architectures.
- Provide a MSI domain::teardown() callback, which allows drivers to undo
the effects of the prepare() callback.
- Move the MSI domain::prepare() callback invocation to domain creation
time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing)
invocations on every allocation.
In combination with the new teardown callback this removes some ugly
hacks in the GIC-V3-ITS driver, which pretended to work around the
short comings of the core code so far. With this update the code is
correct by design and implementation.
- Make the irqchip MSI library globally available, provide a MSI parent
domain creation helper and convert a bunch of (PCI/)MSI drivers over to
the modern MSI parent mechanism. This is the first step to get rid of
at least one incarnation of the three PCI/MSI management schemes.
- The usual small cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgFsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR0KD/402K12tlI/D70H2aTG25dbTx+dkVk+
pKpJz0985uUlLJiPCR54dZL0ofcfRU+CdjEIf1I+6TPshtg6IWLJCfqu7OWVPYzz
2lJDO0yeUGwJqc0CIa1vttvJWvcUcxfWBX/ZSkOIM5avaXqSwRwsFNfd7TQ+T+eG
79VS1yyW197mUva53ekSF2voa8EEPWfEslAjoX1dRg5d4viAxaLtKm/KpBqo1oPh
Eb+E67xEWiIonvWNdr1AOisxnbi19PyDo1xnftgBToaeXXYBodNrNIAfAkx40YUZ
IZQLHvhZ91x15hXYIS4Cz1RXqPECbu/tHxs4AFUgGvqdgJUF89wzI3C21ymrKA6E
tDlWfpIcuE3vV/bsqj1gHGL5G5m1tyBRgIdIAOOmMoTHvwp5rrQtuZzpuqzGmEzj
iVIHnn5m08kRpOZQc7+PlxQMh3eunEyj9WWG49EJgoAnJPb5lou4shTwBUheHcKm
NXxKsfo4x5C+WehGTxv80UlnMcK3Yh/TuWf2OPR6QuT2iHP2VL5jyHjIs0ICn0cp
1tvSJtdc1rgvk/4Vn4lu5eyVaTx5ZAH8ZXNQfwwBTWTp3ZyAW+7GkaCq3LPaNJoZ
4LWpgZ5gs6wT+1XNT3boKdns81VolmeTI8P1ciQKpUtaTt6Cy9P/i2az/J+BCS4U
Fn5Qqk08PHGrUQ==
=OBMj
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for the MSI subsystem (core code and PCI):
- Switch the MSI descriptor locking to lock guards
- Replace a broken and naive implementation of PCI/MSI-X control word
updates in the PCI/TPH driver with a properly serialized variant in
the PCI/MSI core code.
- Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
replacing the direct access to the MSI descriptors with the proper
API function calls. People will never understand that APIs exist
for a reason...
- Provide core infrastructre for the upcoming PCI endpoint library
extensions. Currently limited to ARM GICv3+, but in theory
extensible to other architectures.
- Provide a MSI domain::teardown() callback, which allows drivers to
undo the effects of the prepare() callback.
- Move the MSI domain::prepare() callback invocation to domain
creation time to avoid redundant (and in case of ARM/GIC-V3-ITS
confusing) invocations on every allocation.
In combination with the new teardown callback this removes some
ugly hacks in the GIC-V3-ITS driver, which pretended to work around
the short comings of the core code so far. With this update the
code is correct by design and implementation.
- Make the irqchip MSI library globally available, provide a MSI
parent domain creation helper and convert a bunch of (PCI/)MSI
drivers over to the modern MSI parent mechanism. This is the first
step to get rid of at least one incarnation of the three PCI/MSI
management schemes.
- The usual small cleanups and improvements"
* tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
PCI/MSI: Use bool for MSI enable state tracking
PCI: tegra: Convert to MSI parent infrastructure
PCI: xgene: Convert to MSI parent infrastructure
PCI: apple: Convert to MSI parent infrastructure
irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag
irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper
irqchip/gic: Convert to msi_create_parent_irq_domain() helper
genirq/msi: Add helper for creating MSI-parent irq domains
irqchip: Make irq-msi-lib.h globally available
irqchip/gic-v3-its: Use allocation size from the prepare call
genirq/msi: Engage the .msi_teardown() callback on domain removal
genirq/msi: Move prepare() call to per-device allocation
irqchip/gic-v3-its: Implement .msi_teardown() callback
genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
genirq/msi: Rename msi_[un]lock_descs()
...
Switch the GIC family of interrupt chip drivers over to the common helper
function to create the interrupt domains.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-4-maz@kernel.org
Move irq-msi-lib.h into include/linux/irqchip, making it available
to compilation units outside of drivers/irqchip.
This requires some churn in drivers to fetch it from the new location,
generated using this script:
git grep -l -w \"irq-msi-lib.h\" | \
xargs sed -i -e 's:"irq-msi-lib.h":\<linux/irqchip/irq-msi-lib.h\>:'
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-2-maz@kernel.org
On multi-node GICv4.1 system, VSGI senders always use one certain 4_1 ITS,
because find_4_1_its() returns the first its_node in the list, regardless of
which node the VSGI sender is on. This brings guest VSGI performance drop
when VM is not running on the same node as this returned ITS.
On a 2-socket environment, each with one ITS and 32 cpu, GICv4.1 enabled,
4U8G guest, 4 vcpu is running on same socket.
When the VM is on socket0, kvm-unit-tests ipi_hw result is 850ns.
When the VM is on socket1, it is 750ns.
The reason is that the VSGI sender always uses the last reported ITS (that
on socket1) to inject VSGI. The access from a CPU to a other-socket ITS
will cost 100ns more compared to an access to the local ITS.
Using the local ITS results in a 12% reduction in IPI latency.
Modify find_4_1_its() to return the first per-CPU local_4_1_its, which is
initialized when the VPE table is inherited from the ITS or from another
CPU. If it fails to find a local 4_1 ITS, it returns any 4_1 ITS like
before.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250515145359.2795959-1-tangnianyao@huawei.com
The ITS driver currently nukes the structure representing an endpoint
device translating via an ITS on freeing the last LPI allocated for it.
That's an unfortunate state of affair, as it is pretty common for a driver
to allocate a single MSI, do something clever, teardown this MSI, and
reallocate a whole bunch of them. The NVME driver does exactly that,
amongst others.
What happens in that case is that the core code is accidentaly issuing
another .msi_prepare() call, even if it shouldn't. This luckily cancels
the above behaviour and hides the problem.
In order to fix the core code, start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect is that
the its_dev structure will not be freed and that the DID will stay
mapped. Not a big deal, and this will be solved in following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-3-maz@kernel.org
Set the IRQ_DOMAIN_FLAG_MSI_IMMUTABLE flag for ITS, as it does not change
the address/data pair after setup.
Ensure compatibility with MSI users, such as PCIe Endpoint Doorbell, which
require the address/data pair to remain unchanged. Enable PCIe endpoints to
use ITS for triggering doorbells from the PCIe Root Complex (RC) side.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-3-f69b49917464@nxp.com
Including:
- Core: IOMMUFD dependencies from Jason:
- Change the iommufd fault handle into an always present hwpt handle in
the domain
- Give iommufd its own SW_MSI implementation along with some IRQ layer
rework
- Improvements to the handle attach API
- Core: Fixes for probe-issues from Robin
- Intel VT-d changes:
- Checking for SVA support in domain allocation and attach paths
- Move PCI ATS and PRI configuration into probe paths
- Fix a pentential hang on reboot -f
- Miscellaneous cleanups
- AMD-Vi changes:
- Support for up to 2k IRQs per PCI device function
- Set of smaller fixes
- ARM-SMMU changes:
- SMMUv2 devicetree binding updates for Qualcomm implementations
(QCS8300 GPU and MSM8937)
- Clean up SMMUv2 runtime PM implementation to help with wider rework of
pm_runtime_put_autosuspend()
- Rockchip driver changes:
- Driver adjustments for recent DT probing changes
- S390 IOMMU changes:
- Support for IOMMU passthrough
- Apple Dart changes:
- Driver adjustments to meet ISP device requirements
- Null-ptr deref fix
- Disable subpage protection for DART 1
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmfkFSkACgkQK/BELZcB
GuOVUg/9En4QlF0eeg4E1stszGcOOXn9IkKiGP9iqKpL/WqVVitoagn7nwq0rBFF
PdqqcGbSPNHctiUursB18DyENOAmRUE8mGG29po9rAfq5wxZ2yA6eLmpFAf9QDCW
vY3nz2oen1lUrzbPwVI1IR67L6BrtqEUz/1syf295RMr2yXkdmBXb12lbL1rdjv0
7YfWwlDtMOVsewlZmCuDAAfWHtd7cO/ulDcYwq+GOiSXojVmKw8624EFlFbM9E9f
tkceE4mjGQnN/CpZzUc48f1DgIW82PVVTek95Cznbk9ACDJQ4VYYwDMx2WszfMzu
D1CXLiQu0vrxQgFdHn02bw3T4yvXQ4HeawIaTF+1J1gEHbrj6MoVQ88zaO3GnKFk
yxYI5sfOs5iL6zSMkig4OytXuRmRqDVJhgrF1i4XIM3GXrRKf9EiRZ/1eXb+1gab
3Q5k7+u4+7vWbaKLsxOdIBNzz02hb7LVndPIFlWmSnr1YY0LFBfwmgbfTAsSINZo
22Ni8aE2C42lYOZxJ67m3khLpqTZaTERtQIab/gUd5FcuEjxKuHBIzj6SZeC/0Q9
Y6rzklHxJxxtQLrUJp16IzjpiMJbWH0H2jKstRberOpxk3L5aLrJNd8M1B+CokmP
LZeKgqgxV5F+VlR2Bap5AXL9ZEuqjCVDIEy79j8XzAzxobDVnVM=
=a8Pd
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core iommufd dependencies from Jason:
- Change the iommufd fault handle into an always present hwpt handle
in the domain
- Give iommufd its own SW_MSI implementation along with some IRQ
layer rework
- Improvements to the handle attach API
Core fixes for probe-issues from Robin
Intel VT-d changes:
- Checking for SVA support in domain allocation and attach paths
- Move PCI ATS and PRI configuration into probe paths
- Fix a pentential hang on reboot -f
- Miscellaneous cleanups
AMD-Vi changes:
- Support for up to 2k IRQs per PCI device function
- Set of smaller fixes
ARM-SMMU changes:
- SMMUv2 devicetree binding updates for Qualcomm implementations
(QCS8300 GPU and MSM8937)
- Clean up SMMUv2 runtime PM implementation to help with wider rework
of pm_runtime_put_autosuspend()
Rockchip driver changes:
- Driver adjustments for recent DT probing changes
S390 IOMMU changes:
- Support for IOMMU passthrough
Apple Dart changes:
- Driver adjustments to meet ISP device requirements
- Null-ptr deref fix
- Disable subpage protection for DART 1"
* tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
iommu/vt-d: Fix possible circular locking dependency
iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes
iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled
iommu: apple-dart: fix potential null pointer deref
iommu/rockchip: Retire global dma_dev workaround
iommu/rockchip: Register in a sensible order
iommu/rockchip: Allocate per-device data sensibly
iommu/mediatek-v1: Support COMPILE_TEST
iommu/amd: Enable support for up to 2K interrupts per function
iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro
iommu/amd: Replace slab cache allocator with page allocator
iommu/amd: Introduce generic function to set multibit feature value
iommu: Don't warn prematurely about dodgy probes
iommu/arm-smmu: Set rpm auto_suspend once during probe
dt-bindings: arm-smmu: Document QCS8300 GPU SMMU
iommu: Get DT/ACPI parsing into the proper probe path
iommu: Keep dev->iommu state consistent
iommu: Resolve ops in iommu_init_device()
iommu: Handle race with default domain setup
iommu: Unexport iommu_fwspec_free()
...
The two-step process to translate the MSI address involves two functions,
iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg().
Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as
it had to dereference the opaque cookie pointer. Now, the previous patch
changed the cookie pointer into an integer so there is no longer any need
for the iommu layer to be involved.
Further, the call sites of iommu_dma_compose_msi_msg() all follow the same
pattern of setting an MSI message address_hi/lo to non-translated and then
immediately calling iommu_dma_compose_msi_msg().
Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly
accepts the u64 version of the address and simplifies all the callers.
Move the new helper to linux/msi.h since it has nothing to do with iommu.
Aside from refactoring, this logically prepares for the next patch, which
allows multiple implementation options for iommu_dma_prepare_msi(). So, it
does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c
as it no longer provides the only iommu_dma_prepare_msi() implementation.
Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Rockchip RK3566/RK3568 GIC600 integration has DDR addressing
limited to the first 32bit of physical address space. Rockchip
assigned Erratum ID #3568002 for this issue. Add driver quirk for
this Rockchip GIC Erratum.
Note, that the 0x0201743b GIC600 ID is not Rockchip-specific and is
common for many ARM GICv3 implementations. Hence, there is an extra
of_machine_is_compatible() check.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250216221634.364158-2-dmitry.osipenko@collabora.com
The following call-chain leads to enabling interrupts in a nested interrupt
disabled section:
irq_set_vcpu_affinity()
irq_get_desc_lock()
raw_spin_lock_irqsave() <--- Disable interrupts
its_irq_set_vcpu_affinity()
guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard()
irq_put_desc_unlock() <--- Warns because interrupts are enabled
This was broken in commit b97e8a2f71, which replaced the original
raw_spin_[un]lock() pair with guard(raw_spinlock_irq).
Fix the issue by using guard(raw_spinlock).
[ tglx: Massaged change log ]
Fixes: b97e8a2f71 ("irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()")
Signed-off-by: Tomas Krcka <krckatom@amazon.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241230150825.62894-1-krckatom@amazon.de
When enabling GICv4.1 in hip09, VMAPP fails to clear some caches during
the unmap operation, which can causes vSGIs to be lost.
To fix the issue, invalidate the related vPE cache through GICR_INVALLR
after VMOVP.
Suggested-by: Marc Zyngier <maz@kernel.org>
Co-developed-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
- Tree wide:
* Make nr_irqs static to the core code and provide accessor functions
to remove existing and prevent future aliasing problems with local
variables or function arguments of the same name.
- Core code:
* Prevent freeing an interrupt in the devres code which is not managed
by devres in the first place.
* Use seq_put_decimal_ull_width() for decimal values output in
/proc/interrupts which increases performance significantly as it
avoids parsing the format strings over and over.
* Optimize raising the timer and hrtimer soft interrupts by using the
'set bit only' variants instead of the combined version which checks
whether ksoftirqd should be woken up. The latter is a pointless
exercise as both soft interrupts are raised in the context of the
timer interrupt and therefore never wake up ksoftirqd.
* Delegate timer/hrtimer soft interrupt processing to a dedicated thread
on RT.
Timer and hrtimer soft interrupts are always processed in ksoftirqd
on RT enabled kernels. This can lead to high latencies when other
soft interrupts are delegated to ksoftirqd as well.
The separate thread allows to run them seperately under a RT
scheduling policy to reduce the latency overhead.
- Drivers:
* New drivers or extensions of existing drivers to support Renesas
RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
chips
* Support for multi-cluster GICs on MIPS.
MIPS CPUs can come with multiple CPU clusters, where each CPU cluster
has its own GIC (Generic Interrupt Controller). This requires to
access the GIC of a remote cluster through a redirect register block.
This is encapsulated into a set of helper functions to keep the
complexity out of the actual code paths which handle the GIC details.
* Support for encrypted guests in the ARM GICV3 ITS driver
The ITS page needs to be shared with the hypervisor and therefore
must be decrypted.
* Small cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7ggcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaf7D/9G6FgJXx/60zqnpnOr9Yx0hxjaI47x
PFyCd3P05qyVMBYXfI99vrSKuVdMZXJ/fH5L83y+sOaTASyLTzg37igZycIDJzLI
FnHh/m/+UA8k2aIC5VUiNAjne2RLaTZiRN15uEHFVjByC5Y+YTlCNUE4BBhg5RfQ
hKmskeffWdtui3ou13CSNvbFn+pmqi4g6n1ysUuLhiwM2E5b1rZMprcCOnun/cGP
IdUQsODNWTTv9eqPJez985M6A1x2SCGNv7Z73h58B9N0pBRPEC1xnhUnCJ1sA0cJ
pnfde2C1lztEjYbwDngy0wgq0P6LINjQ5Ma2YY2F2hTMsXGJxGPDZm24/u5uR46x
N/gsOQMXqw6f5yvbiS7Asx9WzR6ry8rJl70QRgTyozz7xxJTaiNm2HqVFe2wc+et
Q/BzaKdhmUJj1GMZmqD2rrgwYeDcb4wWYNtwjM4PVHHxYlJVq0mEF1kLLS8YDyjf
HuGPVqtSkt3E0+Br3FKcv5ltUQP8clXbudc6L1u98YBfNK12hW8L+c3YSvIiFoYM
ZOAeANPM7VtQbP2Jg2q81Dd3CShImt5jqL2um+l8g7+mUE7l9gyuO/w/a5dQ57+b
kx7mHHIW2zCeHrkZZbRUYzI2BJfMCCOVN4Ax5OZxTLnLsL9VEehy8NM8QYT4TS8R
XmTOYW3U9XR3gw==
=JqxC
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Tree wide:
- Make nr_irqs static to the core code and provide accessor functions
to remove existing and prevent future aliasing problems with local
variables or function arguments of the same name.
Core code:
- Prevent freeing an interrupt in the devres code which is not
managed by devres in the first place.
- Use seq_put_decimal_ull_width() for decimal values output in
/proc/interrupts which increases performance significantly as it
avoids parsing the format strings over and over.
- Optimize raising the timer and hrtimer soft interrupts by using the
'set bit only' variants instead of the combined version which
checks whether ksoftirqd should be woken up. The latter is a
pointless exercise as both soft interrupts are raised in the
context of the timer interrupt and therefore never wake up
ksoftirqd.
- Delegate timer/hrtimer soft interrupt processing to a dedicated
thread on RT.
Timer and hrtimer soft interrupts are always processed in ksoftirqd
on RT enabled kernels. This can lead to high latencies when other
soft interrupts are delegated to ksoftirqd as well.
The separate thread allows to run them seperately under a RT
scheduling policy to reduce the latency overhead.
Drivers:
- New drivers or extensions of existing drivers to support Renesas
RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
chips
- Support for multi-cluster GICs on MIPS.
MIPS CPUs can come with multiple CPU clusters, where each CPU
cluster has its own GIC (Generic Interrupt Controller). This
requires to access the GIC of a remote cluster through a redirect
register block.
This is encapsulated into a set of helper functions to keep the
complexity out of the actual code paths which handle the GIC
details.
- Support for encrypted guests in the ARM GICV3 ITS driver
The ITS page needs to be shared with the hypervisor and therefore
must be decrypted.
- Small cleanups and fixes all over the place"
* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
irqchip/riscv-aplic: Prevent crash when MSI domain is missing
genirq/proc: Use seq_put_decimal_ull_width() for decimal values
softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
timers: Use __raise_softirq_irqoff() to raise the softirq.
hrtimer: Use __raise_softirq_irqoff() to raise the softirq
riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
irqchip: Add T-HEAD C900 ACLINT SSWI driver
dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
irqchip/mips-gic: Multi-cluster support
irqchip/mips-gic: Setup defaults in each cluster
irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
irqchip/mips-gic: Replace open coded online CPU iterations
genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
genirq/devres: Don't free interrupt which is not managed by devres
irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
irqchip/aspeed-intc: Add AST27XX INTC support
dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
...
Zenghui points out that a recent change to the way set_affinity is
handled for VPEs has the potential to return an error if the VPE
hasn't been mapped yet (because the guest hasn't emited a MAPTI
command yet), affecting GICv4.0 implementations that rely on the
ITSList feature.
Fix this by making the set_affinity succeed in this case, and
return early, without trying to touch the HW.
Fixes: 1442ee0011 ("irqchip/gic-v4: Don't allow a VMOVP on a dying VPE")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/all/20241027102220.1858558-1-maz@kernel.org
Link: https://lore.kernel.org/r/aab45cd3-e5ca-58cf-e081-e32a17f5b4e7@huawei.com
itt_alloc_pool() calls its_alloc_pages_node() to allocate an individual
page to add to the pool (for allocations <PAGE_SIZE). However the final
argument of its_alloc_pages_node() is the page order not the number of
pages. Currently it allocates two pages and leaks the second page.
Fix it by passing 0 instead (1 << 0 = 1 page).
Fixes: b08e2f42e8 ("irqchip/gic-v3-its: Share ITS tables with a non-trusted hypervisor")
Reported-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/1f6e19c4-1fb9-43ab-a8a2-a465c9cff84b@arm.com
Closes: https://lore.kernel.org/r/ed65312a-245c-4fa5-91ad-5d620cab7c6b%40nvidia.com
Kunkun Jiang reported that there is a small window of opportunity for
userspace to force a change of affinity for a VPE while the VPE has already
been unmapped, but the corresponding doorbell interrupt still visible in
/proc/irq/.
Plug the race by checking the value of vmapp_count, which tracks whether
the VPE is mapped ot not, and returning an error in this case.
This involves making vmapp_count common to both GICv4.1 and its v4.0
ancestor.
Fixes: 64edfaa9a2 ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP")
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/c182ece6-2ba0-ce4f-3404-dba7a3ab6c52@huawei.com
Link: https://lore.kernel.org/all/20241002204959.2051709-1-maz@kernel.org
its_create_device() over-allocated by ITS_ITT_ALIGN - 1 bytes to ensure
that an aligned area was available within the allocation. The new genpool
allocator has its min_alloc_order set to get_order(ITS_ITT_ALIGN) so all
allocations from it should be appropriately aligned.
Remove the over-allocation from its_create_device() and alignment from
its_build_mapd_cmd().
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20241002141630.433502-3-steven.price@arm.com
Within a realm guest the ITS is emulated by the host. This means the
allocations must have been made available to the host by a call to
set_memory_decrypted(). Introduce an allocation function which performs
this extra call.
For the ITT use a custom genpool-based allocator that calls
set_memory_decrypted() for each page allocated, but then suballocates the
size needed for each ITT. Note that there is no mechanism implemented to
return pages from the genpool, but it is unlikely that the peak number of
devices will be much larger than the normal level - so this isn't expected
to be an issue.
Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20241002141630.433502-2-steven.price@arm.com
The recently established lock ordering mandates that the per-VM
vmapp_lock is acquired before taking the per-VPE lock.
As it turns out, its_vpe_set_affinity() takes the VPE lock, and
then calls into its_send_vmovp(), which itself takes the vmapp
lock. Obviously, this is a lock order violation.
As its_send_vmovp() is only called from its_vpe_set_affinity(),
hoist the vmapp locking from the former into the latter, restoring
the expected order.
Fixes: f0eb154c39 ("irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock")
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240818171625.3030584-1-maz@kernel.org
This simplifies the handling of platform MSI and wire to MSI controllers
and removes about 500 lines of legacy code.
Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
allocation of PCI/MSI interrupts and to support the upcoming non
standard IMS (Interrupt Message Store) mechanism on PCIe devices
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaeheUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocX4D/wLYD+DQDpA3U1XS8jPNE4vKcmBBNX8
Mj4qdHsY8fK+FhmtLsj8FL3iSTymPtgXzFupXGS+5iFG3LhbW8JWEbqjbowcJ1c8
/4w8sKyyWdCSScrCTrH4A3RrLNDAX3DzSMqqi17sdETuwtN0RJiXgcm/CwRXETmn
kVqB7ddalyAR0Z2N/ym1fkuwyBAdeu3cBxMy/BWR6GFae1dAGe8Kr8GsmmuzBTFi
DQSmkh6kZntTn9y+K7juXF+1q8InolmHiOOUeoUJachSCyp6nu9W2+S2MVUiuOA2
X1/Ei3eKvkBHFDd7phZnIrVecuNehAQEV6BRMKOYBiDG4lwD6vCbbr9/YF5vBGni
tbZAetk9VBpIj0YRVAz7WkLC2JmVbw4znlrDwe8+xeLeDwRXl9f4Xc1Udr0qKgpd
1bNE1zG1z45v5J3OtJLJ4MCYcUCsQgv1CkUlNEdz5+NhXHT+W+oKJor/0WYJ3Qwe
iqTEJ9BA1/SzvngwIt/uoMZlEjBl/0/T1UEMJvP/7oEqjl/UAEWGlpKnID3hsDc2
GcIEOJod6hWzyPyeJUI6RpCHy4ZG93WL7Ks+lvzfp381yoDL5/KlveDtSomyuzYF
2xXHUAvw8MAYfJ/CFft/DYme8sBpn1cxAMWdctEiAn0qfS7X1RNZ/RhQ2OXxRw3q
tNpc0jEen9m72A==
=2adH
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI interrupt updates from Thomas Gleixner:
"Switch ARM/ARM64 over to the modern per device MSI domains.
This simplifies the handling of platform MSI and wire to MSI
controllers and removes about 500 lines of legacy code.
Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
allocation of PCI/MSI interrupts and to support the upcoming non
standard IMS (Interrupt Message Store) mechanism on PCIe devices"
* tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/gic-v3-its: Correctly fish out the DID for platform MSI
irqchip/gic-v3-its: Correctly honor the RID remapping
genirq/msi: Move msi_device_data to core
genirq/msi: Remove platform MSI leftovers
irqchip/irq-mvebu-icu: Remove platform MSI leftovers
irqchip/irq-mvebu-sei: Switch to MSI parent
irqchip/mvebu-odmi: Switch to parent MSI
irqchip/mvebu-gicp: Switch to MSI parent
irqchip/irq-mvebu-icu: Prepare for real per device MSI
irqchip/imx-mu-msi: Switch to MSI parent
irqchip/gic-v2m: Switch to device MSI
irqchip/gic_v3_mbi: Switch over to parent domain
genirq/msi: Remove platform_msi_create_device_domain()
irqchip/mbigen: Remove platform_msi_create_device_domain() fallback
irqchip/gic-v3-its: Switch platform MSI to MSI parent
irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI
irqchip/mbigen: Prepare for real per device MSI
irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI
irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX
...
- Core:
- Provide a new mechanism to create interrupt domains. The existing
interfaces have already too many parameters and it's a pain to expand
any of this for new required functionality.
The new function takes a pointer to a data structure as argument. The
data structure combines all existing parameters and allows for easy
extension.
The first extension for this is to handle the instantiation of
generic interrupt chips at the core level and to allow drivers to
provide extra init/exit callbacks.
This is necessary to do the full interrupt chip initialization before
the new domain is published, so that concurrent usage sites won't see
a half initialized interrupt domain. Similar problems exist on
teardown.
This has turned out to be a real problem due to the deferred and
parallel probing which was added in recent years.
Handling this at the core level allows to remove quite some accrued
boilerplate code in existing drivers and avoids horrible workarounds
at the driver level.
- The usual small improvements all over the place
- Drivers
- Add support for LAN966x OIC and RZ/Five SoC
- Split the STM ExtI driver into a microcontroller and a SMP version to
allow building the latter as a module for multi-platform kernels.
- Enable MSI support for Armada 370XP on platforms which do not support
IPIs.
- The usual small fixes and enhancements all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaVJbUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXTuD/9Tc9BhY5CW7HQkdPQu2Db1O+esprkQ
Uo9lMpTTpPiy9btg4LONzLf4mjbufZpyKBxkRWoZFO0Zj5q4UE9NZYh7EcxrF5Tl
CIFJmyteLsYuOyCmPrtSDSovonXjQKYBE3u2LVJNNkwEkhYbYW9sqIKeT8nneLv6
53gd28ESFUEUjHNTblw/eXviweyUKSXc0qyg+3hgZQPMoh9RkdkEPvyaw9Y/s5Ce
FelLLxzMqX86dR2TJMLqiaGiMpUu/kl+Yz2m5c77TwA2D68qjhHywbtKtlH7b3C6
LMHu2dMrrKSJrLL8roVIYJdHAd1TKWVdnYhqv9WBHFTu1sDuztpR44mewbo8exUU
L2RgVSGYNmeFC3p4wztWYSQfIVa9uOg7+TnJJdh7G0jLIeKM/TbufWqDAJAuoVPL
QhGbZ5xNbZJZ8bvhhItjxpRN/kPs44p3mUGyRJBQzm+mDN118bqfmQzhLcwRbfE2
smp73SQzg9alG2rGdNVEqkKmp8zhg2Crx2VCeVdgbeOxWQRet9zLWcp4FfCEUE9e
eK3iEi8z+rmwafaf3rsxYdrdIRLaUmcni0v7R/16cJH/Cs7bU3Re8XyGhevo3lsO
pJiP5wZDxbckwXNpLm3S/qPDW7vSCnuFPF7QmOvC3a70PsD+E4NKUgiwJuHtn/ZV
pFBKzbQgCsowQA==
=QCRH
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core:
- Provide a new mechanism to create interrupt domains. The existing
interfaces have already too many parameters and it's a pain to
expand any of this for new required functionality.
The new function takes a pointer to a data structure as argument.
The data structure combines all existing parameters and allows for
easy extension.
The first extension for this is to handle the instantiation of
generic interrupt chips at the core level and to allow drivers to
provide extra init/exit callbacks.
This is necessary to do the full interrupt chip initialization
before the new domain is published, so that concurrent usage sites
won't see a half initialized interrupt domain. Similar problems
exist on teardown.
This has turned out to be a real problem due to the deferred and
parallel probing which was added in recent years.
Handling this at the core level allows to remove quite some accrued
boilerplate code in existing drivers and avoids horrible
workarounds at the driver level.
- The usual small improvements all over the place
Drivers:
- Add support for LAN966x OIC and RZ/Five SoC
- Split the STM ExtI driver into a microcontroller and a SMP version
to allow building the latter as a module for multi-platform
kernels
- Enable MSI support for Armada 370XP on platforms which do not
support IPIs
- The usual small fixes and enhancements all over the place"
* tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
irqdomain: Fix the kernel-doc and plug it into Documentation
genirq: Set IRQF_COND_ONESHOT in request_irq()
irqchip/imx-irqsteer: Handle runtime power management correctly
irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info()
irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
irqchip/gic-v4: Always configure affinity on VPE activation
Revert "irqchip/dw-apb-ictl: Support building as module"
Revert "Loongarch: Support loongarch avec"
arm64: Kconfig: Allow build irq-stm32mp-exti driver as module
ARM: stm32: Allow build irq-stm32mp-exti driver as module
irqchip/stm32mp-exti: Allow building as module
irqchip/stm32mp-exti: Rename internal symbols
irqchip/stm32-exti: Split MCU and MPU code
arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms
ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a
irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI
irqchip/dw-apb-ictl: Support building as module
irqchip/riscv-aplic: Simplify the initialization code
...
To support per device MSI domains the ITS must provide MSI parent domain
functionality.
Provide the basic skeleton for this:
- msi_parent_ops
- child domain init callback
- the MSI parent flag set in irqdomain::flags
This does not make ITS a functional parent domain as there is no bit set in
the bus_select_mask yet, but it provides the base to implement PCI and
platform MSI support gradually on top.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240623142234.903076277@linutronix.de
In order to make sure that vpe->col_idx is correctly sampled when a VMAPP
command is issued, the vpe_lock must be held for the VPE. This is now
possible since the introduction of the per-VM vmapp_lock, which can be
taken before vpe_lock in the correct locking order.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-4-maz@kernel.org
vmovp_lock is abused in a number of cases to serialise updates
to vlpi_count[] and deal with map/unmap of a VM to ITSs.
Instead, provide a per-VM lock and revisit the use of vlpi_count[]
so that it is always wrapped in this per-VM vmapp_lock.
This reduces the potential contention on a concurrent VMOVP command,
and paves the way for subsequent VPE locking that holding vmovp_lock
actively prevents due to the lock ordering.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-3-maz@kernel.org
There are currently two paths to set the initial affinity of a VPE:
- at activation time on GICv4 without the stupid VMOVP list, and
on GICv4.1
- at map time for GICv4 with VMOVP list
The latter location may end-up modifying the affinity of VPE that is
currently running, making the results unpredictible.
Instead, unify the two paths, making sure to set the initial affinity only
at activation time.
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-2-maz@kernel.org
In subsequent patches the GICv3 driver will choose the regular interrupt
priority at boot time.
In preparation for using dynamic priorities, place the priorities in
variables and update the code to pass these as parameters. Users of
GICD_INT_DEF_PRI_X4 are modified to replicate the priority byte using
REPEAT_BYTE_U32().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240617111841.2529370-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
The GIC architecture specification defines a set of registers for
redistributors and ITSes that control the sharebility and cacheability
attributes of redistributors/ITSes initiator ports on the interconnect
(GICR_[V]PROPBASER, GICR_[V]PENDBASER, GITS_BASER<n>).
Architecturally the GIC provides a means to drive shareability and
cacheability attributes signals but it is not mandatory for designs to
wire up the corresponding interconnect signals that control the
cacheability/shareability of transactions.
Redistributors and ITSes interconnect ports can be connected to
non-coherent interconnects that are not able to manage the
shareability/cacheability attributes; this implicitly makes the
redistributors and ITSes non-coherent observers.
To enable non-coherent GIC designs on ACPI based systems, parse the MADT
GICC/GICR/ITS subtables non-coherent flags to determine whether the
respective components are non-coherent observers and force the
shareability attributes to be programmed into the redistributors and
ITSes registers.
An ACPI global function (acpi_get_madt_revision()) is added to retrieve
the MADT revision, in that it is essential to check the MADT revision
before checking for flags that were added with MADT revision 7 so that
if the kernel is booted with an ACPI MADT table with revision < 7 it
skips parsing the newly added flags (that should be zeroed reserved
values for MADT versions < 7 but they could turn out to be buggy and
should be ignored).
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240606094238.757649-2-lpieralisi@kernel.org
its_vlpi_prop_update() calls lpi_write_config() which obtains the
mapping information for a VLPI without lock held. So it could race
with its_vlpi_unmap().
Since all calls from its_irq_set_vcpu_affinity() require the same
lock to be held, hoist the locking there instead of sprinkling the
locking all over the place.
This bug was discovered using Coverity Static Analysis Security Testing
(SAST) by Synopsys, Inc.
[ tglx: Use guard() instead of goto ]
Fixes: 015ec0386a ("irqchip/gic-v3-its: Add VLPI configuration handling")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240531162144.28650-1-hagarhem@amazon.com
- Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the lockup
detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to retrieve
the delta to the snapshot. Use this new mechanism in the watchdog code
to do a two stage lockup analysis by taking the snapshot and printing
the deltas for the topmost active interrupts on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the global
counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that they
are not delivered before the system resumes the device drivers when
coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU are
migrated to a different CPU in the affinity mask. This can fail when
the CPUs have no vectors left. Instead of giving up try to migrate it
to any online CPU and thereby breaking the affinity setting in order to
prevent a stale device interrupt which targets an offline CPU
- The usual small cleanups
- Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBCM0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeZHEACqMLN3K+1HyWflYtcTHJeYCjZLHS77
2tQeKaaskOA4W6dcGXPxMw5CHqAobHVQQMqgcJxhUdqQiOJnFFnrtCD7JtqM0hWK
UORNbyeovuhAo+iJ0fTuS8p63H7vm2GIWwBLWJnOuChYv/6Yyx5Cald1skvyvbzL
zePhiiAf5mkdmJMeT5wJSCqEWSRYOXsVAJ/0YAwFG3bKkJH3bmDo6SDJY02sXT5P
pjbtD/0hum9wIVT4fNdYleHHQMdBdj9dLlcxXBikHq50mDMw7GxvjKiLcXmoerw3
rEBfVVJp3qpSofpNJZ3HH0ywcF3yUzq04/LPE9Tk2MoQ8NF0GzP8r9Ahke4B7cUj
FysWNiAlC2IisEi6th313FZkTLx0zgewdsdEBTLt8eAE9TU0wamRbo99LZ8i/Qr3
hk7jV8DzL+EDQJLgl4p1iPJgA708eW17tbCxLEa15VKVV6P58miohmhx/IfPO2Gx
FV1PPehtItsmiK/UoRtUCoFdFsqNQtOE+h8DWLyy8RDmhBqGbn9Ut4euXiQIF+rX
WJKPFfslCTR39BrBcZnZeNsgOCN7tEfFRstzjzkey1DaeTGWtxmA5UGhpC2vT74y
YyXluvZlgKr4S64ABmcqQj++hQLho0OQAih3uW5YVxt4VxEUcXYMJOsV1AQGpMjF
UnewWH5opBQdfw==
=jFLf
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
"Core code:
- Interrupt storm detection for the lockup watchdog:
Lockups which are caused by interrupt storms are not easy to debug
because there is no information about the events which make the
lockup detector trigger.
To make this more user friendly, provide an extenstion to interrupt
statistics which allows to take snapshots and an interface to
retrieve the delta to the snapshot. Use this new mechanism in the
watchdog code to do a two stage lockup analysis by taking the
snapshot and printing the deltas for the topmost active interrupts
on the second trigger.
Note: This contains both the interrupt and the watchdog changes as
the latter depend on the former obviously.
- Avoid summation loops in the /proc/interrupts output and use the
global counter when possible
- Skip suspended interrupts on CPU hotplug operations to ensure that
they are not delivered before the system resumes the device drivers
when coming out of suspend.
- On CPU hot-unplug interrupts which are affine to the outgoing CPU
are migrated to a different CPU in the affinity mask. This can fail
when the CPUs have no vectors left. Instead of giving up try to
migrate it to any online CPU and thereby breaking the affinity
setting in order to prevent a stale device interrupt which targets
an offline CPU
- The usual small cleanups
Driver code:
- Support for the RISCV AIA MSI controller
- Make the interrupt allocation for the Loongson PCH controller more
flexible to prevent vector exhaustion
- The usual set of cleanups and fixes all over the place"
* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
cpuidle: Avoid explicit cpumask allocation on stack
irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
cpumask: Introduce cpumask_first_and_and()
irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
genirq: Reuse irq_is_nmi()
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
ARM: dts: stm32: List exti parent interrupts on stm32mp131
ARM: dts: stm32: List exti parent interrupts on stm32mp151
arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
irqchip/stm32-exti: Mark events reserved with RIF configuration check
irqchip/stm32-exti: Skip secure events
irqchip/stm32-exti: Convert driver to standard PM
...
This BUG_ON() is useless, because the same effect will be obtained
by letting the code run its course and vm being dereferenced,
triggering an exception.
So just remove this check.
Signed-off-by: Guanrui Huang <guanrui.huang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240418061053.96803-3-guanrui.huang@linux.alibaba.com
The error handling path in its_vpe_irq_domain_alloc() causes a double free
when its_vpe_init() fails after successfully allocating at least one
interrupt. This happens because its_vpe_irq_domain_free() frees the
interrupts along with the area bitmap and the vprop_page and
its_vpe_irq_domain_alloc() subsequently frees the area bitmap and the
vprop_page again.
Fix this by unconditionally invoking its_vpe_irq_domain_free() which
handles all cases correctly and by removing the bitmap/vprop_page freeing
from its_vpe_irq_domain_alloc().
[ tglx: Massaged change log ]
Fixes: 7d75bbb4bc ("irqchip/gic-v3-its: Add VPE irq domain allocation/teardown")
Signed-off-by: Guanrui Huang <guanrui.huang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240418061053.96803-2-guanrui.huang@linux.alibaba.com
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.
Remove cpumask var on stack and use cpumask_any_and() to address it.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240416085454.3547175-4-dawei.li@shingroup.cn
As per the GICv4.1 spec (Arm IHI 0069H, 5.3.19):
"A VMAPP with {V, Alloc}=={0, x} is self-synchronizing, This means the ITS
command queue does not show the command as consumed until all of its
effects are completed."
Furthermore, VSYNC is allowed to deliver an SError when referencing a
non existent VPE.
By these definitions, a VMAPP followed by a VSYNC is a bug, as the
later references a VPE that has been unmapped by the former.
Fix it by eliding the VSYNC in this scenario.
Fixes: 64edfaa9a2 ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP")
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20240406022737.3898763-1-tangnianyao@huawei.com
- Core:
- Make affinity changes immediately effective for interrupt
threads. This reduces the impact on isolated CPUs as it pulls over the
thread right away instead of doing it after the next hardware
interrupt arrived.
- Cleanup and improvements for the interrupt chip simulator
- Deduplication of the interrupt descriptor initialization code so the
sparse and non-sparse mode share more code.
- Drivers:
- A set of conversions to platform_drivers::remove_new() which gets rid
of the pointless return value.
- A new driver for the Starfive JH8100 SoC
- Support for Amlogic-T7 SoCs
- Improvement for the interrupt handling and EOI management for the
loongson interrupt controller.
- The usual fixes and improvements all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXt6RUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRahEACenZz//vEy+n5t94UCNoYEBsqL4qsl
eHb2LPkOwJdzy0I0et8sSRfmjFgfmiB5vmcOtuTjbA+pAASMU16M5nU38dD4Qw7V
lwfutv3wb0XT7INslvrsEF4SvhapoiSBtzdK4IEVJysaHek/bbvZg8rot2tXTjCR
3sK4sMuWLXxB+MzcaYEXSZlIlsrXcARHYNVCbudsEqL2Rt7mGtBJBMIPAYXaWLMn
Y1B15huDNcj+Z9s/rbX218oSajEYJv24NE7JW/eYhG8Rv3yc+1zMTIARq35V77/3
KIV15XqKozkR4G8BEzQ1hUp6l1cggOjMslkwjyKnXTddkHQnQs5928/48y1qs4W0
IDpJqpPL30ckfzg/fUKfUU98t95qB4X55jmK3LuiWfdS8cfd65gq4Ro2bIszM1NQ
SYhcTvZRRcNJqlbO3rQfFAmVU0bvVyR3DlmrLzVl2tH5touwNBBQ/3D3o7CRGEns
37c07zjVZnir+HFmrtTKOiENTay+fHrtIw5dFf7FMqREpE4kL/nsgZfN0wgZPUHj
QGFExV/kJNSMvqwCz77uvHt6c5uoVZGn2j8iYAdqWVKYRcWCMids2gVEkc8QK4gQ
eWsIEAClIEjArPqpQzPE2v3a9puCmOpbHWRmU7VDtNka9/ur8qoU2KMXMJBySaL4
UKXfWYE+43RVbQ==
=AbVv
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Core:
- Make affinity changes take effect immediately for interrupt
threads. This reduces the impact on isolated CPUs as it pulls over
the thread right away instead of doing it after the next hardware
interrupt arrived.
- Cleanup and improvements for the interrupt chip simulator
- Deduplication of the interrupt descriptor initialization code so
the sparse and non-sparse mode share more code.
Drivers:
- A set of conversions to platform_drivers::remove_new() which gets
rid of the pointless return value.
- A new driver for the Starfive JH8100 SoC
- Support for Amlogic-T7 SoCs
- Improvement for the interrupt handling and EOI management for the
loongson interrupt controller.
- The usual fixes and improvements all over the place"
* tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
irqchip/ts4800: Convert to platform_driver::remove_new() callback
irqchip/stm32-exti: Convert to platform_driver::remove_new() callback
irqchip/renesas-rza1: Convert to platform_driver::remove_new() callback
irqchip/renesas-irqc: Convert to platform_driver::remove_new() callback
irqchip/renesas-intc-irqpin: Convert to platform_driver::remove_new() callback
irqchip/pruss-intc: Convert to platform_driver::remove_new() callback
irqchip/mvebu-pic: Convert to platform_driver::remove_new() callback
irqchip/madera: Convert to platform_driver::remove_new() callback
irqchip/ls-scfg-msi: Convert to platform_driver::remove_new() callback
irqchip/keystone: Convert to platform_driver::remove_new() callback
irqchip/imx-irqsteer: Convert to platform_driver::remove_new() callback
irqchip/imx-intmux: Convert to platform_driver::remove_new() callback
irqchip/imgpdc: Convert to platform_driver::remove_new() callback
irqchip: Add StarFive external interrupt controller
dt-bindings: interrupt-controller: Add starfive,jh8100-intc
arm64: dts: Add gpio_intc node for Amlogic-T7 SoCs
irqchip/meson-gpio: Add support for Amlogic-T7 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic-T7 SoCs
irqchip/vic: Fix a kernel-doc warning
genirq: Wake interrupt threads immediately when changing affinity
...
The GIC/ITS code is designed to ensure to pick up any preallocated LPI
tables on the redistributors, as enabling LPIs is a one-way switch. There
is no such restriction for vLPIs, and for GICv4.1 it is expected to
allocate a new vPE table at boot.
This works as intended when initializing an ITS, however when setting up a
redistributor in cpu_init_lpis() the early return for preallocated RD
tables skips straight past the GICv4 setup. This all comes to a head when
trying to kexec() into a new kernel, as the new kernel silently fails to
set up GICv4, leading to a complete loss of SGIs and LPIs for KVM VMs.
Slap a band-aid on the problem by ensuring its_cpu_init_lpis() always
initializes GICv4 on the way out, even if the other RD tables were
preallocated.
Fixes: 6479450f72 ("irqchip/gic-v4: Fix occasional VLPI drop")
Reported-by: George Cherian <gcherian@marvell.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240219185809.286724-2-oliver.upton@linux.dev
When updating the affinity of a VPE, the VMOVP command is currently skipped
if the two CPUs are part of the same VPE affinity.
But this is wrong, as the doorbell corresponding to this VPE is still
delivered on the 'old' CPU, which screws up the balancing. Furthermore,
offlining that 'old' CPU results in doorbell interrupts generated for this
VPE being discarded.
The harsh reality is that VMOVP cannot be elided when a set_affinity()
request occurs. It needs to be obeyed, and if an optimisation is to be
made, it is at the point where the affinity change request is made (such as
in KVM).
Drop the VMOVP elision altogether, and only use the vpe_table_mask
to try and stay within the same ITS affinity group if at all possible.
Fixes: dd3f050a21 (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-4-maz@kernel.org
While refactoring the way the ITSs are probed, the handling of quirks
applicable to ACPI-based platforms was lost. As a result, systems such as
HIP07 lose their GICv4 functionnality, and some other may even fail to
boot, unless they are configured to boot with DT.
Move the enabling of quirks into its_probe_one(), making it common to all
firmware implementations.
Fixes: 9585a495ac ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-3-maz@kernel.org
Although the GICv3 code base has gained some handling of systems failing to
handle the shareability attributes, the GICv4 side of things has been
firmly ignored.
This is unfortunate, as the new recent addition of the "dma-noncoherent" is
supposed to apply to all of the GICR tables, and not just the ones that are
common to v3 and v4.
Add some checks to handle the VPROPBASE/VPENDBASE shareability and
cacheability attributes in the same way we deal with the other GICR_BASE
registers, wrapping the flag check in a helper for improved readability.
Note that this has been found by inspection only, as I don't have access to
HW that suffers from this particular issue.
Fixes: 3a0fff0fb6 ("irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20240213101206.2137483-2-maz@kernel.org
ida_alloc() and ida_free() should be used instead of the deprecated
ida_simple_get() and ida_simple_remove().
The upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_max() is inclusive. Adjust the code accordingly.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/3b472b0e7edf6e483b8b255cf8d1cb0163532adf.1705222332.git.christophe.jaillet@wanadoo.fr
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.
Flush the tables right at the begin of its_setup_baser() to prevent that.
[ tglx: Massage changelog ]
Fixes: a8707f5538 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
When splitting the allocation of the ITS node from its configuration,
some of the default settings were kept in the latter instead of
being moved to the former.
This has the side effect of negating some of the quirk detections that
have happened in between, amongst which the dreaded Synquacer hack
(that also affect Dominic's TI platform).
Move the initialisation of these fields early, so that they can again be
overriden by the Synquacer quirk.
Fixes: 9585a495ac ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA
Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org
The GIC architecture specification defines a set of registers
for redistributors and ITSes that control the sharebility and
cacheability attributes of redistributors/ITSes initiator ports
on the interconnect (GICR_[V]PROPBASER, GICR_[V]PENDBASER,
GITS_BASER<n>).
Architecturally the GIC provides a means to drive shareability
and cacheability attributes signals and related IWB/OWB/ISH barriers
but it is not mandatory for designs to wire up the corresponding
interconnect signals that control the cacheability/shareability
of transactions.
Redistributors and ITSes interconnect ports can be connected to
non-coherent interconnects that are not able to manage the
shareability/cacheability attributes; this implicitly makes
the redistributors and ITSes non-coherent observers.
So far, the GIC driver on probe executes a write to "probe" for
the redistributors and ITSes registers shareability bitfields
by writing a value (ie InnerShareable - the shareability domain the
CPUs are in) and check it back to detect whether the value sticks or
not; this hinges on a GIC programming model behaviour that predates the
current specifications, that just define shareability bits as writeable
but do not guarantee that writing certain shareability values
enable the expected behaviour for the redistributors/ITSes
memory interconnect ports.
To enable non-coherent GIC designs, introduce the "dma-noncoherent"
device tree property to allow firmware to describe redistributors and
ITSes as non-coherent observers on the memory interconnect and use the
property to force the shareability attributes to be programmed into the
redistributors and ITSes registers through the GIC quirks mechanism.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231006125929.48591-3-lpieralisi@kernel.org
In order to pave the way for more fancy quirk handling without making
more of a mess of this terrible driver, split the allocation of the
ITS descriptor (its_node) from the actual probing.
This will allow firmware-specific hooks to be added between these
two points.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231006125929.48591-4-lpieralisi@kernel.org
Commit a8707f5538 ("irqchip/gic-v3: Add Rockchip 3588001 erratum
workaround") mentioned RK3588S (the slimmed down variant of RK3588)
being affected, but did not check for its compatible value. Thus the
quirk is not applied on RK3588S. Since the GIC ITS node got added to the
upstream DT, boards using RK3588S are no longer booting without this
quirk being applied.
Fixes: 06cdac8e84 ("arm64: dts: rockchip: add GIC ITS support to rk3588")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230703164129.193991-1-sebastian.reichel@collabora.com
We normally rely on the irq_to_cpuid_[un]lock() primitives to make
sure nothing will change col->idx while performing a LPI invalidation.
However, these primitives do not cover VPE doorbells, and we have
some open-coded locking for that. Unfortunately, this locking is
pretty bogus.
Instead, extend the above primitives to cover VPE doorbells and
convert the whole thing to it.
Fixes: f3a059219b ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access")
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: wanghaibin.wang@huawei.com
Tested-by: Kunkun Jiang <jiangkunkun@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20230617073242.3199746-1-maz@kernel.org
GICv3 LPIs are impacted by an architectural design issue: they do not
have a global active state and as such a given LPI can be delivered to
a new CPU after an affinity change while the previous instance of the
same LPI handler has not yet completed on the original CPU.
If LPIs had an active state, this second LPI would not be delivered
until the first CPU deactivated the initial LPI, just like SPIs.
To solve this issue, use the newly introduced IRQD_RESEND_WHEN_IN_PROGRESS
flag, ensuring that we do not lose an LPI being delivered during that window
by getting the GIC to resend it.
This workaround gets enabled for all LPIs, including the VPE doorbells.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: massaged commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-4-jgowans@amazon.com
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
Rockchip RK3588/RK3588s GIC600 integration does not support the
sharability feature. Rockchip assigned Erratum ID #3588001 for this
issue.
Note, that the 0x0201743b ID is not Rockchip specific and thus
there is an extra of_machine_is_compatible() check.
The flags are named FORCE_NON_SHAREABLE to be vendor agnostic,
since apparently similar integration design errors exist in other
platforms and they can reuse the same flag.
Co-developed-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Signed-off-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Co-developed-by: Kever Yang <kever.yang@rock-chips.com>
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
Co-developed-by: Lucas Tanure <lucas.tanure@collabora.com>
Signed-off-by: Lucas Tanure <lucas.tanure@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230418142109.49762-2-sebastian.reichel@collabora.com