Commit graph

10119 commits

Author SHA1 Message Date
Linus Torvalds
664a231d90 Merge tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:
 "Carve out the resctrl filesystem-related code into fs/resctrl/ so that
  multiple architectures can share the fs API for manipulating their
  respective hw resource control implementation.

  This is the second step in the work towards sharing the resctrl
  filesystem interface, the next one being plugging ARM's MPAM into the
  aforementioned fs API"

* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  MAINTAINERS: Add reviewers for fs/resctrl
  x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
  x86/resctrl: Always initialise rid field in rdt_resources_all[]
  x86/resctrl: Relax some asm #includes
  x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
  x86/resctrl: Squelch whitespace anomalies in resctrl core code
  x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
  x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
  x86/resctrl: Move enum resctrl_event_id to resctrl.h
  x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
  fs/resctrl: Add boiler plate for external resctrl code
  x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
  x86/resctrl: Split trace.h
  x86/resctrl: Expand the width of domid by replacing mon_data_bits
  x86/resctrl: Add end-marker to the resctrl_event_id enum
  x86/resctrl: Move is_mba_sc() out of core.c
  x86/resctrl: Drop __init/__exit on assorted symbols
  x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
  x86/resctrl: Check all domains are offline in resctrl_exit()
  x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
  ...
2025-05-27 09:53:02 -07:00
Linus Torvalds
ba45037098 Merge tag 'linux_kselftest-kunit-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit updates from Shuah Khan:

 - Enable qemu_config for riscv32, sparc 64-bit, PowerPC 32-bit BE and
   64-bit LE

 - Enable CONFIG_SPARC32 to clearly differentiate between sparc 32-bit
   and 64-bit configurations

 - Enable CONFIG_CPU_BIG_ENDIAN to clearly differentiate between powerpc
   LE and BE configurations

 - Add feature to list available architectures to kunit tool

 - Fixes to bugs and changes to documentation

* tag 'linux_kselftest-kunit-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Fix wrong parameter to kunit_deactivate_static_stub()
  kunit: tool: add test counts to JSON output
  Documentation: kunit: improve example on testing static functions
  kunit: executor: Remove const from kunit_filter_suites() allocation type
  kunit: qemu_configs: Disable faulting tests on 32-bit SPARC
  kunit: qemu_configs: Add 64-bit SPARC configuration
  kunit: qemu_configs: sparc: Explicitly enable CONFIG_SPARC32=y
  kunit: qemu_configs: Add PowerPC 32-bit BE and 64-bit LE
  kunit: qemu_configs: powerpc: Explicitly enable CONFIG_CPU_BIG_ENDIAN=y
  kunit: tool: Implement listing of available architectures
  kunit: qemu_configs: Add riscv32 config
  kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests
2025-05-26 14:29:44 -07:00
Linus Torvalds
14418ddcc2 Merge tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Fix memcpy_sglist to handle partially overlapping SG lists
   - Use memcpy_sglist to replace null skcipher
   - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
   - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
   - Hide CRYPTO_MANAGER
   - Add delayed freeing of driver crypto_alg structures

  Compression:
   - Allocate large buffers on first use instead of initialisation in scomp
   - Drop destination linearisation buffer in scomp
   - Move scomp stream allocation into acomp
   - Add acomp scatter-gather walker
   - Remove request chaining
   - Add optional async request allocation

  Hashing:
   - Remove request chaining
   - Add optional async request allocation
   - Move partial block handling into API
   - Add ahash support to hmac
   - Fix shash documentation to disallow usage in hard IRQs

  Algorithms:
   - Remove unnecessary SIMD fallback code on x86 and arm/arm64
   - Drop avx10_256 xts(aes)/ctr(aes) on x86
   - Improve avx-512 optimisations for xts(aes)
   - Move chacha arch implementations into lib/crypto
   - Move poly1305 into lib/crypto and drop unused Crypto API algorithm
   - Disable powerpc/poly1305 as it has no SIMD fallback
   - Move sha256 arch implementations into lib/crypto
   - Convert deflate to acomp
   - Set block size correctly in cbcmac

  Drivers:
   - Do not use sg_dma_len before mapping in sun8i-ss
   - Fix warm-reboot failure by making shutdown do more work in qat
   - Add locking in zynqmp-sha
   - Remove cavium/zip
   - Add support for PCI device 0x17D8 to ccp
   - Add qat_6xxx support in qat
   - Add support for RK3576 in rockchip-rng
   - Add support for i.MX8QM in caam

  Others:
   - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
   - Add new SEV/SNP platform shutdown API in ccp"

* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
  x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
  crypto: qat - add missing header inclusion
  crypto: api - Redo lookup on EEXIST
  Revert "crypto: testmgr - Add hash export format testing"
  crypto: marvell/cesa - Do not chain submitted requests
  crypto: powerpc/poly1305 - add depends on BROKEN for now
  Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
  crypto: ccp - Add missing tee info reg for teev2
  crypto: ccp - Add missing bootloader info reg for pspv5
  crypto: sun8i-ce - move fallback ahash_request to the end of the struct
  crypto: octeontx2 - Use dynamic allocated memory region for lmtst
  crypto: octeontx2 - Initialize cptlfs device info once
  crypto: xts - Only add ecb if it is not already there
  crypto: lrw - Only add ecb if it is not already there
  crypto: testmgr - Add hash export format testing
  crypto: testmgr - Use ahash for generic tfm
  crypto: hmac - Add ahash support
  crypto: testmgr - Ignore EEXIST on shash allocation
  crypto: algapi - Add driver template support to crypto_inst_setname
  crypto: shash - Set reqsize in shash_alg
  ...
2025-05-26 13:47:28 -07:00
Linus Torvalds
15d90a5e55 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:
 "Cleanups for the kernel's CRC (cyclic redundancy check) code:

   - Use __ro_after_init where appropriate

   - Remove unnecessary static_key on s390

   - Rename some source code files

   - Rename the crc32 and crc32c crypto API modules

   - Use subsys_initcall instead of arch_initcall

   - Restore maintainers for crc_kunit.c

   - Fold crc16_byte() into crc16.c

   - Add some SPDX license identifiers"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crc32: add SPDX license identifier
  lib/crc16: unexport crc16_table and crc16_byte()
  w1: ds2406: use crc16() instead of crc16_byte() loop
  MAINTAINERS: add crc_kunit.c back to CRC LIBRARY
  lib/crc: make arch-optimized code use subsys_initcall
  crypto: crc32 - remove "generic" from file and module names
  x86/crc: drop "glue" from filenames
  sparc/crc: drop "glue" from filenames
  s390/crc: drop "glue" from filenames
  powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S
  powerpc/crc: drop "glue" from filenames
  arm64/crc: drop "glue" from filenames
  arm/crc: drop "glue" from filenames
  s390/crc32: Remove no-op module init and exit functions
  s390/crc32: Remove have_vxrs static key
  lib/crc: make the CPU feature static keys __ro_after_init
2025-05-26 13:32:06 -07:00
Suren Baghdasaryan
12ca42c237 alloc_tag: allocate percpu counters for module tags dynamically
When a module gets unloaded, we check whether any of its tags are still in
use and, if so, keep the memory containing the module's allocation tags
alive until all tags are unused.  However, the percpu counters referenced
by the tags are freed by free_module().  This will lead to a UAF if memory
allocated by the module is accessed after the module was unloaded.

To fix this, allocate percpu counters for module allocation tags
dynamically and keep them alive for tags which are still in use after
module unloading.  This also removes the requirement for a larger
PERCPU_MODULE_RESERVE when memory allocation profiling is enabled, because
percpu memory for the counters no longer needs to be reserved.
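
A minimal sketch of the dynamic allocation, assuming the alloc_tag_counters
type from the allocation-tag code (the helper name here is hypothetical):

    /*
     * Sketch: take the tag's counters from the generic percpu
     * allocator rather than the module percpu reserve, so they can
     * outlive free_module() while the tag is still referenced.
     */
    static int alloc_tag_counters_alloc(struct alloc_tag *tag)
    {
            tag->counters = alloc_percpu(struct alloc_tag_counters);
            if (!tag->counters)
                    return -ENOMEM;
            return 0;
    }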

Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com
Fixes: 0db6f8d782 ("alloc_tag: load module tags into separate contiguous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Tested-by: David Wang <00107082@163.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25 00:53:48 -07:00
Casey Chen
780138b123 alloc_tag: check mem_profiling_support in alloc_tag_init
If mem_profiling_support is false, for example via
sysctl.vm.mem_profiling=never, alloc_tag_init() should skip module tag
allocation, codetag type registration and procfs init.

Link: https://lkml.kernel.org/r/20250513182602.121843-1-cachen@purestorage.com
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22 14:55:38 -07:00
Eric Biggers
b82f72292a lib/crc32: remove unused support for CRC32C combination
crc32c_combine() and crc32c_shift() are no longer used (except by the
KUnit test that tests them), and their current implementation is very
slow.  Remove them.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20250519175012.36581-8-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21 15:40:17 -07:00
Tzung-Bi Shih
772e50a76e kunit: Fix wrong parameter to kunit_deactivate_static_stub()
kunit_deactivate_static_stub() accepts real_fn_addr instead of
replacement_addr.  In this case, it always passed NULL to
kunit_deactivate_static_stub().

Fix it.

Link: https://lore.kernel.org/r/20250520082050.2254875-1-tzungbi@kernel.org
Fixes: e047c5eaa7 ("kunit: Expose 'static stub' API to redirect functions")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-05-21 09:51:23 -06:00
Luca Ceresoli
592ebd77e6 vsprintf: remove redundant and unused %pCn format specifier
%pC and %pCn print the same string, and commit 900cca2944 ("lib/vsprintf:
add %pC{,n,r} format specifiers for clocks"), which introduced them, does
not clarify any intended difference. It can be assumed %pC is a default for
%pCn, as with some other specifiers, but not all are consistent with this
policy. Moreover, there is now no suffix other than 'n', which makes a
default not really useful.

All users in the kernel were using %pC except for one, which has been
converted. So now remove %pCn and all the unnecessary extra code and
documentation.
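
For reference, usage of the remaining specifier stays as before (a minimal
sketch; 'clk' is a struct clk pointer):

    /* %pC prints the clock's name. */
    pr_info("enabling clock %pC\n", clk);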

Acked-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://lore.kernel.org/r/20250311-vsprintf-pcn-v2-2-0af40fc7dee4@bootlin.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-05-16 12:50:00 +02:00
Yury Norov [NVIDIA]
13f0a02bf4 find: Add find_first_andnot_bit()
The function helps to implement cpumask_andnot() APIs.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Link: https://lore.kernel.org/20250515165855.31452-3-james.morse@arm.com
2025-05-15 20:24:40 +02:00
Lee Trager
e505e14073 pldmfw: Don't require send_package_data or send_component_table to be defined
Not all drivers require send_package_data or send_component_table when
updating firmware. Instead of forcing drivers to implement a stub, allow
these functions to go undefined.
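
A sketch of what the call sites look like with optional ops (structure and
argument names simplified for illustration, not copied from the driver):

    /* Invoke the op only if the driver chose to implement it. */
    if (ops->send_package_data) {
            err = ops->send_package_data(context, pkg_data, length);
            if (err)
                    return err;
    }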

Signed-off-by: Lee Trager <lee@trager.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250512190109.2475614-2-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15 12:59:18 +02:00
Eric Biggers
289c99bec7 lib/crc32: add SPDX license identifier
lib/crc32.c and include/linux/crc32.h got missed by the bulk SPDX
conversion because of the nonstandard explanation of the license.
However, crc32.c clearly states that it's licensed under the GNU General
Public License, Version 2.  And the comment in crc32.h clearly indicates
that it's meant to have the same license as crc32.c.  Therefore, apply
SPDX-License-Identifier: GPL-2.0-only to both files.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250514052409.194822-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-05-14 09:15:38 -07:00
Eric Biggers
3937f6db6e lib/crc16: unexport crc16_table and crc16_byte()
Now that neither crc16_table nor crc16_byte() is used outside
lib/crc16.c, fold them into lib/crc16.c.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250513022115.39109-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-05-13 20:37:16 -07:00
Christoph Hellwig
f1c2bca267 xarray: fix kerneldoc for __xa_cmpxchg
Fix the documentation for __xa_cmpxchg to actually describe the
cmpxchg-like semantics correctly, based on the version for xa_cmpxchg.
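
Those semantics, as a brief usage sketch: the exchange happens only when
the current entry matches the old one, and the previous entry is returned
either way.

    /* Caller holds the xa_lock, as required for __xa_cmpxchg(). */
    curr = __xa_cmpxchg(&xa, index, old_entry, new_entry, GFP_KERNEL);
    swapped = (curr == old_entry);  /* exchange happened only on match */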

Link: https://lkml.kernel.org/r/20250507051656.3900864-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:49 -07:00
Eric Biggers
698de82278 crypto: testmgr - make it easier to enable the full set of tests
Currently the full set of crypto self-tests requires
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.  This is problematic in two ways.
First, developers regularly overlook this option.  Second, the
description of the tests as "extra" sometimes gives the impression that
it is not required that all algorithms pass these tests.

Given that the main use case for the crypto self-tests is for
developers, make CONFIG_CRYPTO_SELFTESTS=y enable the full set of crypto
self-tests by default.

The slow tests can still be disabled by adding the command-line
parameter cryptomgr.noextratests=1, soon to be renamed to
cryptomgr.noslowtests=1.  The only known use case for doing this is for
people trying to use the crypto self-tests to satisfy the FIPS 140-3
pre-operational self-testing requirements when the kernel is being
validated as a FIPS 140-3 cryptographic module.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:34:03 +08:00
Eric Biggers
40b9969796 crypto: testmgr - replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS
The negative-sense of CRYPTO_MANAGER_DISABLE_TESTS is a longstanding
mistake that regularly causes confusion.  Especially bad is that you can
have CRYPTO=n && CRYPTO_MANAGER_DISABLE_TESTS=n, which is ambiguous.

Replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS which has the
expected behavior.

The tests continue to be disabled by default.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:33:14 +08:00
Eric Biggers
bdc2a55687 crypto: lib/chacha - add array bounds to function prototypes
Add explicit array bounds to the function prototypes for the parameters
that didn't already get handled by the conversion to use chacha_state:

- chacha_block_*():
  Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE].

- hchacha_block_*():
  Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS].

- chacha_init():
  Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'.
  Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'.

No functional changes.  This just makes it clear when fixed-size arrays
are expected.
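
For reference, the prototype shapes after this conversion, assembled from
the list above (a sketch; the exact in-tree declarations may differ in
detail):

    void chacha_block_generic(struct chacha_state *state,
                              u8 out[CHACHA_BLOCK_SIZE], int nrounds);
    void hchacha_block_generic(const struct chacha_state *state,
                               u32 out[HCHACHA_OUT_WORDS], int nrounds);
    void chacha_init(struct chacha_state *state,
                     const u32 key[CHACHA_KEY_WORDS],
                     const u8 iv[CHACHA_IV_SIZE]);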

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Eric Biggers
607c92141c crypto: lib/chacha - add strongly-typed state zeroization
Now that the ChaCha state matrix is strongly-typed, add a helper
function chacha_zeroize_state() which zeroizes it.  Then convert all
applicable callers to use it instead of direct memzero_explicit.  No
functional changes.
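
A minimal sketch of such a helper, assuming the struct chacha_state type
introduced by this series:

    static inline void chacha_zeroize_state(struct chacha_state *state)
    {
            /* Wipe key material; the compiler cannot elide this. */
            memzero_explicit(state, sizeof(*state));
    }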

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Eric Biggers
32c9541189 crypto: lib/chacha - use struct assignment to copy state
Use struct assignment instead of memcpy() in lib/crypto/chacha.c where
appropriate.  No functional change.
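
Illustrative before/after for two struct chacha_state objects:

    /* Before: byte-wise copy of the state matrix. */
    memcpy(&dst_state, &src_state, sizeof(dst_state));

    /* After: struct assignment; same effect, with type checking. */
    dst_state = src_state;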

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Eric Biggers
98066f2f89 crypto: lib/chacha - strongly type the ChaCha state
The ChaCha state matrix is 16 32-bit words.  Currently it is represented
in the code as a raw u32 array, or even just a pointer to u32.  This
weak typing is error-prone.  Instead, introduce struct chacha_state:

    struct chacha_state {
            u32 x[16];
    };

Convert all ChaCha and HChaCha functions to use struct chacha_state.
No functional changes.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12 13:32:53 +08:00
Dr. David Alan Gilbert
479d26ee01 lib/oid_registry.c: remove unused sprint_OID
sprint_OID() was added as part of 2012's commit 4f73175d03 ("X.509: Add
utility functions to render OIDs as strings") but it hasn't been used. 
Remove it.

Note that there's also 'sprint_oid' (lower case) which is used in a lot of
places; that's left as is except for fixing its case in a comment.

Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:13 -07:00
Herton R. Krzesinski
92f3c5a005 lib/test_kmod: do not hardcode/depend on any filesystem
Right now test_kmod has hardcoded dependencies on btrfs/xfs.  That is not
optimal since you end up needing to select/build them, but it is not
really required since other filesystems could be selected for the testing.
Also, we can't change the default driver module used for testing at
initialization.

Thus make it more generic: introduce two module parameters (start_driver
and start_test_fs), which allow selecting which module/fs to use for the
testing on test_kmod initialization.  Then it's up to the user to select
which modules/fs to use for testing based on their config.  However, keep
test_module as the required default.

This way, config/modules become selectable, as when the testing is done
from selftests (userspace).

While at it, also change trigger_config_run_type: since at module
initialization we already set the defaults in __kmod_config_init, there is
no need to do it again in test_kmod_init(), so we can avoid setting
test_driver/test_fs again.

Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:09 -07:00
Caleb Sander Mateos
8d1d4b538b scatterlist: inline sg_next()
sg_next() is a short function called frequently in I/O paths. Define it
in the header file so it can be inlined into its callers.
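
The body is only a few branches, which is what makes inlining attractive;
a paraphrased sketch (helper names as in linux/scatterlist.h):

    static inline struct scatterlist *sg_next(struct scatterlist *sg)
    {
            if (sg_is_last(sg))
                    return NULL;            /* end of the list */

            sg++;                           /* next entry in this array */
            if (unlikely(sg_is_chain(sg)))
                    sg = sg_chain_ptr(sg);  /* follow chain link */

            return sg;
    }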

Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:08 -07:00
Zijun Hu
3eff6a3e57 errseq: eliminate special limitation for macro MAX_ERRNO
The current errseq implementation depends on a very special precondition:
the macro MAX_ERRNO must be (2^n - 1).

Eliminate the limitation by

- redefining macro ERRSEQ_SHIFT
- defining a new macro ERRNO_MASK instead of MAX_ERRNO for errno mask.

There is no plan to change the value of MAX_ERRNO, but this makes the
implementation more generic and eliminates the BUILD_BUG_ON().
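
One plausible shape for the new definitions, derived from the description
above (a sketch, not the exact patch):

    /* Shift past however many bits an errno value can occupy... */
    #define ERRSEQ_SHIFT    (ilog2(MAX_ERRNO) + 1)
    /* ...and mask for the errno portion; valid for any MAX_ERRNO. */
    #define ERRNO_MASK      (BIT(ERRSEQ_SHIFT) - 1)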

Link: https://lkml.kernel.org/r/20250407-improve_errseq-v1-1-7b27cbeb8298@quicinc.com
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:06 -07:00
Mario Limonciello
ae5b350085 kstrtox: add support for enabled and disabled in kstrtobool()
In some places in the kernel there is a design pattern for sysfs
attributes to use kstrtobool() in store() and str_enabled_disabled() in
show().

This is counterintuitive to interact with because kstrtobool() takes
on/off but str_enabled_disabled() shows enabled/disabled.  Some of those
sysfs uses could switch to str_on_off() but for some attributes
enabled/disabled really makes more sense.

Add support for kstrtobool() to accept enabled/disabled.
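
Expected behavior after the change, as a usage sketch (each call returns 0
on success):

    bool val;

    kstrtobool("on", &val);        /* val == true, as before    */
    kstrtobool("enabled", &val);   /* val == true, now accepted  */
    kstrtobool("disabled", &val);  /* val == false, now accepted */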

Link: https://lkml.kernel.org/r/20250321022538.1532445-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:06 -07:00
Chisheng Chen
3dfd79cc87 lib/rbtree.c: fix the example typo
Replace `sr` with `Sr`.  The condition `!tmp1 || rb_is_black(tmp1)`
ensures that `tmp1` (which is `sibling->rb_right`) is either NULL or a
black node.  Therefore, the right child of the sibling must be black, and
the example should use `Sr` instead of `sr`.

Link: https://lkml.kernel.org/r/20250403112614.570140-1-johnny1001s000602@gmail.com
Signed-off-by: Chisheng Chen <johnny1001s000602@gmail.com>
Cc: Hsin Chang Yu <zxcvb600870024@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:04 -07:00
Uladzislau Rezki (Sony)
2d76e79315 lib/test_vmalloc.c: allow built-in execution
Remove the dependency on module loading for the vmalloc test suite,
enabling it to be built directly into the kernel, so that both "=m" and
"=y" are supported.

Motivation:
- Faster debugging/testing of vmalloc code;
- It allows configuring the test via kernel boot parameters.

Configuration example:
  test_vmalloc.nr_threads=64
  test_vmalloc.run_test_mask=7
  test_vmalloc.sequential_test_order=1

Link: https://lkml.kernel.org/r/20250417161216.88318-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Christop Hellwig <hch@infradead.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:34 -07:00
Uladzislau Rezki (Sony)
7a73348e5d lib/test_vmalloc.c: replace RWSEM to SRCU for setup
The test has an initialization step during which threads are created.  To
prevent the workers from starting prematurely, a write lock was previously
held by the main setup thread, while each worker would block on a read
lock.

Replace this RWSEM-based synchronization with a simpler SRCU-based
approach, which does two basic steps:

- The main thread wraps the setup phase in an SRCU read-side critical
  section, i.e. a srcu_read_lock()/srcu_read_unlock() pair.
- Each worker calls synchronize_srcu() on entry, ensuring it waits for
  the initialization phase to be completed.

This eliminates the need for the down_read()/up_read() and
down_write()/up_write() pairs, thus simplifying the logic and improving
clarity.
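
A simplified sketch of the gate described by the two steps above:

    DEFINE_STATIC_SRCU(prepare_srcu);
    int idx;

    /* Main setup thread: */
    idx = srcu_read_lock(&prepare_srcu);
    /* ... create worker threads, fill in test parameters ... */
    srcu_read_unlock(&prepare_srcu, idx);

    /* Each worker, on entry: */
    synchronize_srcu(&prepare_srcu);  /* blocks until setup is done */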

[urezki@gmail.com: fix compile error with CONFIG_TINY_RCU]
  Link: https://lkml.kernel.org/r/20250420142029.103169-1-urezki@gmail.com
Link: https://lkml.kernel.org/r/20250417161216.88318-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christop Hellwig <hch@infradead.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:33 -07:00
Sidhartha Kumar
2a6ed1b411 maple_tree: reorder mas->store_type case statements
Move the unlikely case that mas->store_type is invalid to be the last
evaluated case and put likelier cases higher up.

Link: https://lkml.kernel.org/r/20250410191446.2474640-7-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Suggested-by: Liam R. Howlett <liam.howlett@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:29 -07:00
Sidhartha Kumar
271152a973 maple_tree: add sufficient height
In order to support rebalancing and spanning stores using less than the
worst case number of nodes, we need to track more than just the vacant
height.  Using only vacant height to reduce the worst case maple node
allocation count can lead to a shortcoming of nodes in the following
scenarios.

For rebalancing writes, when a leaf node becomes insufficient, it may be
combined with a sibling into a single node.  This means that the parent
node which has entries for these children will lose one entry.  If this
parent node was just meeting the minimum entries, losing one entry will
now cause this parent node to be insufficient.  This leads to a cascading
rebalancing operation at different levels and can require more node
allocations than an estimate based solely on vacant height accounts for.

For spanning writes, a similar situation occurs.  At the location at which
a spanning write is detected, the number of ancestor nodes may similarly
need to be rebalanced into a smaller number of nodes, and the same
cascading situation can occur.

To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient.  This means even if a rebalance occurs to a
child of this node, it currently has enough entries that it can lose one
without any further action.  This field is stored in the maple write state
as sufficient height.  In mas_prealloc_calc() when figuring out how many
nodes to allocate, we check if the vacant node is lower in the tree than a
sufficient node (has a larger value).  If it is, we cannot use the vacant
height and must use the difference in the height and sufficient height as
the basis for the number of nodes needed.
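
As a sketch of the resulting choice in mas_prealloc_calc() (field and
variable names here are illustrative, not necessarily those in the patch):

    /* A vacant node below the lowest sufficient node (larger height
     * value) cannot be trusted; fall back to the sufficient height. */
    if (wr_mas->sufficient_height &&
        wr_mas->vacant_height > wr_mas->sufficient_height)
            delta = height - wr_mas->sufficient_height;
    else
            delta = height - wr_mas->vacant_height;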

An off-by-one bug was also discovered in mast_overflow(), where it was
using >= rather than >.  This caused extra iterations of the
mas_spanning_rebalance() loop and led to unneeded allocations.  A test is
also added to check that the number of allocations is correct.

Link: https://lkml.kernel.org/r/20250410191446.2474640-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:29 -07:00
Sidhartha Kumar
300a5b4ffe maple_tree: break on convergence in mas_spanning_rebalance()
This allows the vacant height to be used to calculate the worst case
number of nodes needed for a wr_rebalance operation.
mas_spanning_rebalance() was seen to perform unnecessary node allocations;
we can reduce these by breaking out of the rebalancing loop early once we
realize that we have ascended to a common ancestor.

Link: https://lkml.kernel.org/r/20250410191446.2474640-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:28 -07:00
Sidhartha Kumar
ad88fc17d2 maple_tree: use vacant nodes to reduce worst case allocations
In order to determine the store type for a maple tree operation, a walk of
the tree is done through mas_wr_walk().  This function descends the tree
until a spanning write is detected or we reach a leaf node.  While
descending, keep track of the height at which we encounter a node with
available space.  This is done by checking if mas->end is less than the
number of slots a given node type can fit.

Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new nodes.
Update mas_prealloc_calc() to consider the vacant height and reduce the
number of worst-case allocations.

Rebalancing and spanning stores are not supported and fall back to using
the full height of the tree for allocations.

Update preallocation testing assertions to take into account vacant
height.

Link: https://lkml.kernel.org/r/20250410191446.2474640-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:28 -07:00
Sidhartha Kumar
f9d3a963fe maple_tree: use height and depth consistently
For the maple tree, the root node is defined to have a depth of 0 and a
height of 1.  Each level down from the root, these values are incremented
by 1.  Various code paths define a root with depth 1, which is
inconsistent with the definition.  Modify the code to be consistent with
this definition.

In mas_spanning_rebalance(), l_mas.depth was being used to track the
height based on the number of iterations done in the main loop.  This
information was then used in mas_put_in_tree() to set the height.  Rather
than overload the l_mas.depth field to track height, simply keep track of
the height in the local variable new_height and directly pass this to
mas_wmb_replace(), which will pass it into mas_put_in_tree().  This
allows us to remove writes to l_mas.depth.

Link: https://lkml.kernel.org/r/20250410191446.2474640-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:28 -07:00
Sidhartha Kumar
28092a652f maple_tree: convert mas_prealloc_calc() to take in a maple write state
Patch series "Track node vacancy to reduce worst case allocation counts", v5.

================ overview ========================
Currently, the maple tree preallocates the worst case number of nodes for
a given store type by taking into account the whole height of the tree.
This comes from a worst case scenario of every node in the tree being full
and having to propagate node allocation upwards until we reach the root of
the tree.  This can be optimized if there are vacancies in nodes that are
at a lower depth than the root node.  This series implements tracking the
level at which there is a vacant node so we only need to allocate until
this level is reached, rather than always using the full height of the
tree.  The ma_wr_state struct is modified to add a field which keeps track
of the vacant height and is updated during walks of the tree.  This value
is then read in mas_prealloc_calc() when we decide how many nodes to
allocate.

For rebalancing and spanning stores, we also need to track the lowest
height at which a node has 1 more entry than the minimum sufficient number
of entries.  This is because rebalancing can cause a parent node to become
insufficient, which results in further node allocations.  In this case, we
need to use the sufficient height as the worst case rather than the vacant
height.

patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
patch 6: reorder switch case statements

================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running stress-ng mmap for 120s.  The histograms below
represent requests to kmem_cache_alloc_bulk() and show the count argument,
i.e. how many maple nodes the caller is requesting from
kmem_cache_alloc_bulk().

command: stress-ng --mmap 4 --timeout 120

mm-unstable 

@bulk_alloc_req:
[3, 4)                 4 |                                                    |
[4, 5)             54170 |@                                                   |
[5, 6)                 0 |                                                    |
[6, 7)            893057 |@@@@@@@@@@@@@@@@@@@@                                |
[7, 8)                 4 |                                                    |
[8, 9)           2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)            55811 |@                                                   |
[10, 11)           77834 |@                                                   |
[11, 12)               0 |                                                    |
[12, 13)         1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)          367197 |@@@@@@@@                                            |


@maple_node_total: 46,630,160
@total_vmas: 46184591

mm-unstable + this series

@bulk_alloc_req:
[2, 3)               198 |                                                    |
[3, 4)                 4 |                                                    |
[4, 5)                43 |                                                    |
[5, 6)                 0 |                                                    |
[6, 7)           1069503 |@@@@@@@@@@@@@@@@@@@@@                               |
[7, 8)                 4 |                                                    |
[8, 9)           2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)           472191 |@@@@@@@@@                                           |
[10, 11)          191904 |@@@                                                 |
[11, 12)               0 |                                                    |
[12, 13)          247316 |@@@@                                                |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)           98769 |@                                                   |


@maple_node_total: 37,813,856
@total_vmas: 43493287

This represents a ~19% reduction in the number of bulk maple nodes allocated.

For more reproducible results, a histogram of the return value of
mas_prealloc_calc() is displayed while running the maple_tree_tests, which
have a deterministic store pattern.

mas_prealloc_calc() return value mm-unstable
1   :                                                    (12068)
3   :                                                    (11836)
5   : *****                                              (271192)
7   : ************************************************** (2329329)
9   : ***********                                        (534186)
10  :                                                    (435)
11  : ***************                                    (704306)
13  : ********                                           (409781)

mas_prealloc_calc() return value mm-unstable + this series
1   :                                                    (12070)
3   : ************************************************** (3548777)
5   : ********                                           (633458)
7   :                                                    (65081)
9   :                                                    (11224)
10  :                                                    (341)
11  :                                                    (2973)
13  :                                                    (68)

do_mmap latency was also measured for regressions:
command: stress-ng --mmap 4 --timeout 120

mm-unstable:
avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034

mm-unstable + this series:
avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726


stress-ng --mmap 4 --timeout 120

with vacant_height:
stress-ng: info:  [257]                   21526312 Maple Tree Read                0.176 M/sec
stress-ng: info:  [257]                  339979348 Maple Tree Write               2.774 M/sec

without vacant_height:
stress-ng: info:  [8228]                   20968900 Maple Tree Read                0.171 M/sec
stress-ng: info:  [8228]                  312214648 Maple Tree Write               2.547 M/sec

This represents a ~3% increase in read throughput and a ~9% increase in
write throughput.


This patch (of 6):

In a subsequent patch, mas_prealloc_calc() will need to access fields only
in the ma_wr_state.  Convert the function to take in a ma_wr_state and
modify all callers.  There is no functional change.

Link: https://lkml.kernel.org/r/20250410191446.2474640-1-sidhartha.kumar@oracle.com
Link: https://lkml.kernel.org/r/20250410191446.2474640-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:28 -07:00
Przemek Kitszel
4c97a17a25 xarray: make xa_alloc_cyclic() return 0 on all success cases
Change xa_alloc_cyclic() to return 0 even on wrap-around.  Do the same for
xa_alloc_cyclic_irq() and xa_alloc_cyclic_bh().

This will prevent any future bug that treats a return of 1 as an error:
	int ret = xa_alloc_cyclic(...)
	if (ret) // currently mishandles ret==1
		goto failure;

If anyone is ever interested in when wrap-around occurs, there is still
__xa_alloc_cyclic(), which behaves as before.  For now there is no such
user.
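
With this change, the common pattern becomes safe; a usage sketch (xarray
and limit chosen for illustration):

	/* Only genuine errors (< 0) reach the failure path now; a
	 * wrap-around no longer surfaces as ret == 1. */
	ret = xa_alloc_cyclic(&xa, &id, entry, xa_limit_32b, &next,
			      GFP_KERNEL);
	if (ret)
		goto failure;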

Link: https://lkml.kernel.org/r/20250320102219.8101-1-przemyslaw.kitszel@intel.com
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/netdev/Z9gUd-5t8b5NX2wE@casper.infradead.org
Cc: Andriy Shevchenko <andriy.shevchenko@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:19 -07:00
Matthew Wilcox (Oracle)
70d1be00b4 iov_iter: convert iov_iter_extract_xarray_pages() to use folios
ITER_XARRAY is exclusively used with xarrays that contain folios, not
pages, so extract folio pointers from it, not page pointers.  Removes a
use of find_subpage().

Link: https://lkml.kernel.org/r/20250402210612.2444135-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:05 -07:00
Matthew Wilcox (Oracle)
b57f4f4f18 iov_iter: convert iter_xarray_populate_pages() to use folios
ITER_XARRAY is exclusively used with xarrays that contain folios, not
pages, so extract folio pointers from it, not page pointers.  Removes a
hidden call to compound_head() and a use of find_subpage().

Link: https://lkml.kernel.org/r/20250402210612.2444135-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:05 -07:00
Paul E. McKenney
ba575cea29 ratelimit: Drop redundant accesses to burst
Now that there is the "burst <= 0" fastpath, for all later code, burst
must be strictly greater than zero.  Therefore, drop the redundant checks
of this local variable.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
4b2cce999c ratelimit: Use nolock_ret restructuring to collapse common case code
Now that unlock_ret releases the lock, then falls into nolock_ret, which
handles ->missed based on the value of ret, the common-case lock-held
code can be collapsed into a single "if" statement with a single-statement
"then" clause.

Yes, we could go further and just assign the "if" condition to ret,
but in the immortal words of MSDOS, "Are you sure?".

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
743a1942d5 ratelimit: Use nolock_ret label to collapse lock-failure code
Now that we have a nolock_ret label that handles ->missed correctly
based on the value of ret, we can eliminate a local variable and collapse
several "if" statements on the lock-acquisition-failure code path.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
a69114c2a1 ratelimit: Use nolock_ret label to save a couple of lines of code
Create a nolock_ret label in order to start consolidating the unlocked
return paths that conditionally invoke ratelimit_state_inc_miss().

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
f2d0ea0f08 ratelimit: Simplify common-case exit path
By making "ret" always be initialized, and moving the final call to
ratelimit_state_inc_miss() out from under the lock, we save a goto and
a couple lines of code.  This also saves a couple of lines of code from
the unconditional enable/disable slowpath.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Petr Mladek
a940d145cc ratelimit: Warn if ->interval or ->burst are negative
Currently, ___ratelimit() treats a negative ->interval or ->burst as
if it was zero, but this is an accident of the current implementation.
Therefore, splat in this case, which might have the benefit of detecting
use of uninitialized ratelimit_state structures on the one hand or easing
addition of new features on the other.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
96d366048f ratelimit: Avoid atomic decrement under lock if already rate-limited
Currently, if the lock is acquired, the code unconditionally does
an atomic decrement on ->rs_n_left, even if that atomic operation is
guaranteed to return a limit-rate verdict.  A limit-rate verdict will
in fact be the common case when something is spewing into a rate limit.
This unconditional atomic operation incurs needless overhead and also
raises the spectre of counter wrap.

Therefore, do the atomic decrement only if there is some chance that
rates won't be limited.
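
A sketch of the guarded decrement, assuming ->rs_n_left is an atomic_t
holding the remaining budget (paraphrased from the description):

    /* Already exhausted: the verdict is "limit" either way, so skip
     * the atomic op and any chance of counter wrap. */
    if (atomic_read(&rs->rs_n_left) <= 0)
            return 0;                       /* rate-limited */
    if (atomic_dec_return(&rs->rs_n_left) >= 0)
            ret = 1;                        /* allowed through */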

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
123a1d97b2 ratelimit: Avoid atomic decrement if already rate-limited
Currently, if the lock could not be acquired, the code unconditionally
does an atomic decrement on ->rs_n_left, even if that atomic operation
is guaranteed to return a limit-rate verdict.  This incurs needless
overhead and also raises the spectre of counter wrap.

Therefore, do the atomic decrement only if there is some chance that
rates won't be limited.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
21ac6e5eda ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE
Restore the previous semantics where the misses counter is unchanged if
the RATELIMIT_MSG_ON_RELEASE flag is set.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
aa2cc356f8 ratelimit: Force re-initialization when rate-limiting re-enabled
Currently, if rate limiting is disabled, ___ratelimit() does an immediate
early return with no state changes.  This can result in false-positive
drops when re-enabling rate limiting.  Therefore, mark the ratelimit_state
structure "uninitialized" when rate limiting is disabled.

[ paulmck: Apply Petr Mladek feedback. ]

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
084a990ded ratelimit: Allow zero ->burst to disable ratelimiting
If ->interval is zero, then rate-limiting will be disabled.
Alternatively, if interval is greater than zero and ->burst is zero,
then rate-limiting will be applied unconditionally.  The point of this
distinction is to handle current users that pass zero-initialized
ratelimit_state structures to ___ratelimit(), and in such cases the
->lock field will be uninitialized.  Acquiring ->lock in this case is
clearly not a strategy to win.

Therefore, make this classification be lockless.
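
A paraphrased sketch of that lockless classification at the top of
___ratelimit() (not the exact patch):

    int interval = READ_ONCE(rs->interval);
    int burst = READ_ONCE(rs->burst);

    if (interval <= 0)
            return 1;       /* rate limiting disabled: always proceed */
    if (burst <= 0) {
            ratelimit_state_inc_miss(rs);
            return 0;       /* no burst budget: always rate-limited */
    }
    /* Only past this point is rs->lock known to be initialized. */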

Note that although negative ->interval and ->burst happen to be treated
as if they were zero, this is an accident of the current implementation,
and the semantics of negative values for these fields is subject to change
without notice, especially given that Bert Karwatzki determined that no
current calls to ___ratelimit() ever have negative values for these
fields.

This commit replaces earlier buggy versions.

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Reported-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: "Aithal, Srikanth" <sraithal@amd.com>
Closes: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/257c3b91-e30f-48be-9788-d27a4445a416@sirena.org.uk/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: "Aithal, Srikanth" <sraithal@amd.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Petr Mladek
cf8cfa8a99 ratelimit: Reduce ___ratelimit() false-positive rate limiting
Retain the locked design, but check rate-limiting even when the lock
could not be acquired.

Link: https://lore.kernel.org/all/Z_VRo63o2UsVoxLG@pathway.suse.cz/
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:27 -07:00
Paul E. McKenney
e64a348dc1 ratelimit: Avoid jiffies=0 special case
The ___ratelimit() function special-cases the jiffies-counter value of zero
as "uninitialized".  This works well on 64-bit systems, where the jiffies
counter is not going to return to zero for more than half a billion years
on systems with HZ=1000, but similar 32-bit systems take less than 50 days
to wrap the jiffies counter.  And although the consequences of wrapping the
jiffies counter seem to be limited to minor confusion on the duration of
the rate-limiting interval that happens to end at time zero, it is almost
no work to avoid this confusion.

Therefore, introduce a RATELIMIT_INITIALIZED bit to the ratelimit_state
structure's ->flags field so that a ->begin value of zero is no longer
special.
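
A sketch of the intended logic (the flag name comes from the description;
the surrounding code is paraphrased):

    /* An explicit flag replaces the "->begin == 0" convention, so a
     * jiffies value of zero is no longer special. */
    if (!(rs->flags & RATELIMIT_INITIALIZED)) {
            rs->begin = jiffies;
            rs->flags |= RATELIMIT_INITIALIZED;
    }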

Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2025-05-08 16:13:26 -07:00