Commit graph

1379 commits

Author SHA1 Message Date
Linus Torvalds
35a813e010 printk changes for 6.17
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmiQpykACgkQUqAMR0iA
 lPJcrg/9Hez6+zO7LECCn5VkuK5oJWR5CyCfwx14ki8UF38djQGU2frckI5837rE
 MnVoEBexZunK5SXy4MAy7bTCitzR+lMqNtP5uq9J2ovlSPtNlfuJRDr7uGQLDtSS
 M5KZ1qsZnhgwLYeNhfVVToHgp+OwIQb2GcgYmYc8k03fUI1NQpdxIM46DzoTj+06
 x6qgrNsmmJbm8E73VWBByJAEFoq9ugjny8Rt+tYMi/CmhgZpp0ZyF1r5dYfYX/KS
 VS8UQY//aZOFhNsQUAXwP7Ym00CYRgTg7Na+MHivYLXmYGH2gF6tWQhX/eEgHKcJ
 RTmUbLFx70fdBbjJMxv2k8vyMk2sy6sTfJHPqM/NS/Fb0tSPBXQJG/EexzfoqiBc
 wcjgOPkeALIosVdFdTqXxjoIGOP8rqsU4t6Y6WFjJlWK04SBVjxBUofytRdQSxkG
 5Sb0rFVGKrKIkXaVkt4byPa1/BDpfNhfKMYPtQ56pv2VNUgzfye4prUAZHE5pLnK
 8nixeeMtKDFFCBpn6rG5wZW7k2mK5FrWGZUfdfxdK1gWQ1y0kqGy5wa3lNZLcxlH
 l3AtOYoDeWM2DjDVO6WCj8ambEWkbjbGg7tC9TI3F0NvRJSYytTb6npMqb3Gwhcb
 U4NgT+Ho0GJ/5BLUye8HMfhvrGoCfRCeptHtEFXAK7pzKyjc0+c=
 =Mocd
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Add new "hash_pointers=[auto|always|never]" boot parameter to force
   the hashing even with "slab_debug" enabled

 - Allow to stop CPU, after losing nbcon console ownership during
   panic(), even without proper NMI

 - Allow to use the printk kthread immediately even for the 1st
   registered nbcon

 - Compiler warning removal

* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: nbcon: Allow reacquire during panic
  printk: Allow to use the printk kthread immediately even for 1st nbcon
  slab: Decouple slab_debug and no_hash_pointers
  vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
  compiler-gcc.h: Introduce __diag_GCC_all
2025-08-04 10:54:36 -07:00
Linus Torvalds
e991acf1bc Significant patch series in this pull request:
- The 2 patch series "squashfs: Remove page->mapping references" from
   Matthew Wilcox gets us closer to being able to remove page->mapping.
 
 - The 5 patch series "relayfs: misc changes" from Jason Xing does some
   maintenance and minor feature addition work in relayfs.
 
 - The 5 patch series "kdump: crashkernel reservation from CMA" from Jiri
   Bohac switches us from static preallocation of the kdump crashkernel's
   working memory over to dynamic allocation.  So the difficulty of
   a-priori estimation of the second kernel's needs is removed and the
   first kernel obtains extra memory.
 
 - The 5 patch series "generalize panic_print's dump function to be used
   by other kernel parts" from Feng Tang implements some consolidation and
   rationalizatio of the various ways in which a faiing kernel splats
   information at the operator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+82gAKCRDdBJ7gKXxA
 jj4JAP9xb+w9DrBY6sa+7KTPIb+aTqQ7Zw3o9O2m+riKQJv6jAEA6aEwRnDA0451
 fDT5IqVlCWGvnVikdZHSnvhdD7TGsQ0=
 =rT71
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Matt Fleming
ed4f142f72 stackdepot: make max number of pools boot-time configurable
We're hitting the WARN in depot_init_pool() about reaching the stack depot
limit because we have long stacks that don't dedup very well.

Introduce a new start-up parameter to allow users to set the number of
maximum stack depot pools.

Link: https://lkml.kernel.org/r/20250718153928.94229-1-matt@readmodwrite.com
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:38 -07:00
Linus Torvalds
6aee5aed2e cgroup: Changes for v6.17
- Allow css_rstat_updated() in NMI context to enable memory accounting for
   allocations in NMI context.
 
 - /proc/cgroups doesn't contain useful information for cgroup2 and was
   updated to only show v1 controllers. This unfortunately broke something in
   the wild. Add an option to bring back the old behavior to ease transition.
 
 - selftest updates and other cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaIqlxQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGcTMAQDUlGf50ATWB9hDU7zUG4lVn8s8n8/+x8QFGHn4
 e4NERQD9FpU/jLN+cwGgspKo+L9qpu/1g+t36cJLcOuEKKoaQwI=
 =FLwx
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Allow css_rstat_updated() in NMI context to enable memory accounting
   for allocations in NMI context.

 - /proc/cgroups doesn't contain useful information for cgroup2 and was
   updated to only show v1 controllers. This unfortunately broke
   something in the wild. Add an option to bring back the old behavior
   to ease transition.

 - selftest updates and other cleanups.

* tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add compatibility option for content of /proc/cgroups
  selftests/cgroup: fix cpu.max tests
  cgroup: llist: avoid memory tears for llist_node
  selftests: cgroup: Fix missing newline in test_zswap_writeback_one
  selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup
  memcg: cgroup: call css_rstat_updated irrespective of in_nmi()
  cgroup: remove per-cpu per-subsystem locks
  cgroup: make css_rstat_updated nmi safe
  cgroup: support to enable nmi-safe css_rstat_updated
  selftests: cgroup: Fix compilation on pre-cgroupns kernels
  selftests: cgroup: Optionally set up v1 environment
  selftests: cgroup: Add support for named v1 hierarchies in test_core
  selftests: cgroup_util: Add helpers for testing named v1 hierarchies
  Documentation: cgroup: add section explaining controller availability
  cgroup: Drop sock_cgroup_classid() dummy implementation
2025-07-31 16:04:19 -07:00
Linus Torvalds
02523d2d93 integrity-v6.17
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCaItL7xQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5TduAQDu7W14clgQiJNwYo2hN5cEnfKZVkRI
 6PJGyxV+g+cMOQEA4Aepo2EL86kQJH33iAUmzi0bvyQl4cPTuKqpw5CgjQg=
 =eGra
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity update from Mimi Zohar:
 "A single commit to permit disabling IMA from the boot command line for
  just the kdump kernel.

  The exception itself sort of makes sense. My concern is that
  exceptions do not remain as exceptions, but somehow morph to become
  the norm"

* tag 'integrity-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: add a knob ima= to allow disabling IMA in kdump kernel
2025-07-31 11:42:11 -07:00
Linus Torvalds
e8d780dcd9 slab updates for 6.17
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmiHj+8ACgkQu+CwddJF
 iJrnOggAjBwzwvJsUWB3YBaF0wyLipLcdNsbbDOqvLYShQifaEuwN/i8FYO+D7a3
 DyBR3NK4pWcZtxrSJVHcAuy06yQq5sqeU9Dc5iJ+ADCXnqYshUFp5ARtNVaputGy
 b4990JMIG0YxEBD3Gx01kicCdae9JkU5FGZKFk65oHalaGQk7GtMfG+e/obh4z9D
 e9R5Ub+9zM9Efwl/DD7qkETWKAq0gBjvbj0dYO0E7ctO/WNr93Z1FsnbxiUcPiG3
 ED1LwTuNYYccBf/8iPGy/cp0WcWTwGtjbPEUk3lyY0KcrpgGT+cyvJj8G0GfnvV4
 V/OLZrzwVZw2k3MopbFl/RdgWGf0bA==
 =gZB5
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Convert struct slab to its own flags instead of referencing page
   flags, which is another preparation step before separating it from
   struct page completely.

   Along with that, a bunch of documentation fixes and cleanups (Matthew
   Wilcox)

 - Convert large kmalloc to use frozen pages in order to be consistent
   with non-large kmalloc slabs (Vlastimil Babka)

 - MAINTAINERS updates (Matthew Wilcox, Lorenzo Stoakes)

 - Restore NUMA policy support for large kmalloc, broken by mistake in
   v6.1 (Vlastimil Babka)

* tag 'slab-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  MAINTAINERS: add missing files to slab section
  slab: Update MAINTAINERS entry
  memcg_slabinfo: Fix use of PG_slab
  kfence: Remove mention of PG_slab
  vmcoreinfo: Remove documentation of PG_slab and PG_hugetlb
  doc: Add slab internal kernel-doc
  slub: Fix a documentation build error for krealloc()
  slab: Add SL_pfmemalloc flag
  slab: Add SL_partial flag
  slab: Rename slab->__page_flags to slab->flags
  doc: Move SLUB documentation to the admin guide
  mm, slab: use frozen pages for large kmalloc
  mm, slab: restore NUMA policy support for large kmalloc
2025-07-30 11:32:38 -07:00
Linus Torvalds
2db4df0c09 RCU pull request for v6.17
This pull request contains the following branches:
 
 rcu-exp.23.07.2025
 
   - Protect against early RCU exp quiescent state reporting during exp
     grace period initialization.
 
   - Remove superfluous barrier in task unblock path.
 
   - Remove the CPU online quiescent state report optimization, which is
     error prone for certain scenarios.
 
   - Add warning for unexpected pending requested expedited quiescent
     state on dying CPU.
 
 rcu.22.07.2025
 
   - Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
     indicators of the actual context tracking state of a CPU.
 
   - Handle ->defer_qs_iw_pending field data race.
 
   - Enable rcu_normal_wake_from_gp by default on systems with <= 16 CPUs.
 
   - Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls.
 
   - Refactor expedited handling condition in rcu_read_unlock_special().
 
   - Documentation updates for hotplug and GP init scan ordering,
     separation of rcu_state and rnp's gp_seq states, quiescent state
     reporting for offline CPUs.
 
 torture-scripts.16.07.2025
 
   - Cleanup and improve scripts : remove superfluous warnings for disabled
     tests; better handling of kvm.sh --kconfig arg; suppress some confusing
     diagnostics; tolerate bad kvm.sh args; add new diagnostic for build
     output; fail allmodconfig testing on warnings.
 
   - Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels.
 
   - Disable default RCU-tasks and clocksource-wdog testing on arm64.
 
   - Add EXPERT Kconfig option for arm64 KCSAN runs.
 
   - Remove SRCU-lite testing.
 
 rcutorture.16.07.2025
 
   - Start torture writer threads creation after reader threads to handle
     race in SRCU-P scenario.
 
   - Add SRCU down_read()/up_read() test.
 
   - Add diagnostics for delayed SRCU up_read(), unmatched up_read(), print
     number of up/down readers and the number of such readers which
     migrated to other CPU.
 
   - Ignore certain unsupported configurations for trivial RCU test.
 
   - Fix splats in RT kernels due to inaccurate checks for BH-disabled
     context.
 
   - Enable checks and logs to capture intentionally exercised unexpected
     scenarios (too short readers) for BUSTED test.
 
   - Remove SRCU-lite testing.
 
 srcu.19.07.2025
 
   - Expedite SRCU-fast grace periods.
 
   - Remove SRCU-lite implementation.
 
   - Add guards for SRCU-fast readers.
 
 rcu.nocb.18.07.2025
 
   - Dump NOCB group leader state on stall detection.
 
   - Robustify nocb_cb_kthread pointer accesses.
 
   - Fix delayed execution of hurry callbacks when LAZY_RCU is enabled.
 
 refscale.07.07.2025
 
   - Fix multiplication overflow in "loops" and "nreaders" calculations.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCaINnRwAKCRAAHS7/6Z0w
 pRYJAQC97ZDW2wBegDbQPsg5ECLX9Lyd6+IC65sdi38IENl+TQEA4/oMzUUceIH+
 CDCnxv3fAMhPncJfvIukOLzMJpKw0go=
 =8t4O
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Neeraj Upadhyay:
 "Expedited grace period updates:

   - Protect against early RCU exp quiescent state reporting during exp
     grace period initialization

   - Remove superfluous barrier in task unblock path

   - Remove the CPU online quiescent state report optimization, which is
     error prone for certain scenarios

   - Add warning for unexpected pending requested expedited quiescent
     state on dying CPU

  Core:

   - Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
     indicators of the actual context tracking state of a CPU

   - Handle ->defer_qs_iw_pending field data race

   - Enable rcu_normal_wake_from_gp by default on systems with <= 16
     CPUs

   - Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls

   - Refactor expedited handling condition in rcu_read_unlock_special()

   - Documentation updates for hotplug and GP init scan ordering,
     separation of rcu_state and rnp's gp_seq states, quiescent state
     reporting for offline CPUs

  torture-scripts:

   - Cleanup and improve scripts : remove superfluous warnings for
     disabled tests; better handling of kvm.sh --kconfig arg; suppress
     some confusing diagnostics; tolerate bad kvm.sh args; add new
     diagnostic for build output; fail allmodconfig testing on warnings

   - Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels

   - Disable default RCU-tasks and clocksource-wdog testing on arm64

   - Add EXPERT Kconfig option for arm64 KCSAN runs

   - Remove SRCU-lite testing

  rcutorture:

   - Start torture writer threads creation after reader threads to
     handle race in SRCU-P scenario

   - Add SRCU down_read()/up_read() test

   - Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
     print number of up/down readers and the number of such readers
     which migrated to other CPU

   - Ignore certain unsupported configurations for trivial RCU test

   - Fix splats in RT kernels due to inaccurate checks for BH-disabled
     context

   - Enable checks and logs to capture intentionally exercised
     unexpected scenarios (too short readers) for BUSTED test

   - Remove SRCU-lite testing

  srcu:

   - Expedite SRCU-fast grace periods

   - Remove SRCU-lite implementation

   - Add guards for SRCU-fast readers

  rcu nocb:

   - Dump NOCB group leader state on stall detection

   - Robustify nocb_cb_kthread pointer accesses

   - Fix delayed execution of hurry callbacks when LAZY_RCU is enabled

  refscale:

   - Fix multiplication overflow in "loops" and "nreaders" calculations"

* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
  rcu: Document concurrent quiescent state reporting for offline CPUs
  rcu: Document separation of rcu_state and rnp's gp_seq
  rcu: Document GP init vs hotplug-scan ordering requirements
  srcu: Add guards for SRCU-fast readers
  rcu: Fix delayed execution of hurry callbacks
  rcu: Refactor expedited handling check in rcu_read_unlock_special()
  checkpatch: Remove SRCU-lite deprecation
  srcu: Remove SRCU-lite implementation
  srcu: Expedite SRCU-fast grace periods
  rcutorture: Remove support for SRCU-lite
  rcutorture: Remove SRCU-lite scenarios
  torture: Remove support for SRCU-lite
  torture: Make torture.sh --allmodconfig testing fail on warnings
  torture: Add "ERROR" diagnostic for testing kernel-build output
  torture: Make torture.sh tolerate runs having bad kvm.sh arguments
  torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
  torture: Extract testid.txt generation to separate script
  torture: Suppress "find" diagnostics from torture.sh --do-none run
  torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
  rcu: Fix rcu_read_unlock() deadloop due to IRQ work
  ...
2025-07-30 11:01:41 -07:00
Linus Torvalds
bf76f23aa1 Scheduler updates for v6.17:
Core scheduler changes:
 
  - Better tracking of maximum lag of tasks in presence of different
    slices duration, for better handling of lag in the fair
    scheduler. (Vincent Guittot)
 
  - Clean up and standardize #if/#else/#endif markers throughout
    the entire scheduler code base (Ingo Molnar)
 
  - Make SMP unconditional: build the SMP scheduler's
    data structures and logic on UP kernel too, even though
    they are not used, to simplify the scheduler and remove
    around 200 #ifdef/[#else]/#endif blocks from the
    scheduler. (Ingo Molnar)
 
  - Reorganize cgroup bandwidth control interface handling
    for better interfacing with sched_ext (Tejun Heo)
 
 Balancing:
 
  - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris Mason)
  - Remove sched_domain_topology_level::flags to simplify the code (Prateek Nayak)
  - Simplify and clean up build_sched_topology() (Li Chen)
  - Optimize build_sched_topology() on large machines (Li Chen)
 
 Real-time scheduling:
 
  - Add initial version of proxy execution: a mechanism for mutex-owning
    tasks to inherit the scheduling context of higher priority waiters.
    Currently limited to a single runqueue and conditional on CONFIG_EXPERT,
    and other limitations. (John Stultz, Peter Zijlstra, Valentin Schneider)
 
  - Deadline scheduler (Juri Lelli):
 
    - Fix dl_servers initialization order (Juri Lelli)
    - Fix DL scheduler's root domain reinitialization logic (Juri Lelli)
    - Fix accounting bugs after global limits change (Juri Lelli)
    - Fix scalability regression by implementing less agressive dl_server handling
      (Peter Zijlstra)
 
 PSI:
 
  - Improve scalability by optimizing psi_group_change() cpu_clock() usage
    (Peter Zijlstra)
 
 Rust changes:
 
  - Make Task, CondVar and PollCondVar methods inline to avoid unnecessary
    function calls (Kunwu Chan, Panagiotis Foliadis)
 
  - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
    mechanism is used so that Rust's might_sleep() doesn't need to be
    defined as a macro (Fujita Tomonori)
 
  - Introduce file_from_location() (Boqun Feng)
 
 Debugging & instrumentation:
 
  - Make clangd usable with scheduler source code files again (Peter Zijlstra)
 
  - tools: Add root_domains_dump.py which dumps root domains info (Juri Lelli)
 
  - tools: Add dl_bw_dump.py for printing bandwidth accounting info (Juri Lelli)
 
 Misc cleanups & fixes:
 
  - Remove play_idle() (Feng Lee)
 
  - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
 
  - Do not call __put_task_struct() on RT if pi_blocked_on is set
    (Luis Claudio R. Goncalves)
 
  - Correct the comment in place_entity() (wang wei)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiHHNIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g7DhAAg9aMW33PuC24A4hCS1XQay6j3rgmR5qC
 AOqDofj/CY4Q374HQtOl4m5CYZB/G5csRv6TZliWQKhAy9vr6VWddoyOMJYOAlAx
 XRurl1Z3MriOMD6DPgNvtHd5PrR5Un8ygALgT+32d0PRz27KNXORW5TyvEf2Bv4r
 BX4/GazlOlK0PdGUdZl0q/3dtkU4Wr5IifQzT8KbarOSBbNwZwVcg+83hLW5gJMx
 LgMGLaAATmiN7VuvJWNDATDfEOmOvQOu8veoS8TuP1AOVeJPfPT2JVh9Jen5V1/5
 3w1RUOkUI2mQX+cujWDW3koniSxjsA1OegXfHnFkF5BXp4q5e54k6D5sSh1xPFDX
 iDhkU5jsbKkkJS2ulD6Vi4bIAct3apMl4IrbJn/OYOLcUVI8WuunHs4UPPEuESAS
 TuQExKSdj4Ntrzo3pWEy8kX3/Z9VGa+WDzwsPUuBSvllB5Ir/jjKgvkxPA6zGsiY
 rbkmZT8qyI01IZ/GXqfI2AQYCGvgp+SOvFPi755ZlELTQS6sUkGZH2/2M5XnKA9t
 Z1wB2iwttoS1VQInx0HgiiAGrXrFkr7IzSIN2T+CfWIqilnL7+nTxzwlJtC206P4
 DB97bF6azDtJ6yh1LetRZ1ZMX/Gr56Cy0Z6USNoOu+a12PLqlPk9+fPBBpkuGcdy
 BRk8KgysEuk=
 =8T0v
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of maximum lag of tasks in presence of different
     slices duration, for better handling of lag in the fair scheduler
     (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernel too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of higher
     priority waiters.

     Currently limited to a single runqueue and conditional on
     CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
     Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
      - Fix scalability regression by implementing less agressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...
2025-07-29 17:42:52 -07:00
Linus Torvalds
04d29e3609 - Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS
to enable stuffing independently from Retbleed, do some cleanups to
   simplify and streamline the code
 
 - Simplify SRSO and make mitigation types selection more versatile
   depending on the Retbleed mitigation selection. Simplify code some
 
 - Add the second part of the attack vector controls which provide a lot
   friendlier user interface to the speculation mitigations than
   selecting each one by one as it is now.
 
   Instead, the selection of whole attack vectors which are relevant to
   the system in use can be done and protection against only those
   vectors is enabled, thus giving back some performance to the users
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiHh6cACgkQEsHwGGHe
 VUqprw//QMpqtWGVbo4bJ176sLtwn8cdKxOJwx9rWyFH/f3Zcn5hK1x+Zifm22hj
 NNo7YMLTvEg6BicxIDKp89tfXM5cwLS3pcUabWy7IS7Xzs7yLyRajNQ3hOFhQd9g
 UAUg8xx33xspCatlXzl4HcbOR0xyxb/qR4vd5H89Gir9GIuiO5+uz+3SdqEzzl8w
 2UfPDY5B9cXO8VoGsvJMtLTO1ULUHHZPgRdPaH8rSr9QkGlVFefpgUaw6Budic84
 kjNpE4tyJEvVLceZr8UtZWmVBwBS4z9oNRdqHbCFnrpPdYXnzYXA6pKMm1vP3zCz
 atRuWxmn0U6o9wZfxcBF7ZI2o3k049U8zxLWlz9mX4pXbMuqSX6MsR4kw82ta/Hp
 IzM9LckPO2STYHvJJlcEOivYbKTKttwYZd0rjfaFtJ0z+vVar4EyPyTbfGAdiH50
 T2UUmC9SpffVVhnOcaTUGtT/4SFCVA8ZNsoPm27auGVzZRnLOFSV63iv5fl41o3X
 pELyVfLzR3XtXFNXrzXY09lEKh5HIiy33Qe+syCNEoF56zTN+IREu37M7dKiWBmx
 xRJE9U9ZgxZjbEuMV0jKEMPOMzMf1ONQw5HSpfIgoT5OLwKXhP5HptHkKS3rwppG
 5Glo2kfvxKzFl/THHv7EPoIvVVL/tezcvO3H7z4owRl/jgw0CvA=
 =zO6b
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU mitigation updates from Borislav Petkov:

 - Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS
   to enable stuffing independently from Retbleed, do some cleanups to
   simplify and streamline the code

 - Simplify SRSO and make mitigation types selection more versatile
   depending on the Retbleed mitigation selection. Simplify code some

 - Add the second part of the attack vector controls which provide a lot
   friendlier user interface to the speculation mitigations than
   selecting each one by one as it is now.

   Instead, the selection of whole attack vectors which are relevant to
   the system in use can be done and protection against only those
   vectors is enabled, thus giving back some performance to the users

* tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/bugs: Print enabled attack vectors
  x86/bugs: Add attack vector controls for TSA
  x86/pti: Add attack vector controls for PTI
  x86/bugs: Add attack vector controls for ITS
  x86/bugs: Add attack vector controls for SRSO
  x86/bugs: Add attack vector controls for L1TF
  x86/bugs: Add attack vector controls for spectre_v2
  x86/bugs: Add attack vector controls for BHI
  x86/bugs: Add attack vector controls for spectre_v2_user
  x86/bugs: Add attack vector controls for retbleed
  x86/bugs: Add attack vector controls for spectre_v1
  x86/bugs: Add attack vector controls for GDS
  x86/bugs: Add attack vector controls for SRBDS
  x86/bugs: Add attack vector controls for RFDS
  x86/bugs: Add attack vector controls for MMIO
  x86/bugs: Add attack vector controls for TAA
  x86/bugs: Add attack vector controls for MDS
  x86/bugs: Define attack vectors relevant for each bug
  x86/Kconfig: Add arch attack vector support
  cpu: Define attack vectors
  ...
2025-07-29 16:34:45 -07:00
Linus Torvalds
0b29600a30 Updates for interrupt chip drivers:
- Add support of forced affinity setting to yet offline CPUs for the
    MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
    during the early bringup phase of a secondary CPU in the hotplug code
    before the CPU is set online and interrupts are enabled.\
 
  - Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
    interrupt chip
 
  - Make the interrupt routing to RISV-V harts specification compliant so it
    supports arbitrary hart indices
 
  - Add a command line parameter and related handling to disable the generic
    RISCV IMSIC mechanism on platforms which use a trap-emulated IMSIC.
    Unfortunatly this is required because there is no mechanism available to
    discover this programatically.
 
  - Enable wakeup sources on the Renesas RZV2H driver
 
  - Convert interrupt chip drivers, which use a open coded variant of
    msi_create_parent_irq_domain() to use the new functionality
 
  - Convert interrupt chip drivers, which use the old style two level
    implementation of MSI support over to the MSI parent mechanism to
    prepare for removing at least one of the three PCI/MSI backend variants.
 
  - The usual cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGj60THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSh6D/9wY0G2dGz+EJeiDsldzB1n5jmf5I0k
 3XsI3o5j0Ma/Yy+nu9Re3fZq0+qzPFZZErxkBp5igCJbSoaIGheqOyXQDuQu/8tm
 s2t8Wx9k6er7Cywg9rU9pWKzJ6AFXFvcKOEvGG2q2+lFbJbIoGdAM93qPrOJhqeo
 a3NyhQv6kNl7xAjOVyEZmOlCZgCYotFwC0+K1TVQgGDGbwHWH3wad64gLTyyQQlK
 RvtbUBKfCBqqwLJ7Mww7Xclezjk/Hpgm/OppxBAglv5WyRd0e15u2dpdTdM9r4BC
 4wX5Old3ZgbqBQjdHGNlljthu4lO2S0PXU6j0EC8W2NiQjN+hPaKK/EPeerlAJCz
 UmxY0/E3HFNUk8ZHkiif3PGiOSvAbn0JwWi3+D6HuK1rlVTXNs07NIgUBk727Ty5
 S7r5JEuUA5s9dGta4pszxHGn/0Dqg/WvnMZGcbPNaV6POH47wNnPlO2mj14I1HLk
 SfG+deohJM34pVVq7fiqgGukLVPm6PfiJkXx90MK6l+BfE58uo7Oue9mm9pqT2dy
 b6K1gdNPRsZzG7AoAqkx3UrjQuD7maWIpDGb4VZeUW/34bthLygIDUY4OZhpdrUZ
 m33T8zv0PrmNuvnMdFt0RyoDTu8PC9rYS0XVvsIMqsMxJDE/URVGH2tCi5CVMiEg
 PbRWL56yGyT1NA==
 =5LGj
 -----END PGP SIGNATURE-----

Merge tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt chip driver updates from Thomas Gleixner:

 - Add support of forced affinity setting to yet offline CPUs for the
   MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
   during the early bringup phase of a secondary CPU in the hotplug code
   before the CPU is set online and interrupts are enabled

 - Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
   interrupt chip

 - Make the interrupt routing to RISV-V harts specification compliant so
   it supports arbitrary hart indices

 - Add a command line parameter and related handling to disable the
   generic RISCV IMSIC mechanism on platforms which use a trap-emulated
   IMSIC. Unfortunatly this is required because there is no mechanism
   available to discover this programatically.

 - Enable wakeup sources on the Renesas RZV2H driver

 - Convert interrupt chip drivers, which use a open coded variant of
   msi_create_parent_irq_domain() to use the new functionality

 - Convert interrupt chip drivers, which use the old style two level
   implementation of MSI support over to the MSI parent mechanism to
   prepare for removing at least one of the three PCI/MSI backend
   variants.

 - The usual cleanups and improvements all over the place

* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  irqchip/riscv-imsic: Add kernel parameter to disable IPIs
  irqchip/gic-v3: Fix GICD_CTLR register naming
  irqchip/ls-scfg-msi: Fix NULL dereference in error handling
  irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
  irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
  irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
  irqchip/alpine-msi: Convert to __free
  irqchip/alpine-msi: Convert to lock guards
  irqchip/alpine-msi: Clean up whitespace style
  irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
  irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
  irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
  irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
  irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
  irqdomain: Add device pointer to irq_domain_info and msi_domain_info
  irqchip/renesas-rzv2h: Remove unneeded includes
  irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/aslint-sswi: Resolve hart index
  ...
2025-07-29 13:26:05 -07:00
Linus Torvalds
53edfecef6 Power management updates for 6.17-rc1
- Fix two initialization ordering issues in the cpufreq core and a
    governor initialization error path in it, and clean it up (Lifeng
    Zheng)
 
  - Add Granite Rapids support in no-HWP mode to the intel_pstate cpufreq
    driver (Li RongQing)
 
  - Make intel_pstate always use HWP_DESIRED_PERF when operating in the
    passive mode (Rafael Wysocki)
 
  - Allow building the tegra124 cpufreq driver as a module (Aaron Kling)
 
  - Do minor cleanups for Rust cpufreq and cpumask APIs and fix MAINTAINERS
    entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas Bulwahn)
 
  - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
    Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
 
  - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver (Prashant
    Malani)
 
  - Fix minimum performance state label error in the amd-pstate driver
    documentation (Shouye Liu)
 
  - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
    governor and explain HW coordination influence on it in the
    documentation (Shashank Balaji)
 
  - Fix opencoded for_each_cpu() in idle_state_valid() in the DT cpuidle
    driver (Yury Norov)
 
  - Remove info about non-existing QoS interfaces from the PM QoS
    documentation (Ulf Hansson)
 
  - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
 
  - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
 
  - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)
 
  - Simplify the sun8i-a33-mbus devfreq driver by using more devm
    functions (Uwe Kleine-König)
 
  - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
 
  - Check devfreq governor before using governor->name (Lifeng Zheng)
 
  - Remove a redundant devfreq_get_freq_range() call from
    devfreq_add_device() (Lifeng Zheng)
 
  - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
 
  - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
 
  - Extend the asynchronous suspend and resume of devices to handle
    suppliers like parents and consumers like children (Rafael Wysocki)
 
  - Make pm_runtime_force_resume() work for drivers that set the
    DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
    collaborate with the general ACPI PM domain to set it (Rafael
    Wysocki)
 
  - Add kernel parameter to disable asynchronous suspend/resume of
    devices (Tudor Ambarus)
 
  - Drop redundant might_sleep() calls from some functions in the device
    suspend/resume core code (Zhongqiu Han)
 
  - Fix the handling of monitors connected right before waking up the
    system from sleep (tuhaowen)
 
  - Clean up MAINTAINERS entries for suspend and hibernation (Rafael
    Wysocki)
 
  - Fix error code path in the KEXEC_JUMP flow and drop a redundant
    pm_restore_gfp_mask() call from it (Rafael Wysocki)
 
  - Rearrange suspend/resume error handling in the core device suspend
    and resume code (Rafael Wysocki)
 
  - Fix up white space that does not follow coding style in the
    hibernation core code (Darshan Rathod)
 
  - Document return values of suspend-related API functions in the
    runtime PM framework (Sakari Ailus)
 
  - Mark last busy stamp in multiple autosuspend-related functions in the
    runtime PM framework and update its documentation (Sakari Ailus)
 
  - Take active children into account in pm_runtime_get_if_in_use() for
    consistency (Rafael Wysocki)
 
  - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
    power capping driver (Sivan Zohar-Kotzer)
 
  - Add support for the Bartlett Lake platform to the Intel RAPL power
    capping driver (Qiao Wei)
 
  - Add PL4 support for Panther Lake to the intel_rapl_msr power capping
    driver (Zhang Rui)
 
  - Update contact information in the PM ABI docs and maintainer
    information in the power domains DT binding (Rafael Wysocki)
 
  - Update PM header inclusions to follow the IWYU (Include What You Use)
    principle (Andy Shevchenko)
 
  - Add flags to specify power on attach/detach for PM domains, make the
    driver core detach PM domains in device_unbind_cleanup(), and drop
    the dev_pm_domain_detach() call from the platform bus type (Claudiu
    Beznea)
 
  - Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
 
  - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham Shenoy)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmh/wC4SHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1O6MIAJtfclAleksv+PzbEyC+yk72zKinJg35
 WJUk4Kz1yMOqAPazbpXRXt1tuxqyB3HWeixnTFyZbz+bbhZjYJ0lvpWGkdsFaS0i
 NSbILSpHNGtOrP6s6hVKTBmLAdAzdWYWMQizlWgGrkhOiN5BnQzL7pAi2aGqu9KS
 tGqnIg/3QwBAvnxijgpkm7qozOUMPJ9dzSvxMaFeB6JH7SNbTOODVFtsoD+mbJlH
 YVMMWxih8b4MRJgAo4N2bL1Glp/Qnwg4ACawnQokt8Rknbtwku57QF9YwTbubr36
 Ok7qbNnUSx0h9KtMQQNogLLkFreTJkbGknVWEwaWWhXNeW9l4cr6MWo=
 =xVF9
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "As is tradition, cpufreq is the part with the largest number of
  updates that include core fixes and cleanups as well as updates of
  several assorted drivers, but there are also quite a few updates
  related to system sleep, mostly focused on asynchronous suspend and
  resume of devices and on making the integration of system suspend
  and resume with runtime PM easier.

  Runtime PM is also updated to allow some code duplication in drivers
  to be eliminated going forward and to work more consistently overall
  in some cases.

  Apart from that, there are some driver core updates related to PM
  domains that should help to address ordering issues with devm_ cleanup
  routines relying on PM domains, some assorted devfreq updates
  including core fixes and cleanups, tooling updates, and documentation
  and MAINTAINERS updates.

  Specifics:

   - Fix two initialization ordering issues in the cpufreq core and a
     governor initialization error path in it, and clean it up (Lifeng
     Zheng)

   - Add Granite Rapids support in no-HWP mode to the intel_pstate
     cpufreq driver (Li RongQing)

   - Make intel_pstate always use HWP_DESIRED_PERF when operating in the
     passive mode (Rafael Wysocki)

   - Allow building the tegra124 cpufreq driver as a module (Aaron
     Kling)

   - Do minor cleanups for Rust cpufreq and cpumask APIs and fix
     MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
     Bulwahn)

   - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
     Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)

   - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
     (Prashant Malani)

   - Fix minimum performance state label error in the amd-pstate driver
     documentation (Shouye Liu)

   - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
     governor and explain HW coordination influence on it in the
     documentation (Shashank Balaji)

   - Fix opencoded for_each_cpu() in idle_state_valid() in the DT
     cpuidle driver (Yury Norov)

   - Remove info about non-existing QoS interfaces from the PM QoS
     documentation (Ulf Hansson)

   - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)

   - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)

   - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)

   - Simplify the sun8i-a33-mbus devfreq driver by using more devm
     functions (Uwe Kleine-König)

   - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)

   - Check devfreq governor before using governor->name (Lifeng Zheng)

   - Remove a redundant devfreq_get_freq_range() call from
     devfreq_add_device() (Lifeng Zheng)

   - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)

   - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)

   - Extend the asynchronous suspend and resume of devices to handle
     suppliers like parents and consumers like children (Rafael Wysocki)

   - Make pm_runtime_force_resume() work for drivers that set the
     DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
     collaborate with the general ACPI PM domain to set it (Rafael
     Wysocki)

   - Add kernel parameter to disable asynchronous suspend/resume of
     devices (Tudor Ambarus)

   - Drop redundant might_sleep() calls from some functions in the
     device suspend/resume core code (Zhongqiu Han)

   - Fix the handling of monitors connected right before waking up the
     system from sleep (tuhaowen)

   - Clean up MAINTAINERS entries for suspend and hibernation (Rafael
     Wysocki)

   - Fix error code path in the KEXEC_JUMP flow and drop a redundant
     pm_restore_gfp_mask() call from it (Rafael Wysocki)

   - Rearrange suspend/resume error handling in the core device suspend
     and resume code (Rafael Wysocki)

   - Fix up white space that does not follow coding style in the
     hibernation core code (Darshan Rathod)

   - Document return values of suspend-related API functions in the
     runtime PM framework (Sakari Ailus)

   - Mark last busy stamp in multiple autosuspend-related functions in
     the runtime PM framework and update its documentation (Sakari
     Ailus)

   - Take active children into account in pm_runtime_get_if_in_use() for
     consistency (Rafael Wysocki)

   - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
     power capping driver (Sivan Zohar-Kotzer)

   - Add support for the Bartlett Lake platform to the Intel RAPL power
     capping driver (Qiao Wei)

   - Add PL4 support for Panther Lake to the intel_rapl_msr power
     capping driver (Zhang Rui)

   - Update contact information in the PM ABI docs and maintainer
     information in the power domains DT binding (Rafael Wysocki)

   - Update PM header inclusions to follow the IWYU (Include What You
     Use) principle (Andy Shevchenko)

   - Add flags to specify power on attach/detach for PM domains, make
     the driver core detach PM domains in device_unbind_cleanup(), and
     drop the dev_pm_domain_detach() call from the platform bus type
     (Claudiu Beznea)

   - Improve Python binding's Makefile for cpupower (John B. Wyatt IV)

   - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
     Shenoy)"

* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
  PM: docs: Use my kernel.org address in ABI docs and DT bindings
  PM: hibernate: Fix up white space that does not follow coding style
  PM: sleep: Rearrange suspend/resume error handling in the core
  Documentation: amd-pstate:fix minimum performance state label error
  PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
  kexec_core: Drop redundant pm_restore_gfp_mask() call
  kexec_core: Fix error code path in the KEXEC_JUMP flow
  PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
  drivers: cpufreq: add Tegra114 support
  rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
  cpufreq: Exit governor when failed to start old governor
  cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
  cpufreq: Init policy->rwsem before it may be possibly used
  cpufreq: Initialize cpufreq-based frequency-invariance later
  cpufreq: Remove duplicate check in __cpufreq_offline()
  cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
  cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
  cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
  PM / devfreq: Add HiSilicon uncore frequency scaling driver
  ...
2025-07-28 20:13:36 -07:00
Prachotan Bathi
7f0c6675b3 tpm_crb_ffa: handle tpm busy return code
Platforms supporting direct message request v2 [1] can support secure
partitions that support multiple services. For CRB over FF-A interface,
if the firmware TPM or TPM service [1] shares its Secure Partition (SP)
with another service, message requests may fail with a -EBUSY error.

To handle this, replace the single check and call with a retry loop
that attempts the TPM message send operation until it succeeds or a
configurable timeout is reached. Implement a _try_send_receive function
to do a single send/receive and modify the existing send_receive to
add this retry loop.
The retry mechanism introduces a module parameter (`busy_timeout_ms`,
default: 2000ms) to control how long to keep retrying on -EBUSY
responses. Between retries, the code waits briefly (50-100 microseconds)
to avoid busy-waiting and handling TPM BUSY conditions more gracefully.

The parameter can be modified at run-time as such:
echo 3000 | tee /sys/module/tpm_crb_ffa/parameters/busy_timeout_ms
This changes the timeout from the default 2000ms to 3000ms.

[1] TPM Service Command Response Buffer Interface Over FF-A
https://developer.arm.com/documentation/den0138/latest/

Signed-off-by: Prachotan Bathi <prachotan.bathi@arm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-07-23 02:32:00 +03:00
Feng Tang
ee13240cd7 panic: add note that panic_print sysctl interface is deprecated
Add a dedicated core parameter 'panic_console_replay' for controlling
console replay, and add note that 'panic_print' sysctl interface will be
obsoleted by 'panic_sys_info' and 'panic_console_replay'.  When it
happens, the SYS_INFO_PANIC_CONSOLE_REPLAY can be removed as well.

Link: https://lkml.kernel.org/r/20250703021004.42328-6-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:25 -07:00
Feng Tang
9743d12d0c panic: add 'panic_sys_info=' setup option for kernel cmdline
'panic_sys_info=' sysctl interface is already added for runtime setting. 
Add counterpart kernel cmdline option for boottime setting.

Link: https://lkml.kernel.org/r/20250703021004.42328-5-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:24 -07:00
Feng Tang
261743b013 panic: clean up code for console replay
Patch series "generalize panic_print's dump function to be used by other
kernel parts", v3.

When working on kernel stability issues, panic, task-hung and
software/hardware lockup are frequently met.  And to debug them, user may
need lots of system information at that time, like task call stacks, lock
info, memory info etc.  

panic case already has panic_print_sys_info() for this purpose, and has a
'panic_print' bitmask to control what kinds of information is needed,
which is also helpful to debug other task-hung and lockup cases.

So this patchset extracts the function out to a new file 'lib/sys_info.c',
and makes it available for other cases which also need to dump system info
for debugging.  

Also as suggested by Petr Mladek, add 'panic_sys_info=' interface to take
human readable string like "tasks,mem,locks,timers,ftrace,....", and
eventually obsolete the current 'panic_print' bitmap interface.

In RFC and V1 version, hung_task and SW/HW watchdog modules are enabled
with the new sys_info dump interface.  In v2, they are kept out for better
review of current change, and will be posted later.  

Locally these have been used in our bug chasing for stability issues and
was proven helpful.

Many thanks to Petr Mladek for great suggestions on both the code and
architectures!


This patch (of 5):

Currently the panic_print_sys_info() was called twice with different
parameters to handle console replay case, which is kind of confusing.

Add panic_console_replay() explicitly and rename
'PANIC_PRINT_ALL_PRINTK_MSG' to 'PANIC_CONSOLE_REPLAY', to make the code
straightforward.  The related kernel document is also updated.

Link: https://lkml.kernel.org/r/20250703021004.42328-1-feng.tang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250703021004.42328-2-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:24 -07:00
Jiri Bohac
ce1bf19a34 kdump, documentation: describe craskernel CMA reservation
Describe the new crashkernel ",cma" suffix in Documentation/

Link: https://lkml.kernel.org/r/aEqpQwUy6gqSiUkV@dwarf.suse.cz
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Tao Liu <ltao@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:23 -07:00
Michal Koutný
646faf36d7 cgroup: Add compatibility option for content of /proc/cgroups
/proc/cgroups lists only v1 controllers by default, however, this is
only enforced since the commit af000ce852 ("cgroup: Do not report
unavailable v1 controllers in /proc/cgroups") and there is software in
the wild that uses content of /proc/cgroups to decide on availability of
v2 (sic) controllers.

Add a boottime param that can bring back the previous behavior for
setups where the check in the software cannot be changed and it causes
e.g. unintended OOMs.

Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
guard since it's only important to check which hierarchy (v1 vs v2) the
subsys is attached to. This has no effect on the printed message but
the code is cleaner since cgrp_v1_visible is really about mounted
hierarchies, not the content of /proc/cgroups.

Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk
Fixes: af000ce852 ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups")
Fixes: a0ab145322 ("cgroup: Print message when /proc/cgroups is read on v2-only system")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-19 06:14:44 -10:00
Anup Patel
ea92b6046d irqchip/riscv-imsic: Add kernel parameter to disable IPIs
When injecting IPIs to a set of harts, the IMSIC IPI support will do a
separate MMIO write to the SETIPNUM_LE register of each target hart. This
means on a platform where IMSIC is trap-n-emulated, there will be N MMIO
traps when injecting IPI to N target harts hence IMSIC IPIs will be slow on
such platforms compared to the SBI IPI extension.

Unfortunately, there is no DT, ACPI, or any other way of discovering
whether the underlying IMSIC is trap-n-emulated. Using MMIO write to the
SETIPNUM_LE register for injecting IPI is purely a software choice in the
IMSIC driver hence add a kernel parameter to allow users to disable IMSIC
IPIs on platforms with trap-n-emulated IMSIC.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250716123745.557585-1-apatel@ventanamicro.com
2025-07-18 16:46:09 +02:00
Rafael J. Wysocki
7c1f7c22e6 Merge back earlier material related to system sleep 2025-07-17 19:55:25 +02:00
Uladzislau Rezki (Sony)
d827673d8a Documentation/kernel-parameters: Update rcu_normal_wake_from_gp doc
Update the documentation about rcu_normal_wake_from_gp parameter.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
2025-07-16 09:24:39 +05:30
John Stultz
25c411fce7 sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.

Also uses this option to allow the rq->donor to be different from
rq->curr.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
2025-07-14 17:16:31 +02:00
David Kaplan
1caa1b0509 Documentation/x86: Document new attack vector controls
Document the 5 new attack vector command line options, how they
interact with existing vulnerability controls, and recommendations on when
they can be disabled.

Note that while mitigating against untrusted userspace requires both
user-to-kernel and user-to-user protection, these are kept separate.  The
kernel can control what code executes inside of it and that may affect the
risk associated with vulnerabilities especially if new kernel mitigations
are implemented.  The same isn't typically true of userspace.

In other words, the risk associated with user-to-user or guest-to-guest
attacks is unlikely to change over time.  While the risk associated with
user-to-kernel or guest-to-host attacks may change.  Therefore, these
controls are separated.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250709155731.3279419-1-david.kaplan@amd.com
2025-07-11 17:51:43 +02:00
Tudor Ambarus
f747cde5e7 PM: sleep: add kernel parameter to disable asynchronous suspend/resume
On some platforms, device dependencies are not properly represented by
device links, which can cause issues when asynchronous power management
is enabled. While it is possible to disable this via sysfs, doing so
at runtime can race with the first system suspend event.

This patch introduces a kernel command-line parameter, "pm_async", which
can be set to "off" to globally disable asynchronous suspend and resume
operations from early boot. It effectively provides a way to set the
initial value of the existing pm_async sysfs knob at boot time. This
offers a robust method to fall back to synchronous (sequential)
operation, which can stabilize platforms with problematic dependencies
and also serve as a useful debugging tool.

The default behavior remains unchanged (asynchronous enabled). To
disable it, boot the kernel with the "pm_async=off" parameter.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250709-pm-async-off-v3-1-cb69a6fc8d04@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-10 14:40:07 +02:00
Matthew Wilcox (Oracle)
262e086f93 doc: Move SLUB documentation to the admin guide
This section is supposed to be for internal documentation, while the
document is advice for sysadmins.  Move it to the appropriate place.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20250611155916.2579160-2-willy@infradead.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-06-18 13:06:26 +02:00
Borislav Petkov (AMD)
d8010d4ba4 x86/bugs: Add a Transient Scheduler Attacks mitigation
Add the required features detection glue to bugs.c et all in order to
support the TSA mitigation.

Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-06-17 17:17:02 +02:00
Baoquan He
aa9bb1b325 ima: add a knob ima= to allow disabling IMA in kdump kernel
Kdump kernel doesn't need IMA functionality, and enabling IMA will cost
extra memory. It would be very helpful to allow IMA to be disabled for
kdump kernel.

Hence add a knob ima=on|off here to allow turning IMA off in kdump
kernel if needed.

Note that this IMA disabling is limited to kdump kernel, please don't
abuse it in other kernel and thus serious consequences are caused.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-06-16 09:15:13 -04:00
Kees Cook
de1c831a78 slab: Decouple slab_debug and no_hash_pointers
Some system owners use slab_debug=FPZ (or similar) as a hardening option,
but do not want to be forced into having kernel addresses exposed due
to the implicit "no_hash_pointers" boot param setting.[1]

Introduce the "hash_pointers" boot param, which defaults to "auto"
(the current behavior), but also includes "always" (forcing on hashing
even when "slab_debug=..." is defined), and "never". The existing
"no_hash_pointers" boot param becomes an alias for "hash_pointers=never".

This makes it possible to boot with "slab_debug=FPZ hash_pointers=always".

Link: https://github.com/KSPP/linux/issues/368 [1]
Fixes: 792702911f ("slub: force on no_hash_pointers when slub_debug is enabled")
Co-developed-by: Sergio Perez Gonzalez <sperezglz@gmail.com>
Signed-off-by: Sergio Perez Gonzalez <sperezglz@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20250415170232.it.467-kees@kernel.org
[kees@kernel.org: Add note about hash_pointers into slab_debug kernel parameter documentation.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-06-09 16:26:10 +02:00
Linus Torvalds
e9e668cd27 arm64 fixes for -rc1
- Disable problematic linker assertions for broken versions of LLD.
 
 - Work around sporadic link failure with LLD and various randconfig
   builds.
 
 - Fix missing invalidation in the TLB batching code when reclaim races
   with mprotect() and friends.
 
 - Add a command-line override for MPAM to allow booting on systems with
   broken firmware.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmhBcycQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNDwWCACtc4Jw3wwkmaiiP9Ner1/7wKq8xRLC2WRU
 tJjWLSkeoTthxf0DZILc61rNpOalfaRK774/Xo0OiYOBpKeAi5cSaUYMyabVJGcK
 k1R0KXDUu8oS6xKXmXyeuBV2pK4v4aET3E6lzUQZfvamhzuZfCvvKKrF5K8vv5Ph
 eowBMWKugMrwXMOBkRgVopppobdneFuVvnoMlNNYWOy70wDekoPV3qizoVJG/ulQ
 BTFunXX8Otufrm48Ye2bYalfwoiGdUQaJz/gRuHko0o3SOhqR3qZp2DWxQgBwJ+g
 VI6/dRLnVQpdg6toTvS9jzPczVfLt4/5VhLevbBcJuaUOER4SOZl
 =cfnk
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "We've got a couple of build fixes when using LLD, a missing TLB
  invalidation and a workaround for broken firmware on SoCs with CPUs
  that implement MPAM:

   - Disable problematic linker assertions for broken versions of LLD

   - Work around sporadic link failure with LLD and various randconfig
     builds

   - Fix missing invalidation in the TLB batching code when reclaim
     races with mprotect() and friends

   - Add a command-line override for MPAM to allow booting on systems
     with broken firmware"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add override for MPAM
  arm64/mm: Close theoretical race where stale TLB entry remains valid
  arm64: Work around convergence issue with LLD linker
  arm64: Disable LLD linker ASSERT()s for the time being
2025-06-05 11:39:17 -07:00
Xi Ruoyao
10f885d63a arm64: Add override for MPAM
As the message of the commit 09e6b306f3 ("arm64: cpufeature: discover
CPU support for MPAM") already states, if a buggy firmware fails to
either enable MPAM or emulate the trap as if it were disabled, the
kernel will just fail to boot.  While upgrading the firmware should be
the best solution, we have some hardware of which the vendor have made
no response 2 months after we requested a firmware update.  Allow
overriding it so our devices don't become some e-waste.

Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Mingcong Bai <jeffbai@aosc.io>
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Cc: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250602043723.216338-1-xry111@xry111.site
Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02 13:49:09 +01:00
Linus Torvalds
00c010e130 - The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox
simplifies the act of creating a pte which addresses the first page in a
   folio and reduces the amount of plumbing which architecture must
   implement to provide this.
 
 - The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox
   is a shower of largely unrelated folio infrastructure changes which
   clean things up and better prepare us for future work.
 
 - The 3 patch series "memory,x86,acpi: hotplug memory alignment
   advisement" from Gregory Price adds early-init code to prevent x86 from
   leaving physical memory unused when physical address regions are not
   aligned to memory block size.
 
 - The 2 patch series "mm/compaction: allow more aggressive proactive
   compaction" from Michal Clapinski provides some tuning of the (sadly,
   hard-coded (more sadly, not auto-tuned)) thresholds for our invokation
   of proactive compaction.  In a simple test case, the reduction of a guest
   VM's memory consumption was dramatic.
 
 - The 8 patch series "Minor cleanups and improvements to swap freeing
   code" from Kemeng Shi provides some code cleaups and a small efficiency
   improvement to this part of our swap handling code.
 
 - The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API"
   from Dmitry Levin adds the ability for a ptracer to modify syscalls
   arguments.  At this time we can alter only "system call information that
   are used by strace system call tampering, namely, syscall number,
   syscall arguments, and syscall return value.
 
   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.
 
 - The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report
   guard regions" from Andrei Vagin extends the info returned by the
   PAGEMAP_SCAN ioctl against /proc/pid/pagemap.  This permits CRIU to more
   efficiently get at the info about guard regions.
 
 - The 2 patch series "Fix parameter passed to page_mapcount_is_type()"
   from Gavin Shan implements that fix.  No runtime effect is expected
   because validate_page_before_insert() happens to fix up this error.
 
 - The 3 patch series "kernel/events/uprobes: uprobe_write_opcode()
   rewrite" from David Hildenbrand basically brings uprobe text poking into
   the current decade.  Remove a bunch of hand-rolled implementation in
   favor of using more current facilities.
 
 - The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64"
   from Anshuman Khandual provides enhancements and generalizations to the
   pte dumping code.  This might be needed when 128-bit Page Table
   Descriptors are enabled for ARM.
 
 - The 12 patch series "Always call constructor for kernel page tables"
   from Kevin Brodsky "ensures that the ctor/dtor is always called for
   kernel pgtables, as it already is for user pgtables".  This permits the
   addition of more functionality such as "insert hooks to protect page
   tables".  This change does result in various architectures performing
   unnecesary work, but this is fixed up where it is anticipated to occur.
 
 - The 9 patch series "Rust support for mm_struct, vm_area_struct, and
   mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM
   structures.
 
 - The 3 patch series "fix incorrectly disallowed anonymous VMA merges"
   from Lorenzo Stoakes takes advantage of some VMA merging opportunities
   which we've been missing for 15 years.
 
 - The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED
   and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB
   flushing.  Instead of flushing each address range in the provided iovec,
   we batch the flushing across all the iovec entries.  The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.
 
 - The 6 patch series "Track node vacancy to reduce worst case allocation
   counts" from Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.  stress-ng mmap performance increased by single-digit
   percentages and the amount of unnecessarily preallocated memory was
   dramaticelly reduced.
 
 - The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from
   Baoquan He removes a few unnecessary things which Baoquan noted when
   reading the code.
 
 - The 3 patch series ""Enhance sysfs handling for memory hotplug in
   weighted interleave" from Rakie Kim "enhances the weighted interleave
   policy in the memory management subsystem by improving sysfs handling,
   fixing memory leaks, and introducing dynamic sysfs updates for memory
   hotplug support".  Fixes things on error paths which we are unlikely to
   hit.
 
 - The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups
   including tiered memory" from SeongJae Park introduces new DAMOS quota
   goal metrics which eliminate the manual tuning which is required when
   utilizing DAMON for memory tiering.
 
 - The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from
   Baoquan He provides cleanups and small efficiency improvements which
   Baoquan found via code inspection.
 
 - The 2 patch series "vmscan: enforce mems_effective during demotion"
   from Gregory Price "changes reclaim to respect cpuset.mems_effective
   during demotion when possible".  because "presently, reclaim explicitly
   ignores cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated." "This is useful for isolating workloads on a
   multi-tenant system from certain classes of memory more consistently."
 
 - The 2 patch series ""Clean up split_huge_pmd_locked() and remove
   unnecessary folio pointers" from Gavin Guo provides minor cleanups and
   efficiency gains in in the huge page splitting and migrating code.
 
 - The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang
   creates a slab cache for `struct mem_cgroup', yielding improved memory
   utilization.
 
 - The 4 patch series "add max arg to swappiness in memory.reclaim and
   lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness="
   argument for memory.reclaim MGLRU's lru_gen.  This directs proactive
   reclaim to reclaim from only anon folios rather than file-backed folios.
 
 - The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike
   Rapoport is the first step on the path to permitting the kernel to
   maintain existing VMs while replacing the host kernel via file-based
   kexec.  At this time only memblock's reserve_mem is preserved.
 
 - The 7 patch series "mm: Introduce for_each_valid_pfn()" from David
   Woodhouse provides and uses a smarter way of looping over a pfn range.
   By skipping ranges of invalid pfns.
 
 - The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to
   one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless
   VMA scanning when a task is pinned a single NUMA mode.  Dramatic
   performance benefits were seen in some real world cases.
 
 - The 2 patch series "JFS: Implement migrate_folio for
   jfs_metapage_aops" from Shivank Garg addresses a warning which occurs
   during memory compaction when using JFS.
 
 - The 4 patch series "move all VMA allocation, freeing and duplication
   logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c
   into the more appropriate mm/vma.c.
 
 - The 6 patch series "mm, swap: clean up swap cache mapping helper" from
   Kairui Song provides code consolidation and cleanups related to the
   folio_index() function.
 
 - The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal
   Moola does that.
 
 - The 8 patch series "memcg: Fix test_memcg_min/low test failures" from
   Waiman Long addresses some bogus failures which are being reported by
   the test_memcontrol selftest.
 
 - The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare
   hook" from Lorenzo Stoakes commences the deprecation of
   file_operations.mmap() in favor of the new
   file_operations.mmap_prepare().  The latter is more restrictive and
   prevents drivers from messing with things in ways which, amongst other
   problems, may defeat VMA merging.
 
 - The 4 patch series "memcg: decouple memcg and objcg stocks"" from
   Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's
   one.  This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.
 
 - The 6 patch series "mm/damon: minor fixups and improvements for code,
   tests, and documents" from SeongJae Park is "yet another batch of
   miscellaneous DAMON changes.  Fix and improve minor problems in code,
   tests and documents."
 
 - The 7 patch series "memcg: make memcg stats irq safe" from Shakeel
   Butt converts memcg stats to be irq safe.  Another step along the way to
   making memcg charging and stats updates NMI-safe, a BPF requirement.
 
 - The 4 patch series "Let unmap_hugepage_range() and several related
   functions take folio instead of page" from Fan Ni provides folio
   conversions in the hugetlb code.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA
 ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x
 Qqb/UGMEpilyre1PayQqOnct3aSL9Ao=
 =tYYm
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
   creating a pte which addresses the first page in a folio and reduces
   the amount of plumbing which architecture must implement to provide
   this.

 - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
   largely unrelated folio infrastructure changes which clean things up
   and better prepare us for future work.

 - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
   Price adds early-init code to prevent x86 from leaving physical
   memory unused when physical address regions are not aligned to memory
   block size.

 - "mm/compaction: allow more aggressive proactive compaction" from
   Michal Clapinski provides some tuning of the (sadly, hard-coded (more
   sadly, not auto-tuned)) thresholds for our invokation of proactive
   compaction. In a simple test case, the reduction of a guest VM's
   memory consumption was dramatic.

 - "Minor cleanups and improvements to swap freeing code" from Kemeng
   Shi provides some code cleaups and a small efficiency improvement to
   this part of our swap handling code.

 - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
   adds the ability for a ptracer to modify syscalls arguments. At this
   time we can alter only "system call information that are used by
   strace system call tampering, namely, syscall number, syscall
   arguments, and syscall return value.

   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.

 - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
   Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
   against /proc/pid/pagemap. This permits CRIU to more efficiently get
   at the info about guard regions.

 - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
   implements that fix. No runtime effect is expected because
   validate_page_before_insert() happens to fix up this error.

 - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
   Hildenbrand basically brings uprobe text poking into the current
   decade. Remove a bunch of hand-rolled implementation in favor of
   using more current facilities.

 - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
   Khandual provides enhancements and generalizations to the pte dumping
   code. This might be needed when 128-bit Page Table Descriptors are
   enabled for ARM.

 - "Always call constructor for kernel page tables" from Kevin Brodsky
   ensures that the ctor/dtor is always called for kernel pgtables, as
   it already is for user pgtables.

   This permits the addition of more functionality such as "insert hooks
   to protect page tables". This change does result in various
   architectures performing unnecesary work, but this is fixed up where
   it is anticipated to occur.

 - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
   Ryhl adds plumbing to permit Rust access to core MM structures.

 - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
   Stoakes takes advantage of some VMA merging opportunities which we've
   been missing for 15 years.

 - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
   SeongJae Park optimizes process_madvise()'s TLB flushing.

   Instead of flushing each address range in the provided iovec, we
   batch the flushing across all the iovec entries. The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.

 - "Track node vacancy to reduce worst case allocation counts" from
   Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.

   stress-ng mmap performance increased by single-digit percentages and
   the amount of unnecessarily preallocated memory was dramaticelly
   reduced.

 - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
   a few unnecessary things which Baoquan noted when reading the code.

 - ""Enhance sysfs handling for memory hotplug in weighted interleave"
   from Rakie Kim "enhances the weighted interleave policy in the memory
   management subsystem by improving sysfs handling, fixing memory
   leaks, and introducing dynamic sysfs updates for memory hotplug
   support". Fixes things on error paths which we are unlikely to hit.

 - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
   from SeongJae Park introduces new DAMOS quota goal metrics which
   eliminate the manual tuning which is required when utilizing DAMON
   for memory tiering.

 - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
   provides cleanups and small efficiency improvements which Baoquan
   found via code inspection.

 - "vmscan: enforce mems_effective during demotion" from Gregory Price
   changes reclaim to respect cpuset.mems_effective during demotion when
   possible. because presently, reclaim explicitly ignores
   cpuset.mems_effective when demoting, which may cause the cpuset
   settings to violated.

   This is useful for isolating workloads on a multi-tenant system from
   certain classes of memory more consistently.

 - "Clean up split_huge_pmd_locked() and remove unnecessary folio
   pointers" from Gavin Guo provides minor cleanups and efficiency gains
   in in the huge page splitting and migrating code.

 - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
   for `struct mem_cgroup', yielding improved memory utilization.

 - "add max arg to swappiness in memory.reclaim and lru_gen" from
   Zhongkun He adds a new "max" argument to the "swappiness=" argument
   for memory.reclaim MGLRU's lru_gen.

   This directs proactive reclaim to reclaim from only anon folios
   rather than file-backed folios.

 - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
   first step on the path to permitting the kernel to maintain existing
   VMs while replacing the host kernel via file-based kexec. At this
   time only memblock's reserve_mem is preserved.

 - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
   and uses a smarter way of looping over a pfn range. By skipping
   ranges of invalid pfns.

 - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
   cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
   when a task is pinned a single NUMA mode.

   Dramatic performance benefits were seen in some real world cases.

 - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
   Garg addresses a warning which occurs during memory compaction when
   using JFS.

 - "move all VMA allocation, freeing and duplication logic to mm" from
   Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
   appropriate mm/vma.c.

 - "mm, swap: clean up swap cache mapping helper" from Kairui Song
   provides code consolidation and cleanups related to the folio_index()
   function.

 - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

 - "memcg: Fix test_memcg_min/low test failures" from Waiman Long
   addresses some bogus failures which are being reported by the
   test_memcontrol selftest.

 - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
   Stoakes commences the deprecation of file_operations.mmap() in favor
   of the new file_operations.mmap_prepare().

   The latter is more restrictive and prevents drivers from messing with
   things in ways which, amongst other problems, may defeat VMA merging.

 - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
   the per-cpu memcg charge cache from the objcg's one.

   This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.

 - "mm/damon: minor fixups and improvements for code, tests, and
   documents" from SeongJae Park is yet another batch of miscellaneous
   DAMON changes. Fix and improve minor problems in code, tests and
   documents.

 - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
   stats to be irq safe. Another step along the way to making memcg
   charging and stats updates NMI-safe, a BPF requirement.

 - "Let unmap_hugepage_range() and several related functions take folio
   instead of page" from Fan Ni provides folio conversions in the
   hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
2025-05-31 15:44:16 -07:00
Linus Torvalds
c89756bcf4 Power management updates for 6.16-rc1
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
    Tian).
 
  - Fix typos in energy model documentation and example driver code (Moon
    Hee Lee, Atul Kumar Pant).
 
  - Rearrange the energy model management code and add a new function for
    adjusting a CPU energy model after adjusting the capacity of the
    given CPU to it (Rafael Wysocki).
 
  - Refactor cpufreq_online(), add and use cpufreq policy locking guards,
    use __free() in policy reference counting, and clean up core cpufreq
    code on top of that (Rafael Wysocki).
 
  - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
    Kumar).
 
  - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
    Ugwekar).
 
  - Add offline, online and suspend callbacks to the amd-pstate driver,
    rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
    Ugwekar).
 
  - Add support for the "Requested CPU Min frequency" BIOS option to the
    amd-pstate driver (Dhananjay Ugwekar).
 
  - Reset amd-pstate driver mode after running selftests (Swapnil
    Sapkal).
 
  - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
    Chancellor).
 
  - Add helper for governor checks to the schedutil cpufreq governor and
    move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).
 
  - Populate the cpu_capacity sysfs entries from the intel_pstate driver
    after registering asym capacity support (Ricardo Neri).
 
  - Add support for enabling Energy-aware scheduling (EAS) to the
    intel_pstate driver when operating in the passive mode on a hybrid
    platform (Rafael Wysocki).
 
  - Drop redundant cpus_read_lock() from store_local_boost() in the
    cpufreq core (Seyediman Seyedarab).
 
  - Replace sscanf() with kstrtouint() in the cpufreq code and use a
    symbol instead of a raw number in it (Bowen Yu).
 
  - Add support for autonomous CPU performance state selection to the
    CPPC cpufreq driver (Lifeng Zheng).
 
  - OPP: Add dev_pm_opp_set_level() (Praveen Talari).
 
  - Introduce scope-based cleanup headers and mutex locking guards in OPP
    core (Viresh Kumar).
 
  - Switch OPP to use kmemdup_array() (Zhang Enpei).
 
  - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the
    menu cpuidle governor (Zhongqiu Han).
 
  - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla).
 
  - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
    Bityutskiy).
 
  - Fix typos in two comments in the teo cpuidle governor (Atul Kumar
    Pant).
 
  - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
    Kalla).
 
  - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).
 
  - Add new devm_ functions for enabling runtime PM and runtime PM
    reference counting (Bence Csókás).
 
  - Remove size arguments from strscpy() calls in the hibernation core
    code (Thorsten Blum).
 
  - Adjust the handling of devices with asynchronous suspend enabled
    during system suspend and resume to start resuming them immediately
    after resuming their parents and to start suspending such a device
    immediately after suspending its first child (Rafael Wysocki).
 
  - Adjust messages printed during tasks freezing to avoid using
    pr_cont() (Andrew Sayers, Paul Menzel).
 
  - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
    Zhang).
 
  - Add missing wakeup source attribute relax_count to sysfs and
    remove the space character at the end ofi the string produced by
    pm_show_wakelocks() (Zijun Hu).
 
  - Add configurable pm_test delay for hibernation (Zihuan Zhang).
 
  - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
    cypd4226 device on Tegra boards from suspending prematurely (Jon
    Hunter).
 
  - Unbreak printing PM debug messages during hibernation and clean up
    some related code (Rafael Wysocki).
 
  - Add a systemd service to run cpupower and change cpupower binding's
    Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmg0xS0SHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1AwwH/Rvgza5YBPb9JZqWJT/ZiBw7HcEWHhP1
 fNfcVU1gXPZiF0yoPfjfJua6BcLj6lyQ3d/+zWqqAcWfmRSD6HPe8yYz8qALUAqj
 RWhDa04aGj6B9bQuOjejatznYlQlkwCRT7zec+75D+dAHVMqR/Vt2LFAetCadgHe
 MQibAQmVFXu3RFkBjReTAdGzVoTXkwoZDrzdfA2aFAfMJNtJpOW4atUZvnucuctv
 VK3ZratrctCIw7yXEoB1nWSmlY7R5JlslplBfndjmmOnky3YxNr7C6paqwtbTWoF
 MiX48qkmLOGeO6gS8s/lVCDQ4oZ+UNFQvXRsM5NGjycBikhHX/dp/w4=
 =dIqJ
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "Once again, the changes are dominated by cpufreq updates, but this
  time the majority of them are cpufreq core changes, mostly related to
  the introduction of policy locking guards and __free() usage, and
  fixes related to boost handling.

  Still, there is also a significant update of the intel_pstate driver
  making it register an energy model when running on a hybrid platform
  which is used for enabling energy-aware scheduling (EAS) if the driver
  operates in the passive mode (and schedutil is used as the cpufreq
  governor for all CPUs which is the passive mode default).

  There are some amd-pstate driver updates too, for a good measure,
  including the "Requested CPU Min frequency" BIOS option support and
  new online/offline callbacks.

  In the cpuidle space, the most significant change is the addition of a
  C1 demotion on/off sysfs knob to intel_idle which should help some
  users to configure their systems more precisely. There is also the
  conversion of the PSCI cpuidle driver to a faux device one and there
  are two small updates of cpuidle governors.

  Device power management is also modified quite a bit, especially the
  handling of devices with asynchronous suspend and resume enabled
  during system transitions. They are now going to be handled more
  asynchronously during suspend transitions and somewhat less
  aggressively during resume transitions.

  Apart from the above, the operating performance points (OPP) library
  is now going to use mutex locking guards and scope-based cleanup
  helpers and there is the usual bunch of assorted fixes and code
  cleanups.

  Specifics:

   - Fix potential division-by-zero error in em_compute_costs() (Yaxiong
     Tian)

   - Fix typos in energy model documentation and example driver code
     (Moon Hee Lee, Atul Kumar Pant)

   - Rearrange the energy model management code and add a new function
     for adjusting a CPU energy model after adjusting the capacity of
     the given CPU to it (Rafael Wysocki)

   - Refactor cpufreq_online(), add and use cpufreq policy locking
     guards, use __free() in policy reference counting, and clean up
     core cpufreq code on top of that (Rafael Wysocki)

   - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
     Kumar)

   - Fix des_perf clamping with max_perf in amd_pstate_update()
     (Dhananjay Ugwekar)

   - Add offline, online and suspend callbacks to the amd-pstate driver,
     rename and use the existing amd_pstate_epp callbacks in it
     (Dhananjay Ugwekar)

   - Add support for the "Requested CPU Min frequency" BIOS option to
     the amd-pstate driver (Dhananjay Ugwekar)

   - Reset amd-pstate driver mode after running selftests (Swapnil
     Sapkal)

   - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
     Chancellor)

   - Add helper for governor checks to the schedutil cpufreq governor
     and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki)

   - Populate the cpu_capacity sysfs entries from the intel_pstate
     driver after registering asym capacity support (Ricardo Neri)

   - Add support for enabling Energy-aware scheduling (EAS) to the
     intel_pstate driver when operating in the passive mode on a hybrid
     platform (Rafael Wysocki)

   - Drop redundant cpus_read_lock() from store_local_boost() in the
     cpufreq core (Seyediman Seyedarab)

   - Replace sscanf() with kstrtouint() in the cpufreq code and use a
     symbol instead of a raw number in it (Bowen Yu)

   - Add support for autonomous CPU performance state selection to the
     CPPC cpufreq driver (Lifeng Zheng)

   - OPP: Add dev_pm_opp_set_level() (Praveen Talari)

   - Introduce scope-based cleanup headers and mutex locking guards in
     OPP core (Viresh Kumar)

   - Switch OPP to use kmemdup_array() (Zhang Enpei)

   - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in
     the menu cpuidle governor (Zhongqiu Han)

   - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla)

   - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
     Bityutskiy)

   - Fix typos in two comments in the teo cpuidle governor (Atul Kumar
     Pant)

   - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
     Kalla)

   - Move debug runtime PM attributes to runtime_attrs[] (Rafael
     Wysocki)

   - Add new devm_ functions for enabling runtime PM and runtime PM
     reference counting (Bence Csókás)

   - Remove size arguments from strscpy() calls in the hibernation core
     code (Thorsten Blum)

   - Adjust the handling of devices with asynchronous suspend enabled
     during system suspend and resume to start resuming them immediately
     after resuming their parents and to start suspending such a device
     immediately after suspending its first child (Rafael Wysocki)

   - Adjust messages printed during tasks freezing to avoid using
     pr_cont() (Andrew Sayers, Paul Menzel)

   - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
     Zhang)

   - Add missing wakeup source attribute relax_count to sysfs and remove
     the space character at the end ofi the string produced by
     pm_show_wakelocks() (Zijun Hu)

   - Add configurable pm_test delay for hibernation (Zihuan Zhang)

   - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
     cypd4226 device on Tegra boards from suspending prematurely (Jon
     Hunter)

   - Unbreak printing PM debug messages during hibernation and clean up
     some related code (Rafael Wysocki)

   - Add a systemd service to run cpupower and change cpupower binding's
     Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)"

* tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
  cpufreq: CPPC: Add support for autonomous selection
  cpufreq: Update sscanf() to kstrtouint()
  cpufreq: Replace magic number
  OPP: switch to use kmemdup_array()
  PM: freezer: Rewrite restarting tasks log to remove stray *done.*
  PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
  cpufreq: drop redundant cpus_read_lock() from store_local_boost()
  cpupower: do not install files to /etc/default/
  cpupower: do not call systemctl at install time
  cpupower: do not write DESTDIR to cpupower.service
  PM: sleep: Introduce pm_sleep_transition_in_progress()
  cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
  cpufreq: intel_pstate: Document hybrid processor support
  cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
  cpufreq: intel_pstate: EAS support for hybrid platforms
  PM: EM: Introduce em_adjust_cpu_capacity()
  PM: EM: Move CPU capacity check to em_adjust_new_capacity()
  PM: EM: Documentation: Fix typos in example driver code
  cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
  PM: sleep: Introduce pm_suspend_in_progress()
  ...
2025-05-27 16:48:47 -07:00
Linus Torvalds
eaed94d1f6 Scheduler updates for v6.16:
Core & fair scheduler changes:
 
   - Tweak wait_task_inactive() to force dequeue sched_delayed tasks
     (John Stultz)
 
   - Adhere to place_entity() constraints (Peter Zijlstra)
 
   - Allow decaying util_est when util_avg > CPU capacity (Pierre Gondois)
 
   - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)
 
 Energy management:
 
   - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)
 
   - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
     (K Prateek Nayak)
 
   - Align uclamp and util_est and call before freq update (Xuewen Yan)
 
 CPU isolation:
 
   - Make use of more than one housekeeping CPU (Phil Auld)
 
 RT scheduler:
 
   - Fix race in push_rt_task() (Harshit Agarwal)
 
   - Add kernel cmdline option for rt_group_sched (Michal Koutný)
 
 Scheduler topology support:
 
   - Improve topology_span_sane speed (Steve Wahl)
 
 Scheduler debugging:
 
   - Move and extend the sched_process_exit() tracepoint (Andrii Nakryiko)
 
   - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)
 
   - Fix trace_sched_switch(.prev_state) (Peter Zijlstra)
 
   - Untangle cond_resched() and live-patching (Peter Zijlstra)
 
 Fixes and cleanups:
 
   - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný,
     Peter Zijlstra, Xuewen Yan)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy50ARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jFQQ/+KXl2XDg1V/VVmMG8GmtDlR29V3M3ricy
 D7/2s0D1Y1ErHb+pRMBG31EubT9/bXjUshWIuuf51DciSLBmpELHxY5J+AevRa0L
 /pHFwSvP6H5pDakI/xZ01FlYt7PxZGs+1m1o2615Mbwq6J2bjZTan54CYzrdpLOy
 Nqb3OT4tSqU1+7SV7hVForBpZp9u3CvVBRt/wE6vcHltW/I486bM8OCOd2XrUlnb
 QoIRliGI9KHpqCpbAeKPRSKXpf9tZv/AijZ+0WUu2yY8iwSN4p3RbbbwdCipjVQj
 w5I5oqKI6cylFfl2dEFWXVO+tLBihs06w8KSQrhYmQ9DUu4RGBVM9ORINGDBPejL
 bvoQh1mAkqvIL+oodujdbMDIqLupvOEtVSvwzR7SJn8BJSB00js88ngCWLjo/CcU
 imLbWy9FSBLvOswLBzQthgAJEj+ejCkOIbcvM2lINWhX/zNsMFaaqYcO1wRunGGR
 SavTI1s+ZksCQY6vCwRkwPrOZjyg91TA/q4FK102fHL1IcthH6xubE4yi4lTIUYs
 L56HuGm8e7Shc8M2Y5rAYsVG3GoIHFLXnptOn2HnCRWaAAJYsBaLUlzoBy9MxCfw
 I2YVDCylkQxevosSi2XxXo3tbM6auISU9SelAT/dAz32V1rsjWQojRJXeGYKIbu7
 KBuN/dLItW0=
 =s/ra
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Tweak wait_task_inactive() to force dequeue sched_delayed tasks
     (John Stultz)

   - Adhere to place_entity() constraints (Peter Zijlstra)

   - Allow decaying util_est when util_avg > CPU capacity (Pierre
     Gondois)

   - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)

  Energy management:

   - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)

   - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings
     change (K Prateek Nayak)

   - Align uclamp and util_est and call before freq update (Xuewen Yan)

  CPU isolation:

   - Make use of more than one housekeeping CPU (Phil Auld)

  RT scheduler:

   - Fix race in push_rt_task() (Harshit Agarwal)

   - Add kernel cmdline option for rt_group_sched (Michal Koutný)

  Scheduler topology support:

   - Improve topology_span_sane speed (Steve Wahl)

  Scheduler debugging:

   - Move and extend the sched_process_exit() tracepoint (Andrii
     Nakryiko)

   - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)

   - Fix trace_sched_switch(.prev_state) (Peter Zijlstra)

   - Untangle cond_resched() and live-patching (Peter Zijlstra)

  Fixes and cleanups:

   - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter
     Zijlstra, Xuewen Yan)"

* tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  sched/uclamp: Align uclamp and util_est and call before freq update
  sched/util_est: Simplify condition for util_est_{en,de}queue()
  sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE
  sched,livepatch: Untangle cond_resched() and live-patching
  sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks
  sched/fair: Adhere to place_entity() constraints
  sched/debug: Print the local group's asym_prefer_cpu
  cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
  sched/topology: Introduce sched_update_asym_prefer_cpu()
  sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu
  sched/isolation: Make use of more than one housekeeping cpu
  sched/rt: Fix race in push_rt_task
  sched: Add annotations to RT_GROUP_SCHED fields
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
  sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Add commadline option for RT_GROUP_SCHED toggling
  sched: Always initialize rt_rq's task_group
  sched: Remove unneeed macro wrap
  ...
2025-05-26 15:19:58 -07:00
Linus Torvalds
07046958f6 RCU pull request for v6.16
Summary of changes:
 - Removed swake_up_one_online() workaround
 - Reverted an incorrect rcuog wake-up fix from offline softirq
 - Rust RCU Guard methods marked as inline
 - Updated MAINTAINERS with Joel’s and Zqiang's new email address
 - Replaced magic constant in rcu_seq_done_exact() with named constant
 - Added warning mechanism to validate rcu_seq_done_exact()
 - Switched SRCU polling API to use rcu_seq_done_exact()
 - Commented on redundant delta check in rcu_seq_done_exact()
 - Made ->gpwrap tests in rcutorture more frequent
 - Fixed reuse of ARM64 images in rcutorture
 - rcutorture improved to check Kconfig and reader conflict handling
 - Extracted logic from rcu_torture_one_read() for clarity
 - Updated LWN RCU API documentation links
 - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT
 - Added tests for SRCU up/down reader primitives
 - Added comments and delays checks in rcutorture
 - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch
 - Added --do-normal and --do-no-normal to torture.sh
 - Added RCU Rust binding tests to torture.sh
 - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01
 - Replaced kmalloc() with kcalloc() in rcuscale
 - Refined listRCU example code for stale data elimination
 - Fixed hardirq count bug for x86 in cpu_stall_cputime
 - Added safety checks in rcu/nocb for offloaded rdp access
 - Other miscellaneous changes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmgoF5oACgkQqA4nf2o4
 5hDvVw//TNsJ/g0HTMu02uXMmtFIrgvpTnH7OEGJ+2p/KErrmWYsBJQw41ueLAQL
 Drtq3q9888UFF5LLA43HC88DFmT9uV8V8TmmURH+pZWdmJY1Ekn8UBSBhDPGGpC5
 sGIO2jJKjHN8G7fyJKoPtL9jxKSulHF/XQTIL2pP23jopAIwosoCHVAwGvnGVvBC
 smXfMSu+bd3IifNFroodsqjVXgnNQwWUNboOkz0KfkiiosgZsWWW8DaM3NGjdp+C
 tUHLs1zfC6sgJUjdpokTE3TcNudlMgVlB2Quj5jhh1YvsvedgIJXl4wpR6JVutyN
 F9awKt1AZkyZ+cTp+JpohaWaN9aKfNNG7jZ+rxQ0VcuRh35wmBJtiWNjEtJ38R82
 kTC1RI7MEus+6OZRt92jv5TNSa9t3wHbi5fBjNRiQ8PYq5cibZy7Lyrj2JOK7Zqs
 pgmdUnhQH2Uhf52b+clG5hWO55gEtACY8pin6kNewClcRtz04Jew7gkiYDGka4F4
 EXbuDHSWi25eSb3FzT2BqR72OZcJ0kv747OTp+2yTv2TaBA5p+OD8hvL/WbWC2Ok
 DK1YQ4RgEerTSZ4PbgPtWkNnlf6xjdWBaYNwmo+G/DgfjPoTOy1Jp73Z4b1AqSB5
 IPEQy1d/799QgGTYkbrvRtvWHg8yfOMz3ByZoHg31rafr0AsrXM=
 =6mun
 -----END PGP SIGNATURE-----

Merge tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Joel Fernandes:
 - Removed swake_up_one_online() workaround
 - Reverted an incorrect rcuog wake-up fix from offline softirq
 - Rust RCU Guard methods marked as inline
 - Updated MAINTAINERS with Joel’s and Zqiang's new email address
 - Replaced magic constant in rcu_seq_done_exact() with named constant
 - Added warning mechanism to validate rcu_seq_done_exact()
 - Switched SRCU polling API to use rcu_seq_done_exact()
 - Commented on redundant delta check in rcu_seq_done_exact()
 - Made ->gpwrap tests in rcutorture more frequent
 - Fixed reuse of ARM64 images in rcutorture
 - rcutorture improved to check Kconfig and reader conflict handling
 - Extracted logic from rcu_torture_one_read() for clarity
 - Updated LWN RCU API documentation links
 - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT
 - Added tests for SRCU up/down reader primitives
 - Added comments and delays checks in rcutorture
 - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch
 - Added --do-normal and --do-no-normal to torture.sh
 - Added RCU Rust binding tests to torture.sh
 - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01
 - Replaced kmalloc() with kcalloc() in rcuscale
 - Refined listRCU example code for stale data elimination
 - Fixed hardirq count bug for x86 in cpu_stall_cputime
 - Added safety checks in rcu/nocb for offloaded rdp access
 - Other miscellaneous changes

* tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (27 commits)
  rcutorture: Fix issue with re-using old images on ARM64
  rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01
  rcutorture: Reduce TREE01 CPU overcommit
  torture: Check for "Call trace:" as well as "Call Trace:"
  rcutorture: Perform more frequent testing of ->gpwrap
  torture: Add testing of RCU's Rust bindings to torture.sh
  torture: Add --do-{,no-}normal to torture.sh
  checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()
  rcutorture: Comment invocations of tick_dep_set_task()
  rcu/nocb: Add Safe checks for access offloaded rdp
  rcuscale: using kcalloc() to relpace kmalloc()
  doc/RCU/listRCU: refine example code for eliminating stale data
  doc: Update LWN RCU API links in whatisRCU.rst
  Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
  rust: sync: rcu: Mark Guard methods as inline
  rcu/cpu_stall_cputime: fix the hardirq count for x86 architecture
  rcu: Remove swake_up_one_online() bandaid
  MAINTAINERS: Update Zqiang's email address
  rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT
  srcu: Use rcu_seq_done_exact() for polling API
  ...
2025-05-26 14:20:50 -07:00
Rafael J. Wysocki
76524ffd10 Merge branches 'pm-runtime' and 'pm-sleep'
Merge updates related to system sleep handling and runtime PM for 6.16-rc1:

 - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
   Kalla).

 - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).

 - Add new devm_ functions for enabling runtime PM and runtime PM
   reference counting (Bence Csókás).

 - Remove size arguments from strscpy() calls in the hibernation core
   code (Thorsten Blum).

 - Adjust the handling of devices with asynchronous suspend enabled
   during system suspend and resume to start resuming them immediately
   after resuming their parents and to start suspending such a device
   immediately after suspending its first child (Rafael Wysocki).

 - Adjust messages printed during tasks freezing to avoid using
   pr_cont() (Andrew Sayers, Paul Menzel).

 - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
   Zhang).

 - Add missing wakeup source attribute relax_count to sysfs and
   remove the space character at the end ofi the string produced by
   pm_show_wakelocks() (Zijun Hu).

 - Add configurable pm_test delay for hibernation (Zihuan Zhang).

 - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
   cypd4226 device on Tegra boards from suspending prematurely (Jon
   Hunter).

 - Unbreak printing PM debug messages during hibernation and clean up
   some related code (Rafael Wysocki).

* pm-runtime:
  PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
  PM: sysfs: Move debug runtime PM attributes to runtime_attrs[]
  PM: runtime: Add new devm functions

* pm-sleep:
  PM: freezer: Rewrite restarting tasks log to remove stray *done.*
  PM: sleep: Introduce pm_sleep_transition_in_progress()
  PM: sleep: Introduce pm_suspend_in_progress()
  PM: sleep: Print PM debug messages during hibernation
  ucsi_ccg: Disable async suspend in ucsi_ccg_probe()
  PM: hibernate: add configurable delay for pm_test
  PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()
  PM: wakeup: Add missing wakeup source attribute relax_count
  PM: sleep: Remove unnecessary !!
  PM: sleep: Use two lines for "Restarting..." / "done" messages
  PM: sleep: Make suspend of devices more asynchronous
  PM: sleep: Suspend async parents after suspending children
  PM: sleep: Resume children after resuming the parent
  PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26 21:21:58 +02:00
Linus Torvalds
181d8e399f vfs-6.16-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 om0+AQDMxKLweJXplqQQ7jxuvW2dEa60YpE2EalEKWGg9YA3KgEA3nI4kyKMKn7Y
 PRFXgIcKvhs62oJLKsq8SGQUqExqvAE=
 =atEw
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Use folios for symlinks in the page cache

     FUSE already uses folios for its symlinks. Mirror that conversion
     in the generic code and the NFS code. That lets us get rid of a few
     folio->page->folio conversions in this path, and some of the few
     remaining users of read_cache_page() / read_mapping_page()

   - Try and make a few filesystem operations killable on the VFS
     inode->i_mutex level

   - Add sysctl vfs_cache_pressure_denom for bulk file operations

     Some workloads need to preserve more dentries than we currently
     allow through out sysctl interface

     A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
     involves scanning all files and caching their metadata (including
     dentries and inodes) in memory. Each HDD contains approximately 2
     million files, resulting in a total of ~20 million cached dentries
     after initialization

     To minimize dentry reclamation, they set vfs_cache_pressure to 1.
     Despite this configuration, memory pressure conditions can still
     trigger reclamation of up to 50% of cached dentries, reducing the
     cache from 20 million to approximately 10 million entries. During
     the subsequent cache rebuild period, any HDFS datanode restart
     operation incurs substantial latency penalties until full cache
     recovery completes

     To maintain service stability, more dentries need to be preserved
     during memory reclamation. The current minimum reclaim ratio (1/100
     of total dentries) remains too aggressive for such workload. This
     patch introduces vfs_cache_pressure_denom for more granular cache
     pressure control

     The configuration [vfs_cache_pressure=1,
     vfs_cache_pressure_denom=10000] effectively maintains the full 20
     million dentry cache under memory pressure, preventing datanode
     restart performance degradation

   - Avoid some jumps in inode_permission() using likely()/unlikely()

   - Avid a memory access which is most likely a cache miss when
     descending into devcgroup_inode_permission()

   - Add fastpath predicts for stat() and fdput()

   - Anonymous inodes currently don't come with a proper mode causing
     issues in the kernel when we want to add useful VFS debug assert.
     Fix that by giving them a proper mode and masking it off when we
     report it to userspace which relies on them not having any mode

   - Anonymous inodes currently allow to change inode attributes because
     the VFS falls back to simple_setattr() if i_op->setattr isn't
     implemented. This means the ownership and mode for every single
     user of anon_inode_inode can be changed. Block that as it's either
     useless or actively harmful. If specific ownership is needed the
     respective subsystem should allocate anonymous inodes from their
     own private superblock

   - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock

   - Add proper tests for anonymous inode behavior

   - Make it easy to detect proper anonymous inodes and to ensure that
     we can detect them in codepaths such as readahead()

  Cleanups:

   - Port pidfs to the new anon_inode_{g,s}etattr() helpers

   - Try to remove the uselib() system call

   - Add unlikely branch hint return path for poll

   - Add unlikely branch hint on return path for core_sys_select

   - Don't allow signals to interrupt getdents copying for fuse

   - Provide a size hint to dir_context for during readdir()

   - Use writeback_iter directly in mpage_writepages

   - Update compression and mtime descriptions in initramfs
     documentation

   - Update main netfs API document

   - Remove useless plus one in super_cache_scan()

   - Remove unnecessary NULL-check guards during setns()

   - Add separate separate {get,put}_cgroup_ns no-op cases

  Fixes:

   - Fix typo in root= kernel parameter description

   - Use KERN_INFO for infof()|info_plog()|infofc()

   - Correct comments of fs_validate_description()

   - Mark an unlikely if condition with unlikely() in
     vfs_parse_monolithic_sep()

   - Delete macro fsparam_u32hex()

   - Remove unused and problematic validate_constant_table()

   - Fix potential unsigned integer underflow in fs_name()

   - Make file-nr output the total allocated file handles"

* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
  fs: Pass a folio to page_put_link()
  nfs: Use a folio in nfs_get_link()
  fs: Convert __page_get_link() to use a folio
  fs/read_write: make default_llseek() killable
  fs/open: make do_truncate() killable
  fs/open: make chmod_common() and chown_common() killable
  include/linux/fs.h: add inode_lock_killable()
  readdir: supply dir_context.count as readdir buffer size hint
  vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
  fuse: don't allow signals to interrupt getdents copying
  Documentation: fix typo in root= kernel parameter description
  include/cgroup: separate {get,put}_cgroup_ns no-op case
  kernel/nsproxy: remove unnecessary guards
  fs: use writeback_iter directly in mpage_writepages
  fs: remove useless plus one in super_cache_scan()
  fs: add S_ANON_INODE
  fs: remove uselib() system call
  device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
  fs/fs_parse: Remove unused and problematic validate_constant_table()
  fs: touch up predicts in inode_permission()
  ...
2025-05-26 09:02:39 -07:00
Joel Fernandes
aafe12f980 rcutorture: Perform more frequent testing of ->gpwrap
Currently, the ->gpwrap is not tested (at all per my testing) due to the
requirement of a large delta between a CPU's rdp->gp_seq and its node's
rnp->gpseq.

This results in no testing of ->gpwrap being set. This patch by default
adds 5 minutes of testing with ->gpwrap forced by lowering the delta
between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
configurable, including the active time for the setting and a full
testing cycle.

By default, the first 25 minutes of a test will have the _default_
behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes,
we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
this is reasonable since we at least add a little bit of testing for
usecases where ->gpwrap is set.

[ Apply fix for Dan Carpenter's bug report on init path cleanup. ]
[ Apply kernel doc warning fix from Akira Yokosawa. ]

Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16 11:13:27 -04:00
Petr Vaněk
678927c0c9
Documentation: fix typo in root= kernel parameter description
Fixes a typo in the root= parameter description, changing
"this a a" to "this is a".

Fixes: c0c1a7dcb6 ("init: move the nfs/cifs/ram special cases out of name_to_dev_t")
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-13 09:27:57 +02:00
Alexander Graf
3498209ff6 Documentation: add documentation for KHO
With KHO in place, let's add documentation that describes what it is and
how to use it.

Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12 23:50:42 -07:00
Pawan Gupta
facd226f7e x86/its: Add support for RSB stuffing mitigation
When retpoline mitigation is enabled for spectre-v2, enabling
call-depth-tracking and RSB stuffing also mitigates ITS. Add cmdline option
indirect_target_selection=stuff to allow enabling RSB stuffing mitigation.

When retpoline mitigation is not enabled, =stuff option is ignored, and
default mitigation for ITS is deployed.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Pawan Gupta
2665281a07 x86/its: Add "vmexit" option to skip mitigation on some CPUs
Ice Lake generation CPUs are not affected by guest/host isolation part of
ITS. If a user is only concerned about KVM guests, they can now choose a
new cmdline option "vmexit" that will not deploy the ITS mitigation when
CPU is not affected by guest/host isolation. This saves the performance
overhead of ITS mitigation on Ice Lake gen CPUs.

When "vmexit" option selected, if the CPU is affected by ITS guest/host
isolation, the default ITS mitigation is deployed.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Pawan Gupta
f4818881c4 x86/its: Enable Indirect Target Selection mitigation
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with
eIBRS. It affects prediction of indirect branch and RETs in the
lower half of cacheline. Due to ITS such branches may get wrongly predicted
to a target of (direct or indirect) branch that is located in the upper
half of the cacheline.

Scope of impact
===============

Guest/host isolation
--------------------
When eIBRS is used for guest/host isolation, the indirect branches in the
VMM may still be predicted with targets corresponding to branches in the
guest.

Intra-mode
----------
cBPF or other native gadgets can be used for intra-mode training and
disclosure using ITS.

User/kernel isolation
---------------------
When eIBRS is enabled user/kernel isolation is not impacted.

Indirect Branch Prediction Barrier (IBPB)
-----------------------------------------
After an IBPB, indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB. This is
mitigated by a microcode update.

Add cmdline parameter indirect_target_selection=off|on|force to control the
mitigation to relocate the affected branches to an ITS-safe thunk i.e.
located in the upper half of cacheline. Also add the sysfs reporting.

When retpoline mitigation is deployed, ITS safe-thunks are not needed,
because retpoline sequence is already ITS-safe. Similarly, when call depth
tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return
thunk is not used, as CDT prevents RSB-underflow.

To not overcomplicate things, ITS mitigation is not supported with
spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy
lfence;jmp mitigation on ITS affected parts anyways.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09 13:22:05 -07:00
Zihuan Zhang
50c9bb30dc PM: hibernate: add configurable delay for pm_test
Turn the default 5 second test delay for hibernation into a
configurable module parameter, so users can determine how
long to wait in this pseudo-hibernate state before resuming
the system.

The configurable delay parameter has been added for suspend, so
add an analogous one for hibernation.

Example (wait 30 seconds);

  # echo 30 > /sys/module/hibernate/parameters/pm_test_delay
  # echo core > /sys/power/pm_test

Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20250507063520.419635-1-zhangzihuan@kylinos.cn
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-09 15:59:10 +02:00
Andy Shevchenko
996457176b x86/early_printk: Use 'mmio32' for consistency, fix comments
First of all, using 'mmio' prevents proper implementation of 8-bit accessors.
Second, it's simply inconsistent with uart8250 set of options. Rename it to
'mmio32'. While at it, remove rather misleading comment in the documentation.
From now on mmio32 is self-explanatory and pciserial supports not only 32-bit
MMIO accessors.

Also, while at it, fix the comment for the "pciserial" case. The comment
seems to be a copy'n'paste error when mentioning "serial" instead of
"pciserial" (with double quotes). Fix this.

With that, move it upper, so we don't calculate 'buf' twice.

Fixes: 3181424aea ("x86/early_printk: Add support for MMIO-based UARTs")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Denis Mukhin <dmukhin@ford.com>
Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
2025-04-09 12:27:08 +02:00
Michal Koutný
e34e0131fe sched: Add commadline option for RT_GROUP_SCHED toggling
Only simple implementation with a static key wrapper, it will be wired
in later.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-5-mkoutny@suse.com
2025-04-08 20:55:53 +02:00
Linus Torvalds
dd9db3bff8 more s390 updates for 6.15 merge window
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address
 
 - Fix build failure on older linkers by conditionally adding the -no-pie
   linker option only when it is supported
 
 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations
 
 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()
 
 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and translation
   errors
 
 - Cleanup header dependency problems with asm-offsets.c
 
 - Add fault info for unexpected low-address protection faults in user mode
 
 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling
 
 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU
 
 - Remove unused machine_flags for the lowcore
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfwbhwACgkQjYWKoQLX
 FBgmlwf5ASFl51yYziSg47NTOdCgkdE1un69VKCU7g9nzQBCFiK4PT+H/d69JHpt
 g9dHi4sW6aXooUWyqtrQbsDZSnGKvJd48wOkpqAKKKJCBx3On/7s1fi1sJXgJovo
 rSwC1cRD+yfFDNoQgXytSrRAkoPYvIPxf0aQei/8ziVNOrl7iF7nyPYpgNcZl2iz
 rQWt+o4oWhygs6lK4w9t+0V0QLL+CqTIN/tpf7qeQzNFN5WfRJBn0dtxQhK72enG
 WepU41LnZxDF5DTKhH4+PZFdLPP+YwwTGdqojW28leGy5T3/Eg0wrDaU1034Wpgd
 m5Fg03z94Vdul/rAEscaH1PLT3Ufng==
 =G5/g
 -----END PGP SIGNATURE-----

Merge tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address

 - Fix build failure on older linkers by conditionally adding the
   -no-pie linker option only when it is supported

 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations

 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()

 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and
   translation errors

 - Cleanup header dependency problems with asm-offsets.c

 - Add fault info for unexpected low-address protection faults in user
   mode

 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling

 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU

 - Remove unused machine_flags for the lowcore

* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
  s390/pci: Fix dev.dma_range_map missing sentinel element
  s390/mm: Dump fault info in case of low address protection fault
  s390/smp: Add support for HOTPLUG_SMT
  s390: Fix linker error when -no-pie option is unavailable
  s390/processor: Use bitop functions for cpu flag helper functions
  s390/asm-offsets: Remove ASM_OFFSETS_C
  s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
  s390/kvm: Split kvm_host header file
  s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
  s390/lowcore: Remove unused machine_flags
  s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
2025-04-04 16:58:34 -07:00
Linus Torvalds
4a1d8ababd RISC-V Patches for the 6.15 Merge Window, Part 1
* The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed.
 * The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings.
 * Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling.
 * Support for relocatable !MMU kernels builds.
 * Support for hpge pfnmaps, which should improve TLB utilization.
 * Support for runtime constants, which improves the d_hash()
   performance.
 * Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
 * Various fixes, including:
       - We were missing a secondary mmu notifier call when flushing the
 	tlb which is required for IOMMU.
       - Fix ftrace panics by saving the registers as expected by ftrace.
       - Fix a couple of stimecmp usage related to cpu hotplug.
       - purgatory_start is now aligned as per the STVEC requirements.
       - A fix for hugetlb when calculating the size of non-present PTEs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
 bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
 uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
 B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
 lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
 VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
 VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
 b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
 TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
 lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
 3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
 DzsvvD6UnNMOUQ==
 =nCJo
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed

 - The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings

 - Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling

 - Support for relocatable !MMU kernels builds

 - Support for hpge pfnmaps, which should improve TLB utilization

 - Support for runtime constants, which improves the d_hash()
   performance

 - Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm

 - Various fixes, including:
      - We were missing a secondary mmu notifier call when flushing the
        tlb which is required for IOMMU
      - Fix ftrace panics by saving the registers as expected by ftrace
      - Fix a couple of stimecmp usage related to cpu hotplug
      - purgatory_start is now aligned as per the STVEC requirements
      - A fix for hugetlb when calculating the size of non-present PTEs

* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  riscv: Add norvc after .option arch in runtime const
  riscv: Remove CONFIG_PAGE_OFFSET
  riscv: Support CONFIG_RELOCATABLE on riscv32
  asm-generic: Always define Elf_Rel and Elf_Rela
  riscv: Support CONFIG_RELOCATABLE on NOMMU
  riscv: Allow NOMMU kernels to access all of RAM
  riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
  RISC-V: errata: Use medany for relocatable builds
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  ...
2025-04-04 09:49:17 -07:00
Linus Torvalds
6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.

     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
2025-04-03 16:09:29 -07:00
Steven Rostedt
c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Linus Torvalds
d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds
eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00