Commit graph

13952 commits

Author SHA1 Message Date
Linus Torvalds
352af6a011 Rust changes for v6.17
Toolchain and infrastructure:
 
  - Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness',
    'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and 'ref_as_ptr'.
 
    These are intended to avoid type casts with the 'as' operator, which
    are quite powerful, into restricted variants that are less powerful
    and thus should help to avoid mistakes.
 
  - Remove the 'author' key now that most instances were moved to the
    plural one in the previous cycle.
 
 'kernel' crate:
 
  - New 'bug' module: add 'warn_on!' macro which reuses the existing
    'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and
    kernel parameters:
 
        warn_on!(value == 42);
 
    To avoid duplicating the assembly code, the same strategy is followed
    as for the static branch code in order to share the assembly between
    both C and Rust. This required a few rearrangements on C arch headers
    -- the existing C macros should still generate the same outputs, thus
    no functional change expected there.
 
  - 'workqueue' module: add delayed work items, including a 'DelayedWork'
    struct, a 'impl_has_delayed_work!' macro and an 'enqueue_delayed'
    method, e.g.:
 
        /// Enqueue the struct for execution on the system workqueue,
        /// where its value will be printed 42 jiffies later.
        fn print_later(value: Arc<MyStruct>) {
            let _ = workqueue::system().enqueue_delayed(value, 42);
        }
 
  - New 'bits' module: add support for 'bit' and 'genmask' functions,
    with runtime- and compile-time variants, e.g.:
 
        static_assert!(0b00010000 == bit_u8(4));
        static_assert!(0b00011110 == genmask_u8(1..=4));
 
        assert!(checked_bit_u32(u32::BITS).is_none());
 
  - 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which reads
    NUL-terminated strings from userspace into a '&CStr'.
 
    Introduce 'UserPtr' newtype, similar in purpose to '__user' in C, to
    minimize mistakes handling userspace pointers, including mixing them
    up with integers and leaking them via the 'Debug' trait. Add it to
    the prelude, too.
 
  - Start preparations for the replacement of our custom 'CStr' type
    with the analogous type in the 'core' standard library. This will
    take place across several cycles to make it easier. For this one,
    it includes a new 'fmt' module, using upstream method names and some
    other cleanups.
 
    Replace 'fmt!' with a re-export, which helps Clippy lint properly,
    and clean up the found 'uninlined-format-args' instances.
 
  - 'dma' module:
 
    - Clarify wording and be consistent in 'coherent' nomenclature.
 
    - Convert the 'read!()' and 'write!()' macros to return a 'Result'.
 
    - Add 'as_slice()', 'write()' methods in 'CoherentAllocation'.
 
    - Expose 'count()' and 'size()' in 'CoherentAllocation' and add the
      corresponding type invariants.
 
    - Implement 'CoherentAllocation::dma_handle_with_offset()'.
 
  - 'time' module:
 
    - Make 'Instant' generic over clock source. This allows the compiler
      to assert that arithmetic expressions involving the 'Instant' use
      'Instants' based on the same clock source.
 
    - Make 'HrTimer' generic over the timer mode. 'HrTimer' timers take a
      'Duration' or an 'Instant' when setting the expiry time, depending
      on the timer mode. With this change, the compiler can check the
      type matches the timer mode.
 
    - Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep
      function that will select an appropriate sleep method depending on
      the requested sleep time.
 
    - Avoid 64-bit divisions on 32-bit hardware when calculating
      timestamps.
 
    - Seal the 'HrTimerMode' trait. This prevents users of the
      'HrTimerMode' from implementing the trait on their own types.
 
    - Pass the correct timer mode ID to 'hrtimer_start_range_ns()'.
 
  - 'list' module: remove 'OFFSET' constants, allowing to remove pointer
    arithmetic; now 'impl_list_item!' invokes 'impl_has_list_links!' or
    'impl_has_list_links_self_ptr!'. Other simplifications too.
 
  - 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a
    constant, which avoids exposing the type of the opaque pointer, and
    require 'into_foreign' to return non-null.
 
    Remove the 'Either<L, R>' type as well. It is unused, and we want to
    encourage the use of custom enums for concrete use cases.
 
  - 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types
    to allow them to be used in generic APIs.
 
  - 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>';
     and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>'.
 
  - 'Opaque' type: add 'cast_from' method to perform a restricted cast
    that cannot change the inner type and use it in callers of
    'container_of!'. Rename 'raw_get' to 'cast_into' to match it.
 
  - 'rbtree' module: add 'is_empty' method.
 
  - 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and
    'ARef', which are moved from the too general 'types' module which we
    want to reduce or eventually remove. Also fix a safety comment in
    'static_lock_class'.
 
 'pin-init' crate:
 
  - Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are now
    (pin-)initializers.
 
  - Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()'.
 
  - New 'zeroed()', a safe version of 'mem::zeroed()' and also provide
    it via 'Zeroable::zeroed()'.
 
  - Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for
    'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for '"Rust"'
    and '"C"' ABIs and up to 20 arguments.
 
  - Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E>
    [Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T'.
 
  - Renamed 'zeroed()' to 'init_zeroed()'.
 
  - Upstream dev news: improve CI more to deny warnings, use
    '--all-targets'. Check the synchronization status of the two '-next'
    branches in upstream and the kernel.
 
 MAINTAINERS:
 
  - Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo
    Stoakes as reviewers (thanks everyone).
 
 And a few other cleanups and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmiOWREACgkQGXyLc2ht
 IW39Ig/9E0ExSiBgNKdkCOaULMq31wAxnu3iWoVVisFndlh/Inv+JlaLrmA57BCi
 xXgBwVZ1GoMsG8Fzt6gT+gyhGYi8waNd+5KXr/WJZVTaJ9v1KpdvxuCnSz0DjCbk
 GaKfAfxvJ5GAOEwiIIX8X0TFu6kx911DCJY387/VrqZQ7Msh1QSM3tcZeir/EV4w
 lPjUdlOh1FnLJLI9CGuW20d1IhQUP7K3pdoywgJPpCZV0I8QCyMlMqCEael8Tw2S
 r/PzRaQtiIzk5HTx06V8paK+nEn0K2vQXqW2kV56Y6TNm1Zcv6dES/8hCITsISs2
 nwney3vXEwvoZX+YkQRffZddY4i6YenWMrtLgVxZzdshBL3bn6eHqBL04Nfix+p7
 pQe3qMH3G8UBtX1lugBE7RrWGWcz9ARN8sK12ClmpAUnKJOwTpo97kpqXP7pDme8
 Buh/oV3voAMsqwooSbVBzuUUWnbGaQ5Oj6CiiosSadfNh6AxJLYLKHtRLKJHZEw3
 0Ob/1HhoWS6JSvYKVjMyD19qcH7O8ThZE+83CfMAkI4KphXJarWhpSmN4cHkFn/v
 0clQ7Y5m+up9v1XWTaEq0Biqa6CaxLQwm/qW5WU0Y/TiovmvxAFdCwsQqDkRoJNx
 9kNfMJRvNl78KQxrjEDz9gl7/ajgqX1KkqP8CQbGjv29cGzFlVE=
 =5Wt9
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness',
     'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and
     'ref_as_ptr'

     These are intended to avoid type casts with the 'as' operator,
     which are quite powerful, into restricted variants that are less
     powerful and thus should help to avoid mistakes

   - Remove the 'author' key now that most instances were moved to the
     plural one in the previous cycle

  'kernel' crate:

   - New 'bug' module: add 'warn_on!' macro which reuses the existing
     'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and
     kernel parameters:

         warn_on!(value == 42);

     To avoid duplicating the assembly code, the same strategy is
     followed as for the static branch code in order to share the
     assembly between both C and Rust

     This required a few rearrangements on C arch headers -- the
     existing C macros should still generate the same outputs, thus no
     functional change expected there

   - 'workqueue' module: add delayed work items, including a
     'DelayedWork' struct, a 'impl_has_delayed_work!' macro and an
     'enqueue_delayed' method, e.g.:

         /// Enqueue the struct for execution on the system workqueue,
         /// where its value will be printed 42 jiffies later.
         fn print_later(value: Arc<MyStruct>) {
             let _ = workqueue::system().enqueue_delayed(value, 42);
         }

   - New 'bits' module: add support for 'bit' and 'genmask' functions,
     with runtime- and compile-time variants, e.g.:

         static_assert!(0b00010000 == bit_u8(4));
         static_assert!(0b00011110 == genmask_u8(1..=4));

         assert!(checked_bit_u32(u32::BITS).is_none());

   - 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which
     reads NUL-terminated strings from userspace into a '&CStr'

     Introduce 'UserPtr' newtype, similar in purpose to '__user' in C,
     to minimize mistakes handling userspace pointers, including mixing
     them up with integers and leaking them via the 'Debug' trait. Add
     it to the prelude, too

   - Start preparations for the replacement of our custom 'CStr' type
     with the analogous type in the 'core' standard library. This will
     take place across several cycles to make it easier. For this one,
     it includes a new 'fmt' module, using upstream method names and
     some other cleanups

     Replace 'fmt!' with a re-export, which helps Clippy lint properly,
     and clean up the found 'uninlined-format-args' instances

   - 'dma' module:

      - Clarify wording and be consistent in 'coherent' nomenclature

      - Convert the 'read!()' and 'write!()' macros to return a 'Result'

      - Add 'as_slice()', 'write()' methods in 'CoherentAllocation'

      - Expose 'count()' and 'size()' in 'CoherentAllocation' and add
        the corresponding type invariants

      - Implement 'CoherentAllocation::dma_handle_with_offset()'

   - 'time' module:

      - Make 'Instant' generic over clock source. This allows the
        compiler to assert that arithmetic expressions involving the
        'Instant' use 'Instants' based on the same clock source

      - Make 'HrTimer' generic over the timer mode. 'HrTimer' timers
        take a 'Duration' or an 'Instant' when setting the expiry time,
        depending on the timer mode. With this change, the compiler can
        check the type matches the timer mode

      - Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep
        function that will select an appropriate sleep method depending
        on the requested sleep time

      - Avoid 64-bit divisions on 32-bit hardware when calculating
        timestamps

      - Seal the 'HrTimerMode' trait. This prevents users of the
        'HrTimerMode' from implementing the trait on their own types

      - Pass the correct timer mode ID to 'hrtimer_start_range_ns()'

   - 'list' module: remove 'OFFSET' constants, allowing to remove
     pointer arithmetic; now 'impl_list_item!' invokes
     'impl_has_list_links!' or 'impl_has_list_links_self_ptr!'. Other
     simplifications too

   - 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a
     constant, which avoids exposing the type of the opaque pointer, and
     require 'into_foreign' to return non-null

     Remove the 'Either<L, R>' type as well. It is unused, and we want
     to encourage the use of custom enums for concrete use cases

   - 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types
     to allow them to be used in generic APIs

   - 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>';
     and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>'

   - 'Opaque' type: add 'cast_from' method to perform a restricted cast
     that cannot change the inner type and use it in callers of
     'container_of!'. Rename 'raw_get' to 'cast_into' to match it

   - 'rbtree' module: add 'is_empty' method

   - 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and
     'ARef', which are moved from the too general 'types' module which
     we want to reduce or eventually remove. Also fix a safety comment
     in 'static_lock_class'

  'pin-init' crate:

   - Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are
     now (pin-)initializers

   - Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()'

   - New 'zeroed()', a safe version of 'mem::zeroed()' and also provide
     it via 'Zeroable::zeroed()'

   - Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for
     'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for
     '"Rust"' and '"C"' ABIs and up to 20 arguments

   - Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E>
     [Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T'

   - Renamed 'zeroed()' to 'init_zeroed()'

   - Upstream dev news: improve CI more to deny warnings, use
     '--all-targets'. Check the synchronization status of the two
     '-next' branches in upstream and the kernel

  MAINTAINERS:

   - Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo
     Stoakes as reviewers (thanks everyone)

  And a few other cleanups and improvements"

* tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (76 commits)
  rust: Add warn_on macro
  arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  riscv/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
  rust: kernel: move ARef and AlwaysRefCounted to sync::aref
  rust: sync: fix safety comment for `static_lock_class`
  rust: types: remove `Either<L, R>`
  rust: kernel: use `core::ffi::CStr` method names
  rust: str: add `CStr` methods matching `core::ffi::CStr`
  rust: str: remove unnecessary qualification
  rust: use `kernel::{fmt,prelude::fmt!}`
  rust: kernel: add `fmt` module
  rust: kernel: remove `fmt!`, fix clippy::uninlined-format-args
  scripts: rust: emit path candidates in panic message
  scripts: rust: replace length checks with match
  rust: list: remove nonexistent generic parameter in link
  rust: bits: add support for bits/genmask macros
  rust: list: remove OFFSET constants
  rust: list: add `impl_list_item!` examples
  rust: list: use fully qualified path
  ...
2025-08-03 13:49:10 -07:00
Linus Torvalds
a6923c06a3 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiNNksACgkQ6rmadz2v
 bTrKRhAAnju4bbFRHU88Y68p6Meq/jxgjxHZAkTqZA0Nvbu2cItPRL7XHAAhTWE7
 OBEIm3UKCH4gs4fY8rDHiIgnnaQavXUmvXZblOIOjxnqRKJpU3px+wwJvGFq5Enq
 WP6UZV8tj+O2tNfNNYS+mgQvvIpUISHGpKimvx7ede3e1U3cJBkppbT3gooMHYuc
 5s1QtYHWaPY/1DpkHgqJ2UPGcbT9/HSPGMHRNaHKjQTcNcLcrj7RRjchgXqcc7Vs
 hVijvVrLiuK0MyU42ritmaqvjjgD6hKPZguRQe2/hAtrOo0Alf+4mXkMgam7simN
 iHfGc7nhw1xAFTPj4WXahja89G00FdDN5NR37Rgurm/i2fY7BuXAkMjiMiwGB3C3
 jk2wG3RSifYeC2rxhkYJdqcx8Cz6m+pjgyJ2o9Jy5dn426VXg/kzkUXpl6u5jaPZ
 SmKoo9Xu1r7xqTaUc9kk8pJI5Xt9vD5oQjF2KQuPZXxNidiwW6k2OGbW+wF26nEi
 Q6pfDu3pvHAd/UE6cD5yFe97o3Cc2XfGwI/Sv2k99UVPvNcvfAvVo9fsItHBhCPn
 zHkihW2S0zmbBlhcrB+PrLclNgLleP9JukFN+5scc0a9lbQxIm6v2TNKGlBfDQtO
 I+Kn266oqT4BEgnQGlCQquINnQAdmS8VMnnunGOu6+rwPUtkI7E=
 =XLHS
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay
   Mohan, Mark Rutland, Maxwell Bland)

 - Disallow tail calls between BPF programs that use different cgroup
   local storage maps to prevent out-of-bounds access (Daniel Borkmann)

 - Fix unaligned access in flow_dissector and netfilter BPF programs
   (Paul Chaignon)

 - Avoid possible use of uninitialized mod_len in libbpf (Achill
   Gilgenast)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Test for unaligned flow_dissector ctx access
  bpf: Improve ctx access verifier error message
  bpf: Check netfilter ctx accesses are aligned
  bpf: Check flow_dissector ctx accesses are aligned
  arm64/cfi,bpf: Support kCFI + BPF on arm64
  cfi: Move BPF CFI types and helpers to generic code
  cfi: add C CFI type macro
  libbpf: Avoid possible use of uninitialized mod_len
  bpf: Fix oob access in cgroup local storage
  bpf: Move cgroup iterator helpers to bpf.h
  bpf: Move bpf map owner out of common struct
  bpf: Add cookie object to bpf maps
2025-08-01 17:13:26 -07:00
Sami Tolvanen
f1befc82ad cfi: Move BPF CFI types and helpers to generic code
Instead of duplicating the same code for each architecture, move
the CFI type hash variables for BPF function types and related
helper functions to generic CFI code, and allow architectures to
override the function definitions if needed.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20250801001004.1859976-7-samitolvanen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31 18:23:53 -07:00
Linus Torvalds
beace86e61 Summary of significant series in this pull request:
- The 4 patch series "mm: ksm: prevent KSM from breaking merging of new
   VMAs" from Lorenzo Stoakes addresses an issue with KSM's
   PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for
   merging with existing adjacent VMAs.
 
 - The 4 patch series "mm/damon: introduce DAMON_STAT for simple and
   practical access monitoring" from SeongJae Park adds a new kernel module
   which simplifies the setup and usage of DAMON in production
   environments.
 
 - The 6 patch series "stop passing a writeback_control to swap/shmem
   writeout" from Christoph Hellwig is a cleanup to the writeback code
   which removes a couple of pointers from struct writeback_control.
 
 - The 7 patch series "drivers/base/node.c: optimization and cleanups"
   from Donet Tom contains largely uncorrelated cleanups to the NUMA node
   setup and management code.
 
 - The 4 patch series "mm: userfaultfd: assorted fixes and cleanups" from
   Tal Zussman does some maintenance work on the userfaultfd code.
 
 - The 5 patch series "Readahead tweaks for larger folios" from Ryan
   Roberts implements some tuneups for pagecache readahead when it is
   reading into order>0 folios.
 
 - The 4 patch series "selftests/mm: Tweaks to the cow test" from Mark
   Brown provides some cleanups and consistency improvements to the
   selftests code.
 
 - The 4 patch series "Optimize mremap() for large folios" from Dev Jain
   does that.  A 37% reduction in execution time was measured in a
   memset+mremap+munmap microbenchmark.
 
 - The 5 patch series "Remove zero_user()" from Matthew Wilcox expunges
   zero_user() in favor of the more modern memzero_page().
 
 - The 3 patch series "mm/huge_memory: vmf_insert_folio_*() and
   vmf_insert_pfn_pud() fixes" from David Hildenbrand addresses some warts
   which David noticed in the huge page code.  These were not known to be
   causing any issues at this time.
 
 - The 3 patch series "mm/damon: use alloc_migrate_target() for
   DAMOS_MIGRATE_{HOT,COLD" from SeongJae Park provides some cleanup and
   consolidation work in DAMON.
 
 - The 3 patch series "use vm_flags_t consistently" from Lorenzo Stoakes
   uses vm_flags_t in places where we were inappropriately using other
   types.
 
 - The 3 patch series "mm/memfd: Reserve hugetlb folios before
   allocation" from Vivek Kasireddy increases the reliability of large page
   allocation in the memfd code.
 
 - The 14 patch series "mm: Remove pXX_devmap page table bit and pfn_t
   type" from Alistair Popple removes several now-unneeded PFN_* flags.
 
 - The 5 patch series "mm/damon: decouple sysfs from core" from SeongJae
   Park implememnts some cleanup and maintainability work in the DAMON
   sysfs layer.
 
 - The 5 patch series "madvise cleanup" from Lorenzo Stoakes does quite a
   lot of cleanup/maintenance work in the madvise() code.
 
 - The 4 patch series "madvise anon_name cleanups" from Vlastimil Babka
   provides additional cleanups on top or Lorenzo's effort.
 
 - The 11 patch series "Implement numa node notifier" from Oscar Salvador
   creates a standalone notifier for NUMA node memory state changes.
   Previously these were lumped under the more general memory on/offline
   notifier.
 
 - The 6 patch series "Make MIGRATE_ISOLATE a standalone bit" from Zi Yan
   cleans up the pageblock isolation code and fixes a potential issue which
   doesn't seem to cause any problems in practice.
 
 - The 5 patch series "selftests/damon: add python and drgn based DAMON
   sysfs functionality tests" from SeongJae Park adds additional drgn- and
   python-based DAMON selftests which are more comprehensive than the
   existing selftest suite.
 
 - The 5 patch series "Misc rework on hugetlb faulting path" from Oscar
   Salvador fixes a rather obscure deadlock in the hugetlb fault code and
   follows that fix with a series of cleanups.
 
 - The 3 patch series "cma: factor out allocation logic from
   __cma_declare_contiguous_nid" from Mike Rapoport rationalizes and cleans
   up the highmem-specific code in the CMA allocator.
 
 - The 28 patch series "mm/migration: rework movable_ops page migration
   (part 1)" from David Hildenbrand provides cleanups and
   future-preparedness to the migration code.
 
 - The 2 patch series "mm/damon: add trace events for auto-tuned
   monitoring intervals and DAMOS quota" from SeongJae Park adds some
   tracepoints to some DAMON auto-tuning code.
 
 - The 6 patch series "mm/damon: fix misc bugs in DAMON modules" from
   SeongJae Park does that.
 
 - The 6 patch series "mm/damon: misc cleanups" from SeongJae Park also
   does what it claims.
 
 - The 4 patch series "mm: folio_pte_batch() improvements" from David
   Hildenbrand cleans up the large folio PTE batching code.
 
 - The 13 patch series "mm/damon/vaddr: Allow interleaving in
   migrate_{hot,cold} actions" from SeongJae Park facilitates dynamic
   alteration of DAMON's inter-node allocation policy.
 
 - The 3 patch series "Remove unmap_and_put_page()" from Vishal Moola
   provides a couple of page->folio conversions.
 
 - The 4 patch series "mm: per-node proactive reclaim" from Davidlohr
   Bueso implements a per-node control of proactive reclaim - beyond the
   current memcg-based implementation.
 
 - The 14 patch series "mm/damon: remove damon_callback" from SeongJae
   Park replaces the damon_callback interface with a more general and
   powerful damon_call()+damos_walk() interface.
 
 - The 10 patch series "mm/mremap: permit mremap() move of multiple VMAs"
   from Lorenzo Stoakes implements a number of mremap cleanups (of course)
   in preparation for adding new mremap() functionality: newly permit the
   remapping of multiple VMAs when the user is specifying MREMAP_FIXED.  It
   still excludes some specialized situations where this cannot be
   performed reliably.
 
 - The 3 patch series "drop hugetlb_free_pgd_range()" from Anthony Yznaga
   switches some sparc hugetlb code over to the generic version and removes
   the thus-unneeded hugetlb_free_pgd_range().
 
 - The 4 patch series "mm/damon/sysfs: support periodic and automated
   stats update" from SeongJae Park augments the present
   userspace-requested update of DAMON sysfs monitoring files.  Automatic
   update is now provided, along with a tunable to control the update
   interval.
 
 - The 4 patch series "Some randome fixes and cleanups to swapfile" from
   Kemeng Shi does what is claims.
 
 - The 4 patch series "mm: introduce snapshot_page" from Luiz Capitulino
   and David Hildenbrand provides (and uses) a means by which debug-style
   functions can grab a copy of a pageframe and inspect it locklessly
   without tripping over the races inherent in operating on the live
   pageframe directly.
 
 - The 6 patch series "use per-vma locks for /proc/pid/maps reads" from
   Suren Baghdasaryan addresses the large contention issues which can be
   triggered by reads from that procfs file.  Latencies are reduced by more
   than half in some situations.  The series also introduces several new
   selftests for the /proc/pid/maps interface.
 
 - The 6 patch series "__folio_split() clean up" from Zi Yan cleans up
   __folio_split()!
 
 - The 7 patch series "Optimize mprotect() for large folios" from Dev
   Jain provides some quite large (>3x) speedups to mprotect() when dealing
   with large folios.
 
 - The 2 patch series "selftests/mm: reuse FORCE_READ to replace "asm
   volatile("" : "+r" (XXX));" and some cleanup" from wang lian does some
   cleanup work in the selftests code.
 
 - The 3 patch series "tools/testing: expand mremap testing" from Lorenzo
   Stoakes extends the mremap() selftest in several ways, including adding
   more checking of Lorenzo's recently added "permit mremap() move of
   multiple VMAs" feature.
 
 - The 22 patch series "selftests/damon/sysfs.py: test all parameters"
   from SeongJae Park extends the DAMON sysfs interface selftest so that it
   tests all possible user-requested parameters.  Rather than the present
   minimal subset.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaIqcCgAKCRDdBJ7gKXxA
 jkVBAQCCn9DR1QP0CRk961ot0cKzOgioSc0aA03DPb2KXRt2kQEAzDAz0ARurFhL
 8BzbvI0c+4tntHLXvIlrC33n9KWAOQM=
 =XsFy
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
63eb28bb14 ARM:
- Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.
 
 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.
 
 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.
 
 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.
 
 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.
 
 - Convert more system register sanitization to the config-driven
   implementation.
 
 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.
 
 - Various cleanups and minor fixes.
 
 LoongArch:
 
 - Add stat information for in-kernel irqchip
 
 - Add tracepoints for CPUCFG and CSR emulation exits
 
 - Enhance in-kernel irqchip emulation
 
 - Various cleanups.
 
 RISC-V:
 
 - Enable ring-based dirty memory tracking
 
 - Improve perf kvm stat to report interrupt events
 
 - Delegate illegal instruction trap to VS-mode
 
 - MMU improvements related to upcoming nested virtualization
 
 s390x
 
 - Fixes
 
 x86:
 
 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC,
   PIC, and PIT emulation at compile time.
 
 - Share device posted IRQ code between SVM and VMX and
   harden it against bugs and runtime errors.
 
 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).
 
 - For MMIO stale data mitigation, track whether or not a vCPU has access to
   (host) MMIO based on whether the page tables have MMIO pfns mapped; using
   VFIO is prone to false negatives
 
 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.
 
 - Recalculate all MSR intercepts from scratch on MSR filter changes,
   instead of maintaining shadow bitmaps.
 
 - Advertise support for LKGS (Load Kernel GS base), a new instruction
   that's loosely related to FRED, but is supported and enumerated
   independently.
 
 - Fix a user-triggerable WARN that syzkaller found by setting the vCPU
   in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU
   into VMX Root Mode (post-VMXON).  Trying to detect every possible path
   leading to architecturally forbidden states is hard and even risks
   breaking userspace (if it goes from valid to valid state but passes
   through invalid states), so just wait until KVM_RUN to detect that
   the vCPU state isn't allowed.
 
 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can access
   APERF/MPERF.  This has many caveats (APERF/MPERF cannot be zeroed
   on vCPU creation or saved/restored on suspend and resume, or preserved
   over thread migration let alone VM migration) but can be useful whenever
   you're interested in letting Linux guests see the effective physical CPU
   frequency in /proc/cpuinfo.
 
 - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
   created, as there's no known use case for changing the default
   frequency for other VM types and it goes counter to the very reason
   why the ioctl was added to the vm file descriptor.  And also, there
   would be no way to make it work for confidential VMs with a "secure"
   TSC, so kill two birds with one stone.
 
 - Dynamically allocation the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU
   doesn't use the list).
 
 - Extract many of KVM's helpers for accessing architectural local APIC
   state to common x86 so that they can be shared by guest-side code for
   Secure AVIC.
 
 - Various cleanups and fixes.
 
 x86 (Intel):
 
 - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
   Failure to honor FREEZE_IN_SMM can leak host state into guests.
 
 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent
   L1 from running L2 with features that KVM doesn't support, e.g. BTF.
 
 x86 (AMD):
 
 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty
   much a static condition and therefore should never happen, but still).
 
 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
 
 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.
 
 - Extend enable_ipiv module param support to SVM, by simply leaving
   IsRunning clear in the vCPU's physical ID table entry.
 
 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.
 
 - Request GA Log interrupts if and only if the target vCPU is blocking,
   i.e. only if KVM needs a notification in order to wake the vCPU.
 
 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.
 
 - Accept any SNP policy that is accepted by the firmware with respect to
   SMT and single-socket restrictions.  An incompatible policy doesn't put
   the kernel at risk in any way, so there's no reason for KVM to care.
 
 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
   use WBNOINVD instead of WBINVD when possible for SEV cache maintenance.
 
 - When reclaiming memory from an SEV guest, only do cache flushes on CPUs
   that have ever run a vCPU for the guest, i.e. don't flush the caches for
   CPUs that can't possibly have cache lines with dirty, encrypted data.
 
 Generic:
 
 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large
   numbers of VMs.  Such use cases typically don't actually use irqbypass,
   but eliminating the pointless registration is a future problem to
   solve as it likely requires new uAPI.
 
 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.
 
 - Decouple device posted IRQs from VFIO device assignment, as binding a VM
   to a VFIO group is not a requirement for enabling device posted IRQs.
 
 - Clean up and document/comment the irqfd assignment code.
 
 - Disallow binding multiple irqfds to an eventfd with a priority waiter,
   i.e.  ensure an eventfd is bound to at most one irqfd through the entire
   host, and add a selftest to verify eventfd:irqfd bindings are globally
   unique.
 
 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
   related to private <=> shared memory conversions.
 
 - Drop guest_memfd's .getattr() implementation as the VFS layer will call
   generic_fillattr() if inode_operations.getattr is NULL.
 
 - Fix issues with dirty ring harvesting where KVM doesn't bound the
   processing of entries in any way, which allows userspace to keep KVM
   in a tight loop indefinitely.
 
 - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking,
   now that KVM no longer uses assigned_device_count as a heuristic for
   either irqbypass usage or MDS mitigation.
 
 Selftests:
 
 - Fix a comment typo.
 
 - Verify KVM is loaded when getting any KVM module param so that attempting
   to run a selftest without kvm.ko loaded results in a SKIP message about
   KVM not being loaded/enabled (versus some random parameter not existing).
 
 - Skip tests that hit EACCES when attempting to access a file, and rpint
   a "Root required?" help message.  In most cases, the test just needs to
   be run with elevated permissions.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU
 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm
 YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC
 j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN
 zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q
 IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw==
 =j2At
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Linus Torvalds
a26321ee4c hardening fixes for v6.17-rc1
- staging: media: atomisp: Fix stack buffer overflow in gmin_get_var_int()
   I was asked to carry this fix, so here it is. :)
 
 - fortify: Fix incorrect reporting of read buffer size
 
 - kstack_erase: Fix missed export of renamed KSTACK_ERASE_CFLAGS
 
 - compiler_types: Provide __no_kstack_erase to disable coverage only on Clang
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIlnHwAKCRA2KwveOeQk
 u7tdAQCWoq7YUp1ee2RxYt2UdRhwlMfPE4cYrC1E9GHBA3fRnQD/QMvQ/EJ5eb0Y
 u4vO3woSKkMxu4VmZPCzmT0mRNo/kAA=
 =RhhL
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.17-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:
 "Notably, this contains the fix for for the GCC __init mess I created
  with the kstack_erase annotations.

   - staging: media: atomisp: Fix stack buffer overflow in
     gmin_get_var_int().

     I was asked to carry this fix, so here it is. :)

   - fortify: Fix incorrect reporting of read buffer size

   - kstack_erase: Fix missed export of renamed KSTACK_ERASE_CFLAGS

   - compiler_types: Provide __no_kstack_erase to disable coverage only
     on Clang"

* tag 'hardening-v6.17-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler_types: Provide __no_kstack_erase to disable coverage only on Clang
  fortify: Fix incorrect reporting of read buffer size
  kstack_erase: Fix missed export of renamed KSTACK_ERASE_CFLAGS
  staging: media: atomisp: Fix stack buffer overflow in gmin_get_var_int()
2025-07-29 20:49:58 -07:00
Linus Torvalds
98e8f2c0e0 x86/platform changes for v6.17:
This tree adds support for the AMD hardware feedback interface (HFI),
 by Perry Yuan.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIdwERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gYIRAAueF4R6gXv2luxwZc5xDHUp0z3W2nkFWB
 mpD329mqewy79fPusGRMjwxwuT7/EzIIfVkjOAi21RnwkoU/L6kBdIsn/ZGMTUpG
 +8zFnjBsdAb2Nm27LZ+W6nTxHn+ZiMs1U2AUqYC+F/sTtILA7LKt0aufgRrw028I
 aZKHOvHoYRYxgQEgcaCjBXXyUsw4zetIV8i2fj+rflutXL5tLSSgg9Pyh/dNU8JX
 MfWHxC2J79mYc8sLj2Pc0H6wuMPDCZ7uzBq/uieuZjhDouSdV9oDFoc3bNwYpxZI
 7CBqg8qcnDY3HFnbT7Hwv9O0+Kbl3+714N5P1J0Lb1+ovaRazdLCA9jenRu+lXIp
 7OehfOQoxw0XbWfESvEsSmMEjiyxIbcuwyu9X+9asY8bybD21s1gzbQYOGcBD4sf
 YjbGObnGqqS2GWe6LayU6HKs6kMWQfS+8xLlvWtnT4KNPwVOT1D41Fiy8lYQMXd1
 MFX1vRxLdtMtjVrotq4UmENdK/t7fd2BRvDw3wOjSfNynIf55XFRuqR7RTCLQivJ
 Z8ZF390kb1JSmBJdNZ1Mj4XUT5fTcimLra2qSTZ2dTDO23b0IiTlgoDP1LLbYYDL
 lRGfMViwO4Kinm/nerzELv/VAFMFm/DvHmWuGj/vaRTS3r7iGb3+A5SFe4JS0Utz
 BWoqrbrAWCg=
 =h01B
 -----END PGP SIGNATURE-----

Merge tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform updates from Ingo Molnar:
 "This adds support for the AMD hardware feedback interface (HFI), by
  Perry Yuan"

* tag 'x86-platform-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/itmt: Add debugfs file to show core priorities
  platform/x86/amd: hfi: Add debugfs support
  platform/x86/amd: hfi: Set ITMT priority from ranking data
  cpufreq/amd-pstate: Disable preferred cores on designs with workload classification
  x86/process: Clear hardware feedback history for AMD processors
  platform/x86: hfi: Add power management callback
  platform/x86: hfi: Add online and offline callback support
  platform/x86: hfi: Init per-cpu scores for each class
  platform/x86: hfi: Parse CPU core ranking data from shared memory
  platform/x86: hfi: Introduce AMD Hardware Feedback Interface Driver
  x86/msr-index: Add AMD workload classification MSRs
  MAINTAINERS: Add maintainer entry for AMD Hardware Feedback Driver
  Documentation/x86: Add AMD Hardware Feedback Interface documentation
2025-07-29 20:05:06 -07:00
Linus Torvalds
0c23929f35 x86 FPU changes for v6.17:
- Most of the changes are related to the implementation of
    CET supervisor state support for guests, and its preparatory
    changes. (Chao Gao)
 
  - Improve/fix the debug output for unexpected FPU exceptions
    (Dave Hansen)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIdA8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hCoA//f+3tKqUN3tAmiWhkirJmWYdFxUA4gjUU
 xMiTtMarRN10dra2knZAIKSbK7XwVW/tlC9PRD2zTp7vHy/ze2bNI0kq2Yvnpigj
 WPVB9wv4yL6YuiP7Ppklse1ZMAyBCqTcCMQTAdOslC7SINCbhGunrkWjYJnJtU0i
 9BjQj4rEbllI8L7zBlY+DAxQtqKqMRblRIqnUwkPn7DHS4+84ynYpPAiDmenTQIC
 oB9b93SNlISGpqFrtto5cFSaAKcPdqPI+0/ByxVSWkQX0K2kpW6/YvExmyzVLaky
 ACrzwdbC0bTttqKCukXVz0AHVjGReIT+PWzbax4ixzJI3WYM7+pAtB+R2qPn6j/s
 70RJ28amuy8WjIqmGMnh+UAg3ISnZXhbEOdGLIe88xr5rpDaXhxUz3J/BaBAlN6u
 Vz7+H/Py7Q2XymOrRpFUyRoVZKY6tzMI7pfmpXPd8J2NoGiJ/yPtY3H7do3W2B8Z
 6etjaNSpj1vIo4rUdwi37QVkYSdjE2G4dd2wk6wGI+AK45kzkag79HhGE+RqJ6bM
 8YfCGgupB7WCOUkYtl4k/0rLxcHDyKeIPiglXiqHXC4tw0Vn/XGSdhrog/eCb5+d
 GJZ3iLSlaxSwZV+PCwprgtKR+QfPFzYQjGwIO248Mw/IwTQQERo5+tYzjWd67E7H
 dfIpXzHnj2U=
 =D6rO
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FPU updates from Ingo Molnar:

 - Most of the changes are related to the implementation of CET
   supervisor state support for guests, and its preparatory changes
   (Chao Gao)

 - Improve/fix the debug output for unexpected FPU exceptions (Dave
   Hansen)

* tag 'x86-fpu-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Delay instruction pointer fixup until after warning
  x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature
  x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set
  x86/fpu: Remove xfd argument from __fpstate_reset()
  x86/fpu: Initialize guest fpstate and FPU pseudo container from guest defaults
  x86/fpu: Initialize guest FPU permissions from guest defaults
  x86/fpu/xstate: Differentiate default features for host and guest FPUs
2025-07-29 19:34:17 -07:00
Linus Torvalds
4dd39ddeb6 Add user-space CPUID faulting support for AMD CPUs.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIcjMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jG+g//ajTp6nL/rKOBkCB9JEyOY6QnABWq/+ts
 Cv65XvaImPTAwgrTxRH3RdGTs+LnKS8jAIMI1Vne2SqqR2gU69Vr7HsQtg5z0r9w
 NRxBTHJ27LkHgT/SNEa6t5iQgWa+g4wDDk/OIm8rBztW29SgCxjJxx88xvuIzjQI
 Ae1fiZ285dViiykmpOmBAiq/IW4U7iXi2AXEvX+TG2Nq2LcIYTQDZBMsykhEHVBq
 87BgVNR5zxk7XEvr5OwDWEUJv0aJBic5oW/ogh575sClO7PtiG/wE3/VmIZz5nWP
 RHe2BR4w4KRbGPHySLLBL6Gg2Gv0rjF4U6sEGUtWVbQyiy0TZbgxk6qz1c749EYR
 Oq8yVzF1zZ0nMNai/0armAkqhVac2Vlk1R7wVcJgp5iVIoJ+hm0eIXrkfLmJulED
 W/iiWemT2B18ujryFdhTLDHR7PXJoe8o9Pqb3MY7hxd7C0Qu3pEj6Ul0YoI4NaxQ
 Ic09ftjVt/arWa3MsHTqi1I8BEW0TiMfJG+xLqo1afs52bTVnoc4FEAkPJX2/Liv
 NMABHh3nxdm55rtXkHPRqt9pUDGbh4dT9N6ofZz7x2hiBtiaY0BYYMJu2bixLqeH
 XSf9QdQLkoLc7LFFX1IA9yael1pnTvBjJki25jSeKjCmvADLUFuJ7k4DbSDyf5SF
 MzmYLnhprfM=
 =7q4q
 -----END PGP SIGNATURE-----

Merge tag 'x86-cpu-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu update from Ingo Molnar:
 "Add user-space CPUID faulting support for AMD CPUs"

* tag 'x86-cpu-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add CPUID faulting support
2025-07-29 19:22:21 -07:00
Linus Torvalds
1645f6ab96 Miscellaneous x86 cleanups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIcXkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ipQBAAhDNnPuEuP1KG1+TptXg2rLfacKBHM2wL
 f3vrx0p5nLj+LwwRvCazdIX9yOw/VEIVzXoRcjeoWTn4JvzCIjYA+E8uCtU+wBOs
 tLrbTOpPPctSxZf3TlKhv3BgXKKtkZ1BCmdd5sxdOgFTdBghHJ3GsC2WgzVm/6aP
 GOl/T0mXYLoEaIbJpXYLRUvSdhTmkprIBR9pi2mTPhui+KcYEuF5gKr+B0dHnOKs
 v59Be+sktuROJJa4Zmm0CyABf9s+Ho3k7sXM5BtaA62RLqLcXGDxzH4zF2rass4z
 IRYy1mhJZRzlu7P3PpMy7fb5wYvwpyFqq+gK7eUuxLvDlmxctriwL1kIzibzZn+8
 4T57rhOtxEQujVSwVrTYaX4E7czjWLHedIeXmcAoV+rwUqYBPkADPq9ad+pzc/wy
 DPEb11HF5VxBpKzvTn+RbfJhl9XRJoSuLSaBgIXrCyBjNhc69JDYe6ZP6jCjHbz0
 yrFn/QuI4cTv66Ig9b9tzQxKk5BKXyqFtj2l+L22zamYKBCiQWTpczhlqmbOZ6Bw
 2hJU1l/8yHt70zZAsfmjDFB8ycJsKegDATJWkVDMme6Z071z8wPElOEso3VlayJg
 Y84gKs83W+2tSGa2aq2Mv5ZRRnEnum9NM6dW8RfOZRSWtfLOniJj1qoXQwSJc2n/
 keahYvCNef4=
 =C/ds
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Miscellaneous x86 cleanups"

* tag 'x86-cleanups-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Move apic_update_irq_cfg() call to apic_update_vector()
  x86/mm: Remove duplicated __PAGE_KERNEL(_EXEC) definitions
2025-07-29 19:00:35 -07:00
Kees Cook
f627b51aaa compiler_types: Provide __no_kstack_erase to disable coverage only on Clang
In order to support Clang's stack depth tracking (for Linux's kstack_erase
feature), the coverage sanitizer needed to be disabled for __init (and
__head) section code. Doing this universally (i.e. for GCC too) created
a number of unexpected problems, ranging from changes to inlining logic
to failures to DCE code on earlier GCC versions.

Since this change is only needed for Clang, specialize it so that GCC
doesn't see the change as it isn't needed there (the GCC implementation
of kstack_erase uses a GCC plugin that removes stack depth tracking
instrumentation from __init sections during a late pass in the IR).

Successfully build and boot tested with GCC 12 and Clang 22.

Fixes: 381a38ea53 ("init.h: Disable sanitizer coverage for __init and __head")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507270258.neWuiXLd-lkp@intel.com/
Reported-by: syzbot+5245cb609175fb6e8122@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6888d004.a00a0220.26d0e1.0004.GAE@google.com/
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20250729234055.it.233-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-29 17:19:35 -07:00
Linus Torvalds
14bed9bc81 - Map the SNP calling area pages too so that OVMF EFI fw can issue SVSM
calls properly with the goal of implementing EFI variable store in the
   SVSM - a component which is trusted by the guest, vs in the firmware, which
   is not
 
 - Allow the kernel to handle #VC exceptions from EFI runtime services
   properly when running as a SNP guest
 
 - Rework and cleanup the SNP guest request issue glue code a bit
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiH13gACgkQEsHwGGHe
 VUr88A//fIbR7eaz7QRiHq32S57NOpyOAciYrGsBrWSo1BLSFcrelYG4RTnzaKzR
 ACVr2yALoeoZooH7gtPgjt7554xWJHA9DR9Ln2YPGd5a2Np8fknY0Uu1MGFVIorC
 4z2u1EATlsB0I/nCh/LboryVxFN4C+qRKRk7iJ7wibdJ15zguc0T/P5lU8gY1eB8
 0NZ2e0T8QnpjIc8cx/XSYXDXIwvOJ5rX36Xm5/g6A/vPubLy1UO0hkBDGfVh+2WG
 dt8T+szidtqru8RQ522jW/3R/ct8iZa0U8Cp9QDdwwcQC3jBvo/xyIv5K4ueDEEI
 J0KfcIKn5zbDeQbBHMw5a9XvPshwHKQIUjY83JfSsviZ1yVseQEQHeJOE6mDn2Mj
 QeCWuqtwMaEoElhNX5xhe9p60KID8VoBJqB+bb1bgbN8sPeYoHc8f9p13XJaU1Mo
 hV0dwlpFwCaxCZgWdtxDVji9mmvzaUT4O1QEO88AdfhDNMa+b/T5L0dJb1gnZaUY
 rQ6ePImHh9nXRtJncfK3UsGmSE6HPc4O7dyV83IAcniGTgQycIYlOIUzkUbpF+wJ
 advBv5Zhx2xCOBDI1ucpHNWXCCe99YVE5GeaLjq6DMLgD0HdGnXqrCw4kOluZDBQ
 Xoy07x1XANQVLk0xQ5Bf1MsOiztbCZG9Rvb2dN9lCA06W5v+0MA=
 =KRgO
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Map the SNP calling area pages too so that OVMF EFI fw can issue SVSM
   calls properly with the goal of implementing EFI variable store in
   the SVSM - a component which is trusted by the guest, vs in the
   firmware, which is not

 - Allow the kernel to handle #VC exceptions from EFI runtime services
   properly when running as a SNP guest

 - Rework and cleanup the SNP guest request issue glue code a bit

* tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Let sev_es_efi_map_ghcbs() map the CA pages too
  x86/sev/vc: Fix EFI runtime instruction emulation
  x86/sev: Drop unnecessary parameter in snp_issue_guest_request()
  x86/sev: Document requirement for linear mapping of guest request buffers
  x86/sev: Allocate request in TSC_INFO_REQ on stack
  virt: sev-guest: Contain snp_guest_request_ioctl in sev-guest
2025-07-29 17:18:46 -07:00
Linus Torvalds
bb78c145f7 - Add helpers for WB{NO,}INVD with the purpose of using them in KVM and thus
diminish the number of invalidations needed. With preceding cleanups, as
   always
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiHqhkACgkQEsHwGGHe
 VUo0NxAAs+1XZlmqu1ZytjIWJRn/fz5EMFyQmMt2DofrghpTjuAC2Y3PsoXyGQ4k
 AUumIEM5afdJeBBPsF7b5ZMQ61tk4FJMJlxu/56wQrZn0MpSxWDks3aNh4qVCtXJ
 +AJJpHW4ljMw5hIdf4Uk/lOJvvhHPFLAcjQZUPffV4baQvE9glC03xNl0O6UUJ9/
 YIxHQcjIYKqPeQqXc2IYgk+03GFXlufLWjuVgy6tqSaNrv7bxyP4J6GSIOLE9uoF
 wbjAsYo/lHEhLAfwPXXiUHngKSI0lbVdc1pw82Hca0vChl+9uUNpmU0niN4XyDR8
 j8LCnbVzAJZRcFVU2sgOq6paFD44mnyPlC37/wq5dDJ6o0eLccGj/+j0zxiMb6Mc
 EibUSPvaAF/I8VzYWQrM/5L4Hvx6BaqQY2gTo7IBk3DW68cv4VAELIMvcAvS7XbR
 kwrOUuB/qM2mKwWP2pKRVMHzvZNJqgDaObteH2IdeFRbspWyLcFBe7JUMGdxVMhH
 bPOvv/n1rdsrVUaFG/2vG5M2dWAzpZku65x0YsBkiGorq+9gbRGWILStyVwM+IA5
 IFZ2wIOEjQ2WSU+1kUAbUfXDQToCgzJN9Kk2G0Y4L6LX6aEqL+wvQ25KlqtiSfdg
 pSsiTKxIU+L46NoyzGcRsYslavat0LTM5rf0Y5kAV17NriSgWQ0=
 =kddf
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Add helpers for WB{NO,}INVD with the purpose of using them in KVM and
   thus diminish the number of invalidations needed. With preceding
   cleanups, as always

* tag 'x86_core_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs
  x86/lib: Add WBNOINVD helper functions
  x86/lib: Drop the unused return value from wbinvd_on_all_cpus()
  drm/gpu: Remove dead checks on wbinvd_on_all_cpus()'s return value
2025-07-29 16:55:29 -07:00
Linus Torvalds
91e60731dd TTY / Serial driver update for 6.17-rc1
Here is the big set of TTY and Serial driver updates for 6.17-rc1.
 Included in here is the following types of changes:
   - another cleanup round from Jiri for the 8250 serial driver and some
     other tty drivers, things are slowly getting better with our apis
     thanks to this work.  This touched many tty drivers all over the
     tree.
   - qcom_geni_serial driver update for new platforms and devices
   - 8250 quirk handling fixups
   - dt serial binding updates for different boards/platforms
   - other minor cleanups and fixes
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaIeWPw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymQHgCfR3KXFe7ElzXQqjJhtN/ej6IPijIAoI6k3g0m
 4UewNOIef9XpIAJ8zuoo
 =850o
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver updates from Greg KH:
 "Here is the big set of TTY and Serial driver updates for 6.17-rc1.
  Included in here is the following types of changes:

   - another cleanup round from Jiri for the 8250 serial driver and some
     other tty drivers, things are slowly getting better with our apis
     thanks to this work. This touched many tty drivers all over the
     tree.

   - qcom_geni_serial driver update for new platforms and devices

   - 8250 quirk handling fixups

   - dt serial binding updates for different boards/platforms

   - other minor cleanups and fixes

  All of these have been in linux-next with no reported issues"

* tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits)
  dt-bindings: serial: snps-dw-apb-uart: Allow use of a power-domain
  serial: 8250: fix panic due to PSLVERR
  dt-bindings: serial: samsung: add samsung,exynos2200-uart compatible
  vt: defkeymap: Map keycodes above 127 to K_HOLE
  vt: keyboard: Don't process Unicode characters in K_OFF mode
  serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms
  serial: qcom-geni: Enable PM runtime for serial driver
  serial: qcom-geni: move clock-rate logic to separate function
  serial: qcom-geni: move resource control logic to separate functions
  serial: qcom-geni: move resource initialization to separate function
  soc: qcom: geni-se: Enable QUPs on SA8255p Qualcomm platforms
  dt-bindings: qcom: geni-se: describe SA8255p
  dt-bindings: serial: describe SA8255p
  serial: 8250_dw: Fix typo "notifer"
  dt-bindings: serial: 8250: spacemit: set clocks property as required
  dt-bindings: serial: renesas: Document RZ/V2N SCIF
  serial: 8250_ce4100: Fix CONFIG_SERIAL_8250=n build
  tty: omit need_resched() before cond_resched()
  serial: 8250_ni: Reorder local variables
  serial: 8250_ni: Fix build warning
  ...
2025-07-29 10:11:01 -07:00
Paolo Bonzini
a10accaef4 Merge tag 'x86_core_for_kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
Immutable branch for KVM tree to put the KVM patches from

https://lore.kernel.org/r/20250522233733.3176144-1-seanjc@google.com

ontop.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-07-29 08:36:46 -04:00
Paolo Bonzini
89400f0687 Merge tag 'kvm-x86-apic-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM local APIC changes for 6.17

Extract many of KVM's helpers for accessing architectural local APIC state
to common x86 so that they can be shared by guest-side code for Secure AVIC.
2025-07-29 08:36:44 -04:00
Paolo Bonzini
d7f4aac280 Merge tag 'kvm-x86-mmu-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.17

 - Exempt nested EPT from the the !USER + CR0.WP logic, as EPT doesn't interact
   with CR0.WP.

 - Move the TDX hardware setup code to tdx.c to better co-locate TDX code
   and eliminate a few global symbols.

 - Dynamically allocation the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU doesn't
   use the list).
2025-07-29 08:36:43 -04:00
Paolo Bonzini
1a14928e2e Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.17

 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
   guest.  Failure to honor FREEZE_IN_SMM can bleed host state into the guest.

 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
   prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.

 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.

 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.

 - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
   drop the dedicated "shadow" bitmaps (and their awful "max" size defines).

 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR.

 - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
   loosely related to FRED, but is supported and enumerated independently.

 - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
   a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON).  Use
   the same approach KVM uses for dealing with "impossible" emulation when
   running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
   has architecturally impossible state.

 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
   APERF/MPERF (with many caveats).

 - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
   frequency is unsupported for VMs with a "secure" TSC, and there's no known
   use case for changing the default frequency for other VM types.
2025-07-29 08:36:43 -04:00
Paolo Bonzini
9de13951d5 Merge tag 'kvm-x86-no_assignment-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM VFIO device assignment cleanups for 6.17

Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking now
that KVM no longer uses assigned_device_count as a bad heuristic for "VM has
an irqbypass producer" or for "VM has access to host MMIO".
2025-07-29 08:36:42 -04:00
Paolo Bonzini
f05efcfe07 Merge tag 'kvm-x86-mmio-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM MMIO Stale Data mitigation cleanup for 6.17

Rework KVM's mitigation for the MMIO State Data vulnerability to track
whether or not a vCPU has access to (host) MMIO based on the MMU that will be
used when running in the guest.  The current approach doesn't actually detect
whether or not a guest has access to MMIO, and is prone to false negatives (and
to a lesser extent, false positives), as KVM_DEV_VFIO_FILE_ADD is optional, and
obviously only covers VFIO devices.
2025-07-29 08:36:28 -04:00
Paolo Bonzini
f02b1bcc73 Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM IRQ changes for 6.17

 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large numbers
   of VMs.  Such use cases typically don't actually use irqbypass, but
   eliminating the pointless registration is a future problem to solve as it
   likely requires new uAPI.

 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.

 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
   and PIT emulation at compile time.

 - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.

 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.

 - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.

 - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
   clear in the vCPU's physical ID table entry.

 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.

 - Dedup x86's device posted IRQ code, as the vast majority of functionality
   can be shared verbatime between SVM and VMX.

 - Harden the device posted IRQ code against bugs and runtime errors.

 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).

 - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
   only if KVM needs a notification in order to wake the vCPU.

 - Decouple device posted IRQs from VFIO device assignment, as binding a VM to
   a VFIO group is not a requirement for enabling device posted IRQs.

 - Clean up and document/comment the irqfd assignment code.

 - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
   ensure an eventfd is bound to at most one irqfd through the entire host,
   and add a selftest to verify eventfd:irqfd bindings are globally unique.
2025-07-29 08:35:46 -04:00
Linus Torvalds
9669b2499e platform-drivers-x86 for v6.17-1
Highlights
 
  - alienware: Add more precise labels to fans
 
  - amd/hsmp: Improve misleading probe errors (make the legacy driver
              aware when HSMP is supported through the ACPI driver)
 
  - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list
 
  - drm/xe: Correct (D)VSEC information to support PMT crashlog feature
 
  - fujitsu: Clamp charge threshold instead of returning an error
 
  - ideapad: Expore change types
 
  - intel/pmt:
 
    - Add PMT Discovery driver
 
    - Add API to retrieve telemetry regions by feature
 
    - Fix crashlog NULL access
 
    - Support Battlemage GPU (BMG) crashlog
 
  - intel/vsec:
 
    - Add Discovery feature
 
    - Add feature dependency support using device links
 
  - lenovo:
 
    - Move lenovo drivers under drivers/platform/x86/lenovo/
 
    - Add WMI drivers for Lenovo Gaming series
 
    - Improve DMI handling
 
  - oxpec:
 
    - Add support for OneXPlayer X1 Mini Pro (Strix Point variant)
 
    - Fix EC registers for G1 AMD
 
  - samsung-laptop: Expose change types
 
  - wmi: Fix WMI device naming issue (same GUID corner cases)
 
  - x86-android-tables: Add ovc-capacity-table to generic battery nodes
 
  - Miscellaneous cleanups / refactoring / improvements
 
 The following is an automated shortlog grouped by driver:
 
 Add Lenovo Capability Data 01 WMI Driver:
  - Add Lenovo Capability Data 01 WMI Driver
 
 Add Lenovo Gamezone WMI Driver:
  - Add Lenovo Gamezone WMI Driver
 
 Add Lenovo Other Mode WMI Driver:
  - Add Lenovo Other Mode WMI Driver
 
 Add lenovo-wmi-* driver Documentation:
  - Add lenovo-wmi-* driver Documentation
 
 Add Lenovo WMI Events Driver:
  - Add Lenovo WMI Events Driver
 
 Add lenovo-wmi-helpers:
  - Add lenovo-wmi-helpers
 
 alienware-wmi-wmax:
  -  Add appropriate labels to fans
 
 amd/hsmp:
  -  Enhance the print messages to prevent confusion
  -  Use IS_ENABLED() instead of IS_REACHABLE()
 
 amd: pmc:
  -  Add Lenovo Yoga 6 13ALC6 to pmc quirk list
 
 arm64: lenovo-yoga-c630:
  -  use the auxiliary device creation helper
 
 dell_rbu:
  -  Remove unused struct
 
 dell-uart-backlight:
  -  Use blacklight power constant
 
 docs:
  -  Add ABI documentation for intel_pmt feature directories
 
 Documentation: ABI:
  -  Update WMI device paths in ABI docs
 
 drm/xe:
  -  Correct BMG VSEC header sizing
  -  Correct the rev value for the DVSEC entries
 
 fujitsu:
  -  clamp charge_control_end_threshold values to 50
  -  use unsigned int for kstrtounit
 
 ideapad:
  -  Expose charge_types
 
 intel/pmt:
  -  Add PMT Discovery driver
  -  add register access helpers
  -  correct types
  -  decouple sysfs and namespace
 
 intel/pmt/discovery:
  -  fix format string warning
  -  Fix size_t specifiers for 32-bit
  -  Get telemetry attributes
 
 intel/pmt:
  -  fix a crashlog NULL pointer access
  -  fix build dependency for kunit test
  -  KUNIT test for PMT Enhanced Discovery API
  -  mutex clean up
  -  refactor base parameter
  -  re-order trigger logic
  -  support BMG crashlog
 
 intel/pmt/telemetry:
  -  Add API to retrieve telemetry regions by feature
 
 intel/pmt:
  -  use a version struct
  -  use guard(mutex)
  -  white space cleanup
 
 intel_telemetry:
  -  Remove unused telemetry_*_events()
  -  Remove unused telemetry_[gs]et_sampling_period()
  -  Remove unused telemetry_raw_read_events()
 
 intel/tpmi:
  -  Get OOBMSM CPU mapping from TPMI
  -  Relocate platform info to intel_vsec.h
 
 intel/vsec:
  -  Add device links to enforce dependencies
  -  Add new Discovery feature
  -  Add private data for per-device data
  -  Create wrapper to walk PCI config space
  -  Set OOBMSM to CPU mapping
  -  Skip absent features during initialization
  -  Skip driverless features
 
 lenovo:
  -  gamezone needs "other mode"
 
 lenovo-yoga-tab2-pro-1380-fastcharger:
  -  Use devm_pinctrl_register_mappings()
 
 MAINTAINERS:
  -  Add link to documentation of Intel PMT ABI
 
 Move Lenovo files into lenovo subdir:
  - Move Lenovo files into lenovo subdir
 
 oxpec:
  -  Add support for OneXPlayer X1 Mini Pro (Strix Point)
  -  Fix turbo register for G1 AMD
 
 samsung-laptop:
  -  Expose charge_types
 
 silicom:
  -  remove unnecessary GPIO line direction check
 
 thinklmi:
  -  improved DMI handling
 
 thinkpad_acpi:
  -  Handle KCOV __init vs inline mismatches
 
 wmi:
  -  Fix WMI device naming issue
 
 x86-android-tablets:
  -  Add generic_lipo_4v2_battery info
  -  Add ovc-capacity-table info
 
 Merges:
  -  Merge branch 'fixes' into 'for-next'
  -  Merge branch 'fixes' into for-next
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaIdbygAKCRBZrE9hU+XO
 MbqTAQCqqczU2YXRnq7TIvw/yl40+scIKMXobjX0EEpmgqhlHwEAkWwjQG0ytS2j
 hzES5gog1xT6A4TIjVr0Up5MUj3crwU=
 =+TRi
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers from Ilpo Järvinen:

 - alienware: Add more precise labels to fans

 - amd/hsmp: Improve misleading probe errors (make the legacy driver
   aware when HSMP is supported through the ACPI driver)

 - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list

 - drm/xe: Correct (D)VSEC information to support PMT crashlog feature

 - fujitsu: Clamp charge threshold instead of returning an error

 - ideapad: Expore change types

 - intel/pmt:
     - Add PMT Discovery driver
     - Add API to retrieve telemetry regions by feature
     - Fix crashlog NULL access
     - Support Battlemage GPU (BMG) crashlog

 - intel/vsec:
     - Add Discovery feature
     - Add feature dependency support using device links

 - lenovo:
     - Move lenovo drivers under drivers/platform/x86/lenovo/
     - Add WMI drivers for Lenovo Gaming series
     - Improve DMI handling

 - oxpec:
     - Add support for OneXPlayer X1 Mini Pro (Strix Point variant)
     - Fix EC registers for G1 AMD

 - samsung-laptop: Expose change types

 - wmi: Fix WMI device naming issue (same GUID corner cases)

 - x86-android-tables: Add ovc-capacity-table to generic battery nodes

 - Miscellaneous cleanups / refactoring / improvements

* tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits)
  platform/x86: oxpec: Add support for OneXPlayer X1 Mini Pro (Strix Point)
  platform/x86: oxpec: Fix turbo register for G1 AMD
  platform/x86/intel/pmt: support BMG crashlog
  platform/x86/intel/pmt: use a version struct
  platform/x86/intel/pmt: refactor base parameter
  platform/x86/intel/pmt: add register access helpers
  platform/x86/intel/pmt: decouple sysfs and namespace
  platform/x86/intel/pmt: correct types
  platform/x86/intel/pmt: re-order trigger logic
  platform/x86/intel/pmt: use guard(mutex)
  platform/x86/intel/pmt: mutex clean up
  platform/x86/intel/pmt: white space cleanup
  drm/xe: Correct BMG VSEC header sizing
  drm/xe: Correct the rev value for the DVSEC entries
  platform/x86/intel/pmt: fix a crashlog NULL pointer access
  platform/x86: samsung-laptop: Expose charge_types
  platform/x86/amd: pmc: Add Lenovo Yoga 6 13ALC6 to pmc quirk list
  platform/x86: dell-uart-backlight: Use blacklight power constant
  platform/x86/intel/pmt: fix build dependency for kunit test
  platform/x86: lenovo: gamezone needs "other mode"
  ...
2025-07-28 23:21:28 -07:00
Linus Torvalds
8e736a2eea hardening updates for v6.17-rc1
- Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)
 
 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)
 
 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
 
 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)
 
 - Refactor and rename stackleak feature to support Clang
 
 - Add KUnit test for seq_buf API
 
 - Fix KUnit fortify test under LTO
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIfUkgAKCRA2KwveOeQk
 uypLAP92r6f47sWcOw/5B9aVffX6Bypsb7dqBJQpCNxI5U1xcAEAiCrZ98UJyOeQ
 JQgnXd4N67K4EsS2JDc+FutRn3Yi+A8=
 =+5Bq
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - Introduce and start using TRAILING_OVERLAP() helper for fixing
   embedded flex array instances (Gustavo A. R. Silva)

 - mux: Convert mux_control_ops to a flex array member in mux_chip
   (Thorsten Blum)

 - string: Group str_has_prefix() and strstarts() (Andy Shevchenko)

 - Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
   Kees Cook)

 - Refactor and rename stackleak feature to support Clang

 - Add KUnit test for seq_buf API

 - Fix KUnit fortify test under LTO

* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
  sched/task_stack: Add missing const qualifier to end_of_stack()
  kstack_erase: Support Clang stack depth tracking
  kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
  init.h: Disable sanitizer coverage for __init and __head
  kstack_erase: Disable kstack_erase for all of arm compressed boot code
  x86: Handle KCOV __init vs inline mismatches
  arm64: Handle KCOV __init vs inline mismatches
  s390: Handle KCOV __init vs inline mismatches
  arm: Handle KCOV __init vs inline mismatches
  mips: Handle KCOV __init vs inline mismatch
  powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
  configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
  configs/hardening: Enable CONFIG_KSTACK_ERASE
  stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
  stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
  stackleak: Rename STACKLEAK to KSTACK_ERASE
  seq_buf: Introduce KUnit tests
  string: Group str_has_prefix() and strstarts()
  kunit/fortify: Add back "volatile" for sizeof() constants
  acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
  ...
2025-07-28 17:16:12 -07:00
Ingo Molnar
5bf2f5119b Linux 6.16
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmiGmY4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGJtgH/3Ao101Cd+qBIQk6
 Ad1TrV6JfSnPMnbafjEYU6VjXyXKT4iIIxYDqndyxSt8fij5EIWMFveOVELkZhe9
 jDqlaODFrV7VYowVMAcVZCLEkQneommnbNaUaJK+odF+tPhkEbcHfubf/6Fd009q
 H0QB/EsExdjlLpwtkzM6yWnKss+1HA+wonw5TNbdj7+OE5vFemaonP2+lmO6EnCC
 PeiwkawLWA67hGonJFD4S6gp7KseqRUyQKLAbICDf6+86GueXI8QsIHhELyznfBG
 nN1DyhfxiPz1ICfvX7sIdPU+CuyQG0KCLX0iepNc0IGcJ40x8tLeXiHKNy/k+3Sf
 FBD2gmg=
 =HI0k
 -----END PGP SIGNATURE-----

Merge tag 'v6.16' into x86/cpu, to resolve conflict

Resolve overlapping context conflict between this upstream fix:

  d8010d4ba4 ("x86/bugs: Add a Transient Scheduler Attacks mitigation")

And this pending commit in tip:x86/cpu:

  65f55a3017 ("x86/CPU/AMD: Add CPUID faulting support")

  Conflicts:
	arch/x86/kernel/cpu/amd.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-07-28 07:17:38 +02:00
Kees Cook
381a38ea53 init.h: Disable sanitizer coverage for __init and __head
While __noinstr already contained __no_sanitize_coverage, it needs to
be added to __init and __head section markings to support the Clang
implementation of CONFIG_KSTACK_ERASE. This is to make sure the stack
depth tracking callback is not executed in unsupported contexts.

The other sanitizer coverage options (trace-pc and trace-cmp) aren't
needed in __head nor __init either ("We are interested in code coverage
as a function of a syscall inputs"[1]), so this is fine to disable for
them as well.

Link: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcov.c?h=v6.14#n179 [1]
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20250724055029.3623499-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-26 14:28:35 -07:00
Ryan Roberts
a9e056de66 mm: remove arch_flush_tlb_batched_pending() arch helper
Since commit 4b63491838 ("arm64/mm: Close theoretical race where stale
TLB entry remains valid"), all arches that use tlbbatch for reclaim
(arm64, riscv, x86) implement arch_flush_tlb_batched_pending() with a
flush_tlb_mm().

So let's simplify by removing the unnecessary abstraction and doing the
flush_tlb_mm() directly in flush_tlb_batched_pending().  This effectively
reverts commit db6c1f6f23 ("mm/tlbbatch: introduce
arch_flush_tlb_batched_pending()").

Link: https://lkml.kernel.org/r/20250609103132.447370-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24 19:12:32 -07:00
Kees Cook
8245d47cfa x86: Handle KCOV __init vs inline mismatches
GCC appears to have kind of fragile inlining heuristics, in the
sense that it can change whether or not it inlines something based on
optimizations. It looks like the kcov instrumentation being added (or in
this case, removed) from a function changes the optimization results,
and some functions marked "inline" are _not_ inlined. In that case,
we end up with __init code calling a function not marked __init, and we
get the build warnings I'm trying to eliminate in the coming patch that
adds __no_sanitize_coverage to __init functions:

WARNING: modpost: vmlinux: section mismatch in reference: xbc_exit+0x8 (section: .text.unlikely) -> _xbc_exit (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: real_mode_size_needed+0x15 (section: .text.unlikely) -> real_mode_blob_end (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: __set_percpu_decrypted+0x16 (section: .text.unlikely) -> early_set_memory_decrypted (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: memblock_alloc_from+0x26 (section: .text.unlikely) -> memblock_alloc_try_nid (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_set_root_pointer+0xc (section: .text.unlikely) -> x86_init (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_get_root_pointer+0x8 (section: .text.unlikely) -> x86_init (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: efi_config_table_is_usable+0x16 (section: .text.unlikely) -> xen_efi_config_table_is_usable (section: .init.text)

This problem is somewhat fragile (though using either __always_inline
or __init will deterministically solve it), but we've tripped over
this before with GCC and the solution has usually been to just use
__always_inline and move on.

For x86 this means forcing several functions to be inline with
__always_inline.

Link: https://lore.kernel.org/r/20250724055029.3623499-2-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-24 16:55:11 -07:00
FUJITA Tomonori
8c8efa93db x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
Add new ARCH_WARN_ASM macro for BUG/WARN assembly code sharing with
Rust to avoid the duplication.

No functional changes.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://lore.kernel.org/r/20250502094537.231725-2-fujita.tomonori@gmail.com
[ Fixed typo in macro parameter name. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-07-22 23:58:55 +02:00
Greg Kroah-Hartman
bcbef1e4a6 Linux 6.16-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmh9azkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7GIH/0lpQtHRl6N+q2Qs
 v75iG2ZouWyw2JlhUOHAToKU58MZqqTXLZzc8ZdY6fAd7DpXWKRGSDsyWVyLbUkt
 UKGzXEIJsHXYvw2QIPbhkY9gQBWpdZTh4tHztFyKb0QLn81qkibVP6ChOwSzOGa/
 xUyQ5v6yH+JvQlnQaCgy6hi7cMrLNSNZmuIjy0yc5Y153YPEtX5OUPO2PstpUx5r
 AuiOhU4ewW9QCe07X/Pk7tdn0T2Jg8Kwk1FViaM0RBUf/0GXGfsovIxpUP/eCyMc
 MA+9SXXLlDa/4Z8w3EsQYx6m2MnNmm0HPeevCmWqq3+Ocooik4si1BpzHfUaE6n/
 /0D8zBg=
 =NzEi
 -----END PGP SIGNATURE-----

Merge tag 'v6.16-rc7' into tty-next

We need the tty/serial fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-21 16:53:33 +02:00
Linus Torvalds
5f054ef2e0 hyperv-fixes for v6.16-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmh6uxMTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXsDqB/9EZrNSmMhoctjhBwrEjilRPaBHaD0f
 7AGbdvs78sZDC2FMQeeqVeUDgv7nhZoT9B/cjSNkAXF8LdtEXOpiYY0sqG+fO8c9
 2DHBLBBGbFg+/SI+2SLzgrg8ixplnEF2HPR3pIgpmvqOKJcBd6+LnlyXs3xuBhc/
 XCerEwjHDpC27YQf3ZFvgomH1rw4w0yENZaz/ztpca41GScmbBq6K9c0SAzherBJ
 IDcbSUHNkT9LoFdXG9gawwX++XrBsfsKwlfAX6lYuzj+rdY5uxJaqMBg5sIA5Huh
 aRLB9gXHQVABTqp73kG76ZtBdmMsd2S1JF0YdRxdsL91MQ1C0n7AW9iK
 =0Qh6
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Select use CONFIG_SYSFB only if EFI is enabled (Michael Kelley)

 - An assorted set of fixes to remove warnings for missing export.h
   header inclusion (Naman Jain)

 - An assorted set of fixes for when Linux run as the root partition
   for Microsoft Hypervisor (Mukesh Rathor, Nuno Das Neves, Stanislav
   Kinsburskii)

 - Fix the check for HYPERVISOR_CALLBACK_VECTOR (Naman Jain)

 - Fix fcopy tool to handle irregularities with size of ring buffer
   (Naman Jain)

 - Fix incorrect file path conversion in fcopy tool (Yasumasa Suenaga)

* tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  tools/hv: fcopy: Fix irregularities with size of ring buffer
  PCI: hv: Use the correct hypercall for unmasking interrupts on nested
  x86/hyperv: Expose hv_map_msi_interrupt()
  Drivers: hv: Use nested hypercall for post message and signal event
  x86/hyperv: Clean up hv_map/unmap_interrupt() return values
  x86/hyperv: Fix usage of cpu_online_mask to get valid cpu
  PCI: hv: Don't load the driver for baremetal root partition
  net: mana: Fix warnings for missing export.h header inclusion
  PCI: hv: Fix warnings for missing export.h header inclusion
  clocksource: hyper-v: Fix warnings for missing export.h header inclusion
  x86/hyperv: Fix warnings for missing export.h header inclusion
  Drivers: hv: Fix warnings for missing export.h header inclusion
  Drivers: hv: Fix the check for HYPERVISOR_CALLBACK_VECTOR
  tools/hv: fcopy: Fix incorrect file path conversion
  Drivers: hv: Select CONFIG_SYSFB only if EFI is enabled
2025-07-20 09:29:43 -07:00
Stanislav Kinsburskii
52a45f870e x86/hyperv: Expose hv_map_msi_interrupt()
Move some of the logic of hv_irq_compose_irq_message() into
hv_map_msi_interrupt(). Make hv_map_msi_interrupt() a globally-available
helper function, which will be used to map PCI interrupts when running
in the root partition.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1752261532-7225-3-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1752261532-7225-3-git-send-email-nunodasneves@linux.microsoft.com>
2025-07-15 06:24:16 +00:00
Nuno Das Neves
36c46e64c5 Drivers: hv: Use nested hypercall for post message and signal event
When running nested, these hypercalls must be sent to the L0 hypervisor
or VMBus will fail.

Remove hv_do_nested_hypercall() and hv_do_fast_nested_hypercall8()
altogether and open-code these cases, since there are only 2 and all
they do is add the nested bit.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1752261532-7225-2-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1752261532-7225-2-git-send-email-nunodasneves@linux.microsoft.com>
2025-07-15 06:24:16 +00:00
Nikolay Borisov
0877ad1c4e x86/mm: Remove duplicated __PAGE_KERNEL(_EXEC) definitions
__PAGE_KERNEL(_EXEC) is defined twice, just remove the superfluous set.

No functional changes.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250714170258.390175-1-nik.borisov@suse.com
2025-07-14 21:34:42 +02:00
Linus Torvalds
5d5d62298b - Update Kirill's email address
- Allow hugetlb PMD sharing only on 64-bit as it doesn't make a whole lotta
   sense on 32-bit
 
 - Add fixes for a misconfigured AMD Zen2 client which wasn't even supposed to
   run Linux
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhziVAACgkQEsHwGGHe
 VUoPLRAAqnv0D8pKO/UPUp05bOvKvEvYarK3Va4MV1QrqOgPIvabGbJOzYStU9+Q
 4FZ5ZCZbi0eV1sZuNP1Zk7Ryp/bYipR6gLX+jg06VTXXTrjKnUN3ofBLPQf0+fE9
 AZDShoSjS+6ifzt6BaUWW3uDMLOzwv50X/xwtLG+Nrprshs7HzfvJq2oFFQX6drQ
 kg7Cj9N8WNHl1kp6CVy2DXVRzv4VR9+yxeNfOCPOJiCVEmzRlMulPzQYWWagYidB
 +U+IYDJiG2p8YNL9aiCmnrRNpSfA4Podn8ZJVPKDwXSpmuUmfcLPur0c1Tjt97h0
 85ovsJs+RqBBzD3ixkbNSpdNLRBFVX7q5mx4n4+1DuR5ygZrBbDyjZce9gwY2YPh
 h1c2dnxxxkp9LAnBFJcaWiv8jzScRRbkqwHprBidkCS4plJiGhrD6MC78fMf8kE5
 i+dydBefrsYnBwe3ciyCZh/fCvPHk6OmegSdT1+0jlz2YJOGlD1uSPSMeE6YFFvW
 64R7MV3BLllBkpRx57zafx8tsdRiH9mZM6naltlcOcQV3JkHZhDJ5aNkfvBJpXw3
 RZtHHAG0noCGSMAhl/crQUkZOany2TdkDQn6SQpcM+iY/E0OQSH+/QM4Rw/s+05/
 FEtmkC2FqJ00RODhzSqKIwkKSgiwSCokR4pty5OJqBirUqDFNiw=
 =d3UC
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Update Kirill's email address

 - Allow hugetlb PMD sharing only on 64-bit as it doesn't make a whole
   lotta sense on 32-bit

 - Add fixes for a misconfigured AMD Zen2 client which wasn't even
   supposed to run Linux

* tag 'x86_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Update Kirill Shutemov's email address for TDX
  x86/mm: Disable hugetlb page table sharing on 32-bit
  x86/CPU/AMD: Disable INVLPGB on Zen2
  x86/rdrand: Disable RDSEED on AMD Cyan Skillfish
2025-07-13 10:41:19 -07:00
Neeraj Upadhyay
b95a9d3136 x86/apic: Rename 'reg_off' to 'reg'
Rename the 'reg_off' parameter of apic_{set|get}_reg() to 'reg' to
match other usages in apic.h.

No functional change intended.

Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: https://lore.kernel.org/r/20250709033242.267892-15-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:44 -07:00
Neeraj Upadhyay
17776e6c20 x86/apic: KVM: Move apic_test)vector() to common code
Move apic_test_vector() to apic.h in order to reuse it in the Secure AVIC
guest APIC driver in later patches to test vector state in the APIC
backing page.

No functional change intended.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250709033242.267892-14-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:43 -07:00
Neeraj Upadhyay
fe954bcd57 x86/apic: KVM: Move lapic set/clear_vector() helpers to common code
Move apic_clear_vector() and apic_set_vector() helper functions to
apic.h in order to reuse them in the Secure AVIC guest APIC driver
in later patches to atomically set/clear vectors in the APIC backing
page.

No functional change intended.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250709033242.267892-13-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:43 -07:00
Neeraj Upadhyay
3d3a9083da x86/apic: KVM: Move lapic get/set helpers to common code
Move the apic_get_reg(), apic_set_reg(), apic_get_reg64() and
apic_set_reg64() helper functions to apic.h in order to reuse them in the
Secure AVIC guest APIC driver in later patches to read/write registers
from/to the APIC backing page.

No functional change intended.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250709033242.267892-12-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:42 -07:00
Neeraj Upadhyay
39e81633f6 x86/apic: KVM: Move apic_find_highest_vector() to a common header
In preparation for using apic_find_highest_vector() in Secure AVIC
guest APIC driver, move it and associated macros to apic.h.

No functional change intended.

Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: https://lore.kernel.org/r/20250709033242.267892-11-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:41 -07:00
Sean Christopherson
dc98e3bd49 x86/apic: KVM: Deduplicate APIC vector => register+bit math
Consolidate KVM's {REG,VEC}_POS() macros and lapic_vector_set_in_irr()'s
open coded equivalent logic in anticipation of the kernel gaining more
usage of vector => reg+bit lookups.

Use lapic_vector_set_in_irr()'s math as using divides for both the bit
number and register offset makes it easier to connect the dots, and for at
least one user, fixup_irqs(), "/ 32 * 0x10" generates ever so slightly
better code with gcc-14 (shaves a whole 3 bytes from the code stream):

((v) >> 5) << 4:
  c1 ef 05           shr    $0x5,%edi
  c1 e7 04           shl    $0x4,%edi
  81 c7 00 02 00 00  add    $0x200,%edi

(v) / 32 * 0x10:
  c1 ef 05           shr    $0x5,%edi
  83 c7 20           add    $0x20,%edi
  c1 e7 04           shl    $0x4,%edi

Keep KVM's tersely named macros as "wrappers" to avoid unnecessary churn
in KVM, and because the shorter names yield more readable code overall in
KVM.

The new macros type cast the vector parameter to "unsigned int". This is
required from better code generation for cases where an "int" is passed
to these macros in KVM code.

int v;

((v) >> 5) << 4:

  c1 f8 05    sar    $0x5,%eax
  c1 e0 04    shl    $0x4,%eax

((v) / 32 * 0x10):

  85 ff       test   %edi,%edi
  8d 47 1f    lea    0x1f(%rdi),%eax
  0f 49 c7    cmovns %edi,%eax
  c1 f8 05    sar    $0x5,%eax
  c1 e0 04    shl    $0x4,%eax

((unsigned int)(v) / 32 * 0x10):

  c1 f8 05    sar    $0x5,%eax
  c1 e0 04    shl    $0x4,%eax

(v) & (32 - 1):

  89 f8       mov    %edi,%eax
  83 e0 1f    and    $0x1f,%eax

(v) % 32

  89 fa       mov    %edi,%edx
  c1 fa 1f    sar    $0x1f,%edx
  c1 ea 1b    shr    $0x1b,%edx
  8d 04 17    lea    (%rdi,%rdx,1),%eax
  83 e0 1f    and    $0x1f,%eax
  29 d0       sub    %edx,%eax

(unsigned int)(v) % 32:

  89 f8       mov    %edi,%eax
  83 e0 1f    and    $0x1f,%eax

Overall kvm.ko text size is impacted if "unsigned int" is not used.

Bin      Orig     New (w/o unsigned int)  New (w/ unsigned int)

lapic.o  28580        28772                 28580
kvm.o    670810       671002                670810
kvm.ko   708079       708271                708079

No functional change intended.

[Neeraj: Type cast vec macro param to "unsigned int", provide data
         in commit log on "unsigned int" requirement]

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: https://lore.kernel.org/r/20250709033242.267892-4-Neeraj.Upadhyay@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10 09:44:37 -07:00
Linus Torvalds
73d7cf0710 ARM:
- Remove the last leftovers of the ill-fated FPSIMD host state
   mapping at EL2 stage-1
 
 - Fix unexpected advertisement to the guest of unimplemented S2 base
   granule sizes
 
 - Gracefully fail initialising pKVM if the interrupt controller isn't
   GICv3
 
 - Also gracefully fail initialising pKVM if the carveout allocation
   fails
 
 - Fix the computing of the minimum MMIO range required for the host on
   stage-2 fault
 
 - Fix the generation of the GICv3 Maintenance Interrupt in nested mode
 
 x86:
 
 - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively
   being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM.
 
 - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of
   vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue.
 
 - Allow out-of-range/invalid Xen event channel ports when configuring IRQ
   routing, to avoid dictating a specific ioctl() ordering to userspace.
 
 - Conditionally reschedule when setting memory attributes to avoid soft
   lockups when userspace converts huge swaths of memory to/from private.
 
 - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest.
 
 - Add a missing field in struct sev_data_snp_launch_start that resulted in
   the guest-visible workarounds field being filled at the wrong offset.
 
 - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid
   VM-Fail on INVVPID.
 
 - Advertise supported TDX TDVMCALLs to userspace.
 
 - Pass SetupEventNotifyInterrupt arguments to userspace.
 
 - Fix TSC frequency underflow.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhurKgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNxHggApTP4vw+oOzfN7UoNmgR9XZMI1p2a
 R8AzQ1zDyVbEVWq3xTKvXtld+dKeO0yKB/XeI/1JLck1OiHxY57I3X6k5AnsurEr
 CBzeAhAjXivF8woMgmlP+30aqpomcPACdQm0gRnWkRDDJfXqSUas/iE/s9Ct1dT4
 4w3PtFLsSsU8vX/RttR+CqF1AQ6SeV/NRvA8hzPGMGZoQ2um74j4ZsM/3xh77Kdw
 Z2vOnZOIA4dk0074JjO/Yb9l00Ib4hn+MWG5jVJ+6i2HRRYd2knnB29apVS/ARdL
 X20j+LvtYj/jrPPdYwqjvxbIXyLbJrLCZyjKhfueN+rnisPNvzR+7YE4ZQ==
 =NduO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Many patches, pretty much all of them small, that accumulated while I
  was on vacation.

  ARM:

   - Remove the last leftovers of the ill-fated FPSIMD host state
     mapping at EL2 stage-1

   - Fix unexpected advertisement to the guest of unimplemented S2 base
     granule sizes

   - Gracefully fail initialising pKVM if the interrupt controller isn't
     GICv3

   - Also gracefully fail initialising pKVM if the carveout allocation
     fails

   - Fix the computing of the minimum MMIO range required for the host
     on stage-2 fault

   - Fix the generation of the GICv3 Maintenance Interrupt in nested
     mode

  x86:

   - Reject SEV{-ES} intra-host migration if one or more vCPUs are
     actively being created, so as not to create a non-SEV{-ES} vCPU in
     an SEV{-ES} VM

   - Use a pre-allocated, per-vCPU buffer for handling de-sparsification
     of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
     large" issue

   - Allow out-of-range/invalid Xen event channel ports when configuring
     IRQ routing, to avoid dictating a specific ioctl() ordering to
     userspace

   - Conditionally reschedule when setting memory attributes to avoid
     soft lockups when userspace converts huge swaths of memory to/from
     private

   - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest

   - Add a missing field in struct sev_data_snp_launch_start that
     resulted in the guest-visible workarounds field being filled at the
     wrong offset

   - Skip non-canonical address when processing Hyper-V PV TLB flushes
     to avoid VM-Fail on INVVPID

   - Advertise supported TDX TDVMCALLs to userspace

   - Pass SetupEventNotifyInterrupt arguments to userspace

   - Fix TSC frequency underflow"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: avoid underflow when scaling TSC frequency
  KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
  KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
  KVM: arm64: Don't free hyp pages with pKVM on GICv2
  KVM: arm64: Fix error path in init_hyp_mode()
  KVM: arm64: Adjust range correctly during host stage-2 faults
  KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
  KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
  KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
  Documentation: KVM: Fix unexpected unindent warnings
  KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
  KVM: Allow CPU to reschedule while setting per-page memory attributes
  KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
  KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
  KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
  KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
  KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10 09:06:53 -07:00
Zheyun Shen
4fdc3431e0 x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs
Extract KVM's open-coded calls to do writeback caches on multiple CPUs to
common library helpers for both WBINVD and WBNOINVD (KVM will use both).
Put the onus on the caller to check for a non-empty mask to simplify the
SMP=n implementation, e.g. so that it doesn't need to check that the one
and only CPU in the system is present in the mask.

  [sean: move to lib, add SMP=n helpers, clarify usage]

Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn
Link: https://lore.kernel.org/20250522233733.3176144-5-seanjc@google.com
2025-07-10 13:30:17 +02:00
Kevin Loughlin
07f99c3fbe x86/lib: Add WBNOINVD helper functions
In line with WBINVD usage, add WBNOINVD helper functions.  Explicitly fall
back to WBINVD (via alternative()) if WBNOINVD isn't supported even though
the instruction itself is backwards compatible (WBNOINVD is WBINVD with an
ignored REP prefix), so that disabling X86_FEATURE_WBNOINVD behaves as one
would expect, e.g. in case there's a hardware issue that affects WBNOINVD.

Opportunistically, add comments explaining the architectural behavior of
WBINVD and WBNOINVD, and provide hints and pointers to uarch-specific
behavior.

Note, alternative() ensures compatibility with early boot code as needed.

  [ bp: Massage, fix typos, make export _GPL. ]

Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250522233733.3176144-4-seanjc@google.com
2025-07-10 13:23:10 +02:00
Sean Christopherson
e638081751 x86/lib: Drop the unused return value from wbinvd_on_all_cpus()
Drop wbinvd_on_all_cpus()'s return value; both the "real" version and the
stub always return '0', and none of the callers check the return.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250522233733.3176144-3-seanjc@google.com
2025-07-10 13:13:26 +02:00
Alistair Popple
d438d27341 mm: remove devmap related functions and page table bits
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD page
table bits.  So drop all references to these, freeing up a software
defined page table bit on architectures supporting it.

Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Will Deacon <will@kernel.org> # arm64
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: John Groves <john@groves.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:42:18 -07:00
Jim Mattson
6fbef8615d KVM: x86: Replace growing set of *_in_guest bools with a u64
Store each "disabled exit" boolean in a single bit rather than a byte.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250530185239.2335185-2-jmattson@google.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250626001225.744268-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09 09:32:32 -07:00
Mikhail Paulyshka
5b937a1ed6 x86/rdrand: Disable RDSEED on AMD Cyan Skillfish
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an error that
causes RDSEED to always return 0xffffffff, while RDRAND works correctly.

Mask the RDSEED cap for this CPU so that both /proc/cpuinfo and direct CPUID
read report RDSEED as unavailable.

  [ bp: Move to amd.c, massage. ]

Signed-off-by: Mikhail Paulyshka <me@mixaill.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/20250524145319.209075-1-me@mixaill.net
2025-07-08 21:33:26 +02:00
Linus Torvalds
6e9128ff9d Add the mitigation logic for Transient Scheduler Attacks (TSA)
TSA are new aspeculative side channel attacks related to the execution
 timing of instructions under specific microarchitectural conditions. In
 some cases, an attacker may be able to use this timing information to
 infer data from other contexts, resulting in information leakage.
 
 Add the usual controls of the mitigation and integrate it into the
 existing speculation bugs infrastructure in the kernel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhSsvQACgkQEsHwGGHe
 VUrWNw//V+ZabYq3Nnvh4jEe6Altobnpn8bOIWmcBx6I3xuuArb9bLqcbKerDIcC
 POVVW6zrdNigDe/U4aqaJXE7qCRX55uTYbhp8OLH0zzqX3Pjl/hUnEXWtMtlXj/G
 CIM5mqjqEFp5JRGXetdjjuvjG1IPf+CbjKqj2WXbi//T6F3LiAFxkzdUhd+clBF/
 ztWchjwUmqU0WJd6+Smb8ZnvWrLoZuOFldjhFad820B7fqkdJhzjHMmwBHJKUEZu
 oABv8B0/4IALrx6LenCspWS4OuTOGG7DKyIgzitByXygXXb4L3ZUKpuqkxBU7hFx
 bscwtOP7e5HIYAekx6ZSLZoZpYQXr1iH0aRGrjwapi3ASIpUwI0UA9ck2PdGo0IY
 0GvmN0vbybskewBQyG819BM+DCau5pOLWuL7cYmaD2eTNoOHOknMDNlO8VzXqJxa
 NnignSuEWFm2vNV1FXEav2YbVjlanV6JleiPDGBe5Xd9dnxZTvg9HuP2NkYio4dZ
 mb/kEU/kTcN8nWh0Q96tX45kmj0vCbBgrSQkmUpyAugp38n69D1tp3ii9D/hyQFH
 hKGcFC9m+rYVx1NLyAxhTGxaEqF801d5Qawwud8HsnQudTpCdSXD9fcBg9aCbWEa
 FymtDpIeUQrFAjDpVEp6Syh3odKvLXsGEzL+DVvqKDuA8r6DxFo=
 =2cLl
 -----END PGP SIGNATURE-----

Merge tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CPU speculation fixes from Borislav Petkov:
 "Add the mitigation logic for Transient Scheduler Attacks (TSA)

  TSA are new aspeculative side channel attacks related to the execution
  timing of instructions under specific microarchitectural conditions.
  In some cases, an attacker may be able to use this timing information
  to infer data from other contexts, resulting in information leakage.

  Add the usual controls of the mitigation and integrate it into the
  existing speculation bugs infrastructure in the kernel"

* tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/process: Move the buffer clearing before MONITOR
  x86/microcode/AMD: Add TSA microcode SHAs
  KVM: SVM: Advertise TSA CPUID bits to guests
  x86/bugs: Add a Transient Scheduler Attacks mitigation
  x86/bugs: Rename MDS machinery to something more generic
2025-07-07 17:08:36 -07:00
Perry Yuan
a3c4f3396b x86/msr-index: Add AMD workload classification MSRs
Introduce new MSR registers for AMD hardware feedback support.  They provide
workload classification and configuration capabilities.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/20250609200518.3616080-4-superm1@kernel.org
2025-07-07 19:24:58 +02:00
Linus Torvalds
5fc2e891a5 - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which
prevents their TSCs from going skewed from the hypervisor's
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhqMCYACgkQEsHwGGHe
 VUrz6A/9EMN2fEdpeStGrk5t0BvYPhnIApr2x2QuvVN6qXgyGWQXhnF5G94SFNn8
 ykNxcGi97R/wxqk2uK3RfBg4P0ScoPDOLKKSeaqO0LVuHDTVX72fwB1F3qdaPNbp
 EIEL+OOEwUAwviT2GSH4mwTb1C7TuJnOZH2lC6yDkWwN5BLnIA4P4C0Wr4pIQ+MT
 TMzxGMT01yTnCAGHGOD2NRUIv/29qeJl+18uOqDO4A64RPT5Pp0yLJ72U4grGzOr
 2C49+/XD0qjXMs8vRz/CbeBK47ZaE1v/ui3g5ofbZ2YtcrfJXDY2/SwoJ0Oz+liM
 TSEXj8IpFZ3aq3+Pvgp9Qibu5QnFxJi7xTzrGCG1OSouHXTH2eFSVIXCiojVU12Y
 s+pKCBTXs9wVJN4z/FaSSwmTvQolld7oozShgPieZsYNBfJeeWIm+6LZRT4Zr/7Y
 UVsYEc/7m36ggKK+XFHsea2ZnmUFV18kEHPuWAXwmH3DW3dDfI5nm/s811jsbS+6
 2RaLZPiKBsYmNZ7iCujrY3GEmE5Eyemr8Ricj2zSGTCH2EYNeODDQBOn+hYgQOTK
 WJFWqpC5JqI5oJapmhugCkjfT75e+XTgO8Dox7HdlJR4UAb61xxf1zGFbThH8T45
 LZgbIKtLwwLShg0FwzDl7swnADJ/SiaKl049Q8Z5YthhllHy6zI=
 =6yl8
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which
   prevents their TSCs from going skewed from the hypervisor's

* tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Use TSC_FACTOR for Secure TSC frequency calculation
2025-07-06 10:44:20 -07:00