mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00
9273 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
be286b8637 |
powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid
Checks to see if the [H]SRR registers have been clobbered by (soft) NMI interrupts imply the possibility for a data race on the [h]srr_valid entries in the PACA. Annotate accesses to these fields with READ_ONCE, removing the need for the barrier. The diagnostic can use plain-access reads and writes, but annotate with data_race. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-5-rmclure@linux.ibm.com |
||
![]() |
b684c09f09 |
powerpc: update ppc_save_regs to save current r1 in pt_regs
ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.
When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.
GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.
GDB backtrace with vmcore of kernel without this patch:
---------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
#2 0x0000000000000063 in ?? ()
#3 0xc000000003579320 in ?? ()
---------
Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.
GDB backtrace with vmcore of patched kernel:
--------
(gdb) bt
#0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
at ./arch/powerpc/include/asm/kexec.h:69
#1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
#2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
at kernel/panic.c:358
#3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
#4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
at drivers/tty/sysrq.c:602
#5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
#6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
#7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
ppos=<optimized out>) at fs/proc/inode.c:352
#8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
#9 0xc0000000005b4264 in ksys_write (fd=<optimized out>,
buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
at fs/read_write.c:637
#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
at arch/powerpc/kernel/syscall.c:171
#11 0xc00000000000c270 in system_call_vectored_common ()
at arch/powerpc/kernel/interrupt_64.S:192
--------
Nick adds:
So this now saves regs as though it was an interrupt taken in the
caller, at the instruction after the call to ppc_save_regs, whereas
previously the NIP was there, but R1 came from the caller's caller and
that mismatch is what causes gdb's dwarf unwinder to go haywire.
Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Fixes:
|
||
![]() |
97228ca375 |
powerpc/ptrace: Expose HASHKEYR register to ptrace
The HASHKEYR register contains a secret per-process key to enable unique hashes per process. In general it should not be exposed to userspace at all and a regular process has no need to know its key. However, checkpoint restore in userspace (CRIU) functionality requires that a process be able to set the HASHKEYR of another process, otherwise existing hashes on the stack would be invalidated by a new random key. Exposing HASHKEYR in this way also makes it appear in core dumps, which is a security concern. Multiple threads may share a key, for example just after a fork() call, where the kernel cannot know if the child is going to return back along the parent's stack. If such a thread is coerced into making a core dump, then the HASHKEYR value will be readable and able to be used against all other threads sharing that key, effectively undoing any protection offered by hashst/hashchk. Therefore we expose HASHKEYR to ptrace when CONFIG_CHECKPOINT_RESTORE is enabled, providing a choice of increased security or migratable ROP protected processes. This is similar to how ARM exposes its PAC keys. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-8-bgray@linux.ibm.com |
||
![]() |
884ad5c52d |
powerpc/ptrace: Expose DEXCR and HDEXCR registers to ptrace
The DEXCR register is of interest when ptracing processes. Currently it is static, but eventually will be dynamically controllable by a process. If a process can control its own, then it is useful for it to be ptrace-able to (e.g., for checkpoint-restore functionality). It is also relevant to core dumps (the NPHIE aspect in particular), which use the ptrace mechanism (or is it the other way around?) to decide what to dump. The HDEXCR is useful here too, as the NPHIE aspect may be set in the HDEXCR without being set in the DEXCR. Although the HDEXCR is per-cpu and we don't track it in the task struct (it's useless in normal operation), it would be difficult to imagine why a hypervisor would set it to different values within a guest. A hypervisor cannot safely set NPHIE differently at least, as that would break programs. Expose a read-only view of the userspace DEXCR and HDEXCR to ptrace. The HDEXCR is always readonly, and is useful for diagnosing the core dumps (as the HDEXCR may set NPHIE without the DEXCR setting it). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> [mpe: Use lower_32_bits() rather than open coding] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-7-bgray@linux.ibm.com |
||
![]() |
be98fcf7c1 |
powerpc/dexcr: Support userspace ROP protection
The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR to hold a key used in the hash calculation. This key should be different for each process to make it harder for a malicious process to recreate valid hash values for a victim process. Add support for storing a per-thread hash key, and setting/clearing HASHKEYR appropriately. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com |
||
![]() |
5bcba4e6c1 |
powerpc/dexcr: Handle hashchk exception
Recognise and pass the appropriate signal to the user program when a hashchk instruction triggers. This is independent of allowing configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect regardless of the kernel. The signal mirrors how ARM reports their similar check failure. For example, their FPAC handler in arch/arm64/kernel/traps.c do_el0_fpac() does this. When we fail to read the instruction that caused the fault we send a segfault, similar to how emulate_math() does it. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-5-bgray@linux.ibm.com |
||
![]() |
0ffd60b782 |
powerpc/dexcr: Add initial Dynamic Execution Control Register (DEXCR) support
ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It is a per-cpu register that allows control over various CPU behaviours including branch hint usage, indirect branch speculation, and hashst/hashchk support. Add some definitions and basic support for the DEXCR in the kernel. Right now it just * Initialises the DEXCR and HASHKEYR to a fixed value when a CPU onlines. * Clears them in reset_sprs(). * Detects when the NPHIE aspect is supported (the others don't get looked at in this series, so there's no need to waste a CPU_FTR on them). We initialise the HASHKEYR to ensure that all cores have the same key, so an HV enforced NPHIE + swapping cores doesn't randomly crash a process using hash instructions. The stores to HASHKEYR are unconditional because the ISA makes no mention of the SPR being missing if support for doing the hashes isn't present. So all that would happen is the HASHKEYR value gets ignored. This helps slightly if NPHIE detection fails; e.g., we currently only detect it on pseries. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Use simple values for DEXCR constants] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-4-bgray@linux.ibm.com |
||
![]() |
81e30a5412 |
powerpc/ptrace: Add missing <linux/regset.h> include
ptrace-decl.h uses user_regset_get2_fn (among other things) from regset.h. While all current users of ptrace-decl.h include regset.h before it anyway, it adds an implicit ordering dependency and breaks source tooling that tries to inspect ptrace-decl.h by itself. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-3-bgray@linux.ibm.com |
||
![]() |
8ad57add77 |
powerpc/build: vdso linker warning for orphan sections
Add --orphan-handlin for vdsos, and adjust vdso linker scripts to deal with orphan sections. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230609051002.3342-1-npiggin@gmail.com |
||
![]() |
27be245633 |
powerpc/64: Rename entry_64.S to prom_entry_64.S
This file contains only the enter_prom implementation now. Trim includes and update header comment while we're here. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-7-npiggin@gmail.com |
||
![]() |
afc6386815 |
powerpc: merge 32-bit and 64-bit _switch implementation
The _switch stack frame setup are substantially the same, so are the comments. The difference in how the stack and current are switched, and other hardware and software housekeeping is done is moved into macros. Generated code should be unchanged. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tweak include orer to fix compile errors on some configs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-6-npiggin@gmail.com |
||
![]() |
1541a21305 |
powerpc/eeh: Rely on dev->link_active_reporting
Use dev->link_active_reporting to determine whether Data Link Layer Link Active Reporting is available rather than re-retrieving the capability. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310124100.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> |
||
![]() |
6958ad05d5 |
powerpc/32: Rearrange _switch to prepare for 32/64 merge
Change the order of some operations and change some register numbers in preparation to merge 32-bit and 64-bit switch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-5-npiggin@gmail.com |
||
![]() |
fc8562c9b6 |
powerpc/32: Remove sync from _switch
64-bit has removed the sync from _switch since commit
|
||
![]() |
0eb8088b5a |
powerpc/64: Rearrange 64-bit _switch to prepare for 32/64 merge
More some 64-bit specifics out from the function epilogue and rearrange this to be a bit neater, use 32-bit mem ops for CR save/restore, and change some register numbers. This is preparation to consolidate 32-bit and 64-bit switch code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-3-npiggin@gmail.com |
||
![]() |
d6b87c3eb6 |
powerpc/64s: move stack SLB pinning out of line from _switch
The large hunk of SLB pinning in _switch asm code makes it more difficult to see everything else that's going on. It is a less important path now, so icache and fetch footprint overhead can be avoided. Move context switch stack SLB pinning out of line. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230606132447.315714-2-npiggin@gmail.com |
||
![]() |
1eea99c045 |
powerpc/legacy_serial: Handle SERIAL_8250_FSL=n build failures
With SERIAL_8250=y and SERIAL_8250_FSL_CONSOLE=n the both
IS_ENABLED(CONFIG_SERIAL_8250) and IS_REACHABLE(CONFIG_SERIAL_8250)
evaluate to true and so fsl8250_handle_irq() is used. However this
function is only available if CONFIG_SERIAL_8250_CONSOLE=y (and thus
SERIAL_8250_FSL=y).
To prepare SERIAL_8250_FSL becoming tristate and being enabled in more
cases, check for IS_REACHABLE(CONFIG_SERIAL_8250_FSL) before making use
of fsl8250_handle_irq(). This check is correct with and without the
change to make SERIAL_8250_FSL modular.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes:
|
||
![]() |
df95d3085c |
watchdog/hardlockup: rename some "NMI watchdog" constants/function
Do a search and replace of: - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED - watchdog_nmi_ => watchdog_hardlockup_ - nmi_watchdog_available => watchdog_hardlockup_available - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT Then update a few comments near where names were changed. This is specifically to make it less confusing when we want to introduce the buddy hardlockup detector, which isn't using NMIs. As part of this, we sanitized a few names for consistency. [trix@redhat.com: make variables static] Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Tom Rix <trix@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
946e697c69 |
cachestat: wire up cachestat for other architectures
cachestat is previously only wired in for x86 (and architectures using the generic unistd.h table): https://lore.kernel.org/lkml/20230503013608.2431726-1-nphamcs@gmail.com/ This patch wires cachestat in for all the other architectures. [nphamcs@gmail.com: wire up cachestat for arm64] Link: https://lkml.kernel.org/r/20230511092843.3896327-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230510195806.2902878-1-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
![]() |
a03b1a0b19 |
powerpc/signal32: Force inlining of __unsafe_save_user_regs() and save_tm_user_regs_unsafe()
Looking at generated code for handle_signal32() shows calls to a
function called __unsafe_save_user_regs.constprop.0 while user access
is open.
And that __unsafe_save_user_regs.constprop.0 function has two nops at
the begining, allowing it to be traced, which is unexpected during
user access open window.
The solution could be to mark __unsafe_save_user_regs() no trace, but
to be on the safe side the most efficient is to flag it __always_inline
as already done for function __unsafe_restore_general_regs(). The
function is relatively small and only called twice, so the size
increase will remain in the noise.
Do the same with save_tm_user_regs_unsafe() as it may suffer the
same issue.
Fixes:
|
||
![]() |
0eb089a72f |
powerpc/interrupt: Don't read MSR from interrupt_exit_kernel_prepare()
A disassembly of interrupt_exit_kernel_prepare() shows a useless read
of MSR register. This is shown by r9 being re-used immediately without
doing anything with the value read.
c000e0e0: 60 00 00 00 nop
c000e0e4: 7d 3a c2 a6 mfmd_ap r9
c000e0e8: 7d 20 00 a6 mfmsr r9
c000e0ec: 7c 51 13 a6 mtspr 81,r2
c000e0f0: 81 3f 00 84 lwz r9,132(r31)
c000e0f4: 71 29 80 00 andi. r9,r9,32768
This is due to the use of local_irq_save(). The flags read by
local_irq_save() are never used, use local_irq_disable() instead.
Fixes:
|
||
![]() |
66eff0ef52 |
powerpc/legacy_serial: Warn about 8250 devices operated without active FSL workarounds
If the 8250 driver is built as a module (or built-in without console support) the Freescale specific workaround were silently not activated. Add a warning in this case. Currently CONFIG_SERIAL_8250_FSL=y implies that the function fsl8250_handle_irq() is built-in and can be used. However with the changes of the next commit CONFIG_SERIAL_8250_FSL might be enabled also when the 8250 driver is a module and so more care is needed when fsl8250_handle_irq() is to be used. The code added here is able to handle the new situation already. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Message-ID: <20230605130857.85543-2-u.kleine-koenig@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
0f613bfa82 |
locking/atomic: treewide: use raw_atomic*_<op>()
Now that we have raw_atomic*_<op>() definitions, there's no need to use arch_atomic*_<op>() definitions outside of the low-level atomic definitions. Move treewide users of arch_atomic*_<op>() over to the equivalent raw_atomic*_<op>(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-19-mark.rutland@arm.com |
||
![]() |
a7e5eb53bf |
powerpc/vdso: Include CLANG_FLAGS explicitly in ldflags-y
A future change will move CLANG_FLAGS from KBUILD_{A,C}FLAGS to KBUILD_CPPFLAGS so that '--target' is available while preprocessing. When that occurs, the following error appears when building the compat PowerPC vDSO: clang: error: unsupported option '-mbig-endian' for target 'x86_64-pc-linux-gnu' make[3]: *** [.../arch/powerpc/kernel/vdso/Makefile:76: arch/powerpc/kernel/vdso/vdso32.so.dbg] Error 1 Explicitly add CLANG_FLAGS to ldflags-y, so that '--target' will always be present. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
![]() |
1f7aacc5eb |
powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV device
For an SR-IOV device, while enabling DDW, a new table is created and
added at index 1 in the group. In the below 2 scenarios, the table is
incorrectly referenced at index 0 (which is where the table is for
default DMA window).
1. When adding DDW
This issue is exposed with "slub_debug". Error thrown out from
dma_iommu_dma_supported()
Warning: IOMMU offset too big for device mask
mask: 0xffffffff, table offset: 0x800000000000000
2. During Dynamic removal of the PCI device.
Error is from iommu_tce_table_put() since a NULL table pointer is
passed in.
Fixes:
|
||
![]() |
096339ab84 |
powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
When DMA window is backed by 2MB TCEs, the DMA address for the mapped
page should be the offset of the page relative to the 2MB TCE. The code
was incorrectly setting the DMA address to the beginning of the TCE
range.
Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
ethernet port, when DMA window is backed by 2MB TCEs.
Fixes:
|
||
![]() |
ad593827db |
powerpc/iommu: Remove iommu_del_device()
Now that power calls iommu_device_register() and populates its groups
using iommu_ops->device_group it should not be calling
iommu_group_remove_device().
The core code owns the groups and all the other related iommu data, it
will clean it up automatically.
Remove the bus notifiers and explicit calls to
iommu_group_remove_device().
Fixes:
|
||
![]() |
514ca14ed5 |
start_kernel: Add __no_stack_protector function attribute
Back during the discussion of
commit
|
||
![]() |
bd54522466 |
powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND ========== When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue(). However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by |
||
![]() |
79de36042e |
powerpc/isa-bridge: Fix ISA mapping when "ranges" is not present
Commit |
||
![]() |
70cc1b5307 |
powerpc updates for 6.4
- Add support for building the kernel using PC-relative addressing on Power10. - Allow HV KVM guests on Power10 to use prefixed instructions. - Unify support for the P2020 CPU (85xx) into a single machine description. - Always build the 64-bit kernel with 128-bit long double. - Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker. - A series fixing VFIO on Power since some generic changes. - Various other small features and fixes. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, Timothy Pearson. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmRLdD8THG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPrED/93Hwi1/8udXYV6OAFBnLLLDX9RZWNx sj6W7PC/yY7MGUTIGcayrAURt5xFfQZMBdq9nhhH46Wyd8pSUe1IlpXIEgqeH64Y 5uaSe6u4OZ5dDmiYz8bM+Y4Ixkfq1xMO0Rj27FIRJyU4Pp6gyMQ6/8W+iPU2vYIb jl4frVO8PpmCbW8euOyT/b9YB2h79+5nLgZT4RvmYJblKQwgzRZWaiCAU0wYf+xT xYsMQcqJlPstizTnXIr+GC08VrMPQR51kJCurnNUMXoAQY6toEXebveWRNZ3sH39 K0BRQ036NNGPS4GlHVbjgLIdoWr4pUEROZ48Jy9WeiDh6OwO2vb2zrm7ZLtKlGXI LCQ08T9diPbAAJyVJaKjpsNXhTffuLPRhIOr1o2vqGvY+Fqbw96bPQAyFbxpJtrw rGTTc5e93fEdZV+eaGR8kfdNM75WM6lgiteaIbUnpirYTh/0j6YEO1Ijc4d9e5ux aoHN2eXQDjMm9foZf1D6vaCTRyN8LwLa9hcydKCn6+qULUQzoZ2r4cRt6eSD7+fx Wni1Zg7vD6xx294nVmhg5dWuD4qVIAZj3BZPFHHR2SPqy+UbEJLk/NKgznmrhhgQ HBcZAEFqIBZkA4e0w6LJKh/1j1rxpZUokgjMotOJq4VRivq82W1P7SioFDY3/6w5 UvGtIgGq4Qsa2Q== =89WH -----END PGP SIGNATURE----- Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for building the kernel using PC-relative addressing on Power10. - Allow HV KVM guests on Power10 to use prefixed instructions. - Unify support for the P2020 CPU (85xx) into a single machine description. - Always build the 64-bit kernel with 128-bit long double. - Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker. - A series fixing VFIO on Power since some generic changes. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and Timothy Pearson. * tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits) powerpc/64s: Disable pcrel code model on Clang powerpc: Fix merge conflict between pcrel and copy_thread changes powerpc/configs/powernv: Add IGB=y powerpc/configs/64s: Drop JFS Filesystem powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest powerpc/configs: Make pseries_le an alias for ppc64le_guest powerpc/configs: Incorporate generic kvm_guest.config into guest configs powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs powerpc/configs/64s: Enable Device Mapper options powerpc/configs/64s: Enable PSTORE powerpc/configs/64s: Enable VLAN support powerpc/configs/64s: Enable BLK_DEV_NVME powerpc/configs/64s: Drop REISERFS powerpc/configs/64s: Use SHA512 for module signatures powerpc/configs/64s: Enable IO_STRICT_DEVMEM powerpc/configs/64s: Enable SCHEDSTATS powerpc/configs/64s: Enable DEBUG_VM & other options powerpc/configs/64s: Enable EMULATED_STATS powerpc/configs/64s: Enable KUNIT and most tests ... |
||
![]() |
f20730efbd |
SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK PA5+V7HcQRk= =Wp7f -----END PGP SIGNATURE----- Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default |
||
![]() |
2aff7c706c |
Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically. - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it. - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code. - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown/panic functions. - Misc improvements & fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi fRho4CqzrtM= =NwEV -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ... |
||
![]() |
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page(). - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY= =J+Dj -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ... |
||
![]() |
b6a7828502 |
modules-6.4-rc1
The summary of the changes for this pull requests is: * Song Liu's new struct module_memory replacement * Nick Alcock's MODULE_LICENSE() removal for non-modules * My cleanups and enhancements to reduce the areas where we vmalloc module memory for duplicates, and the respective debug code which proves the remaining vmalloc pressure comes from userspace. Most of the changes have been in linux-next for quite some time except the minor fixes I made to check if a module was already loaded prior to allocating the final module memory with vmalloc and the respective debug code it introduces to help clarify the issue. Although the functional change is small it is rather safe as it can only *help* reduce vmalloc space for duplicates and is confirmed to fix a bootup issue with over 400 CPUs with KASAN enabled. I don't expect stable kernels to pick up that fix as the cleanups would have also had to have been picked up. Folks on larger CPU systems with modules will want to just upgrade if vmalloc space has been an issue on bootup. Given the size of this request, here's some more elaborate details on this pull request. The functional change change in this pull request is the very first patch from Song Liu which replaces the struct module_layout with a new struct module memory. The old data structure tried to put together all types of supported module memory types in one data structure, the new one abstracts the differences in memory types in a module to allow each one to provide their own set of details. This paves the way in the future so we can deal with them in a cleaner way. If you look at changes they also provide a nice cleanup of how we handle these different memory areas in a module. This change has been in linux-next since before the merge window opened for v6.3 so to provide more than a full kernel cycle of testing. It's a good thing as quite a bit of fixes have been found for it. Jason Baron then made dynamic debug a first class citizen module user by using module notifier callbacks to allocate / remove module specific dynamic debug information. Nick Alcock has done quite a bit of work cross-tree to remove module license tags from things which cannot possibly be module at my request so to: a) help him with his longer term tooling goals which require a deterministic evaluation if a piece a symbol code could ever be part of a module or not. But quite recently it is has been made clear that tooling is not the only one that would benefit. Disambiguating symbols also helps efforts such as live patching, kprobes and BPF, but for other reasons and R&D on this area is active with no clear solution in sight. b) help us inch closer to the now generally accepted long term goal of automating all the MODULE_LICENSE() tags from SPDX license tags In so far as a) is concerned, although module license tags are a no-op for non-modules, tools which would want create a mapping of possible modules can only rely on the module license tag after the commit |
||
![]() |
556eb8b791 |
Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5 LEGadNS38k5fs+73UaxV =7K4B -----END PGP SIGNATURE----- Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ... |
||
![]() |
34b62f186d |
pci-v6.4-changes
-----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmRIKooUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vxq7A/9G0sInrqvqH2I9/Set/FnmMfCtGDH YcEjHYYxL+pztSiXTavDV+ib9iaut83oYtcV9p1bUMhJoZdKNZhrNdIGzRFSemI4 0/ShtklPzNEu6nPPL24CnEzgbrODBU56ZvzrIE/tShEoOjkKa1triBnOA/JMxYTL cUwqDQlDkdpYniCgxy05QfcFZ0mmSOkbl7runGfTMTiUKKC3xSRiaW5YN9KZe3i7 G5YHu1VVCjeQdQSICHYwyFmkyiqosCoajQNp1IHBkWqSwilzyZMg0NWJobVSA7M/ mXXnzLtFcC60oT58/9MaggQwDTaSGDE8mG+sWv05bB2u5TQVyZEZqZ4c2FzmZIZT WLZYLB6PFRW0zePEuMnVkSLS2npkX+aGaBv28bf88sjorpaYNG01uYijnLEceolQ yBPFRN3bsRuOyHvYY/tiZX/BP7z/DS++XXwA8zQWZnYsXSlncJdwCNquV0xIwUt+ hij4/Yu7o9SgV1LbuwtkMFAn3C9Szc65Eer+IvRRdnMZYphjVHbA5F2msRFyiCeR HxECtMQ1jBnVrpQAcBX1Sz+Vu5MrwCqzc2n6tvTQHDvVNjXfkG3NaFhxYPc1IL9Z NJMeCKfK1qzw7TtbvWXCluTTIM9N/bNJXrJhQbjNY7V6IaBZY1QNYW0ZFfGgj6Gb UUPgndidRy4/hzw= =HPXl -----END PGP SIGNATURE----- Merge tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Resource management: - Add pci_dev_for_each_resource() and pci_bus_for_each_resource() iterators PCIe native device hotplug: - Fix AB-BA deadlock between reset_lock and device_lock Power management: - Wait longer for devices to become ready after resume (as we do for reset) to accommodate Intel Titan Ridge xHCI devices - Extend D3hot delay for NVIDIA HDA controllers to avoid unrecoverable devices after a bus reset Error handling: - Clear PCIe Device Status after EDR since generic error recovery now only clears it when AER is native ASPM: - Work around Chromebook firmware defect that clobbers Capability list (including ASPM L1 PM Substates Cap) when returning from D3cold to D0 Freescale i.MX6 PCIe controller driver: - Install imprecise external abort handler only when DT indicates PCIe support Freescale Layerscape PCIe controller driver: - Add ls1028a endpoint mode support Qualcomm PCIe controller driver: - Add SM8550 DT binding and driver support - Add SDX55 DT binding and driver support - Use bulk APIs for clocks of IP 1.0.0, 2.3.2, 2.3.3 - Use bulk APIs for reset of IP 2.1.0, 2.3.3, 2.4.0 - Add DT "mhi" register region for supported SoCs - Expose link transition counts via debugfs to help debug low power issues - Support system suspend and resume; reduce interconnect bandwidth and turn off clock and PHY if there are no active devices - Enable async probe by default to reduce boot time Miscellaneous: - Sort controller Kconfig entries by vendor" * tag 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (56 commits) PCI: xilinx: Drop obsolete dependency on COMPILE_TEST PCI: mobiveil: Sort Kconfig entries by vendor PCI: dwc: Sort Kconfig entries by vendor PCI: Sort controller Kconfig entries by vendor PCI: Use consistent controller Kconfig menu entry language PCI: xilinx-nwl: Add 'Xilinx' to Kconfig prompt PCI: hv: Add 'Microsoft' to Kconfig prompt PCI: meson: Add 'Amlogic' to Kconfig prompt PCI: Use of_property_present() for testing DT property presence PCI/PM: Extend D3hot delay for NVIDIA HDA controllers dt-bindings: PCI: qcom: Document msi-map and msi-map-mask properties PCI: qcom: Add SM8550 PCIe support dt-bindings: PCI: qcom: Add SM8550 compatible PCI: qcom: Add support for SDX55 SoC dt-bindings: PCI: qcom-ep: Fix the unit address used in example dt-bindings: PCI: qcom: Add SDX55 SoC dt-bindings: PCI: qcom: Update maintainers entry PCI: qcom: Enable async probe by default PCI: qcom: Add support for system suspend and resume PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter ... |
||
![]() |
0c993300d5 |
powerpc: Fix merge conflict between pcrel and copy_thread changes
Fix a conflict between commit |
||
![]() |
e7989789c6 |
Timers and timekeeping updates:
- Improve the VDSO build time checks to cover all dynamic relocations VDSO does not allow dynamic relcations, but the build time check is incomplete and fragile. It's based on architectures specifying the relocation types to search for and does not handle R_*_NONE relocation entries correctly. R_*_NONE relocations are injected by some GNU ld variants if they fail to determine the exact .rel[a]/dyn_size to cover trailing zeros. R_*_NONE relocations must be ignored by dynamic loaders, so they should be ignored in the build time check too. Remove the architecture specific relocation types to check for and validate strictly that no other relocations than R_*_NONE end up in the VSDO .so file. - Prefer signal delivery to the current thread for CLOCK_PROCESS_CPUTIME_ID based posix-timers Such timers prefer to deliver the signal to the main thread of a process even if the context in which the timer expires is the current task. This has the downside that it might wake up an idle thread. As there is no requirement or guarantee that the signal has to be delivered to the main thread, avoid this by preferring the current task if it is part of the thread group which shares sighand. This not only avoids waking idle threads, it also distributes the signal delivery in case of multiple timers firing in the context of different threads close to each other better. - Align the tick period properly (again) For a long time the tick was starting at CLOCK_MONOTONIC zero, which allowed users space applications to either align with the tick or to place a periodic computation so that it does not interfere with the tick. The alignement of the tick period was more by chance than by intention as the tick is set up before a high resolution clocksource is installed, i.e. timekeeping is still tick based and the tick period advances from there. The early enablement of sched_clock() broke this alignement as the time accumulated by sched_clock() is taken into account when timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is not longer a multiple of tick periods, which breaks applications which relied on that behaviour. Cure this by aligning the tick starting point to the next multiple of tick periods, i.e 1000ms/CONFIG_HZ. - A set of NOHZ fixes and enhancements - Cure the concurrent writer race for idle and IO sleeptime statistics The statitic values which are exposed via /proc/stat are updated from the CPU local idle exit and remotely by cpufreq, but that happens without any form of serialization. As a consequence sleeptimes can be accounted twice or worse. Prevent this by restricting the accumulation writeback to the CPU local idle exit and let the remote access compute the accumulated value. - Protect idle/iowait sleep time with a sequence count Reading idle/iowait sleep time, e.g. from /proc/stat, can race with idle exit updates. As a consequence the readout may result in random and potentially going backwards values. Protect this by a sequence count, which fixes the idle time statistics issue, but cannot fix the iowait time problem because iowait time accounting races with remote wake ups decrementing the remote runqueues nr_iowait counter. The latter is impossible to fix, so the only way to deal with that is to document it properly and to remove the assertion in the selftest which triggers occasionally due to that. - Restructure struct tick_sched for better cache layout - Some small cleanups and a better cache layout for struct tick_sched - Implement the missing timer_wait_running() callback for POSIX CPU timers For unknown reason the introduction of the timer_wait_running() callback missed to fixup posix CPU timers, which went unnoticed for almost four years. While initially only targeted to prevent livelocks between a timer deletion and the timer expiry function on PREEMPT_RT enabled kernels, it turned out that fixing this for mainline is not as trivial as just implementing a stub similar to the hrtimer/timer callbacks. The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems there is a livelock issue independent of RT. CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers out from hard interrupt context to task work, which is handled before returning to user space or to a VM. The expiry mechanism moves the expired timers to a stack local list head with sighand lock held. Once sighand is dropped the task can be preempted and a task which wants to delete a timer will spin-wait until the expiry task is scheduled back in. In the worst case this will end up in a livelock when the preempting task and the expiry task are pinned on the same CPU. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way. Add a per task mutex to struct posix_cputimers_work, let the expiry task hold it accross the expiry function and let the deleting task which waits for the expiry to complete block on the mutex. In the non-contended case this results in an extra mutex_lock()/unlock() pair on both sides. This avoids spin-waiting on a task which is scheduled out, prevents the livelock and cures the problem for RT and !RT systems. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0 0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+ 00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H 5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40 6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2 tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx 9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB nIJuHl8mxSetag== =SVnj -----END PGP SIGNATURE----- Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers and timekeeping updates from Thomas Gleixner: - Improve the VDSO build time checks to cover all dynamic relocations VDSO does not allow dynamic relocations, but the build time check is incomplete and fragile. It's based on architectures specifying the relocation types to search for and does not handle R_*_NONE relocation entries correctly. R_*_NONE relocations are injected by some GNU ld variants if they fail to determine the exact .rel[a]/dyn_size to cover trailing zeros. R_*_NONE relocations must be ignored by dynamic loaders, so they should be ignored in the build time check too. Remove the architecture specific relocation types to check for and validate strictly that no other relocations than R_*_NONE end up in the VSDO .so file. - Prefer signal delivery to the current thread for CLOCK_PROCESS_CPUTIME_ID based posix-timers Such timers prefer to deliver the signal to the main thread of a process even if the context in which the timer expires is the current task. This has the downside that it might wake up an idle thread. As there is no requirement or guarantee that the signal has to be delivered to the main thread, avoid this by preferring the current task if it is part of the thread group which shares sighand. This not only avoids waking idle threads, it also distributes the signal delivery in case of multiple timers firing in the context of different threads close to each other better. - Align the tick period properly (again) For a long time the tick was starting at CLOCK_MONOTONIC zero, which allowed users space applications to either align with the tick or to place a periodic computation so that it does not interfere with the tick. The alignement of the tick period was more by chance than by intention as the tick is set up before a high resolution clocksource is installed, i.e. timekeeping is still tick based and the tick period advances from there. The early enablement of sched_clock() broke this alignement as the time accumulated by sched_clock() is taken into account when timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is not longer a multiple of tick periods, which breaks applications which relied on that behaviour. Cure this by aligning the tick starting point to the next multiple of tick periods, i.e 1000ms/CONFIG_HZ. - A set of NOHZ fixes and enhancements: * Cure the concurrent writer race for idle and IO sleeptime statistics The statitic values which are exposed via /proc/stat are updated from the CPU local idle exit and remotely by cpufreq, but that happens without any form of serialization. As a consequence sleeptimes can be accounted twice or worse. Prevent this by restricting the accumulation writeback to the CPU local idle exit and let the remote access compute the accumulated value. * Protect idle/iowait sleep time with a sequence count Reading idle/iowait sleep time, e.g. from /proc/stat, can race with idle exit updates. As a consequence the readout may result in random and potentially going backwards values. Protect this by a sequence count, which fixes the idle time statistics issue, but cannot fix the iowait time problem because iowait time accounting races with remote wake ups decrementing the remote runqueues nr_iowait counter. The latter is impossible to fix, so the only way to deal with that is to document it properly and to remove the assertion in the selftest which triggers occasionally due to that. * Restructure struct tick_sched for better cache layout * Some small cleanups and a better cache layout for struct tick_sched - Implement the missing timer_wait_running() callback for POSIX CPU timers For unknown reason the introduction of the timer_wait_running() callback missed to fixup posix CPU timers, which went unnoticed for almost four years. While initially only targeted to prevent livelocks between a timer deletion and the timer expiry function on PREEMPT_RT enabled kernels, it turned out that fixing this for mainline is not as trivial as just implementing a stub similar to the hrtimer/timer callbacks. The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems there is a livelock issue independent of RT. CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers out from hard interrupt context to task work, which is handled before returning to user space or to a VM. The expiry mechanism moves the expired timers to a stack local list head with sighand lock held. Once sighand is dropped the task can be preempted and a task which wants to delete a timer will spin-wait until the expiry task is scheduled back in. In the worst case this will end up in a livelock when the preempting task and the expiry task are pinned on the same CPU. The timer wheel has a timer_wait_running() mechanism for RT, which uses a per CPU timer-base expiry lock which is held by the expiry code and the task waiting for the timer function to complete blocks on that lock. This does not work in the same way for posix CPU timers as there is no timer base and expiry for process wide timers can run on any task belonging to that process, but the concept of waiting on an expiry lock can be used too in a slightly different way. Add a per task mutex to struct posix_cputimers_work, let the expiry task hold it accross the expiry function and let the deleting task which waits for the expiry to complete block on the mutex. In the non-contended case this results in an extra mutex_lock()/unlock() pair on both sides. This avoids spin-waiting on a task which is scheduled out, prevents the livelock and cures the problem for RT and !RT systems * tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Implement the missing timer_wait_running callback selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity selftests/proc: Remove idle time monotonicity assertions MAINTAINERS: Remove stale email address timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() timers/nohz: Add a comment about broken iowait counter update race timers/nohz: Protect idle/iowait sleep time under seqcount timers/nohz: Only ever update sleeptime from idle exit timers/nohz: Restructure and reshuffle struct tick_sched tick/common: Align tick period with the HZ tick. selftests/timers/posix_timers: Test delivery of signals across threads posix-timers: Prefer delivery of signals to the current thread vdso: Improve cmd_vdso_check to check all dynamic relocations |
||
![]() |
6fee130204 |
powerpc/64: Don't call trace_hardirqs_on() in prep_irq_for_idle()
Since commit
|
||
![]() |
7640854d96 |
powerpc/64: Mark prep_irq_for_idle() __cpuidle
Code in the idle path is not allowed to be instrumented because RCU is
disabled, see commit
|
||
![]() |
e5b6634aa1 |
powerpc/irq: Mark check_return_regs_valid() notrace
check_return_regs_valid() is called from the middle of the irq exit handling, which is all notrace, so mark it notrace also. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lore.kernel.org/all/4C073F6A-C812-4C4A-BB7A-ECD10B75FB88@linux.ibm.com/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230406122118.3760344-1-mpe@ellerman.id.au |
||
![]() |
77e69ee7ce |
powerpc/64: modules support building with PCREL addresing
Build modules using PCREL addressing when CONFIG_PPC_KERNEL_PCREL=y. - The module loader must handle several new relocation types: * R_PPC64_REL24_NOTOC is a function call handled like R_PPC_REL24, but does not restore r2 upon return. The external function call stub is changed to use pcrel addressing to load the function pointer rather than based on the module TOC. * R_PPC64_GOT_PCREL34 is a reference to external data. A GOT table must be built by hand, because the linker adds this during the final link (which is not done for kernel modules). The GOT table is built similarly to the way the external function call stub table is. This section is called .mygot because .got has a special meaning for the linker and can become upset. * R_PPC64_PCREL34 is used for local data addressing, but there is a special case where the percpu section is moved at load-time to the percpu area which is out of range of this relocation. This requires the PCREL34 relocations are converted to use GOT_PCREL34 addressing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Some coding style & formatting fixups] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-7-npiggin@gmail.com |
||
![]() |
7e3a68be42 |
powerpc/64: vmlinux support building with PCREL addresing
PC-Relative or PCREL addressing is an extension to the ELF ABI which uses Power ISA v3.1 PC-relative instructions to calculate addresses, rather than the traditional TOC scheme. Add an option to build vmlinux using pcrel addressing. Modules continue to use TOC addressing. - TOC address helpers and r2 are poisoned with -1 when running vmlinux. r2 could be used for something useful once things are ironed out. - Assembly must call C functions with @notoc annotation, or the linker complains aobut a missing nop after the call. This is done with the CFUNC macro introduced earlier. - Boot: with the exception of prom_init, the execution branches to the kernel virtual address early in boot, before any addresses are generated, which ensures 34-bit pcrel addressing does not miss the high PAGE_OFFSET bits. TOC relative addressing has a similar requirement. prom_init does not go to the virtual address and its addresses should not carry over to the post-prom kernel. - Ftrace trampolines are converted from TOC addressing to pcrel addressing, including module ftrace trampolines that currently use the kernel TOC to find ftrace target functions. - BPF function prologue and function calling generation are converted from TOC to pcrel. - copypage_64.S has an interesting problem, prefixed instructions have alignment restrictions so the linker can add padding, which makes the assembler treat the difference between two local labels as non-constant even if alignment is arranged so padding is not required. This may need toolchain help to solve nicely, for now move the prefix instruction out of the alternate patch section to work around it. This reduces kernel text size by about 6%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com |
||
![]() |
4e991e3c16 |
powerpc: add CFUNC assembly label annotation
This macro is to be used in assembly where C functions are called. pcrel addressing mode requires branches to functions with a localentry value of 1 to have either a trailing nop or @notoc. This macro permits the latter without changing callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add dummy definitions to fix selftests build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com |
||
![]() |
dc5dac748a |
powerpc/64: Add support to build with prefixed instructions
Add an option to build kernel and module with prefixed instructions if the CPU and toolchain support it. This is not related to kernel support for userspace execution of prefixed instructions. Building with prefixed instructions breaks some extended inline asm memory addressing, for example it will provide immediates that exceed the range of simple load/store displacement. Whether this is a toolchain or a kernel asm problem remains to be seen. For now, these are replaced with simpler and less efficient direct register addressing when compiling with prefixed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-4-npiggin@gmail.com |
||
![]() |
b270bebd34 |
powerpc/64s: Run at the kernel virtual address earlier in boot
This mostly consolidates the Book3E and Book3S behaviour in boot WRT executing from the physical or virtual address. Book3E sets up kernel virtual linear map in start_initialization_book3e and runs from the virtual linear alias after that. This change makes Book3S begin to execute from the virtual alias at the same point. Book3S can not use its MMU for that at this point, but when the MMU is disabled, the virtual linear address correctly aliases to physical memory because the top bits of the address are ignored with MMU disabled. Secondaries execute from the virtual address similarly early. This reduces the differences between subarchs, but the main motivation was to enable the PC-relative addressing ABI for Book3S, where pointer calculations must execute from the virtual address or the top bits of the pointer will be lost. This is similar to the requirement the TOC relative addressing already has that the TOC pointer use its virtual address. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-3-npiggin@gmail.com |
||
![]() |
4f18b9e6ca |
powerpc/64: Move initial base and TOC pointer calculation
A later change moves the non-prom case to run at the virtual address earlier, which calls for virtual TOC and kernel base. Split these two calculations for prom and non-prom to make that change simpler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Retain relative_toc call for start_initialization_book3e] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-2-npiggin@gmail.com |
||
![]() |
7412a60dec |
cpu: Mark panic_smp_self_stop() __noreturn
In preparation for improving objtool's handling of weak noreturn functions, mark panic_smp_self_stop() __noreturn. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/92d76ab5c8bf660f04fdcd3da1084519212de248.1681342859.git.jpoimboe@kernel.org |
||
![]() |
8002725b9e |
powerpc/32: Include thread_info.h in head_booke.h
When building with W=1 after commit
|