linux/arch/powerpc/include/asm/book3s/64/radix.h

367 lines
12 KiB
C
Raw Permalink Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_POWERPC_PGTABLE_RADIX_H
#define _ASM_POWERPC_PGTABLE_RADIX_H
#include <asm/asm-const.h>
#ifndef __ASSEMBLY__
#include <asm/cmpxchg.h>
#endif
#ifdef CONFIG_PPC_64K_PAGES
#include <asm/book3s/64/radix-64k.h>
#else
#include <asm/book3s/64/radix-4k.h>
#endif
#ifndef __ASSEMBLY__
#include <asm/book3s/64/tlbflush-radix.h>
#include <asm/cpu_has_feature.h>
#endif
/* An empty PTE can still have a R or C writeback */
#define RADIX_PTE_NONE_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
/* Bits to set in a RPMD/RPUD/RPGD */
#define RADIX_PMD_VAL_BITS (0x8000000000000000UL | RADIX_PTE_INDEX_SIZE)
#define RADIX_PUD_VAL_BITS (0x8000000000000000UL | RADIX_PMD_INDEX_SIZE)
#define RADIX_PGD_VAL_BITS (0x8000000000000000UL | RADIX_PUD_INDEX_SIZE)
/* Don't have anything in the reserved bits and leaf bits */
#define RADIX_PMD_BAD_BITS 0x60000000000000e0UL
#define RADIX_PUD_BAD_BITS 0x60000000000000e0UL
powerpc: add support for folded p4d page tables Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. [rppt@linux.ibm.com: powerpc/xmon: drop unused pgdir varialble in show_pte() function] Link: http://lkml.kernel.org/r/20200519181454.GI1059226@linux.ibm.com [rppt@linux.ibm.com; build fix] Link: http://lkml.kernel.org/r/20200423141845.GI13521@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-9-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 16:46:44 -07:00
#define RADIX_P4D_BAD_BITS 0x60000000000000e0UL
#define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
#define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
#define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
powerpc: Book3S 64-bit outline-only KASAN support Implement a limited form of KASAN for Book3S 64-bit machines running under the Radix MMU, supporting only outline mode. - Enable the compiler instrumentation to check addresses and maintain the shadow region. (This is the guts of KASAN which we can easily reuse.) - Require kasan-vmalloc support to handle modules and anything else in vmalloc space. - KASAN needs to be able to validate all pointer accesses, but we can't instrument all kernel addresses - only linear map and vmalloc. On boot, set up a single page of read-only shadow that marks all iomap and vmemmap accesses as valid. - Document KASAN in powerpc docs. Background ---------- KASAN support on Book3S is a bit tricky to get right: - It would be good to support inline instrumentation so as to be able to catch stack issues that cannot be caught with outline mode. - Inline instrumentation requires a fixed offset. - Book3S runs code with translations off ("real mode") during boot, including a lot of generic device-tree parsing code which is used to determine MMU features. [ppc64 mm note: The kernel installs a linear mapping at effective address c000...-c008.... This is a one-to-one mapping with physical memory from 0000... onward. Because of how memory accesses work on powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the same memory both with translations on (accessing as an 'effective address'), and with translations off (accessing as a 'real address'). This works in both guests and the hypervisor. For more details, see s5.7 of Book III of version 3 of the ISA, in particular the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this KASAN implementation currently only supports Radix.] - Some code - most notably a lot of KVM code - also runs with translations off after boot. - Therefore any offset has to point to memory that is valid with translations on or off. One approach is just to give up on inline instrumentation. This way boot-time checks can be delayed until after the MMU is set is up, and we can just not instrument any code that runs with translations off after booting. Take this approach for now and require outline instrumentation. Previous attempts allowed inline instrumentation. However, they came with some unfortunate restrictions: only physically contiguous memory could be used and it had to be specified at compile time. Maybe we can do better in the future. [paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with CONFIG_KASAN=y will crash during boot on a machine using HPT translation because not all the entry points to the generic KASAN code are protected with a call to kasan_arch_is_ready().] Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Update copyright year and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-18 20:05:31 +10:00
#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
/*
* Size of EA range mapped by our pagetables.
*/
#define RADIX_PGTABLE_EADDR_SIZE (RADIX_PTE_INDEX_SIZE + RADIX_PMD_INDEX_SIZE + \
RADIX_PUD_INDEX_SIZE + RADIX_PGD_INDEX_SIZE + PAGE_SHIFT)
#define RADIX_PGTABLE_RANGE (ASM_CONST(1) << RADIX_PGTABLE_EADDR_SIZE)
/*
* We support 52 bit address space, Use top bit for kernel
* virtual mapping. Also make sure kernel fit in the top
* quadrant.
*
* +------------------+
* +------------------+ Kernel virtual map (0xc008000000000000)
* | |
* | |
* | |
* 0b11......+------------------+ Kernel linear map (0xc....)
* | |
* | 2 quadrant |
* | |
* 0b10......+------------------+
* | |
* | 1 quadrant |
* | |
* 0b01......+------------------+
* | |
* | 0 quadrant |
* | |
* 0b00......+------------------+
*
*
* 3rd quadrant expanded:
powerpc: Book3S 64-bit outline-only KASAN support Implement a limited form of KASAN for Book3S 64-bit machines running under the Radix MMU, supporting only outline mode. - Enable the compiler instrumentation to check addresses and maintain the shadow region. (This is the guts of KASAN which we can easily reuse.) - Require kasan-vmalloc support to handle modules and anything else in vmalloc space. - KASAN needs to be able to validate all pointer accesses, but we can't instrument all kernel addresses - only linear map and vmalloc. On boot, set up a single page of read-only shadow that marks all iomap and vmemmap accesses as valid. - Document KASAN in powerpc docs. Background ---------- KASAN support on Book3S is a bit tricky to get right: - It would be good to support inline instrumentation so as to be able to catch stack issues that cannot be caught with outline mode. - Inline instrumentation requires a fixed offset. - Book3S runs code with translations off ("real mode") during boot, including a lot of generic device-tree parsing code which is used to determine MMU features. [ppc64 mm note: The kernel installs a linear mapping at effective address c000...-c008.... This is a one-to-one mapping with physical memory from 0000... onward. Because of how memory accesses work on powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the same memory both with translations on (accessing as an 'effective address'), and with translations off (accessing as a 'real address'). This works in both guests and the hypervisor. For more details, see s5.7 of Book III of version 3 of the ISA, in particular the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this KASAN implementation currently only supports Radix.] - Some code - most notably a lot of KVM code - also runs with translations off after boot. - Therefore any offset has to point to memory that is valid with translations on or off. One approach is just to give up on inline instrumentation. This way boot-time checks can be delayed until after the MMU is set is up, and we can just not instrument any code that runs with translations off after booting. Take this approach for now and require outline instrumentation. Previous attempts allowed inline instrumentation. However, they came with some unfortunate restrictions: only physically contiguous memory could be used and it had to be specified at compile time. Maybe we can do better in the future. [paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with CONFIG_KASAN=y will crash during boot on a machine using HPT translation because not all the entry points to the generic KASAN code are protected with a call to kasan_arch_is_ready().] Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Update copyright year and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-18 20:05:31 +10:00
* +------------------------------+ Highest address (0xc010000000000000)
* +------------------------------+ KASAN shadow end (0xc00fc00000000000)
* | |
* | |
powerpc: Book3S 64-bit outline-only KASAN support Implement a limited form of KASAN for Book3S 64-bit machines running under the Radix MMU, supporting only outline mode. - Enable the compiler instrumentation to check addresses and maintain the shadow region. (This is the guts of KASAN which we can easily reuse.) - Require kasan-vmalloc support to handle modules and anything else in vmalloc space. - KASAN needs to be able to validate all pointer accesses, but we can't instrument all kernel addresses - only linear map and vmalloc. On boot, set up a single page of read-only shadow that marks all iomap and vmemmap accesses as valid. - Document KASAN in powerpc docs. Background ---------- KASAN support on Book3S is a bit tricky to get right: - It would be good to support inline instrumentation so as to be able to catch stack issues that cannot be caught with outline mode. - Inline instrumentation requires a fixed offset. - Book3S runs code with translations off ("real mode") during boot, including a lot of generic device-tree parsing code which is used to determine MMU features. [ppc64 mm note: The kernel installs a linear mapping at effective address c000...-c008.... This is a one-to-one mapping with physical memory from 0000... onward. Because of how memory accesses work on powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the same memory both with translations on (accessing as an 'effective address'), and with translations off (accessing as a 'real address'). This works in both guests and the hypervisor. For more details, see s5.7 of Book III of version 3 of the ISA, in particular the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this KASAN implementation currently only supports Radix.] - Some code - most notably a lot of KVM code - also runs with translations off after boot. - Therefore any offset has to point to memory that is valid with translations on or off. One approach is just to give up on inline instrumentation. This way boot-time checks can be delayed until after the MMU is set is up, and we can just not instrument any code that runs with translations off after booting. Take this approach for now and require outline instrumentation. Previous attempts allowed inline instrumentation. However, they came with some unfortunate restrictions: only physically contiguous memory could be used and it had to be specified at compile time. Maybe we can do better in the future. [paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with CONFIG_KASAN=y will crash during boot on a machine using HPT translation because not all the entry points to the generic KASAN code are protected with a call to kasan_arch_is_ready().] Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Update copyright year and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-18 20:05:31 +10:00
* +------------------------------+ Kernel vmemmap end/shadow start (0xc00e000000000000)
* | |
* | 512TB |
* | |
* +------------------------------+ Kernel IO map end/vmemap start
* | |
* | 512TB |
* | |
* +------------------------------+ Kernel vmap end/ IO map start
* | |
* | 512TB |
* | |
* +------------------------------+ Kernel virt start (0xc008000000000000)
* | |
* | |
* | |
* +------------------------------+ Kernel linear (0xc.....)
*/
powerpc: Book3S 64-bit outline-only KASAN support Implement a limited form of KASAN for Book3S 64-bit machines running under the Radix MMU, supporting only outline mode. - Enable the compiler instrumentation to check addresses and maintain the shadow region. (This is the guts of KASAN which we can easily reuse.) - Require kasan-vmalloc support to handle modules and anything else in vmalloc space. - KASAN needs to be able to validate all pointer accesses, but we can't instrument all kernel addresses - only linear map and vmalloc. On boot, set up a single page of read-only shadow that marks all iomap and vmemmap accesses as valid. - Document KASAN in powerpc docs. Background ---------- KASAN support on Book3S is a bit tricky to get right: - It would be good to support inline instrumentation so as to be able to catch stack issues that cannot be caught with outline mode. - Inline instrumentation requires a fixed offset. - Book3S runs code with translations off ("real mode") during boot, including a lot of generic device-tree parsing code which is used to determine MMU features. [ppc64 mm note: The kernel installs a linear mapping at effective address c000...-c008.... This is a one-to-one mapping with physical memory from 0000... onward. Because of how memory accesses work on powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the same memory both with translations on (accessing as an 'effective address'), and with translations off (accessing as a 'real address'). This works in both guests and the hypervisor. For more details, see s5.7 of Book III of version 3 of the ISA, in particular the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this KASAN implementation currently only supports Radix.] - Some code - most notably a lot of KVM code - also runs with translations off after boot. - Therefore any offset has to point to memory that is valid with translations on or off. One approach is just to give up on inline instrumentation. This way boot-time checks can be delayed until after the MMU is set is up, and we can just not instrument any code that runs with translations off after booting. Take this approach for now and require outline instrumentation. Previous attempts allowed inline instrumentation. However, they came with some unfortunate restrictions: only physically contiguous memory could be used and it had to be specified at compile time. Maybe we can do better in the future. [paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with CONFIG_KASAN=y will crash during boot on a machine using HPT translation because not all the entry points to the generic KASAN code are protected with a call to kasan_arch_is_ready().] Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Update copyright year and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-18 20:05:31 +10:00
/* For the sizes of the shadow area, see kasan.h */
/*
* If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
* if we increase SECTIONS_WIDTH we will not store node details in page->flags and
* page_to_nid does a page->section->node lookup
* Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
* memory requirements with large number of sections.
* 51 bits is the max physical real address on POWER9
*/
#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
#define R_MAX_PHYSMEM_BITS 51
#else
#define R_MAX_PHYSMEM_BITS 46
#endif
#define RADIX_KERN_VIRT_START ASM_CONST(0xc008000000000000)
/*
* 49 = MAX_EA_BITS_PER_CONTEXT (hash specific). To make sure we pick
* the same value as hash.
*/
#define RADIX_KERN_MAP_SIZE (1UL << 49)
#define RADIX_VMALLOC_START RADIX_KERN_VIRT_START
#define RADIX_VMALLOC_SIZE RADIX_KERN_MAP_SIZE
#define RADIX_VMALLOC_END (RADIX_VMALLOC_START + RADIX_VMALLOC_SIZE)
#define RADIX_KERN_IO_START RADIX_VMALLOC_END
#define RADIX_KERN_IO_SIZE RADIX_KERN_MAP_SIZE
#define RADIX_KERN_IO_END (RADIX_KERN_IO_START + RADIX_KERN_IO_SIZE)
#define RADIX_VMEMMAP_START RADIX_KERN_IO_END
#define RADIX_VMEMMAP_SIZE RADIX_KERN_MAP_SIZE
#define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
#ifndef __ASSEMBLY__
#define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
#define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
#define RADIX_PUD_TABLE_SIZE (sizeof(pud_t) << RADIX_PUD_INDEX_SIZE)
#define RADIX_PGD_TABLE_SIZE (sizeof(pgd_t) << RADIX_PGD_INDEX_SIZE)
#ifdef CONFIG_STRICT_KERNEL_RWX
extern void radix__mark_rodata_ro(void);
extern void radix__mark_initmem_nx(void);
#endif
extern void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
pte_t entry, unsigned long address,
int psize);
extern void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t pte);
static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
unsigned long set)
{
__be64 old_be, tmp_be;
__asm__ __volatile__(
"1: ldarx %0,0,%3 # pte_update\n"
" andc %1,%0,%5 \n"
" or %1,%1,%4 \n"
" stdcx. %1,0,%3 \n"
" bne- 1b"
: "=&r" (old_be), "=&r" (tmp_be), "=m" (*ptep)
: "r" (ptep), "r" (cpu_to_be64(set)), "r" (cpu_to_be64(clr))
: "cc" );
return be64_to_cpu(old_be);
}
static inline unsigned long radix__pte_update(struct mm_struct *mm,
unsigned long addr,
pte_t *ptep, unsigned long clr,
unsigned long set,
int huge)
{
unsigned long old_pte;
old_pte = __radix_pte_update(ptep, clr, set);
if (!huge)
assert_pte_locked(mm, addr);
return old_pte;
}
static inline pte_t radix__ptep_get_and_clear_full(struct mm_struct *mm,
unsigned long addr,
pte_t *ptep, int full)
{
unsigned long old_pte;
if (full) {
old_pte = pte_val(*ptep);
*ptep = __pte(0);
} else
old_pte = radix__pte_update(mm, addr, ptep, ~0ul, 0, 0);
return __pte(old_pte);
}
static inline int radix__pte_same(pte_t pte_a, pte_t pte_b)
{
return ((pte_raw(pte_a) ^ pte_raw(pte_b)) == 0);
}
static inline int radix__pte_none(pte_t pte)
{
return (pte_val(pte) & ~RADIX_PTE_NONE_MASK) == 0;
}
static inline void radix__set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
{
*ptep = pte;
/*
* The architecture suggests a ptesync after setting the pte, which
* orders the store that updates the pte with subsequent page table
* walk accesses which may load the pte. Without this it may be
* possible for a subsequent access to result in spurious fault.
*
* This is not necessary for correctness, because a spurious fault
* is tolerated by the page fault handler, and this store will
* eventually be seen. In testing, there was no noticable increase
* in user faults on POWER9. Avoiding ptesync here is a significant
* win for things like fork. If a future microarchitecture benefits
* from ptesync, it should probably go into update_mmu_cache, rather
* than set_pte_at (which is used to set ptes unrelated to faults).
*
powerpc/64s: Fix pte update for kernel memory on radix When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this when updating kernel memory PTEs using ptesync. Fixes: f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
2021-02-08 14:29:56 +11:00
* Spurious faults from the kernel memory are not tolerated, so there
* is a ptesync in flush_cache_vmap, and __map_kernel_page() follows
* the pte update sequence from ISA Book III 6.10 Translation Table
* Update Synchronization Requirements.
*/
}
static inline int radix__pmd_bad(pmd_t pmd)
{
return !!(pmd_val(pmd) & RADIX_PMD_BAD_BITS);
}
static inline int radix__pmd_same(pmd_t pmd_a, pmd_t pmd_b)
{
return ((pmd_raw(pmd_a) ^ pmd_raw(pmd_b)) == 0);
}
static inline int radix__pud_bad(pud_t pud)
{
return !!(pud_val(pud) & RADIX_PUD_BAD_BITS);
}
static inline int radix__pud_same(pud_t pud_a, pud_t pud_b)
{
return ((pud_raw(pud_a) ^ pud_raw(pud_b)) == 0);
}
powerpc: add support for folded p4d page tables Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. [rppt@linux.ibm.com: powerpc/xmon: drop unused pgdir varialble in show_pte() function] Link: http://lkml.kernel.org/r/20200519181454.GI1059226@linux.ibm.com [rppt@linux.ibm.com; build fix] Link: http://lkml.kernel.org/r/20200423141845.GI13521@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-9-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 16:46:44 -07:00
static inline int radix__p4d_bad(p4d_t p4d)
{
powerpc: add support for folded p4d page tables Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. [rppt@linux.ibm.com: powerpc/xmon: drop unused pgdir varialble in show_pte() function] Link: http://lkml.kernel.org/r/20200519181454.GI1059226@linux.ibm.com [rppt@linux.ibm.com; build fix] Link: http://lkml.kernel.org/r/20200423141845.GI13521@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # 8xx and 83xx Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-9-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 16:46:44 -07:00
return !!(p4d_val(p4d) & RADIX_P4D_BAD_BITS);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline int radix__pmd_trans_huge(pmd_t pmd)
{
return (pmd_val(pmd) & _PAGE_PTE) == _PAGE_PTE;
}
static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
{
return __pmd(pmd_val(pmd) | _PAGE_PTE);
}
static inline int radix__pud_trans_huge(pud_t pud)
{
return (pud_val(pud) & _PAGE_PTE) == _PAGE_PTE;
}
static inline pud_t radix__pud_mkhuge(pud_t pud)
{
return __pud(pud_val(pud) | _PAGE_PTE);
}
extern unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, unsigned long clr,
unsigned long set);
extern unsigned long radix__pud_hugepage_update(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, unsigned long clr,
unsigned long set);
extern pmd_t radix__pmdp_collapse_flush(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp);
extern void radix__pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable);
extern pgtable_t radix__pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
extern pmd_t radix__pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp);
pud_t radix__pudp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pud_t *pudp);
static inline int radix__has_transparent_hugepage(void)
{
/* For radix 2M at PMD level means thp */
if (mmu_psize_defs[MMU_PAGE_2M].shift == PMD_SHIFT)
return 1;
return 0;
}
static inline int radix__has_transparent_pud_hugepage(void)
{
/* For radix 1G at PUD level means pud hugepage support */
if (mmu_psize_defs[MMU_PAGE_1G].shift == PUD_SHIFT)
return 1;
return 0;
}
#endif
struct vmem_altmap;
struct dev_pagemap;
extern int __meminit radix__vmemmap_create_mapping(unsigned long start,
unsigned long page_size,
unsigned long phys);
int __meminit radix__vmemmap_populate(unsigned long start, unsigned long end,
int node, struct vmem_altmap *altmap);
powerpc/book3s64/vmemmap: switch radix to use a different vmemmap handling function This is in preparation to update radix to implement vmemmap optimization for devdax. Below are the rules w.r.t radix vmemmap mapping 1. First try to map things using PMD (2M) 2. With altmap if altmap cross-boundary check returns true, fall back to PAGE_SIZE 3. If we can't allocate PMD_SIZE backing memory for vmemmap, fallback to PAGE_SIZE On removing vmemmap mapping, check if every subsection that is using the vmemmap area is invalid. If found to be invalid, that implies we can safely free the vmemmap area. We don't use the PAGE_UNUSED pattern used by x86 because with 64K page size, we need to do the above check even at the PAGE_SIZE granularity. [aneesh.kumar@linux.ibm.com: fix section mismatch warning] Link: https://lkml.kernel.org/r/87h6pqvu5g.fsf@linux.ibm.com [aneesh.kumar@linux.ibm.com: fix kernel build error] Link: https://lkml.kernel.org/r/877cqkwd20.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20230724190759.483013-11-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-25 00:37:56 +05:30
void __ref radix__vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap);
extern void radix__vmemmap_remove_mapping(unsigned long start,
unsigned long page_size);
extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
pgprot_t flags, unsigned int psz);
static inline unsigned long radix__get_tree_size(void)
{
unsigned long rts_field;
/*
* We support 52 bits, hence:
* bits 52 - 31 = 21, 0b10101
* RTS encoding details
* bits 0 - 3 of rts -> bits 6 - 8 unsigned long
* bits 4 - 5 of rts -> bits 62 - 63 of unsigned long
*/
rts_field = (0x5UL << 5); /* 6 - 8 bits */
rts_field |= (0x2UL << 61);
return rts_field;
}
#ifdef CONFIG_MEMORY_HOTPLUG
int radix__create_section_mapping(unsigned long start, unsigned long end,
int nid, pgprot_t prot);
int radix__remove_section_mapping(unsigned long start, unsigned long end);
#endif /* CONFIG_MEMORY_HOTPLUG */
#ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
#define vmemmap_can_optimize vmemmap_can_optimize
bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap);
#endif
#define vmemmap_populate_compound_pages vmemmap_populate_compound_pages
int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
unsigned long start,
unsigned long end, int node,
struct dev_pagemap *pgmap);
#endif /* __ASSEMBLY__ */
#endif