linux/arch/powerpc/include/asm/book3s/32/pgtable.h

608 lines
18 KiB
C
Raw Permalink Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_POWERPC_BOOK3S_32_PGTABLE_H
#define _ASM_POWERPC_BOOK3S_32_PGTABLE_H
#include <asm-generic/pgtable-nopmd.h>
/*
* The "classic" 32-bit implementation of the PowerPC MMU uses a hash
* table containing PTEs, together with a set of 16 segment registers,
* to define the virtual to physical address mapping.
*
* We use the hash table as an extended TLB, i.e. a cache of currently
* active mappings. We maintain a two-level page table tree, much
* like that used by the i386, for the sake of the Linux memory
* management code. Low-level assembler code in hash_low_32.S
* (procedure hash_page) is responsible for extracting ptes from the
* tree and putting them into the hash table when necessary, and
* updating the accessed and modified bits in the page table tree.
*/
#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */
#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */
#define _PAGE_READ 0x004 /* software: read access allowed */
#define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */
#define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */
#define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */
#define _PAGE_WRITETHRU 0x040 /* W: cache write-through */
#define _PAGE_DIRTY 0x080 /* C: page changed */
#define _PAGE_ACCESSED 0x100 /* R: page referenced */
#define _PAGE_EXEC 0x200 /* software: exec allowed */
#define _PAGE_WRITE 0x400 /* software: user write access allowed */
#define _PAGE_SPECIAL 0x800 /* software: Special page */
#ifdef CONFIG_PTE_64BIT
/* We never clear the high word of the pte */
#define _PTE_NONE_MASK (0xffffffff00000000ULL | _PAGE_HASHPTE)
#else
#define _PTE_NONE_MASK _PAGE_HASHPTE
#endif
#define _PMD_PRESENT 0
#define _PMD_PRESENT_MASK (PAGE_MASK)
#define _PMD_BAD (~PAGE_MASK)
/* We borrow the _PAGE_READ bit to store the exclusive marker in swap PTEs. */
#define _PAGE_SWP_EXCLUSIVE _PAGE_READ
/* And here we include common definitions */
#define _PAGE_HPTEFLAGS _PAGE_HASHPTE
/*
* Location of the PFN in the PTE. Most 32-bit platforms use the same
* as _PAGE_SHIFT here (ie, naturally aligned).
* Platform who don't just pre-define the value so we don't override it here.
*/
#define PTE_RPN_SHIFT (PAGE_SHIFT)
/*
* The mask covered by the RPN must be a ULL on 32-bit platforms with
* 64-bit PTEs.
*/
#ifdef CONFIG_PTE_64BIT
#define PTE_RPN_MASK (~((1ULL << PTE_RPN_SHIFT) - 1))
arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Stefan Agner <stefan@agner.ch> Tested-by: Stefan Agner <stefan@agner.ch> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-11-11 17:52:58 +01:00
#define MAX_POSSIBLE_PHYSMEM_BITS 36
#else
#define PTE_RPN_MASK (~((1UL << PTE_RPN_SHIFT) - 1))
arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Stefan Agner <stefan@agner.ch> Tested-by: Stefan Agner <stefan@agner.ch> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-11-11 17:52:58 +01:00
#define MAX_POSSIBLE_PHYSMEM_BITS 32
#endif
/*
* _PAGE_CHG_MASK masks of bits that are to be preserved across
* pgprot changes.
*/
#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HASHPTE | _PAGE_DIRTY | \
_PAGE_ACCESSED | _PAGE_SPECIAL)
/*
* We define 2 sets of base prot bits, one for basic pages (ie,
* cacheable kernel and user pages) and one for non cacheable
* pages. We always set _PAGE_COHERENT when SMP is enabled or
* the processor might need it for DMA coherency.
*/
#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED)
#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)
#include <asm/pgtable-masks.h>
/* Permission masks used for kernel mappings */
#define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
#define PAGE_KERNEL_NC __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | _PAGE_NO_CACHE)
#define PAGE_KERNEL_NCG __pgprot(_PAGE_BASE_NC | _PAGE_KERNEL_RW | _PAGE_NO_CACHE | _PAGE_GUARDED)
#define PAGE_KERNEL_X __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX)
#define PAGE_KERNEL_RO __pgprot(_PAGE_BASE | _PAGE_KERNEL_RO)
#define PAGE_KERNEL_ROX __pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
#define PTE_INDEX_SIZE PTE_SHIFT
#define PMD_INDEX_SIZE 0
#define PUD_INDEX_SIZE 0
#define PGD_INDEX_SIZE (32 - PGDIR_SHIFT)
#define PMD_CACHE_INDEX PMD_INDEX_SIZE
#define PUD_CACHE_INDEX PUD_INDEX_SIZE
#ifndef __ASSEMBLY__
#define PTE_TABLE_SIZE (sizeof(pte_t) << PTE_INDEX_SIZE)
#define PMD_TABLE_SIZE 0
#define PUD_TABLE_SIZE 0
#define PGD_TABLE_SIZE (sizeof(pgd_t) << PGD_INDEX_SIZE)
mm: consolidate pte_index() and pte_offset_*() definitions All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [rppt@linux.ibm.com: v2] Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org [akpm@linux-foundation.org: fix x86 warning] [sfr@canb.auug.org.au: fix powerpc build] Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 21:33:10 -07:00
/* Bits to mask out from a PMD to get to the PTE page */
#define PMD_MASKED_BITS (PTE_TABLE_SIZE - 1)
#endif /* __ASSEMBLY__ */
#define PTRS_PER_PTE (1 << PTE_INDEX_SIZE)
#define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
/*
* The normal case is that PTEs are 32-bits and we have a 1-page
* 1024-entry pgdir pointing to 1-page 1024-entry PTE pages. -- paulus
*
* For any >32-bit physical address platform, we can use the following
* two level page table layout where the pgdir is 8KB and the MS 13 bits
* are an index to the second level table. The combined pgdir/pmd first
* level has 2048 entries and the second level has 512 64-bit PTE entries.
* -Matt
*/
/* PGDIR_SHIFT determines what a top-level page table entry can map */
#define PGDIR_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE-1))
#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
#ifndef __ASSEMBLY__
int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
powerpc/fixmap: Fix VM debug warning on unmap Unmapping a fixmap entry is done by calling __set_fixmap() with FIXMAP_PAGE_CLEAR as flags. Today, powerpc __set_fixmap() calls map_kernel_page(). map_kernel_page() is not happy when called a second time for the same page. WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0xc/0x1e8 CPU: 0 PID: 1 Comm: swapper Not tainted 5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty #682 NIP: c0017cd4 LR: c00187f0 CTR: 00000010 REGS: e1011d50 TRAP: 0700 Not tainted (5.16.0-rc3-s3k-dev-01993-g350ff07feb7d-dirty) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42000208 XER: 00000000 GPR00: c0165fec e1011e10 c14c0000 c0ee2550 ff800000 c0f3d000 00000000 c001686c GPR08: 00001000 b00045a9 00000001 c0f58460 c0f50000 00000000 c0007e10 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GPR24: 00000000 00000000 c0ee2550 00000000 c0f57000 00000ff8 00000000 ff800000 NIP [c0017cd4] set_pte_at+0xc/0x1e8 LR [c00187f0] map_kernel_page+0x9c/0x100 Call Trace: [e1011e10] [c0736c68] vsnprintf+0x358/0x6c8 (unreliable) [e1011e30] [c0165fec] __set_fixmap+0x30/0x44 [e1011e40] [c0c13bdc] early_iounmap+0x11c/0x170 [e1011e70] [c0c06cb0] ioremap_legacy_serial_console+0x88/0xc0 [e1011e90] [c0c03634] do_one_initcall+0x80/0x178 [e1011ef0] [c0c0385c] kernel_init_freeable+0xb4/0x250 [e1011f20] [c0007e34] kernel_init+0x24/0x140 [e1011f30] [c0016268] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7fe3fb78 48019689 80010014 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 81250000 712a0001 41820008 <0fe00000> 9421ffe0 93e1001c 48000030 Implement unmap_kernel_page() which clears an existing pte. Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b0b752f6f6ecc60653e873f385c6f0dce4e9ab6a.1638789098.git.christophe.leroy@csgroup.eu
2021-12-06 11:11:51 +00:00
void unmap_kernel_page(unsigned long va);
#endif /* !__ASSEMBLY__ */
/*
* This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
* value (for now) on others, from where we can start layout kernel
* virtual space that goes below PKMAP and FIXMAP
*/
#define FIXADDR_SIZE 0
#ifdef CONFIG_KASAN
#include <asm/kasan.h>
#define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE)
#else
#define FIXADDR_TOP ((unsigned long)(-PAGE_SIZE))
#endif
/*
* ioremap_bot starts at that address. Early ioremaps move down from there,
* until mem_init() at which point this becomes the top of the vmalloc
* and ioremap space
*/
#ifdef CONFIG_HIGHMEM
#define IOREMAP_TOP PKMAP_BASE
#else
#define IOREMAP_TOP FIXADDR_START
#endif
/* PPC32 shares vmalloc area with ioremap */
#define IOREMAP_START VMALLOC_START
#define IOREMAP_END VMALLOC_END
/*
* Just any arbitrary offset to the start of the vmalloc VM area: the
* current 16MB value just means that there will be a 64MB "hole" after the
* physical memory until the kernel virtual memory starts. That means that
* any out-of-bounds memory accesses will hopefully be caught.
* The vmalloc() routines leaves a hole of 4kB between each vmalloced
* area for the same reason. ;)
*
* We no longer map larger than phys RAM with the BATs so we don't have
* to worry about the VMALLOC_OFFSET causing problems. We do have to worry
* about clashes between our early calls to ioremap() that start growing down
* from ioremap_base being run into the VM area allocations (growing upwards
* from VMALLOC_START). For this reason we have ioremap_bot to check when
* we actually run into our mappings setup in the early boot with the VM
* system. This really does become a problem for machines with good amounts
* of RAM. -- Cort
*/
#define VMALLOC_OFFSET (0x1000000) /* 16M */
powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages. On Book3s 32, it has three consequences: - Using pages instead of BAT for mapping kernel linear memory severely impacts performance. - Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables) - Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY) On the 603+, we have: - Independent IBAT and DBAT allowing limitation of exec parts. - NX bit can be set in segment registers to forbit execution on memory mapped by pages. - RO mode on DBATs even for kernel-only blocks. On the 601, there is nothing much we can do other than warn the user about it, because: - BATs are common to instructions and data. - BAT do not provide RO mode for kernel-only blocks. - segment registers don't have the NX bit. In order to use IBAT for exec protection, this patch: - Aligns _etext to BAT block sizes (128kb) - Set NX bit in kernel segment register (Except on vmalloc area when CONFIG_MODULES is selected) - Maps kernel text with IBATs. In order to use DBAT for exec protection, this patch: - Aligns RW DATA to BAT block sizes (4M) - Maps kernel RO area with write prohibited DBATs - Maps remaining memory with remaining DBATs Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX: Symbols: c0000000 T _stext c0680000 R __start_rodata c0680000 R _etext c0800000 T __init_begin c0800000 T _sinittext ~# cat /sys/kernel/debug/block_address_translation ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent 1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent 2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent 3: - 4: - 5: - 6: - 7: - ---[ Data Block Address Translation ]--- 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent 2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent 3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent 4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent 5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent 6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent 7: - ~# cat /sys/kernel/debug/segment_registers ---[ User Segments ]--- 0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0 0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1 0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2 0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903 0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14 0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25 0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36 0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47 0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58 0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69 0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a 0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b ---[ Kernel Segments ]--- 0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc 0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd 0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee 0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff Aligning _etext to 128kb allows to map up to 32Mb text with 8 IBATs: 16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb (A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block) Aligning data to 4M allows to map up to 512Mb data with 8 DBATs: 16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow to modify _etext and data alignment. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-21 19:08:49 +00:00
#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
#ifdef CONFIG_KASAN_VMALLOC
#define VMALLOC_END ALIGN_DOWN(ioremap_bot, PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
#else
#define VMALLOC_END ioremap_bot
#endif
#define MODULES_END ALIGN_DOWN(PAGE_OFFSET, SZ_256M)
#define MODULES_SIZE (CONFIG_MODULES_SIZE * SZ_1M)
#define MODULES_VADDR (MODULES_END - MODULES_SIZE)
#ifndef __ASSEMBLY__
#include <linux/sched.h>
#include <linux/threads.h>
/* Bits to mask out from a PGD to get to the PUD page */
#define PGD_MASKED_BITS 0
#define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
/*
* Bits in a linux-style PTE. These match the bits in the
* (hardware-defined) PowerPC PTE as closely as possible.
*/
#define pte_clear(mm, addr, ptep) \
do { pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0); } while (0)
#define pmd_none(pmd) (!pmd_val(pmd))
#define pmd_bad(pmd) (pmd_val(pmd) & _PMD_BAD)
#define pmd_present(pmd) (pmd_val(pmd) & _PMD_PRESENT_MASK)
static inline void pmd_clear(pmd_t *pmdp)
{
*pmdp = __pmd(0);
}
/*
* When flushing the tlb entry for a page, we also need to flush the hash
* table entry. flush_hash_pages is assembler (for speed) in hashtable.S.
*/
extern int flush_hash_pages(unsigned context, unsigned long va,
unsigned long pmdval, int count);
/* Add an HPTE to the hash table */
extern void add_hash_page(unsigned context, unsigned long va,
unsigned long pmdval);
/* Flush an entry from the TLB/hash table */
static inline void flush_hash_entry(struct mm_struct *mm, pte_t *ptep, unsigned long addr)
{
if (mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
unsigned long ptephys = __pa(ptep) & PAGE_MASK;
flush_hash_pages(mm->context.id, addr, ptephys, 1);
}
}
/*
* PTE updates. This function is called whenever an existing
* valid PTE is updated. This does -not- include set_pte_at()
* which nowadays only sets a new PTE.
*
* Depending on the type of MMU, we may need to use atomic updates
* and the PTE may be either 32 or 64 bit wide. In the later case,
* when using atomic updates, only the low part of the PTE is
* accessed atomically.
*/
static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
unsigned long clr, unsigned long set, int huge)
{
pte_basic_t old;
if (mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
unsigned long tmp;
asm volatile(
#ifndef CONFIG_PTE_64BIT
"1: lwarx %0, 0, %3\n"
" andc %1, %0, %4\n"
#else
"1: lwarx %L0, 0, %3\n"
" lwz %0, -4(%3)\n"
" andc %1, %L0, %4\n"
#endif
" or %1, %1, %5\n"
" stwcx. %1, 0, %3\n"
" bne- 1b"
: "=&r" (old), "=&r" (tmp), "=m" (*p)
#ifndef CONFIG_PTE_64BIT
: "r" (p),
#else
: "b" ((unsigned long)(p) + 4),
#endif
"r" (clr), "r" (set), "m" (*p)
: "cc" );
} else {
old = pte_val(*p);
*p = __pte((old & ~(pte_basic_t)clr) | set);
}
return old;
}
/*
* 2.6 calls this without flushing the TLB entry; this is wrong
* for our hash-based implementation, we fix that up here.
*/
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
unsigned long old;
old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
if (old & _PAGE_HASHPTE)
flush_hash_entry(mm, ptep, addr);
return (old & _PAGE_ACCESSED) != 0;
}
#define ptep_test_and_clear_young(__vma, __addr, __ptep) \
__ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep)
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
}
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
}
static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
pte_t *ptep, pte_t entry,
unsigned long address,
int psize)
{
unsigned long set = pte_val(entry) &
(_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
pte_update(vma->vm_mm, address, ptep, 0, set, 0);
flush_tlb_page(vma, address);
}
#define __HAVE_ARCH_PTE_SAME
#define pte_same(A,B) (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
#define pmd_pfn(pmd) (pmd_val(pmd) >> PAGE_SHIFT)
#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))
/*
* Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
* are !pte_none() && !pte_present().
*
* Format of swap PTEs (32bit PTEs):
*
* 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
* 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
* <----------------- offset --------------------> < type -> E H P
*
* E is the exclusive marker that is not stored in swap entries.
* _PAGE_PRESENT (P) and __PAGE_HASHPTE (H) must be 0.
*
* For 64bit PTEs, the offset is extended by 32bit.
*/
#define __swp_type(entry) ((entry).val & 0x1f)
#define __swp_offset(entry) ((entry).val >> 5)
#define __swp_entry(type, offset) ((swp_entry_t) { ((type) & 0x1f) | ((offset) << 5) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })
static inline bool pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
}
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
}
static inline pte_t pte_swp_clear_exclusive(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
}
/* Generic accessors to PTE bits */
static inline bool pte_read(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_READ);
}
static inline bool pte_write(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_WRITE);
}
static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); }
static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); }
static inline int pte_special(pte_t pte) { return !!(pte_val(pte) & _PAGE_SPECIAL); }
static inline int pte_none(pte_t pte) { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
static inline bool pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_EXEC; }
static inline int pte_present(pte_t pte)
{
return pte_val(pte) & _PAGE_PRESENT;
}
static inline bool pte_hw_valid(pte_t pte)
{
return pte_val(pte) & _PAGE_PRESENT;
}
static inline bool pte_hashpte(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_HASHPTE);
}
static inline bool pte_ci(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_NO_CACHE);
}
/*
* We only find page table entry in the last level
* Hence no need for other accessors
*/
#define pte_access_permitted pte_access_permitted
static inline bool pte_access_permitted(pte_t pte, bool write)
{
/*
* A read-only access is controlled by _PAGE_READ bit.
powerpc: Support execute-only on all powerpc Introduce PAGE_EXECONLY_X macro which provides exec-only rights. The _X may be seen as redundant with the EXECONLY but it helps keep consistency, all macros having the EXEC right have _X. And put it next to PAGE_NONE as PAGE_EXECONLY_X is somehow PAGE_NONE + EXEC just like all other SOMETHING_X are just SOMETHING + EXEC. On book3s/64 PAGE_EXECONLY becomes PAGE_READONLY_X. On book3s/64, as PAGE_EXECONLY is only valid for Radix add VM_READ flag in vm_get_page_prot() for non-Radix. And update access_error() so that a non exec fault on a VM_EXEC only mapping is always invalid, even when the underlying layer don't always generate a fault for that. For 8xx, set PAGE_EXECONLY_X as _PAGE_NA | _PAGE_EXEC. For others, only set it as just _PAGE_EXEC With that change, 8xx, e500 and 44x fully honor execute-only protection. On 40x that is a partial implementation of execute-only. The implementation won't be complete because once a TLB has been loaded via the Instruction TLB miss handler, it will be possible to read the page. But at least it can't be read unless it is executed first. On 603 MMU, TLB missed are handled by SW and there are separate DTLB and ITLB. Execute-only is therefore now supported by not loading DTLB when read access is not permitted. On hash (604) MMU it is more tricky because hash table is common to load/store and execute. Nevertheless it is still possible to check whether _PAGE_READ is set before loading hash table for a load/store access. At least it can't be read unless it is executed first. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/4283ea9cbef9ff2fbee468904800e1962bc8fc18.1695659959.git.christophe.leroy@csgroup.eu
2023-09-25 20:31:51 +02:00
* We have _PAGE_READ set for WRITE
*/
if (!pte_present(pte) || !pte_read(pte))
return false;
if (write && !pte_write(pte))
return false;
return true;
}
/* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*
* Even if PTEs can be unsigned long long, a PFN is always an unsigned
* long for now.
*/
static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
{
return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
pgprot_val(pgprot));
}
/* Generic modifiers for PTE bits */
static inline pte_t pte_wrprotect(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_WRITE);
}
static inline pte_t pte_exprotect(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_EXEC);
}
static inline pte_t pte_mkclean(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_DIRTY);
}
static inline pte_t pte_mkold(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
}
static inline pte_t pte_mkexec(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_EXEC);
}
static inline pte_t pte_mkpte(pte_t pte)
{
return pte;
}
mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma() The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable or shadow stack mappings. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
2023-06-12 17:10:27 -07:00
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
/*
* write implies read, hence set both
*/
return __pte(pte_val(pte) | _PAGE_RW);
}
static inline pte_t pte_mkdirty(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_DIRTY);
}
static inline pte_t pte_mkyoung(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_ACCESSED);
}
static inline pte_t pte_mkspecial(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_SPECIAL);
}
static inline pte_t pte_mkhuge(pte_t pte)
{
return pte;
}
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
}
/* This low level function performs the actual PTE insertion
* Setting the PTE depends on the MMU type and other factors.
*
* First case is 32-bit in UP mode with 32-bit PTEs, we need to preserve
* the _PAGE_HASHPTE bit since we may not have invalidated the previous
* translation in the hash yet (done in a subsequent flush_tlb_xxx())
* and see we need to keep track that this PTE needs invalidating.
*
* Second case is 32-bit with 64-bit PTE. In this case, we
* can just store as long as we do the two halves in the right order
* with a barrier in between. This is possible because we take care,
* in the hash code, to pre-invalidate if the PTE was already hashed,
* which synchronizes us with any concurrent invalidation.
* In the percpu case, we fallback to the simple update preserving
* the hash bits (ie, same as the non-SMP case).
*
* Third case is 32-bit in SMP mode with 32-bit PTEs. We use the
* helper pte_update() which does an atomic update. We need to do that
* because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
* per-CPU PTE such as a kmap_atomic, we also do a simple update preserving
* the hash bits instead.
*/
static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
{
if ((!IS_ENABLED(CONFIG_SMP) && !IS_ENABLED(CONFIG_PTE_64BIT)) || percpu) {
*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE) |
(pte_val(pte) & ~_PAGE_HASHPTE));
} else if (IS_ENABLED(CONFIG_PTE_64BIT)) {
if (pte_val(*ptep) & _PAGE_HASHPTE)
flush_hash_entry(mm, ptep, addr);
asm volatile("stw%X0 %2,%0; eieio; stw%X1 %L2,%1" :
"=m" (*ptep), "=m" (*((unsigned char *)ptep+4)) :
"r" (pte) : "memory");
} else {
pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
}
}
/*
* Macro to mark a page protection value as "uncacheable".
*/
#define _PAGE_CACHE_CTL (_PAGE_COHERENT | _PAGE_GUARDED | _PAGE_NO_CACHE | \
_PAGE_WRITETHRU)
#define pgprot_noncached pgprot_noncached
static inline pgprot_t pgprot_noncached(pgprot_t prot)
{
return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
_PAGE_NO_CACHE | _PAGE_GUARDED);
}
#define pgprot_noncached_wc pgprot_noncached_wc
static inline pgprot_t pgprot_noncached_wc(pgprot_t prot)
{
return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
_PAGE_NO_CACHE);
}
#define pgprot_cached pgprot_cached
static inline pgprot_t pgprot_cached(pgprot_t prot)
{
return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
_PAGE_COHERENT);
}
#define pgprot_cached_wthru pgprot_cached_wthru
static inline pgprot_t pgprot_cached_wthru(pgprot_t prot)
{
return __pgprot((pgprot_val(prot) & ~_PAGE_CACHE_CTL) |
_PAGE_COHERENT | _PAGE_WRITETHRU);
}
#define pgprot_cached_noncoherent pgprot_cached_noncoherent
static inline pgprot_t pgprot_cached_noncoherent(pgprot_t prot)
{
return __pgprot(pgprot_val(prot) & ~_PAGE_CACHE_CTL);
}
#define pgprot_writecombine pgprot_writecombine
static inline pgprot_t pgprot_writecombine(pgprot_t prot)
{
return pgprot_noncached_wc(prot);
}
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_32_PGTABLE_H */