mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
     - "refine mas_mab_cp()"
     - "Reduce the space to be cleared for maple_big_node"
     - "maple_tree: simplify mas_push_node()"
     - "Following cleanup after introduce mas_wr_store_type()"
     - "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationalizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP.
   This is more consistent and cleaner, and potentially saves a large
   number of page faults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built-in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding work which we didn't need to do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" from Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcg v1 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleans up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is follow-on maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox removes
   most references to page->index in mm/. A slow march towards
   shrinking struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self-testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userspace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configure multisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...

352 lines
8.6 KiB
C
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (c) 2014, The Linux Foundation. All rights reserved.
 */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/mem_encrypt.h>
#include <linux/sched.h>
#include <linux/vmalloc.h>

#include <asm/cacheflush.h>
#include <asm/pgtable-prot.h>
#include <asm/set_memory.h>
#include <asm/tlbflush.h>
#include <asm/kfence.h>

struct page_change_data {
	pgprot_t set_mask;
	pgprot_t clear_mask;
};

bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);

bool can_set_direct_map(void)
{
	/*
	 * rodata_full, DEBUG_PAGEALLOC and a Realm guest all require linear
	 * map to be mapped at page granularity, so that it is possible to
	 * protect/unprotect single pages.
	 *
	 * KFENCE pool requires page-granular mapping if initialized late.
	 *
	 * Realms need to make pages shared/protected at page granularity.
	 */
	return rodata_full || debug_pagealloc_enabled() ||
		arm64_kfence_can_set_direct_map() || is_realm_world();
}

static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
{
	struct page_change_data *cdata = data;
	pte_t pte = __ptep_get(ptep);

	pte = clear_pte_bit(pte, cdata->clear_mask);
	pte = set_pte_bit(pte, cdata->set_mask);

	__set_pte(ptep, pte);
	return 0;
}

/*
 * This function assumes that the range is mapped with PAGE_SIZE pages.
 */
static int __change_memory_common(unsigned long start, unsigned long size,
				pgprot_t set_mask, pgprot_t clear_mask)
{
	struct page_change_data data;
	int ret;

	data.set_mask = set_mask;
	data.clear_mask = clear_mask;

	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
					&data);

	/*
	 * If the memory is being made valid without changing any other bits
	 * then a TLBI isn't required as a non-valid entry cannot be cached in
	 * the TLB.
	 */
	if (pgprot_val(set_mask) != PTE_VALID || pgprot_val(clear_mask))
		flush_tlb_kernel_range(start, start + size);
	return ret;
}

static int change_memory_common(unsigned long addr, int numpages,
				pgprot_t set_mask, pgprot_t clear_mask)
{
	unsigned long start = addr;
	unsigned long size = PAGE_SIZE * numpages;
	unsigned long end = start + size;
	struct vm_struct *area;
	int i;

	if (!PAGE_ALIGNED(addr)) {
		start &= PAGE_MASK;
		end = start + size;
		WARN_ON_ONCE(1);
	}

	/*
	 * Kernel VA mappings are always live, and splitting live section
	 * mappings into page mappings may cause TLB conflicts. This means
	 * we have to ensure that changing the permission bits of the range
	 * we are operating on does not result in such splitting.
	 *
	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
	 * Those are guaranteed to consist entirely of page mappings, and
	 * splitting is never needed.
	 *
	 * So check whether the [addr, addr + size) interval is entirely
	 * covered by precisely one VM area that has the VM_ALLOC flag set.
	 */
	area = find_vm_area((void *)addr);
	if (!area ||
	    end > (unsigned long)kasan_reset_tag(area->addr) + area->size ||
	    !(area->flags & VM_ALLOC))
		return -EINVAL;

	if (!numpages)
		return 0;

	/*
	 * If we are manipulating read-only permissions, apply the same
	 * change to the linear mapping of the pages that back this VM area.
	 */
	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
			    pgprot_val(clear_mask) == PTE_RDONLY)) {
		for (i = 0; i < area->nr_pages; i++) {
			__change_memory_common((u64)page_address(area->pages[i]),
					       PAGE_SIZE, set_mask, clear_mask);
		}
	}

	/*
	 * Get rid of potentially aliasing lazily unmapped vm areas that may
	 * have permissions set that deviate from the ones we are setting here.
	 */
	vm_unmap_aliases();

	return __change_memory_common(start, size, set_mask, clear_mask);
}

int set_memory_ro(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_RDONLY),
					__pgprot(PTE_WRITE));
}

int set_memory_rw(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_WRITE),
					__pgprot(PTE_RDONLY));
}

int set_memory_nx(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_PXN),
					__pgprot(PTE_MAYBE_GP));
}

int set_memory_x(unsigned long addr, int numpages)
{
	return change_memory_common(addr, numpages,
					__pgprot(PTE_MAYBE_GP),
					__pgprot(PTE_PXN));
}

int set_memory_valid(unsigned long addr, int numpages, int enable)
{
	if (enable)
		return __change_memory_common(addr, PAGE_SIZE * numpages,
					__pgprot(PTE_VALID),
					__pgprot(0));
	else
		return __change_memory_common(addr, PAGE_SIZE * numpages,
					__pgprot(0),
					__pgprot(PTE_VALID));
}

int set_direct_map_invalid_noflush(struct page *page)
{
	struct page_change_data data = {
		.set_mask = __pgprot(0),
		.clear_mask = __pgprot(PTE_VALID),
	};

	if (!can_set_direct_map())
		return 0;

	return apply_to_page_range(&init_mm,
				   (unsigned long)page_address(page),
				   PAGE_SIZE, change_page_range, &data);
}

int set_direct_map_default_noflush(struct page *page)
{
	struct page_change_data data = {
		.set_mask = __pgprot(PTE_VALID | PTE_WRITE),
		.clear_mask = __pgprot(PTE_RDONLY),
	};

	if (!can_set_direct_map())
		return 0;

	return apply_to_page_range(&init_mm,
				   (unsigned long)page_address(page),
				   PAGE_SIZE, change_page_range, &data);
}

static int __set_memory_enc_dec(unsigned long addr,
				int numpages,
				bool encrypt)
{
	unsigned long set_prot = 0, clear_prot = 0;
	phys_addr_t start, end;
	int ret;

	if (!is_realm_world())
		return 0;

	if (!__is_lm_address(addr))
		return -EINVAL;

	start = __virt_to_phys(addr);
	end = start + numpages * PAGE_SIZE;

	if (encrypt)
		clear_prot = PROT_NS_SHARED;
	else
		set_prot = PROT_NS_SHARED;

	/*
	 * Break the mapping before we make any changes to avoid stale TLB
	 * entries or Synchronous External Aborts caused by RIPAS_EMPTY
	 */
	ret = __change_memory_common(addr, PAGE_SIZE * numpages,
				     __pgprot(set_prot),
				     __pgprot(clear_prot | PTE_VALID));

	if (ret)
		return ret;

	if (encrypt)
		ret = rsi_set_memory_range_protected(start, end);
	else
		ret = rsi_set_memory_range_shared(start, end);

	if (ret)
		return ret;

	return __change_memory_common(addr, PAGE_SIZE * numpages,
				      __pgprot(PTE_VALID),
				      __pgprot(0));
}

static int realm_set_memory_encrypted(unsigned long addr, int numpages)
{
	int ret = __set_memory_enc_dec(addr, numpages, true);

	/*
	 * If the request to change state fails, then the only sensible course
	 * of action for the caller is to leak the memory.
	 */
	WARN(ret, "Failed to encrypt memory, %d pages will be leaked",
	     numpages);

	return ret;
}

static int realm_set_memory_decrypted(unsigned long addr, int numpages)
{
	int ret = __set_memory_enc_dec(addr, numpages, false);

	WARN(ret, "Failed to decrypt memory, %d pages will be leaked",
	     numpages);

	return ret;
}

static const struct arm64_mem_crypt_ops realm_crypt_ops = {
	.encrypt = realm_set_memory_encrypted,
	.decrypt = realm_set_memory_decrypted,
};

int realm_register_memory_enc_ops(void)
{
	return arm64_mem_crypt_ops_register(&realm_crypt_ops);
}

int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
{
	unsigned long addr = (unsigned long)page_address(page);

	if (!can_set_direct_map())
		return 0;

	return set_memory_valid(addr, nr, valid);
}

#ifdef CONFIG_DEBUG_PAGEALLOC
/*
 * This is - apart from the return value - doing the same
 * thing as the new set_direct_map_valid_noflush() function.
 *
 * Unify? Explain the conceptual differences?
 */
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
	if (!can_set_direct_map())
		return;

	set_memory_valid((unsigned long)page_address(page), numpages, enable);
}
#endif /* CONFIG_DEBUG_PAGEALLOC */

/*
 * This function is used to determine if a linear map page has been marked as
 * not-valid. Walk the page table and check the PTE_VALID bit.
 *
 * Because this is only called on the kernel linear map, p?d_sect() implies
 * p?d_present(). When debug_pagealloc is enabled, section mappings are
 * disabled.
 */
bool kernel_page_present(struct page *page)
{
	pgd_t *pgdp;
	p4d_t *p4dp;
	pud_t *pudp, pud;
	pmd_t *pmdp, pmd;
	pte_t *ptep;
	unsigned long addr = (unsigned long)page_address(page);

	pgdp = pgd_offset_k(addr);
	if (pgd_none(READ_ONCE(*pgdp)))
		return false;

	p4dp = p4d_offset(pgdp, addr);
	if (p4d_none(READ_ONCE(*p4dp)))
		return false;

	pudp = pud_offset(p4dp, addr);
	pud = READ_ONCE(*pudp);
	if (pud_none(pud))
		return false;
	if (pud_sect(pud))
		return true;

	pmdp = pmd_offset(pudp, addr);
	pmd = READ_ONCE(*pmdp);
	if (pmd_none(pmd))
		return false;
	if (pmd_sect(pmd))
		return true;

	ptep = pte_offset_kernel(pmdp, addr);
	return pte_valid(__ptep_get(ptep));
}
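
The clear-then-set mask pattern in change_page_range() and the TLB flush decision in __change_memory_common() can be tried out in user space. What follows is a minimal sketch under stated assumptions, not kernel code: plain uint64_t stands in for pte_t and pgprot_t, the DEMO_* bit positions are arbitrary values picked for illustration, and demo_apply_masks(), demo_needs_tlb_flush() and demo_self_check() are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define DEMO_PTE_VALID	(UINT64_C(1) << 0)
#define DEMO_PTE_RDONLY	(UINT64_C(1) << 1)
#define DEMO_PTE_WRITE	(UINT64_C(1) << 2)

/*
 * Same order as change_page_range(): clear first, then set. If a bit
 * appears in both masks, the set mask wins.
 */
uint64_t demo_apply_masks(uint64_t pte, uint64_t set_mask, uint64_t clear_mask)
{
	pte &= ~clear_mask;
	pte |= set_mask;
	return pte;
}

/*
 * Same condition as __change_memory_common(): the only case that skips
 * the flush is "set exactly the valid bit and clear nothing", because a
 * non-valid entry cannot have been cached in the TLB.
 */
bool demo_needs_tlb_flush(uint64_t set_mask, uint64_t clear_mask)
{
	return set_mask != DEMO_PTE_VALID || clear_mask != 0;
}

/* Walks through the mask pairs used by set_memory_ro() and set_memory_valid(). */
void demo_self_check(void)
{
	uint64_t pte = DEMO_PTE_VALID | DEMO_PTE_WRITE;

	/* Like set_memory_ro(): set RDONLY, clear WRITE. */
	pte = demo_apply_masks(pte, DEMO_PTE_RDONLY, DEMO_PTE_WRITE);
	assert(pte == (DEMO_PTE_VALID | DEMO_PTE_RDONLY));
	assert(demo_needs_tlb_flush(DEMO_PTE_RDONLY, DEMO_PTE_WRITE));

	/* Like set_memory_valid(addr, n, 0): clear VALID, flush needed. */
	pte = demo_apply_masks(pte, 0, DEMO_PTE_VALID);
	assert(pte == DEMO_PTE_RDONLY);
	assert(demo_needs_tlb_flush(0, DEMO_PTE_VALID));

	/* Like set_memory_valid(addr, n, 1): set VALID only, no flush. */
	pte = demo_apply_masks(pte, DEMO_PTE_VALID, 0);
	assert(pte == (DEMO_PTE_VALID | DEMO_PTE_RDONLY));
	assert(!demo_needs_tlb_flush(DEMO_PTE_VALID, 0));
}
```

Calling demo_self_check() from a small main() exercises the three cases. The sketch only models the bit arithmetic; the real code additionally walks the page tables with apply_to_page_range() and issues the flush with flush_tlb_kernel_range().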