linux/mm
Hugh Dickins 9bd3155ed8 mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts
Fix the races in maintaining compound_mapcount, subpages_mapcount and
subpage _mapcount by using PG_locked in the first tail of any compound
page for a bit_spin_lock() on such modifications; skipping the usual
atomic operations on those fields in this case.

Bring page_remove_file_rmap() and page_remove_anon_compound_rmap() back
into page_remove_rmap() itself.  Rearrange page_add_anon_rmap() and
page_add_file_rmap() and page_remove_rmap() to follow the same "if
(compound) {lock} else if (PageCompound) {lock} else {atomic}" pattern
(with a PageTransHuge in the compound test, like before, to avoid BUG_ONs
and optimize away that block when THP is not configured).  Move all the
stats updates outside, after the bit_spin_locked section, so that it is
sure to be a leaf lock.

Add page_dup_compound_rmap() to manage compound locking versus atomics in
sync with the rest.  In particular, hugetlb pages are still using the
atomics: to avoid unnecessary interference there, and because they never
have subpage mappings; but this exception can easily be changed. 
Conveniently, page_dup_compound_rmap() turns out to suit an anon THP's
__split_huge_pmd_locked() too.

bit_spin_lock() is not popular with PREEMPT_RT folks: but PREEMPT_RT
sensibly excludes TRANSPARENT_HUGEPAGE already, so its only exposure is to
the non-hugetlb non-THP pte-mapped compound pages (with large folios being
currently dependent on TRANSPARENT_HUGEPAGE).  There is never any scan of
subpages in this case; but we have chosen to use PageCompound tests rather
than PageTransCompound tests to gate the use of lock_compound_mapcounts(),
so that page_mapped() is correct on all compound pages, whether or not
TRANSPARENT_HUGEPAGE is enabled: could that be a problem for PREEMPT_RT,
when there is contention on the lock - under heavy concurrent forking for
example?  If so, then it can be turned into a sleeping lock (like
folio_lock()) when PREEMPT_RT.

A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
18 seconds on small pages, and used to take 1 second on huge pages, but
now takes 115 milliseconds on huge pages.  Mapping by pmds a second time
used to take 860ms and now takes 86ms; mapping by pmds after mapping by
ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
Mapping huge pages by ptes is largely unaffected but variable: between 5%
faster and 5% slower in what I've recorded.  Contention on the lock is
likely to behave worse than contention on the atomics behaved.

Link: https://lkml.kernel.org/r/1b42bd1a-8223-e827-602f-d466c2db7d3c@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:47 -08:00
..
damon mm/damon: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
kasan memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
kfence kfence: fix stack trace pruning 2022-11-22 18:50:44 -08:00
kmsan kmsan: core: kmsan_in_runtime() should return true in NMI context 2022-11-08 15:57:24 -08:00
backing-dev.c
balloon_compaction.c
bootmem_info.c
cma.c
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm: migrate: fix THP's mapcount on isolation 2022-11-30 14:49:41 -08:00
debug.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: debug_vm_pgtable: use VM_ACCESS_FLAGS 2022-11-08 17:37:19 -08:00
dmapool.c
early_ioremap.c
fadvise.c
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c filemap: find_get_entries() now updates start offset 2022-11-08 17:37:12 -08:00
folio-compat.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
frontswap.c
gup.c hugetlb: simplify hugetlb handling in follow_page_mask 2022-11-08 17:37:10 -08:00
gup_test.c mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c
huge_memory.c mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts 2022-11-30 15:58:47 -08:00
hugetlb.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
hugetlb_cgroup.c mm/hugeltb_cgroup: convert hugetlb_cgroup_commit_charge*() to folios 2022-11-30 15:58:43 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: remove redundant list_del() 2022-11-30 15:58:40 -08:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm,hugetlb: use folio fields in second tail page 2022-11-30 15:58:46 -08:00
Kconfig.debug
khugepaged.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
kmemleak.c mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops 2022-10-28 13:37:22 -07:00
ksm.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
list_lru.c
maccess.c
madvise.c madvise: use zap_page_range_single for madvise dontneed 2022-11-30 14:49:40 -08:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c
memblock.c mm: add pageblock_align() macro 2022-10-03 14:03:04 -07:00
memcontrol.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
memfd.c
memory-failure.c mm,hugetlb: use folio fields in second tail page 2022-11-30 15:58:46 -08:00
memory-tiers.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
memory.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
memory_hotplug.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
mempolicy.c mm/mempolicy: fix mbind_range() arguments to vma_merge() 2022-10-20 21:27:21 -07:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-08 15:57:23 -08:00
memtest.c
migrate.c mm/hugetlb: convert move_hugetlb_state() to folios 2022-11-30 15:58:43 -08:00
migrate_device.c mm/migrate_device: return number of migrating pages in args->cpages 2022-11-22 18:50:43 -08:00
mincore.c mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
mlock.c mm/mlock: drop dead code in count_mm_mlocked_page_nr() 2022-09-26 19:46:27 -07:00
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
mmap_lock.c
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-11-30 14:49:42 -08:00
mmu_notifier.c
mmzone.c
mprotect.c Revert "mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled in" 2022-11-08 17:37:21 -08:00
mremap.c mm: add merging after mremap resize 2022-09-26 19:46:28 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
oom_kill.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
page-writeback.c
page_alloc.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
page_counter.c
page_ext.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
page_idle.c
page_io.c swap: convert swap_writepage() to use a folio 2022-10-03 14:02:52 -07:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c mm: reuse pageblock_start/end_pfn() macro 2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
page_vma_mapped.c
pagewalk.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c
rmap.c mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts 2022-11-30 15:58:47 -08:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: remove reduntant return value 2022-10-03 14:03:36 -07:00
shmem.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
shrinker_debug.c
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
slab.h - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slab_common.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c mm/slub.c: use hotplug_memory_notifier() directly 2022-11-08 17:37:16 -08:00
sparse-vmemmap.c
sparse.c mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
swap.c swap: add a limit for readahead page-cluster value 2022-11-08 17:37:22 -08:00
swap.h mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
swapfile.c mm: use pte markers for swap errors 2022-11-30 15:58:46 -08:00
truncate.c filemap: find_get_entries() now updates start offset 2022-11-08 17:37:12 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c mm/shmem: use page_mapping() to detect page cache for uffd continue 2022-11-08 15:57:23 -08:00
util.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
vmalloc.c mm: vmalloc: use trace_free_vmap_area_noflush event 2022-11-08 17:37:17 -08:00
vmpressure.c
vmscan.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c mm: vmscan: make rotations a secondary factor in balancing anon vs file 2022-11-08 17:37:11 -08:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: replace IS_ERR() with IS_ERR_VALUE() 2022-11-30 15:58:46 -08:00
zswap.c