linux/mm
Dan Williams ba72b4c8cf mm/sparsemem: support sub-section hotplug
The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity.

For example the following backtrace is emitted when attempting
arch_add_memory() with physical address ranges that intersect 'System
RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

    # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
    100000000-1ffffffff : System RAM
    200000000-303ffffff : Persistent Memory (legacy)
    304000000-43fffffff : System RAM
    440000000-23ffffffff : Persistent Memory
    2400000000-43bfffffff : Persistent Memory
      2400000000-43bfffffff : namespace2.0

    WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
    [..]
    RIP: 0010:add_pages+0x5c/0x60
    [..]
    Call Trace:
     devm_memremap_pages+0x460/0x6e0
     pmem_attach_disk+0x29e/0x680 [nd_pmem]
     ? nd_dax_probe+0xfc/0x120 [libnvdimm]
     nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as
some platform produce PMEM vs PMEM collisions within a given section.
The libnvdimm workaround for that case revealed that the libnvdimm
section-alignment-padding implementation has been broken for a long
while.

A fix for that long-standing breakage introduces as many problems as it
solves as it would require a backward-incompatible change to the
namespace metadata interpretation.  Instead of that dubious route [1],
address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how
sparse_add_section() reports subsection collisions, it was also obviated
by recent changes to perform the request_region() for 'System RAM'
before arch_add_memory() in the add_memory() sequence.

[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com

[osalvador@suse.de: fix deactivate_section for early sections]
  Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
..
kasan mm/kasan: change kasan_check_{read,write} to return boolean 2019-07-12 11:05:42 -07:00
backing-dev.c backing-dev: no need to check return value of debugfs_create functions 2019-06-03 15:49:07 +02:00
balloon_compaction.c Merge 5.2-rc4 into char-misc-next 2019-06-09 09:11:21 +02:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h
cma_debug.c
compaction.c mm, compaction: make sure we isolate a valid PFN 2019-06-01 15:51:32 -07:00
debug.c
debug_page_ref.c
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap.c: correct the comment about VM_FAULT_RETRY 2019-07-12 11:05:43 -07:00
frame_vector.c
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup.c mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
gup_benchmark.c
highmem.c
hmm.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
huge_memory.c mm: thp: fix false negative of shmem vma's THP eligibility 2019-07-18 17:08:06 -07:00
hugetlb.c mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge 2019-06-29 16:43:45 +08:00
hugetlb_cgroup.c
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c
internal.h
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
Kconfig.debug mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
khugepaged.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
ksm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c
madvise.c mm: remove MEMORY_DEVICE_PUBLIC support 2019-07-02 14:32:43 -03:00
Makefile HMM patches for 5.3 2019-07-14 19:42:11 -07:00
memblock.c
memcontrol.c mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones 2019-07-16 19:23:21 -07:00
memfd.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
memory-failure.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
memory.c mm: thp: make transhuge_vma_suitable available for anonymous THP 2019-07-18 17:08:06 -07:00
memory_hotplug.c mm/sparsemem: support sub-section hotplug 2019-07-18 17:08:07 -07:00
mempolicy.c mm: export alloc_pages_vma 2019-07-02 14:32:44 -03:00
mempool.c
memtest.c
migrate.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
mincore.c mm/mincore.c: fix race between swapoff and mincore 2019-07-12 11:05:43 -07:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-06-13 17:34:56 -10:00
mm_init.c
mmap.c
mmu_context.c
mmu_gather.c mm: mmu_gather: remove __tlb_reset_range() for force flush 2019-06-13 17:34:56 -10:00
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-12 11:05:46 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c mm: fix the MAP_UNINITIALIZED flag 2019-07-16 19:23:21 -07:00
oom_kill.c mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process() 2019-07-12 11:05:47 -07:00
page-writeback.c mm: remove the account_page_dirtied export 2019-07-12 11:05:42 -07:00
page_alloc.c mm/sparsemem: support sub-section hotplug 2019-07-18 17:08:07 -07:00
page_counter.c
page_ext.c mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
page_isolation.c mm/page_isolation.c: change the prototype of undo_isolate_page_range() 2019-07-12 11:05:43 -07:00
page_owner.c
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c mm: thp: fix false negative of shmem vma's THP eligibility 2019-07-18 17:08:06 -07:00
shuffle.c
shuffle.h
slab.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab.h mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab_common.c mm/slab_common.c: work around clang bug #42570 2019-07-16 19:23:21 -07:00
slob.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
slub.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparsemem: support sub-section hotplug 2019-07-18 17:08:07 -07:00
swap.c docs: admin-guide: move sysctl directory to it 2019-07-15 11:03:01 -03:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device() 2019-07-12 11:05:43 -07:00
swapfile.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
truncate.c
usercopy.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c mm: add account_locked_vm utility function 2019-07-16 19:23:25 -07:00
vmacache.c
vmalloc.c mm: vmalloc: show number of vmalloc pages in /proc/meminfo 2019-07-12 11:05:47 -07:00
vmpressure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
vmscan.c mm/vmscan.c: add checks for incorrect handling of current->reclaim_state 2019-07-16 19:23:21 -07:00
vmstat.c
workingset.c
z3fold.c mm/z3fold.c: reinitialize zhdr structs after migration 2019-07-16 19:23:21 -07:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: remove unused variable 2019-06-05 16:36:49 +02:00
zswap.c zswap: ignore debugfs_create_dir() return value 2019-06-03 15:39:39 +02:00