No description
Find a file
Oscar Salvador 10eeadf304 mm,memory_hotplug: unlock 1GB-hugetlb on x86_64
On x86_64, 1GB-hugetlb pages could never be offlined due to the fact
that hugepage_migration_supported() returned false for PUD_SHIFT.
So whenever we wanted to offline a memblock containing a gigantic
hugetlb page, we never got beyond has_unmovable_pages() check.
This changed with [1], where now we also return true for PUD_SHIFT.

After that patch, the check in has_unmovable_pages() and scan_movable_pages()
returned true, but we still had a final barrier in do_migrate_range():

if (compound_order(head) > PFN_SECTION_SHIFT) {
	ret = -EBUSY;
	break;
}

This is not really nice, and we do not really need it.
It is perfectly possible to migrate a gigantic page as long as another node has
a spare gigantic page for us.
In alloc_huge_page_nodemask(), we calculate the __real__ number of free pages,
and if any, we try to dequeue one from another node.

This all works fine when we do have another node with a spare gigantic page,
but if that is not the case, alloc_huge_page_nodemask() ends up calling
alloc_migrate_huge_page() which bails out if the wanted page is gigantic.
That is mainly because finding a 1GB (or even 16GB on powerpc) contiguous
memory is quite unlikely when the system has been running for a while.

In that situation, we will keep looping forever because scan_movable_pages()
will give us the same page and we will fail again because there is no node
where we can dequeue a gigantic page from.
This is not nice, and it has been raised that we might want to treat -ENOMEM
as a fatal error in do_migrate_range(), but this has to be checked further.

Anyway, I would tend say that this is the administrator's job, to make sure
that the system can keep up with the memory to be offlined, so that would mean
that if we want to use gigantic pages, make sure that the other nodes have at
least enough gigantic pages to keep up in case we need to offline memory.

Just for the sake of completeness, this is one of the tests done:

 # echo 1 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
 # echo 1 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages

 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   1

 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 (hugetlb1gb is a program that maps 1GB region using MAP_HUGE_1GB)

 # numactl -m 1 ./hugetlb1gb
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   0
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 # offline node1 memory
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   0

[1] https://lore.kernel.org/patchwork/patch/998796/

Link: http://lkml.kernel.org/r/20190320152658.10855-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
arch mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
block for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
certs kexec, KEYS: Make use of platform keyring for signature verify 2019-02-04 17:34:07 -05:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
Documentation mm: move recent_rotated pages calculation to shrink_inactive_list() 2019-05-14 09:47:45 -07:00
drivers IB/mthca: use the new FOLL_LONGTERM flag to get_user_pages_fast() 2019-05-14 09:47:46 -07:00
fs mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
include mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
init Initialize the random driver earler; fix CRNG initialization when we 2019-05-07 21:42:23 -07:00
ipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
kernel mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
lib mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm,memory_hotplug: unlock 1GB-hugetlb on x86_64 2019-05-14 09:47:46 -07:00
net mm/gup: change GUP fast to use flags rather than a write 'bool' 2019-05-14 09:47:46 -07:00
samples samples: add .gitignore for pidfd-metadata 2019-05-10 11:50:52 +02:00
scripts A reasonably busy cycle for docs, including: 2019-05-08 12:42:50 -07:00
security tomoyo: Don't emit WARNING: string while fuzzing testing. 2019-05-10 14:58:35 -07:00
sound sound updates for 5.2-rc1 2019-05-09 08:26:55 -07:00
tools powerpc updates for 5.2 2019-05-10 05:29:27 -07:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt arm64 updates for 5.2 2019-05-06 17:54:22 -07:00
.clang-format Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore .gitignore: add more all*.config patterns 2019-05-08 09:47:46 +09:00
.mailmap A reasonably busy cycle for docs, including: 2019-05-08 12:42:50 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS MAINTAINERS: add Joseph as ocfs2 co-maintainer 2019-05-14 09:47:44 -07:00
Makefile Kbuild updates for v5.2 2019-05-08 12:25:12 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.