linux/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
Joshua Hahn e341f9c3c8 mm/mempolicy: Weighted Interleave Auto-tuning
On machines with multiple memory nodes, interleaving page allocations
across nodes allows for better utilization of each node's bandwidth. 
Previous work by Gregory Price [1] introduced weighted interleave, which
allowed for pages to be allocated across nodes according to user-set
ratios.

Ideally, each node's weight should be proportional to its bandwidth, so
that under bandwidth pressure, each node runs at its maximal efficient
bandwidth and latency is kept from increasing exponentially.
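
To make "proportional to bandwidth" concrete, here is a minimal sketch that reduces per-node bandwidths to small integer weights. It is not the kernel's auto-tuning algorithm; the function name `proportional_weights`, the `max_weight` parameter and the bandwidth figures are assumptions chosen for illustration.

```python
from functools import reduce
from math import gcd


def proportional_weights(bandwidth, max_weight=255):
    """Map integer per-node bandwidths to weights in [1, max_weight].

    Illustrative only: the kernel uses its own reduction.
    """
    if not bandwidth or any(bw <= 0 for bw in bandwidth.values()):
        raise ValueError("every node needs a positive bandwidth")
    # Reduce the bandwidth ratio by its greatest common divisor.
    divisor = reduce(gcd, bandwidth.values())
    reduced = {node: bw // divisor for node, bw in bandwidth.items()}
    # If the ratio is still too wide, rescale it and keep every weight >= 1.
    largest = max(reduced.values())
    if largest > max_weight:
        reduced = {
            node: max(1, round(w * max_weight / largest))
            for node, w in reduced.items()
        }
    return reduced
```

With hypothetical bandwidths of 300000 and 100000 (in any common unit) for nodes 0 and 1, this yields weights 3 and 1, so three pages go to node 0 for every page placed on node 1.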

Previously, weighted interleave's default weights were all 1, which made
it equivalent to the (unweighted) interleave mempolicy: nodes are visited
in round-robin fashion, ignoring bandwidth information.
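
The difference between all-1 weights and bandwidth-informed weights can be seen in a simplified model of the allocation order. The sketch below assumes that a node receives `weight` consecutive pages before the next node is used; `interleave_order` is a hypothetical helper, not a kernel function.

```python
def interleave_order(weights, pages):
    """Return the node chosen for each of `pages` consecutive allocations."""
    if not weights or any(w < 1 for w in weights.values()):
        raise ValueError("need at least one node and weights >= 1")
    order = []
    nodes = sorted(weights)
    while len(order) < pages:
        for node in nodes:
            # Take weights[node] pages from this node, then move on.
            for _ in range(weights[node]):
                if len(order) == pages:
                    return order
                order.append(node)
    return order
```

With weights `{0: 1, 1: 1}` the order is plain round-robin (0, 1, 0, 1, ...); with `{0: 3, 1: 1}` node 0 receives three of every four pages.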

This patch has two main goals: First, it makes weighted interleave easier
to use for users who wish to relieve bandwidth pressure when using nodes
with varying bandwidth (CXL).  With a set of "real" default weights that
just work out of the box, users who might not have the capability (or the
wish) to experiment to find the optimal weights for their system can
still take advantage of bandwidth-informed weighted interleave.

Second, it allows weighted interleave to adjust dynamically to
hotplugged memory with new bandwidth information.  Instead of requiring
users to update node weights manually every time bandwidth information is
reported or removed, weighted interleave computes a new set of default
weights whenever the bandwidth information changes.

To meet these goals, this patch introduces an auto-configuration mode for
the interleave weights that provides a reasonable set of default weights,
calculated using bandwidth data reported by the system.  In auto mode,
weights are dynamically adjusted to match the currently reported
bandwidth information, including after hotplug events.

This patch still supports users manually writing weights into the nodeN
sysfs interface by entering into manual mode.  When a user enters manual
mode, the system stops dynamically updating any of the node weights, even
during hotplug events that shift the optimal weight distribution.

A new sysfs interface "auto" is introduced, which allows users to switch
between the auto (writing 1 or Y) and manual (writing 0 or N) modes.  The
system also automatically enters manual mode when a nodeN interface is
manually written to.
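
The mode rules described above can be summarized in a toy model. This is a sketch rather than kernel code: the class and method names are hypothetical, starting in auto mode is an assumption, and the model does not cover what happens to existing weights when auto mode is re-enabled.

```python
class WeightedInterleaveModel:
    """Toy model of the auto/manual behaviour (not kernel code)."""

    def __init__(self, nodes):
        self.auto = True  # assumption: the system starts in auto mode
        self.weights = {node: 1 for node in nodes}

    def write_auto(self, value):
        """Model a write to the "auto" file."""
        if value in ("1", "Y"):
            self.auto = True
        elif value in ("0", "N"):
            self.auto = False
        else:
            raise ValueError("EINVAL")

    def write_node(self, node, weight):
        """Model a write to a nodeN file; a valid write forces manual mode."""
        if not 1 <= weight <= 255:
            raise ValueError("EINVAL")
        self.weights[node] = weight
        self.auto = False

    def bandwidth_changed(self, new_defaults):
        """Model new bandwidth data; only auto mode overwrites the weights."""
        if self.auto:
            self.weights.update(new_defaults)
```

In this model, once `write_node()` succeeds, later calls to `bandwidth_changed()` leave the weights untouched until `write_auto("1")` or `write_auto("Y")` is issued.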

There is one functional change that this patch makes to the existing
weighted_interleave ABI: previously, writing 0 directly to a nodeN
interface was said to reset the weight to the system default.  Before this
patch, the default for all weights was 1, which meant that writing 0 and
writing 1 were functionally equivalent.  With this patch, writing 0 is
invalid.

Link: https://lkml.kernel.org/r/20250520141236.2987309-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: wordsmithing changes, simplification, fixes]
  Link: https://lkml.kernel.org/r/20250511025840.2410154-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: remove auto_kobj_attr field from struct sysfs_wi_group]
  Link: https://lkml.kernel.org/r/20250512142511.3959833-1-joshua.hahnjy@gmail.com
https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/ [1]
Link: https://lkml.kernel.org/r/20250505182328.4148265-1-joshua.hahnjy@gmail.com
Co-developed-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Yunjeong Mun <yunjeong.mun@sk.com>
Suggested-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Honggyu Kim <honggyu.kim@sk.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21 09:55:15 -07:00


What: /sys/kernel/mm/mempolicy/weighted_interleave/
Date: January 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Configuration Interface for the Weighted Interleave policy
What: /sys/kernel/mm/mempolicy/weighted_interleave/nodeN
Date: January 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Weight configuration interface for nodeN
The interleave weight for a memory node (N). These weights are
utilized by tasks which have set their mempolicy to
MPOL_WEIGHTED_INTERLEAVE.
These weights only affect new allocations, and changes at runtime
will not cause migrations on already allocated pages.
The minimum weight for a node is always 1.
Minimum weight: 1
Maximum weight: 255
Writing invalid values (e.g. any value not in [1,255], an
empty string, ...) will return -EINVAL.
Changing the weight to a valid value will automatically
switch the system to manual mode as well.
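
A user-space tool may want to check a weight before writing it to a nodeN file. The sketch below mirrors the documented range; `parse_node_weight` is a hypothetical helper, it handles only plain decimal input, and the kernel's own parser may accept other spellings.

```python
import errno


def parse_node_weight(text):
    """Return the weight as an int, or -errno.EINVAL if it would be rejected."""
    digits = text.strip()
    # Reject empty strings, signs, and anything that is not ASCII decimal.
    if not (digits.isascii() and digits.isdigit()):
        return -errno.EINVAL
    weight = int(digits)
    # Documented bounds: minimum weight 1, maximum weight 255.
    if not 1 <= weight <= 255:
        return -errno.EINVAL
    return weight
```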
What: /sys/kernel/mm/mempolicy/weighted_interleave/auto
Date: May 2025
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Auto-weighting configuration interface
Configuration mode for weighted interleave. 'true' indicates
that the system is in auto mode, and 'false' indicates that
the system is in manual mode.
In auto mode, all node weights are re-calculated and overwritten
(visible via the nodeN interfaces) whenever new bandwidth data
is made available during either boot or hotplug events.
In manual mode, node weights can only be updated by the user.
Note that nodes that are onlined with previously set weights
will reuse those weights. If their weights were not previously
set, or if they are onlined with missing bandwidth data, they
use a default weight of 1.
Writing any true value string (e.g. Y or 1) will enable auto
mode, while writing any false value string (e.g. N or 0) will
enable manual mode. All other strings are ignored and will
return -EINVAL.
Writing a new weight to a node directly via the nodeN interface
will also automatically switch the system to manual mode.
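
For scripting against this interface, a small helper can read the current mode and weights and switch modes. This is a sketch under stated assumptions: the helper names are hypothetical, the `base` parameter exists only so the code can be pointed at a test directory, and writing to the real files normally requires root.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/kernel/mm/mempolicy/weighted_interleave")


def read_state(base=SYSFS_DIR):
    """Return (auto_enabled, {node_id: weight}) from the sysfs directory."""
    base = Path(base)
    # The "auto" file reads back as 'true' (auto mode) or 'false' (manual).
    auto = (base / "auto").read_text().strip() == "true"
    weights = {}
    for entry in base.glob("node*"):
        suffix = entry.name[len("node"):]
        if suffix.isdigit():
            weights[int(suffix)] = int(entry.read_text().strip())
    return auto, weights


def set_auto(enabled, base=SYSFS_DIR):
    """Write 1 (auto mode) or 0 (manual mode) to the "auto" file."""
    (Path(base) / "auto").write_text("1" if enabled else "0")
```

Calling `read_state()` after `set_auto(True)` on a real system should show the recalculated weights, since auto mode overwrites the values visible through the nodeN files.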