linux/Documentation/ABI/stable/sysfs-devices-node

What: /sys/devices/system/node/possible
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that could possibly become online at some point.
What: /sys/devices/system/node/online
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that are online.
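Both files use the standard node list format. A minimal
illustration, assuming a hypothetical two-node machine
(output values are examples only):
    # cat /sys/devices/system/node/possible
    0-1
    # cat /sys/devices/system/node/online
    0-1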
What: /sys/devices/system/node/has_normal_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have regular memory.
What: /sys/devices/system/node/has_cpu
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have one or more CPUs.
What: /sys/devices/system/node/has_high_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Nodes that have regular or high memory.
Depends on CONFIG_HIGHMEM.
What: /sys/devices/system/node/nodeX
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
When CONFIG_NUMA is enabled, this is a directory containing
information on node X such as what CPUs are local to the
node. Each file is detailed next.
What: /sys/devices/system/node/nodeX/cpumap
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's cpumap: a hexadecimal bitmask of the CPUs local
to the node.
What: /sys/devices/system/node/nodeX/cpulist
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The CPUs associated with the node, in CPU list format.
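To illustrate the two formats, assume a hypothetical system
where CPUs 0-3 are local to node 0 (output is illustrative;
larger systems typically print the mask as comma-separated
32-bit groups):
    # cat /sys/devices/system/node/node0/cpumap
    f
    # cat /sys/devices/system/node/node0/cpulist
    0-3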
What: /sys/devices/system/node/nodeX/meminfo
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Provides information about the node's memory distribution and
utilization. Similar to /proc/meminfo; see Documentation/filesystems/proc.rst
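An illustrative excerpt (all values are examples only):
    # cat /sys/devices/system/node/node0/meminfo
    Node 0 MemTotal:       16297180 kB
    Node 0 MemFree:        12868948 kB
    Node 0 MemUsed:         3428232 kB
    ...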
What: /sys/devices/system/node/nodeX/numastat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's hit/miss statistics, in units of pages.
See Documentation/admin-guide/numastat.rst
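A sketch of the output, with example counter values:
    # cat /sys/devices/system/node/node0/numastat
    numa_hit 456782
    numa_miss 0
    numa_foreign 0
    interleave_hit 7981
    local_node 456123
    other_node 659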
What: /sys/devices/system/node/nodeX/distance
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Distance between the node and all the other nodes
in the system.
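The file holds one space-separated value per node, relative
to this node; by convention the local distance is 10. An
illustrative read on a hypothetical two-node system:
    # cat /sys/devices/system/node/node0/distance
    10 21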
What: /sys/devices/system/node/nodeX/vmstat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's zoned virtual memory statistics.
This is a superset of numastat.
What: /sys/devices/system/node/nodeX/compact
Date: February 2010
Contact: Mel Gorman <mel@csn.ul.ie>
Description:
When this file is written to, all memory within that node
will be compacted. When it completes, memory will be freed
into blocks which have as many contiguous pages as possible.
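The written value itself is ignored; any write triggers
compaction of the node. For example (as root):
    # echo 1 > /sys/devices/system/node/node0/compact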
What: /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/
Date: December 2009
Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
Description:
The node's huge page size control/query attributes.
See Documentation/admin-guide/mm/hugetlbpage.rst
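Each per-size directory contains the usual hugetlb control
files such as nr_hugepages, free_hugepages and
surplus_hugepages. An illustrative session, assuming 2 MB
huge pages and example counts:
    # cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    64
    # echo 128 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages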
What: /sys/devices/system/node/nodeX/accessY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The node's relationship to other nodes for access class "Y".
What: /sys/devices/system/node/nodeX/accessY/initiators/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing symlinks to memory initiator
nodes that have class "Y" access to this target node's
memory. CPUs and other memory initiators in nodes not in
this list may see different performance when accessing this
node's memory.
What: /sys/devices/system/node/nodeX/accessY/targets/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing symlinks to the memory target
nodes to which this initiator node has class "Y" access.
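Both directories contain symlinks back to the sibling nodeN
directories. An illustrative layout, assuming node0 is a
CPU-bearing initiator with class 0 access to a memory-only
node1 (the read_*/write_* attributes described below may
also appear alongside the symlinks in initiators/):
    # ls /sys/devices/system/node/node1/access0/initiators/
    node0
    # ls /sys/devices/system/node/node0/access0/targets/
    node1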
What: /sys/devices/system/node/nodeX/accessY/initiators/read_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's read bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/read_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's read latency in nanoseconds when accessed
from nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/write_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's write bandwidth in MB/s when accessed from
nodes found in this access class's linked initiators.
What: /sys/devices/system/node/nodeX/accessY/initiators/write_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
This node's write latency in nanoseconds when accessed
from nodes found in this access class's linked initiators.
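A sketch of reading these attributes for a target node1 in
access class 0 (values are illustrative; the write_* files
are read the same way):
    # cat /sys/devices/system/node/node1/access0/initiators/read_bandwidth
    30000
    # cat /sys/devices/system/node/node1/access0/initiators/read_latency
    150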
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The directory containing attributes for the memory-side cache
level 'Y'.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/indexing
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache's associativity indexing: 0 for direct-mapped,
non-zero if indexed.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/line_size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The number of bytes accessed from the next cache level on a
cache miss.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The size of this memory-side cache in bytes.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/write_policy
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache write policy: 0 for write-back, 1 for write-through;
any other value indicates a different or unknown policy.
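An illustrative dump of one cache level, assuming a
hypothetical 16 GiB direct-mapped, write-back memory-side
cache with 64-byte lines:
    # cd /sys/devices/system/node/node0/memory_side_cache/index1
    # cat indexing line_size size write_policy
    0
    64
    17179869184
    0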
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/address_mode
Date: March 2025
Contact: Dave Jiang <dave.jiang@intel.com>
Description:
The address mode: 0 for reserved, 1 for extended-linear.
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
Description:
The total amount of SGX physical memory in bytes.
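Present only on x86 systems with SGX support. An
illustrative read, assuming node 0 has 4 GiB of SGX memory:
    # cat /sys/devices/system/node/node0/x86/sgx_total_bytes
    4294967296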
What: /sys/devices/system/node/nodeX/memory_failure/total
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
The total number of raw poisoned pages (pages containing
corrupted data due to memory errors) on a NUMA node.
What: /sys/devices/system/node/nodeX/memory_failure/ignored
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages
were ignored by the memory error recovery attempt, usually
because support for this type of page is unavailable and
the kernel gave up on the recovery.
What: /sys/devices/system/node/nodeX/memory_failure/failed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages
could not be recovered by the memory error recovery attempt,
usually because a key recovery operation failed.
What: /sys/devices/system/node/nodeX/memory_failure/delayed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages
had their memory error recovery attempt delayed. Recovery
of delayed poisoned pages is usually retried by the kernel.
What: /sys/devices/system/node/nodeX/memory_failure/recovered
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages
were recovered by the memory error recovery attempt.
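Each file holds a single decimal count. Illustrative output
for a node that has seen two poisoned pages, both of which
were recovered:
    # grep . /sys/devices/system/node/node0/memory_failure/*
    /sys/devices/system/node/node0/memory_failure/delayed:0
    /sys/devices/system/node/node0/memory_failure/failed:0
    /sys/devices/system/node/node0/memory_failure/ignored:0
    /sys/devices/system/node/node0/memory_failure/recovered:2
    /sys/devices/system/node/node0/memory_failure/total:2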
What: /sys/devices/system/node/nodeX/reclaim
Date: June 2025
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Perform user-triggered proactive reclaim on a NUMA node.
This interface is equivalent to the memcg memory.reclaim
interface; see Documentation/admin-guide/cgroup-v2.rst
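For example, to request that 512M be proactively reclaimed
from node 0 with a swappiness of 10:
    # echo "512M swappiness=10" > /sys/devices/system/node/node0/reclaim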