What: /sys/devices/system/node/possible
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Nodes that could possibly become online at some point.

What: /sys/devices/system/node/online
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Nodes that are online.

What: /sys/devices/system/node/has_normal_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Nodes that have regular memory.

What: /sys/devices/system/node/has_cpu
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Nodes that have one or more CPUs.

What: /sys/devices/system/node/has_high_memory
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Nodes that have regular or high memory.
        Depends on CONFIG_HIGHMEM.
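The node masks above use the kernel's range-list format (for example "0-3"
or "0,2-3"). A minimal sketch of expanding such a list in Python, assuming
a NUMA-enabled system where these files are present:

        # Sketch (not part of the ABI): expand a node list such as "0-1,3"
        # read from /sys/devices/system/node/online into node IDs.
        def expand_node_list(text):
            nodes = []
            text = text.strip()
            if not text:
                return nodes
            for chunk in text.split(","):
                if "-" in chunk:
                    start, end = chunk.split("-")
                    nodes.extend(range(int(start), int(end) + 1))
                else:
                    nodes.append(int(chunk))
            return nodes

        with open("/sys/devices/system/node/online") as f:
            print(expand_node_list(f.read()))   # e.g. [0, 1]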
What: /sys/devices/system/node/nodeX
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        When CONFIG_NUMA is enabled, this is a directory containing
        information on node X such as what CPUs are local to the
        node. Each file is detailed next.

What: /sys/devices/system/node/nodeX/cpumap
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        The node's cpumap.

What: /sys/devices/system/node/nodeX/cpulist
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        The CPUs associated with the node.

What: /sys/devices/system/node/nodeX/meminfo
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Provides information about the node's distribution and memory
        utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.rst
What: /sys/devices/system/node/nodeX/numastat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        The node's hit/miss statistics, in units of pages.
        See Documentation/admin-guide/numastat.rst
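A hedged sketch of reading these counters, assuming node0 exists and that
numastat keeps its usual one "name value" pair per line layout (see
Documentation/admin-guide/numastat.rst):

        # Sketch: parse node0's numastat into a dict of page counts.
        def read_numastat(node=0):
            stats = {}
            path = f"/sys/devices/system/node/node{node}/numastat"
            with open(path) as f:
                for line in f:
                    name, value = line.split()
                    stats[name] = int(value)
            return stats

        print(read_numastat())   # e.g. {'numa_hit': ..., 'numa_miss': ...}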
What: /sys/devices/system/node/nodeX/distance
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Distance between the node and all the other nodes
        in the system.
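The file holds one space-separated integer per node; entry Y is the
distance from node X to node Y. A small sketch, with node0 used purely
as an illustration:

        # Sketch: read node0's distance vector; smaller values mean
        # the other node is closer.
        with open("/sys/devices/system/node/node0/distance") as f:
            distances = [int(d) for d in f.read().split()]
        print(distances)   # e.g. [10, 21] on a two-node machine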
What: /sys/devices/system/node/nodeX/vmstat
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        The node's zoned virtual memory statistics.
        This is a superset of numastat.
What: /sys/devices/system/node/nodeX/compact
Date: February 2010
Contact: Mel Gorman <mel@csn.ul.ie>
Description:
        When this file is written to, all memory within that node
        will be compacted. When it completes, memory will be freed
        into blocks which have as many contiguous pages as possible.
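A hedged usage sketch of the write-to-trigger behaviour described above
(node0 is only an illustration, and the write normally requires root):

        # Sketch: trigger compaction of all memory on node0; the value
        # written is not significant, the write itself starts compaction.
        with open("/sys/devices/system/node/node0/compact", "w") as f:
            f.write("1")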
What: /sys/devices/system/node/nodeX/hugepages/hugepages-<size>/
Date: December 2009
Contact: Lee Schermerhorn <lee.schermerhorn@hp.com>
Description:
        The node's huge page size control/query attributes.
        See Documentation/admin-guide/mm/hugetlbpage.rst
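A sketch of querying the per-node huge page pools, assuming node0 exists;
which hugepages-<size> directories appear, and therefore which page sizes
are reported, depends on the architecture and kernel configuration:

        import glob, os

        # Sketch: print the per-node huge page counters for node0.
        base = "/sys/devices/system/node/node0/hugepages"
        for size_dir in sorted(glob.glob(os.path.join(base, "hugepages-*"))):
            for name in ("nr_hugepages", "free_hugepages", "surplus_hugepages"):
                with open(os.path.join(size_dir, name)) as f:
                    print(os.path.basename(size_dir), name, f.read().strip())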
What: /sys/devices/system/node/nodeX/accessY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The node's relationship to other nodes for access class "Y".

What: /sys/devices/system/node/nodeX/accessY/initiators/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The directory containing symlinks to memory initiator
        nodes that have class "Y" access to this target node's
        memory. CPUs and other memory initiators in nodes not in
        the list accessing this node's memory may have different
        performance.

What: /sys/devices/system/node/nodeX/accessY/targets/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The directory containing symlinks to memory targets that
        this initiator node has class "Y" access to.
What: /sys/devices/system/node/nodeX/accessY/initiators/read_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        This node's read bandwidth in MB/s when accessed from
        nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/read_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        This node's read latency in nanoseconds when accessed
        from nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/write_bandwidth
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        This node's write bandwidth in MB/s when accessed from
        nodes found in this access class's linked initiators.

What: /sys/devices/system/node/nodeX/accessY/initiators/write_latency
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        This node's write latency in nanoseconds when accessed
        from nodes found in this access class's linked initiators.
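A hedged sketch of dumping these access class 0 attributes for node0
(the values are only exported when the platform firmware describes them):

        # Sketch: print node0's access class 0 performance attributes.
        base = "/sys/devices/system/node/node0/access0/initiators"
        for name in ("read_bandwidth", "read_latency",
                     "write_bandwidth", "write_latency"):
            try:
                with open(f"{base}/{name}") as f:
                    print(name, f.read().strip())
            except FileNotFoundError:
                print(name, "not reported")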
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The directory containing attributes for the memory-side cache
        level 'Y'.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/indexing
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The cache's associativity indexing: 0 for direct mapped,
        non-zero if indexed.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/line_size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The number of bytes accessed from the next cache level on a
        cache miss.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/size
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The size of this memory side cache in bytes.

What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/write_policy
Date: December 2018
Contact: Keith Busch <keith.busch@intel.com>
Description:
        The cache write policy: 0 for write-back, 1 for write-through;
        other values indicate some other or unknown policy.
What: /sys/devices/system/node/nodeX/memory_side_cache/indexY/address_mode
Date: March 2025
Contact: Dave Jiang <dave.jiang@intel.com>
Description:
        The address mode: 0 for reserved, 1 for extended-linear.
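A hedged sketch of reading a node's memory-side cache attributes, assuming
node0 describes a level 1 cache (index1); systems without a memory-side
cache simply lack the directory:

        import os

        # Sketch: print every attribute of node0's level 1 memory-side
        # cache, if the platform exports one.
        cache = "/sys/devices/system/node/node0/memory_side_cache/index1"
        if os.path.isdir(cache):
            for name in sorted(os.listdir(cache)):
                path = os.path.join(cache, name)
                if os.path.isfile(path):
                    with open(path) as f:
                        print(name, f.read().strip())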
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
Description:
        The total amount of SGX physical memory in bytes.
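A hedged sketch of totalling SGX memory across nodes; the x86/ directory
and this attribute only exist on SGX-capable x86 systems, so missing
files are simply skipped:

        import glob

        # Sketch: sum sgx_total_bytes over every node that exports it.
        total = 0
        for path in glob.glob(
                "/sys/devices/system/node/node*/x86/sgx_total_bytes"):
            with open(path) as f:
                total += int(f.read())
        print(total, "bytes of SGX memory")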
What: /sys/devices/system/node/nodeX/memory_failure/total
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
        The total number of raw poisoned pages (pages containing
        corrupted data due to memory errors) on a NUMA node.

What: /sys/devices/system/node/nodeX/memory_failure/ignored
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
        Of the raw poisoned pages on a NUMA node, how many were
        ignored by the memory error recovery attempt, usually because
        support for this type of page is unavailable and the kernel
        gave up on recovery.

What: /sys/devices/system/node/nodeX/memory_failure/failed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
        Of the raw poisoned pages on a NUMA node, how many failed the
        memory error recovery attempt. This usually means a key
        recovery operation failed.

What: /sys/devices/system/node/nodeX/memory_failure/delayed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
        Of the raw poisoned pages on a NUMA node, how many were
        delayed by the memory error recovery attempt. Delayed poisoned
        pages will usually be retried by the kernel.

What: /sys/devices/system/node/nodeX/memory_failure/recovered
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
        Of the raw poisoned pages on a NUMA node, how many were
        recovered by the memory error recovery attempt.
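A small sketch of reading these per-node counters, with node0 as an
illustration; each attribute is a single decimal page count:

        # Sketch: summarize node0's memory failure accounting.
        base = "/sys/devices/system/node/node0/memory_failure"
        for name in ("total", "ignored", "failed", "delayed", "recovered"):
            with open(f"{base}/{name}") as f:
                print(name, f.read().strip())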
What: /sys/devices/system/node/nodeX/reclaim
Date: June 2025
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
        Perform user-triggered proactive reclaim on a NUMA node.
        This interface is equivalent to the memcg variant.

        See Documentation/admin-guide/cgroup-v2.rst
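A hedged usage sketch mirroring the memcg memory.reclaim syntax; the
amount and the optional swappiness= argument are illustrative, the write
requires privileges, and it may fail with an error if the requested
amount cannot be reclaimed:

        # Sketch: ask the kernel to proactively reclaim 512M from node0
        # with a low swappiness.
        with open("/sys/devices/system/node/node0/reclaim", "w") as f:
            f.write("512M swappiness=10")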