mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00

Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.

DAMON-based access monitoring is not simple, because it requires DAMON
control and visualization of the results. Introduce a static kernel
module to make it simple. The module can be enabled without manual setup
and provides access pattern metrics that are easy to fetch and that
convey practical access pattern information, namely estimated memory
bandwidth and memory idle time percentiles.

Background and Problems
=======================

DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on a specific address space with fine controls, including the address
ranges to monitor and the time intervals between samplings and
aggregations. The resulting access information snapshot contains access
frequency (nr_accesses) and how long the frequency was kept (age) for
each byte.

Using the monitoring this way is not simple or practical enough for
production. Users should first start DAMON with a number of parameters,
and wait until DAMON's monitoring results capture a reasonable amount of
the time data (age). In production, such a manual start and wait is
impractical for capturing useful information from a high number of
machines in a timely manner.

The monitoring result is also too detailed to be used in production
environments. The raw results are hard to aggregate and compare across
production environments that span long time periods, large address
spaces, and large machine fleets.

Users have to implement and use their own automation of DAMON control
and results processing. It is repetitive and challenging since there is
no good reference or guideline for such automation.

Solution: DAMON_STAT
====================

Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter.

It monitors the entire physical address space with monitoring intervals
that are auto-tuned for a reasonable amount of access observations and
minimum overhead. It converts the raw monitoring results into simpler
metrics that can easily be aggregated and compared, namely estimated
memory bandwidth and idle time percentiles.

Understanding the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.

Discussions
===========

The module aims to be useful in production environments built from a
large number of machines that run for a long time. The auto-tuned
monitoring intervals ensure a reasonable quality of the outputs. The
auto-tuning also keeps the overhead reasonable and low enough for the
module to be always enabled in production. The simplified monitoring
result metrics can be useful for showing both coldness (idle time
percentiles) and hotness (memory bandwidth) of the system's access
pattern. We expect the information to be useful for assessing system
memory utilization and for inspiring optimizations or investigations of
both kernel and user space memory management logic for large scale
fleets.

We hence expect the module to be good enough to be used as is in most
environments. For special cases that require custom access monitoring
automation, users will still benefit from using DAMON_STAT as a
reference or a guideline for their specialized automation.

This patch (of 4):

To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via the DAMON sysfs ABI with a number of parameters
that specify the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until the
desired amount of time data is captured in DAMON's monitoring results.
This is bothersome and takes too long to be practical for access
monitoring in large fleet level production environments.

Similar problems existed for access-aware system operations use cases
like proactive cold memory reclamation. We solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.

Implement such a static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with
auto-tuned monitoring intervals. The auto-tuning is set to capture 4 %
of observable access events in each snapshot while keeping the sampling
interval at a minimum of 5 milliseconds and a maximum of 10 seconds.
From a few production environments, we confirmed this setup provides
high quality monitoring results with minimum overhead.

The module therefore receives only one user input: whether to enable or
disable it. It can be set at build or boot time via the build
configuration or the kernel boot command line. It can also be overridden
at runtime.

Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via the
damon:damon_aggregated tracepoint, but that is of course not the
recommended way. Following commits will implement convenient and
optimized ways of serving the monitoring results to users.

[sj@kernel.org: use IS_ENABLED() for enabled initial value]
  Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
  Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
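The tuning targets quoted in the commit message (4 % of observable access events per snapshot, sampling interval between 5 milliseconds and 10 seconds) can be pictured with a small user-space sketch. Everything named here is hypothetical: `struct stat_tuning_goal`, `stat_goal`, and `clamp_sample_us()` are illustration-only names, and expressing the 4 % target in basis points is an assumption. The sketch shows only how the stated bounds constrain a proposed interval; it is not the kernel's data structure or its tuning algorithm.

```c
#include <stdint.h>

/* Hypothetical container for the targets quoted in the commit message. */
struct stat_tuning_goal {
	uint32_t access_bp;      /* target share of observable access events,
				  * in basis points (assumed unit) */
	uint64_t min_sample_us;  /* lower bound of the sampling interval */
	uint64_t max_sample_us;  /* upper bound of the sampling interval */
};

/* Values taken from the commit message: 4 %, 5 ms, 10 s. */
static const struct stat_tuning_goal stat_goal = {
	.access_bp = 400,
	.min_sample_us = 5000,
	.max_sample_us = 10000000,
};

/*
 * Keep a proposed sampling interval inside the allowed range. How the
 * proposal itself is computed is outside the scope of this sketch.
 */
static uint64_t clamp_sample_us(const struct stat_tuning_goal *goal,
				uint64_t proposed_us)
{
	if (proposed_us < goal->min_sample_us)
		return goal->min_sample_us;
	if (proposed_us > goal->max_sample_us)
		return goal->max_sample_us;
	return proposed_us;
}
```

With these bounds, a proposal of 1 millisecond would be raised to 5 milliseconds and a proposal of one minute would be lowered to 10 seconds, while anything in between passes through unchanged.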
113 lines
3.4 KiB
Text
# SPDX-License-Identifier: GPL-2.0-only

menu "Data Access Monitoring"

config DAMON
	bool "DAMON: Data Access Monitoring Framework"
	help
	  This builds a framework that allows kernel subsystems to monitor
	  access frequency of each memory region. The information can be useful
	  for performance-centric DRAM level memory management.

	  See https://www.kernel.org/doc/html/latest/mm/damon/index.html for
	  more information.

config DAMON_KUNIT_TEST
	bool "Test for damon" if !KUNIT_ALL_TESTS
	depends on DAMON && KUNIT=y
	default KUNIT_ALL_TESTS
	help
	  This builds the DAMON KUnit test suite.

	  For more information on KUnit and unit tests in general, please refer
	  to the KUnit documentation.

	  If unsure, say N.

config DAMON_VADDR
	bool "Data access monitoring operations for virtual address spaces"
	depends on DAMON && MMU
	select PAGE_IDLE_FLAG
	default DAMON
	help
	  This builds the default data access monitoring operations for DAMON
	  that work for virtual address spaces.

config DAMON_PADDR
	bool "Data access monitoring operations for the physical address space"
	depends on DAMON && MMU
	select PAGE_IDLE_FLAG
	default DAMON
	help
	  This builds the default data access monitoring operations for DAMON
	  that work for the physical address space.

config DAMON_VADDR_KUNIT_TEST
	bool "Test for DAMON operations" if !KUNIT_ALL_TESTS
	depends on DAMON_VADDR && KUNIT=y
	default KUNIT_ALL_TESTS
	help
	  This builds the DAMON virtual addresses operations KUnit test suite.

	  For more information on KUnit and unit tests in general, please refer
	  to the KUnit documentation.

	  If unsure, say N.

config DAMON_SYSFS
	bool "DAMON sysfs interface"
	depends on DAMON && SYSFS
	default DAMON
	help
	  This builds the sysfs interface for DAMON. The user space can use
	  the interface for arbitrary data access monitoring.

config DAMON_SYSFS_KUNIT_TEST
	bool "Test for damon sysfs interface" if !KUNIT_ALL_TESTS
	depends on DAMON_SYSFS && KUNIT=y
	default KUNIT_ALL_TESTS
	help
	  This builds the DAMON sysfs interface KUnit test suite.

	  For more information on KUnit and unit tests in general, please refer
	  to the KUnit documentation.

	  If unsure, say N.

config DAMON_RECLAIM
	bool "Build DAMON-based reclaim (DAMON_RECLAIM)"
	depends on DAMON_PADDR
	help
	  This builds the DAMON-based reclamation subsystem. It finds pages
	  that are not accessed for a long time (cold) using DAMON and
	  reclaims those.

	  This is suggested to be used as a proactive and lightweight
	  reclamation under light memory pressure, while the traditional page
	  scanning-based reclamation is used for heavy pressure.

config DAMON_LRU_SORT
	bool "Build DAMON-based LRU-lists sorting (DAMON_LRU_SORT)"
	depends on DAMON_PADDR
	help
	  This builds the DAMON-based LRU-lists sorting subsystem. It tries to
	  protect frequently accessed (hot) pages while rarely accessed (cold)
	  pages are reclaimed first under memory pressure.

config DAMON_STAT
	bool "Build data access monitoring stat (DAMON_STAT)"
	depends on DAMON_PADDR
	help
	  This builds the DAMON-based access monitoring statistics subsystem.
	  It runs DAMON and exposes access monitoring results in simple stat
	  metrics.

config DAMON_STAT_ENABLED_DEFAULT
	bool "Enable DAMON_STAT by default"
	depends on DAMON_PADDR
	default DAMON_STAT
	help
	  Whether to enable DAMON_STAT by default. Users can disable it at
	  boot or runtime using its 'enabled' parameter.

endmenu
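The commit message notes that the initial value of the 'enabled' parameter uses IS_ENABLED(), which ties it to DAMON_STAT_ENABLED_DEFAULT above. The following is a simplified user-space imitation of that idea, not the kernel's own header: `OPTION_IS_SET()` and its helper macros are hypothetical names, the `#define CONFIG_DAMON_STAT_ENABLED_DEFAULT 1` line stands in for what a build with the option set to y would be assumed to provide, and `CONFIG_EXAMPLE_UNSET` is a made-up option that is deliberately left undefined. Only the built-in (y or unset) case is covered.

```c
#include <stdbool.h>

/*
 * OPTION_IS_SET(option) evaluates to 1 when the option macro is defined
 * as 1 and to 0 when it is not defined at all, so it can be used in a
 * static initializer instead of an #ifdef block.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define OPTION_IS_SET(option) OPTION_IS_SET_(option)
#define OPTION_IS_SET_(val) OPTION_IS_SET__(ARG_PLACEHOLDER_##val)
#define OPTION_IS_SET__(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

/* Stand-in for a configuration with DAMON_STAT_ENABLED_DEFAULT=y. */
#define CONFIG_DAMON_STAT_ENABLED_DEFAULT 1

/*
 * Initial value of the on/off switch. In this sketch nothing changes it
 * later; the boot-time and runtime overrides are not modeled.
 */
static bool enabled = OPTION_IS_SET(CONFIG_DAMON_STAT_ENABLED_DEFAULT);
```

When the option is defined as 1, the placeholder macro expands to `0,` and shifts the literal `1` into the second argument position; when the option is undefined, the pasted name stays a single junk token and the second argument is `0`. This keeps the default a compile-time constant, which is why the same expression works as a static initializer.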