mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
|
|
|
/*
|
|
|
|
* Shows data access monitoring resutls in simple metrics.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define pr_fmt(fmt) "damon-stat: " fmt
|
|
|
|
|
|
|
|
#include <linux/damon.h>
|
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/sort.h>
|
|
|
|
|
|
|
|
#ifdef MODULE_PARAM_PREFIX
|
|
|
|
#undef MODULE_PARAM_PREFIX
|
|
|
|
#endif
|
|
|
|
#define MODULE_PARAM_PREFIX "damon_stat."
|
|
|
|
|
|
|
|
static int damon_stat_enabled_store(
|
|
|
|
const char *val, const struct kernel_param *kp);
|
|
|
|
|
|
|
|
static const struct kernel_param_ops enabled_param_ops = {
|
|
|
|
.set = damon_stat_enabled_store,
|
|
|
|
.get = param_get_bool,
|
|
|
|
};
|
|
|
|
|
|
|
|
static bool enabled __read_mostly = IS_ENABLED(
|
|
|
|
CONFIG_DAMON_STAT_ENABLED_DEFAULT);
|
|
|
|
module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
|
|
|
|
MODULE_PARM_DESC(enabled, "Enable of disable DAMON_STAT");
|
|
|
|
|
2025-06-04 11:31:25 -07:00
|
|
|
static unsigned long estimated_memory_bandwidth __read_mostly;
|
|
|
|
module_param(estimated_memory_bandwidth, ulong, 0400);
|
|
|
|
MODULE_PARM_DESC(estimated_memory_bandwidth,
|
|
|
|
"Estimated memory bandwidth usage in bytes per second");
|
|
|
|
|
mm/damon/stat: calculate and expose idle time percentiles
Knowing how much memory is how cold can be useful for understanding
coldness and utilization efficiency of memory. The raw form of DAMON's
monitoring results has the information. Convert the raw results into the
per-byte idle time distributions and expose it as percentiles metric to
users, as a read-only DAMON_STAT parameter.
In detail, the metrics are calculated as follows. First, DAMON's
per-region access frequency and age information is converted into per-byte
idle time. If access frequency of a region is higher than zero, every
byte of the region has zero idle time. If the access frequency of a
region is zero, every byte of the region has idle time as the age of the
region. Then the logic sorts the per-byte idle times and provides the
value at 0/100, 1/100, ..., 99/100 and 100/100 location of the sorted
array.
The metric can be easily aggregated and compared on large scale production
systems. For example, if an average of 75-th percentile idle time of
machines that collected on similar time is two minutes, it means the
system's 25 percent memory is not accessed at all for two minutes or more
on average. If a workload considers two minutes as unit work time, we can
conclude its working set size is only 75 percent of the memory. If the
system utilizes proactive reclamation and it supports coldness-based
thresholds like DAMON_RECLAIM, the idle time percentiles can be used to
find a more safe or aggressive coldness threshold for aimed memory saving.
Link: https://lkml.kernel.org/r/20250604183127.13968-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:26 -07:00
|
|
|
static unsigned long memory_idle_ms_percentiles[101] __read_mostly = {0,};
|
|
|
|
module_param_array(memory_idle_ms_percentiles, ulong, NULL, 0400);
|
|
|
|
MODULE_PARM_DESC(memory_idle_ms_percentiles,
|
|
|
|
"Memory idle time percentiles in milliseconds");
|
|
|
|
|
mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
static struct damon_ctx *damon_stat_context;
|
|
|
|
|
2025-06-04 11:31:25 -07:00
|
|
|
static void damon_stat_set_estimated_memory_bandwidth(struct damon_ctx *c)
|
|
|
|
{
|
|
|
|
struct damon_target *t;
|
|
|
|
struct damon_region *r;
|
|
|
|
unsigned long access_bytes = 0;
|
|
|
|
|
|
|
|
damon_for_each_target(t, c) {
|
|
|
|
damon_for_each_region(r, t)
|
|
|
|
access_bytes += (r->ar.end - r->ar.start) *
|
|
|
|
r->nr_accesses;
|
|
|
|
}
|
|
|
|
estimated_memory_bandwidth = access_bytes * USEC_PER_MSEC *
|
|
|
|
MSEC_PER_SEC / c->attrs.aggr_interval;
|
|
|
|
}
|
|
|
|
|
mm/damon/stat: calculate and expose idle time percentiles
Knowing how much memory is how cold can be useful for understanding
coldness and utilization efficiency of memory. The raw form of DAMON's
monitoring results has the information. Convert the raw results into the
per-byte idle time distributions and expose it as percentiles metric to
users, as a read-only DAMON_STAT parameter.
In detail, the metrics are calculated as follows. First, DAMON's
per-region access frequency and age information is converted into per-byte
idle time. If access frequency of a region is higher than zero, every
byte of the region has zero idle time. If the access frequency of a
region is zero, every byte of the region has idle time as the age of the
region. Then the logic sorts the per-byte idle times and provides the
value at 0/100, 1/100, ..., 99/100 and 100/100 location of the sorted
array.
The metric can be easily aggregated and compared on large scale production
systems. For example, if an average of 75-th percentile idle time of
machines that collected on similar time is two minutes, it means the
system's 25 percent memory is not accessed at all for two minutes or more
on average. If a workload considers two minutes as unit work time, we can
conclude its working set size is only 75 percent of the memory. If the
system utilizes proactive reclamation and it supports coldness-based
thresholds like DAMON_RECLAIM, the idle time percentiles can be used to
find a more safe or aggressive coldness threshold for aimed memory saving.
Link: https://lkml.kernel.org/r/20250604183127.13968-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:26 -07:00
|
|
|
static unsigned int damon_stat_idletime(const struct damon_region *r)
|
|
|
|
{
|
|
|
|
if (r->nr_accesses)
|
|
|
|
return 0;
|
|
|
|
return r->age + 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int damon_stat_cmp_regions(const void *a, const void *b)
|
|
|
|
{
|
|
|
|
const struct damon_region *ra = *(const struct damon_region **)a;
|
|
|
|
const struct damon_region *rb = *(const struct damon_region **)b;
|
|
|
|
|
|
|
|
return damon_stat_idletime(ra) - damon_stat_idletime(rb);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int damon_stat_sort_regions(struct damon_ctx *c,
|
|
|
|
struct damon_region ***sorted_ptr, int *nr_regions_ptr,
|
|
|
|
unsigned long *total_sz_ptr)
|
|
|
|
{
|
|
|
|
struct damon_target *t;
|
|
|
|
struct damon_region *r;
|
|
|
|
struct damon_region **region_pointers;
|
|
|
|
unsigned int nr_regions = 0;
|
|
|
|
unsigned long total_sz = 0;
|
|
|
|
|
|
|
|
damon_for_each_target(t, c) {
|
|
|
|
/* there is only one target */
|
|
|
|
region_pointers = kmalloc_array(damon_nr_regions(t),
|
|
|
|
sizeof(*region_pointers), GFP_KERNEL);
|
|
|
|
if (!region_pointers)
|
|
|
|
return -ENOMEM;
|
|
|
|
damon_for_each_region(r, t) {
|
|
|
|
region_pointers[nr_regions++] = r;
|
|
|
|
total_sz += r->ar.end - r->ar.start;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
sort(region_pointers, nr_regions, sizeof(*region_pointers),
|
|
|
|
damon_stat_cmp_regions, NULL);
|
|
|
|
*sorted_ptr = region_pointers;
|
|
|
|
*nr_regions_ptr = nr_regions;
|
|
|
|
*total_sz_ptr = total_sz;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void damon_stat_set_idletime_percentiles(struct damon_ctx *c)
|
|
|
|
{
|
|
|
|
struct damon_region **sorted_regions, *region;
|
|
|
|
int nr_regions;
|
|
|
|
unsigned long total_sz, accounted_bytes = 0;
|
|
|
|
int err, i, next_percentile = 0;
|
|
|
|
|
|
|
|
err = damon_stat_sort_regions(c, &sorted_regions, &nr_regions,
|
|
|
|
&total_sz);
|
|
|
|
if (err)
|
|
|
|
return;
|
|
|
|
for (i = 0; i < nr_regions; i++) {
|
|
|
|
region = sorted_regions[i];
|
|
|
|
accounted_bytes += region->ar.end - region->ar.start;
|
|
|
|
while (next_percentile <= accounted_bytes * 100 / total_sz)
|
|
|
|
memory_idle_ms_percentiles[next_percentile++] =
|
|
|
|
damon_stat_idletime(region) *
|
|
|
|
c->attrs.aggr_interval / USEC_PER_MSEC;
|
|
|
|
}
|
|
|
|
kfree(sorted_regions);
|
|
|
|
}
|
|
|
|
|
2025-07-12 12:50:05 -07:00
|
|
|
static int damon_stat_damon_call_fn(void *data)
|
2025-06-04 11:31:25 -07:00
|
|
|
{
|
2025-07-12 12:50:05 -07:00
|
|
|
struct damon_ctx *c = data;
|
2025-06-04 11:31:25 -07:00
|
|
|
static unsigned long last_refresh_jiffies;
|
|
|
|
|
|
|
|
/* avoid unnecessarily frequent stat update */
|
|
|
|
if (time_before_eq(jiffies, last_refresh_jiffies +
|
|
|
|
msecs_to_jiffies(5 * MSEC_PER_SEC)))
|
|
|
|
return 0;
|
|
|
|
last_refresh_jiffies = jiffies;
|
|
|
|
|
|
|
|
damon_stat_set_estimated_memory_bandwidth(c);
|
mm/damon/stat: calculate and expose idle time percentiles
Knowing how much memory is how cold can be useful for understanding
coldness and utilization efficiency of memory. The raw form of DAMON's
monitoring results has the information. Convert the raw results into the
per-byte idle time distributions and expose it as percentiles metric to
users, as a read-only DAMON_STAT parameter.
In detail, the metrics are calculated as follows. First, DAMON's
per-region access frequency and age information is converted into per-byte
idle time. If access frequency of a region is higher than zero, every
byte of the region has zero idle time. If the access frequency of a
region is zero, every byte of the region has idle time as the age of the
region. Then the logic sorts the per-byte idle times and provides the
value at 0/100, 1/100, ..., 99/100 and 100/100 location of the sorted
array.
The metric can be easily aggregated and compared on large scale production
systems. For example, if an average of 75-th percentile idle time of
machines that collected on similar time is two minutes, it means the
system's 25 percent memory is not accessed at all for two minutes or more
on average. If a workload considers two minutes as unit work time, we can
conclude its working set size is only 75 percent of the memory. If the
system utilizes proactive reclamation and it supports coldness-based
thresholds like DAMON_RECLAIM, the idle time percentiles can be used to
find a more safe or aggressive coldness threshold for aimed memory saving.
Link: https://lkml.kernel.org/r/20250604183127.13968-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:26 -07:00
|
|
|
damon_stat_set_idletime_percentiles(c);
|
2025-06-04 11:31:25 -07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
static struct damon_ctx *damon_stat_build_ctx(void)
|
|
|
|
{
|
|
|
|
struct damon_ctx *ctx;
|
|
|
|
struct damon_attrs attrs;
|
|
|
|
struct damon_target *target;
|
|
|
|
unsigned long start = 0, end = 0;
|
|
|
|
|
|
|
|
ctx = damon_new_ctx();
|
|
|
|
if (!ctx)
|
|
|
|
return NULL;
|
|
|
|
attrs = (struct damon_attrs) {
|
|
|
|
.sample_interval = 5 * USEC_PER_MSEC,
|
|
|
|
.aggr_interval = 100 * USEC_PER_MSEC,
|
|
|
|
.ops_update_interval = 60 * USEC_PER_MSEC * MSEC_PER_SEC,
|
|
|
|
.min_nr_regions = 10,
|
|
|
|
.max_nr_regions = 1000,
|
|
|
|
};
|
|
|
|
/*
|
|
|
|
* auto-tune sampling and aggregation interval aiming 4% DAMON-observed
|
|
|
|
* accesses ratio, keeping sampling interval in [5ms, 10s] range.
|
|
|
|
*/
|
|
|
|
attrs.intervals_goal = (struct damon_intervals_goal) {
|
|
|
|
.access_bp = 400, .aggrs = 3,
|
|
|
|
.min_sample_us = 5000, .max_sample_us = 10000000,
|
|
|
|
};
|
|
|
|
if (damon_set_attrs(ctx, &attrs))
|
|
|
|
goto free_out;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* auto-tune sampling and aggregation interval aiming 4% DAMON-observed
|
|
|
|
* accesses ratio, keeping sampling interval in [5ms, 10s] range.
|
|
|
|
*/
|
|
|
|
ctx->attrs.intervals_goal = (struct damon_intervals_goal) {
|
|
|
|
.access_bp = 400, .aggrs = 3,
|
|
|
|
.min_sample_us = 5000, .max_sample_us = 10000000,
|
|
|
|
};
|
|
|
|
if (damon_select_ops(ctx, DAMON_OPS_PADDR))
|
|
|
|
goto free_out;
|
|
|
|
|
|
|
|
target = damon_new_target();
|
|
|
|
if (!target)
|
|
|
|
goto free_out;
|
|
|
|
damon_add_target(ctx, target);
|
|
|
|
if (damon_set_region_biggest_system_ram_default(target, &start, &end))
|
|
|
|
goto free_out;
|
|
|
|
return ctx;
|
|
|
|
free_out:
|
|
|
|
damon_destroy_ctx(ctx);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2025-07-12 12:50:05 -07:00
|
|
|
static struct damon_call_control call_control = {
|
|
|
|
.fn = damon_stat_damon_call_fn,
|
|
|
|
.repeat = true,
|
|
|
|
};
|
|
|
|
|
mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
static int damon_stat_start(void)
|
|
|
|
{
|
2025-07-12 12:50:05 -07:00
|
|
|
int err;
|
|
|
|
|
mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
damon_stat_context = damon_stat_build_ctx();
|
|
|
|
if (!damon_stat_context)
|
|
|
|
return -ENOMEM;
|
2025-07-12 12:50:05 -07:00
|
|
|
err = damon_start(&damon_stat_context, 1, true);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
call_control.data = damon_stat_context;
|
|
|
|
return damon_call(damon_stat_context, &call_control);
|
mm/damon: introduce DAMON_STAT module
Patch series "mm/damon: introduce DAMON_STAT for simple and practical
access monitoring", v2.
DAMON-based access monitoring is not simple due to required DAMON control
and results visualizations. Introduce a static kernel module for making
it simple. The module can be enabled without manual setup and provides
access pattern metrics that easy to fetch and understand the practical
access pattern information, namely estimated memory bandwidth and memory
idle time percentiles.
Background and Problems
=======================
DAMON can be used for monitoring data access patterns of the system and
workloads. Specifically, users can start DAMON to monitor access events
on specific address space with fine controls including address ranges to
monitor and time intervals between samplings and aggregations. The
resulting access information snapshot contains access frequency
(nr_accesses) and how long the frequency was kept (age) for each byte.
The monitoring usage is not simple and practical enough for production
usage. Users should first start DAMON with a number of parameters, and
wait until DAMON's monitoring results capture a reasonable amount of the
time data (age). In production, such manual start and wait is impractical
to capture useful information from a high number of machines in a timely
manner.
The monitoring result is also too detailed to be used on production
environments. The raw results are hard to be aggregated and/or compared
for production environments having a large scale of time, space and
machines fleet.
Users have to implement and use their own automation of DAMON control and
results processing. It is repetitive and challenging since there is no
good reference or guideline for such automation.
Solution: DAMON_STAT
====================
Implement such automation in kernel space as a static kernel module,
namely DAMON_STAT. It can be enabled at build, boot, or run time via its
build configuration or module parameter. It monitors the entire physical
address space with monitoring intervals that auto-tuned for a reasonable
amount of access observations and minimum overhead. It converts the raw
monitoring results into simpler metrics that can easily be aggregated and
compared, namely estimated memory bandwidth and idle time percentiles.
Understanding of the metrics and the user interface of DAMON_STAT is
essential. Refer to the commit messages of the second and the third
patches of this patch series for more details about the metrics. For the
user interface, the standard module parameters system is used. Refer to
the fourth patch of this patch series for details of the user interface.
Discussions
===========
The module aims to be useful on production environments constructed with a
large number of machines that run a long time. The auto-tuned monitoring
intervals ensure a reasonable quality of the outputs. The auto-tuning
also ensures its overhead be reasonable and low enough to be enabled
always on the production. The simplified monitoring results metrics can
be useful for showing both coldness (idle time percentiles) and hotness
(memory bandwidth) of the system's access pattern. We expect the
information can be useful for assessing system memory utilization and
inspiring optimizations or investigations on both kernel and user space
memory management logics for large scale fleets.
We hence expect the module is good enough to be just used in most
environments. For special cases that require a custom access monitoring
automation, users will still benefit by using DAMON_STAT as a reference or
a guideline for their specialized automation.
This patch (of 4):
To use DAMON for monitoring access patterns of the system, users should
manually start DAMON via DAMON sysfs ABI with a number of parameters for
specifying the monitoring target address space, address ranges, and
monitoring intervals. After that, users should also wait until desired
amount of time data is captured into DAMON's monitoring results. It is
bothersome and take a long time to be practical for access monitoring on
large fleet level production environments.
For access-aware system operations use cases like proactive cold memory
reclamation, similar problems existed. We we solved those by introducing
dedicated static kernel modules such as DAMON_RECLAIM.
Implement such static kernel module for access monitoring, namely
DAMON_STAT. It monitors the entire physical address space with auto-tuned
monitoring intervals. The auto-tuning is set to capture 4 % of observable
access events in each snapshot while keeping the sampling intervals 5
milliseconds in minimum and 10 seconds in maximum. From a few production
environments, we confirmed this setup provides high quality monitoring
results with minimum overheads. The module therefore receives only one
user input, whether to enable or disable it. It can be set on build or
boot time via build configuration or kernel boot command line. It can
also be overridden at runtime.
Note that this commit only implements the DAMON control part of the
module. Users could get the monitoring results via damon:damon_aggregated
tracepoint, but that's of course not the recommended way. Following
commits will implement convenient and optimized ways for serving the
monitoring results to users.
[sj@kernel.org: use IS_ENABLED() for enabled initial value]
Link: https://lkml.kernel.org/r/20250604205619.18929-1-sj@kernel.org
[sj@kernel.org: reset enabled when DAMON start failed]
Link: https://lkml.kernel.org/r/20250706184750.36588-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250604183127.13968-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-04 11:31:24 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void damon_stat_stop(void)
|
|
|
|
{
|
|
|
|
damon_stop(&damon_stat_context, 1);
|
|
|
|
damon_destroy_ctx(damon_stat_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool damon_stat_init_called;
|
|
|
|
|
|
|
|
static int damon_stat_enabled_store(
|
|
|
|
const char *val, const struct kernel_param *kp)
|
|
|
|
{
|
|
|
|
bool is_enabled = enabled;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = kstrtobool(val, &enabled);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (is_enabled == enabled)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!damon_stat_init_called)
|
|
|
|
/*
|
|
|
|
* probably called from command line parsing (parse_args()).
|
|
|
|
* Cannot call damon_new_ctx(). Let damon_stat_init() handle.
|
|
|
|
*/
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (enabled) {
|
|
|
|
err = damon_stat_start();
|
|
|
|
if (err)
|
|
|
|
enabled = false;
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
damon_stat_stop();
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __init damon_stat_init(void)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
damon_stat_init_called = true;
|
|
|
|
|
|
|
|
/* probably set via command line */
|
|
|
|
if (enabled)
|
|
|
|
err = damon_stat_start();
|
|
|
|
|
|
|
|
if (err && enabled)
|
|
|
|
enabled = false;
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
module_init(damon_stat_init);
|