cxl/region: Add region creation support

CXL 2.0 allows for dynamic provisioning of new memory regions (system
physical address resources like "System RAM" and "Persistent Memory").
Whereas DDR and PMEM resources are conveyed statically at boot, CXL
allows for assembling and instantiating new regions from the available
capacity of CXL memory expanders in the system.

Sysfs with an "echo $region_name > $create_region_attribute" interface
is chosen as the mechanism to initiate the provisioning process. This
was chosen over ioctl() and netlink() to keep the configuration
interface entirely in a pseudo-fs interface, and it was chosen over
configfs since, aside from this one creation event, the interface is
read-mostly. I.e. configfs supports cases where an object is designed to
be provisioned each boot, like an iSCSI storage target, and CXL region
creation is mostly for PMEM regions which are usually created once per
lifetime of a server instance. This is an improvement over nvdimm,
which pre-created "seed" devices that tended to confuse users looking to
determine which devices are active and which are idle.

Recall that the major change that CXL brings over previous persistent
memory architectures is the ability to dynamically define new regions.
Compare that to drivers like 'nfit' where the region configuration is
statically defined by platform firmware.

Regions are created as a child of a root decoder that encompasses an
address space with constraints. When created through sysfs, the root
decoder is explicit. When created from an LSA's region structure a root
decoder will possibly need to be inferred by the driver.

Upon region creation through sysfs, a vacant region is created with a
unique name. Regions have a number of attributes that must be configured
before the region can be bound to the driver, at which point HDM decoder
programming is completed.

An example of creating a new region:

- Allocate a new region name:

    region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)

- Create a new region by name:

    while
    region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
    ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
    do true; done

- Region now exists in sysfs:

    stat -t /sys/bus/cxl/devices/decoder0.0/$region

- Delete the region, and name:

    echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region

Signed-off-by: Ben Widawsky <bwidawsk@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com
[djbw: simplify locking, reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
#include <linux/memregion.h>
#include <linux/genalloc.h>
#include <linux/device.h>
#include <linux/module.h>
#include <linux/memory.h>
#include <linux/slab.h>
#include <linux/uuid.h>
#include <linux/sort.h>
#include <linux/idr.h>
#include <linux/memory-tiers.h>
#include <cxlmem.h>
#include <cxl.h>
#include "core.h"

/**
 * DOC: cxl core region
 *
 * CXL Regions represent mapped memory capacity in system physical address
 * space. Whereas the CXL Root Decoders identify the bounds of potential CXL
 * Memory ranges, Regions represent the active mapped capacity by the HDM
 * Decoder Capability structures throughout the Host Bridges, Switches, and
 * Endpoints in the topology.
 *
 * Region configuration has ordering constraints. UUID may be set at any time
 * but is only visible for persistent regions.
 * 1. Interleave granularity
 * 2. Interleave size
 * 3. Decoder targets
 */
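
/*
 * Illustrative sysfs flow honoring the ordering constraints above (the
 * decoder0.0 / decoder2.0 names, sizes, and granularity are hypothetical):
 *
 *	region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region)
 *	echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region
 *	uuidgen > /sys/bus/cxl/devices/$region/uuid
 *	echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
 *	echo 2 > /sys/bus/cxl/devices/$region/interleave_ways
 *	echo $((2 << 30)) > /sys/bus/cxl/devices/$region/size
 *	echo decoder2.0 > /sys/bus/cxl/devices/$region/target0
 *	echo decoder3.0 > /sys/bus/cxl/devices/$region/target1
 *	echo 1 > /sys/bus/cxl/devices/$region/commit
 */
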
static struct cxl_region *to_cxl_region(struct device *dev);

#define __ACCESS_ATTR_RO(_level, _name) {				\
	.attr = { .name = __stringify(_name), .mode = 0444 },		\
	.show = _name##_access##_level##_show,				\
}

#define ACCESS_DEVICE_ATTR_RO(level, name)	\
	struct device_attribute dev_attr_access##level##_##name = __ACCESS_ATTR_RO(level, name)

#define ACCESS_ATTR_RO(level, attrib)					\
static ssize_t attrib##_access##level##_show(struct device *dev,	\
					     struct device_attribute *attr, \
					     char *buf)			\
{									\
	struct cxl_region *cxlr = to_cxl_region(dev);			\
									\
	if (cxlr->coord[level].attrib == 0)				\
		return -ENOENT;						\
									\
	return sysfs_emit(buf, "%u\n", cxlr->coord[level].attrib);	\
}									\
static ACCESS_DEVICE_ATTR_RO(level, attrib)
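
/*
 * For illustration: ACCESS_ATTR_RO(0, read_bandwidth) token-pastes a show
 * handler named read_bandwidth_access0_show() and a read-only (0444)
 * attribute named dev_attr_access0_read_bandwidth, surfaced to userspace
 * as a "read_bandwidth" file inside the "access0" sysfs group defined below.
 */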

ACCESS_ATTR_RO(0, read_bandwidth);
ACCESS_ATTR_RO(0, read_latency);
ACCESS_ATTR_RO(0, write_bandwidth);
ACCESS_ATTR_RO(0, write_latency);

#define ACCESS_ATTR_DECLARE(level, attrib)	\
	(&dev_attr_access##level##_##attrib.attr)

static struct attribute *access0_coordinate_attrs[] = {
	ACCESS_ATTR_DECLARE(0, read_bandwidth),
	ACCESS_ATTR_DECLARE(0, write_bandwidth),
	ACCESS_ATTR_DECLARE(0, read_latency),
	ACCESS_ATTR_DECLARE(0, write_latency),
	NULL
};

ACCESS_ATTR_RO(1, read_bandwidth);
ACCESS_ATTR_RO(1, read_latency);
ACCESS_ATTR_RO(1, write_bandwidth);
ACCESS_ATTR_RO(1, write_latency);

static struct attribute *access1_coordinate_attrs[] = {
	ACCESS_ATTR_DECLARE(1, read_bandwidth),
	ACCESS_ATTR_DECLARE(1, write_bandwidth),
	ACCESS_ATTR_DECLARE(1, read_latency),
	ACCESS_ATTR_DECLARE(1, write_latency),
	NULL
};

#define ACCESS_VISIBLE(level)						\
static umode_t cxl_region_access##level##_coordinate_visible(		\
		struct kobject *kobj, struct attribute *a, int n)	\
{									\
	struct device *dev = kobj_to_dev(kobj);				\
	struct cxl_region *cxlr = to_cxl_region(dev);			\
									\
	if (a == &dev_attr_access##level##_read_latency.attr &&		\
	    cxlr->coord[level].read_latency == 0)			\
		return 0;						\
									\
	if (a == &dev_attr_access##level##_write_latency.attr &&	\
	    cxlr->coord[level].write_latency == 0)			\
		return 0;						\
									\
	if (a == &dev_attr_access##level##_read_bandwidth.attr &&	\
	    cxlr->coord[level].read_bandwidth == 0)			\
		return 0;						\
									\
	if (a == &dev_attr_access##level##_write_bandwidth.attr &&	\
	    cxlr->coord[level].write_bandwidth == 0)			\
		return 0;						\
									\
	return a->mode;							\
}

ACCESS_VISIBLE(0);
ACCESS_VISIBLE(1);

static const struct attribute_group cxl_region_access0_coordinate_group = {
	.name = "access0",
	.attrs = access0_coordinate_attrs,
	.is_visible = cxl_region_access0_coordinate_visible,
};

static const struct attribute_group *get_cxl_region_access0_group(void)
{
	return &cxl_region_access0_coordinate_group;
}

static const struct attribute_group cxl_region_access1_coordinate_group = {
	.name = "access1",
	.attrs = access1_coordinate_attrs,
	.is_visible = cxl_region_access1_coordinate_visible,
};

static const struct attribute_group *get_cxl_region_access1_group(void)
{
	return &cxl_region_access1_coordinate_group;
}

static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	ssize_t rc;

	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
		return rc;

	if (cxlr->mode != CXL_PARTMODE_PMEM)
		return sysfs_emit(buf, "\n");
	return sysfs_emit(buf, "%pUb\n", &p->uuid);
}
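
/*
 * Note: ACQUIRE()/ACQUIRE_ERR() above are scope-based lock guards (see
 * linux/cleanup.h); the semaphore is released automatically when the guard
 * variable goes out of scope, which is why the early returns in this file
 * need no explicit unlock.
 */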

static int is_dup(struct device *match, void *data)
{
	struct cxl_region_params *p;
	struct cxl_region *cxlr;
	uuid_t *uuid = data;

	if (!is_cxl_region(match))
		return 0;

	lockdep_assert_held(&cxl_rwsem.region);
	cxlr = to_cxl_region(match);
	p = &cxlr->params;

	if (uuid_equal(&p->uuid, uuid)) {
		dev_dbg(match, "already has uuid: %pUb\n", uuid);
		return -EBUSY;
	}

	return 0;
}
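
/*
 * is_dup() is a bus_for_each_dev() callback: a negative return (-EBUSY
 * here) terminates the bus walk and is propagated to the caller, which is
 * how uuid_store() below rejects an already-used UUID.
 */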

static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
			  const char *buf, size_t len)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	uuid_t temp;
	ssize_t rc;

	if (len != UUID_STRING_LEN + 1)
		return -EINVAL;

	rc = uuid_parse(buf, &temp);
	if (rc)
		return rc;

	if (uuid_is_null(&temp))
		return -EINVAL;

	ACQUIRE(rwsem_write_kill, region_rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &region_rwsem)))
		return rc;

	if (uuid_equal(&p->uuid, &temp))
		return len;

	if (p->state >= CXL_CONFIG_ACTIVE)
		return -EBUSY;

	rc = bus_for_each_dev(&cxl_bus_type, NULL, &temp, is_dup);
	if (rc < 0)
		return rc;

	uuid_copy(&p->uuid, &temp);

	return len;
}
static DEVICE_ATTR_RW(uuid);
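
/*
 * Usage sketch: uuid_store() requires exactly a 36-character UUID plus a
 * trailing newline, so "uuidgen > /sys/bus/cxl/devices/region0/uuid" works
 * ("region0" being an illustrative name), while a string without the
 * newline fails the UUID_STRING_LEN + 1 length check.
 */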

static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
					  struct cxl_region *cxlr)
{
	return xa_load(&port->regions, (unsigned long)cxlr);
}

static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
{
	if (!cpu_cache_has_invalidate_memregion()) {
		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
			dev_info_once(
				&cxlr->dev,
				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
			return 0;
		}
		dev_WARN(&cxlr->dev,
			 "Failed to synchronize CPU cache state\n");
		return -ENXIO;
	}

	cpu_cache_invalidate_memregion(IORES_DESC_CXL);
	return 0;
}

static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
{
	struct cxl_region_params *p = &cxlr->params;
	int i;

	/*
	 * Before region teardown attempt to flush, evict any data cached for
	 * this region, or scream loudly about missing arch / platform support
	 * for CXL teardown.
	 */
	cxl_region_invalidate_memregion(cxlr);

	for (i = count - 1; i >= 0; i--) {
		struct cxl_endpoint_decoder *cxled = p->targets[i];
		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
		struct cxl_port *iter = cxled_to_port(cxled);
		struct cxl_dev_state *cxlds = cxlmd->cxlds;
		struct cxl_ep *ep;

		if (cxlds->rcd)
			goto endpoint_reset;

		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
			iter = to_cxl_port(iter->dev.parent);

		for (ep = cxl_ep_load(iter, cxlmd); iter;
		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
			struct cxl_region_ref *cxl_rr;
			struct cxl_decoder *cxld;

			cxl_rr = cxl_rr_load(iter, cxlr);
			cxld = cxl_rr->decoder;
			if (cxld->reset)
				cxld->reset(cxld);
			set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
		}

endpoint_reset:
		cxled->cxld.reset(&cxled->cxld);
		set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
	}

	/* all decoders associated with this region have been torn down */
	clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
}
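
/*
 * Note the symmetry with cxl_region_decode_commit() below: commit programs
 * decoders bottom-up (endpoint port toward the root), while the reset above
 * walks top-down from the root-most port before finally resetting the
 * endpoint decoder itself.
 */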

static int commit_decoder(struct cxl_decoder *cxld)
{
	struct cxl_switch_decoder *cxlsd = NULL;

	if (cxld->commit)
		return cxld->commit(cxld);

	if (is_switch_decoder(&cxld->dev))
		cxlsd = to_cxl_switch_decoder(&cxld->dev);

	if (dev_WARN_ONCE(&cxld->dev, !cxlsd || cxlsd->nr_targets > 1,
			  "->commit() is required\n"))
		return -ENXIO;
	return 0;
}

static int cxl_region_decode_commit(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	int i, rc = 0;

	for (i = 0; i < p->nr_targets; i++) {
		struct cxl_endpoint_decoder *cxled = p->targets[i];
		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
		struct cxl_region_ref *cxl_rr;
		struct cxl_decoder *cxld;
		struct cxl_port *iter;
		struct cxl_ep *ep;

		/* commit bottom up */
		for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
		     iter = to_cxl_port(iter->dev.parent)) {
			cxl_rr = cxl_rr_load(iter, cxlr);
			cxld = cxl_rr->decoder;
			rc = commit_decoder(cxld);
			if (rc)
				break;
		}

		if (rc) {
			/* programming @iter failed, teardown */
			for (ep = cxl_ep_load(iter, cxlmd); ep && iter;
			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
				cxl_rr = cxl_rr_load(iter, cxlr);
				cxld = cxl_rr->decoder;
				if (cxld->reset)
					cxld->reset(cxld);
			}

			cxled->cxld.reset(&cxled->cxld);
			goto err;
		}
	}

	return 0;

err:
	/* undo the targets that were successfully committed */
	cxl_region_decode_reset(cxlr, i);
	return rc;
}

static int queue_reset(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
		return rc;

	/* Already in the requested state? */
	if (p->state < CXL_CONFIG_COMMIT)
		return 0;

	p->state = CXL_CONFIG_RESET_PENDING;

	return 0;
}

static int __commit(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
		return rc;

	/* Already in the requested state? */
	if (p->state >= CXL_CONFIG_COMMIT)
		return 0;

	/* Not ready to commit? */
	if (p->state < CXL_CONFIG_ACTIVE)
		return -ENXIO;

	/*
	 * Invalidate caches before region setup to drop any speculative
	 * consumption of this address space
	 */
	rc = cxl_region_invalidate_memregion(cxlr);
	if (rc)
		return rc;

	rc = cxl_region_decode_commit(cxlr);
	if (rc)
		return rc;

	p->state = CXL_CONFIG_COMMIT;

	return 0;
}

static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
			    const char *buf, size_t len)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	bool commit;
	ssize_t rc;

	rc = kstrtobool(buf, &commit);
	if (rc)
		return rc;

	if (commit) {
		rc = __commit(cxlr);
		if (rc)
			return rc;
		return len;
	}

	rc = queue_reset(cxlr);
	if (rc)
		return rc;

	/*
	 * Unmap the region and rely on the reset-pending state to ensure
	 * it does not go active again until after the reset
	 */
	device_release_driver(&cxlr->dev);

	/*
	 * With the reset pending take cxl_rwsem.region unconditionally
	 * to ensure the reset gets handled before returning.
	 */
	guard(rwsem_write)(&cxl_rwsem.region);

	/*
	 * Revalidate that the reset is still pending in case another
	 * thread already handled this reset.
	 */
	if (p->state == CXL_CONFIG_RESET_PENDING) {
		cxl_region_decode_reset(cxlr, p->interleave_ways);
		p->state = CXL_CONFIG_ACTIVE;
	}

	return len;
}
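
/*
 * Usage sketch: "echo 1 > /sys/bus/cxl/devices/regionX/commit" programs the
 * decoder chain via __commit(), while "echo 0" queues a reset, unbinds the
 * region driver, and then performs the decoder teardown ("regionX" is an
 * illustrative name).
 */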

static ssize_t commit_show(struct device *dev, struct device_attribute *attr,
			   char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	ssize_t rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;
	return sysfs_emit(buf, "%d\n", p->state >= CXL_CONFIG_COMMIT);
}
static DEVICE_ATTR_RW(commit);

static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
				  int n)
{
	struct device *dev = kobj_to_dev(kobj);
	struct cxl_region *cxlr = to_cxl_region(dev);

	/*
	 * Support tooling that expects to find a 'uuid' attribute for all
	 * regions regardless of mode.
	 */
	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_PARTMODE_PMEM)
		return 0444;
	return a->mode;
}

static ssize_t interleave_ways_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;
	return sysfs_emit(buf, "%d\n", p->interleave_ways);
}

static const struct attribute_group *get_cxl_region_target_group(void);

static ssize_t interleave_ways_store(struct device *dev,
				     struct device_attribute *attr,
				     const char *buf, size_t len)
{
	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
	struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	unsigned int val, save;
	int rc;
	u8 iw;

	rc = kstrtouint(buf, 0, &val);
	if (rc)
		return rc;

	rc = ways_to_eiw(val, &iw);
	if (rc)
		return rc;

	/*
	 * Even for x3, x6, and x12 interleaves the region interleave must be a
	 * power of 2 multiple of the host bridge interleave.
	 */
	if (!is_power_of_2(val / cxld->interleave_ways) ||
	    (val % cxld->interleave_ways)) {
		dev_dbg(&cxlr->dev, "invalid interleave: %d\n", val);
		return -EINVAL;
	}

	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
		return rc;

	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
		return -EBUSY;

	save = p->interleave_ways;
	p->interleave_ways = val;
	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
	if (rc) {
		p->interleave_ways = save;
		return rc;
	}

	return len;
}
static DEVICE_ATTR_RW(interleave_ways);
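
/*
 * Worked example of the power-of-2 check above: with a x3 host-bridge
 * interleave, val = 6 is accepted (6 / 3 = 2, a power of 2, and 6 % 3 == 0)
 * while val = 9 is rejected (9 / 3 = 3, not a power of 2).
 */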

static ssize_t interleave_granularity_show(struct device *dev,
					   struct device_attribute *attr,
					   char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;
	return sysfs_emit(buf, "%d\n", p->interleave_granularity);
}
|
|
|
|
|
|
|
|
static ssize_t interleave_granularity_store(struct device *dev,
|
|
|
|
struct device_attribute *attr,
|
|
|
|
const char *buf, size_t len)
|
|
|
|
{
|
|
|
|
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
|
|
|
|
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
|
|
|
|
struct cxl_region *cxlr = to_cxl_region(dev);
|
|
|
|
struct cxl_region_params *p = &cxlr->params;
|
|
|
|
int rc, val;
|
|
|
|
u16 ig;
|
|
|
|
|
|
|
|
rc = kstrtoint(buf, 0, &val);
|
|
|
|
if (rc)
|
|
|
|
return rc;
|
|
|
|
|
2022-12-05 14:16:07 -07:00
|
|
|
rc = granularity_to_eig(val, &ig);
|
2022-04-25 11:36:48 -07:00
|
|
|
if (rc)
|
|
|
|
return rc;
|
|
|
|
|
|
|
|
/*
|
2022-08-05 13:27:51 -07:00
|
|
|
* When the host-bridge is interleaved, disallow region granularity !=
|
|
|
|
* root granularity. Regions with a granularity less than the root
|
|
|
|
* interleave result in needing multiple endpoints to support a single
|
2023-01-24 19:22:21 -08:00
|
|
|
* slot in the interleave (possible to support in the future). Regions
|
2022-08-05 13:27:51 -07:00
|
|
|
* with a granularity greater than the root interleave result in invalid
|
|
|
|
* DPA translations (invalid to support).
|
2022-04-25 11:36:48 -07:00
|
|
|
*/
|
2022-08-05 13:27:51 -07:00
|
|
|
if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
|
2022-04-25 11:36:48 -07:00
|
|
|
return -EINVAL;
|
|
|
|
|
2025-07-11 16:49:32 -07:00
|
|
|
ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
|
|
|
|
if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
|
2022-04-25 11:36:48 -07:00
|
|
|
return rc;
|
2025-07-11 16:49:32 -07:00
|
|
|
|
|
|
|
if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
|
|
|
|
return -EBUSY;
|
2022-04-25 11:36:48 -07:00
|
|
|
|
|
|
|
p->interleave_granularity = val;
|
2025-07-11 16:49:32 -07:00
|
|
|
|
2022-04-25 11:36:48 -07:00
|
|
|
return len;
|
|
|
|
}
|
|
|
|
static DEVICE_ATTR_RW(interleave_granularity);
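
/*
 * Illustrative sketch of the granularity rule above (a documentation
 * aid, not part of the driver): once the root decoder interleaves
 * across host bridges (interleave_ways > 1), the region must adopt the
 * root granularity verbatim. A x2 root decoder at 1024 byte
 * granularity forces the region to 1024; writes of 256 or 4096 fail
 * with -EINVAL.
 */
static inline bool example_region_ig_valid(int region_ig, int root_iw,
					   int root_ig)
{
	return root_iw <= 1 || region_ig == root_ig;
}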

static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
			     char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	u64 resource = -1ULL;
	int rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;

	if (p->res)
		resource = p->res->start;
	return sysfs_emit(buf, "%#llx\n", resource);
}
static DEVICE_ATTR_RO(resource);

static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	const char *desc;

	if (cxlr->mode == CXL_PARTMODE_RAM)
		desc = "ram";
	else if (cxlr->mode == CXL_PARTMODE_PMEM)
		desc = "pmem";
	else
		desc = "";

	return sysfs_emit(buf, "%s\n", desc);
}
static DEVICE_ATTR_RO(mode);

static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
{
	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
	struct cxl_region_params *p = &cxlr->params;
	struct resource *res;
	u64 remainder = 0;

	lockdep_assert_held_write(&cxl_rwsem.region);

	/* Nothing to do... */
	if (p->res && resource_size(p->res) == size)
		return 0;

	/* To change size the old size must be freed first */
	if (p->res)
		return -EBUSY;

	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
		return -EBUSY;

	/* ways, granularity and uuid (if PMEM) need to be set before HPA */
	if (!p->interleave_ways || !p->interleave_granularity ||
	    (cxlr->mode == CXL_PARTMODE_PMEM && uuid_is_null(&p->uuid)))
		return -ENXIO;

	div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
	if (remainder)
		return -EINVAL;

	res = alloc_free_mem_region(cxlrd->res, size, SZ_256M,
				    dev_name(&cxlr->dev));
	if (IS_ERR(res)) {
		dev_dbg(&cxlr->dev,
			"HPA allocation error (%ld) for size:%pap in %s %pr\n",
			PTR_ERR(res), &size, cxlrd->res->name, cxlrd->res);
		return PTR_ERR(res);
	}

	p->res = res;
	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;

	return 0;
}
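
/*
 * Illustrative sketch of the size validation in alloc_hpa() (a
 * documentation aid, not part of the driver): the region size must be
 * a multiple of SZ_256M * interleave_ways so that each endpoint
 * contributes a 256MiB-aligned chunk. For a x4 region that means 1GiB
 * increments; a 1.5GiB (0x60000000) request leaves a remainder and
 * fails with -EINVAL.
 */
static inline bool example_region_size_valid(u64 size, int interleave_ways)
{
	u64 remainder;

	div64_u64_rem(size, (u64)SZ_256M * interleave_ways, &remainder);
	return remainder == 0;
}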

static void cxl_region_iomem_release(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;

	if (device_is_registered(&cxlr->dev))
		lockdep_assert_held_write(&cxl_rwsem.region);
	if (p->res) {
		/*
		 * Autodiscovered regions may not have been able to insert their
		 * resource.
		 */
		if (p->res->parent)
			remove_resource(p->res);
		kfree(p->res);
		p->res = NULL;
	}
}

static int free_hpa(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;

	lockdep_assert_held_write(&cxl_rwsem.region);

	if (!p->res)
		return 0;

	if (p->state >= CXL_CONFIG_ACTIVE)
		return -EBUSY;

	cxl_region_iomem_release(cxlr);
	p->state = CXL_CONFIG_IDLE;
	return 0;
}

static ssize_t size_store(struct device *dev, struct device_attribute *attr,
			  const char *buf, size_t len)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	u64 val;
	int rc;

	rc = kstrtou64(buf, 0, &val);
	if (rc)
		return rc;

	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
		return rc;

	if (val)
		rc = alloc_hpa(cxlr, val);
	else
		rc = free_hpa(cxlr);

	if (rc)
		return rc;

	return len;
}

static ssize_t size_show(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	u64 size = 0;
	ssize_t rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;
	if (p->res)
		size = resource_size(p->res);
	return sysfs_emit(buf, "%#llx\n", size);
}
static DEVICE_ATTR_RW(size);

static struct attribute *cxl_region_attrs[] = {
	&dev_attr_uuid.attr,
	&dev_attr_commit.attr,
	&dev_attr_interleave_ways.attr,
	&dev_attr_interleave_granularity.attr,
	&dev_attr_resource.attr,
	&dev_attr_size.attr,
	&dev_attr_mode.attr,
	NULL,
};

static const struct attribute_group cxl_region_group = {
	.attrs = cxl_region_attrs,
	.is_visible = cxl_region_visible,
};

static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos)
{
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_endpoint_decoder *cxled;
	int rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
		return rc;

	if (pos >= p->interleave_ways) {
		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
			p->interleave_ways);
		return -ENXIO;
	}

	cxled = p->targets[pos];
	if (!cxled)
		return sysfs_emit(buf, "\n");
	return sysfs_emit(buf, "%s\n", dev_name(&cxled->cxld.dev));
}

static int check_commit_order(struct device *dev, void *data)
{
	struct cxl_decoder *cxld = to_cxl_decoder(dev);

	/*
	 * if port->commit_end is not the only free decoder, then out of
	 * order shutdown has occurred, block further allocations until
	 * that is resolved
	 */
	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
		return -EBUSY;
	return 0;
}

static int match_free_decoder(struct device *dev, const void *data)
{
	struct cxl_port *port = to_cxl_port(dev->parent);
	struct cxl_decoder *cxld;
	int rc;

	if (!is_switch_decoder(dev))
		return 0;

	cxld = to_cxl_decoder(dev);

	if (cxld->id != port->commit_end + 1)
		return 0;

	if (cxld->region) {
		dev_dbg(dev->parent,
			"next decoder to commit (%s) is already reserved (%s)\n",
			dev_name(dev), dev_name(&cxld->region->dev));
		return 0;
	}

	rc = device_for_each_child_reverse_from(dev->parent, dev, NULL,
						check_commit_order);
	if (rc) {
		dev_dbg(dev->parent,
			"unable to allocate %s due to out of order shutdown\n",
			dev_name(dev));
		return 0;
	}
	return 1;
}

static bool region_res_match_cxl_range(const struct cxl_region_params *p,
				       struct range *range)
{
	if (!p->res)
		return false;

	/*
	 * For an extended linear cache region, the CXL range is assumed to
	 * be fronted by the DRAM range in all currently known
	 * implementations. This assumption holds until a variant
	 * implementation exists.
	 */
	return p->res->start + p->cache_size == range->start &&
		p->res->end == range->end;
}
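
/*
 * Worked example for region_res_match_cxl_range() (a documentation
 * aid, not part of the driver): with an extended linear cache the DRAM
 * alias fronts the region resource, so a resource spanning
 * [0x1000000000, 0x17ffffffff] with cache_size = 0x400000000 matches a
 * CXL range of [0x1400000000, 0x17ffffffff]. Without a cache
 * (cache_size == 0) the range must match the resource exactly.
 */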

static int match_auto_decoder(struct device *dev, const void *data)
{
	const struct cxl_region_params *p = data;
	struct cxl_decoder *cxld;
	struct range *r;

	if (!is_switch_decoder(dev))
		return 0;

	cxld = to_cxl_decoder(dev);
	r = &cxld->hpa_range;

	if (region_res_match_cxl_range(p, r))
		return 1;

	return 0;
}

/**
 * cxl_port_pick_region_decoder() - assign or lookup a decoder for a region
 * @port: a port in the ancestry of the endpoint implied by @cxled
 * @cxled: endpoint decoder to be, or currently, mapped by @port
 * @cxlr: region to establish, or validate, decode at @port
 *
 * In the region creation path cxl_port_pick_region_decoder() is an
 * allocator to find a free decoder. In the region assembly path, it is
 * recalling the decoder that platform firmware picked for validation
 * purposes.
 *
 * The result is recorded in a 'struct cxl_region_ref' in @port.
 */
static struct cxl_decoder *
cxl_port_pick_region_decoder(struct cxl_port *port,
			     struct cxl_endpoint_decoder *cxled,
			     struct cxl_region *cxlr)
{
	struct device *dev;

	if (port == cxled_to_port(cxled))
		return &cxled->cxld;

	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
		dev = device_find_child(&port->dev, &cxlr->params,
					match_auto_decoder);
	else
		dev = device_find_child(&port->dev, NULL, match_free_decoder);
	if (!dev)
		return NULL;
	/*
	 * This decoder is pinned (stays registered) as long as the endpoint
	 * decoder is registered, and endpoint decoder unregistration holds
	 * the cxl_rwsem.region over unregister events, so no need to hold on
	 * to this extra reference.
	 */
	put_device(dev);
	return to_cxl_decoder(dev);
}

static bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_iter,
			  struct cxl_decoder *cxld)
{
	struct cxl_region_ref *rr = cxl_rr_load(port, cxlr_iter);
	struct cxl_decoder *cxld_iter = rr->decoder;

	/*
	 * Allow the out of order assembly of auto-discovered regions.
	 * Per CXL Spec 3.1 8.2.4.20.12 software must commit decoders
	 * in HPA order. Confirm that the decoder with the lesser HPA
	 * starting address has the lesser id.
	 */
	dev_dbg(&cxld->dev, "check for HPA violation %s:%d < %s:%d\n",
		dev_name(&cxld->dev), cxld->id,
		dev_name(&cxld_iter->dev), cxld_iter->id);

	if (cxld_iter->id > cxld->id)
		return true;

	return false;
}
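
/*
 * Worked example for auto_order_ok() (a documentation aid, not part of
 * the driver): firmware commits decoders in HPA order, so if an
 * existing auto-discovered region at a higher HPA holds decoder id 2,
 * a newly assembled region at a lower HPA may claim decoder id 1
 * (2 > 1, ordering holds) but would be rejected if it arrived with
 * decoder id 3.
 */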

static struct cxl_region_ref *
alloc_region_ref(struct cxl_port *port, struct cxl_region *cxlr,
		 struct cxl_endpoint_decoder *cxled,
		 struct cxl_decoder *cxld)
{
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_region_ref *cxl_rr, *iter;
	unsigned long index;
	int rc;

	xa_for_each(&port->regions, index, iter) {
		struct cxl_region_params *ip = &iter->region->params;

		if (!ip->res || ip->res->start < p->res->start)
			continue;

		if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
			if (auto_order_ok(port, iter->region, cxld))
				continue;
		}
		dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
			dev_name(&port->dev),
			dev_name(&iter->region->dev), ip->res, p->res);

		return ERR_PTR(-EBUSY);
	}

	cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
	if (!cxl_rr)
		return ERR_PTR(-ENOMEM);
	cxl_rr->port = port;
	cxl_rr->region = cxlr;
	cxl_rr->nr_targets = 1;
	xa_init(&cxl_rr->endpoints);

	rc = xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr, GFP_KERNEL);
	if (rc) {
		dev_dbg(&cxlr->dev,
			"%s: failed to track region reference: %d\n",
			dev_name(&port->dev), rc);
		kfree(cxl_rr);
		return ERR_PTR(rc);
	}

	return cxl_rr;
}

static void cxl_rr_free_decoder(struct cxl_region_ref *cxl_rr)
{
	struct cxl_region *cxlr = cxl_rr->region;
	struct cxl_decoder *cxld = cxl_rr->decoder;

	if (!cxld)
		return;

	dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n");
	if (cxld->region == cxlr) {
		cxld->region = NULL;
		put_device(&cxlr->dev);
	}
}

static void free_region_ref(struct cxl_region_ref *cxl_rr)
{
	struct cxl_port *port = cxl_rr->port;
	struct cxl_region *cxlr = cxl_rr->region;

	cxl_rr_free_decoder(cxl_rr);
	xa_erase(&port->regions, (unsigned long)cxlr);
	xa_destroy(&cxl_rr->endpoints);
	kfree(cxl_rr);
}

static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
			 struct cxl_endpoint_decoder *cxled)
{
	int rc;
	struct cxl_port *port = cxl_rr->port;
	struct cxl_region *cxlr = cxl_rr->region;
	struct cxl_decoder *cxld = cxl_rr->decoder;
	struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled));

	if (ep) {
		rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep,
			       GFP_KERNEL);
		if (rc)
			return rc;
	}
	cxl_rr->nr_eps++;

	if (!cxld->region) {
		cxld->region = cxlr;
		get_device(&cxlr->dev);
	}

	return 0;
}

static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
				 struct cxl_endpoint_decoder *cxled,
				 struct cxl_region_ref *cxl_rr,
				 struct cxl_decoder *cxld)
{
	if (cxld->region) {
		dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n",
			dev_name(&port->dev), dev_name(&cxld->dev),
			dev_name(&cxld->region->dev));
		return -EBUSY;
	}

	/*
	 * Endpoints should already match the region type, but backstop that
	 * assumption with an assertion. Switch-decoders change mapping-type
	 * based on what is mapped when they are assigned to a region.
	 */
	dev_WARN_ONCE(&cxlr->dev,
		      port == cxled_to_port(cxled) &&
			      cxld->target_type != cxlr->type,
		      "%s:%s mismatch decoder type %d -> %d\n",
		      dev_name(&cxled_to_memdev(cxled)->dev),
		      dev_name(&cxld->dev), cxld->target_type, cxlr->type);
	cxld->target_type = cxlr->type;
	cxl_rr->decoder = cxld;
	return 0;
}

/**
 * cxl_port_attach_region() - track a region's interest in a port by endpoint
 * @port: port to add a new region reference 'struct cxl_region_ref'
 * @cxlr: region to attach to @port
 * @cxled: endpoint decoder used to create or further pin a region reference
 * @pos: interleave position of @cxled in @cxlr
 *
 * The attach event is an opportunity to validate CXL decode setup
 * constraints and record metadata needed for programming HDM decoders,
 * in particular decoder target lists.
 *
 * The steps are:
 *
 * - validate that there are no other regions with a higher HPA already
 *   associated with @port
 * - establish a region reference if one is not already present
 *
 *   - additionally allocate a decoder instance that will host @cxlr on
 *     @port
 *
 * - pin the region reference by the endpoint
 * - account for how many entries in @port's target list are needed to
 *   cover all of the added endpoints.
 */
static int cxl_port_attach_region(struct cxl_port *port,
				  struct cxl_region *cxlr,
				  struct cxl_endpoint_decoder *cxled, int pos)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
	struct cxl_region_ref *cxl_rr;
	bool nr_targets_inc = false;
	struct cxl_decoder *cxld;
	unsigned long index;
	int rc = -EBUSY;

	lockdep_assert_held_write(&cxl_rwsem.region);

	cxl_rr = cxl_rr_load(port, cxlr);
	if (cxl_rr) {
		struct cxl_ep *ep_iter;
		int found = 0;

		/*
		 * Walk the existing endpoints that have been attached to
		 * @cxlr at @port and see if they share the same 'next' port
		 * in the downstream direction. I.e. endpoints that share a
		 * common upstream switch.
		 */
		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
			if (ep_iter == ep)
				continue;
			if (ep_iter->next == ep->next) {
				found++;
				break;
			}
		}

		/*
		 * New target port, or @port is an endpoint port that always
		 * accounts its own local decode as a target.
		 */
		if (!found || !ep->next) {
			cxl_rr->nr_targets++;
			nr_targets_inc = true;
		}
	} else {
		struct cxl_decoder *cxld;

		cxld = cxl_port_pick_region_decoder(port, cxled, cxlr);
		if (!cxld) {
			dev_dbg(&cxlr->dev, "%s: no decoder available\n",
				dev_name(&port->dev));
			return -EBUSY;
		}

		cxl_rr = alloc_region_ref(port, cxlr, cxled, cxld);
		if (IS_ERR(cxl_rr)) {
			dev_dbg(&cxlr->dev,
				"%s: failed to allocate region reference\n",
				dev_name(&port->dev));
			return PTR_ERR(cxl_rr);
		}
		nr_targets_inc = true;

		rc = cxl_rr_assign_decoder(port, cxlr, cxled, cxl_rr, cxld);
		if (rc)
			goto out_erase;
	}
	cxld = cxl_rr->decoder;

	/*
	 * The number of targets must not exceed the target count of the
	 * decoder.
	 */
	if (is_switch_decoder(&cxld->dev)) {
		struct cxl_switch_decoder *cxlsd;

		cxlsd = to_cxl_switch_decoder(&cxld->dev);
		if (cxl_rr->nr_targets > cxlsd->nr_targets) {
			dev_dbg(&cxlr->dev,
				"%s:%s %s add: %s:%s @ %d overflows targets: %d\n",
				dev_name(port->uport_dev), dev_name(&port->dev),
				dev_name(&cxld->dev), dev_name(&cxlmd->dev),
				dev_name(&cxled->cxld.dev), pos,
				cxlsd->nr_targets);
			rc = -ENXIO;
			goto out_erase;
		}
	}

	rc = cxl_rr_ep_add(cxl_rr, cxled);
	if (rc) {
		dev_dbg(&cxlr->dev,
			"%s: failed to track endpoint %s:%s reference\n",
			dev_name(&port->dev), dev_name(&cxlmd->dev),
			dev_name(&cxld->dev));
		goto out_erase;
	}

	dev_dbg(&cxlr->dev,
		"%s:%s %s add: %s:%s @ %d next: %s nr_eps: %d nr_targets: %d\n",
		dev_name(port->uport_dev), dev_name(&port->dev),
		dev_name(&cxld->dev), dev_name(&cxlmd->dev),
		dev_name(&cxled->cxld.dev), pos,
		ep ? ep->next ? dev_name(ep->next->uport_dev) :
				dev_name(&cxlmd->dev) :
			"none",
		cxl_rr->nr_eps, cxl_rr->nr_targets);

	return 0;
out_erase:
	if (nr_targets_inc)
		cxl_rr->nr_targets--;
	if (cxl_rr->nr_eps == 0)
		free_region_ref(cxl_rr);
	return rc;
}
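
/*
 * Worked example of the target accounting above (a documentation aid,
 * not part of the driver): two endpoints reached through the same
 * downstream switch share one entry in @port's target list, so
 * attaching the second endpoint finds a matching ep->next and leaves
 * nr_targets alone. A third endpoint behind a different switch has no
 * matching ep->next and bumps nr_targets to 2.
 */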

static void cxl_port_detach_region(struct cxl_port *port,
				   struct cxl_region *cxlr,
				   struct cxl_endpoint_decoder *cxled)
{
	struct cxl_region_ref *cxl_rr;
	struct cxl_ep *ep = NULL;

	lockdep_assert_held_write(&cxl_rwsem.region);

	cxl_rr = cxl_rr_load(port, cxlr);
	if (!cxl_rr)
		return;

	/*
	 * Endpoint ports do not carry cxl_ep references, and they
	 * never target more than one endpoint by definition
	 */
	if (cxl_rr->decoder == &cxled->cxld)
		cxl_rr->nr_eps--;
	else
		ep = xa_erase(&cxl_rr->endpoints, (unsigned long)cxled);
	if (ep) {
		struct cxl_ep *ep_iter;
		unsigned long index;
		int found = 0;

		cxl_rr->nr_eps--;
		xa_for_each(&cxl_rr->endpoints, index, ep_iter) {
			if (ep_iter->next == ep->next) {
				found++;
				break;
			}
		}
		if (!found)
			cxl_rr->nr_targets--;
	}

	if (cxl_rr->nr_eps == 0)
		free_region_ref(cxl_rr);
}

static int check_last_peer(struct cxl_endpoint_decoder *cxled,
			   struct cxl_ep *ep, struct cxl_region_ref *cxl_rr,
			   int distance)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_region *cxlr = cxl_rr->region;
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_endpoint_decoder *cxled_peer;
	struct cxl_port *port = cxl_rr->port;
	struct cxl_memdev *cxlmd_peer;
	struct cxl_ep *ep_peer;
	int pos = cxled->pos;

	/*
	 * If this position wants to share a dport with the last endpoint mapped
	 * then that endpoint, at index 'position - distance', must also be
	 * mapped by this dport.
	 */
	if (pos < distance) {
		dev_dbg(&cxlr->dev, "%s:%s: cannot host %s:%s at %d\n",
			dev_name(port->uport_dev), dev_name(&port->dev),
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
		return -ENXIO;
	}
	cxled_peer = p->targets[pos - distance];
	cxlmd_peer = cxled_to_memdev(cxled_peer);
	ep_peer = cxl_ep_load(port, cxlmd_peer);
	if (ep->dport != ep_peer->dport) {
		dev_dbg(&cxlr->dev,
			"%s:%s: %s:%s pos %d mismatched peer %s:%s\n",
			dev_name(port->uport_dev), dev_name(&port->dev),
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos,
			dev_name(&cxlmd_peer->dev),
			dev_name(&cxled_peer->cxld.dev));
		return -ENXIO;
	}

	return 0;
}
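
/*
 * Worked example for check_last_peer() (a documentation aid, not part
 * of the driver): with distance = 2, i.e. each switch dport fronting
 * two endpoints, the endpoint at region position 3 must share a dport
 * with the endpoint at position 3 - 2 = 1. A position smaller than
 * distance has no earlier peer behind the same dport and fails the
 * check outright.
 */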

static int check_interleave_cap(struct cxl_decoder *cxld, int iw, int ig)
{
	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
	unsigned int interleave_mask;
	u8 eiw;
	u16 eig;
	int high_pos, low_pos;

	if (!test_bit(iw, &cxlhdm->iw_cap_mask))
		return -ENXIO;
	/*
	 * Per CXL specification r3.1(8.2.4.20.13 Decoder Protection),
	 * if eiw < 8:
	 *   DPAOFFSET[51: eig + 8] = HPAOFFSET[51: eig + 8 + eiw]
	 *   DPAOFFSET[eig + 7: 0]  = HPAOFFSET[eig + 7: 0]
	 *
	 * when the eiw is 0, all the bits of HPAOFFSET[51: 0] are used, the
	 * interleave bits are none.
	 *
	 * if eiw >= 8:
	 *   DPAOFFSET[51: eig + 8] = HPAOFFSET[51: eig + eiw] / 3
	 *   DPAOFFSET[eig + 7: 0]  = HPAOFFSET[eig + 7: 0]
	 *
	 * when the eiw is 8, all the bits of HPAOFFSET[51: 0] are used, the
	 * interleave bits are none.
	 */
	ways_to_eiw(iw, &eiw);
	if (eiw == 0 || eiw == 8)
		return 0;

	granularity_to_eig(ig, &eig);
	if (eiw > 8)
		high_pos = eiw + eig - 1;
	else
		high_pos = eiw + eig + 7;
	low_pos = eig + 8;
	interleave_mask = GENMASK(high_pos, low_pos);
	if (interleave_mask & ~cxlhdm->interleave_mask)
		return -ENXIO;

	return 0;
}
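
/*
 * Worked example for check_interleave_cap() (a documentation aid, not
 * part of the driver): a x4 region (eiw = 2) at 512 byte granularity
 * (eig = 1) uses HPA bits [eig + eiw + 7 : eig + 8] = [10:9] for
 * interleave, so GENMASK(10, 9) must be covered by the port's
 * interleave_mask. A x6 region (eiw = 9) at 256 byte granularity
 * (eig = 0) uses bits [eig + eiw - 1 : eig + 8] = [8:8].
 */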
|
|
|
|
|
2022-06-06 15:18:31 -07:00
|
|
|
static int cxl_port_setup_targets(struct cxl_port *port,
|
|
|
|
struct cxl_region *cxlr,
|
|
|
|
struct cxl_endpoint_decoder *cxled)
|
|
|
|
{
|
|
|
|
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
|
|
|
|
int parent_iw, parent_ig, ig, iw, rc, inc = 0, pos = cxled->pos;
|
|
|
|
struct cxl_port *parent_port = to_cxl_port(port->dev.parent);
|
|
|
|
struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
|
|
|
|
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
|
|
|
|
struct cxl_ep *ep = cxl_ep_load(port, cxlmd);
|
|
|
|
struct cxl_region_params *p = &cxlr->params;
|
|
|
|
struct cxl_decoder *cxld = cxl_rr->decoder;
|
|
|
|
struct cxl_switch_decoder *cxlsd;
|
cxl/region: Fix region creation for greater than x2 switches
The cxl_port_setup_targets() algorithm fails to identify valid target list
ordering in the presence of 4-way and above switches resulting in
'cxl create-region' failures of the form:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
cxl region: create_region: region0: failed to set target7 to mem0
cxl region: cmd_create_region: created 0 regions
[kernel debug message]
check_last_peer:1213: cxl region0: pci0000:0c:port1: cannot host mem6:decoder7.0 at 2
bus_remove_device:574: bus: 'cxl': remove device region0
QEMU can create this failing topology:
ACPI0017:00 [root0]
|
HB_0 [port1]
/ \
RP_0 RP_1
| |
USP [port2] USP [port3]
/ / \ \ / / \ \
DSP DSP DSP DSP DSP DSP DSP DSP
| | | | | | | |
mem4 mem6 mem2 mem7 mem1 mem3 mem5 mem0
Pos: 0 2 4 6 1 3 5 7
HB: Host Bridge
RP: Root Port
USP: Upstream Port
DSP: Downstream Port
...with the following command steps:
$ qemu-system-x86_64 -machine q35,cxl=on,accel=tcg \
-smp cpus=8 \
-m 8G \
-hda /home/work/vm-images/centos-stream8-02.qcow2 \
-object memory-backend-ram,size=4G,id=m0 \
-object memory-backend-ram,size=4G,id=m1 \
-object memory-backend-ram,size=2G,id=cxl-mem0 \
-object memory-backend-ram,size=2G,id=cxl-mem1 \
-object memory-backend-ram,size=2G,id=cxl-mem2 \
-object memory-backend-ram,size=2G,id=cxl-mem3 \
-object memory-backend-ram,size=2G,id=cxl-mem4 \
-object memory-backend-ram,size=2G,id=cxl-mem5 \
-object memory-backend-ram,size=2G,id=cxl-mem6 \
-object memory-backend-ram,size=2G,id=cxl-mem7 \
-numa node,memdev=m0,cpus=0-3,nodeid=0 \
-numa node,memdev=m1,cpus=4-7,nodeid=1 \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,volatile-memdev=cxl-mem3,id=cxl-vmem3 \
-device cxl-upstream,bus=root_port1,id=us1 \
-device cxl-downstream,port=4,bus=us1,id=swport4,chassis=0,slot=8 \
-device cxl-type3,bus=swport4,volatile-memdev=cxl-mem4,id=cxl-vmem4 \
-device cxl-downstream,port=5,bus=us1,id=swport5,chassis=0,slot=9 \
-device cxl-type3,bus=swport5,volatile-memdev=cxl-mem5,id=cxl-vmem5 \
-device cxl-downstream,port=6,bus=us1,id=swport6,chassis=0,slot=10 \
-device cxl-type3,bus=swport6,volatile-memdev=cxl-mem6,id=cxl-vmem6 \
-device cxl-downstream,port=7,bus=us1,id=swport7,chassis=0,slot=11 \
-device cxl-type3,bus=swport7,volatile-memdev=cxl-mem7,id=cxl-vmem7 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=32G &
In Guest OS:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
Fix the method to calculate @distance by iterativeley multiplying the
number of targets per switch port. This also follows the algorithm
recommended here [1].
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Link: http://lore.kernel.org/6538824b52349_7258329466@dwillia2-xfh.jf.intel.com.notmuch [1]
Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
[djbw: add a comment explaining 'distance']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/173378716722.1270362.9546805175813426729.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-12-09 15:33:02 -08:00
|
|
|
struct cxl_port *iter = port;
|
2022-06-06 15:18:31 -07:00
|
|
|
u16 eig, peig;
|
|
|
|
u8 eiw, peiw;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* While root level decoders support x3, x6, x12, switch level
|
|
|
|
* decoders only support powers of 2 up to x16.
|
|
|
|
*/
|
|
|
|
if (!is_power_of_2(cxl_rr->nr_targets)) {
|
|
|
|
dev_dbg(&cxlr->dev, "%s:%s: invalid target count %d\n",
|
2023-06-22 15:55:01 -05:00
|
|
|
dev_name(port->uport_dev), dev_name(&port->dev),
|
2022-06-06 15:18:31 -07:00
|
|
|
cxl_rr->nr_targets);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
cxlsd = to_cxl_switch_decoder(&cxld->dev);
|
|
|
|
if (cxl_rr->nr_targets_set) {
|
cxl/region: Fix region creation for greater than x2 switches
The cxl_port_setup_targets() algorithm fails to identify valid target list
ordering in the presence of 4-way and above switches resulting in
'cxl create-region' failures of the form:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
cxl region: create_region: region0: failed to set target7 to mem0
cxl region: cmd_create_region: created 0 regions
[kernel debug message]
check_last_peer:1213: cxl region0: pci0000:0c:port1: cannot host mem6:decoder7.0 at 2
bus_remove_device:574: bus: 'cxl': remove device region0
QEMU can create this failing topology:
ACPI0017:00 [root0]
|
HB_0 [port1]
/ \
RP_0 RP_1
| |
USP [port2] USP [port3]
/ / \ \ / / \ \
DSP DSP DSP DSP DSP DSP DSP DSP
| | | | | | | |
mem4 mem6 mem2 mem7 mem1 mem3 mem5 mem0
Pos: 0 2 4 6 1 3 5 7
HB: Host Bridge
RP: Root Port
USP: Upstream Port
DSP: Downstream Port
...with the following command steps:
$ qemu-system-x86_64 -machine q35,cxl=on,accel=tcg \
-smp cpus=8 \
-m 8G \
-hda /home/work/vm-images/centos-stream8-02.qcow2 \
-object memory-backend-ram,size=4G,id=m0 \
-object memory-backend-ram,size=4G,id=m1 \
-object memory-backend-ram,size=2G,id=cxl-mem0 \
-object memory-backend-ram,size=2G,id=cxl-mem1 \
-object memory-backend-ram,size=2G,id=cxl-mem2 \
-object memory-backend-ram,size=2G,id=cxl-mem3 \
-object memory-backend-ram,size=2G,id=cxl-mem4 \
-object memory-backend-ram,size=2G,id=cxl-mem5 \
-object memory-backend-ram,size=2G,id=cxl-mem6 \
-object memory-backend-ram,size=2G,id=cxl-mem7 \
-numa node,memdev=m0,cpus=0-3,nodeid=0 \
-numa node,memdev=m1,cpus=4-7,nodeid=1 \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,volatile-memdev=cxl-mem3,id=cxl-vmem3 \
-device cxl-upstream,bus=root_port1,id=us1 \
-device cxl-downstream,port=4,bus=us1,id=swport4,chassis=0,slot=8 \
-device cxl-type3,bus=swport4,volatile-memdev=cxl-mem4,id=cxl-vmem4 \
-device cxl-downstream,port=5,bus=us1,id=swport5,chassis=0,slot=9 \
-device cxl-type3,bus=swport5,volatile-memdev=cxl-mem5,id=cxl-vmem5 \
-device cxl-downstream,port=6,bus=us1,id=swport6,chassis=0,slot=10 \
-device cxl-type3,bus=swport6,volatile-memdev=cxl-mem6,id=cxl-vmem6 \
-device cxl-downstream,port=7,bus=us1,id=swport7,chassis=0,slot=11 \
-device cxl-type3,bus=swport7,volatile-memdev=cxl-mem7,id=cxl-vmem7 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=32G &
In Guest OS:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
Fix the method to calculate @distance by iterativeley multiplying the
number of targets per switch port. This also follows the algorithm
recommended here [1].
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Link: http://lore.kernel.org/6538824b52349_7258329466@dwillia2-xfh.jf.intel.com.notmuch [1]
Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
[djbw: add a comment explaining 'distance']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/173378716722.1270362.9546805175813426729.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-12-09 15:33:02 -08:00
		int i, distance = 1;
		struct cxl_region_ref *cxl_rr_iter;

		/*
		 * The "distance" between peer downstream ports represents which
		 * endpoint positions in the region interleave a given port can
		 * host.
		 *
		 * For example, at the root of a hierarchy the distance is
		 * always 1 as every index targets a different host-bridge. At
		 * each subsequent switch level those ports map every Nth region
		 * position where N is the width of the switch == distance.
		 */
		do {
			cxl_rr_iter = cxl_rr_load(iter, cxlr);
			distance *= cxl_rr_iter->nr_targets;
			iter = to_cxl_port(iter->dev.parent);
		} while (!is_cxl_root(iter));
		distance *= cxlrd->cxlsd.cxld.interleave_ways;
		for (i = 0; i < cxl_rr->nr_targets_set; i++)
			if (ep->dport == cxlsd->target[i]) {
				rc = check_last_peer(cxled, ep, cxl_rr,
						     distance);
				if (rc)
					return rc;
				goto out_target_set;
			}
		goto add_target;
	}

	if (is_cxl_root(parent_port)) {
cxl/region: Fix x1 root-decoder granularity calculations
Root decoder granularity must match the value from the CFMWS, which may
not be the region's granularity for non-interleaved root decoders.
So when calculating granularities for host bridge decoders, use the
region's granularity instead of the root decoder's granularity to ensure
the correct granularities are set for the host bridge decoders and any
downstream switch decoders.
Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch.
Region created with 2048 granularity using following command line:
cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \
-g 2048 -s 2048M
Use "cxl list -PDE | grep granularity" to get a view of the granularity
set at each level of the topology.
Before this patch:
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":512,
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":512,
"interleave_granularity":256,
After:
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":4096,
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":4096,
"interleave_granularity":2048,
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in
[djbw: fixup the prebuilt cxl_test region]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-26 10:09:06 -07:00
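A short stand-alone C sketch of the granularity math this fix restores (illustrative only; the variable names are hypothetical and the numbers come from the example command line above). A decoder's interleave granularity (IG) is the parent IG times the parent interleave ways (IW), and with a x1 root the host bridge takes the region's IG as its parent IG:

#include <stdio.h>

int main(void)
{
	int region_ig = 2048;	/* -g 2048 from the example */
	int hb_iw = 2;		/* 2 switches below the host bridge */

	/* host bridge decoder: parent is the x1 root, so use region IG */
	int hb_ig = region_ig;	/* parent_ig(region) * parent_iw(1) */

	/* switch decoders: parent is the x2 host bridge decoder */
	int sw_ig = hb_ig * hb_iw;

	printf("host bridge IG: %d, switch IG: %d\n", hb_ig, sw_ig);
	/* prints 2048 and 4096, matching the "After" listing above */
	return 0;
}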
		/*
		 * Root decoder IG is always set to value in CFMWS which
		 * may be different than this region's IG. We can use the
		 * region's IG here since interleave_granularity_store()
		 * does not allow interleaved host-bridges with
		 * root IG != region IG.
		 */
		parent_ig = p->interleave_granularity;
		parent_iw = cxlrd->cxlsd.cxld.interleave_ways;
		/*
		 * For purposes of address bit routing, use power-of-2 math for
		 * switch ports.
		 */
		if (!is_power_of_2(parent_iw))
			parent_iw /= 3;
	} else {
		struct cxl_region_ref *parent_rr;
		struct cxl_decoder *parent_cxld;

		parent_rr = cxl_rr_load(parent_port, cxlr);
		parent_cxld = parent_rr->decoder;
		parent_ig = parent_cxld->interleave_granularity;
		parent_iw = parent_cxld->interleave_ways;
	}

	rc = granularity_to_eig(parent_ig, &peig);
	if (rc) {
		dev_dbg(&cxlr->dev, "%s:%s: invalid parent granularity: %d\n",
			dev_name(parent_port->uport_dev),
			dev_name(&parent_port->dev), parent_ig);
		return rc;
	}

	rc = ways_to_eiw(parent_iw, &peiw);
	if (rc) {
		dev_dbg(&cxlr->dev, "%s:%s: invalid parent interleave: %d\n",
			dev_name(parent_port->uport_dev),
			dev_name(&parent_port->dev), parent_iw);
		return rc;
	}

	iw = cxl_rr->nr_targets;
	rc = ways_to_eiw(iw, &eiw);
	if (rc) {
		dev_dbg(&cxlr->dev, "%s:%s: invalid port interleave: %d\n",
			dev_name(port->uport_dev), dev_name(&port->dev), iw);
		return rc;
	}

	/*
	 * Interleave granularity is a multiple of @parent_port granularity.
	 * Multiplier is the parent port interleave ways.
	 */
	rc = granularity_to_eig(parent_ig * parent_iw, &eig);
	if (rc) {
		dev_dbg(&cxlr->dev,
			"%s: invalid granularity calculation (%d * %d)\n",
			dev_name(&parent_port->dev), parent_ig, parent_iw);
		return rc;
	}

	rc = eig_to_granularity(eig, &ig);
	if (rc) {
		dev_dbg(&cxlr->dev, "%s:%s: invalid interleave: %d\n",
			dev_name(port->uport_dev), dev_name(&port->dev),
			256 << eig);
		return rc;
	}

	if (iw > 8 || iw > cxlsd->nr_targets) {
		dev_dbg(&cxlr->dev,
			"%s:%s:%s: ways: %d overflows targets: %d\n",
			dev_name(port->uport_dev), dev_name(&port->dev),
			dev_name(&cxld->dev), iw, cxlsd->nr_targets);
		return -ENXIO;
	}

	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
		if (cxld->interleave_ways != iw ||
		    (iw > 1 && cxld->interleave_granularity != ig) ||
		    !region_res_match_cxl_range(p, &cxld->hpa_range) ||
		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
			dev_err(&cxlr->dev,
				"%s:%s %s expected iw: %d ig: %d %pr\n",
				dev_name(port->uport_dev), dev_name(&port->dev),
				__func__, iw, ig, p->res);
			dev_err(&cxlr->dev,
				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
				dev_name(port->uport_dev), dev_name(&port->dev),
				__func__, cxld->interleave_ways,
				cxld->interleave_granularity,
				(cxld->flags & CXL_DECODER_F_ENABLE) ?
					"enabled" :
					"disabled",
				cxld->hpa_range.start, cxld->hpa_range.end);
			return -ENXIO;
		}
	} else {
cxl/region: check interleave capability
Since the interleave capability is not verified, if the interleave
capability of a target does not match the region's needs, committing the
decoder would fail at the device end.
In order to catch this error as early as possible, the driver needs
to check the interleave capability of a target while attaching it to a
region.
Per CXL specification r3.1 (8.2.4.20.1 CXL HDM Decoder Capability Register),
bits 11 and 12 indicate the capability to establish interleaving in 3, 6,
12 and 16 ways. If these bits are not set, the target cannot be attached to
a region utilizing such interleave ways.
Additionally, bits 8 and 9 represent the capability of the bits used for
interleaving in the address; Linux tracks this in the cxl_port
interleave_mask.
Per CXL specification r3.1 (8.2.4.20.13 Decoder Protection):
eIW means encoded Interleave Ways.
eIG means encoded Interleave Granularity.
in HPA:
if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used,
the interleave bits are none, the following check is ignored.
if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits
start at bit position eIG + 8 and end at eIG + eIW + 8 - 1.
if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits
start at bit position eIG + 8 and end at eIG + eIW - 1.
if the interleave mask is insufficient to cover the required interleave
bits, the target cannot be attached to the region.
Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-14 04:47:54 -04:00
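The eIW/eIG rule quoted above can be restated as a small stand-alone C sketch (illustrative only; this is not the kernel's check_interleave_cap(), and interleave_bits() plus the example port mask are hypothetical). It computes the HPA bits that route the interleave and tests them against a port's interleave mask:

#include <stdio.h>
#include <stdint.h>

/*
 * Mask of HPA interleave bits for an encoded IW/IG per the rule above,
 * or 0 when eIW encodes 1 or 3 ways (no interleave bits). Assumes the
 * high bit index stays below 64.
 */
static uint64_t interleave_bits(unsigned int eiw, unsigned int eig)
{
	unsigned int hi, lo = eig + 8;

	if (eiw == 0 || eiw == 8)	/* 1-way or 3-way */
		return 0;
	if (eiw < 8)			/* 2, 4, 8, 16 ways */
		hi = eig + eiw + 8 - 1;
	else				/* 6, 12 ways */
		hi = eig + eiw - 1;

	/* contiguous mask covering bits [lo, hi] */
	return (~0ULL >> (63 - hi)) & ~((1ULL << lo) - 1);
}

int main(void)
{
	/* e.g. 4-way (eIW=2) at 256B granularity (eIG=0): HPA bits 8-9 */
	uint64_t need = interleave_bits(2, 0);
	uint64_t port_mask = 0x3ff;	/* hypothetical port interleave_mask */

	printf("need %#llx: %s\n", (unsigned long long)need,
	       (need & ~port_mask) ? "unsupported" : "supported");
	return 0;
}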
		rc = check_interleave_cap(cxld, iw, ig);
		if (rc) {
			dev_dbg(&cxlr->dev,
				"%s:%s iw: %d ig: %d is not supported\n",
				dev_name(port->uport_dev),
				dev_name(&port->dev), iw, ig);
			return rc;
		}

		cxld->interleave_ways = iw;
		cxld->interleave_granularity = ig;
		cxld->hpa_range = (struct range) {
			.start = p->res->start,
			.end = p->res->end,
		};
	}
	dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport_dev),
		dev_name(&port->dev), iw, ig);
add_target:
	if (cxl_rr->nr_targets_set == cxl_rr->nr_targets) {
		dev_dbg(&cxlr->dev,
			"%s:%s: targets full trying to add %s:%s at %d\n",
			dev_name(port->uport_dev), dev_name(&port->dev),
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
		return -ENXIO;
	}
	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
		if (cxlsd->target[cxl_rr->nr_targets_set] != ep->dport) {
			dev_dbg(&cxlr->dev, "%s:%s: %s expected %s at %d\n",
				dev_name(port->uport_dev), dev_name(&port->dev),
				dev_name(&cxlsd->cxld.dev),
				dev_name(ep->dport->dport_dev),
				cxl_rr->nr_targets_set);
			return -ENXIO;
		}
	} else
		cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
	inc = 1;
out_target_set:
	cxl_rr->nr_targets_set += inc;
	dev_dbg(&cxlr->dev, "%s:%s target[%d] = %s for %s:%s @ %d\n",
		dev_name(port->uport_dev), dev_name(&port->dev),
		cxl_rr->nr_targets_set - 1, dev_name(ep->dport->dport_dev),
		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);

	return 0;
}
static void cxl_port_reset_targets(struct cxl_port *port,
				   struct cxl_region *cxlr)
{
	struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr);
	struct cxl_decoder *cxld;

	/*
	 * After the last endpoint has been detached the entire cxl_rr may now
	 * be gone.
	 */
	if (!cxl_rr)
		return;
	cxl_rr->nr_targets_set = 0;

	cxld = cxl_rr->decoder;
	cxld->hpa_range = (struct range) {
		.start = 0,
		.end = -1,
	};
}

static void cxl_region_teardown_targets(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_endpoint_decoder *cxled;
	struct cxl_dev_state *cxlds;
	struct cxl_memdev *cxlmd;
	struct cxl_port *iter;
	struct cxl_ep *ep;
	int i;

	/*
	 * In the auto-discovery case skip automatic teardown since the
	 * address space is already active
	 */
	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
		return;

	for (i = 0; i < p->nr_targets; i++) {
		cxled = p->targets[i];
		cxlmd = cxled_to_memdev(cxled);
		cxlds = cxlmd->cxlds;

		if (cxlds->rcd)
			continue;

		iter = cxled_to_port(cxled);
		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
			iter = to_cxl_port(iter->dev.parent);

		for (ep = cxl_ep_load(iter, cxlmd); iter;
		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd))
			cxl_port_reset_targets(iter, cxlr);
	}
}

static int cxl_region_setup_targets(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_endpoint_decoder *cxled;
	struct cxl_dev_state *cxlds;
	int i, rc, rch = 0, vh = 0;
	struct cxl_memdev *cxlmd;
	struct cxl_port *iter;
	struct cxl_ep *ep;

	for (i = 0; i < p->nr_targets; i++) {
		cxled = p->targets[i];
		cxlmd = cxled_to_memdev(cxled);
		cxlds = cxlmd->cxlds;

		/* validate that all targets agree on topology */
		if (!cxlds->rcd) {
			vh++;
		} else {
			rch++;
			continue;
		}

		iter = cxled_to_port(cxled);
		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
			iter = to_cxl_port(iter->dev.parent);

		/*
		 * Descend the topology tree programming / validating
		 * targets while looking for conflicts.
		 */
		for (ep = cxl_ep_load(iter, cxlmd); iter;
		     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
			rc = cxl_port_setup_targets(iter, cxlr, cxled);
			if (rc) {
				cxl_region_teardown_targets(cxlr);
				return rc;
			}
		}
	}

	if (rch && vh) {
		dev_err(&cxlr->dev, "mismatched CXL topologies detected\n");
		cxl_region_teardown_targets(cxlr);
		return -ENXIO;
	}

	return 0;
}
static int cxl_region_validate_position(struct cxl_region *cxlr,
					struct cxl_endpoint_decoder *cxled,
					int pos)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_region_params *p = &cxlr->params;
	int i;

	if (pos < 0 || pos >= p->interleave_ways) {
		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
			p->interleave_ways);
		return -ENXIO;
	}

	if (p->targets[pos] == cxled)
		return 0;

	if (p->targets[pos]) {
		struct cxl_endpoint_decoder *cxled_target = p->targets[pos];
		struct cxl_memdev *cxlmd_target = cxled_to_memdev(cxled_target);

		dev_dbg(&cxlr->dev, "position %d already assigned to %s:%s\n",
			pos, dev_name(&cxlmd_target->dev),
			dev_name(&cxled_target->cxld.dev));
		return -EBUSY;
	}

	for (i = 0; i < p->interleave_ways; i++) {
		struct cxl_endpoint_decoder *cxled_target;
		struct cxl_memdev *cxlmd_target;

		cxled_target = p->targets[i];
		if (!cxled_target)
			continue;

		cxlmd_target = cxled_to_memdev(cxled_target);
		if (cxlmd_target == cxlmd) {
			dev_dbg(&cxlr->dev,
				"%s already specified at position %d via: %s\n",
				dev_name(&cxlmd->dev), pos,
				dev_name(&cxled_target->cxld.dev));
			return -EBUSY;
		}
	}

	return 0;
}

static int cxl_region_attach_position(struct cxl_region *cxlr,
				      struct cxl_root_decoder *cxlrd,
				      struct cxl_endpoint_decoder *cxled,
				      const struct cxl_dport *dport, int pos)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
	struct cxl_decoder *cxld = &cxlsd->cxld;
	int iw = cxld->interleave_ways;
	struct cxl_port *iter;
	int rc;

	if (dport != cxlrd->cxlsd.target[pos % iw]) {
		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			dev_name(&cxlrd->cxlsd.cxld.dev));
		return -ENXIO;
	}

	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
	     iter = to_cxl_port(iter->dev.parent)) {
		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
		if (rc)
			goto err;
	}

	return 0;

err:
	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
	     iter = to_cxl_port(iter->dev.parent))
		cxl_port_detach_region(iter, cxlr, cxled);
	return rc;
}

static int cxl_region_attach_auto(struct cxl_region *cxlr,
				  struct cxl_endpoint_decoder *cxled, int pos)
{
	struct cxl_region_params *p = &cxlr->params;

	if (cxled->state != CXL_DECODER_STATE_AUTO) {
		dev_err(&cxlr->dev,
			"%s: unable to add decoder to autodetected region\n",
			dev_name(&cxled->cxld.dev));
		return -EINVAL;
	}

	if (pos >= 0) {
		dev_dbg(&cxlr->dev, "%s: expected auto position, not %d\n",
			dev_name(&cxled->cxld.dev), pos);
		return -EINVAL;
	}

	if (p->nr_targets >= p->interleave_ways) {
		dev_err(&cxlr->dev, "%s: no more target slots available\n",
			dev_name(&cxled->cxld.dev));
		return -ENXIO;
	}

	/*
	 * Temporarily record the endpoint decoder into the target array. Yes,
	 * this means that userspace can view devices in the wrong position
	 * before the region activates, and must be careful to understand when
	 * it might be racing region autodiscovery.
	 */
	pos = p->nr_targets;
	p->targets[pos] = cxled;
	cxled->pos = pos;
	p->nr_targets++;

	return 0;
}
cxl/region: Use cxl_calc_interleave_pos() for auto-discovery
For auto-discovered regions the driver must assign each target to
a valid position in the region interleave set based on the decoder
topology.
The current implementation fails to parse valid decode topologies,
as it does not consider the child offset into a parent port. The sort
put all targets of one port ahead of another port when an interleave
was expected, causing the region assembly to fail.
Replace the existing relative sort with cxl_calc_interleave_pos() that
finds the exact position in a region interleave for an endpoint based
on a walk up the ancestral tree from endpoint to root decoder.
cxl_calc_interleave_pos() was introduced in a prior patch, so the work
here is to use it in cxl_region_sort_targets().
Remove the obsoleted helper functions from the prior sort.
Testing passes on pre-production hardware with BIOS defined regions
that natively trigger this autodiscovery path of the region driver.
Testing passes a CXL unit test using the dev_dbg() calculation test
(see cxl_region_attach()) across an expanded set of region configs:
1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number
represents the count of endpoints per host bridge.
Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-25 13:01:34 -07:00
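The position refinement described above can be exercised with a minimal stand-alone C sketch (illustrative only; calc_pos and the per-level arrays are hypothetical stand-ins for the find_pos_and_ways() walk). It reproduces the mem1 trace from the 4-way example documented in cxl_calc_interleave_pos() below:

#include <stdio.h>

/*
 * Refine the position from endpoint to root with
 * pos = pos * parent_ways + parent_pos, one step per topology level.
 */
static int calc_pos(const int *parent_pos, const int *parent_ways, int levels)
{
	int pos = 0;

	for (int i = 0; i < levels; i++)
		pos = pos * parent_ways[i] + parent_pos[i];
	return pos;
}

int main(void)
{
	/*
	 * mem1: slot 1 in its x2 host bridge, and that host bridge is
	 * slot 0 in the x2 root -> region position 2.
	 */
	int pos_up[] = { 1, 0 };
	int ways_up[] = { 2, 2 };

	printf("pos: %d\n", calc_pos(pos_up, ways_up, 2));	/* prints 2 */
	return 0;
}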
static int cmp_interleave_pos(const void *a, const void *b)
{
	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;

	return cxled_a->pos - cxled_b->pos;
}
driver core: Constify API device_find_child() and adapt for various usages
Constify the following API:
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
To :
struct device *device_find_child(struct device *dev, const void *data,
device_match_t match);
typedef int (*device_match_t)(struct device *dev, const void *data);
for the following reasons:
- Protect the caller's match data @*data, which is only used for comparison
and lookup; the API does not actually need to modify @*data.
- Make the API's parameters (@match)() and @data have the same types as
all of the other device-finding APIs (bus|class|driver)_find_device().
- All kinds of existing device match functions, already exported by the
driver core, can be passed directly as the API's argument.
Constify the API and adapt the various existing usages.
Various subsystem changes are squashed into this commit to keep
'git bisect' working; the squash is limited to the minimal and simplest
changes, which may also bring extra code improvement.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24 21:05:03 +08:00
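A brief kernel-context sketch of the constified calling convention (illustrative only; my_match and find_by_range are hypothetical, while device_find_child() and device_match_t are the interfaces described above):

#include <linux/device.h>
#include <linux/range.h>

/* sketch only: a match callback with the constified device_match_t type */
static int my_match(struct device *dev, const void *data)
{
	const struct range *r = data;	/* read-only match data */

	/* ... compare dev against *r without modifying it ... */
	return 0;
}

/* caller: const match data can now be passed without casting constness away */
static struct device *find_by_range(struct device *parent,
				    const struct range *range)
{
	return device_find_child(parent, range, my_match);
}

This is exactly the shape of match_switch_decoder_by_range() and its caller below.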
static int match_switch_decoder_by_range(struct device *dev,
					 const void *data)
{
	struct cxl_switch_decoder *cxlsd;
	const struct range *r1, *r2 = data;

	if (!is_switch_decoder(dev))
		return 0;

	cxlsd = to_cxl_switch_decoder(dev);
	r1 = &cxlsd->cxld.hpa_range;

	if (is_root_decoder(dev))
		return range_contains(r1, r2);
	return (r1->start == r2->start && r1->end == r2->end);
}
static int find_pos_and_ways(struct cxl_port *port, struct range *range,
			     int *pos, int *ways)
{
	struct cxl_switch_decoder *cxlsd;
	struct cxl_port *parent;
	struct device *dev;
	int rc = -ENXIO;

	parent = parent_port_of(port);
	if (!parent)
		return rc;

	dev = device_find_child(&parent->dev, range,
				match_switch_decoder_by_range);
	if (!dev) {
		dev_err(port->uport_dev,
			"failed to find decoder mapping %#llx-%#llx\n",
			range->start, range->end);
		return rc;
	}
	cxlsd = to_cxl_switch_decoder(dev);
	*ways = cxlsd->cxld.interleave_ways;

	for (int i = 0; i < *ways; i++) {
		if (cxlsd->target[i] == port->parent_dport) {
			*pos = i;
			rc = 0;
			break;
		}
	}
	put_device(dev);

	if (rc)
		dev_err(port->uport_dev,
			"failed to find %s:%s in target list of %s\n",
			dev_name(&port->dev),
			dev_name(port->parent_dport->dport_dev),
			dev_name(&cxlsd->cxld.dev));

	return rc;
}

/**
 * cxl_calc_interleave_pos() - calculate an endpoint position in a region
 * @cxled: endpoint decoder member of given region
 *
 * The endpoint position is calculated by traversing the topology from
 * the endpoint to the root decoder and iteratively applying this
 * calculation:
 *
 *	position = position * parent_ways + parent_pos;
 *
 * ...where @position is inferred from switch and root decoder target lists.
 *
 * Return: position >= 0 on success
 *	   -ENXIO on failure
 */
static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
{
	struct cxl_port *iter, *port = cxled_to_port(cxled);
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct range *range = &cxled->cxld.hpa_range;
	int parent_ways = 0, parent_pos = 0, pos = 0;
	int rc;

	/*
	 * Example: the expected interleave order of the 4-way region shown
	 * below is: mem0, mem2, mem1, mem3
	 *
	 *		  root_port
	 *		  /        \
	 *	host_bridge_0    host_bridge_1
	 *	  |       |        |       |
	 *	 mem0    mem1     mem2    mem3
	 *
	 * In the example the calculator will iterate twice. The first
	 * iteration uses the mem position in the host-bridge and the ways
	 * of the host-bridge to generate the first, or local, position.
	 * The second iteration uses the host-bridge position in the
	 * root_port and the ways of the root_port to refine the position.
	 *
	 * A trace of the calculation per endpoint looks like this:
	 * mem0: pos = 0 * 2 + 0    mem2: pos = 0 * 2 + 0
	 *       pos = 0 * 2 + 0          pos = 0 * 2 + 1
	 *       pos: 0                   pos: 1
	 *
	 * mem1: pos = 0 * 2 + 1    mem3: pos = 0 * 2 + 1
	 *       pos = 1 * 2 + 0          pos = 1 * 2 + 1
	 *       pos: 2                   pos: 3
	 *
	 * Note that while this example is simple, the method applies to more
	 * complex topologies, including those with switches.
	 */

	/* Iterate from endpoint to root_port refining the position */
	for (iter = port; iter; iter = parent_port_of(iter)) {
		if (is_cxl_root(iter))
			break;

		rc = find_pos_and_ways(iter, range, &parent_pos, &parent_ways);
		if (rc)
			return rc;

		pos = pos * parent_ways + parent_pos;
	}

	dev_dbg(&cxlmd->dev,
		"decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n",
		dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent),
		dev_name(&port->dev), range->start, range->end, pos);

	return pos;
}
static int cxl_region_sort_targets(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	int i, rc = 0;

	for (i = 0; i < p->nr_targets; i++) {
		struct cxl_endpoint_decoder *cxled = p->targets[i];
		cxled->pos = cxl_calc_interleave_pos(cxled);
		/*
		 * Record that sorting failed, but still continue to calc
		 * cxled->pos so that follow-on code paths can reliably
		 * do p->targets[cxled->pos] to self-reference their entry.
		 */
		if (cxled->pos < 0)
			rc = -ENXIO;
	}
	/* Keep the cxlr target list in interleave position order */
	sort(p->targets, p->nr_targets, sizeof(p->targets[0]),
	     cmp_interleave_pos, NULL);

	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
	return rc;
}
static int cxl_region_attach(struct cxl_region *cxlr,
			     struct cxl_endpoint_decoder *cxled, int pos)
{
	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_dev_state *cxlds = cxlmd->cxlds;
	struct cxl_region_params *p = &cxlr->params;
	struct cxl_port *ep_port, *root_port;
	struct cxl_dport *dport;
	int rc = -ENXIO;
	rc = check_interleave_cap(&cxled->cxld, p->interleave_ways,
				  p->interleave_granularity);
	if (rc) {
		dev_dbg(&cxlr->dev, "%s iw: %d ig: %d is not supported\n",
			dev_name(&cxled->cxld.dev), p->interleave_ways,
			p->interleave_granularity);
		return rc;
	}

	if (cxled->part < 0) {
		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
		return -ENODEV;
	}

	if (cxlds->part[cxled->part].mode != cxlr->mode) {
		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch\n",
			dev_name(&cxled->cxld.dev), cxlr->mode);
		return -EINVAL;
	}

	/* all full of members, or interleave config not established? */
	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
		dev_dbg(&cxlr->dev, "region already active\n");
		return -EBUSY;
	}

	if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
		dev_dbg(&cxlr->dev, "interleave config missing\n");
		return -ENXIO;
	}

	if (p->nr_targets >= p->interleave_ways) {
		dev_dbg(&cxlr->dev, "region already has %d endpoints\n",
			p->nr_targets);
		return -EINVAL;
	}

	ep_port = cxled_to_port(cxled);
	root_port = cxlrd_to_port(cxlrd);
	dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
	if (!dport) {
		dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			dev_name(cxlr->dev.parent));
		return -ENXIO;
	}

	if (cxled->cxld.target_type != cxlr->type) {
		dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			cxled->cxld.target_type, cxlr->type);
		return -ENXIO;
	}

	if (!cxled->dpa_res) {
		dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
		return -ENXIO;
	}

	if (resource_size(cxled->dpa_res) * p->interleave_ways + p->cache_size !=
	    resource_size(p->res)) {
		dev_dbg(&cxlr->dev,
			"%s:%s-size-%#llx * ways-%d + cache-%#llx != region-size-%#llx\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			(u64)resource_size(cxled->dpa_res), p->interleave_ways,
			(u64)p->cache_size, (u64)resource_size(p->res));
		return -EINVAL;
	}

	cxl_region_perf_data_calculate(cxlr, cxled);

	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
		int i;

		rc = cxl_region_attach_auto(cxlr, cxled, pos);
		if (rc)
			return rc;

		/* await more targets to arrive... */
		if (p->nr_targets < p->interleave_ways)
			return 0;

		/*
		 * All targets are here, which implies all PCI enumeration that
		 * affects this region has been completed. Walk the topology to
		 * sort the devices into their relative region decode position.
		 */
		rc = cxl_region_sort_targets(cxlr);
		if (rc)
			return rc;

		for (i = 0; i < p->nr_targets; i++) {
			cxled = p->targets[i];
			ep_port = cxled_to_port(cxled);
			dport = cxl_find_dport_by_dev(root_port,
						      ep_port->host_bridge);
			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
							dport, i);
			if (rc)
				return rc;
		}

		rc = cxl_region_setup_targets(cxlr);
		if (rc)
			return rc;

		/*
		 * If target setup succeeds in the autodiscovery case
		 * then the region is already committed.
		 */
		p->state = CXL_CONFIG_COMMIT;
		cxl_region_shared_upstream_bandwidth_update(cxlr);

		return 0;
	}

	rc = cxl_region_validate_position(cxlr, cxled, pos);
	if (rc)
		return rc;

	rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
	if (rc)
		return rc;

	p->targets[pos] = cxled;
	cxled->pos = pos;
	p->nr_targets++;

	if (p->nr_targets == p->interleave_ways) {
		rc = cxl_region_setup_targets(cxlr);
		if (rc)
			return rc;
		p->state = CXL_CONFIG_ACTIVE;
		cxl_region_shared_upstream_bandwidth_update(cxlr);
	}

	cxled->cxld.interleave_ways = p->interleave_ways;
	cxled->cxld.interleave_granularity = p->interleave_granularity;
	cxled->cxld.hpa_range = (struct range) {
		.start = p->res->start,
		.end = p->res->end,
	};

	if (p->nr_targets != p->interleave_ways)
		return 0;

	/*
	 * Test the auto-discovery position calculator function
	 * against this successfully created user-defined region.
	 * A fail message here means that this interleave config
	 * will fail when presented as CXL_REGION_F_AUTO.
	 */
	for (int i = 0; i < p->nr_targets; i++) {
		struct cxl_endpoint_decoder *cxled = p->targets[i];
		int test_pos;

		test_pos = cxl_calc_interleave_pos(cxled);
		dev_dbg(&cxled->cxld.dev,
			"Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n",
			(test_pos == cxled->pos) ? "success" : "fail",
			test_pos, cxled->pos);
	}

	return 0;
}
static struct cxl_region *
__cxl_decoder_detach(struct cxl_region *cxlr,
		     struct cxl_endpoint_decoder *cxled, int pos,
		     enum cxl_detach_mode mode)
{
	struct cxl_region_params *p;

	lockdep_assert_held_write(&cxl_rwsem.region);

	if (!cxled) {
		p = &cxlr->params;

		if (pos >= p->interleave_ways) {
			dev_dbg(&cxlr->dev, "position %d out of range %d\n",
				pos, p->interleave_ways);
			return NULL;
		}

		if (!p->targets[pos])
			return NULL;
		cxled = p->targets[pos];
	} else {
		cxlr = cxled->cxld.region;
		if (!cxlr)
			return NULL;
		p = &cxlr->params;
	}

	if (mode == DETACH_INVALIDATE)
		cxled->part = -1;

	if (p->state > CXL_CONFIG_ACTIVE) {
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it intact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.
The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously; a
sketch of this ordering rule follows the function below.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable@vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-22 18:43:49 -07:00
		cxl_region_decode_reset(cxlr, p->interleave_ways);
		p->state = CXL_CONFIG_ACTIVE;
	}

	for (struct cxl_port *iter = cxled_to_port(cxled); !is_cxl_root(iter);
	     iter = to_cxl_port(iter->dev.parent))
		cxl_port_detach_region(iter, cxlr, cxled);

	if (cxled->pos < 0 || cxled->pos >= p->interleave_ways ||
	    p->targets[cxled->pos] != cxled) {
		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);

		dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n",
			      dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			      cxled->pos);
		return NULL;
	}

	if (p->state == CXL_CONFIG_ACTIVE) {
		p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
		cxl_region_teardown_targets(cxlr);
	}
	p->targets[cxled->pos] = NULL;
	p->nr_targets--;
	cxled->cxld.hpa_range = (struct range) {
		.start = 0,
		.end = -1,
	};

	get_device(&cxlr->dev);
	return cxlr;
}
|
|
|
|
|

/*
 * Cleanup a decoder's interest in a region. There are 2 cases to
 * handle, removing an unknown @cxled from a known position in a region
 * (detach_target()) or removing a known @cxled from an unknown @cxlr
 * (cxld_unregister()).
 *
 * When the detachment finds a region, release the region driver.
 */
int cxl_decoder_detach(struct cxl_region *cxlr,
		       struct cxl_endpoint_decoder *cxled, int pos,
		       enum cxl_detach_mode mode)
{
	struct cxl_region *detach;

	/* when the decoder is being destroyed lock unconditionally */
	if (mode == DETACH_INVALIDATE) {
		guard(rwsem_write)(&cxl_rwsem.region);
		detach = __cxl_decoder_detach(cxlr, cxled, pos, mode);
	} else {
		int rc;

		ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
		if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
			return rc;
		detach = __cxl_decoder_detach(cxlr, cxled, pos, mode);
	}

	if (detach) {
		device_release_driver(&detach->dev);
		put_device(&detach->dev);
	}
	return 0;
}
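
For orientation, the DETACH_ONLY case is reached via detach_target()
further down in this file, while the DETACH_INVALIDATE case is reached
from the cxld_unregister() path shown in the call trace above. A
plausible shape for that caller, assumed here for illustration rather
than quoted from the tree, is:

	/* Sketch: decoder teardown detaches by decoder, unconditional lock */
	void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
	{
		/* region unknown (NULL), position unknown (-1) */
		cxl_decoder_detach(NULL, cxled, -1, DETACH_INVALIDATE);
	}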

static int __attach_target(struct cxl_region *cxlr,
			   struct cxl_endpoint_decoder *cxled, int pos,
			   unsigned int state)
{
	int rc;

	if (state == TASK_INTERRUPTIBLE) {
		/* sysfs writers get a killable lock acquisition */
		ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
		if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
			return rc;
		guard(rwsem_read)(&cxl_rwsem.dpa);
		return cxl_region_attach(cxlr, cxled, pos);
	}
	guard(rwsem_write)(&cxl_rwsem.region);
	guard(rwsem_read)(&cxl_rwsem.dpa);
	return cxl_region_attach(cxlr, cxled, pos);
}

static int attach_target(struct cxl_region *cxlr,
			 struct cxl_endpoint_decoder *cxled, int pos,
			 unsigned int state)
{
	int rc = __attach_target(cxlr, cxled, pos, state);

	if (rc == 0)
		return 0;

	dev_warn(cxled->cxld.dev.parent, "failed to attach %s to %s: %d\n",
		 dev_name(&cxled->cxld.dev), dev_name(&cxlr->dev), rc);
	return rc;
}

static int detach_target(struct cxl_region *cxlr, int pos)
{
	return cxl_decoder_detach(cxlr, NULL, pos, DETACH_ONLY);
}

static ssize_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
			     size_t len)
{
	int rc;

	if (sysfs_streq(buf, "\n"))
		rc = detach_target(cxlr, pos);
	else {
		struct device *dev;

		dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
		if (!dev)
			return -ENODEV;

		if (!is_endpoint_decoder(dev)) {
			rc = -EINVAL;
			goto out;
		}

		rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
				   TASK_INTERRUPTIBLE);
out:
		put_device(dev);
	}

	if (rc < 0)
		return rc;
	return len;
}
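
As a usage sketch (hypothetical device names; a region0 with an
available decoder3.0 is assumed), position 0 of a region is populated
and vacated like so:

# attach decoder3.0 at position 0 of region0
echo decoder3.0 > /sys/bus/cxl/devices/region0/target0
# an empty write maps to detach_target() for that position
echo "" > /sys/bus/cxl/devices/region0/target0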

#define TARGET_ATTR_RW(n)                                              \
static ssize_t target##n##_show(                                       \
	struct device *dev, struct device_attribute *attr, char *buf)  \
{                                                                      \
	return show_targetN(to_cxl_region(dev), buf, (n));             \
}                                                                      \
static ssize_t target##n##_store(struct device *dev,                   \
				 struct device_attribute *attr,        \
				 const char *buf, size_t len)          \
{                                                                      \
	return store_targetN(to_cxl_region(dev), buf, (n), len);       \
}                                                                      \
static DEVICE_ATTR_RW(target##n)

TARGET_ATTR_RW(0);
TARGET_ATTR_RW(1);
TARGET_ATTR_RW(2);
TARGET_ATTR_RW(3);
TARGET_ATTR_RW(4);
TARGET_ATTR_RW(5);
TARGET_ATTR_RW(6);
TARGET_ATTR_RW(7);
TARGET_ATTR_RW(8);
TARGET_ATTR_RW(9);
TARGET_ATTR_RW(10);
TARGET_ATTR_RW(11);
TARGET_ATTR_RW(12);
TARGET_ATTR_RW(13);
TARGET_ATTR_RW(14);
TARGET_ATTR_RW(15);
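
For readability, TARGET_ATTR_RW(0) above expands to effectively:

	static ssize_t target0_show(
		struct device *dev, struct device_attribute *attr, char *buf)
	{
		return show_targetN(to_cxl_region(dev), buf, (0));
	}
	static ssize_t target0_store(struct device *dev,
				     struct device_attribute *attr,
				     const char *buf, size_t len)
	{
		return store_targetN(to_cxl_region(dev), buf, (0), len);
	}
	static DEVICE_ATTR_RW(target0);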

static struct attribute *target_attrs[] = {
	&dev_attr_target0.attr,
	&dev_attr_target1.attr,
	&dev_attr_target2.attr,
	&dev_attr_target3.attr,
	&dev_attr_target4.attr,
	&dev_attr_target5.attr,
	&dev_attr_target6.attr,
	&dev_attr_target7.attr,
	&dev_attr_target8.attr,
	&dev_attr_target9.attr,
	&dev_attr_target10.attr,
	&dev_attr_target11.attr,
	&dev_attr_target12.attr,
	&dev_attr_target13.attr,
	&dev_attr_target14.attr,
	&dev_attr_target15.attr,
	NULL,
};

static umode_t cxl_region_target_visible(struct kobject *kobj,
					 struct attribute *a, int n)
{
	struct device *dev = kobj_to_dev(kobj);
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;

	if (n < p->interleave_ways)
		return a->mode;
	return 0;
}
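
The visibility hook means a region only exposes as many targetN
attributes as it has interleave ways; for a hypothetical 2-way region0:

# ls /sys/bus/cxl/devices/region0 | grep ^target
target0
target1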

static const struct attribute_group cxl_region_target_group = {
	.attrs = target_attrs,
	.is_visible = cxl_region_target_visible,
};

static const struct attribute_group *get_cxl_region_target_group(void)
{
	return &cxl_region_target_group;
}

static const struct attribute_group *region_groups[] = {
	&cxl_base_attribute_group,
	&cxl_region_group,
	&cxl_region_target_group,
	&cxl_region_access0_coordinate_group,
	&cxl_region_access1_coordinate_group,
	NULL,
};

static void cxl_region_release(struct device *dev)
{
	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
	struct cxl_region *cxlr = to_cxl_region(dev);
	int id = atomic_read(&cxlrd->region_id);

	/*
	 * Try to reuse the recently idled id rather than the cached
	 * next id to prevent the region id space from increasing
	 * unnecessarily.
	 */
	if (cxlr->id < id)
		if (atomic_try_cmpxchg(&cxlrd->region_id, &id, cxlr->id)) {
			memregion_free(id);
			goto out;
		}

	memregion_free(cxlr->id);
out:
	put_device(dev->parent);
	kfree(cxlr);
}
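
The id-reuse dance above can be modeled in isolation. The sketch below
is illustrative userspace code, not the kernel implementation (the
starting value 5 is arbitrary); it shows both outcomes of the cmpxchg:

	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int region_id = 5; /* cached next id */

	static void release_region(int freed_id)
	{
		int next = atomic_load(&region_id);

		if (freed_id < next &&
		    atomic_compare_exchange_strong(&region_id, &next, freed_id)) {
			/* previously cached id goes back to the global allocator */
			printf("cache id %d, memregion_free(%d)\n", freed_id, next);
			return;
		}
		/* otherwise the freed id itself goes back to the allocator */
		printf("memregion_free(%d)\n", freed_id);
	}

	int main(void)
	{
		release_region(2); /* 2 < 5: cache 2, free the cached 5 */
		release_region(7); /* 7 > 2: free 7 directly */
		return 0;
	}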

const struct device_type cxl_region_type = {
	.name = "cxl_region",
	.release = cxl_region_release,
	.groups = region_groups
};

bool is_cxl_region(struct device *dev)
{
	return dev->type == &cxl_region_type;
}
module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.
Scripted using
git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
do
awk -i inplace '
/^#define EXPORT_SYMBOL_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/^#define MODULE_IMPORT_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/MODULE_IMPORT_NS/ {
$0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
}
/EXPORT_SYMBOL_NS/ {
if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
$0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
$0 !~ /^my/) {
getline line;
gsub(/[[:space:]]*\\$/, "");
gsub(/[[:space:]]/, "", line);
$0 = $0 " " line;
}
$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
"\\1(\\2, \"\\3\")", "g");
}
}
{ print }' $file;
done
Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-02 15:59:47 +01:00
EXPORT_SYMBOL_NS_GPL(is_cxl_region, "CXL");

static struct cxl_region *to_cxl_region(struct device *dev)
{
	if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type,
			  "not a cxl_region device\n"))
		return NULL;

	return container_of(dev, struct cxl_region, dev);
}

static void unregister_region(void *_cxlr)
{
	struct cxl_region *cxlr = _cxlr;
	struct cxl_region_params *p = &cxlr->params;
	int i;

	device_del(&cxlr->dev);

	/*
	 * Now that region sysfs is shutdown, the parameter block is
	 * read-only, so no need to hold the region rwsem to access the
	 * region parameters.
	 */
	for (i = 0; i < p->interleave_ways; i++)
		detach_target(cxlr, i);

	cxl_region_iomem_release(cxlr);
	put_device(&cxlr->dev);
}

static struct lock_class_key cxl_region_key;

static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id)
{
	struct cxl_region *cxlr;
	struct device *dev;

	cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL);
	if (!cxlr) {
		memregion_free(id);
		return ERR_PTR(-ENOMEM);
	}

	dev = &cxlr->dev;
	device_initialize(dev);
	lockdep_set_class(&dev->mutex, &cxl_region_key);
	dev->parent = &cxlrd->cxlsd.cxld.dev;
	/*
	 * Keep root decoder pinned through cxl_region_release to fixup
	 * region id allocations
	 */
	get_device(dev->parent);
	device_set_pm_not_required(dev);
	dev->bus = &cxl_bus_type;
	dev->type = &cxl_region_type;
	cxlr->id = id;

	return cxlr;
}

static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
{
	int cset = 0;
	int rc;

	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
		if (cxlr->coord[i].read_bandwidth) {
			rc = 0;
			if (cxl_need_node_perf_attrs_update(nid))
				node_set_perf_attrs(nid, &cxlr->coord[i], i);
			else
				rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);

			if (rc == 0)
				cset++;
		}
	}

	if (!cset)
		return false;

	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_access0_group());
	if (rc)
		dev_dbg(&cxlr->dev, "Failed to update access0 group\n");

	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_access1_group());
	if (rc)
		dev_dbg(&cxlr->dev, "Failed to update access1 group\n");

	return true;
}

static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
					  unsigned long action, void *arg)
{
	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
					       node_notifier);
	struct node_notify *nn = arg;
	int nid = nn->nid;
	int region_nid;

	if (action != NODE_ADDED_FIRST_MEMORY)
		return NOTIFY_DONE;

	/*
	 * No need to hold cxl_rwsem.region; region parameters are stable
	 * within the cxl_region driver.
	 */
	region_nid = phys_to_target_node(cxlr->params.res->start);
	if (nid != region_nid)
		return NOTIFY_DONE;

	if (!cxl_region_update_coordinates(cxlr, nid))
		return NOTIFY_DONE;

	return NOTIFY_OK;
}

static int cxl_region_calculate_adistance(struct notifier_block *nb,
					  unsigned long nid, void *data)
{
	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
					       adist_notifier);
	struct access_coordinate *perf;
	int *adist = data;
	int region_nid;

	/*
	 * No need to hold cxl_rwsem.region; region parameters are stable
	 * within the cxl_region driver.
	 */
	region_nid = phys_to_target_node(cxlr->params.res->start);
	if (nid != region_nid)
		return NOTIFY_OK;

	perf = &cxlr->coord[ACCESS_COORDINATE_CPU];

	if (mt_perf_to_adistance(perf, adist))
		return NOTIFY_OK;

	return NOTIFY_STOP;
}

/**
 * devm_cxl_add_region - Adds a region to a decoder
 * @cxlrd: root decoder
 * @id: memregion id to create; freed via memregion_free() on failure
 * @mode: mode for the endpoint decoders of this region
 * @type: select whether this is an expander or accelerator (type-2 or type-3)
 *
 * This is the second step of region initialization. Regions exist within an
 * address space which is mapped by a @cxlrd.
 *
 * Return: a new region object on success, else an ERR_PTR(). The region will
 * be named "regionZ" where Z is the unique region number.
 */
static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
					      int id,
					      enum cxl_partition_mode mode,
					      enum cxl_decoder_type type)
{
	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
	struct cxl_region *cxlr;
	struct device *dev;
	int rc;

	cxlr = cxl_region_alloc(cxlrd, id);
	if (IS_ERR(cxlr))
		return cxlr;
	cxlr->mode = mode;
	cxlr->type = type;

	dev = &cxlr->dev;
	rc = dev_set_name(dev, "region%d", id);
	if (rc)
		goto err;

	rc = device_add(dev);
	if (rc)
		goto err;

	rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
	if (rc)
		return ERR_PTR(rc);

	dev_dbg(port->uport_dev, "%s: created %s\n",
		dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
	return cxlr;

err:
	put_device(dev);
	return ERR_PTR(rc);
}

static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
{
	return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
}
static ssize_t create_pmem_region_show(struct device *dev,
                                       struct device_attribute *attr, char *buf)
{
        return __create_region_show(to_cxl_root_decoder(dev), buf);
}

static ssize_t create_ram_region_show(struct device *dev,
                                      struct device_attribute *attr, char *buf)
{
        return __create_region_show(to_cxl_root_decoder(dev), buf);
}

static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
                                          enum cxl_partition_mode mode, int id)
{
        int rc;

        switch (mode) {
        case CXL_PARTMODE_RAM:
        case CXL_PARTMODE_PMEM:
                break;
        default:
                dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
                return ERR_PTR(-EINVAL);
        }

        rc = memregion_alloc(GFP_KERNEL);
        if (rc < 0)
                return ERR_PTR(rc);

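        /*
         * The id the user wrote back must still be the one advertised by
         * the create attribute; otherwise a competing region creation
         * already claimed it.
         */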
        if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) {
                memregion_free(rc);
                return ERR_PTR(-EBUSY);
        }

        return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
}

static ssize_t create_region_store(struct device *dev, const char *buf,
                                   size_t len, enum cxl_partition_mode mode)
{
        struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
        struct cxl_region *cxlr;
        int rc, id;

        rc = sscanf(buf, "region%d\n", &id);
        if (rc != 1)
                return -EINVAL;

        cxlr = __create_region(cxlrd, mode, id);
        if (IS_ERR(cxlr))
                return PTR_ERR(cxlr);

        return len;
}

static ssize_t create_pmem_region_store(struct device *dev,
                                        struct device_attribute *attr,
                                        const char *buf, size_t len)
{
        return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
}
DEVICE_ATTR_RW(create_pmem_region);

static ssize_t create_ram_region_store(struct device *dev,
                                       struct device_attribute *attr,
                                       const char *buf, size_t len)
{
        return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
}
DEVICE_ATTR_RW(create_ram_region);

static ssize_t region_show(struct device *dev, struct device_attribute *attr,
                           char *buf)
{
        struct cxl_decoder *cxld = to_cxl_decoder(dev);
        ssize_t rc;

        ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
        if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem)))
                return rc;

        if (cxld->region)
                return sysfs_emit(buf, "%s\n", dev_name(&cxld->region->dev));
        return sysfs_emit(buf, "\n");
}
DEVICE_ATTR_RO(region);

static struct cxl_region *
cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
{
        struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
        struct device *region_dev;

        region_dev = device_find_child_by_name(&cxld->dev, name);
        if (!region_dev)
                return ERR_PTR(-ENODEV);

        return to_cxl_region(region_dev);
}

static ssize_t delete_region_store(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t len)
{
        struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
        struct cxl_port *port = to_cxl_port(dev->parent);
        struct cxl_region *cxlr;

        cxlr = cxl_find_region_by_name(cxlrd, buf);
        if (IS_ERR(cxlr))
                return PTR_ERR(cxlr);

        devm_release_action(port->uport_dev, unregister_region, cxlr);
        put_device(&cxlr->dev);

        return len;
}
DEVICE_ATTR_WO(delete_region);

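/*
 * Device release for a cxl_pmem_region: drop the per-mapping memdev
 * references taken at region assembly time, then free the object.
 */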
static void cxl_pmem_region_release(struct device *dev)
{
        struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
        int i;

        for (i = 0; i < cxlr_pmem->nr_mappings; i++) {
                struct cxl_memdev *cxlmd = cxlr_pmem->mapping[i].cxlmd;

                put_device(&cxlmd->dev);
        }

        kfree(cxlr_pmem);
}

static const struct attribute_group *cxl_pmem_region_attribute_groups[] = {
        &cxl_base_attribute_group,
        NULL,
};

const struct device_type cxl_pmem_region_type = {
        .name = "cxl_pmem_region",
        .release = cxl_pmem_region_release,
        .groups = cxl_pmem_region_attribute_groups,
};

bool is_cxl_pmem_region(struct device *dev)
{
        return dev->type == &cxl_pmem_region_type;
}

EXPORT_SYMBOL_NS_GPL(is_cxl_pmem_region, "CXL");

struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
{
        if (dev_WARN_ONCE(dev, !is_cxl_pmem_region(dev),
                          "not a cxl_pmem_region device\n"))
                return NULL;
        return container_of(dev, struct cxl_pmem_region, dev);
}

EXPORT_SYMBOL_NS_GPL(to_cxl_pmem_region, "CXL");

struct cxl_poison_context {
        struct cxl_port *port;
        int part;
        u64 offset;
};

static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
                                   struct cxl_poison_context *ctx)
{
        struct cxl_dev_state *cxlds = cxlmd->cxlds;
        const struct resource *res;
        struct resource *p, *last;
        u64 offset, length;
        int rc = 0;

        if (ctx->part < 0)
                return 0;

        /*
         * Collect poison for the remaining unmapped resources after
         * poison is collected by committed endpoint decoders.
         */
        for (int i = ctx->part; i < cxlds->nr_partitions; i++) {
                res = &cxlds->part[i].res;
                for (p = res->child, last = NULL; p; p = p->sibling)
                        last = p;
                if (last)
                        offset = last->end + 1;
                else
                        offset = res->start;
                length = res->end - offset + 1;
                if (!length)
                        break;
                rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
                if (rc == -EFAULT && cxlds->part[i].mode == CXL_PARTMODE_RAM)
                        continue;
                if (rc)
                        break;
        }

        return rc;
}

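/*
 * device_for_each_child() callback: read poison for the skipped and
 * mapped DPA ranges of each endpoint decoder, treating -EFAULT as
 * non-fatal for volatile (RAM) partitions. Returns 1 to stop the walk
 * once the last committed decoder (port->commit_end) is visited.
 */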
static int poison_by_decoder(struct device *dev, void *arg)
{
        struct cxl_poison_context *ctx = arg;
        struct cxl_endpoint_decoder *cxled;
        enum cxl_partition_mode mode;
        struct cxl_dev_state *cxlds;
        struct cxl_memdev *cxlmd;
        u64 offset, length;
        int rc = 0;

        if (!is_endpoint_decoder(dev))
                return rc;

        cxled = to_cxl_endpoint_decoder(dev);
        if (!cxled->dpa_res)
                return rc;

        cxlmd = cxled_to_memdev(cxled);
        cxlds = cxlmd->cxlds;
        mode = cxlds->part[cxled->part].mode;

        if (cxled->skip) {
                offset = cxled->dpa_res->start - cxled->skip;
                length = cxled->skip;
                rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
                if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
                        rc = 0;
                if (rc)
                        return rc;
        }

        offset = cxled->dpa_res->start;
        length = cxled->dpa_res->end - offset + 1;
        rc = cxl_mem_get_poison(cxlmd, offset, length, cxled->cxld.region);
        if (rc == -EFAULT && mode == CXL_PARTMODE_RAM)
                rc = 0;
        if (rc)
                return rc;

        /* Iterate until commit_end is reached */
        if (cxled->cxld.id == ctx->port->commit_end) {
                ctx->offset = cxled->dpa_res->end + 1;
                ctx->part = cxled->part;
                return 1;
        }

        return 0;
}

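/*
 * Gather poison by walking this endpoint port's decoders. A return of
 * 1 from the walk means the last committed decoder was reached, after
 * which the remaining unmapped device capacity is also scanned.
 */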
int cxl_get_poison_by_endpoint(struct cxl_port *port)
{
        struct cxl_poison_context ctx;
        int rc = 0;

        ctx = (struct cxl_poison_context) {
                .port = port,
                .part = -1,
        };

        rc = device_for_each_child(&port->dev, &ctx, poison_by_decoder);
        if (rc == 1)
                rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
                                             &ctx);

        return rc;
}

struct cxl_dpa_to_region_context {
        struct cxl_region *cxlr;
        u64 dpa;
};

static int __cxl_dpa_to_region(struct device *dev, void *arg)
{
        struct cxl_dpa_to_region_context *ctx = arg;
        struct cxl_endpoint_decoder *cxled;
        struct cxl_region *cxlr;
        u64 dpa = ctx->dpa;

        if (!is_endpoint_decoder(dev))
                return 0;

        cxled = to_cxl_endpoint_decoder(dev);
        if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))
                return 0;

        if (!cxl_resource_contains_addr(cxled->dpa_res, dpa))
                return 0;

        /*
         * Stop the region search (return 1) when an endpoint mapping is
         * found. The region may not be fully constructed so offering
         * the cxlr in the context structure is not guaranteed.
         */
        cxlr = cxled->cxld.region;
        if (cxlr)
                dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
                        dev_name(&cxlr->dev));
        else
                dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
                        dev_name(dev));

        ctx->cxlr = cxlr;

        return 1;
}

struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
{
        struct cxl_dpa_to_region_context ctx;
        struct cxl_port *port;

        ctx = (struct cxl_dpa_to_region_context) {
                .dpa = dpa,
        };
        port = cxlmd->endpoint;
        if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
                device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);

        return ctx.cxlr;
}

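/*
 * Example: with 2 ways and 256 byte granularity, position 1 owns the
 * offsets [0x100, 0x200), [0x300, 0x400), ... since the offset modulo
 * (gran * ways) must land within [pos * gran, (pos + 1) * gran).
 */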
static bool cxl_is_hpa_in_chunk(u64 hpa, struct cxl_region *cxlr, int pos)
{
        struct cxl_region_params *p = &cxlr->params;
        int gran = p->interleave_granularity;
        int ways = p->interleave_ways;
        u64 offset;

        /* Is the hpa in an expected chunk for its pos(-ition) */
        offset = hpa - p->res->start;
        offset = do_div(offset, gran * ways);
        if ((offset >= pos * gran) && (offset < (pos + 1) * gran))
                return true;

        dev_dbg(&cxlr->dev,
                "Addr trans fail: hpa 0x%llx not in expected chunk\n", hpa);

        return false;
}

u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
                   u64 dpa)
{
        struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
        u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
        struct cxl_region_params *p = &cxlr->params;
        struct cxl_endpoint_decoder *cxled = NULL;
        u16 eig = 0;
        u8 eiw = 0;
        int pos;

        for (int i = 0; i < p->nr_targets; i++) {
                cxled = p->targets[i];
                if (cxlmd == cxled_to_memdev(cxled))
                        break;
        }
        if (!cxled || cxlmd != cxled_to_memdev(cxled))
                return ULLONG_MAX;

        pos = cxled->pos;
        ways_to_eiw(p->interleave_ways, &eiw);
        granularity_to_eig(p->interleave_granularity, &eig);

        /*
         * The device position in the region interleave set was removed
         * from the offset at HPA->DPA translation. To reconstruct the
         * HPA, place the 'pos' in the offset.
         *
         * The placement of 'pos' in the HPA is determined by interleave
         * ways and granularity and is defined in the CXL Spec 3.0 Section
         * 8.2.4.19.13 Implementation Note: Device Decode Logic
         */

        /* Remove the dpa base */
        dpa_offset = dpa - cxl_dpa_resource_start(cxled);

        mask_upper = GENMASK_ULL(51, eig + 8);

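        /*
         * Worked example for the power-of-2 branch: 2 ways (eiw = 1),
         * 256 byte granularity (eig = 0), pos = 1, dpa_offset = 0x100.
         * The upper bits shift left by eiw (0x100 << 1 = 0x200) and
         * 'pos' fills bits [eig + 8 + eiw - 1 : eig + 8] (1 << 8), so
         * hpa_offset = 0x300: HPA chunk 3 decodes back to DPA chunk 1
         * on the device at position 1 of the 2-way set.
         */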
        if (eiw < 8) {
                hpa_offset = (dpa_offset & mask_upper) << eiw;
                hpa_offset |= pos << (eig + 8);
        } else {
                bits_upper = (dpa_offset & mask_upper) >> (eig + 8);
                bits_upper = bits_upper * 3;
                hpa_offset = ((bits_upper << (eiw - 8)) + pos) << (eig + 8);
        }

        /* The lower bits remain unchanged */
        hpa_offset |= dpa_offset & GENMASK_ULL(eig + 7, 0);

        /* Apply the hpa_offset to the region base address */
        hpa = hpa_offset + p->res->start + p->cache_size;

        /* Root decoder translation overrides typical modulo decode */
        if (cxlrd->hpa_to_spa)
                hpa = cxlrd->hpa_to_spa(cxlrd, hpa);

        if (!cxl_resource_contains_addr(p->res, hpa)) {
                dev_dbg(&cxlr->dev,
                        "Addr trans fail: hpa 0x%llx not in region\n", hpa);
                return ULLONG_MAX;
        }

        /* Simple chunk check, by pos & gran, only applies to modulo decodes */
        if (!cxlrd->hpa_to_spa && (!cxl_is_hpa_in_chunk(hpa, cxlr, pos)))
                return ULLONG_MAX;

        return hpa;
}

static struct lock_class_key cxl_pmem_region_key;

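/*
 * Assemble a cxl_pmem_region from a committed region: snapshot the HPA
 * range and per-target DPA mappings under cxl_rwsem.region, taking a
 * reference on each memdev that is held until cxl_pmem_region_release().
 */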
static int cxl_pmem_region_alloc(struct cxl_region *cxlr)
{
        struct cxl_region_params *p = &cxlr->params;
        struct cxl_nvdimm_bridge *cxl_nvb;
        struct device *dev;
        int i;

        guard(rwsem_read)(&cxl_rwsem.region);
        if (p->state != CXL_CONFIG_COMMIT)
                return -ENXIO;

        struct cxl_pmem_region *cxlr_pmem __free(kfree) =
                kzalloc(struct_size(cxlr_pmem, mapping, p->nr_targets), GFP_KERNEL);
        if (!cxlr_pmem)
                return -ENOMEM;

        cxlr_pmem->hpa_range.start = p->res->start;
        cxlr_pmem->hpa_range.end = p->res->end;

        /* Snapshot the region configuration underneath the cxl_rwsem.region */
        cxlr_pmem->nr_mappings = p->nr_targets;
        for (i = 0; i < p->nr_targets; i++) {
                struct cxl_endpoint_decoder *cxled = p->targets[i];
                struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
                struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i];

                /*
                 * Regions never span CXL root devices, so by definition the
                 * bridge for one device is the same for all.
                 */
                if (i == 0) {
                        cxl_nvb = cxl_find_nvdimm_bridge(cxlmd->endpoint);
                        if (!cxl_nvb)
                                return -ENODEV;
                        cxlr->cxl_nvb = cxl_nvb;
                }
                m->cxlmd = cxlmd;
                get_device(&cxlmd->dev);
                m->start = cxled->dpa_res->start;
                m->size = resource_size(cxled->dpa_res);
                m->position = i;
        }

        dev = &cxlr_pmem->dev;
        device_initialize(dev);
        lockdep_set_class(&dev->mutex, &cxl_pmem_region_key);
        device_set_pm_not_required(dev);
        dev->parent = &cxlr->dev;
        dev->bus = &cxl_bus_type;
        dev->type = &cxl_pmem_region_type;
        cxlr_pmem->cxlr = cxlr;
        cxlr->cxlr_pmem = no_free_ptr(cxlr_pmem);

        return 0;
}

static void cxl_dax_region_release(struct device *dev)
{
        struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);

        kfree(cxlr_dax);
}

static const struct attribute_group *cxl_dax_region_attribute_groups[] = {
        &cxl_base_attribute_group,
        NULL,
};

const struct device_type cxl_dax_region_type = {
        .name = "cxl_dax_region",
        .release = cxl_dax_region_release,
        .groups = cxl_dax_region_attribute_groups,
};

static bool is_cxl_dax_region(struct device *dev)
{
        return dev->type == &cxl_dax_region_type;
}

struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
        if (dev_WARN_ONCE(dev, !is_cxl_dax_region(dev),
                          "not a cxl_dax_region device\n"))
                return NULL;
        return container_of(dev, struct cxl_dax_region, dev);
}

EXPORT_SYMBOL_NS_GPL(to_cxl_dax_region, "CXL");

static struct lock_class_key cxl_dax_region_key;

static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr)
{
        struct cxl_region_params *p = &cxlr->params;
        struct cxl_dax_region *cxlr_dax;
        struct device *dev;

        guard(rwsem_read)(&cxl_rwsem.region);
        if (p->state != CXL_CONFIG_COMMIT)
                return ERR_PTR(-ENXIO);

        cxlr_dax = kzalloc(sizeof(*cxlr_dax), GFP_KERNEL);
        if (!cxlr_dax)
                return ERR_PTR(-ENOMEM);

        cxlr_dax->hpa_range.start = p->res->start;
        cxlr_dax->hpa_range.end = p->res->end;

        dev = &cxlr_dax->dev;
        cxlr_dax->cxlr = cxlr;
        device_initialize(dev);
        lockdep_set_class(&dev->mutex, &cxl_dax_region_key);
        device_set_pm_not_required(dev);
        dev->parent = &cxlr->dev;
        dev->bus = &cxl_bus_type;
        dev->type = &cxl_dax_region_type;

        return cxlr_dax;
}

cxl/pmem: Refactor nvdimm device registration, delete the workqueue
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging an endpoint, to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 13:33:37 -08:00
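A minimal sketch of the devres pairing described above, using hypothetical
objects @bridge and @dep (the actual CXL flows follow below):

static void dep_unregister(void *dep)
{
	/* tear down the dependent object */
}

/* run dep_unregister() automatically when @bridge is unbound */
rc = devm_add_action_or_reset(&bridge->dev, dep_unregister, dep);

/* ...or trigger the same release action early if the consumer exits first */
devm_release_action(&bridge->dev, dep_unregister, dep);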

static void cxlr_pmem_unregister(void *_cxlr_pmem)
{
	struct cxl_pmem_region *cxlr_pmem = _cxlr_pmem;
	struct cxl_region *cxlr = cxlr_pmem->cxlr;
	struct cxl_nvdimm_bridge *cxl_nvb = cxlr->cxl_nvb;

	/*
	 * Either the bridge is in ->remove() context under the device_lock(),
	 * or cxlr_release_nvdimm() is cancelling the bridge's release action
	 * for @cxlr_pmem and doing it itself (while manually holding the bridge
	 * lock).
	 */
	device_lock_assert(&cxl_nvb->dev);
	cxlr->cxlr_pmem = NULL;
	cxlr_pmem->cxlr = NULL;
	device_unregister(&cxlr_pmem->dev);
}

static void cxlr_release_nvdimm(void *_cxlr)
{
	struct cxl_region *cxlr = _cxlr;
	struct cxl_nvdimm_bridge *cxl_nvb = cxlr->cxl_nvb;

	scoped_guard(device, &cxl_nvb->dev) {
		if (cxlr->cxlr_pmem)
			devm_release_action(&cxl_nvb->dev, cxlr_pmem_unregister,
					    cxlr->cxlr_pmem);
	}
	cxlr->cxl_nvb = NULL;
	put_device(&cxl_nvb->dev);
}

/**
 * devm_cxl_add_pmem_region() - add a cxl_region-to-nd_region bridge
 * @cxlr: parent CXL region for this pmem region bridge device
 *
 * Return: 0 on success, negative error code on failure.
 */
static int devm_cxl_add_pmem_region(struct cxl_region *cxlr)
{
	struct cxl_pmem_region *cxlr_pmem;
	struct cxl_nvdimm_bridge *cxl_nvb;
	struct device *dev;
	int rc;

	rc = cxl_pmem_region_alloc(cxlr);
	if (rc)
		return rc;
	cxlr_pmem = cxlr->cxlr_pmem;
	cxl_nvb = cxlr->cxl_nvb;

	dev = &cxlr_pmem->dev;
	rc = dev_set_name(dev, "pmem_region%d", cxlr->id);
	if (rc)
		goto err;

	rc = device_add(dev);
	if (rc)
		goto err;

	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
		dev_name(dev));

	scoped_guard(device, &cxl_nvb->dev) {
		if (cxl_nvb->dev.driver)
			rc = devm_add_action_or_reset(&cxl_nvb->dev,
						      cxlr_pmem_unregister,
						      cxlr_pmem);
		else
			rc = -ENXIO;
	}

	if (rc)
		goto err_bridge;

	/* @cxlr carries a reference on @cxl_nvb until cxlr_release_nvdimm */
	return devm_add_action_or_reset(&cxlr->dev, cxlr_release_nvdimm, cxlr);

err:
	put_device(dev);
err_bridge:
	put_device(&cxl_nvb->dev);
	cxlr->cxl_nvb = NULL;
	return rc;
}

static void cxlr_dax_unregister(void *_cxlr_dax)
{
	struct cxl_dax_region *cxlr_dax = _cxlr_dax;

	device_unregister(&cxlr_dax->dev);
}

static int devm_cxl_add_dax_region(struct cxl_region *cxlr)
{
	struct cxl_dax_region *cxlr_dax;
	struct device *dev;
	int rc;

	cxlr_dax = cxl_dax_region_alloc(cxlr);
	if (IS_ERR(cxlr_dax))
		return PTR_ERR(cxlr_dax);

	dev = &cxlr_dax->dev;
	rc = dev_set_name(dev, "dax_region%d", cxlr->id);
	if (rc)
		goto err;

	rc = device_add(dev);
	if (rc)
		goto err;

	dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
		dev_name(dev));

	return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister,
					cxlr_dax);
err:
	put_device(dev);
	return rc;
}

driver core: Constify API device_find_child() and adapt for various usages
Constify the following API:
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
To :
struct device *device_find_child(struct device *dev, const void *data,
device_match_t match);
typedef int (*device_match_t)(struct device *dev, const void *data);
with the following reasons:
- Protect caller's match data @*data which is for comparison and lookup
and the API does not actually need to modify @*data.
- Make the API's parameters (@match)() and @data have the same type as
all of other device finding APIs (bus|class|driver)_find_device().
- All kinds of existing device match functions can be directly taken
as the API's argument, they were exported by driver core.
Constify the API and adapt for various existing usages.
Note: various subsystem changes are squashed into this commit to satisfy
the 'git bisect' requirement; the squashed changes are kept minimal and
simple to offset the downsides of squashing, and follow-up improvements
may land separately.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24 21:05:03 +08:00
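A minimal sketch of a match callback under the constified signature
(match_by_name() is a hypothetical helper, not part of this file):

static int match_by_name(struct device *dev, const void *data)
{
	const char *name = data;

	return sysfs_streq(dev_name(dev), name);
}

/* match data is passed through as const and never modified */
struct device *child = device_find_child(parent, "target-name", match_by_name);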
static int match_decoder_by_range(struct device *dev, const void *data)
{
	const struct range *r1, *r2 = data;
	struct cxl_decoder *cxld;

	if (!is_switch_decoder(dev))
		return 0;

	cxld = to_cxl_decoder(dev);
	r1 = &cxld->hpa_range;
	return range_contains(r1, r2);
}

static struct cxl_decoder *
cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa)
{
	struct device *cxld_dev = device_find_child(&port->dev, hpa,
						    match_decoder_by_range);

	return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL;
}

static struct cxl_root_decoder *
cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_port *port = cxled_to_port(cxled);
	struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
	struct cxl_decoder *root, *cxld = &cxled->cxld;
	struct range *hpa = &cxld->hpa_range;

	root = cxl_port_find_switch_decoder(&cxl_root->port, hpa);
	if (!root) {
		dev_err(cxlmd->dev.parent,
			"%s:%s no CXL window for range %#llx:%#llx\n",
			dev_name(&cxlmd->dev), dev_name(&cxld->dev),
			cxld->hpa_range.start, cxld->hpa_range.end);
		return NULL;
	}

	return to_cxl_root_decoder(&root->dev);
}
static int match_region_by_range(struct device *dev, const void *data)
{
	struct cxl_region_params *p;
	struct cxl_region *cxlr;
	const struct range *r = data;

	if (!is_cxl_region(dev))
		return 0;

	cxlr = to_cxl_region(dev);
	p = &cxlr->params;

	guard(rwsem_read)(&cxl_rwsem.region);
	if (p->res && p->res->start == r->start && p->res->end == r->end)
		return 1;

	return 0;
}
static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr,
					    struct resource *res)
{
	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
	struct cxl_region_params *p = &cxlr->params;
	resource_size_t size = resource_size(res);
	resource_size_t cache_size, start;

	cache_size = cxlrd->cache_size;
	if (!cache_size)
		return 0;

	if (size != cache_size) {
		dev_warn(&cxlr->dev,
			 "Extended Linear Cache size %pa != CXL size %pa. No Support!",
			 &cache_size, &size);
		return -ENXIO;
	}

	/*
	 * Move the start of the range to where the cache range starts. The
	 * implementation assumes that the cache range is in front of the
	 * CXL range. This is not dictated by the HMAT spec but is how the
	 * current known implementation is configured.
	 *
	 * The cache range is expected to be within the CFMWS. The adjusted
	 * res->start should not be less than cxlrd->res->start.
	 */
	start = res->start - cache_size;
	if (start < cxlrd->res->start)
		return -ENXIO;

	res->start = start;
	p->cache_size = cache_size;

	return 0;
}
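/*
 * Worked example for cxl_extended_linear_cache_resize() above, with
 * illustrative numbers: a 4GB CXL region at res->start == 0x200000000
 * fronted by an equal 4GB extended linear cache becomes an 8GB resource
 * at res->start == 0x100000000, i.e. res->start -= cache_size.
 */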
static int __construct_region(struct cxl_region *cxlr,
			      struct cxl_root_decoder *cxlrd,
			      struct cxl_endpoint_decoder *cxled)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct range *hpa = &cxled->cxld.hpa_range;
	struct cxl_region_params *p;
	struct resource *res;
	int rc;

	guard(rwsem_write)(&cxl_rwsem.region);
	p = &cxlr->params;
	if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
		dev_err(cxlmd->dev.parent,
			"%s:%s: %s autodiscovery interrupted\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			__func__);
		return -EBUSY;
	}

	set_bit(CXL_REGION_F_AUTO, &cxlr->flags);

	res = kmalloc(sizeof(*res), GFP_KERNEL);
	if (!res)
		return -ENOMEM;

	*res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa),
				    dev_name(&cxlr->dev));

	rc = cxl_extended_linear_cache_resize(cxlr, res);
	if (rc && rc != -EOPNOTSUPP) {
		/*
		 * Failing to support extended linear cache region resize does
		 * not prevent the region from functioning; it only causes
		 * 'cxl list' to show an incorrect region size.
		 */
		dev_warn(cxlmd->dev.parent,
			 "Extended linear cache calculation failed rc:%d\n", rc);
	}

	rc = insert_resource(cxlrd->res, res);
	if (rc) {
		/*
		 * Platform-firmware may not have split resources like "System
		 * RAM" on CXL window boundaries, see cxl_region_iomem_release()
		 */
		dev_warn(cxlmd->dev.parent,
			 "%s:%s: %s %s cannot insert resource\n",
			 dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			 __func__, dev_name(&cxlr->dev));
	}

	p->res = res;
	p->interleave_ways = cxled->cxld.interleave_ways;
	p->interleave_granularity = cxled->cxld.interleave_granularity;
	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;

	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
	if (rc)
		return rc;

	dev_dbg(cxlmd->dev.parent, "%s:%s: %s %s res: %pr iw: %d ig: %d\n",
		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), __func__,
		dev_name(&cxlr->dev), p->res, p->interleave_ways,
		p->interleave_granularity);

	/* ...to match put_device() in cxl_add_to_region() */
	get_device(&cxlr->dev);

	return 0;
}
/* Establish an empty region covering the given HPA range */
static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
					   struct cxl_endpoint_decoder *cxled)
{
	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
	struct cxl_port *port = cxlrd_to_port(cxlrd);
	struct cxl_dev_state *cxlds = cxlmd->cxlds;
	int rc, part = READ_ONCE(cxled->part);
	struct cxl_region *cxlr;

	do {
		cxlr = __create_region(cxlrd, cxlds->part[part].mode,
				       atomic_read(&cxlrd->region_id));
	} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);

	if (IS_ERR(cxlr)) {
		dev_err(cxlmd->dev.parent,
			"%s:%s: %s failed assign region: %ld\n",
			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
			__func__, PTR_ERR(cxlr));
		return cxlr;
	}

	rc = __construct_region(cxlr, cxlrd, cxled);
	if (rc) {
		devm_release_action(port->uport_dev, unregister_region, cxlr);
		return ERR_PTR(rc);
	}

	return cxlr;
}
static struct cxl_region *
cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
{
	struct device *region_dev;

	region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa,
				       match_region_by_range);
	if (!region_dev)
		return NULL;

	return to_cxl_region(region_dev);
}
int cxl_add_to_region(struct cxl_endpoint_decoder *cxled)
{
	struct range *hpa = &cxled->cxld.hpa_range;
	struct cxl_region_params *p;
	bool attach = false;
	int rc;

	struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) =
		cxl_find_root_decoder(cxled);
	if (!cxlrd)
		return -ENXIO;

	/*
	 * Ensure that if multiple threads race to construct_region() for @hpa
	 * one does the construction and the others add to that.
	 */
	mutex_lock(&cxlrd->range_lock);
	struct cxl_region *cxlr __free(put_cxl_region) =
		cxl_find_region_by_range(cxlrd, hpa);
	if (!cxlr)
		cxlr = construct_region(cxlrd, cxled);
	mutex_unlock(&cxlrd->range_lock);

	rc = PTR_ERR_OR_ZERO(cxlr);
	if (rc)
		return rc;

	attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE);

	scoped_guard(rwsem_read, &cxl_rwsem.region) {
		p = &cxlr->params;
		attach = p->state == CXL_CONFIG_COMMIT;
	}

	if (attach) {
		/*
		 * If device_attach() fails the range may still be active via
		 * the platform-firmware memory map, otherwise the driver for
		 * regions is local to this file, so driver matching can't fail.
		 */
		if (device_attach(&cxlr->dev) < 0)
			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
				p->res);
	}

	return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");

u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
{
	struct cxl_region_ref *iter;
	unsigned long index;

	if (!endpoint)
		return ~0ULL;

	guard(rwsem_write)(&cxl_rwsem.region);

	xa_for_each(&endpoint->regions, index, iter) {
		struct cxl_region_params *p = &iter->region->params;

		if (cxl_resource_contains_addr(p->res, spa)) {
			if (!p->cache_size)
				return ~0ULL;

			if (spa >= p->res->start + p->cache_size)
				return spa - p->cache_size;

			return spa + p->cache_size;
		}
	}

	return ~0ULL;
}
EXPORT_SYMBOL_NS_GPL(cxl_port_get_spa_cache_alias, "CXL");
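/*
 * Worked example for cxl_port_get_spa_cache_alias() above, with
 * illustrative numbers: given p->res->start == 0x100000000 and a 4GB
 * p->cache_size, SPA 0x180000000 (near-memory half) aliases to
 * 0x280000000, while SPA 0x280000000 (CXL-backed half) aliases back to
 * 0x180000000.
 */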
static int is_system_ram(struct resource *res, void *arg)
{
	struct cxl_region *cxlr = arg;
	struct cxl_region_params *p = &cxlr->params;

	dev_dbg(&cxlr->dev, "%pr has System RAM: %pr\n", p->res, res);
	return 1;
}

static void shutdown_notifiers(void *_cxlr)
{
	struct cxl_region *cxlr = _cxlr;

	unregister_node_notifier(&cxlr->node_notifier);
	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
}
static int cxl_region_can_probe(struct cxl_region *cxlr)
{
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region);
	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) {
		dev_dbg(&cxlr->dev, "probe interrupted\n");
		return rc;
	}

	if (p->state < CXL_CONFIG_COMMIT) {
		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
		return -ENXIO;
	}

	if (test_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags)) {
		dev_err(&cxlr->dev,
			"failed to activate, re-commit region and retry\n");
		return -ENXIO;
	}

	return 0;
}
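/*
 * Note: ACQUIRE()/ACQUIRE_ERR() in cxl_region_can_probe() are the
 * scope-based conditional lock helpers from <linux/cleanup.h>: the
 * interruptible read lock is dropped automatically when the guard goes
 * out of scope, and ACQUIRE_ERR() reports whether acquisition failed
 * (e.g. was interrupted by a signal).
 */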

static int cxl_region_probe(struct device *dev)
{
	struct cxl_region *cxlr = to_cxl_region(dev);
	struct cxl_region_params *p = &cxlr->params;
	int rc;

	rc = cxl_region_can_probe(cxlr);
	if (rc)
		return rc;

	/*
	 * From this point on any path that changes the region's state away from
	 * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
	 */

	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
	register_node_notifier(&cxlr->node_notifier);

	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
	cxlr->adist_notifier.priority = 100;
	register_mt_adistance_algorithm(&cxlr->adist_notifier);

	rc = devm_add_action_or_reset(&cxlr->dev, shutdown_notifiers, cxlr);
	if (rc)
		return rc;

	switch (cxlr->mode) {
	case CXL_PARTMODE_PMEM:
		rc = devm_cxl_region_edac_register(cxlr);
		if (rc)
			dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
				cxlr->id);

		return devm_cxl_add_pmem_region(cxlr);
	case CXL_PARTMODE_RAM:
		rc = devm_cxl_region_edac_register(cxlr);
		if (rc)
			dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
				cxlr->id);

		/*
		 * The region cannot be managed by CXL if any portion of
		 * it is already online as 'System RAM'
		 */
		if (walk_iomem_res_desc(IORES_DESC_NONE,
					IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
					p->res->start, p->res->end, cxlr,
					is_system_ram) > 0)
			return 0;
		return devm_cxl_add_dax_region(cxlr);
	default:
		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
			cxlr->mode);
		return -ENXIO;
	}
}

static struct cxl_driver cxl_region_driver = {
	.name = "cxl_region",
	.probe = cxl_region_probe,
	.id = CXL_DEVICE_REGION,
};

int cxl_region_init(void)
{
	return cxl_driver_register(&cxl_region_driver);
}

void cxl_region_exit(void)
{
	cxl_driver_unregister(&cxl_region_driver);
}
MODULE_IMPORT_NS("CXL");
MODULE_IMPORT_NS("DEVMEM");
MODULE_ALIAS_CXL(CXL_DEVICE_REGION);