mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00
6 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
d7223aed30 |
- i10nm:
- switch to use scnprintf() - Add Granite Rapids-D support - synopsys: Make sure ECC error and counter registers are cleared during init/probing to avoid reporting stale errors - igen6: Add Wildcat Lake SoCs support - Make sure scrub features sysfs attributes are initialized properly - Allocate memory repair sysfs attributes statically to reduce stack usage - Fix DIMM module size computation for DIMMs with total capacity which is a non power-of-two number, in amd64_edac - Do not be too dramatic when reporting disabled memory controllers in igen6_edac - Add support to ie31200_edac for the following SoCs: - Core i5-14[67]00 - Bartless Lake-S SoCs - Raptor Lake-HX -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiHeDcACgkQEsHwGGHe VUrnzhAAryFKu8xWuwOE3eGaMW6oJhjKF8wPxLiCxxi6ZdQ/1uudFVnzwgozmkXo l10h41A3yc1ZdJqdqn54gF8PxbQ0E1MvbXfmBqZ/U+V+dv6zMwu9TygoPRIJ60ST aIxTBq2zoSii7ucGCBjbqClMTF3ZcH/Q2FzZoFbZyZd84snWSz0B9+S+937mtMhl 9Y55sAgQuigQDQ71YZymAGyWi9E9J20wFk76vIHEboRIa5sS0iCU88Wb4PT+5iKf Qc/1gyqnd+6FO9O9ddrYpeDcaIicLShuGVNZNlJalD/JyTIOcP6XdEDa5J7TYp27 7IcmfHSYmZ5eL0vrJfrIwbauEpRL9ZjWXS+uQjj8/K/gkPUsH/Sdldgldkd50GHV 6L79XSzpy4yhlAr3BXU0o917qRVWOpbxr9E7l6VAFGBpLl5ewtZiV3W7/Su4rPd2 zpUGBZvjxO8jmNQn49IPs/XotVQ2L+mT+KSxUMZAO2pV+dztSJELMFQQC0uAXiZc ApcrSkQxa4fsxU2Ukc1dLOJNkwxEC1ECcPsl2I9EE1cFoix7NP2E+G92D/V52VoZ QeVkxM7LHZCTH9tH1nrCZ+WJr8S2vZ+uY8jRl42P12xU4kcd3RWEtna18bX5oe++ RlgchnXwutEPSgHYZVPocuaDD7C6eIvYzpaVezVl9dgbRLLx8u4= =PBTf -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - i10nm: - switch to using scnprintf() - Add Granite Rapids-D support - synopsys: Make sure ECC error and counter registers are cleared during init/probing to avoid reporting stale errors - igen6: Add Wildcat Lake SoCs support - Make sure scrub features sysfs attributes are initialized properly - Allocate memory repair sysfs attributes statically to reduce stack usage - Fix DIMM module size computation for DIMMs with total capacity which is a non power-of-two number, in amd64_edac - Do not be too dramatic when reporting disabled memory controllers in igen6_edac - Add support to ie31200_edac for the following SoCs: - Core i5-14[67]00 - Bartless Lake-S SoCs - Raptor Lake-HX * tag 'edac_updates_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/{skx_common,i10nm}: Use scnprintf() for safer buffer handling EDAC/synopsys: Clear the ECC counters on init EDAC/ie31200: Add Intel Raptor Lake-HX SoCs support EDAC/igen6: Add Intel Wildcat Lake SoCs support EDAC/i10nm: Add Intel Granite Rapids-D support EDAC/mem_repair: Reduce stack usage in edac_mem_repair_get_desc() EDAC/igen6: Reduce log level to debug for absent memory controllers EDAC/ie31200: Document which CPUs correspond to each Raptor Lake-S device ID EDAC/ie31200: Enable support for Core i5-14600 and i7-14700 ie31200/EDAC: Add Intel Bartlett Lake-S SoCs support |
||
![]() |
1e14ea901d |
EDAC: Initialize EDAC features sysfs attributes
Fix the lockdep splat caused by missing sysfs_attr_init() calls for the recently added EDAC feature's sysfs attributes. In lockdep_init_map_type(), the check for the lock-class key if (!static_obj(key) && !is_dynamic_key(key)) causes the splat. Backtrace: RIP: 0010:lockdep_init_map_type Call Trace: __kernfs_create_file sysfs_add_file_mode_ns internal_create_group internal_create_groups device_add ? __init_waitqueue_head edac_dev_register devm_cxl_memdev_edac_register ? lock_acquire ? find_held_lock ? cxl_mem_probe ? cxl_mem_probe ? lockdep_hardirqs_on ? cxl_mem_probe cxl_mem_probe [ bp: Massage. ] Fixes: |
||
![]() |
815703e2ec |
EDAC/mem_repair: Reduce stack usage in edac_mem_repair_get_desc()
Constructing an array on the stack adds complexity and can exceed the
warning limit for per-function stack usage:
drivers/edac/mem_repair.c:361:5: error: stack frame size (1296) exceeds
limit (1280) in 'edac_mem_repair_get_desc' [-Werror,-Wframe-larger-than]
Change this to have the actual attribute array allocated statically and then
just add the instance number on the per-instance copy.
Fixes:
|
||
![]() |
588ca944c2 |
cxl/edac: Add CXL memory device memory sparing control feature
Memory sparing is defined as a repair function that replaces a portion of memory with a portion of functional memory at that same DPA. The subclasses for this operation vary in terms of the scope of the sparing being performed. The cacheline sparing subclass refers to a sparing action that can replace a full cacheline. Row sparing is provided as an alternative to PPR sparing functions and its scope is that of a single DDR row. As per CXL r3.2 Table 8-125 foot note 1. Memory sparing is preferred over PPR when possible. Bank sparing allows an entire bank to be replaced. Rank sparing is defined as an operation in which an entire DDR rank is replaced. Memory sparing maintenance operations may be supported by CXL devices that implement CXL.mem protocol. A sparing maintenance operation requests the CXL device to perform a repair operation on its media. For example, a CXL device with DRAM components that support memory sparing features may implement sparing maintenance operations. The host may issue a query command by setting query resources flag in the input payload (CXL spec 3.2 Table 8-120) to determine availability of sparing resources for a given address. In response to a query request, the device shall report the resource availability by producing the memory sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank, Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy of the values specified in the request. During the execution of a sparing maintenance operation, a CXL memory device: - may not retain data - may not be able to process CXL.mem requests correctly. These CXL memory device capabilities are specified by restriction flags in the memory sparing feature readable attributes. When a CXL device identifies error on a memory component, the device may inform the host about the need for a memory sparing maintenance operation by using DRAM event record, where the 'maintenance needed' flag may set. The event record contains some of the DPA, Channel, Rank, Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that should be repaired. The userspace tool requests for maintenance operation if the 'maintenance needed' flag set in the CXL DRAM error record. CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing maintenance operation feature. CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature discovery and configuration. Add support for controlling CXL memory device memory sparing feature. Register with EDAC driver, which gets the memory repair attr descriptors from the EDAC memory repair driver and exposes sysfs repair control attributes for memory sparing to the userspace. For example CXL memory sparing control for the CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/ Use case ======== 1. CXL device identifies a failure in a memory component, report to userspace in a CXL DRAM trace event with DPA and other attributes of memory to repair such as channel, rank, nibble mask, bank Group, bank, row, column, sub-channel. 2. Rasdaemon process the trace event and may issue query request in sysfs check resources available for memory sparing if either of the following conditions met. - 'maintenance needed' flag set in the event record. - 'threshold event' flag set for CVME threshold feature. - When the number of corrected error reported on a CXL.mem media to the userspace exceeds the threshold value for corrected error count defined by the userspace policy. 3. Rasdaemon process the memory sparing trace event and issue repair request for memory sparing. Kernel CXL driver shall report memory sparing event record to the userspace with the resource availability in order rasdaemon to process the event record and issue a repair request in sysfs for the memory sparing operation in the CXL device. Note: Based on the feedbacks from the community 'query' sysfs attribute is removed and reporting memory sparing error record to the userspace are not supported. Instead userspace issues sparing operation and kernel does the same to the CXL memory device, when 'maintenance needed' flag set in the DRAM event record. Add checks to ensure the memory to be repaired is offline and if online, then originates from a CXL DRAM error record reported in the current boot before requesting a memory sparing operation on the device. Note: Tested memory sparing feature control with QEMU patch "hw/cxl: Add emulation for memory sparing control feature" https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> |
||
![]() |
81e42fc1d3 |
EDAC: Update memory repair control interface for memory sparing feature
Update memory repair control interface for memory sparing feature. CXL memory devices can support soft and hard memory sparing at cacheline, row, bank and rank granularities. Memory sparing is defined as a repair function that replaces a portion of memory with a portion of functional memory at that same granularity. When a CXL device detects an error in memory, it will report to the host that there's need for a repair maintenance operation by using an event record where the "maintenance needed" flag is set. The event records contain the device physical address (DPA) and other attributes of the memory to repair such as bank group, bank, rank, row, column, channel etc. The kernel will report the corresponding CXL general media or DRAM trace event to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair operation in response to the device request via the sysfs repair control. [ bp: Massage. ] Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250212143654.1893-15-shiju.jose@huawei.com |
||
![]() |
699ea5219c |
EDAC: Add a memory repair control feature
Add a generic EDAC memory repair control driver to manage memory repairs in the system, such as CXL Post Package Repair (PPR) and other soft and hard PPR features. For example, a CXL device with DRAM components that support PPR features may implement PPR maintenance operations. DRAM components may support two types of PPR: - hard PPR, for a permanent row repair, and - soft PPR, for a temporary row repair. Soft PPR is much faster than hard PPR, but the repair is lost with a power cycle. When a CXL device detects an error in a memory, it may report the need for a repair maintenance operation by using an event record where the "maintenance needed" flag is set. The event records contain the device physical address (DPA) and other optional attributes of the memory to repair. The kernel will report the corresponding CXL general media or DRAM trace event to userspace, and userspace tools (e.g. rasdaemon) will initiate a repair operation in response to the device request via the sysfs repair control. Device with memory repair features registers with EDAC device driver, which retrieves a memory repair descriptor from EDAC memory repair driver and exposes the sysfs repair control attributes to userspace in /sys/bus/edac/devices/<dev-name>/mem_repairX/. The common memory repair control interface abstracts the control of arbitrary memory repair functionality into a standardized set of functions. The sysfs memory repair attribute nodes are only available if the client driver has implemented the corresponding attribute callback function and provided operations to the EDAC device driver during registration. [ bp: Massage, fixup edac_dev_register() retvals, merge write_overflow fix to mem_repair_create_desc() ] Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250212143654.1893-5-shiju.jose@huawei.com |