2018-06-21 16:48:28 -07:00
|
|
|
PCIe Device AER statistics
|
2020-10-30 08:40:39 +01:00
|
|
|
--------------------------
|
|
|
|
|
2018-06-21 16:48:28 -07:00
|
|
|
These attributes show up under all the devices that are AER capable. These
|
|
|
|
statistical counters indicate the errors "as seen/reported by the device".
|
|
|
|
Note that this may mean that if an endpoint is causing problems, the AER
|
|
|
|
counters may increment at its link partner (e.g. root port) because the
|
|
|
|
errors may be "seen" / reported by the link partner and not the
|
|
|
|
problematic endpoint itself (which may report all counters as 0 as it never
|
|
|
|
saw any problems).
|
|
|
|
|
2019-06-14 14:52:15 -03:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_dev_correctable
|
2018-06-21 16:48:28 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:28 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: List of correctable errors seen and reported by this
|
|
|
|
PCI device using ERR_COR. Note that since multiple errors may
|
|
|
|
be reported using a single ERR_COR message, thus
|
|
|
|
TOTAL_ERR_COR at the end of the file may not match the actual
|
2020-10-30 08:40:39 +01:00
|
|
|
total of all the errors in the file. Sample output::
|
|
|
|
|
|
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
|
|
|
|
Receiver Error 2
|
|
|
|
Bad TLP 0
|
|
|
|
Bad DLLP 0
|
|
|
|
RELAY_NUM Rollover 0
|
|
|
|
Replay Timer Timeout 0
|
|
|
|
Advisory Non-Fatal 0
|
|
|
|
Corrected Internal Error 0
|
|
|
|
Header Log Overflow 0
|
|
|
|
TOTAL_ERR_COR 2
|
2018-06-21 16:48:28 -07:00
|
|
|
|
2019-06-14 14:52:15 -03:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_dev_fatal
|
2018-06-21 16:48:28 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:28 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: List of uncorrectable fatal errors seen and reported by this
|
|
|
|
PCI device using ERR_FATAL. Note that since multiple errors may
|
|
|
|
be reported using a single ERR_FATAL message, thus
|
|
|
|
TOTAL_ERR_FATAL at the end of the file may not match the actual
|
2020-10-30 08:40:39 +01:00
|
|
|
total of all the errors in the file. Sample output::
|
|
|
|
|
|
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
|
|
|
|
Undefined 0
|
|
|
|
Data Link Protocol 0
|
|
|
|
Surprise Down Error 0
|
|
|
|
Poisoned TLP 0
|
|
|
|
Flow Control Protocol 0
|
|
|
|
Completion Timeout 0
|
|
|
|
Completer Abort 0
|
|
|
|
Unexpected Completion 0
|
|
|
|
Receiver Overflow 0
|
|
|
|
Malformed TLP 0
|
|
|
|
ECRC 0
|
|
|
|
Unsupported Request 0
|
|
|
|
ACS Violation 0
|
|
|
|
Uncorrectable Internal Error 0
|
|
|
|
MC Blocked TLP 0
|
|
|
|
AtomicOp Egress Blocked 0
|
|
|
|
TLP Prefix Blocked Error 0
|
|
|
|
TOTAL_ERR_FATAL 0
|
2018-06-21 16:48:28 -07:00
|
|
|
|
2019-06-14 14:52:15 -03:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
|
2018-06-21 16:48:28 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:28 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: List of uncorrectable nonfatal errors seen and reported by this
|
|
|
|
PCI device using ERR_NONFATAL. Note that since multiple errors
|
|
|
|
may be reported using a single ERR_FATAL message, thus
|
|
|
|
TOTAL_ERR_NONFATAL at the end of the file may not match the
|
2020-10-30 08:40:39 +01:00
|
|
|
actual total of all the errors in the file. Sample output::
|
|
|
|
|
|
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
|
|
|
|
Undefined 0
|
|
|
|
Data Link Protocol 0
|
|
|
|
Surprise Down Error 0
|
|
|
|
Poisoned TLP 0
|
|
|
|
Flow Control Protocol 0
|
|
|
|
Completion Timeout 0
|
|
|
|
Completer Abort 0
|
|
|
|
Unexpected Completion 0
|
|
|
|
Receiver Overflow 0
|
|
|
|
Malformed TLP 0
|
|
|
|
ECRC 0
|
|
|
|
Unsupported Request 0
|
|
|
|
ACS Violation 0
|
|
|
|
Uncorrectable Internal Error 0
|
|
|
|
MC Blocked TLP 0
|
|
|
|
AtomicOp Egress Blocked 0
|
|
|
|
TLP Prefix Blocked Error 0
|
|
|
|
TOTAL_ERR_NONFATAL 0
|
2018-06-21 16:48:29 -07:00
|
|
|
|
|
|
|
PCIe Rootport AER statistics
|
2020-10-30 08:40:39 +01:00
|
|
|
----------------------------
|
|
|
|
|
2018-06-21 16:48:29 -07:00
|
|
|
These attributes show up under only the rootports (or root complex event
|
|
|
|
collectors) that are AER capable. These indicate the number of error messages as
|
|
|
|
"reported to" the rootport. Please note that the rootports also transmit
|
|
|
|
(internally) the ERR_* messages for errors seen by the internal rootport PCI
|
|
|
|
device, so these counters include them and are thus cumulative of all the error
|
|
|
|
messages on the PCI hierarchy originating at that root port.
|
|
|
|
|
2024-02-02 14:16:34 +01:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
|
2018-06-21 16:48:29 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:29 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: Total number of ERR_COR messages reported to rootport.
|
|
|
|
|
2024-02-02 14:16:34 +01:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
|
2018-06-21 16:48:29 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:29 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: Total number of ERR_FATAL messages reported to rootport.
|
|
|
|
|
2024-02-02 14:16:34 +01:00
|
|
|
What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
|
2018-06-21 16:48:29 -07:00
|
|
|
Date: July 2018
|
2024-02-02 14:16:35 +01:00
|
|
|
KernelVersion: 4.19.0
|
2018-06-21 16:48:29 -07:00
|
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
|
|
Description: Total number of ERR_NONFATAL messages reported to rootport.
|
2025-05-22 18:21:26 -05:00
|
|
|
|
|
|
|
PCIe AER ratelimits
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
These attributes show up under all the devices that are AER capable.
|
|
|
|
They represent configurable ratelimits of logs per error type.
|
|
|
|
|
|
|
|
See Documentation/PCI/pcieaer-howto.rst for more info on ratelimits.
|
|
|
|
|
|
|
|
What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_interval_ms
|
|
|
|
Date: May 2025
|
|
|
|
KernelVersion: 6.16.0
|
|
|
|
Contact: linux-pci@vger.kernel.org
|
|
|
|
Description: Writing 0 disables AER correctable error log ratelimiting.
|
|
|
|
Writing a positive value sets the ratelimit interval in ms.
|
|
|
|
Default is DEFAULT_RATELIMIT_INTERVAL (5000 ms).
|
|
|
|
|
|
|
|
What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_burst
|
|
|
|
Date: May 2025
|
|
|
|
KernelVersion: 6.16.0
|
|
|
|
Contact: linux-pci@vger.kernel.org
|
|
|
|
Description: Ratelimit burst for correctable error logs. Writing a value
|
|
|
|
changes the number of errors (burst) allowed per interval
|
|
|
|
before ratelimiting. Reading gets the current ratelimit
|
|
|
|
burst. Default is DEFAULT_RATELIMIT_BURST (10).
|
|
|
|
|
|
|
|
What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_interval_ms
|
|
|
|
Date: May 2025
|
|
|
|
KernelVersion: 6.16.0
|
|
|
|
Contact: linux-pci@vger.kernel.org
|
|
|
|
Description: Writing 0 disables AER non-fatal uncorrectable error log
|
|
|
|
ratelimiting. Writing a positive value sets the ratelimit
|
|
|
|
interval in ms. Default is DEFAULT_RATELIMIT_INTERVAL
|
|
|
|
(5000 ms).
|
|
|
|
|
|
|
|
What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_burst
|
|
|
|
Date: May 2025
|
|
|
|
KernelVersion: 6.16.0
|
|
|
|
Contact: linux-pci@vger.kernel.org
|
|
|
|
Description: Ratelimit burst for non-fatal uncorrectable error logs.
|
|
|
|
Writing a value changes the number of errors (burst)
|
|
|
|
allowed per interval before ratelimiting. Reading gets the
|
|
|
|
current ratelimit burst. Default is DEFAULT_RATELIMIT_BURST
|
|
|
|
(10).
|