mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00

[Why] Brightness programming may involve a conversion of a user requested brightness against what was in a custom brightness curve. The values might not match what a user programmed. [How] Add a new trace event to show specific converted brightness values. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250623171114.1156451-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
===============
 GPU Debugging
===============

General Debugging Options
=========================

The DebugFS section provides documentation on a number of files to aid in
debugging issues on the GPU.

GPUVM Debugging
===============

To aid in debugging GPU virtual memory related problems, the driver supports a
number of module parameters:

`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.

`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
the GPU.

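To check which values are in effect on a running system, the parameters can
be read back from sysfs. The sketch below assumes that both parameters are
exported read-only under ``/sys/module/amdgpu/parameters``, the usual location
for module parameters; the helper name ``read_module_params`` is hypothetical
and not part of any kernel tooling.

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/amdgpu/parameters"):
    """Return {name: int or None} for each requested module parameter.

    Assumption: each parameter is a small text file holding one integer.
    A value of None means the file was missing or could not be parsed,
    for example because the amdgpu module is not loaded.
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    params = read_module_params(["vm_fault_stop", "vm_update_mode"])
    for name, value in params.items():
        print(f"{name} = {value}")
```

The parameters themselves are normally set when the module is loaded, for
example with ``amdgpu.vm_fault_stop=1`` on the kernel command line.
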
Decoding a GPUVM Page Fault
===========================

If you see a GPU page fault in the kernel log, you can decode it to figure
out what is going wrong in your application. A page fault in your kernel
log may look something like this:

::

  [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
    in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2)
  VM_L2_PROTECTION_FAULT_STATUS:0x00301030
          Faulty UTCL2 client ID: TCP (0x8)
          MORE_FAULTS: 0x0
          WALKER_ERROR: 0x0
          PERMISSION_FAULTS: 0x3
          MAPPING_ERROR: 0x0
          RW: 0x0

The first field identifies the memory hub, either gfxhub or mmhub. gfxhub is
the memory hub used for graphics, compute, and SDMA on some chips. mmhub is
the memory hub used for multimedia and SDMA on some chips.

Next come the vmid and pasid. If the vmid is 0, the fault was likely
caused by the kernel driver or firmware. If the vmid is non-0, it is generally
a fault in a user application. The pasid is used to link a vmid to a system
process ID. If the process is active when the fault happens, the process
information will be printed.

The GPU virtual address that caused the fault comes next.

The client ID indicates the GPU block that caused the fault.
Some common client IDs:

- CB/DB: The color/depth backend of the graphics pipe
- CPF: Command Processor Frontend
- CPC: Command Processor Compute
- CPG: Command Processor Graphics
- TCP/SQC/SQG: Shaders
- SDMA: SDMA engines
- VCN: Video encode/decode engines
- JPEG: JPEG engines

PERMISSION_FAULTS describes which faults were encountered:

- bit 0: the PTE was not valid
- bit 1: the PTE read bit was not set
- bit 2: the PTE write bit was not set
- bit 3: the PTE execute bit was not set

Finally, RW indicates whether the access was a read (0) or a write (1).

In the example above, a shader (client ID = TCP) generated a read (RW = 0x0) to
an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
0x0000800102800000. The user can then inspect their shader code and resource
descriptor state to determine what caused the GPU page fault.

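The PERMISSION_FAULTS and RW decoding described above can be sketched as a
small script. This is only an illustration: it assumes the fields appear in
the log exactly as in the sample, and the names ``decode_fault`` and
``PERMISSION_FAULT_BITS`` are hypothetical, not part of the driver or of any
existing tool.

```python
import re

# Bit meanings as listed in this document.
PERMISSION_FAULT_BITS = {
    0: "the PTE was not valid",
    1: "the PTE read bit was not set",
    2: "the PTE write bit was not set",
    3: "the PTE execute bit was not set",
}

# Matches e.g. "PERMISSION_FAULTS: 0x3" or "RW: 0x0" anywhere in a log line.
FIELD_RE = re.compile(r"\b(PERMISSION_FAULTS|RW):\s*(0x[0-9a-fA-F]+)")


def decode_fault(log_text):
    """Return (access, reasons) for the fault fields found in log_text."""
    fields = {}
    for name, value in FIELD_RE.findall(log_text):
        fields[name] = int(value, 16)
    perm = fields.get("PERMISSION_FAULTS", 0)
    reasons = [
        text for bit, text in PERMISSION_FAULT_BITS.items() if perm & (1 << bit)
    ]
    access = "write" if fields.get("RW", 0) else "read"
    return access, reasons


if __name__ == "__main__":
    sample = """
    VM_L2_PROTECTION_FAULT_STATUS:0x00301030
            PERMISSION_FAULTS: 0x3
            RW: 0x0
    """
    access, reasons = decode_fault(sample)
    print(f"access: {access}")
    for reason in reasons:
        print(f"  - {reason}")
```

For the sample fault, this reports a read access with bits 0 and 1 set, which
matches the interpretation given above.
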
UMR
===

`umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose
GPU debugging and diagnostics tool. Please see the umr
`documentation <https://umr.readthedocs.io/en/main/>`_ for more information
about its capabilities.

Debugging backlight brightness
==============================
Default backlight brightness is intended to be set via the policy advertised
by the firmware. Firmware will often provide different defaults for AC and DC
power. Furthermore, some userspace software will save the backlight brightness
from the previous boot and attempt to restore it.

Some firmware also has support for a feature called "Custom Backlight Curves"
where an input value for brightness is mapped along a linearly interpolated
curve of brightness values that better match display characteristics.

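To illustrate why a converted value can differ from the requested one, the
sketch below maps an input along a piecewise-linear curve. The curve points
and the function name ``map_brightness`` are invented for illustration; this
is not the driver's implementation, and real curves come from the firmware.

```python
from bisect import bisect_right


def map_brightness(value, curve):
    """Map value along a piecewise-linear curve.

    curve is a list of (input, output) points sorted by strictly
    increasing input. Inputs outside the curve are clamped to the
    first or last output.
    """
    xs = [x for x, _ in curve]
    if value <= xs[0]:
        return curve[0][1]
    if value >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, value)  # index of first point with input > value
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical curve: dim range is compressed, bright range is expanded.
    curve = [(0, 0), (50, 20), (100, 100)]
    for requested in (25, 50, 75):
        print(requested, "->", map_brightness(requested, curve))
```

With this hypothetical curve, a request of 25 is converted to 10, so the value
that reaches the hardware does not match the value the user requested.
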
If there are problems with the backlight, a trace event can be enabled at
bootup to log every brightness change request. This can help isolate where
the problem is. To enable the trace event, add the following to the kernel
command line::

  tp_printk trace_event=amdgpu_dm:amdgpu_dm_brightness:mod:amdgpu trace_buf_size=1M