linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-08-19 13:50:48 +00:00

Author	SHA1	Message	Date
Ben Skeggs	0c6aa94f99	drm/nouveau/gsp: switch to a simpler GSP-RM header layout Rather than using OpenRM's directory structure for headers, move to a layout that's split roughly around RM API boundaries. Also move the headers from include/nvrm to subdev/gsp/rm/r535/nvrm, with the rest of the r535-specific code. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	c472d82834	drm/nouveau/gsp: move subdev/engine impls to subdev/gsp/rm/r535/ Move all the remaining GSP-RM code together underneath a versioned path, to make the code easier to work with when adding support for a newer RM version. Aside from adjusting include paths, no code change is intended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	594766ca3e	drm/nouveau/gsp: move booter handling to GPU-specific code GH100/GBxxx have significant changes to the GSP-RM boot process. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	7f022236b5	drm/nouveau/gsp: move firmware loading to GPU-specific code GH100/GBxxx use a slightly different set of firmwares to boot GSP-RM. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	f964336483	drm/nouveau/gsp: split device handling out on its own Split handling of NV01_DEVICE (and other related objects) out into its own module. Aside from moving the function pointers, no code change is intended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	45a78c6405	drm/nouveau/gsp: split client handling out on its own Split NV01_ROOT handling out into its own module. Aside from moving the function pointers, no code change is intended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	be33f49980	drm/nouveau/gsp: split rm alloc handling out on its own Split base RM_ALLOC handling out into its own module. Aside from moving the function pointers, no code change is intended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	063d193f12	drm/nouveau/gsp: split rm ctrl handling out on its own Split base RM_CONTROL handling out into its own module. Aside from moving the function pointers, no code change is intended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	8a8b1ec526	drm/nouveau/gsp: split rpc handling out on its own Later patches in the series add HALs around various RM APIs in order to support a newer version of GSP-RM firmware. In order to do this, begin by splitting the code up into "modules" that roughly represent RM's API boundaries so they can be more easily managed. Aside from moving the RPC function pointers, no code change is indended. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	b8a90901db	drm/nouveau/gsp: remove gsp-specific chid allocation path In order to specify a channel ID to RM during channel allocation, the channel ID is broken down into a "userd page" index and an index into that page. It was assumed that RM would enforce that the same physical block of memory be used for all CHIDs within a "userd page", and the GSP paths override NVKM's normal CHID allocation to handle this. However, none of that turns out to be necessary. Remove the GSP-specific code and use the regular CHID allocation path. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:22 +10:00
Ben Skeggs	7904bcdcf6	drm/nouveau/gsp: fix rm shutdown wait condition Though the initial upstreamed GSP-RM version in nouveau was 535.113.01, the code was developed against earlier versions. 535.42.02 modified the mailbox value used by GSP-RM to signal shutdown has completed, which was missed at the time. I'm not aware of any issues caused by this, but noticed the bug while working on GB20x support. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:22 +10:00
Zhi Wang	a738fa9105	drm/nouveau/nvkm: introduce new GSP reply policy NVKM_GSP_RPC_REPLY_POLL Some GSP RPC commands need a new reply policy: "caller don't care about the message content but want to make sure a reply is received". To support this case, a new reply policy is introduced. NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the dump of the GSP message queue. After the large GSP RPC command is issued, GSP will write only an empty RPC header in the queue as the reply. Without this change, the policy "receiving the entire message" is used for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving the returned GSP message in the suspend/resume path. Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for the returned GSP message but discards it for the caller. Use the new policy NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. Fixes: `50f290053d` ("drm/nouveau: support handling the return of large GSP message") Cc: Danilo Krummrich <dakr@kernel.org> Cc: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com	2025-03-09 13:42:10 +01:00
Zhi Wang	4570355f8e	drm/nouveau/nvkm: factor out current GSP RPC command policies There can be multiple cases of handling the GSP RPC messages, which are the reply of GSP RPC commands according to the requirement of the callers and the nature of the GSP RPC commands. The current supported reply policies are "callers don't care" and "receive the entire message" according to the requirement of the callers. To introduce a new policy, factor out the current RPC command reply polices. Also, centralize the handling of the reply in a single function. Factor out NVKM_GSP_RPC_REPLY_NOWAIT as "callers don't care" and NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". Introduce a kernel doc to document the policies. Factor out r535_gsp_rpc_handle_reply(). No functional change is intended for small GSP RPC commands. For large GSP commands, the caller decides the policy of how to handle the returned GSP RPC message. Cc: Ben Skeggs <bskeggs@nvidia.com> Cc: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-2-zhiw@nvidia.com	2025-03-09 13:41:59 +01:00
Dave Airlie	fb51bf0255	Linux 6.14-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAme7hfkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGU+IH/1bk6zIvAwXXS5yu KNsQ8dEkC3Xme6HqLtPsAhRLF+5YJf6MaGm1ip5dDMyIvasa2gwvCQQQoOpeMbKj 79VKT+m9t3szMHZaQYjlOuYHBmNSJ4cMCD2Qh6ktXHGPfTTWDFGf7fBwBOkVNeJU 1Ask+bxeop21aJMhfYXrUta3OYyerLBUR6jCiCM82A/GLtdv6oNGXBu3ygDt9Tjx ZHSl+CYjKpmGUP8JnMKwCBHVguEfqgzZ//dY1H16AvOLed9k2jkMFn8O5Vi3vjnx TWMMXoiJimuamGzbjxtCCqzxNlFFDT4gRpDqeJxb16W/gDTFmbRr9LDjNehCZe33 AigLZ6M= =Y/7F -----END PGP SIGNATURE----- Merge tag 'v6.14-rc4' into drm-next Backmerge Linux 6.14-rc4 at the request of tzimmermann so misc-next can base on rc4. Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-02-25 17:36:09 +10:00
Dan Carpenter	fdee05235a	drm/nouveau: Fix error pointer dereference in r535_gsp_msgq_recv() If "rpc" is an error pointer then return directly. Otherwise it leads to an error pointer dereference. Fixes: `50f290053d` ("drm/nouveau: support handling the return of large GSP message") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/b7052ac0-98e4-433b-ad58-f563bf51858c@stanley.mountain	2025-02-19 14:49:03 +01:00
Aaron Kling	3dbc0215e3	drm/nouveau/pmu: Fix gp10b firmware guard Most kernel configs enable multiple Tegra SoC generations, causing this typo to go unnoticed. But in the case where a kernel config is strictly for Tegra186, this is a problem. Fixes: `989863d7cb` ("drm/nouveau/pmu: select implementation based on available firmware") Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250218-nouveau-gm10b-guard-v2-1-a4de71500d48@gmail.com	2025-02-19 13:31:59 +01:00
Zhi Wang	24079ed2aa	drm/nouveau: consume the return of large GSP message As the GSP message recv path is able to handle the return of large GSP message, consume the return of large GSP message in the sending path. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-16-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	50f290053d	drm/nouveau: support handling the return of large GSP message The max GSP message element size is 16 pages (including the headers). To send a message larger than 16 pages, nvkm should split it into multiple and send them accordingly. The first element has the expected function number, while the rest are sent with function number as NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD. GSP consumes the elements from the cmdq and always writes the result back to the msgq. The result is also formed as split elements. However, nvkm is able to split the large GSP message and send them, but totally not aware of handling the return of the large GSP message, which are the split elements in the msgq. Thus, it keeps dumping the unknown RPC messages from msgq, which is actually CONTINUATION_RECORD message, discard them unexpectedly. Thus, the caller will not be able to consume the result from GSP. Introduce the handling of the return of large GSP message on the msgq path. Slightly re-factor the low-level part of msg receiving routines. Merge the split elements back into a large element before handling it to the upper level. Thus, the upper-level of GSP RPC APIs don't need to be heavily changed. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-15-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	3c48ecb38a	drm/nouveau: factor out r535_gsp_msgq_recv_one_elem() Prepare for supporting receive the large GSP RPC message. Factor out r535_gsp_msgq_recv_one_elem(). Fold its params into a data structure of params. Move the allocation of the GSP RPC message to its caller. Refine the variable names in the re-factor. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-14-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	c965e3598b	drm/nouveau: factor out r535_gsp_msgq_peek() To receive a GSP message queue element from the GSP status queue, the driver needs to make sure there are available elements in the queue. The previous r535_gsp_msgq_wait() consists of three functions, which is a little too complicated for a single function: - wait for an available element. - peek the message element header in the queue. - recevice the element from the queue. Factor out r535_gsp_msgq_peek() and divide the functions in r535_gsp_msgq_wait() into three functions. No functional change is intended. Cc: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-13-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	1829ee0b05	drm/nouveau: rename the variable "cmd" to "msg" in r535_gsp_cmdq_{get, push}() Refine the name to align with the terms in the kernel doc. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-12-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	4624450452	drm/nouveau: refine the variable names in r535_gsp_msg_recv() The variable "msg" in r535_gsp_msg_recv() actually means the GSP RPC. Refine the names to align with the terms in the kernel doc. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-11-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	0268040b9c	drm/nouveau: refine the variable names in r535_gsp_rpc_push() The variable names in r535_gsp_rpc_push() are quite confusing and some of them are not representing what they really are. Update the names and explanations in the decoder section of the kernel doc. Refine the names to align with the terms in the kernel doc. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-10-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	1bb9bb50a4	drm/nouveau: remove the magic number in r535_gsp_rpc_push() There has been a GSP_MSG_MAX_SIZE which represents the max size of a GSP message element header. Use it instead of a magic number. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-9-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	bbae6680cf	drm/nouveau: fix the broken marco GSP_MSG_MAX_SIZE The macro GSP_MSG_MAX_SIZE refers to another macro that doesn't exist. It represents the max GSP message element size. Fix the broken marco so it can be used to replace some magic numbers in the code. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-8-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	bda6fe811f	drm/nouveau: rename "argc" to what it represents in GSP RPC routines The name "argc" has different meanings in different functions. To improve the readability, it's better to refine it to a name that reflects what it represents. Rename "argc" to what it represents. Add terms in the decoder section to explain their meaning. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> [ Fix indentation. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-7-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	0c2f211b66	drm/nouveau: rename "argv" to what it represents in rm_{alloc, ctrl}_() The name "argv" has different meanings in different functions. To improve the readability, it's better to refine it to a name that reflects what it represents. Rename "argv" to what it represents. Wrap the long container_of() into to_payload_header() to denote a clear meaning and make checkpatch.pl happy. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-6-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	a15b537976	drm/nouveau: remove unused param repc in rm_alloc_push() The user of rm_alloc_push() always pass 0 in repc. Remove unused param repc since no user actually uses it. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-5-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	2c6a79af3f	drm/nouveau: rename "argv" to what it represents on the GSP message send path The name "argv" has different meanings in different functions. To improve the readability, it's better to refine it to a name that reflects what it represents. Rename "repc" to what it represents in the GSP message send path. Wrap the long container_of() into to_gsp_hdr() to make checkpatch.pl happy. No functional change is intended. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-4-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	f98ed88eb9	drm/nouveau: rename "repc" to "gsp_rpc_len" on the GSP message recv path The name "repc" has different meanings in different contexts. To improve the readability, it's better to refine it to a name that reflects what it actually represents. Rename "repc" to "gsp_rpc_len" in the GSP message recv path. Add an section in the doc to explain the terms. No functional change is intended. Cc: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-3-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Zhi Wang	22807d30fa	drm/nouveau: add a kernel doc to introduce the GSP RPC In order to explain the name clean-ups in GSP RPC routines, a kernel doc to explain the memory layout and terms is required. Add a kernel doc to introduce the GSP RPC. Cc: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Zhi Wang <zhiw@nvidia.com> [ Fix bullet list indentation; add SPDX-License-Identifier. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-2-zhiw@nvidia.com	2025-01-25 00:55:10 +01:00
Timur Tabi	97395ce76e	drm/nouveau: fix kernel-doc comments Fix some malformed kernel-doc comments that were added in a recent commit. Also, kernel-doc does not support global variables, so change those kernel-doc comments into regular comments. Fixes: `214c9539cf` ("drm/nouveau: expose GSP-RM logging buffers via debugfs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412310834.jtCJj4oz-lkp@intel.com/ Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250108234329.842256-1-ttabi@nvidia.com	2025-01-09 20:06:33 +01:00
Timur Tabi	214c9539cf	drm/nouveau: expose GSP-RM logging buffers via debugfs The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers that have printf-like logs from GSP-RM and PMU encoded in them. LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA addresses are passed to GSP-RM during initialization. The buffers are required for GSP-RM to initialize properly. LOGPMU is also allocated by Nouveau, but its contents are updated when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from GSP-RM. Nouveau then copies the RPC to the buffer. The messages are encoded as an array of variable-length structures that contain the parameters to an NV_PRINTF call. The format string and parameter count are stored in a special ELF image that contains only logging strings. This image is not currently shipped with the Nvidia driver. There are two methods to extract the logs. OpenRM tries to load the logging ELF, and if present, parses the log buffers in real time and outputs the strings to the kernel console. Alternatively, and this is the method used by this patch, the buffers can be exposed to user space, and a user-space tool (along with the logging ELF image) can parse the buffer and dump the logs. This method has the advantage that it allows the buffers to be parsed even when the logging ELF file is not available to the user. However, it has the disadvantage the debugfs entries need to remain until the driver is unloaded. The buffers are exposed via debugfs. If GSP-RM fails to initialize, then Nouveau immediately shuts down the GSP interface. This would normally also deallocate the logging buffers, thereby preventing the user from capturing the debug logs. To avoid this, introduce the keep-gsp-logging command line parameter. If specified, and if at least one logging buffer has content, then Nouveau will migrate these buffers into new debugfs entries that are retained until the driver unloads. An end-user can capture the logs using the following commands: cp /sys/kernel/debug/nouveau/<path>/loginit loginit cp /sys/kernel/debug/nouveau/<path>/logrm logrm cp /sys/kernel/debug/nouveau/<path>/logintr logintr cp /sys/kernel/debug/nouveau/<path>/logpmu logpmu where (for a PCI device) <path> is the PCI ID of the GPU (e.g. 0000:65:00.0). Since LOGPMU is not needed for normal GSP-RM operation, it is only created if debugfs is available. Otherwise, the NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored. A simple way to test the buffer migration feature is to have nvkm_gsp_init() return an error code. Tested-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-2-ttabi@nvidia.com	2024-12-04 21:47:53 +01:00
Timur Tabi	7c995e2fd9	drm/nouveau: retain device pointer in nvkm_gsp_mem object Store the struct device pointer used to allocate the DMA buffer in the nvkm_gsp_mem object. This allows nvkm_gsp_mem_dtor() to release the buffer without needing the nvkm_gsp. This is needed so that we can retain DMA buffers even after the nvkm_gsp object is deleted. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-1-ttabi@nvidia.com	2024-12-04 21:47:47 +01:00
Maxime Ripard	3aba2eba84	Merge drm/drm-next into drm-misc-next Kickstart 6.14 cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-12-02 12:44:18 +01:00
Zhi Wang	01ed662bdd	nvkm: correctly calculate the available space of the GSP cmdq buffer r535_gsp_cmdq_push() waits for the available page in the GSP cmdq buffer when handling a large RPC request. When it sees at least one available page in the cmdq, it quits the waiting with the amount of free buffer pages in the queue. Unfortunately, it always takes the [write pointer, buf_size) as available buffer pages before rolling back and wrongly calculates the size of the data should be copied. Thus, it can overwrite the RPC request that GSP is currently reading, which causes GSP hang due to corrupted RPC request: [ 549.209389] ------------[ cut here ]------------ [ 549.214010] WARNING: CPU: 8 PID: 6314 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:116 r535_gsp_msgq_wait+0xd0/0x190 [nvkm] [ 549.225678] Modules linked in: nvkm(E+) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) mlx5_ib(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) mxm_wmi(E) ipmi_devintf(E) rapl(E) i2c_piix4(E) wmi_bmof(E) joydev(E) ptdma(E) acpi_cpufreq(E) k10temp(E) pcspkr(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) crct10dif_pclmul(E) drm_shmem_helper(E) nvme_tcp(E) crc32_pclmul(E) ahci(E) drm_kms_helper(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) nvme(E) cdc_ether(E) mlx5_core(E) nvme_core(E) usbnet(E) drm(E) libata(E) ccp(E) ghash_clmulni_intel(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E) [ 549.225752] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)] [ 549.326293] CPU: 8 PID: 6314 Comm: insmod Tainted: G E 6.9.0-rc6+ #1 [ 549.334039] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022 [ 549.341781] RIP: 0010:r535_gsp_msgq_wait+0xd0/0x190 [nvkm] [ 549.347343] Code: 08 00 00 89 da c1 e2 0c 48 8d ac 11 00 10 00 00 48 8b 0c 24 48 85 c9 74 1f c1 e0 0c 4c 8d 6d 30 83 e8 30 89 01 e9 68 ff ff ff <0f> 0b 49 c7 c5 92 ff ff ff e9 5a ff ff ff ba ff ff ff ff be c0 0c [ 549.366090] RSP: 0018:ffffacbccaaeb7d0 EFLAGS: 00010246 [ 549.371315] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000923e28 [ 549.378451] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffacbccaaeb730 [ 549.385590] RBP: 0000000000000001 R08: ffff8bd14d235f70 R09: ffff8bd14d235f70 [ 549.392721] R10: 0000000000000002 R11: ffff8bd14d233864 R12: 0000000000000020 [ 549.399854] R13: ffffacbccaaeb818 R14: 0000000000000020 R15: ffff8bb298c67000 [ 549.406988] FS: 00007f5179244740(0000) GS:ffff8bd14d200000(0000) knlGS:0000000000000000 [ 549.415076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 549.420829] CR2: 00007fa844000010 CR3: 00000001567dc005 CR4: 0000000000770ef0 [ 549.427963] PKRU: 55555554 [ 549.430672] Call Trace: [ 549.433126] <TASK> [ 549.435233] ? __warn+0x7f/0x130 [ 549.438473] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm] [ 549.443426] ? report_bug+0x18a/0x1a0 [ 549.447098] ? handle_bug+0x3c/0x70 [ 549.450589] ? exc_invalid_op+0x14/0x70 [ 549.454430] ? asm_exc_invalid_op+0x16/0x20 [ 549.458619] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm] [ 549.463565] r535_gsp_msg_recv+0x46/0x230 [nvkm] [ 549.468257] r535_gsp_rpc_push+0x106/0x160 [nvkm] [ 549.473033] r535_gsp_rpc_rm_ctrl_push+0x40/0x130 [nvkm] [ 549.478422] nvidia_grid_init_vgpu_types+0xbc/0xe0 [nvkm] [ 549.483899] nvidia_grid_init+0xb1/0xd0 [nvkm] [ 549.488420] ? srso_alias_return_thunk+0x5/0xfbef5 [ 549.493213] nvkm_device_pci_probe+0x305/0x420 [nvkm] [ 549.498338] local_pci_probe+0x46/0xa0 [ 549.502096] pci_call_probe+0x56/0x170 [ 549.505851] pci_device_probe+0x79/0xf0 [ 549.509690] ? driver_sysfs_add+0x59/0xc0 [ 549.513702] really_probe+0xd9/0x380 [ 549.517282] __driver_probe_device+0x78/0x150 [ 549.521640] driver_probe_device+0x1e/0x90 [ 549.525746] __driver_attach+0xd2/0x1c0 [ 549.529594] ? __pfx___driver_attach+0x10/0x10 [ 549.534045] bus_for_each_dev+0x78/0xd0 [ 549.537893] bus_add_driver+0x112/0x210 [ 549.541750] driver_register+0x5c/0x120 [ 549.545596] ? __pfx_nvkm_init+0x10/0x10 [nvkm] [ 549.550224] do_one_initcall+0x44/0x300 [ 549.554063] ? do_init_module+0x23/0x240 [ 549.557989] do_init_module+0x64/0x240 Calculate the available buffer page before rolling back based on the result from the waiting. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-3-zhiw@nvidia.com	2024-11-25 17:36:07 +01:00
Zhi Wang	8d9beb4aeb	nvkm/gsp: correctly advance the read pointer of GSP message queue A GSP event message consists three parts: message header, RPC header, message body. GSP calculates the number of pages to write from the total size of a GSP message. This behavior can be observed from the movement of the write pointer. However, nvkm takes only the size of RPC header and message body as the message size when advancing the read pointer. When handling a two-page GSP message in the non rollback case, It wrongly takes the message body of the previous message as the message header of the next message. As the "message length" tends to be zero, in the calculation of size needs to be copied (0 - size of (message header)), the size needs to be copied will be "0xffffffxx". It also triggers a kernel panic due to a NULL pointer error. [ 547.614102] msg: 00000f90: ff ff ff ff ff ff ff ff 40 d7 18 fb 8b 00 00 00 ........@....... [ 547.622533] msg: 00000fa0: 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 ................ [ 547.630965] msg: 00000fb0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff ................ [ 547.639397] msg: 00000fc0: ff ff ff ff 00 00 00 00 ff ff ff ff ff ff ff ff ................ [ 547.647832] nvkm 0000:c1:00.0: gsp: peek msg rpc fn:0 len:0x0/0xffffffffffffffe0 [ 547.655225] nvkm 0000:c1:00.0: gsp: get msg rpc fn:0 len:0x0/0xffffffffffffffe0 [ 547.662532] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 547.669485] #PF: supervisor read access in kernel mode [ 547.674624] #PF: error_code(0x0000) - not-present page [ 547.679755] PGD 0 P4D 0 [ 547.682294] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 547.686643] CPU: 22 PID: 322 Comm: kworker/22:1 Tainted: G E 6.9.0-rc6+ #1 [ 547.694893] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022 [ 547.702626] Workqueue: events r535_gsp_msgq_work [nvkm] [ 547.707921] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm] [ 547.713375] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05 [ 547.732119] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203 [ 547.737335] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f [ 547.744461] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010 [ 547.751585] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0 [ 547.758707] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000 [ 547.765834] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000 [ 547.772958] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000 [ 547.781035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 547.786771] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0 [ 547.793896] PKRU: 55555554 [ 547.796600] Call Trace: [ 547.799046] <TASK> [ 547.801152] ? __die+0x20/0x70 [ 547.804211] ? page_fault_oops+0x75/0x170 [ 547.808221] ? print_hex_dump+0x100/0x160 [ 547.812226] ? exc_page_fault+0x64/0x150 [ 547.816152] ? asm_exc_page_fault+0x22/0x30 [ 547.820341] ? r535_gsp_msg_recv+0x87/0x230 [nvkm] [ 547.825184] r535_gsp_msgq_work+0x42/0x50 [nvkm] [ 547.829845] process_one_work+0x196/0x3d0 [ 547.833861] worker_thread+0x2fc/0x410 [ 547.837613] ? __pfx_worker_thread+0x10/0x10 [ 547.841885] kthread+0xdf/0x110 [ 547.845031] ? __pfx_kthread+0x10/0x10 [ 547.848775] ret_from_fork+0x30/0x50 [ 547.852354] ? __pfx_kthread+0x10/0x10 [ 547.856097] ret_from_fork_asm+0x1a/0x30 [ 547.860019] </TASK> [ 547.862208] Modules linked in: nvkm(E) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) mlx5_ib(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) mxm_wmi(E) joydev(E) rapl(E) ptdma(E) i2c_piix4(E) acpi_cpufreq(E) wmi_bmof(E) pcspkr(E) k10temp(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) drm_shmem_helper(E) crct10dif_pclmul(E) drm_kms_helper(E) ahci(E) crc32_pclmul(E) nvme_tcp(E) libahci(E) nvme(E) crc32c_intel(E) nvme_fabrics(E) cdc_ether(E) nvme_core(E) usbnet(E) mlx5_core(E) ghash_clmulni_intel(E) drm(E) libata(E) ccp(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E) [ 547.862283] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)] [ 547.962691] CR2: 0000000000000020 [ 547.966003] ---[ end trace 0000000000000000 ]--- [ 549.012012] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1370499158 wd_nsec: 1370498904 [ 549.043676] pstore: backend (erst) writing error (-28) [ 549.050924] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm] [ 549.056389] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05 [ 549.075138] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203 [ 549.080361] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f [ 549.087484] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010 [ 549.094609] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0 [ 549.101733] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000 [ 549.108857] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000 [ 549.115982] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000 [ 549.124061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 549.129807] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0 [ 549.136940] PKRU: 55555554 [ 549.139653] Kernel panic - not syncing: Fatal exception [ 549.145054] Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 549.165074] ---[ end Kernel panic - not syncing: Fatal exception ]--- Also, nvkm wrongly advances the read pointer when handling a two-page GSP message in the rollback case. In the rollback case, the GSP message will be copied in two rounds. When handling a two-page GSP message, nvkm first copies amount of (GSP_PAGE_SIZE - header) data into the buffer, then advances the read pointer by the result of DIV_ROUND_UP(size, GSP_PAGE_SIZE). Thus, the read pointer is advanced by 1. Next, nvkm copies the amount of (total size - (GSP_PAGE_SIZE - header)) data into the buffer. The left amount of the data will be always larger than one page since the message header is not taken into account in the first copy. Thus, the read pointer is advanced by DIV_ROUND_UP( size(larger than one page), GSP_PAGE_SIZE) = 2. In the end, the read pointer is wrongly advanced by 3 when handling a two-page GSP message in the rollback case. Fix the problems by taking the total size of the message into account when advancing the read pointer and calculate the read pointer in the end of the all copies for the rollback case. BTW: the two-page GSP message can be observed in the msgq when vGPU is enabled. Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-2-zhiw@nvidia.com	2024-11-25 17:36:06 +01:00
Linus Torvalds	28eb75e178	drm for 6.13-rc1 core: - split DSC helpers from DP helpers - clang build fixes for drm/mm test - drop simple pipeline support for gem vram - document submission error signaling - move drm_rect to drm core module from kms helper - add default client setup to most drivers - move to video aperture helpers instead of drm ones tests: - new framebuffer tests ttm: - remove swapped and pinned BOs from TTM lru panic: - fix uninit spinlock - add ABGR2101010 support bridge: - add TI TDP158 support - use standard PM OPS dma-fence: - use read_trylock instead of read_lock to help lockdep scheduler: - add errno to sched start to report different errors - add locking to drm_sched_entity_modify_sched - improve documentation xe: - add drm_line_printer - lots of refactoring - Enable Xe2 + PES disaggregation - add new ARL PCI ID - SRIOV development work - fix exec unnecessary implicit fence - define and parse OA sync props - forcewake refactoring i915: - Enable BMG/LNL ultra joiner - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+ - use DSB for plane/color mgmt - Arrow lake PCI IDs - lots of i915/xe display refactoring - enable PXP GuC autoteardown - Pantherlake (PTL) Xe3 LPD display enablement - Allow fastset HDR infoframe changes - write DP source OUI for non-eDP sinks - share PCI IDs between i915 and xe amdgpu: - SDMA queue reset support - SMU 13.0.6, JPEG 4.0.3 updates - Initial runtime repartitioning support - rework IP structs for multiple IP instances - Fetch EDID from _DDC if available - SMU13 zero rpm user control - lots of fixes/cleanups amdkfd: - Increase event FIFO size - add topology cap flag for per queue reset msm: - DPU: - SA8775P support - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable large framebuffer support - Drop MSM8998 and SDM845 - DP: - SA8775P support - GPU: - a7xx preemption support - Adreno A663 support ast: - warn about unsupported TX chips ivpu: - add coredump - add pantherlake support rockchip: - 4K@60Hz display enablement - generate pll programming tables panthor: - add timestamp query API - add realtime group priority - add fdinfo support etnaviv: - improve handling of DMA address limits - improve GPU hangcheck exynos: - Decon Exynos7870 support mediatek: - add OF graph support omap: - locking fixes bochs: - convert to gem/shmem from simpledrm v3d: - support big/super pages - add gemfs vc4: - BCM2712 support refactoring - add YUV444 format support udmabuf: - folio related fixes nouveau: - add panic support on nv50+ -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmc+efwACgkQDHTzWXnE hr6Dyg/9HVVI3lxuWAz9MEt3w+BON5KTJAxg5Zhvc5DwiUbDXghu8sfkUfanDWS5 /MqyPqLt5srXrtKTRDnzEI0Vf8YHeiDEcaydjpshEpCfteHZ7SADpvem8fp6/otV iYt8U6tMcGe9I+M2kwDkOTrKJIiyCKPi5hfBIAkxEAh6806ifPRtLkeMGbaSwBxH x6kZTE9ygGWAY7bAgbmVmm3JwrXG9mYDl9dW3cbi9gZ6PGAXHPZRUPvZoHhvfC2A UVgROH76Spm4rdWYGI3azj+gW3HsdGgUHcysb+lu37i261E+sT7kuV2UYtnOMzr5 igO1RlQ+rcfPYLG4n+oNXDMu5d1OQXELrlQzXptym4Konpd7b/GSeVctWV0wHWuv nG8g7DWAFFnLAdeWqLZpf1Brze33h5+572D3BioWB4LYSEATjwoTwcBKsdRuc4Wk RHxjumCidybTdo/8EB1ElGlH39m/mDQA0scMlVhS/BuiIssfgcBRfltI8S3HzHcW YQYq6xH7F9E3shs3/TYbWR4clm66ZTnZV6ClDfGJolzyF/hbV0rsbeSpDelpooE8 1Js7KuwVa+HvA4jtupY9vqxMTdXWwoGPfuUgKpOAreYibnd1T9Q1zVme/B1bUH05 518IjiMGCxDnBvFWaPT9DcX4zg7pS3yzjw3hGkdz3reUqat0Gy8= =8cUI -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "There's a lot of rework, the panic helper support is being added to more drivers, v3d gets support for HW superpages, scheduler documentation, drm client and video aperture reworks, some new MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel has some Pantherlake enablement and xe is getting some SRIOV bits, but just lots of stuff everywhere. core: - split DSC helpers from DP helpers - clang build fixes for drm/mm test - drop simple pipeline support for gem vram - document submission error signaling - move drm_rect to drm core module from kms helper - add default client setup to most drivers - move to video aperture helpers instead of drm ones tests: - new framebuffer tests ttm: - remove swapped and pinned BOs from TTM lru panic: - fix uninit spinlock - add ABGR2101010 support bridge: - add TI TDP158 support - use standard PM OPS dma-fence: - use read_trylock instead of read_lock to help lockdep scheduler: - add errno to sched start to report different errors - add locking to drm_sched_entity_modify_sched - improve documentation xe: - add drm_line_printer - lots of refactoring - Enable Xe2 + PES disaggregation - add new ARL PCI ID - SRIOV development work - fix exec unnecessary implicit fence - define and parse OA sync props - forcewake refactoring i915: - Enable BMG/LNL ultra joiner - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+ - use DSB for plane/color mgmt - Arrow lake PCI IDs - lots of i915/xe display refactoring - enable PXP GuC autoteardown - Pantherlake (PTL) Xe3 LPD display enablement - Allow fastset HDR infoframe changes - write DP source OUI for non-eDP sinks - share PCI IDs between i915 and xe amdgpu: - SDMA queue reset support - SMU 13.0.6, JPEG 4.0.3 updates - Initial runtime repartitioning support - rework IP structs for multiple IP instances - Fetch EDID from _DDC if available - SMU13 zero rpm user control - lots of fixes/cleanups amdkfd: - Increase event FIFO size - add topology cap flag for per queue reset msm: - DPU: - SA8775P support - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable large framebuffer support - Drop MSM8998 and SDM845 - DP: - SA8775P support - GPU: - a7xx preemption support - Adreno A663 support ast: - warn about unsupported TX chips ivpu: - add coredump - add pantherlake support rockchip: - 4K@60Hz display enablement - generate pll programming tables panthor: - add timestamp query API - add realtime group priority - add fdinfo support etnaviv: - improve handling of DMA address limits - improve GPU hangcheck exynos: - Decon Exynos7870 support mediatek: - add OF graph support omap: - locking fixes bochs: - convert to gem/shmem from simpledrm v3d: - support big/super pages - add gemfs vc4: - BCM2712 support refactoring - add YUV444 format support udmabuf: - folio related fixes nouveau: - add panic support on nv50+" * tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits) drm/xe/guc: Fix dereference before NULL check drm/amd: Fix initialization mistake for NBIO 7.7.0 Revert "drm/amd/display: parse umc_info or vram_info based on ASIC" drm/amd/display: Fix failure to read vram info due to static BP_RESULT drm/amdgpu: enable GTT fallback handling for dGPUs only drm/amd/amdgpu: limit single process inside MES drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X drm/amdgpu/mes12: correct kiq unmap latency drm/amdgpu: Support vcn and jpeg error info parsing drm/amd : Update MES API header file for v11 & v12 drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling drm/amdkfd: change kfd process kref count at creation drm/amdgpu: Cleanup shift coding style drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data drm/amdgpu: Implement virt req_ras_err_count drm/amdgpu: VF Query RAS Caps from Host if supported drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support drm/amd/display: 3.2.309 drm/amd/display: Adjust VSDB parser for replay feature ...	2024-11-21 14:56:17 -08:00
Dave Airlie	b6ad7debf5	nouveau: handle EBUSY and EAGAIN for GSP aux errors. The upper layer transfer functions expect EBUSY as a return for when retries should be done. Fix the AUX error translation, but also check for both errors in a few places. Fixes: `eb284f4b37` ("drm/nouveau/dp: Honor GSP link training retry timeouts") Cc: stable@vger.kernel.org Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-1-airlied@gmail.com	2024-11-14 11:50:01 +10:00
Benjamin Szőke	231bb9b4c4	drm/nouveau/i2c: rename aux.c and aux.h to auxch.c and auxch.h The goal is to clean-up Linux repository from AUX file names, because the use of such file names is prohibited on other operating systems such as Windows, so the Linux repository cannot be cloned and edited on them. Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu> Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240603091558.35672-1-egyszeregy@freemail.hu	2024-10-03 20:05:16 +02:00
Thomas Zimmermann	2dd0ef5d95	Merge drm/drm-next into drm-misc-next Get drm-misc-next to up v6.12-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-09-30 10:50:54 +02:00
Thomas Zimmermann	61b86391fb	Merge drm/drm-next into drm-misc-next Backmerging to get fixes from v6.12-rc7. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-09-11 09:48:49 +02:00
Ben Skeggs	6db9df4f70	drm/nouveau/fb: restore init() for ramgp102 init() was removed from ramgp102 when reworking the memory detection, as it was thought that the code was only necessary when the driver performs mclk changes, which nouveau doesn't support on pascal. However, it turns out that we still need to execute this on some GPUs to restore settings after DEVINIT, so revert to the original behaviour. v2: fix tags in commit message, cc stable Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/319 Fixes: `2c0c15a22f` ("drm/nouveau/fb/gp102-ga100: switch to simpler vram size detection method") Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240904232418.8590-1-bskeggs@nvidia.com	2024-09-10 12:22:48 +02:00
Li Zetao	c6430a8eb0	drm/nouveau/volt: use clamp() in nvkm_volt_map() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240831012803.3950100-4-lizetao1@huawei.com	2024-09-04 15:21:43 -04:00
Dave Airlie	f33b9ab049	nouveau: fix the fwsec sb verification register. This aligns with what open gpu does, the 0x15 hex is just to trick you. Fixes: `176fdcbddf` ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240828023720.1596602-1-airlied@gmail.com	2024-08-31 00:23:45 +02:00
Maxime Ripard	375c4d1583	Merge drm/drm-next into drm-misc-next Let's start the new release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-27 11:08:31 +02:00
Dave Airlie	09e10499ee	Short summary of fixes pull: imagination: - fix page-count macro nouveau: - avoid page-table allocation failures - fix firmware memory allocation panel: - ili9341: avoid OF for device properties; respect deferred probe; fix usage of errno codes ttm: - fix status output vmwgfx: - fix legacy display unit - fix read length in fence signalling -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmYz53QACgkQaA3BHVML eiP7Ggf/USKiwX3xhWFk3SLGCWYM+eflhPgyyTs4Ue8NYOd/0C7lDKtEWwSbPUlB 6Tg8FZW0OF0eLYQjl7/A0sufj/225mR8jKCmwbBg8KherqPfThxChRqh7kjcsaHR pG8yleio7oPGg4qB4hi/PENB5kVFw6s1tVPOz/X8W7V2aEVglO+9IGup6CB3TQKu 9kGHbgPVZ1SoDlXrmUmApfAy+Qqu4F8s3m8b6hYXvit3SqJl275aJIqL4fjw5Ufk DhbTfdL4ao28gOAIdRLuOlwpadMOl+rMTls8Q/FwW0Pk/S8Osu1QykhlhHBuBxzb bs3YmYwsMKeKofvLHwxrWqQVfH+TSA== =h9Cb -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2024-05-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: imagination: - fix page-count macro nouveau: - avoid page-table allocation failures - fix firmware memory allocation panel: - ili9341: avoid OF for device properties; respect deferred probe; fix usage of errno codes ttm: - fix status output vmwgfx: - fix legacy display unit - fix read length in fence signalling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240502192117.GA12158@linux.fritz.box	2024-05-03 11:16:40 +10:00
Lyude Paul	6f572a8054	drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Currently we allocate all 3 levels of radix3 page tables using nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of the relevant memory. This can end up failing in scenarios where the system has very high memory fragmentation, and we can't find enough contiguous memory to allocate level 2 of the page table. Currently, this can result in runtime PM issues on systems where memory fragmentation is high - as we'll fail to allocate the page table for our suspend/resume buffer: kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted 6.8.6-201.ChopperV6.fc39.x86_64 #1 Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x64/0x80 warn_alloc+0x165/0x1e0 ? __alloc_pages_direct_compact+0xb3/0x2b0 __alloc_pages_slowpath.constprop.0+0xd7d/0xde0 __alloc_pages+0x32d/0x350 __dma_direct_alloc_pages.isra.0+0x16a/0x2b0 dma_direct_alloc+0x70/0x270 nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau] r535_gsp_fini+0x1d4/0x350 [nouveau] nvkm_subdev_fini+0x67/0x150 [nouveau] nvkm_device_fini+0x95/0x1e0 [nouveau] nvkm_udevice_fini+0x53/0x70 [nouveau] nvkm_object_fini+0xb9/0x240 [nouveau] nvkm_object_fini+0x75/0x240 [nouveau] nouveau_do_suspend+0xf5/0x280 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1e0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x41/0x170 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x5d/0x70 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_suspend+0x120/0x6a0 pm_runtime_work+0x98/0xb0 process_one_work+0x171/0x340 worker_thread+0x27b/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 Luckily, we don't actually need to allocate coherent memory for the page table thanks to being able to pass the GPU a radix3 page table for suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg allocator for level 2. We continue using coherent allocations for lvl0 and 1, since they only take a single page. V2: * Don't forget to actually jump to the next scatterlist when we reach the end of the scatterlist we're currently on when writing out the page table for level 2 Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-2-lyude@redhat.com	2024-04-30 12:45:42 -04:00
Chaitanya Kumar Borah	4a9a567ab1	nouveau: Add missing break statement Add the missing break statement that causes the following build error CC [M] drivers/gpu/drm/i915/display/intel_display_device.o ../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function ‘build_registry’: ../drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at end of compound statement 1266 \| default: \| ^~~~~~~ CC [M] drivers/gpu/drm/amd/amdgpu/gfx_v10_0.o HDRTEST drivers/gpu/drm/xe/compat-i915-headers/i915_reg.h CC [M] drivers/gpu/drm/amd/amdgpu/imu_v11_0.o make[7]: * [../scripts/Makefile.build:244: drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1 make[7]: * Waiting for unfinished jobs.... Fixes: `b58a0bc904` ("nouveau: add command-line GSP-RM registry support") Closes: https://lore.kernel.org/all/913052ca6c0988db1bab293cfae38529251b4594.camel@nvidia.com/T/#m3c9acebac754f2e74a85b76c858c093bb1aacaf0 Closes: https://lore.kernel.org/all/CA+G9fYu7Ug0K8h9QJT0WbtWh_LL9Juc+VC0WMU_Z_vSSPDNymg@mail.gmail.com/ Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240430131840.742924-1-chaitanya.kumar.borah@intel.com	2024-04-30 16:06:18 +02:00
Timur Tabi	b58a0bc904	nouveau: add command-line GSP-RM registry support Add the NVreg_RegistryDwords command line parameter, which allows specifying additional registry keys to be sent to GSP-RM. This allows additional configuration, debugging, and experimentation with GSP-RM, which uses these keys to alter its behavior. Note that these keys are passed as-is to GSP-RM, and Nouveau does not parse them. This is in contrast to the Nvidia driver, which may parse some of the keys to configure some functionality in concert with GSP-RM. Therefore, any keys which also require action by the driver may not function correctly when passed by Nouveau. Caveat emptor. The name and format of NVreg_RegistryDwords is the same as used by the Nvidia driver, to maintain compatibility. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417215317.3490856-1-ttabi@nvidia.com	2024-04-26 18:06:39 +02:00

1 2 3 4 5 ...

1241 commits