mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-04-13 09:59:31 +00:00

Introduce device wedged event, which notifies userspace of 'wedged'
(hanged/unusable) state of the DRM device through a uevent. This is
useful especially in cases where the device is no longer operating as
expected and has become unrecoverable from driver context. Purpose of
this implementation is to provide drivers a generic way to recover the
device with the help of userspace intervention without taking any drastic
measures (like resetting or re-enumerating the full bus, on which the
underlying physical device is sitting) in the driver.

A 'wedged' device is basically a device that is declared dead by the
driver after exhausting all possible attempts to recover it from driver
context. The uevent is the notification that is sent to userspace along
with a hint about what could possibly be attempted to recover the device
from userspace and bring it back to usable state. Different drivers may
have different ideas of a 'wedged' device depending on hardware
implementation of the underlying physical device, and hence the vendor
agnostic nature of the event. It is up to the drivers to decide when they
see the need for device recovery and how they want to recover from the
available methods.

Driver prerequisites
--------------------

The driver, before opting for recovery, needs to make sure that the
'wedged' device doesn't harm the system as a whole by taking care of the
prerequisites. Necessary actions must include disabling DMA to system
memory as well as any communication channels with other devices. Further,
the driver must ensure that all dma_fences are signalled and any device
state that the core kernel might depend on is cleaned up. All existing
mmaps should be invalidated and page faults should be redirected to a
dummy page. Once the event is sent, the device must be kept in 'wedged'
state until the recovery is performed. New accesses to the device
(IOCTLs) should be rejected, preferably with an error code that resembles
the type of failure the device has encountered. This will signify the
reason for wedging, which can be reported to the application if needed.

Recovery
--------

Current implementation defines three recovery methods, out of which,
drivers can use any one, multiple or none. Method(s) of choice will be
sent in the uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in
order of less to more side-effects. If driver is unsure about recovery or
method is unknown (like soft/hard system reboot, firmware flashing,
physical device replacement or any other procedure which can't be
attempted on the fly), ``WEDGED=unknown`` will be sent instead.

Userspace consumers can parse this event and attempt recovery as per the
following expectations.

    =============== ========================================
    Recovery method Consumer expectations
    =============== ========================================
    none            optional telemetry collection
    rebind          unbind + bind driver
    bus-reset       unbind + bus reset/re-enumeration + bind
    unknown         consumer policy
    =============== ========================================

The only exception to this is ``WEDGED=none``, which signifies that the
device was temporarily 'wedged' at some point but was recovered from
driver context using device specific methods like reset. No explicit
recovery is expected from the consumer in this case, but it can still
take additional steps like gathering telemetry information (devcoredump,
syslog). This is useful because the first hang is usually the most
critical one which can result in consequential hangs or complete wedging.

Consumer prerequisites
----------------------

It is the responsibility of the consumer to make sure that the device or
its resources are not in use by any process before attempting recovery.
With IOCTLs erroring out, all device memory should be unmapped and file
descriptors should be closed to prevent leaks or undefined behaviour. The
idea here is to clear the device of all user context beforehand and set
the stage for a clean recovery.

Example
-------

Udev rule::

    SUBSYSTEM=="drm", ENV{WEDGED}=="rebind", DEVPATH=="*/drm/card[0-9]", RUN+="/path/to/rebind.sh $env{DEVPATH}"

Recovery script::

    #!/bin/sh

    DEVPATH=$(readlink -f /sys/$1/device)
    DEVICE=$(basename $DEVPATH)
    DRIVER=$(readlink -f $DEVPATH/driver)

    echo -n $DEVICE > $DRIVER/unbind
    echo -n $DEVICE > $DRIVER/bind

Customization
-------------

Although basic recovery is possible with a simple script, consumers can
define custom policies around recovery. For example, if the driver
supports multiple recovery methods, consumers can opt for the suitable
one depending on scenarios like repeat offences or vendor specific
failures. Consumers can also choose to have the device available for
debugging or telemetry collection and base their recovery decision on the
findings. This is useful especially when the driver is unsure about
recovery or method is unknown.

v4: s/drm_dev_wedged/drm_dev_wedged_event
    Use drm_info() (Jani)
    Kernel doc adjustment (Aravind)
v5: Send recovery method with uevent (Lina)
v6: Access wedge_recovery_opts[] using helper function (Jani)
    Use snprintf() (Jani)
v7: Convert recovery helpers into regular functions (Andy, Jani)
    Aesthetic adjustments (Andy)
    Handle invalid recovery method
v8: Allow sending multiple methods with uevent (Lucas, Michal)
    static_assert() globally (Andy)
v9: Provide 'none' method for device reset (Christian)
    Provide recovery opts using switch cases
v11: Log device reset (André)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204070528.1919158-2-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
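
The commit message says consumers can parse the ``WEDGED=<method1>[,..,<methodN>]`` value and choose among the listed methods. A minimal userspace sketch of that parsing step might look as follows. The enum and both function names (`parse_wedged`, `pick_method`) are hypothetical and are not part of the kernel or of any existing library. Treating an unrecognised token as "unknown" is an assumption about reasonable consumer policy, not something the commit prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer-side flags; illustrative names, not kernel API. */
enum wedge_method {
	WEDGE_NONE      = 1 << 0,
	WEDGE_REBIND    = 1 << 1,
	WEDGE_BUS_RESET = 1 << 2,
	WEDGE_UNKNOWN   = 1 << 3,
};

/*
 * Turn the value of the WEDGED variable (the part after "WEDGED="),
 * e.g. "rebind,bus-reset", into a bitmask. Tokens that are not
 * recognised are folded into WEDGE_UNKNOWN (assumed policy).
 */
unsigned int parse_wedged(const char *value)
{
	unsigned int mask = 0;

	while (*value) {
		size_t len = strcspn(value, ",");

		if (len == 4 && !strncmp(value, "none", len))
			mask |= WEDGE_NONE;
		else if (len == 6 && !strncmp(value, "rebind", len))
			mask |= WEDGE_REBIND;
		else if (len == 9 && !strncmp(value, "bus-reset", len))
			mask |= WEDGE_BUS_RESET;
		else
			mask |= WEDGE_UNKNOWN;

		value += len;
		if (*value == ',')
			value++;
	}

	return mask;
}

/*
 * Pick the advertised method with the fewest side effects, following
 * the "less to more side-effects" ordering from the commit message.
 */
const char *pick_method(unsigned int mask)
{
	if (mask & WEDGE_NONE)
		return "none";
	if (mask & WEDGE_REBIND)
		return "rebind";
	if (mask & WEDGE_BUS_RESET)
		return "bus-reset";
	return "unknown";
}
```

A consumer with a stricter policy could instead escalate to a later method after repeated failures, as the Customization section suggests; always choosing the first method is only the simplest option.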
330 lines
8.1 KiB
C
#ifndef _DRM_DEVICE_H_
#define _DRM_DEVICE_H_

#include <linux/list.h>
#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/idr.h>

#include <drm/drm_mode_config.h>

struct drm_driver;
struct drm_minor;
struct drm_master;
struct drm_vblank_crtc;
struct drm_vma_offset_manager;
struct drm_vram_mm;
struct drm_fb_helper;

struct inode;

struct pci_dev;
struct pci_controller;

/*
 * Recovery methods for wedged device in order of less to more side-effects.
 * To be used with drm_dev_wedged_event() as recovery @method. Callers can
 * use any one, multiple (or'd) or none depending on their needs.
 */
#define DRM_WEDGE_RECOVERY_NONE		BIT(0)	/* optional telemetry collection */
#define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
#define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */

/**
 * enum switch_power_state - power state of drm device
 */

enum switch_power_state {
	/** @DRM_SWITCH_POWER_ON: Power state is ON */
	DRM_SWITCH_POWER_ON = 0,

	/** @DRM_SWITCH_POWER_OFF: Power state is OFF */
	DRM_SWITCH_POWER_OFF = 1,

	/** @DRM_SWITCH_POWER_CHANGING: Power state is changing */
	DRM_SWITCH_POWER_CHANGING = 2,

	/** @DRM_SWITCH_POWER_DYNAMIC_OFF: Suspended */
	DRM_SWITCH_POWER_DYNAMIC_OFF = 3,
};

/**
 * struct drm_device - DRM device structure
 *
 * This structure represents a complete card that
 * may contain multiple heads.
 */
struct drm_device {
	/** @if_version: Highest interface version set */
	int if_version;

	/** @ref: Object ref-count */
	struct kref ref;

	/** @dev: Device structure of bus-device */
	struct device *dev;

	/**
	 * @managed:
	 *
	 * Managed resources linked to the lifetime of this &drm_device as
	 * tracked by @ref.
	 */
	struct {
		/** @managed.resources: managed resources list */
		struct list_head resources;
		/** @managed.final_kfree: pointer for final kfree() call */
		void *final_kfree;
		/** @managed.lock: protects @managed.resources */
		spinlock_t lock;
	} managed;

	/** @driver: DRM driver managing the device */
	const struct drm_driver *driver;

	/**
	 * @dev_private:
	 *
	 * DRM driver private data. This is deprecated and should be left set to
	 * NULL.
	 *
	 * Instead of using this pointer it is recommended that drivers use
	 * devm_drm_dev_alloc() and embed struct &drm_device in their larger
	 * per-device structure.
	 */
	void *dev_private;

	/**
	 * @primary:
	 *
	 * Primary node. Drivers should not interact with this
	 * directly. debugfs interfaces can be registered with
	 * drm_debugfs_add_file(), and sysfs should be directly added on the
	 * hardware (and not character device node) struct device @dev.
	 */
	struct drm_minor *primary;

	/**
	 * @render:
	 *
	 * Render node. Drivers should not interact with this directly ever.
	 * Drivers should not expose any additional interfaces in debugfs or
	 * sysfs on this node.
	 */
	struct drm_minor *render;

	/** @accel: Compute Acceleration node */
	struct drm_minor *accel;

	/**
	 * @registered:
	 *
	 * Internally used by drm_dev_register() and drm_connector_register().
	 */
	bool registered;

	/**
	 * @master:
	 *
	 * Currently active master for this device.
	 * Protected by &master_mutex
	 */
	struct drm_master *master;

	/**
	 * @driver_features: per-device driver features
	 *
	 * Drivers can clear specific flags here to disallow
	 * certain features on a per-device basis while still
	 * sharing a single &struct drm_driver instance across
	 * all devices.
	 */
	u32 driver_features;

	/**
	 * @unplugged:
	 *
	 * Flag to tell if the device has been unplugged.
	 * See drm_dev_enter() and drm_dev_is_unplugged().
	 */
	bool unplugged;

	/** @anon_inode: inode for private address-space */
	struct inode *anon_inode;

	/** @unique: Unique name of the device */
	char *unique;

	/**
	 * @struct_mutex:
	 *
	 * Lock for others (not &drm_minor.master and &drm_file.is_master)
	 *
	 * TODO: This lock used to be the BKL of the DRM subsystem. Move the
	 *       lock into i915, which is the only remaining user.
	 */
	struct mutex struct_mutex;

	/**
	 * @master_mutex:
	 *
	 * Lock for &drm_minor.master and &drm_file.is_master
	 */
	struct mutex master_mutex;

	/**
	 * @open_count:
	 *
	 * Usage counter for outstanding files open,
	 * protected by drm_global_mutex
	 */
	atomic_t open_count;

	/** @filelist_mutex: Protects @filelist. */
	struct mutex filelist_mutex;
	/**
	 * @filelist:
	 *
	 * List of userspace clients, linked through &drm_file.lhead.
	 */
	struct list_head filelist;

	/**
	 * @filelist_internal:
	 *
	 * List of open DRM files for in-kernel clients.
	 * Protected by &filelist_mutex.
	 */
	struct list_head filelist_internal;

	/**
	 * @clientlist_mutex:
	 *
	 * Protects &clientlist access.
	 */
	struct mutex clientlist_mutex;

	/**
	 * @clientlist:
	 *
	 * List of in-kernel clients. Protected by &clientlist_mutex.
	 */
	struct list_head clientlist;

	/**
	 * @vblank_disable_immediate:
	 *
	 * If true, vblank interrupt will be disabled immediately when the
	 * refcount drops to zero, as opposed to via the vblank disable
	 * timer.
	 *
	 * This can be set to true if the hardware has a working vblank counter
	 * with high-precision timestamping (otherwise there are races) and the
	 * driver uses drm_crtc_vblank_on() and drm_crtc_vblank_off()
	 * appropriately. Also, see @max_vblank_count,
	 * &drm_crtc_funcs.get_vblank_counter and
	 * &drm_vblank_crtc_config.disable_immediate.
	 */
	bool vblank_disable_immediate;

	/**
	 * @vblank:
	 *
	 * Array of vblank tracking structures, one per &struct drm_crtc. For
	 * historical reasons (vblank support predates kernel modesetting) this
	 * is free-standing and not part of &struct drm_crtc itself. It must be
	 * initialized explicitly by calling drm_vblank_init().
	 */
	struct drm_vblank_crtc *vblank;

	/**
	 * @vblank_time_lock:
	 *
	 * Protects vblank count and time updates during vblank enable/disable
	 */
	spinlock_t vblank_time_lock;
	/**
	 * @vbl_lock: Top-level vblank references lock, wraps the low-level
	 * @vblank_time_lock.
	 */
	spinlock_t vbl_lock;

	/**
	 * @max_vblank_count:
	 *
	 * Maximum value of the vblank registers. This value +1 will result in a
	 * wrap-around of the vblank register. It is used by the vblank core to
	 * handle wrap-arounds.
	 *
	 * If set to zero the vblank core will try to guess the elapsed vblanks
	 * between times when the vblank interrupt is disabled through
	 * high-precision timestamps. That approach is suffering from small
	 * races and imprecision over longer time periods, hence exposing a
	 * hardware vblank counter is always recommended.
	 *
	 * This is the statically configured device wide maximum. The driver
	 * can instead choose to use a runtime configurable per-crtc value
	 * &drm_vblank_crtc.max_vblank_count, in which case @max_vblank_count
	 * must be left at zero. See drm_crtc_set_max_vblank_count() on how
	 * to use the per-crtc value.
	 *
	 * If non-zero, &drm_crtc_funcs.get_vblank_counter must be set.
	 */
	u32 max_vblank_count;

	/** @vblank_event_list: List of vblank events */
	struct list_head vblank_event_list;

	/**
	 * @event_lock:
	 *
	 * Protects @vblank_event_list and event delivery in
	 * general. See drm_send_event() and drm_send_event_locked().
	 */
	spinlock_t event_lock;

	/** @num_crtcs: Number of CRTCs on this device */
	unsigned int num_crtcs;

	/** @mode_config: Current mode config */
	struct drm_mode_config mode_config;

	/** @object_name_lock: GEM information */
	struct mutex object_name_lock;

	/** @object_name_idr: GEM information */
	struct idr object_name_idr;

	/** @vma_offset_manager: GEM information */
	struct drm_vma_offset_manager *vma_offset_manager;

	/** @vram_mm: VRAM MM memory manager */
	struct drm_vram_mm *vram_mm;

	/**
	 * @switch_power_state:
	 *
	 * Power state of the client.
	 * Used by drivers supporting the switcheroo driver.
	 * The state is maintained in the
	 * &vga_switcheroo_client_ops.set_gpu_state callback
	 */
	enum switch_power_state switch_power_state;

	/**
	 * @fb_helper:
	 *
	 * Pointer to the fbdev emulation structure.
	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
	 */
	struct drm_fb_helper *fb_helper;

	/**
	 * @debugfs_root:
	 *
	 * Root directory for debugfs files.
	 */
	struct dentry *debugfs_root;
};

#endif
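
The header defines the recovery methods as or-able bits ordered from less to more side effects, and the commit message says they reach userspace as ``WEDGED=<method1>[,..,<methodN>]``, or ``WEDGED=unknown`` when no method applies. The sketch below shows one way such a bitmask could be turned into that string. It is a userspace illustration and not the kernel's drm_dev_wedged_event() implementation: `BIT()` is redefined locally so the block compiles outside the kernel, and `recovery_name()` and `format_wedged()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel's BIT() so the sketch builds in userspace. */
#define BIT(n) (1UL << (n))

/* Same values as in the header above. */
#define DRM_WEDGE_RECOVERY_NONE		BIT(0)
#define DRM_WEDGE_RECOVERY_REBIND	BIT(1)
#define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)

/* Hypothetical helper: name of a single recovery bit, NULL if not known. */
const char *recovery_name(unsigned long bit)
{
	switch (bit) {
	case DRM_WEDGE_RECOVERY_NONE:
		return "none";
	case DRM_WEDGE_RECOVERY_REBIND:
		return "rebind";
	case DRM_WEDGE_RECOVERY_BUS_RESET:
		return "bus-reset";
	default:
		return NULL;
	}
}

/*
 * Hypothetical helper: write "WEDGED=<m1>[,<m2>...]" into buf, walking
 * the bits from lowest to highest so methods appear in order of less to
 * more side effects. Writes "WEDGED=unknown" when no known bit is set.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_wedged(unsigned long methods, char *buf, size_t size)
{
	static const char prefix[] = "WEDGED=";
	size_t len = sizeof(prefix) - 1;
	size_t start = len;
	unsigned int i;
	int n;

	if (size <= len)
		return -1;
	memcpy(buf, prefix, len + 1);

	for (i = 0; i < 3; i++) {
		const char *name = recovery_name(methods & BIT(i));

		if (!name)
			continue;
		n = snprintf(buf + len, size - len, "%s%s",
			     len > start ? "," : "", name);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += (size_t)n;
	}

	if (len == start) {
		n = snprintf(buf + len, size - len, "unknown");
		if (n < 0 || (size_t)n >= size - len)
			return -1;
	}

	return 0;
}
```

Walking the bits in ascending order means the string ordering follows directly from the bit assignments, so a caller that ors several methods together does not have to sort them.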