.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============
							
The Linux compute accelerators subsystem is designed to expose compute
accelerators to user space in a common way and to provide a common set of
functionality.
							
These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.
							
Typically, a compute accelerator will belong to one of the following
categories:
							
-  Edge AI - performs inference on an edge device. It can be an embedded
   ASIC/FPGA or an IP block inside an SoC (e.g. a laptop web camera). These
   devices are typically configured using registers and can work with or
   without DMA.
							
-  Inference data-center - single-user or multi-user devices in a large
   server. This type of device can be stand-alone or an IP block inside an SoC
   or a GPU. It will have on-board DRAM (to hold the DL topology), DMA engines
   and command submission queues (either kernel or user-space queues).
   It might also have an MMU to manage multiple users and might also enable
   virtualization (SR-IOV) to support multiple VMs on the same device. In
   addition, these devices will usually come with tools such as a profiler and
   a debugger.
							
-  Training data-center - similar to inference data-center cards, but
   typically with more computational power and memory bandwidth (e.g. HBM).
   They will likely also have a method of scaling up and out, i.e. connecting
   to other training cards inside the same server or in other servers,
   respectively.
							
All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will probably also
include a compiler that generates programs for their custom computational
engines. Typically, the common layer in user space will be the DL frameworks,
such as PyTorch and TensorFlow.
							
Sharing code with DRM
=====================
							
Because this type of device can be an IP block inside a GPU or have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will be part
of the DRM subsystem and an accel device will be a new type of DRM device.
							
This will allow us to leverage the extensive DRM code base and to collaborate
with DRM developers who have experience with this type of device. In addition,
new features that are added for the accelerator drivers can be of use to GPU
drivers as well.
							
Differentiation from GPUs
=========================
							
Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.
							
Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.
							
The accelerator devices will be exposed to user space with the dedicated
261 major number and will follow this naming convention:
							
-  device char files - /dev/accel/accel\*
-  sysfs             - /sys/class/accel/accel\*/
-  debugfs           - /sys/kernel/debug/accel/\*/
										
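As a user-space illustration of this convention, the sketch below checks whether a path refers to a character device whose major number is 261. It is a minimal example, not part of any kernel or library API: the helper name is_accel_node() is hypothetical, and it assumes a glibc-based system, where major() is declared in <sys/sysmacros.h>.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major() on glibc */

/* Major number dedicated to compute accelerator char devices. */
#define ACCEL_MAJOR 261

/*
 * Hypothetical helper (not a kernel or libc API): returns true if @path is a
 * character device node with the accel major number, e.g. /dev/accel/accel0.
 * Returns false for any other file type, for other majors, or if stat() fails.
 */
bool is_accel_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;

	return S_ISCHR(st.st_mode) && major(st.st_rdev) == ACCEL_MAJOR;
}
```

On a system with an accel driver bound, is_accel_node("/dev/accel/accel0") would be expected to return true, while a DRM render node such as /dev/dri/renderD128 (major 226) returns false. This is exactly the separation that keeps graphics user space from picking up accelerators.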
										
Getting Started
===============
							
First, read the DRM documentation at Documentation/gpu/index.rst.
Not only does it explain how to write a new DRM driver, it also contains all
the information on how to contribute, the Code of Conduct, and the coding and
documentation style. All of that applies equally to the accel subsystem.
							
Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
							
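After booting a kernel built with CONFIG_DRM_ACCEL, one way to confirm from user space that accel devices were registered is to count the accel<N> entries in the sysfs class directory. This is a sketch under stated assumptions: count_accel_devices() is a hypothetical helper, and the same loop can be pointed at /dev/accel because the char files follow the same accel<N> naming.

```c
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: counts entries named "accel<N>" (N = one or more
 * decimal digits) in @dir, e.g. "/sys/class/accel" or "/dev/accel".
 * Returns the count, or -1 if the directory cannot be opened.
 */
int count_accel_devices(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *ent;
	int count = 0;

	if (!d)
		return -1;

	while ((ent = readdir(d)) != NULL) {
		const char *p = ent->d_name;

		/* Skip anything that does not start with the "accel" prefix. */
		if (strncmp(p, "accel", 5) != 0)
			continue;
		p += 5;

		/* Require at least one digit, and nothing but digits after it. */
		if (!isdigit((unsigned char)*p))
			continue;
		while (isdigit((unsigned char)*p))
			p++;
		if (*p == '\0')
			count++;
	}

	closedir(d);
	return count;
}
```

Assuming the accel class is registered when the DRM core initializes, a result of -1 for /sys/class/accel would suggest missing accel support, while 0 would mean the class exists but no accel driver has registered a device yet.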
							
To expose your device as an accelerator, two changes need to be made in your
driver (as opposed to a standard DRM driver):
							
-  Add the DRIVER_COMPUTE_ACCEL feature flag to your drm_driver's
   driver_features field. Note that this driver feature is mutually exclusive
   with DRIVER_RENDER and DRIVER_MODESET. Devices that want to expose both
   graphics and compute device char files should be handled by two drivers
   that are connected using the auxiliary bus framework.
							
-  Change the open callback in your driver's fops structure to accel_open().
   Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to set
   up the correct file operations structure.
							
							
External References
===================
							
Email threads
-------------
										
*  `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
*  `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)
							
Conference talks
----------------
							
*  `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)