==================================
Long running workloads and compute
==================================

Long running workloads (compute) are workloads that will not complete in 10
seconds (the time a user is willing to wait before reaching for the power
button). This means that other techniques, which cannot use fences, need to
be used to manage those workloads.

Some hardware may schedule compute jobs and have no way to preempt them, or
to have their memory swapped out from them. Or users may simply want their
workload not to be preempted or swapped out at all.

This means that their handling differs from what is described in
driver-api/dma-buf.rst.

As with normal compute jobs, dma-fence may not be used at all, in this case
not even to force preemption. The driver is then simply forced to unmap a BO
from the long running compute job's address space immediately on unbind,
without even waiting for the workload to complete. Effectively this
terminates the workload when there is no hardware support to recover.

Since this is undesirable, there need to be mitigations to prevent a workload
from being terminated. There are several possible approaches, all with their
advantages and drawbacks.

The first approach you will likely try is to pin all buffers used by compute.
This guarantees that the job will run uninterrupted, but also allows a very
easy denial of service attack by pinning as much memory as possible, hogging
all GPU memory, and possibly a huge chunk of CPU memory.

A second approach that will work slightly better on its own is adding an
option not to evict when creating a new job (of any kind). If all of userspace
opts in to this flag, it would prevent cooperating userspace from forcefully
terminating older compute jobs in order to start a new one.
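
Purely as an illustration of what such an opt-in could look like, the sketch
below passes a hypothetical "do not evict others" flag on a driver-specific
submit ioctl. None of the names below are real uAPI; they exist only for this
example::

    #include <stdint.h>
    #include <xf86drm.h>

    /*
     * Hypothetical flag: the submitter promises that this new job will not
     * cause the buffers of already-running (long compute) jobs to be
     * evicted in order to make room for it.
     */
    #define HYPO_SUBMIT_NO_EVICT    (1u << 0)

    struct hypo_submit {
            uint64_t batch_address; /* GPU address of the command buffer */
            uint32_t queue_id;      /* target queue */
            uint32_t flags;         /* HYPO_SUBMIT_* flags */
    };

    #define HYPO_IOCTL_SUBMIT       DRM_IOWR(0x40, struct hypo_submit)

    static int hypo_submit_no_evict(int fd, uint32_t queue_id,
                                    uint64_t batch_address)
    {
            struct hypo_submit args = {
                    .batch_address  = batch_address,
                    .queue_id       = queue_id,
                    .flags          = HYPO_SUBMIT_NO_EVICT,
            };

            return drmIoctl(fd, HYPO_IOCTL_SUBMIT, &args);
    }
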
If job preemption and recoverable pagefaults are not available, those are the
only approaches possible. Even with those, you still want a separate way of
controlling resources. The standard kernel way of doing so is cgroups.

This creates a third option, using cgroups to prevent eviction. Both GPU and
driver-allocated CPU memory would be accounted to the correct cgroup, and
eviction would be made cgroup aware. This allows the GPU to be partitioned
into cgroups, which will allow jobs to run next to each other without
interference.

The interface to the cgroup would be similar to the current CPU memory
interface, with similar semantics for min/low/high/max, if eviction can
be made cgroup aware.
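
A minimal sketch of what configuring such a limit could look like from
userspace, assuming a hypothetical ``drm.max`` control file that mirrors the
write syntax of the CPU memory controller's ``memory.max`` (neither the file
name nor its syntax is defined by this document)::

    #include <stdio.h>

    /*
     * Hypothetical: cap a cgroup's GPU memory by writing to a "drm.max"
     * file, by analogy with the CPU memory controller's memory.max.
     */
    static int set_drm_memory_max(const char *cgroup_dir,
                                  unsigned long long bytes)
    {
            char path[4096];
            FILE *f;

            snprintf(path, sizeof(path), "%s/drm.max", cgroup_dir);
            f = fopen(path, "w");
            if (!f)
                    return -1;

            fprintf(f, "%llu\n", bytes);
            return fclose(f);
    }

Calling ``set_drm_memory_max("/sys/fs/cgroup/compute", 8ULL << 30)`` would
then cap the group at 8 GiB, with min/low/high following the same pattern as
their CPU memory counterparts.
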
It should be noted that each memory region (tiled memory, for example) should
have its own accounting.

The key is set to the regionid set by the driver, for example "tile0".
For the value of $card, we use drmGetUnique().
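
Purely as an illustration of this key format, and assuming the unique name
returned by drmGetUnique() is a PCI bus id, one entry of such a per-region
keyed file could then look like::

    pci:0000:03:00.0 tile0=8589934592

where "tile0" is the region id set by the driver and the value is a limit in
bytes; the exact file name and line syntax would be defined by the cgroup
interface itself.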