// SPDX-License-Identifier: MIT
/*
 * Copyright © 2020 Intel Corporation
 */

#include <linux/log2.h>

#include "gem/i915_gem_internal.h"
#include "gem/i915_gem_lmem.h"

#include "gen8_ppgtt.h"
#include "i915_scatterlist.h"
#include "i915_trace.h"
#include "i915_pvinfo.h"
#include "i915_vgpu.h"
#include "intel_gt.h"
#include "intel_gtt.h"

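/*
 * Encode a page-directory entry: mark it present and writable, with the
 * PPAT bits chosen from the legacy cache level (cached PDE vs. uncached).
 */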
static u64 gen8_pde_encode(const dma_addr_t addr,
			   const enum i915_cache_level level)
{
	u64 pde = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

	if (level != I915_CACHE_NONE)
		pde |= PPAT_CACHED_PDE;
	else
		pde |= PPAT_UNCACHED;

	return pde;
}

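/*
 * PTE encoding for pre-gen12 platforms, where pat_index is interchangeable
 * with enum i915_cache_level (see LEGACY_CACHELEVEL) and selects the PPAT
 * cacheability bits directly.
 */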
static u64 gen8_pte_encode(dma_addr_t addr,
			   unsigned int pat_index,
			   u32 flags)
{
	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

	if (unlikely(flags & PTE_READ_ONLY))
		pte &= ~GEN8_PAGE_RW;

	/*
	 * For pre-gen12 platforms pat_index is the same as enum
	 * i915_cache_level, so the switch-case here is still valid.
	 * See translation table defined by LEGACY_CACHELEVEL.
	 */
	switch (pat_index) {
	case I915_CACHE_NONE:
		pte |= PPAT_UNCACHED;
		break;
	case I915_CACHE_WT:
		pte |= PPAT_DISPLAY_ELLC;
		break;
	default:
		pte |= PPAT_CACHED;
		break;
	}

	return pte;
}

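/*
 * PTE encoding for gen12+ platforms: the low four bits of the PAT index are
 * spread across the dedicated PAT bits of the PTE, and pages backed by
 * device local memory are tagged with the LM bit.
 */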
static u64 gen12_pte_encode(dma_addr_t addr,
			    unsigned int pat_index,
			    u32 flags)
{
	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

	if (unlikely(flags & PTE_READ_ONLY))
		pte &= ~GEN8_PAGE_RW;

	if (flags & PTE_LM)
		pte |= GEN12_PPGTT_PTE_LM;

	if (pat_index & BIT(0))
		pte |= GEN12_PPGTT_PTE_PAT0;

	if (pat_index & BIT(1))
		pte |= GEN12_PPGTT_PTE_PAT1;

	if (pat_index & BIT(2))
		pte |= GEN12_PPGTT_PTE_PAT2;

	if (pat_index & BIT(3))
		pte |= MTL_PPGTT_PTE_PAT3;

	return pte;
}

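/*
 * When a vGPU is active, report PPGTT create/destroy to the host: the
 * page-directory addresses are written to the vgtif PV registers and a
 * g2v_notify message is issued, all under the vgpu lock.
 */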
static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
{
	struct drm_i915_private *i915 = ppgtt->vm.i915;
	struct intel_uncore *uncore = ppgtt->vm.gt->uncore;
	enum vgt_g2v_type msg;
	int i;

	if (create)
		atomic_inc(px_used(ppgtt->pd)); /* never remove */
	else
		atomic_dec(px_used(ppgtt->pd));

	mutex_lock(&i915->vgpu.lock);

	if (i915_vm_is_4lvl(&ppgtt->vm)) {
		const u64 daddr = px_dma(ppgtt->pd);

		intel_uncore_write(uncore,
				   vgtif_reg(pdp[0].lo), lower_32_bits(daddr));
		intel_uncore_write(uncore,
				   vgtif_reg(pdp[0].hi), upper_32_bits(daddr));

		msg = create ?
			VGT_G2V_PPGTT_L4_PAGE_TABLE_CREATE :
			VGT_G2V_PPGTT_L4_PAGE_TABLE_DESTROY;
	} else {
		for (i = 0; i < GEN8_3LVL_PDPES; i++) {
			const u64 daddr = i915_page_dir_dma_addr(ppgtt, i);

			intel_uncore_write(uncore,
					   vgtif_reg(pdp[i].lo),
					   lower_32_bits(daddr));
			intel_uncore_write(uncore,
					   vgtif_reg(pdp[i].hi),
					   upper_32_bits(daddr));
		}

		msg = create ?
			VGT_G2V_PPGTT_L3_PAGE_TABLE_CREATE :
			VGT_G2V_PPGTT_L3_PAGE_TABLE_DESTROY;
	}

	/* g2v_notify atomically (via hv trap) consumes the message packet. */
	intel_uncore_write(uncore, vgtif_reg(g2v_notify), msg);

	mutex_unlock(&i915->vgpu.lock);
}

/* Index shifts into the pagetable are offset by GEN8_PTE_SHIFT [12] */
#define GEN8_PAGE_SIZE (SZ_4K) /* page and page-directory sizes are the same */
#define GEN8_PTE_SHIFT (ilog2(GEN8_PAGE_SIZE))
#define GEN8_PDES (GEN8_PAGE_SIZE / sizeof(u64))
#define gen8_pd_shift(lvl) ((lvl) * ilog2(GEN8_PDES))
#define gen8_pd_index(i, lvl) i915_pde_index((i), gen8_pd_shift(lvl))
#define __gen8_pte_shift(lvl) (GEN8_PTE_SHIFT + gen8_pd_shift(lvl))
#define __gen8_pte_index(a, lvl) i915_pde_index((a), __gen8_pte_shift(lvl))

#define as_pd(x) container_of((x), typeof(struct i915_page_directory), pt)

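/*
 * With 4K pages and 8-byte entries each level indexes 9 address bits
 * (GEN8_PDES == 512), so __gen8_pte_shift() yields 12/21/30/39/48 for
 * levels 0-4. gen8_pd_range() returns how many entries of a single page
 * directory at @lvl are covered by [start, end), clamped to that
 * directory's boundary, and writes the first entry index to *idx.
 */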
static unsigned int
gen8_pd_range(u64 start, u64 end, int lvl, unsigned int *idx)
{
	const int shift = gen8_pd_shift(lvl);
	const u64 mask = ~0ull << gen8_pd_shift(lvl + 1);

	GEM_BUG_ON(start >= end);
	end += ~mask >> gen8_pd_shift(1);

	*idx = i915_pde_index(start, shift);
	if ((start ^ end) & mask)
		return GEN8_PDES - *idx;
	else
		return i915_pde_index(end, shift) - *idx;
}

static bool gen8_pd_contains(u64 start, u64 end, int lvl)
{
	const u64 mask = ~0ull << gen8_pd_shift(lvl + 1);

	GEM_BUG_ON(start >= end);
	return (start ^ end) & mask && (start & ~mask) == 0;
}

static unsigned int gen8_pt_count(u64 start, u64 end)
{
	GEM_BUG_ON(start >= end);
	if ((start ^ end) >> gen8_pd_shift(1))
		return GEN8_PDES - (start & (GEN8_PDES - 1));
	else
		return end - start;
}

static unsigned int gen8_pd_top_count(const struct i915_address_space *vm)
{
	unsigned int shift = __gen8_pte_shift(vm->top);

	return (vm->total + (1ull << shift) - 1) >> shift;
}

static struct i915_page_directory *
gen8_pdp_for_page_index(struct i915_address_space * const vm, const u64 idx)
{
	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);

	if (vm->top == 2)
		return ppgtt->pd;
	else
		return i915_pd_entry(ppgtt->pd, gen8_pd_index(idx, vm->top));
}

static struct i915_page_directory *
gen8_pdp_for_page_address(struct i915_address_space * const vm, const u64 addr)
{
	return gen8_pdp_for_page_index(vm, addr >> GEN8_PTE_SHIFT);
}

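/*
 * Recursively free a page-directory tree: descend into every populated
 * child entry before releasing this level's page itself.
 */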
static void __gen8_ppgtt_cleanup(struct i915_address_space *vm,
				 struct i915_page_directory *pd,
				 int count, int lvl)
{
	if (lvl) {
		void **pde = pd->entry;

		do {
			if (!*pde)
				continue;

			__gen8_ppgtt_cleanup(vm, *pde, GEN8_PDES, lvl - 1);
		} while (pde++, --count);
	}

	free_px(vm, &pd->pt, lvl);
}

static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
{
	struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);

	if (vm->rsvd.obj)
		i915_gem_object_put(vm->rsvd.obj);

	if (intel_vgpu_active(vm->i915))
		gen8_ppgtt_notify_vgt(ppgtt, false);

	if (ppgtt->pd)
		__gen8_ppgtt_cleanup(vm, ppgtt->pd,
				     gen8_pd_top_count(vm), vm->top);

	free_scratch(vm);
}

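/*
 * Unbind the range [start, end) (in page units): page directories that are
 * fully covered have their parent entry pointed back at scratch and their
 * subtree freed, while at the lowest level only the affected PTEs are reset
 * to the scratch encoding.
 */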
static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
			      struct i915_page_directory * const pd,
			      u64 start, const u64 end, int lvl)
{
	const struct drm_i915_gem_object * const scratch = vm->scratch[lvl];
	unsigned int idx, len;

	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);

	len = gen8_pd_range(start, end, lvl--, &idx);
	GTT_TRACE("%s(%p):{ lvl:%d, start:%llx, end:%llx, idx:%d, len:%d, used:%d }\n",
		  __func__, vm, lvl + 1, start, end,
		  idx, len, atomic_read(px_used(pd)));
	GEM_BUG_ON(!len || len >= atomic_read(px_used(pd)));

	do {
		struct i915_page_table *pt = pd->entry[idx];

		if (atomic_fetch_inc(&pt->used) >> gen8_pd_shift(1) &&
		    gen8_pd_contains(start, end, lvl)) {
			GTT_TRACE("%s(%p):{ lvl:%d, idx:%d, start:%llx, end:%llx } removing pd\n",
				  __func__, vm, lvl + 1, idx, start, end);
			clear_pd_entry(pd, idx, scratch);
			__gen8_ppgtt_cleanup(vm, as_pd(pt), I915_PDES, lvl);
			start += (u64)I915_PDES << gen8_pd_shift(lvl);
			continue;
		}

		if (lvl) {
			start = __gen8_ppgtt_clear(vm, as_pd(pt),
						   start, end, lvl);
		} else {
			unsigned int count;
			unsigned int pte = gen8_pd_index(start, 0);
			unsigned int num_ptes;
			u64 *vaddr;

			count = gen8_pt_count(start, end);
			GTT_TRACE("%s(%p):{ lvl:%d, start:%llx, end:%llx, idx:%d, len:%d, used:%d } removing pte\n",
				  __func__, vm, lvl, start, end,
				  gen8_pd_index(start, 0), count,
				  atomic_read(&pt->used));
			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));

			num_ptes = count;
			if (pt->is_compact) {
				GEM_BUG_ON(num_ptes % 16);
				GEM_BUG_ON(pte % 16);
				num_ptes /= 16;
				pte /= 16;
			}

			vaddr = px_vaddr(pt);
			memset64(vaddr + pte,
				 vm->scratch[0]->encode,
				 num_ptes);

			atomic_sub(count, &pt->used);
			start += count;
		}

		if (release_pd_entry(pd, idx, pt, scratch))
			free_px(vm, pt, lvl);
	} while (idx++, --len);

	return start;
}

static void gen8_ppgtt_clear(struct i915_address_space *vm,
			     u64 start, u64 length)
{
	GEM_BUG_ON(!IS_ALIGNED(start, BIT_ULL(GEN8_PTE_SHIFT)));
	GEM_BUG_ON(!IS_ALIGNED(length, BIT_ULL(GEN8_PTE_SHIFT)));
	GEM_BUG_ON(range_overflows(start, length, vm->total));

	start >>= GEN8_PTE_SHIFT;
	length >>= GEN8_PTE_SHIFT;
	GEM_BUG_ON(length == 0);

	__gen8_ppgtt_clear(vm, i915_vm_to_ppgtt(vm)->pd,
			   start, start + length, vm->top);
}

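/*
 * Populate the directory tree for [*start, end): missing levels are taken
 * from the preallocated stash and filled with the scratch encoding before
 * being inserted, so no memory is allocated on this path.
 */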
static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
			       struct i915_vm_pt_stash *stash,
			       struct i915_page_directory * const pd,
			       u64 * const start, const u64 end, int lvl)
{
	unsigned int idx, len;

	GEM_BUG_ON(end > vm->total >> GEN8_PTE_SHIFT);

	len = gen8_pd_range(*start, end, lvl--, &idx);
	GTT_TRACE("%s(%p):{ lvl:%d, start:%llx, end:%llx, idx:%d, len:%d, used:%d }\n",
		  __func__, vm, lvl + 1, *start, end,
		  idx, len, atomic_read(px_used(pd)));
	GEM_BUG_ON(!len || (idx + len - 1) >> gen8_pd_shift(1));

	spin_lock(&pd->lock);
	GEM_BUG_ON(!atomic_read(px_used(pd))); /* Must be pinned! */
	do {
		struct i915_page_table *pt = pd->entry[idx];

		if (!pt) {
			spin_unlock(&pd->lock);

			GTT_TRACE("%s(%p):{ lvl:%d, idx:%d } allocating new tree\n",
				  __func__, vm, lvl + 1, idx);

			pt = stash->pt[!!lvl];
			__i915_gem_object_pin_pages(pt->base);

			fill_px(pt, vm->scratch[lvl]->encode);

			spin_lock(&pd->lock);
			if (likely(!pd->entry[idx])) {
				stash->pt[!!lvl] = pt->stash;
				atomic_set(&pt->used, 0);
				set_pd_entry(pd, idx, pt);
			} else {
				pt = pd->entry[idx];
			}
		}

		if (lvl) {
			atomic_inc(&pt->used);
			spin_unlock(&pd->lock);

			__gen8_ppgtt_alloc(vm, stash,
					   as_pd(pt), start, end, lvl);

			spin_lock(&pd->lock);
			atomic_dec(&pt->used);
			GEM_BUG_ON(!atomic_read(&pt->used));
		} else {
			unsigned int count = gen8_pt_count(*start, end);

			GTT_TRACE("%s(%p):{ lvl:%d, start:%llx, end:%llx, idx:%d, len:%d, used:%d } inserting pte\n",
				  __func__, vm, lvl, *start, end,
				  gen8_pd_index(*start, 0), count,
				  atomic_read(&pt->used));

			atomic_add(count, &pt->used);
			/* All other pdes may be simultaneously removed */
			GEM_BUG_ON(atomic_read(&pt->used) > NALLOC * I915_PDES);
			*start += count;
		}
	} while (idx++, --len);
	spin_unlock(&pd->lock);
}

static void gen8_ppgtt_alloc(struct i915_address_space *vm,
			     struct i915_vm_pt_stash *stash,
			     u64 start, u64 length)
{
	GEM_BUG_ON(!IS_ALIGNED(start, BIT_ULL(GEN8_PTE_SHIFT)));
	GEM_BUG_ON(!IS_ALIGNED(length, BIT_ULL(GEN8_PTE_SHIFT)));
	GEM_BUG_ON(range_overflows(start, length, vm->total));

	start >>= GEN8_PTE_SHIFT;
	length >>= GEN8_PTE_SHIFT;
	GEM_BUG_ON(length == 0);

	__gen8_ppgtt_alloc(vm, stash, i915_vm_to_ppgtt(vm)->pd,
			   &start, start + length, vm->top);
}

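/*
 * Walk every page table backing [*start, end) and invoke @fn on each one,
 * taking a temporary reference on intermediate levels so they cannot be
 * freed while being traversed.
 */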
static void __gen8_ppgtt_foreach(struct i915_address_space *vm,
				 struct i915_page_directory *pd,
				 u64 *start, u64 end, int lvl,
				 void (*fn)(struct i915_address_space *vm,
					    struct i915_page_table *pt,
					    void *data),
				 void *data)
{
	unsigned int idx, len;

	len = gen8_pd_range(*start, end, lvl--, &idx);

	spin_lock(&pd->lock);
	do {
		struct i915_page_table *pt = pd->entry[idx];

		atomic_inc(&pt->used);
		spin_unlock(&pd->lock);

		if (lvl) {
			__gen8_ppgtt_foreach(vm, as_pd(pt), start, end, lvl,
					     fn, data);
		} else {
			fn(vm, pt, data);
			*start += gen8_pt_count(*start, end);
		}

		spin_lock(&pd->lock);
		atomic_dec(&pt->used);
	} while (idx++, --len);
	spin_unlock(&pd->lock);
}

static void gen8_ppgtt_foreach(struct i915_address_space *vm,
			       u64 start, u64 length,
			       void (*fn)(struct i915_address_space *vm,
					  struct i915_page_table *pt,
					  void *data),
			       void *data)
{
	start >>= GEN8_PTE_SHIFT;
	length >>= GEN8_PTE_SHIFT;

	__gen8_ppgtt_foreach(vm, i915_vm_to_ppgtt(vm)->pd,
			     &start, start + length, vm->top,
			     fn, data);
}

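/*
 * Write 4K PTEs for the scatterlist segment in @iter starting at page
 * index @idx, flushing each completed page table, and return the index
 * after the last entry written (0 once the scatterlist is exhausted).
 */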
static __always_inline u64
gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
		      struct i915_page_directory *pdp,
		      struct sgt_dma *iter,
		      u64 idx,
		      unsigned int pat_index,
		      u32 flags)
{
	struct i915_page_directory *pd;
	const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, pat_index, flags);
	gen8_pte_t *vaddr;

	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
	do {
		GEM_BUG_ON(sg_dma_len(iter->sg) < I915_GTT_PAGE_SIZE);
		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;

		iter->dma += I915_GTT_PAGE_SIZE;
		if (iter->dma >= iter->max) {
			iter->sg = __sg_next(iter->sg);
			if (!iter->sg || sg_dma_len(iter->sg) == 0) {
				idx = 0;
				break;
			}

			iter->dma = sg_dma_address(iter->sg);
			iter->max = iter->dma + sg_dma_len(iter->sg);
		}

		if (gen8_pd_index(++idx, 0) == 0) {
			if (gen8_pd_index(idx, 1) == 0) {
				/* Limited by sg length for 3lvl */
				if (gen8_pd_index(idx, 2) == 0)
					break;

				pd = pdp->entry[gen8_pd_index(idx, 2)];
			}

			drm_clflush_virt_range(vaddr, PAGE_SIZE);
			vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
		}
	} while (1);
	drm_clflush_virt_range(vaddr, PAGE_SIZE);

	return idx;
}

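/*
 * Insert PTEs on Xe_HP-class platforms, opportunistically using 2M
 * page-directory entries when the backing store's DMA addresses, size and
 * GTT offset are all suitably aligned.
 */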
static void
|
2024-03-19 23:02:58 -07:00
|
|
|
xehp_ppgtt_insert_huge(struct i915_address_space *vm,
|
|
|
|
struct i915_vma_resource *vma_res,
|
|
|
|
struct sgt_dma *iter,
|
|
|
|
unsigned int pat_index,
|
|
|
|
u32 flags)
|
2022-02-19 00:17:44 +05:30
|
|
|
{
|
drm/i915: use pat_index instead of cache_level
Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.
From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.
For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For Pre-gen12 platforms PAT can
have 1:1 mapping to i915_cache_level, so these two are interchangeable. see
the use of LEGACY_CACHELEVEL.
One consequence of this change is that gen8_pte_encode is no longer working
for gen12 platforms due to the fact that gen12 platforms has different PAT
definitions. In the meantime the mtl_pte_encode introduced specfically for
MTL becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function into gen12_pte_encode and apply it to all gen12. Even
though this change looks unrelated, but separating them would temporarily
break gen12 PTE encoding, thus squash them in one patch.
Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings.There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checkings by i915_gem_object_has_cache_level. For
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.
Bspec: 63019
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
2023-05-09 09:52:00 -07:00
|
|
|
const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
|
2022-02-19 00:17:44 +05:30
|
|
|
unsigned int rem = sg_dma_len(iter->sg);
|
|
|
|
u64 start = vma_res->start;
|
drm/i915: enable PS64 support for DG2
It turns out that on production DG2/ATS HW we should have support for
PS64. This feature allows to provide a 64K TLB hint at the PTE level,
which is a lot more flexible than the current method of enabling 64K GTT
pages for the entire page-table, since that leads to all kinds of
annoying restrictions, as documented in:
commit caa574ffc4aaf4f29b890223878c63e2e7772f62
Author: Matthew Auld <matthew.auld@intel.com>
Date: Sat Feb 19 00:17:49 2022 +0530
drm/i915/uapi: document behaviour for DG2 64K support
On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.
With PS64, we can now drop the 2M GTT alignment restriction, and instead
only require 64K or larger when dealing with lmem. We still use the
compact-pt layout when possible, but only when we are certain that this
doesn't interfere with userspace.
Note that this is a change in uAPI behaviour, but hopefully shouldn't be
a concern (IGT is at least able to autodetect the alignment), since we
are only making the GTT alignment constraint less restrictive.
Based on a patch from CQ Tang.
v2: update the comment wrt scratch page
v3: (Nirmoy)
- Fix the selftest to actually use the random size, plus some comment
improvements, also drop the rem stuff.
Reported-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004114915.221708-1-matthew.auld@intel.com
2022-10-04 12:49:14 +01:00
|
|
|
u64 end = start + vma_res->vma_size;
|
2022-02-19 00:17:44 +05:30
|
|
|
|
|
|
|
GEM_BUG_ON(!i915_vm_is_4lvl(vm));
|
|
|
|
|
|
|
|
do {
|
|
|
|
struct i915_page_directory * const pdp =
|
|
|
|
gen8_pdp_for_page_address(vm, start);
|
|
|
|
struct i915_page_directory * const pd =
|
|
|
|
i915_pd_entry(pdp, __gen8_pte_index(start, 2));
|
|
|
|
struct i915_page_table *pt =
|
|
|
|
i915_pt_entry(pd, __gen8_pte_index(start, 1));
|
|
|
|
gen8_pte_t encode = pte_encode;
|
|
|
|
unsigned int page_size;
|
|
|
|
gen8_pte_t *vaddr;
|
		u16 index, max, nent, i;

		max = I915_PDES;
		nent = 1;

		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
		    rem >= I915_GTT_PAGE_SIZE_2M &&
		    !__gen8_pte_index(start, 0)) {
			index = __gen8_pte_index(start, 1);
			encode |= GEN8_PDE_PS_2M;
			page_size = I915_GTT_PAGE_SIZE_2M;

			vaddr = px_vaddr(pd);
		} else {
			index = __gen8_pte_index(start, 0);
			page_size = I915_GTT_PAGE_SIZE;

			if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_64K) {
				/*
				 * Device local-memory on these platforms should
				 * always use 64K pages or larger (including GTT
				 * alignment), therefore if we know the whole
				 * page-table needs to be filled we can always
				 * safely use the compact-layout. Otherwise fall
				 * back to the TLB hint with PS64. If this is
				 * system memory we only bother with PS64.
				 */
				if ((encode & GEN12_PPGTT_PTE_LM) &&
				    end - start >= SZ_2M && !index) {
					index = __gen8_pte_index(start, 0) / 16;
					page_size = I915_GTT_PAGE_SIZE_64K;

					max /= 16;

					vaddr = px_vaddr(pd);
					vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;

					pt->is_compact = true;
				} else if (IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) &&
					   rem >= I915_GTT_PAGE_SIZE_64K &&
					   !(index % 16)) {
					encode |= GEN12_PTE_PS64;
					page_size = I915_GTT_PAGE_SIZE_64K;
					nent = 16;
				}
			}

			vaddr = px_vaddr(pt);
		}

		do {
			GEM_BUG_ON(rem < page_size);

			for (i = 0; i < nent; i++) {
				vaddr[index++] =
					encode | (iter->dma + i *
						  I915_GTT_PAGE_SIZE);
			}

			start += page_size;
			iter->dma += page_size;
			rem -= page_size;
			if (iter->dma >= iter->max) {
				iter->sg = __sg_next(iter->sg);
				if (!iter->sg)
					break;

				rem = sg_dma_len(iter->sg);
				if (!rem)
					break;

				iter->dma = sg_dma_address(iter->sg);
				iter->max = iter->dma + rem;

				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
					break;
			}
		} while (rem >= page_size && index < max);

		drm_clflush_virt_range(vaddr, PAGE_SIZE);

		vma_res->page_sizes_gtt |= page_size;
	} while (iter->sg && sg_dma_len(iter->sg));
}
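
/*
 * Huge-page insertion for platforms without the Xe_HP compact/PS64 layouts
 * (see gen8_ppgtt_insert() below): either a whole 2M PDE is used, or a
 * page-table is speculatively filled with 64K-aligned entries and only
 * marked as 64K (GEN8_PDE_IPS_64K) once the whole PT has been filled, or
 * the end of the object is reached with enough 64K scratch padding behind it.
 */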

static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
				   struct i915_vma_resource *vma_res,
				   struct sgt_dma *iter,
				   unsigned int pat_index,
				   u32 flags)
{
	const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
	unsigned int rem = sg_dma_len(iter->sg);
	u64 start = vma_res->start;

	GEM_BUG_ON(!i915_vm_is_4lvl(vm));

	do {
		struct i915_page_directory * const pdp =
			gen8_pdp_for_page_address(vm, start);
		struct i915_page_directory * const pd =
			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
		gen8_pte_t encode = pte_encode;
		unsigned int maybe_64K = -1;
		unsigned int page_size;
		gen8_pte_t *vaddr;
		u16 index;

		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
		    rem >= I915_GTT_PAGE_SIZE_2M &&
		    !__gen8_pte_index(start, 0)) {
			index = __gen8_pte_index(start, 1);
			encode |= GEN8_PDE_PS_2M;
			page_size = I915_GTT_PAGE_SIZE_2M;

			vaddr = px_vaddr(pd);
		} else {
			struct i915_page_table *pt =
				i915_pt_entry(pd, __gen8_pte_index(start, 1));

			index = __gen8_pte_index(start, 0);
			page_size = I915_GTT_PAGE_SIZE;

			if (!index &&
			    vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_64K &&
			    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) &&
			    (IS_ALIGNED(rem, I915_GTT_PAGE_SIZE_64K) ||
			     rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE))
				maybe_64K = __gen8_pte_index(start, 1);

			vaddr = px_vaddr(pt);
		}

		do {
			GEM_BUG_ON(sg_dma_len(iter->sg) < page_size);
			vaddr[index++] = encode | iter->dma;

			start += page_size;
			iter->dma += page_size;
			rem -= page_size;
			if (iter->dma >= iter->max) {
				iter->sg = __sg_next(iter->sg);
				if (!iter->sg)
					break;

				rem = sg_dma_len(iter->sg);
				if (!rem)
					break;

				iter->dma = sg_dma_address(iter->sg);
				iter->max = iter->dma + rem;

				if (maybe_64K != -1 && index < I915_PDES &&
				    !(IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) &&
				      (IS_ALIGNED(rem, I915_GTT_PAGE_SIZE_64K) ||
				       rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE)))
					maybe_64K = -1;

				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
					break;
			}
		} while (rem >= page_size && index < I915_PDES);

		drm_clflush_virt_range(vaddr, PAGE_SIZE);

		/*
		 * Is it safe to mark the 2M block as 64K? -- Either we have
		 * filled the whole page-table with 64K entries, or filled
		 * part of it and have reached the end of the sg table and we
		 * have enough padding.
		 */
		if (maybe_64K != -1 &&
		    (index == I915_PDES ||
		     (i915_vm_has_scratch_64K(vm) &&
		      !iter->sg && IS_ALIGNED(vma_res->start +
					      vma_res->node_size,
					      I915_GTT_PAGE_SIZE_2M)))) {
			vaddr = px_vaddr(pd);
			vaddr[maybe_64K] |= GEN8_PDE_IPS_64K;
			drm_clflush_virt_range(vaddr, PAGE_SIZE);
			page_size = I915_GTT_PAGE_SIZE_64K;

			/*
			 * We write all 4K page entries, even when using 64K
			 * pages. In order to verify that the HW isn't cheating
			 * by using the 4K PTE instead of the 64K PTE, we want
			 * to remove all the surplus entries. If the HW skipped
			 * the 64K PTE, it will read/write into the scratch page
			 * instead - which we detect as missing results during
			 * selftests.
			 */
			if (I915_SELFTEST_ONLY(vm->scrub_64K)) {
				u16 i;

				encode = vm->scratch[0]->encode;
				vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K));

				for (i = 1; i < index; i += 16)
					memset64(vaddr + i, encode, 15);

				drm_clflush_virt_range(vaddr, PAGE_SIZE);
			}
		}

		vma_res->page_sizes_gtt |= page_size;
	} while (iter->sg && sg_dma_len(iter->sg));
}
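
/*
 * Top-level insert hook: objects backed by pages larger than 4K go through
 * one of the huge-page paths above (the Xe_HP variant from graphics version
 * 12.55 onwards, the gen8 variant otherwise), while plain 4K-backed objects
 * take the simpler gen8_ppgtt_insert_pte() walk.
 */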

static void gen8_ppgtt_insert(struct i915_address_space *vm,
			      struct i915_vma_resource *vma_res,
			      unsigned int pat_index,
			      u32 flags)
{
	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm);
	struct sgt_dma iter = sgt_dma(vma_res);

	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 55))
			xehp_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
		else
			gen8_ppgtt_insert_huge(vm, vma_res, &iter, pat_index, flags);
	} else {
		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;

		do {
			struct i915_page_directory * const pdp =
				gen8_pdp_for_page_index(vm, idx);

			idx = gen8_ppgtt_insert_pte(ppgtt, pdp, &iter, idx,
						    pat_index, flags);
		} while (idx);

		vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
	}
}
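
/*
 * Insert a single 4K entry at a fixed GTT offset, bypassing the sg walk
 * above. Not valid on a page-table that has already been switched to the
 * compact 64K layout, hence the GEM_BUG_ON below.
 */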

static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
				    dma_addr_t addr,
				    u64 offset,
				    unsigned int pat_index,
				    u32 flags)
{
	u64 idx = offset >> GEN8_PTE_SHIFT;
	struct i915_page_directory * const pdp =
		gen8_pdp_for_page_index(vm, idx);
	struct i915_page_directory *pd =
		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
	gen8_pte_t *vaddr;

	GEM_BUG_ON(pt->is_compact);

	vaddr = px_vaddr(pt);
	vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, pat_index, flags);
	drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
}
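
/*
 * Single-entry variant for local memory on Xe_HP: address and offset must
 * both be 64K aligned, the backing page-table is switched to the compact
 * layout on first use, and a single PTE (at slot index / 16) then maps the
 * whole 64K page.
 */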

static void xehp_ppgtt_insert_entry_lm(struct i915_address_space *vm,
				       dma_addr_t addr,
				       u64 offset,
				       unsigned int pat_index,
				       u32 flags)
{
	u64 idx = offset >> GEN8_PTE_SHIFT;
	struct i915_page_directory * const pdp =
		gen8_pdp_for_page_index(vm, idx);
	struct i915_page_directory *pd =
		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
	gen8_pte_t *vaddr;

	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
	/* XXX: we don't strictly need to use this layout */

	if (!pt->is_compact) {
		vaddr = px_vaddr(pd);
		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
		pt->is_compact = true;
	}

	vaddr = px_vaddr(pt);
	vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, pat_index, flags);
}
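
/* Single-entry dispatch: PTE_LM mappings take the compact 64K path above. */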

static void xehp_ppgtt_insert_entry(struct i915_address_space *vm,
				    dma_addr_t addr,
				    u64 offset,
				    unsigned int pat_index,
				    u32 flags)
{
	if (flags & PTE_LM)
		return xehp_ppgtt_insert_entry_lm(vm, addr, offset,
						  pat_index, flags);

	return gen8_ppgtt_insert_entry(vm, addr, offset, pat_index, flags);
}
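
/*
 * Set up the scratch page hierarchy: scratch[0] is the data page that every
 * unused PTE points at, and each scratch[i] above it is a page-table page
 * fully populated with the encoding of scratch[i - 1]. Address spaces that
 * support read-only PTEs can share the GT-wide scratch objects instead of
 * allocating their own.
 */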

static int gen8_init_scratch(struct i915_address_space *vm)
{
	u32 pte_flags;
	int ret;
	int i;

	/*
	 * If everybody agrees not to write into the scratch page,
	 * we can reuse it for all vm, keeping contexts and processes separate.
	 */
	if (vm->has_read_only && vm->gt->vm && !i915_is_ggtt(vm->gt->vm)) {
		struct i915_address_space *clone = vm->gt->vm;

		GEM_BUG_ON(!clone->has_read_only);

		vm->scratch_order = clone->scratch_order;
		for (i = 0; i <= vm->top; i++)
			vm->scratch[i] = i915_gem_object_get(clone->scratch[i]);

		return 0;
	}

	ret = setup_scratch_page(vm);
	if (ret)
		return ret;

	pte_flags = vm->has_read_only;
	if (i915_gem_object_is_lmem(vm->scratch[0]))
		pte_flags |= PTE_LM;

	vm->scratch[0]->encode =
		vm->pte_encode(px_dma(vm->scratch[0]),
			       i915_gem_get_pat_index(vm->i915,
						      I915_CACHE_NONE),
			       pte_flags);

	for (i = 1; i <= vm->top; i++) {
		struct drm_i915_gem_object *obj;

		obj = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
		if (IS_ERR(obj)) {
			ret = PTR_ERR(obj);
			goto free_scratch;
		}

		ret = map_pt_dma(vm, obj);
		if (ret) {
			i915_gem_object_put(obj);
			goto free_scratch;
		}

		fill_px(obj, vm->scratch[i - 1]->encode);
		obj->encode = gen8_pde_encode(px_dma(obj), I915_CACHE_NONE);

		vm->scratch[i] = obj;
	}

	return 0;

free_scratch:
	while (i--)
		i915_gem_object_put(vm->scratch[i]);
	vm->scratch[0] = NULL;
	return ret;
}
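
/*
 * For 3-level address spaces (vm->top == 2) all GEN8_3LVL_PDPES top-level
 * entries are allocated up front, filled with scratch and kept pinned.
 */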

static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt)
{
	struct i915_address_space *vm = &ppgtt->vm;
	struct i915_page_directory *pd = ppgtt->pd;
	unsigned int idx;

	GEM_BUG_ON(vm->top != 2);
	GEM_BUG_ON(gen8_pd_top_count(vm) != GEN8_3LVL_PDPES);

	for (idx = 0; idx < GEN8_3LVL_PDPES; idx++) {
		struct i915_page_directory *pde;
		int err;

		pde = alloc_pd(vm);
		if (IS_ERR(pde))
			return PTR_ERR(pde);

		err = map_pt_dma(vm, pde->pt.base);
		if (err) {
			free_pd(vm, pde);
			return err;
		}

		fill_px(pde, vm->scratch[1]->encode);
		set_pd_entry(pd, idx, pde);
		atomic_inc(px_used(pde)); /* keep pinned */
	}
	wmb();

	return 0;
}
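
/*
 * Allocate the top-level page directory: the structure is sized for
 * gen8_pd_top_count() entries and backed by a regular 4K page-table
 * allocation from the vm.
 */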

static struct i915_page_directory *
gen8_alloc_top_pd(struct i915_address_space *vm)
{
	const unsigned int count = gen8_pd_top_count(vm);
	struct i915_page_directory *pd;
	int err;
GEM_BUG_ON(count > I915_PDES);
|
2020-01-07 13:40:09 +00:00
|
|
|
|
2020-07-29 17:42:19 +01:00
|
|
|
pd = __alloc_pd(count);
|
2020-01-07 13:40:09 +00:00
|
|
|
if (unlikely(!pd))
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2020-07-29 17:42:18 +01:00
|
|
|
pd->pt.base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
|
|
|
|
if (IS_ERR(pd->pt.base)) {
|
2020-07-29 17:42:19 +01:00
|
|
|
err = PTR_ERR(pd->pt.base);
|
|
|
|
pd->pt.base = NULL;
|
|
|
|
goto err_pd;
|
2020-01-07 13:40:09 +00:00
|
|
|
}
|
|
|
|
|
2021-04-27 09:54:13 +01:00
|
|
|
err = map_pt_dma(vm, pd->pt.base);
|
2020-07-29 17:42:19 +01:00
|
|
|
if (err)
|
|
|
|
goto err_pd;
|
2020-07-29 17:42:18 +01:00
|
|
|
|
|
|
|
fill_page_dma(px_base(pd), vm->scratch[vm->top]->encode, count);
|
2020-01-07 13:40:09 +00:00
|
|
|
atomic_inc(px_used(pd)); /* mark as pinned */
|
|
|
|
return pd;
|
2020-07-29 17:42:19 +01:00
|
|
|
|
|
|
|
err_pd:
|
|
|
|
free_pd(vm, pd);
|
|
|
|
return ERR_PTR(err);
|
2020-01-07 13:40:09 +00:00
|
|
|
}
|
|
|
|
|
2023-10-26 20:36:26 +02:00
|
|
|
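/*
 * When Wa_16018031267 applies, reserve a single GPU-only page pinned
 * high in the address space and subtract it from vm->total, so nothing
 * else is ever bound over the reserved range.
 */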
static int gen8_init_rsvd(struct i915_address_space *vm)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915 = vm->i915;
|
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
struct i915_vma *vma;
|
|
|
|
int ret;
|
|
|
|
|
2024-03-27 21:05:46 +01:00
|
|
|
if (!intel_gt_needs_wa_16018031267(vm->gt))
|
|
|
|
return 0;
|
|
|
|
|
2023-10-26 20:36:26 +02:00
|
|
|
/* The memory will be used only by GPU. */
|
|
|
|
obj = i915_gem_object_create_lmem(i915, PAGE_SIZE,
|
|
|
|
I915_BO_ALLOC_VOLATILE |
|
|
|
|
I915_BO_ALLOC_GPU_ONLY);
|
|
|
|
if (IS_ERR(obj))
|
|
|
|
obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
|
|
|
|
if (IS_ERR(obj))
|
|
|
|
return PTR_ERR(obj);
|
|
|
|
|
|
|
|
vma = i915_vma_instance(obj, vm, NULL);
|
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
ret = PTR_ERR(vma);
|
|
|
|
goto unref;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
|
|
|
|
if (ret)
|
|
|
|
goto unref;
|
|
|
|
|
|
|
|
vm->rsvd.vma = i915_vma_make_unshrinkable(vma);
|
|
|
|
vm->rsvd.obj = obj;
|
|
|
|
vm->total -= vma->node.size;
|
|
|
|
return 0;
|
|
|
|
unref:
|
|
|
|
i915_gem_object_put(obj);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-01-07 13:40:09 +00:00
|
|
|
/*
|
|
|
|
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
|
|
|
|
* with a net effect resembling a 2-level page table in normal x86 terms. Each
|
|
|
|
 * PDP represents 1GB of memory: 4 * 512 * 512 * 4096 = 4GB of legacy 32b address
|
|
|
|
* space.
|
|
|
|
*
|
|
|
|
*/
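/*
 * Spelled out per level: 4 PDP entries x 512 PDEs x 512 PTEs x 4K pages
 * = 4GB, i.e. the whole legacy 32b address space.
 */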
|
2021-09-22 08:25:25 +02:00
|
|
|
struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
|
|
|
|
unsigned long lmem_pt_obj_flags)
|
2020-01-07 13:40:09 +00:00
|
|
|
{
|
2022-09-26 16:33:33 +01:00
|
|
|
struct i915_page_directory *pd;
|
2020-01-07 13:40:09 +00:00
|
|
|
struct i915_ppgtt *ppgtt;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
|
|
|
|
if (!ppgtt)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2021-09-22 08:25:25 +02:00
|
|
|
ppgtt_init(ppgtt, gt, lmem_pt_obj_flags);
|
2020-01-07 13:40:09 +00:00
|
|
|
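	/* Four page-table levels (top index 3) for full ppgtt, three levels (top index 2) for legacy 32b. */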
ppgtt->vm.top = i915_vm_is_4lvl(&ppgtt->vm) ? 3 : 2;
|
2020-07-29 17:42:17 +01:00
|
|
|
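	/* One PDE covers SZ_4K / sizeof(gen8_pte_t) = 512 PTEs x 4K = 2M, hence pd_shift = ilog2(SZ_2M) = 21. */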
ppgtt->vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen8_pte_t));
|
2020-01-07 13:40:09 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* From bdw, there is hw support for read-only pages in the PPGTT.
|
|
|
|
*
|
|
|
|
* Gen11 has HSDES#:1807136187 unresolved. Disable ro support
|
|
|
|
* for now.
|
|
|
|
*
|
|
|
|
* Gen12 has inherited the same read-only fault issue from gen11.
|
|
|
|
*/
|
2021-06-05 08:53:52 -07:00
|
|
|
ppgtt->vm.has_read_only = !IS_GRAPHICS_VER(gt->i915, 11, 12);
|
2020-01-07 13:40:09 +00:00
|
|
|
|
2022-09-26 16:50:18 +01:00
|
|
|
if (HAS_LMEM(gt->i915))
|
2021-04-27 09:54:14 +01:00
|
|
|
ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;
|
2022-09-26 16:50:18 +01:00
|
|
|
else
|
2021-04-27 09:54:14 +01:00
|
|
|
ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
|
2022-09-26 16:50:18 +01:00
|
|
|
|
|
|
|
/*
|
drm/i915: enable PS64 support for DG2
It turns out that on production DG2/ATS HW we should have support for
PS64. This feature allows us to provide a 64K TLB hint at the PTE level,
which is a lot more flexible than the current method of enabling 64K GTT
pages for the entire page-table, since that leads to all kinds of
annoying restrictions, as documented in:
commit caa574ffc4aaf4f29b890223878c63e2e7772f62
Author: Matthew Auld <matthew.auld@intel.com>
Date: Sat Feb 19 00:17:49 2022 +0530
drm/i915/uapi: document behaviour for DG2 64K support
On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.
With PS64, we can now drop the 2M GTT alignment restriction, and instead
only require 64K or larger when dealing with lmem. We still use the
compact-pt layout when possible, but only when we are certain that this
doesn't interfere with userspace.
Note that this is a change in uAPI behaviour, but hopefully shouldn't be
a concern (IGT is at least able to autodetect the alignment), since we
are only making the GTT alignment constraint less restrictive.
Based on a patch from CQ Tang.
v2: update the comment wrt scratch page
v3: (Nirmoy)
- Fix the selftest to actually use the random size, plus some comment
improvements, also drop the rem stuff.
Reported-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004114915.221708-1-matthew.auld@intel.com
2022-10-04 12:49:14 +01:00
|
|
|
* Using SMEM here instead of LMEM has the advantage of not reserving
|
|
|
|
* high performance memory for a "never" used filler page. It also
|
|
|
|
* removes the device access that would be required to initialise the
|
|
|
|
* scratch page, reducing pressure on an even scarcer resource.
|
2022-09-26 16:50:18 +01:00
|
|
|
*/
|
|
|
|
ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
|
2020-01-07 13:40:09 +00:00
|
|
|
|
drm/i915: use pat_index instead of cache_level
Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.
From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.
For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For pre-gen12 platforms PAT can
have a 1:1 mapping to i915_cache_level, so these two are interchangeable; see
the use of LEGACY_CACHELEVEL.
One consequence of this change is that gen8_pte_encode no longer works
for gen12 platforms, because gen12 platforms have different PAT
definitions. In the meantime the mtl_pte_encode introduced specifically for
MTL becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function to gen12_pte_encode and applies it to all gen12. Even
though this change looks unrelated, separating the two would temporarily
break gen12 PTE encoding, thus they are squashed into one patch.
Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings. There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checks via i915_gem_object_has_cache_level.
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.
Bspec: 63019
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
2023-05-09 09:52:00 -07:00
|
|
|
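	/*
	 * Gen12+ PTEs carry a PAT index instead of the legacy cache-level
	 * based PPAT bits (see the commit message above), so they need the
	 * gen12 encoder; earlier gens keep gen8_pte_encode.
	 */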
if (GRAPHICS_VER(gt->i915) >= 12)
|
|
|
|
ppgtt->vm.pte_encode = gen12_pte_encode;
|
2023-04-24 11:29:01 -07:00
|
|
|
else
|
|
|
|
ppgtt->vm.pte_encode = gen8_pte_encode;
|
2020-01-07 13:40:09 +00:00
|
|
|
|
|
|
|
ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
|
|
|
|
ppgtt->vm.insert_entries = gen8_ppgtt_insert;
|
2022-02-19 00:17:47 +05:30
|
|
|
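	/* Platforms that need 64K GTT pages for local memory use the xehp variant for single-page inserts. */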
if (HAS_64K_PAGES(gt->i915))
|
2024-03-19 23:02:58 -07:00
|
|
|
ppgtt->vm.insert_page = xehp_ppgtt_insert_entry;
|
2022-02-19 00:17:47 +05:30
|
|
|
else
|
|
|
|
ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
|
2020-01-07 13:40:09 +00:00
|
|
|
ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
|
|
|
|
ppgtt->vm.clear_range = gen8_ppgtt_clear;
|
2021-06-17 08:30:11 +02:00
|
|
|
ppgtt->vm.foreach = gen8_ppgtt_foreach;
|
2022-09-26 16:33:33 +01:00
|
|
|
ppgtt->vm.cleanup = gen8_ppgtt_cleanup;
|
2020-01-07 13:40:09 +00:00
|
|
|
|
2022-09-26 16:33:33 +01:00
|
|
|
err = gen8_init_scratch(&ppgtt->vm);
|
|
|
|
if (err)
|
|
|
|
goto err_put;
|
|
|
|
|
|
|
|
pd = gen8_alloc_top_pd(&ppgtt->vm);
|
|
|
|
if (IS_ERR(pd)) {
|
|
|
|
err = PTR_ERR(pd);
|
|
|
|
goto err_put;
|
|
|
|
}
|
|
|
|
ppgtt->pd = pd;
|
|
|
|
|
|
|
|
if (!i915_vm_is_4lvl(&ppgtt->vm)) {
|
|
|
|
err = gen8_preallocate_top_level_pdp(ppgtt);
|
|
|
|
if (err)
|
|
|
|
goto err_put;
|
|
|
|
}
|
2020-02-26 10:56:57 -08:00
|
|
|
|
2020-01-07 13:40:09 +00:00
|
|
|
if (intel_vgpu_active(gt->i915))
|
|
|
|
gen8_ppgtt_notify_vgt(ppgtt, true);
|
|
|
|
|
2023-10-26 20:36:26 +02:00
|
|
|
err = gen8_init_rsvd(&ppgtt->vm);
|
|
|
|
if (err)
|
|
|
|
goto err_put;
|
|
|
|
|
2020-01-07 13:40:09 +00:00
|
|
|
return ppgtt;
|
|
|
|
|
2022-09-26 16:33:33 +01:00
|
|
|
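/*
 * Dropping the last reference runs vm.cleanup (gen8_ppgtt_cleanup), which
 * tears down the scratch pages and any partially constructed directory tree.
 */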
err_put:
|
|
|
|
i915_vm_put(&ppgtt->vm);
|
2020-01-07 13:40:09 +00:00
|
|
|
return ERR_PTR(err);
|
|
|
|
}
|